
Stress-induced reinforcement learning : the influence of Cortisol


Academic year: 2021


Stress-Induced Reinforcement Learning:

the Influence of Cortisol

Jovanka S. H. Hoeboer

University: Universiteit van Amsterdam Student number: 10010866


Abstract

Stress has widely been linked to the development of addiction (e.g. Uhart & Wand, 2009). The stress hormone cortisol may play a role in this, by mediating activity of the brain’s reward system (Oei, Both, van Heemst, van der Grond, 2014). Furthermore, stress seems to alter reinforcement learning through the effect of cortisol, although there has not yet been a human study investigating this. Therefore, in the current study it was tested whether cortisol mediated the effect of stress on reinforcement learning. 114 healthy, young males were randomly assigned to either the Trier Social Stress Test (TSST) or a control condition, after which they performed a probabilistic reinforcement learning task. Cortisol, blood pressure, heart rate and subjective stress were measured at baseline, after speech preparation, after presentation and after training. A moderated mediation analysis revealed that stress did not have a direct effect on reinforcement learning. Furthermore, cortisol negatively mediated the effect of stress on reward learning, but cortisol did not mediate the effect of stress on punishment avoidance learning. Overall, individuals with a high cortisol response to stress showed worse reward learning than those with a low cortisol response to stress. Combined with findings from previous studies (e.g. Kim, Yoon, Kim & Hamann, 2015) it seems as if positive and negative reinforcement learning involve two different mechanisms. A high cortisol response to stress may serve as a protective factor against the development of addiction, yet act as a risk factor for depressive symptoms. Future studies should attempt to gain clarity in reinforcement terminology and PET studies are advised in order to draw solid conclusions.

Contents

Abstract
Introduction
Method
    Participants
    Materials
        Stress manipulation
        Probabilistic reinforcement learning task
        Stress measures
    Procedure
Data analysis
    Stress
    Reinforcement learning task
    Mediation analysis
    Stimulus selection after feedback in the training phase
Results
    Stress measures
        Cortisol
        Blood pressure
        Heart rate
        Subjective stress
    Reinforcement learning task
    Mediation analysis
    Stimulus selection after feedback in the learning phase
Conclusion and Discussion

Introduction

Stress is widely known as a risk factor for becoming addicted (Uhart & Wand, 2009; Kosten, 2011; Herman, Stinus & Le Moal, 1984) and for drug relapse (Brown, Vik, McQuiad, Patterson, Irwin & Grant, 1990; Brown, Vik, Patterson, Grant & Schuckit, 1995; Ouimette, Coolhart, Funderburk, Wade & Brown, 2007). Chronic as well as acute stressors have been shown to be related to increased addiction vulnerability (Teusch, 2001; Dewart, Frank, Schmeidler, 2006) and studies on patients with disorders related to stressful life events, such as anxiety disorders, show a higher addiction rate (Ford et al., 2009). However, not everyone who is exposed to stress eventually becomes addicted. The question arises as to what determines whether someone under stress develops an addiction. Animal studies suggest that cortisol and dopamine (DA) levels after stress may play a role in vulnerability to addiction by activating the brain’s reward system (Campioni, Xu, & McGehee, 2009; Rougé-Pont, Marinelli, Le Moal, Simon & Piazza, 1995), but so far there has not been a human study investigating the possible causal relation between stress, cortisol, DA and reward related behavior or learning.

Stress can be defined as a response to a negative or aversive stimulus (Cannon, 1937). There are two types of bodily responses to stress: a fast physiological response and a slower hormonal response. The fast physiological response is seen as an ‘emergency reaction’ and includes sweating, heart racing, palpitations, acceleration of lung action, dilation of pupils and shaking. These physiological responses are regulated by the autonomic nervous system, through catecholamine hormones such as adrenaline and noradrenaline and occur in order to prepare the body for fighting or running. The slower hormonal response to stress releases several hormones, such as cortisol and DA (Sapolsky, Romero & Munck, 2000).

It can be argued that the individual response to stress may influence the way people respond to reward. Positron emission tomography (PET) studies have shown that stress induces DA release in the nucleus accumbens (NAcc) - the key region in the brain’s reward system, located in the ventral striatum (Pruessner, Champagne, Meaney, & Dagher, 2004; Wand et al., 2007). With functional magnetic resonance imaging (fMRI) it was found that the NAcc responds to rewarding stimuli (Childress et al., 2008; Gillath and Canterberry, 2012; Oei, Rombouts, Soeter, van Gerven & Both, 2012). The response of the NAcc to rewarding stimuli is modulated by DA: when DA levels are increased, observing a reward is related to increased NAcc activation and when DA levels are decreased, reward is related to decreased NAcc activation (Oei et al., 2012). Increased activation in the NAcc in response to stress thus seems to go hand in hand with increased DA levels. Since stress seems to activate the NAcc through DA, individual responses to stress could have different effects on the way people respond to reward.

Elaborating on this, according to the incentive-sensitization theory of addiction of Robinson and Berridge (1993; 2000), DA is thought to increase the ‘incentive salience’ of rewarding stimuli. Incentive salience refers to the change from a stimulus with mere informational value into a stimulus that attracts attention and causes ‘wanting’. ‘Wanting’ is defined as the unconscious motivation for the pursuit of rewards. Increased levels of DA are necessary for attributing incentive salience to the rewarding stimulus and thus for causing ‘wanting’ (Robinson and Berridge, 1993; 2000). According to Robinson and Berridge (1993; 2000), addictive drugs sensitize brain mechanisms that attribute incentive salience to reward, resulting in a further increase of the incentive salience of this substance which could eventually cause compulsive ‘wanting’ in addiction. Thus, stress might increase the incentive salience of rewarding stimuli, through NAcc-DA elevations.

Apart from the suggested role of DA in incentive salience, there are indications that heightened DA leads to altered reinforcement learning (Bódi et al., 2009). Reinforcement learning is defined as learning by processing positive or negative feedback, i.e. learning from reward (reward learning) or punishment (punishment avoidance learning) (Cavanagh, Frank & Allen, 2011). Reward learning is often measured as the accuracy on tasks that involve the processing of feedback indicating a ‘correct’ response and punishment avoidance learning is often measured as accuracy on tasks that involve the processing of feedback indicating an ‘incorrect’ response (Cavanagh et al., 2011). For example, Bódi and colleagues (2009) investigated the influence of DA on reward and punishment processing using a feedback-based probabilistic classification task in Parkinson’s disease patients receiving DA receptor agonists (L-Dopa), patients receiving no medication and healthy participants. Participants were shown images of shapes that represented nothing and then had to choose to which category (A or B) the image belonged. Whether a stimulus belonged to A or B was probabilistic. To indicate a correct answer points were given (reward) and to indicate an incorrect answer points were taken away (punishment). Bódi and colleagues (2009) found that higher DA (due to medication) led to better reward learning¹, while lowered DA (in non-medicated Parkinson patients) led to better punishment avoidance learning². Furthermore, in an instrumental learning task with monetary gains and losses it was found that healthy individuals receiving DA-enhancing medication (L-Dopa) showed a greater tendency to choose the most rewarding action than those receiving DA-reducing medication (haloperidol) (Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006).

Although PET studies suggest increased levels of DA after stress (Pruessner et al., 2004; Wand et al., 2007) and increased DA-levels might lead to better reward learning (Bódi et al., 2009), only one study found better reward learning after stress (Lighthall, Gorlick, Schoeke, Frank, & Mather, 2013). In another study researchers created groups based on the Behavioral Inhibition Scale (BIS) (Cavanagh et al., 2011). They found that stress led to better reward learning and worse punishment avoidance learning in people with low trait-level punishment sensitivity and the opposite in people with high punishment sensitivity. These results are, however, hardly comparable to results from other reinforcement learning studies investigating stress effects, since groups were created using the BIS. On the other hand, several studies found no effect on reward learning at all (Berghorst, Bogdan, Frank, & Pizzagalli, 2013; Petzold, Plessow, Goschke & Kirschbaum, 2010) or worse reward learning after stress (Bogdan & Pizzagalli, 2006). Since elevated DA levels might lead to better reward learning, it seems as if stress in these studies did not lead to increased DA levels.

¹ Compared to non-medicated Parkinson patients.
² Compared to recently medicated Parkinson patients.

One possible explanation for the unexpected decrease of NAcc activation and reward learning after stress in these studies could be the individual differences in the stress hormone cortisol. As mentioned, PET studies show increased levels of NAcc-DA after stress (Pruessner et al., 2004; Wand et al., 2007), but most fMRI studies show a decrease of NAcc activation after stress (Oei, Both, van Heemst & van der Grond, 2014) or no effect of stress on NAcc activation (Ossewaarde, Qin, van Marle, van Wingen, Fernandez & Hermans, 2011; Porcelli, Lewis, & Delgado, 2012). In animals, higher NAcc-DA levels are associated with higher levels of the stress hormone cortisol (Campioni et al., 2009; Rougé-Pont et al., 1995). Oei, Both, van Heemst & van der Grond (2014) examined the relation between stress, individual cortisol levels and NAcc-activation to subliminally presented reward stimuli, using moderated mediation analysis. They found that when stress led to high levels of cortisol, the NAcc was more active, and when stress led to less elevated cortisol levels, the NAcc was less active in response to rewarding stimuli. Accordingly, in the PET-studies finding an increase of NAcc-DA-release after stress (Pruessner et al., 2004; Wand et al., 2007), NAcc-DA-secretion was significantly related to cortisol increase. Since laboratory stressors do not always lead to increases in cortisol levels and there is high individual variability in the cortisol response to stress (Dickerson and Kemeny, 2004), the type of stressor that was used in studies possibly influenced DA results. A PET-study that found no NAcc-DA increase used a mental arithmetic task as a stressor that did not elicit a cortisol response in the stress condition (Montgomery, Mehta & Grasby, 2006). The fMRI studies that found no or lower NAcc activation in response to stress used stressors that caused no or only mild cortisol increases, such as aversive movie clips (Ossewaarde et al., 2011) or a cold pressor task (Porcelli et al., 2012). Altogether, it seems that individuals with a low cortisol response to stress show lower NAcc-DA and lower NAcc activation, whereas individuals who respond to stress with high cortisol levels show higher NAcc-DA and higher NAcc activation. Since higher NAcc-DA is argued to lead to better reward learning and lower NAcc-DA to better punishment avoidance learning, the cortisol response to stress might influence the direction of reinforcement learning.

Deriving from this, it could be expected that in studies finding worse reward learning or no effect on reward learning, cortisol levels were not taken into account. In line with this assumption, studies reporting worse reward learning or no effect of stress on reward learning accuracy used a threat-of-shock stressor associated with no cortisol differences (Berghorst et al., 2013; Bogdan & Pizzagalli, 2006) (but see Petzold et al., 2010). A study investigating the effects of cortisol medication (40 mg) on reward-driven behavior further showed that cortisol increased risky decision making when a big reward was possibly provided, suggesting that cortisol promotes behavior towards reward (Putman, Antypa, Crysovergi, & van der Does, 2010). Concerning punishment avoidance learning, no effect of stress was found using a threat-of-shock stressor with no cortisol differences (Berghorst et al., 2013). A study using the Trier Social Stress Test (TSST) - known for its high cortisol responses (Dickerson & Kemeny, 2004) - found worse punishment avoidance learning after stress (Petzold et al., 2010) and a trend toward a significant negative correlation with cortisol before learning (r = -.28, p = .06). Altogether, research suggests that the individual cortisol response to stress might play a key role in the direction of reinforcement learning: whether one will learn more from reward learning or from punishment avoidance learning.

To investigate this suggested mediating role of cortisol, in the current study one group will be mildly stressed using a stressor known to elicit high cortisol responses, whereas the other group will be put under control conditions. Based on the fact that stress studies often found no effect of stress on reinforcement learning when cortisol levels were not taken into account (Berghorst et al., 2013; Petzold et al., 2010), it is expected that stress will have no direct effect on reward learning and no direct effect on punishment avoidance learning. If a direct effect of stress on reward learning is found, the effect is expected to be negative (based on Bogdan & Pizzagalli, 2006). Concerning the mediation it is expected that (1) individuals with a high cortisol response to stress will learn more from reward learning (positive reinforcement), while (2) individuals with a low cortisol response to stress will learn more from punishment avoidance learning (negative reinforcement).

To gain further insight into the process of learning from reward and punishment, the tendency to select the same or the alternative symbol after positive and negative feedback will be analyzed in an exploratory manner. According to Lighthall and colleagues (2013), this provides information about sensitivity to feedback.


Method

Participants

Healthy male volunteers (n = 131) aged 18 to 24 were recruited through advertisement and by approaching potential candidates in public places. Inclusion of participants was determined prior to participation, through screening over the phone. Inclusion criteria were right-handedness, no color-blindness, no current or past psychiatric problems, no current or past medical problems, no use of (corticosteroid and psychopharmacological) medication and a Body Mass Index (BMI) between 19 and 25. Furthermore, excessive drug or alcohol use was reason for exclusion: participants were not included if they drank more than 20 alcoholic units per week, used hard drugs more than once per month or used soft drugs more than once per week. All participants gave informed consent and the study was approved by the Ethics Committee of the University of Amsterdam (project: 2014-DP-3649). Participants were randomly assigned to either the stress condition (n = 67) or control condition (n = 64). Data from three participants were excluded because of a cyst in the brain, excessively high cortisol levels and a malfunction of equipment during testing. Furthermore, ten participants did not reach performance criteria on the reinforcement learning task (for a more detailed description, see “probabilistic reinforcement learning task” under data analysis); data from these participants were also excluded from analyses. Finally, six participants were excluded from analysis due to their scores on the Symptom CheckList-90 (above 183³) and two on the Beck Depression Inventory (above 29⁴). Because some of these participants had already been excluded due to the 50% criterion, the questionnaires led to the exclusion of four additional participants. In total, 114 participants were included in the analyses: 61 in the stress condition and 53 in the control condition.

There were no group differences in the level of depression, as measured with the Beck Depression Inventory (BDI); the level of psychological distress, as measured with the Symptom CheckList-90 (SCL-90); the level of trait anxiety, as measured with the State-Trait Anxiety Inventory (STAI); the use of motivational systems, as measured with the Behavioral Inhibition Scales and Behavioral Activation Scales (BIS/BAS); perceived stress, as measured with the Perceived Stress Scale (PSS); and porn craving, as measured with the Porn Craving Questionnaire (PCQ). Comparison between the two groups on alcohol use, however, measured with the Alcohol Use Disorder Identification Test (AUDIT), revealed a trend, p = .057, with a higher mean in the stress group. For means and standard deviations of both groups, see Table 1. There were no group differences in age, p = .342 (Mcontrol = 20.60, SDcontrol = 1.72; Mstress = 20.31, SDstress = 1.54). Participants were paid 50 euros for participation.

³ Arrindell & Ettema, 2003

Table 1

Means and Standard Deviations of Questionnaires for Control Group and Stress Group and Corresponding p values.

Questionnaire    Control group (M ± SD)    Stress group (M ± SD)    p-value
AUDIT            9.81 ± 4.41               11.46 ± 4.70             .057
CUDIT            10.69 ± 3.02              10.88 ± 3.86             .780
BDI              5.53 ± 3.98               6.52 ± 4.60              .223
SCL-90           120.43 ± 17.47            126.28 ± 21.46           .117
STAI             34.13 ± 6.67              33.57 ± 7.43             .676
BIS              15.28 ± 2.50              15.51 ± 3.06             .671
BAS-R            8.04 ± 1.75               8.16 ± 1.69              .696
BAS-F            7.09 ± 2.03               6.72 ± 1.59              .275
BAS-D            6.62 ± 2.25               6.92 ± 2.11              .471
PSS              12.89 ± 5.37              12.75 ± 4.73             .889
PCQ              29.38 ± 15.73             31.00 ± 13.31            .552

Note: AUDIT = Alcohol Use Disorder Identification Test, CUDIT = Cannabis Use Disorder Identification Test, BDI = Beck Depression Inventory, SCL-90 = Symptom CheckList-90, STAI = State-Trait Anxiety Inventory, BIS = Behavioral Inhibition Scales, BAS = Behavioral Activation Scales (R = reward responsiveness, F = fun seeking, D = drive), PSS = Perceived Stress Scales and PCQ = Porn Craving Questionnaire.


Materials

Stress manipulation. The TSST was used to elicit stress (Kirschbaum, Pirke & Hellhammer, 1993). The TSST is a standardized protocol to evoke moderate social stress in laboratory settings (Kirschbaum et al., 1993). It is one of the most effective tests in combining uncontrollability and high social-evaluative threat. Tasks that combine these two elements provoke the largest cortisol responses and the longest time to recover (Dickerson & Kemeny, 2004).

The test in total took 20 minutes. First, the participant was asked to prepare a 5 minute presentation for a job interview in front of a committee of three specialists, trained to detect non-verbal signs of stress. The participant was told he was going to be recorded by camera and voice recorder. After this instruction, the participant was given 10 minutes of preparation time, in which he was allowed to outline his speech on paper, but he was not allowed to bring these notes to the presentation. After preparation, the participant entered a room with the three committee members, the camera and recorder. At this point, the participant was asked to start his presentation. After the 5 minute presentation, the committee told the participant to count back from 1033 to zero in steps of 13. When the participant made a mistake, one of the members of the committee said “wrong, start over at 1033”. After 5 minutes, the committee told the participant to stop and the members left the room.

In the control condition, the participant was asked to prepare a 5 minute presentation about a book or movie of their own choice, this time in a room without audience. The preparation time was again 10 minutes. After the presentation, the participant was asked to count back from 50 to zero for five minutes and start over at 50 every time they got to zero (Het, Rohleder, Schoofs, Kirschbaum and Wolf, 2009).

Probabilistic reinforcement learning task. To measure how well participants learned [...] selection task was used (Frank, Seeberger & O’Reilly, 2004). This task consisted of two phases: a training phase and a testing phase. In the training phase of this task, a jitter was shown for a random duration of 400 to 2000 ms, after which three different pairs of stimuli (AB, CD, EF) were shown a total of 360 times (120 times per pair) (see Figure 1 below). Every time a pair was shown, the participant had to choose one of the two stimuli by clicking the left or right button on a device within 1700 ms: the left button was pressed to select the left stimulus and the right button to select the right stimulus. After the participant chose a stimulus, the choice was highlighted for 300 ms and then a picture followed for 700 ms to indicate whether the choice was correct or incorrect: a pleasant – usually erotic – picture indicated a correct answer and an aversive, unpleasant picture indicated an incorrect response. Whether a stimulus was correct or incorrect – the feedback given by pictures – was however probabilistic. The probability of reward was as follows: A/B (80%/20%), C/D (70%/30%), E/F (60%/40%). Participants usually learn to choose the optimal stimulus by learning from the feedback (Cavanagh et al., 2011). Thus, participants learn to choose A, C and E more often than B, D and F.

The pairs were presented conversely (thus B/A instead of A/B) half of the time. There were three versions of this task; every version had other symbols to represent A, B, C, D, E and F.
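The structure of the training phase can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the shuffled trial order and the independent feedback draw per chosen stimulus are assumptions made for illustration; only the pair probabilities, the 120 trials per pair and the mirrored presentation come from the description above.

```python
import random

# Reward probabilities per stimulus, taken from the task description.
REWARD_PROB = {"A": 0.80, "B": 0.20, "C": 0.70, "D": 0.30, "E": 0.60, "F": 0.40}
TRAINING_PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]
TRIALS_PER_PAIR = 120


def build_training_trials(rng):
    """Return 360 (left, right) trials; each pair is mirrored half of the time."""
    trials = []
    half = TRIALS_PER_PAIR // 2
    for first, second in TRAINING_PAIRS:
        trials += [(first, second)] * half + [(second, first)] * half
    rng.shuffle(trials)  # assumption: the presentation order was randomized
    return trials


def feedback_is_positive(chosen, rng):
    """Draw probabilistic feedback: True = pleasant picture, False = aversive picture."""
    return rng.random() < REWARD_PROB[chosen]
```

With a seeded generator such as `random.Random(0)`, `build_training_trials` yields a reproducible trial list, and repeatedly choosing A gives positive feedback on roughly 80% of the draws.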

To measure how well participants learned from reward learning and punishment avoidance learning, participants underwent a testing phase after the training phase. In this phase of the test, participants were shown the same stimuli as in the training phase, only this time all possible stimulus pairs were presented. This led to 15 combinations of stimuli (A/B, A/C, A/D, A/E, A/F, B/C, B/D, B/E, B/F, C/D, C/E, C/F, D/E, D/F and E/F). Each pair was presented 12 times (180 trials in total), but this time no feedback picture was shown after choosing a stimulus. Reward learning was defined as choosing A over C/D/E/F (i.e. “choose A” accuracy) and punishment avoidance learning was defined as choosing C/D/E/F over B (i.e. “avoid B” accuracy).
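As an illustration of these two outcome measures, the sketch below enumerates the 15 test pairs and computes “choose A” and “avoid B” accuracy. The data layout (a list of `(pair, choice)` tuples) and the function names are hypothetical and chosen only for this example.

```python
from itertools import combinations

STIMULI = "ABCDEF"
TEST_PAIRS = list(combinations(STIMULI, 2))  # 15 unordered stimulus pairs
REPEATS = 12                                 # 15 * 12 = 180 test trials


def choose_a_accuracy(trials):
    """Share of A versus C/D/E/F trials on which A was chosen."""
    relevant = [choice for pair, choice in trials
                if "A" in pair and "B" not in pair]
    return sum(choice == "A" for choice in relevant) / len(relevant)


def avoid_b_accuracy(trials):
    """Share of B versus C/D/E/F trials on which B was NOT chosen."""
    relevant = [choice for pair, choice in trials
                if "B" in pair and "A" not in pair]
    return sum(choice != "B" for choice in relevant) / len(relevant)
```

The A/B pair is left out of both measures, in line with the definitions above, which only compare A or B against the stimuli C, D, E and F.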


Figure 1. Probabilistic reinforcement learning task.

Based on Cavanagh, J. F., Frank M. J. & Allen, J. J. (2011). Social stress reactivity alters reward and punishment learning. Social Cognitive and Affective Neuroscience, 6, 311-320.

The pleasant and unpleasant feedback-pictures were chosen from the International Affective Picture System (IAPS) (Lang, Bradley & Cuthbert, 2008). The valence and arousal of pictures were rated using a 9-point Likert scale and the pictures with equal valence and arousal were chosen (positive arousal for the pleasant pictures and negative arousal for the unpleasant pictures were equal): pleasant pictures valence (M 7.64 ± SD 1.37), arousal (M 6.59 ± SD 1.97); unpleasant pictures valence (M 1.92 ± SD 1.39) arousal (M 6.27 ± SD 2.31).

Stress measures. Stress was measured through cortisol levels, blood pressure, heart rate and subjective stress. Cortisol levels, blood pressure, heart rate and subjective stress were measured four times: at baseline (T1), after speech preparation (T2), after presentation (T3) and after training (T4).

Cortisol levels were measured using Salivettes (Sarstedt, Germany). Participants were asked to put the salivette (a cotton roll) in their mouth and keep it there for approximately 90 seconds. After the salivette was soaked in saliva, the participant placed it in the plastic salivette-tube. The salivettes were stored at -20°C until assayed by Prof Kirschbaum’s laboratory (http://biopsychologie.tu-dresden.de). Cortisol levels were measured using a chemiluminescence-immunoassay kit (IBL, Hamburg, Germany). This kit is known to have a very good analytical and functional sensitivity (Westermann, Demir, & Herbst, 2004).

Systolic blood pressure, diastolic blood pressure and heart rate were measured using a fully automatic digital blood pressure monitor (OMRON, M7).

Subjective stress was determined based on a Visual Analogue Scale (VAS). Participants were asked to rate how tense or stressed they felt on a scale of zero to ten, zero being totally relaxed and ten being extremely tense or stressed.

Procedure

All participants arrived in the morning, two participants per day: the first one at 08:00 AM and the second one at 09:15 AM. Participants were either both in the control condition or both in the stress condition. Whether it was a stress day or a control day was balanced and participants were randomly assigned to one of these days to keep cortisol levels equal over groups, since cortisol levels are known to decrease during the day. After the participants arrived they were seated in a quiet and private room. First it was confirmed that participants had not eaten, had not drunk any coffee or sugar-containing drinks, had not smoked and had not exercised in the past hour and a half, and had not drunk alcohol since 08:00 PM the previous evening, as instructed before participating. After this, information about the study in general and about the probabilistic reinforcement learning task specifically was given. Participants were told they had to repeatedly choose one of two stimuli and that their choice was either correct or incorrect, as indicated by feedback pictures. They were told that the feedback was probabilistic and were instructed to find out which of the stimuli had the biggest chance of being correct or incorrect. After the instruction participants got the opportunity to ask questions and informed consent was given. Subsequently, the stress manipulation protocol (the TSST) - or the no-stress protocol in the control group - started: first the instructions, then the 10 minutes of preparation, then the 5-minute presentation and finally the 5-minute calculation task. After the stress manipulation, participants were brought to an MRI-scanner as part of a larger fMRI-study. In the scanner the participants were asked whether they had fully understood the instructions about the reinforcement learning task. If not, questions were answered, after which the training phase of the reinforcement learning task began. Saliva was collected and blood pressure, heart rate and subjective stress were measured four times during the study: before TSST instructions (‘baseline’), after TSST instructions (‘pre-TSST’), immediately after the TSST (‘post-TSST’) and after the training phase of the reinforcement learning task, immediately after the scan protocol (‘after training’). After leaving the scanner, participants returned to the quiet, private room, where they performed the testing phase of the reinforcement learning task. Hereafter, participants filled out questionnaires, on the computer as well as on paper. Finally, an exit interview and a debriefing regarding the TSST followed, participants filled in a declaration form to receive financial compensation and were thanked for participation.


Data analysis

Stress

To assess whether stress was successfully induced, repeated measures ANOVAs were used with Group as between-subjects factor, Time as within-subjects factor and cortisol, blood pressure, heart rate and subjective stress as dependent variables. If Mauchly’s Test of Sphericity showed the assumption of sphericity had been violated, a Greenhouse-Geisser correction was applied. ANOVAs were followed by independent t-tests. An interaction between Time and Group on the dependent variables was expected, with no differences at T1, while at T2, T3 and T4 the stress group was expected to be significantly higher than the control group in mean cortisol level, systolic and diastolic blood pressure, heart rate and subjective stress.

Reinforcement learning task

First, it was checked whether participants selected the most rewarding stimulus (A) over the most punishing stimulus (B) in the testing phase 50% or more of the time. Participants who did not reach this basic criterion could not be included in the analysis (Frank, Woroch & Curran, 2005). Also, it was checked whether there were no differences in how well participants in the stress and no-stress group learned the basic task. This was done by comparing the accuracy on AB trials in the training phase in the two groups, using a t-test. Furthermore, a t-test was done to compare the groups on accuracy on AB trials in the testing phase, to make sure that learning carried over from the training phase to the basic testing phase and that there were no differences in how well participants in the stress and no-stress group were able to transfer what they had learned in the training phase to the basic testing phase.
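The basic inclusion check described above could be expressed as follows. This is only a sketch: the function name and the representation of a participant's A/B test trials as a list of chosen letters are assumptions for illustration.

```python
def meets_ab_criterion(ab_choices, threshold=0.5):
    """Return True if A was chosen on at least `threshold` of the A/B test trials.

    ab_choices: list with the chosen stimulus ('A' or 'B') on each A/B test trial.
    """
    if not ab_choices:
        return False  # no A/B trials recorded, so the criterion cannot be met
    return ab_choices.count("A") / len(ab_choices) >= threshold
```

A participant who chose A on 6 of the 12 A/B test trials would just meet the criterion, whereas 5 of 12 would lead to exclusion.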


Mediation analysis

To test whether people’s individual cortisol response to stress plays a mediating role in reinforcement learning, mediation was investigated by testing whether there was a significant indirect effect of Group (independent variable) on reinforcement learning (dependent variable) through cortisol (mediator). The indirect effect was defined as the product of the effect of Group on cortisol and the effect of cortisol on reinforcement learning, while partialling out the effect of Group. To test this mediation, the syntax of the SPSS macro written by Dr. A. Hayes (http://afhayes.com/spss-sas-and-mplus-macros-and-code.html) was used (Preacher and Hayes, 2008). This script uses the Sobel test to calculate the indirect effects and also provides bootstrap confidence intervals for them (Preacher and Hayes, 2004). A point estimate of the effects was calculated based on 5000 bootstrapped samples. The mediation effect was considered significant if zero did not fall within the given confidence interval.
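The logic of the indirect effect and its bootstrap confidence interval can be sketched in plain Python. This is not the SPSS macro used in the study: the function names are hypothetical, the Sobel test is omitted, and a simple percentile interval is assumed, which may differ from the interval type the macro reports.

```python
import random
from statistics import mean


def slope(x, y):
    """Ordinary least squares slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def indirect_effect(group, cortisol, learning):
    """a * b, with a = Group -> cortisol and b = cortisol -> learning given Group."""
    a = slope(group, cortisol)
    g_to_l = slope(group, learning)
    mg, mc, ml = mean(group), mean(cortisol), mean(learning)
    # Partial Group out of both cortisol and learning, then regress the residuals.
    c_res = [c - mc - a * (g - mg) for g, c in zip(group, cortisol)]
    l_res = [l - ml - g_to_l * (g - mg) for g, l in zip(group, learning)]
    b = sum(c * l for c, l in zip(c_res, l_res)) / sum(c * c for c in c_res)
    return a * b


def bootstrap_ci(group, cortisol, learning, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(group)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        g = [group[i] for i in idx]
        if len(set(g)) < 2:
            continue  # resample contains only one group; the slope is undefined
        estimates.append(indirect_effect(
            g, [cortisol[i] for i in idx], [learning[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under this sketch, mediation would be called significant when the interval returned by `bootstrap_ci` does not contain zero, mirroring the decision rule stated above.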

Stimulus selection after feedback in the training phase

To gain further insight into the process of learning from reward and punishment, the tendency to stay with the same stimulus or to select the alternative one after positive and negative feedback in the learning phase was analyzed. Furthermore, the influence of stress on the learning process was analyzed. A repeated measures ANOVA was used with Group as between-subjects factor, and stimulus choice (stay or shift) and feedback picture (positive or negative) as within-subjects factors. This created four within-subjects variables: positive-stay (the number of times a participant stayed with the previous choice after a positive picture), positive-shift (the number of times a participant shifted from the previous choice after a positive picture), negative-stay (the number of times a participant stayed with the previous choice after a negative picture) and negative-shift (the number of times a participant shifted from the previous choice after a negative picture). Since the three stimulus pairs all had different probabilities (“A/B” 80/20, “C/D” 70/30 and “E/F” 60/40) and it is likely that behavior on these stimuli differed, data from these pairs were compared by adding stimulus pair (AB, CD or EF) as a within-subjects factor. A post-hoc pairwise comparison was done to see the direction of the interaction between feedback picture and stimulus choice and to see if this differed between stimulus pairs.
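The four stay/shift variables per stimulus pair can be derived from a participant's training trials roughly as sketched below. The trial format and the function name are hypothetical, and the sketch assumes that "previous choice" refers to the previous presentation of the same stimulus pair, which the text does not state explicitly.

```python
from collections import Counter


def stay_shift_counts(trials):
    """Count stay/shift responses after positive and negative feedback, per pair.

    trials: chronological list of (pair, choice, positive_feedback) tuples,
    e.g. ("AB", "A", True). Each choice is compared with the choice on the
    previous trial of the same pair (an assumption, see the lead-in).
    """
    counts = Counter()
    last = {}  # pair -> (choice, positive_feedback) on its most recent trial
    for pair, choice, positive in trials:
        if pair in last:
            prev_choice, prev_positive = last[pair]
            feedback = "positive" if prev_positive else "negative"
            action = "stay" if choice == prev_choice else "shift"
            counts[(pair, feedback, action)] += 1
        last[pair] = (choice, positive)
    return counts
```

Each key of the returned counter, such as `("AB", "negative", "shift")`, corresponds to one cell of the pair by feedback by choice design described above.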


Results

Stress measures

See Table 2 below for means and standard deviations of cortisol, blood pressure, heart rate and subjective stress.

Cortisol. There was a significant within-subjects effect of Time, F(1.90, 185.95) = 38.96, p < .0005 (see Figure 2). A significant interaction between Time and Group was found, F(3, 29) = 7.35, p < .0005, which means that the stress manipulation led to significant changes in cortisol levels over time. Independent t-tests performed at baseline, after speech preparation, after presentation and after training showed that the stress group and control group did not differ at baseline, t(104) = .07, p = .944, or after speech preparation, t(103) = -1.11, p = .272. After presentation the cortisol levels of the stress group were significantly higher than those of the control group, t(101) = -3.50, p = .001, and this remained the case after training, t(106) = -2.87, p = .005. The between-subjects factor Group was significant, F(1, 98) = 4.50, p = .036.

Figure 2. Mean cortisol levels in nmol/L and standard error in the stress and control group.

Nmol/L = nanomole per liter. Measure moments: 1 = after baseline, 2 = after speech preparation (10 minutes after baseline), 3 = after presentation (20 minutes after baseline), 4 = after training (65 minutes after baseline). * p < .05


Blood pressure.

Systolic blood pressure (SBP). There was a significant within-subjects effect of Time, F(3, 336) = 7.27, p < .0005 (see Figure 3). A significant interaction between Time and Group was found, F(3, 336) = 16.71, p < .0005, which means that the stress manipulation led to significant changes in SBP over time. Independent t-tests performed at baseline, after speech preparation, after presentation and after training showed that the stress group and control group did not differ at baseline, t(112) = -.76, p = .450. After speech preparation the SBP levels of the stress group were significantly higher than those of the control group, t(112) = -2.53, p = .013, as well as after presentation, t(112) = -6.15, p < .0005, and after training, t(112) = -2.91, p = .004. The between-subjects factor Group was significant, F(1, 112) = 14.30, p < .0005.

Figure 3. Mean Systolic Blood Pressure in mmHg and standard error in the stress and control group.

MmHg = millimeter of mercury. Measure moments: 1 = after baseline, 2 = after speech preparation (10 minutes after baseline), 3 = after presentation (20 minutes after baseline), 4 = after training (65 minutes after baseline). * p < .05


Diastolic blood pressure (DBP). There was a significant within-subjects effect of Time, F(3, 336) = 16.62, p < .0005 (see Figure 4). A significant interaction between Time and Group was found, F(3, 336) = 8.36, p < .0005, which means that the stress manipulation led to significant changes in DBP over time. Independent t-tests performed at baseline, after speech preparation, after presentation and after training showed that the stress group and control group did not differ at baseline, t(112) = -.81, p = .419. After speech preparation the DBP levels of the stress group were significantly higher than those of the control group, t(112) = -3.77, p < .0005, and this remained the case after presentation, t(112) = -5.37, p < .0005, and after training, t(112) = -2.51, p = .013. The between-subjects factor Group was significant, F(1, 112) = 16.71, p < .0005.

Figure 4. Mean Diastolic Blood Pressure in mmHg and standard error in the stress and control group.

MmHg = millimeter of mercury. Measure moments: 1 = after baseline, 2 = after speech preparation (10 minutes after baseline), 3 = after presentation (20 minutes after baseline), 4 = after training (65 minutes after baseline). * p < .05


Heart rate. There was a significant within-subjects effect of Time, F(3, 336) = 4.50, p = .004 (see Figure 5). A trend towards an interaction between Time and Group was found, F(3, 336) = 2.33, p = .074, which suggests that the stress manipulation tended to change heart rate over time. Independent t-tests performed at baseline, after speech preparation, after presentation and after training showed that heart rates of the stress group were significantly higher than those of the control group at baseline, t(112) = -2.25, p = .026, after speech preparation, t(112) = -4.30, p < .0005, after presentation, t(112) = -2.82, p = .006, and after training, t(112) = -4.15, p < .0005. The between-subjects factor Group was significant, F(1, 112) = 14.21, p < .0005.

Figure 5. Mean Heart Rate in pulse/min and standard error in the stress and control group.

Pulse/min = pulse per minute. Measure moments: 1 = after baseline, 2 = after speech preparation (10 minutes after baseline), 3 = after presentation (20 minutes after baseline), 4 = after training (65 minutes after baseline). * p < .05

Subjective stress. There was a significant effect of Time, F(2.48, 275.41) = 37.13, p < .0005 (see Figure 6). A significant interaction between Time and Group was found, F(3, 333) = 45.83, p < .0005, which means that the stress manipulation led to significant changes in subjective stress over time. Independent t-tests performed at baseline, after speech preparation, after presentation and after training showed that the stress group and control group did not differ at baseline, t(108.88) = -.84, p = .401. After speech preparation the subjective stress of the stress group was significantly higher than that of the control group, t(106.63) = -8.26, p < .0005, and this remained the case after presentation, t(105.68) = -9.82, p < .0005. After training the subjective stress did not differ between groups, t(111) = -.74, p = .463. The between-subjects factor Group was significant, F(1, 111) = 38.48, p < .0005.
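The fractional degrees of freedom reported for the Time effects of cortisol and subjective stress suggest that a sphericity correction (for example Greenhouse-Geisser) was applied; the type of correction is an assumption, as it is not named in this section. Under that assumption, the correction factor and the uncorrected error degrees of freedom can be recovered from the reported values as a consistency check. The function names below are hypothetical.

```python
def sphericity_epsilon(corrected_df_effect, levels):
    """Correction factor implied by a corrected effect df
    for a within-subjects factor with the given number of levels."""
    return corrected_df_effect / (levels - 1)


def uncorrected_error_df(corrected_df_error, epsilon):
    """Error df before the sphericity correction was applied."""
    return corrected_df_error / epsilon


# Time has four measurement moments (T1 to T4).
eps_cortisol = sphericity_epsilon(1.90, levels=4)
err_cortisol = uncorrected_error_df(185.95, eps_cortisol)

eps_stress = sphericity_epsilon(2.48, levels=4)
err_stress = uncorrected_error_df(275.41, eps_stress)
```

Within rounding, the recovered error degrees of freedom equal three times the error degrees of freedom of the corresponding between-subjects tests (3 × 98 for cortisol, 3 × 111 for subjective stress), as would be expected for a factor with four levels.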

Overall it can be concluded that the stress manipulation succeeded.

Figure 6. Mean Subjective Stress levels and standard error in the stress and control group.

VAS = visual analogue scale. Measure moments: 1 = after baseline, 2 = after speech preparation (10 minutes after baseline), 3 = after presentation (20 minutes after baseline), 4 = after training (65 minutes after baseline). * p < .05

Table 2

Means and Standard Deviations of Physiological and Subjective Stress Measures.

Measure              T1               T2               T3               T4
                     M ± SD           M ± SD           M ± SD           M ± SD
Control group
  SBP                127.49 ± 11.35   125.60 ± 12.33   124.68 ± 12.51   125.96 ± 10.41
  DBP                75.09 ± 7.18     73.75 ± 6.92     75.40 ± 8.63     78.66 ± 8.04
  HR                 59.81 ± 9.94     58.51 ± 9.26     61.40 ± 10.49    58.58 ± 8.41
  Cortisol           33.54 ± 11.77    31.64 ± 11.90    29.87 ± 11.05    21.53 ± 11.45
  Subjective stress  2.32 ± 1.17      2.45 ± 1.08      2.15 ± 1.32      2.45 ± 1.59
Stress group
  SBP                129.15 ± 11.87   131.98 ± 14.29   140.48 ± 14.63   131.89 ± 11.20
  DBP                76.44 ± 10.08    79.48 ± 8.98     84.92 ± 10.09    82.44 ± 8.01
  HR                 63.93 ± 9.59     66.15 ± 8.94     66.85 ± 10.16    65.54 ± 9.34
  Cortisol           33.36 ± 14.26    34.34 ± 13.11    39.00 ± 15.02    27.78 ± 11.21
  Subjective stress  2.54 ± 1.61      4.52 ± 1.58      5.20 ± 1.97      2.68 ± 1.72

Note. SBP = systolic blood pressure (mmHg); DBP = diastolic blood pressure (mmHg); HR = heart rate (bpm); cortisol in nmol/L.

Reinforcement learning task

There were no differences in how well participants in the stress and no-stress group learned the basic task, t(112) = .49, p = .624. Furthermore, there were no differences in how well both groups were able to transfer what they had learned in the training phase to the basic testing phase, t(112) = .56, p = .587.

Mediation analysis

A mediation analysis with 5000 bootstrapped samples was used to test whether cortisol acted as a mediator in the relation between stress and reinforcement learning (see Figure 7 for reward learning and Figure 8 for punishment avoidance learning). The analysis showed no significant total effect of Group on reward learning (TE = -1.04, SE = 3.54, p = .770). The direct effect of Group on reward learning was also not significant (DE = 1.92, SE = 3.80, p = .614). The indirect effect, however, was significant: cortisol mediated the relationship between Group and reward learning (IE lower 95% BCA-CI = -6.2996, IE upper 95% BCA-CI = -.3111), since zero did not fall within the confidence interval based on 5000 bootstrap samples. Participants with low cortisol levels tended to show better reward learning than those with high cortisol levels.

The total effect of Group on punishment avoidance learning was not significant (TE = 1.63, SE = 2.30, p = .480). The direct effect of Group on punishment avoidance learning was not significant either (DE = 1.99, SE = 2.51, p = .43), nor was the indirect effect of Group on punishment avoidance learning with cortisol as mediator (IE lower 95% BCA-CI = -2.4321, IE upper 95% BCA-CI = 1.5217), since zero fell within the confidence interval based on 5000 bootstrap samples.
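The reported effects can be checked for internal consistency with a short sketch. It assumes ordinary least-squares regressions fitted on the same cases, in which the total effect equals the sum of the direct and the indirect effect. The indirect point estimates computed here are derived under that assumption and are not values reported in the text; the function names are hypothetical.

```python
def implied_indirect(total_effect, direct_effect):
    """Indirect effect implied by TE = DE + IE (OLS, same cases)."""
    return total_effect - direct_effect


def ci_excludes_zero(lower, upper):
    """Significance criterion used for the bootstrap interval."""
    return lower > 0 or upper < 0


# Reward learning
reward_ie = implied_indirect(-1.04, 1.92)
reward_ci = (-6.2996, -0.3111)
reward_significant = ci_excludes_zero(*reward_ci)

# Punishment avoidance learning
punish_ie = implied_indirect(1.63, 1.99)
punish_ci = (-2.4321, 1.5217)
punish_significant = ci_excludes_zero(*punish_ci)
```

Both implied indirect effects fall inside their reported confidence intervals, and only the interval for reward learning excludes zero, which matches the conclusions above.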

Figure 7. Mediation model reward learning.

Unstandardized regression coefficients for the relation between Group (stress and control) and reward learning accuracy, as mediated by cortisol levels. In parentheses is the unstandardized regression coefficient for the indirect effect. * p < .05.


Figure 8. Mediation model punishment avoidance learning.

Unstandardized regression coefficients for the relation between Group (stress and control) and punishment avoidance learning accuracy, as mediated by cortisol levels. In parentheses is the unstandardized regression coefficient for the indirect effect. * p < .05.

Stimulus selection after feedback in the learning phase

A repeated measures ANOVA revealed a main effect of stimulus choice (stay or shift), F(1, 112) = 48718, p < .001: participants were more likely to stay with their previously selected stimulus than to shift to the other stimulus (see Figures 9, 10 and 11 for percentages). There was an interaction between feedback picture (positive or negative) and stimulus choice, F(1, 112) = 601.01, p < .0005, which meant that stay and shift behavior was different for positive and negative feedback pictures. Surprisingly, pairwise comparison showed that participants stayed more than they shifted not only after positive feedback (Mean Difference = 43.92; p < .0005; M_positive-stay = 57.41, M_positive-shift = 13.49) but also after negative feedback (Mean Difference = 13.02; p < .0005; M_negative-stay = 30.19, M_negative-shift = 17.17). Group did not interact with stimulus choice, F(1, 112) = 0.61, p = .436, which meant that the stress and control group did not differ in their response to feedback pictures. There was a significant interaction between stimulus choice and stimulus pair, F(2, 224) = 13.96, p < .0005. Pairwise comparison showed that participants shifted more in stimulus pair CD than in AB (Mean Difference = 3.24; p < .0005; M_AB-shift = 12.91, M_CD-shift = 16.15) and stayed more in AB than in CD (Mean Difference = 3.26; p < .0005; M_AB-stay = 46.26, M_CD-stay = 43.00). Participants shifted more in EF than in AB (Mean Difference = 4.03; p < .0005; M_AB-shift = 12.91, M_EF-shift = 16.94) and stayed more in AB than in EF (Mean Difference = 4.11; p < .0005; M_AB-stay = 46.26, M_EF-stay = 42.15). Between CD and EF there was no difference in shifting (Mean Difference = .79; p = .327) or staying (Mean Difference = .82; p = .301). There was a significant interaction between feedback picture, stimulus choice and stimulus pair, F(2, 224) = 129.92, p < .0005. In all stimulus pairs participants stayed more than they shifted, for positive as well as for negative feedback pictures (see Table 3 for means and p-values).

Overall, stress did not have an effect on stay or shift behavior in the training phase. In all stimulus pairs, participants were more likely to stay with the previously selected stimulus than to shift to the other stimulus; this was the case after receiving positive as well as negative feedback.

Figure 9. Percentages of staying with the previously selected stimulus and shifting to the other


Figure 10. Percentages of staying with the previously selected stimulus and shifting to the other stimulus for negative and positive feedback pictures.

Figure 11. Percentages of staying with the previously selected stimulus and shifting to the other

Table 3

Pairwise Comparisons of Feedback Picture, Stimulus Pair and Stimulus Choice with Means, Standard Errors, Mean Difference and Corresponding p values.

Stimulus pair  Feedback picture  Shift M  Shift SE  Stay M  Stay SE  Mean Difference  p-value
AB             negative          13.94    .87       23.18   .40      9.24             < .0005
AB             positive          11.88    .81       69.34   1.76     57.46            < .0005
CD             negative          17.86    .87       30.51   .73      12.65            < .0005
CD             positive          14.43    .91       55.5    1.53     41.06            < .0005
EF             negative          19.71    .86       36.9    .70      17.18            < .0005
EF             positive          14.16    .73       47.41   1.09     33.24            < .0005


Conclusion and Discussion

The present study investigated the effect of social stress on reinforcement learning by randomly allocating 114 young males to a stress or control condition. The stress manipulation succeeded: the TSST induced higher cortisol levels, heart rate, blood pressure and subjective stress in the stress group compared to the control group. After stress induction, a probabilistic reinforcement learning task was performed. No direct effect of stress on reinforcement learning was expected. It was expected that cortisol would positively mediate the relationship between social stress and reward learning. Results showed no direct effect of stress on reinforcement learning. Cortisol indeed mediated the effect of stress on reward learning. However, stress did not increase reward learning accuracy through cortisol, but reduced it. With regard to punishment avoidance learning, it was expected that cortisol would negatively mediate the relationship between social stress and punishment avoidance learning accuracy. Results however showed that cortisol did not function as a mediator in the relationship between stress and punishment avoidance learning. Below, the main results are discussed more extensively. In addition, limitations of the current study are described and recommendations for future research are proposed.

The findings of the current study suggest that positive and negative reinforcement are not two extremes on one scale, but rather involve two different mechanisms. Stress studies on reinforcement learning seem to provide more evidence for an effect of stress on reward learning than on punishment avoidance learning. For example, Lighthall and colleagues (2013) reported an effect of stress on reward learning, but found no effect on punishment avoidance learning. In rats, pretreatment with a DA-influencing drug (methamphetamine) had an effect on positive feedback accuracy, but not on negative feedback accuracy (Stolyarova, O’Dell, Marshall and Izquierdo, 2014). Furthermore, Schmidt, Braun, Wager and Shohamy (2014) studied the effects of dopaminergic medication and placebo on reward learning and found that DA-medication influenced reward learning, while there was no effect on punishment avoidance learning. On the other hand, one study using DA-medication (L-Dopa) showed an effect on reward learning as well as punishment avoidance learning (Bódi et al., 2009). However, it is questionable whether higher levels of DA in Parkinson patients as a result of DA-medication can be compared with high DA levels in healthy individuals. For example, L-Dopa is thought to bring DA to “normal, healthy” levels in Parkinson patients rather than to elevated levels. Apart from the behavioral findings suggesting that punishment avoidance might be less affected by stress than reward learning, findings from an fMRI study corroborate the notion that reward learning and punishment avoidance learning involve different mechanisms and depend on different neural regions. Using a monetary reinforcement learning task, Kim, Yoon, Kim and Hamann (2015) showed that reward and punishment trials activated different brain regions, with activation of the ventral and dorsal striatum during reward trials and activation of the dorsal striatum and prefrontal regions during avoidance trials. This suggests that stress only affects reward learning and not punishment avoidance learning, since stress is thought to lead to higher NAcc-DA-levels (when accompanied by higher cortisol levels) and the NAcc lies within the ventral striatum. Overall, the absence of an effect of stress on punishment avoidance learning appears to be consistent with previous literature.

Although cortisol elevations in the present study were related to reward learning, the direction of the association was negative: contrary to our expectations, cortisol elevations were associated with lower reward learning accuracy instead of increased learning from reward. However, this is not the first time that a negative relation between cortisol and reward learning has been found.
Another stress study using a similar probabilistic reinforcement learning task also found that participants with high cortisol levels after stress showed reduced reward learning accuracy compared to no stress, when post hoc grouping was based on cortisol levels and self-reported anxiety⁵ (Berghorst et al., 2013). Other stress studies using a probabilistic reinforcement learning task found inconsistent results (Cavanagh et al., 2011; Lighthall et al., 2013; Petzold et al., 2010), yet several methodological factors may have led to this dissimilarity in the literature. For instance, Cavanagh and colleagues (2011) created groups based on the Behavioral Inhibition Scale: stress led to more reward learning (“choose A” accuracy) and less punishment avoidance learning (“avoid B” accuracy) in people with what they refer to as low trait-level punishment sensitivity, and the opposite in people with high punishment sensitivity. Since the analyses in that study are based on the BIS scale, a comparison with our results cannot be made. Lighthall and colleagues (2013) found that stress enhanced reward learning (“choose A” accuracy) and found no effect for punishment avoidance learning (“avoid B” accuracy). Lighthall and colleagues (2013) used a cold pressor stress manipulation, which does not include social evaluative threat and is known to have a small effect on cortisol. It is therefore probable that the variance in cortisol levels in that study was low and that the results are better explained by physiological responses or factors other than cortisol levels. For example, noradrenaline is thought to influence NAcc-DA (e.g. Puglisi-Allegra & Ventura, 2012), but genetic variation also seems to influence NAcc activity in response to reward (Forbes, Brown, Kimak, Ferrel, Manuck & Hariri, 2009). Furthermore, the probabilistic stimulus selection task (PSST) started almost fifteen minutes after the stress manipulation, which raises the possibility that participants felt relieved by the time they started the learning phase of the PSST (Berghorst et al., 2013). Relief from stressors is thought to activate reward regions in the brain (Leknes, Lee, Berna, Andersson & Tracey, 2011) and in rats relief from pain was found to lead to increases in DA levels (Navratilova et al., 2012), which could have had an effect on the reinforcement learning results. Petzold and colleagues (2010) found that stress led to less punishment avoidance learning (“avoid B” accuracy); they did not find an effect of stress on reward learning (“choose A” accuracy). Petzold and colleagues (2010) had a small sample size (N = 23), which casts doubt on the reliability of the results. Altogether, despite inconsistent results (possibly due to methodological issues), there seems to be evidence for a negative effect of cortisol on reward learning accuracy.

⁵ Berghorst and colleagues (2013) created a “stress reactive” group based on standardized T2-T1 change scores in cortisol levels and self-reported state anxiety as assessed by STAI scores, by dividing participants into three tiers and then reselecting these cut-off scores so that each tier represented approximately 1/3 of the participants. Participants from the stress group who were relatively high stress responders with regard to cortisol levels and self-reported state anxiety were selected for the “stress reactive” group and this group was compared to the no stress group.

There are further indications that stress-induced high NAcc-DA leads to decreased reward learning. Although, to our knowledge, there has not yet been a PET study investigating the relationship between NAcc-DA and reward learning in healthy humans, increasing DA-levels by administering L-Dopa deteriorated probabilistic learning associated with the ventral striatum in healthy individuals (Cools, Barker, Sahakian & Robbins, 2001). The ‘probabilistic reversal learning paradigm’ requires participants to alter their behavior in response to changing reinforcement stimuli. As explained by Cools and colleagues (2001), this type of learning is impaired in people with lesions of the orbitofrontal cortex (OFC) and the ventral striatum circuitry (Rolls, 1999). Because L-Dopa has an effect on DA-levels in the striatum, the impairing effect of L-Dopa on this type of probabilistic learning is probably caused by high doses of DA in the ventral striatum (Cools et al., 2001). In another study L-Dopa led to more mistakes in predicting which stimuli would be followed by reward on an incremental learning task (Shohamy, Myers, Geghman, Sage & Gluck, 2006). Interestingly, in mice, striatal neurons expressing D1 receptors have been shown to enhance reward learning, whereas striatal neurons expressing D2 receptors have been shown to reduce reward learning when activated (Kravitz, Tye & Kreitzer, 2012; Lobo et al., 2010). Studies investigating stress effects on D1 and D2 expression are very scarce, but so far chronic stress has been shown to increase D2r-binding in rats (Lucas, Wang, McCall & McEwen, 2007) and repeated stress in another study led to increased D2 as well as D1 receptor density in the NAcc of rats (Cabib, Giardino, Calzá, Zanni, Mele & Puglisi-Allegra, 1998). Future research is necessary in order to say whether the effects of stress on D2 receptors may have a negative effect on reward learning.
In short, evidence from previous literature in combination with the results from the current study seems to indicate that DA and cortisol elevations might be inversely related to reward learning.


Terminology: sensitivity vs. seeking vs. learning

A problem that arises when looking at the existing research concerns the reinforcement terminology. Researchers use several terms to state their findings. For example, “choose A” accuracy (the number of times a participant chooses the most rewarding stimulus in a probabilistic reinforcement learning task) is defined as reward learning, reward processing (Lighthall et al., 2013), positive feedback use, positive feedback-based learning, positive feedback processing (Petzold et al., 2010), reward seeking, reward learning (Cavanagh et al., 2011), positive feedback and reward processing (Berghorst et al., 2013). Furthermore, in most studies the theoretical framework is built upon a combination of studies about both “choose A” accuracy and reward sensitivity, as if these two represent the same construct. Berghorst and colleagues (2013) draw the conclusion that a reduction of “choose A” accuracy “may reflect reduced sensitivity to positive feedback”. To the best of our knowledge there has not yet been a study directly linking “choose A” accuracy and reward sensitivity, and the thought that these constructs are positively related seems to be based on common sense more than on well-founded research. Kim, Yoon, Kim and Hamann (2015), on the other hand, use the terms reward learning and reward sensitivity to indicate two different constructs. They refer to sensitivity to reward as “the degree to which an individual’s behavior is motivated by reward relevant stimuli which is believed to be regulated by the behavioral activation system (BAS)” and to reinforcement learning as “the ability of individuals to acquire knowledge that allows them to maximize rewards and avoid punishment”.
Lighthall and colleagues (2013) also explain reward learning and reward sensitivity as two different phenomena, but in contrast to Kim and colleagues (2015) they do not use the BIS/BAS to identify sensitivity; they claim that sensitivity can be measured by “evaluating the tendency to select the same symbol after a win (win-stay) and the alternative symbol following a loss (lose-shift)”. However, they do not propose a clear definition of what reward sensitivity is, nor do they explain how this behavioral measure reflects the same outcome as the BIS/BAS scales. Sometimes the Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ) is used to assess reward sensitivity (e.g. Franken & Muris, 2005; Genovese & Wallace, 2007; Glashouwer, Bloot, Veenstra, Franken & de Jong, 2014). More than once, the term reward sensitivity has been used to indicate sensitivity of the brain’s reward circuitry (e.g. Volkow, Wang, Fowler, Tomasi, Telang & Baler, 2010). Clearly, there is a lack of consistency in the use of reinforcement terminology between studies, as well as within studies. This complicates the interpretation and comparison of the existing literature. Therefore, the field of reinforcement research is in need of a clear review explaining the different constructs and terms concerning this subject.

Limitations

A limitation of this study is that, despite an extensive screening protocol, 59 out of 114 participants had a score on the Alcohol Use Disorder Identification Test (AUDIT) that indicated hazardous and harmful alcohol use⁶. Furthermore, there was a trend for the stress group to use more alcohol than the control group. Alcohol relapsers and abstainers (short-term as well as long-term abstinent alcoholics) are known to show volume reductions in the brain reward system (Durazzo, Tosun, Buckley, Gazdzinski, Mon, Fryer and Meyerhoff, 2011; Makris et al., 2008). Wrase and colleagues (2007) and Beck and colleagues (2009) reported reduced activation of the ventral striatum during anticipation of nonalcoholic reward (monetary gain) in alcohol-dependent patients compared to healthy controls. Overall, it is possible that the alcohol use of participants influenced the results of the current study. Results should therefore be interpreted carefully.

Another limitation of this study is that we only used male participants. Previous research revealed gender differences in HPA responsiveness to social stress (Kirschbaum, Kudielka, Gaab, Schommer, & Hellhammer, 1999; Uhart, Chong, Oswald, Lin & Wand, 2006). Males tend to show a higher cortisol response to stress than females, and gender differences in reward processing and reward sensitivity have been reported (Lighthall, Mather & Gorlick, 2009; Lighthall, Sakaki, Vasunilashorn, Nga, Somayajula, Chen, Samii, & Mather, 2012). Therefore, results from the current study cannot be generalized to the entire population. Future studies should include both men and women, taking menstrual cycle and oral contraceptive use into account, since these factors are known to influence cortisol levels in women (e.g. Kirschbaum et al., 1999).

⁶ A cutoff of 10 is used, since this provides greater specificity than the sometimes used cutoff of 8 (Babor,

The task we used in this research to measure reinforcement learning was an adapted version of the probabilistic reinforcement learning task by Frank and colleagues (2004). The difference between the regular task and our adapted version was that we used positively and negatively arousing pictures to indicate a correct or incorrect response (instead of smileys or the words “correct” and “incorrect”). Several participants informally indicated that they found the unpleasant images extremely unpleasant and intense. If the negative pictures were indeed experienced as more severe, this could have caused a “freeze” reaction, which might explain the tendency to stay with the previously selected stimulus after negative feedback. Although the positive and negative IAPS pictures had been selected on equal ratings of arousal (Lang, Bradley & Cuthbert, 2008), participants were not specifically and systematically asked how they perceived the pictures. It would therefore be advisable to investigate the possibly different effects of feedback smileys or words and feedback pictures on reinforcement learning tasks.

Addiction

This study was based on the assumption that DA causes the incentive salience of a stimulus to increase, which would lead to approach behavior and would eventually cause compulsive ‘wanting’ in addiction (Robinson and Berridge, 1993; 2000). Since stress-induced DA is related to cortisol increase (e.g. Pruessner et al., 2004), we expected that high levels of cortisol after social stress would lead to increased incentive salience and behavior towards reward, thus more reward learning, and would eventually predict addiction. Since this study found a decrease of reward learning in people with a high cortisol response to social stress, the link between stress, cortisol and addiction needs to be reconsidered.

It is possible that addiction is a result not of increased but of decreased DA levels. Volkow and colleagues (2010) suggest that frequent drug exposure reduces phasic DA signaling, which is necessary for experiencing a “high” feeling. According to Robinson and Berridge (1993; 2000), this reduced DA level would result in reduced incentive salience and thus an expected decrease of behavior towards these cues. However, Volkow and colleagues (2010) found this decrease of DA in people who become addicted and who therefore do show approach behavior. Furthermore, it was found that when DA receptors (D2) are forced to overproduce, self-administration of cocaine (Thanos, Michaelides, Umegaki & Volkow, 2008) or alcohol (Thanos et al., 2004) decreases. In other words, it seems as if the lowered DA secretion makes these people need more and more rewards (and DA) in order to experience them as such. Accordingly, Spear (2000) argues that reduced activation of the ventral striatum may lead to a less satisfying feeling from reward, which may lead to more DA-related reward seeking behavior. Blum and colleagues (2000) state that when the reward system is less responsive, this may result in compensating for this deficiency by compulsive drug use. This corresponds with our finding that people with a low cortisol response to stress learn more from rewards. In contrast, rat studies have shown that blocking the secretion of corticosterone (the rodent equivalent of cortisol) reduces self-administration of cocaine and that this effect is partially reversed when corticosterone is replaced (Goeders & Guerin, 1996). The influence of cortisol on addiction might thus work differently in animals than in humans. Overall, in humans, cortisol might be indicative of protection against addiction.

In line with this thought, previous research found that psychological stress responses (e.g. cortisol responses) are decreased in children with a family history of alcohol abuse (Moss, Vanyukov & Martin, 1995; Moss, Vanyukov, Yao & Kirillova, 1999). Moreover, a follow-up study of these children showed that decreased cortisol levels were related to increased cigarette and marijuana use later on. Yau, Zubieta, Weiland, Samudra, Zucker and Heitzeg (2012) found reduced ventral striatum activation during reward and loss anticipation in a monetary incentive delay task in children of alcoholics (COAs), but reduced activation was only found in those with no current or lifetime problematic drinking behavior; COAs who did demonstrate a history of alcohol consumption showed increased NAcc activation. It should be noted, however, that participants in this study were selected based on retrospective externalizing behavioral risk at the age of 12 to 14. Although research focusing on the development of problematic substance use in COAs has so far shown inconclusive results, overall higher levels of cortisol (and associated DA levels) seem to provide a protective factor against addiction and lower levels of cortisol (and DA) seem to be a risk factor for becoming addicted. The theory of Robinson and Berridge (1993; 2000) could be correct in stating that an increase of (phasic) DA results in increased incentive salience (for example of alcohol- or drug-related stimuli), but Volkow and colleagues (2010) explain that this increase of DA is necessary but not sufficient for developing an addiction. Only when DA secretion is lowered will individuals look for ways to supplement these levels, for example by compulsively using substances.

Depression

Apart from the extensively discussed relation between stress and addiction, stress has also been linked to the onset of depression (e.g. Nestler, Gould & Manji, 2002; Hammen, 2005; De Kloet, Joëls & Holsboer, 2005), and chronic stress has been linked to depression relapse (Lethbridge & Allen, 2008) and an increase of depressive symptoms (Leskelä et al., 2006). One of the main symptoms of mood disorders is anhedonia, the reduced ability to experience pleasure in activities. Since the brain’s reward system is critically involved in experiencing stimuli or events as rewarding, it is not unlikely that altered reward learning underlies depression. Indeed, anhedonia has been linked to reduced reward learning (Pizzagalli, Iosifescu, Hallett, Ratner & Fava, 2008) and reduced reward responsiveness after social stress (Bogdan & Pizzagalli, 2006). Also, a rat study in which the animals received corticosterone injections suggests that repeated exposure to corticosterone increases depression-related behavior (Kalynchuk, Gregus, Boudreau & Perrot-Sinal, 2004). A meta-analysis revealed higher cortisol levels in clinically depressed patients compared to healthy controls (Knorr, Vinberg, Kessing & Wetterslev, 2010). Stress led to higher cortisol elevations and increased striatal activation in individuals with recurrent depression in remission compared to healthy controls (Admon et al., 2015). Combining these findings with the results of the current study, a high cortisol response to stress, together with the associated high levels of DA, NAcc activation and reduced reward learning, seems to increase the risk of anhedonia and depressive symptoms.

Conclusion

The present study suggests that individual cortisol responses to stress influence reward learning accuracy: individuals with a low cortisol response to social stress learned more from rewards. This might have implications for the understanding of addiction and possibly mood disorders: a low cortisol response, resulting in increased reward learning accuracy, might make individuals more vulnerable to addiction, whereas a high cortisol response and the consequent reduction in reward learning accuracy could be a factor in the development of depressive symptoms. The field of reinforcement research is in need of a clear review of reinforcement terminology, and PET studies in both men and women are advised in order to directly test the relationship between cortisol, NAcc-DA and reward learning.


References

Admon, R., Holsen, L. M., Aizley, H., Remington, A., Whitfield-Gabrieli, S., Goldstein, J. M., & Pizzagalli, D. A. (2015). Striatal hypersensitivity during stress in remitted individuals with recurrent depression. Biological Psychiatry, 78(1), 67-76.

Arrindell, W. A. & Ettema, J. H. M. (2003). SCL-90. Symptom Checklist. Handleiding bij een multidimensionele psychopathologie-indicator. Lisse, Nederland: Swets Test Publishers.

Babor, T. F., Higgins-Biddle, J. C., Saunders, J. B., & Monteiro, M. G. (2001). Audit. The Alcohol Use Disorders Identification Test (AUDIT): Guidelines for use in primary care.

Beck, A. T., Steer, R. A., & Brown, G. K. (2002). BDI-II-NL Handleiding (second edition). Amsterdam, Netherlands: Harcourt Test Publishers.

Beck, A., Schlagenhauf, F., Wüstenberg, T., Hein, J., Kienast, T., Kahnt, T., Schmack, K., Hägele, C., Knutson, B., Heinz, A., & Wrase, J. (2009). Ventral striatal activation during reward anticipation correlates with impulsivity in alcoholics. Biological Psychiatry, 66(8), 734-742.

Berridge, K. C. (2007). The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology, 191, 391–431.

Berghorst, L. H., Bogdan, R., Frank, M. J. & Pizzagalli, D. A. (2013). Acute Stress Selectively Reduces Reward Sensitivity. Frontiers in Human Neuroscience, 7(133), 1-12.

Blum, K., Braverman, E. R., Holder, J. M., Lubar, J. F., Monastra, V. J., Miller, D., Lubar, J. O., Chen, T. J. H., & Comings, D. E. (2000). The reward deficiency syndrome: a biogenetic model for the diagnosis and treatment of impulsive, addictive and compulsive behaviors. Journal of Psychoactive Drugs, 32(sup1), 1-112.

Bódi, N., Keri, S., Nagy, H., Moustafa, A., Myers, C. E., Daw, N., Dibo, G., Takats, A., Bereczki, D., & Gluck, M. A. (2009). Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients. Brain, 132, 2385–2395.

Bogdan, R., & Pizzagalli, D. A. (2006). Acute stress reduces reward responsiveness: Implications for depression. Biological Psychiatry, 60, 1147-1154.

Brown, S. A., Vik, P. W., McQuaid, J. R., Patterson, T. L., Irwin, M. R., & Grant, I. (1990). Severity of psychosocial stress and outcome of alcoholism treatment. Journal of Abnormal Psychology, 99, 344-348.

Brown, S.A., Vik, P.W., Patterson, T.L., Grant, I., & Schuckit, M.A. (1995). Stress, vulnerability and adult alcohol relapse. Journal of Studies on Alcohol and Drugs, 56, 538–545.

Cabib, S., Giardino, L., Calza, L., Zanni, M., Mele, A., & Puglisi-Allegra, S. (1998). Stress promotes major changes in dopamine receptor densities within the mesoaccumbens and nigrostriatal systems. Neuroscience, 84(1), 193-200.
