Habits, overtraining and RRBs : an investigation into the relationship between outcome devaluation sensitivity, overtraining, and repetitive restricted behavior.

Academic year: 2021

Habits, overtraining and RRBs:

An investigation into the relationship between outcome devaluation sensitivity, overtraining, and repetitive restricted behavior.

Jos van Leeuwen 10190899

Index

Abstract
Introduction
Methodology
Results
Discussion
References

Abstract

This study investigated the effect of outcome value, outcome congruence, training amount, and propensity to exhibit repetitive and restricted behaviors (RRBs) on response selection. To this end, a novel stimulus-response task was employed, called the Sneaky Snack Game, which aims to operationalize the distinction between goal-directed and habitual behavior. As was expected, participants were sensitive to the value of an outcome of a response, indicating intentional control of behavior, and made more mistakes when the value of the outcome differed between the training phase and the test phase, indicating reliance on habits. However, contrary to expectations, there was no effect of training amount or propensity to exhibit RRBs on response patterns in the test phase, suggesting that these factors did not influence the extent to which participants relied on habits. Implications of these findings for the validity of the Sneaky Snack Game and possibilities for further research are discussed.

Introduction

Imagine traveling to work or school. Most likely, you will take exactly the same route every day, time and time again. Then, on a certain day, you see a traffic sign telling you that there will be roadwork during the following weeks, which makes it necessary to make a time-consuming detour. You know a different route, which will take less time than the detour, and so you decide that from now on you will take this alternative route, until the roadwork is finished. However, the following day, while traveling to work, you suddenly find yourself at the construction site again, and you realize that you have taken your standard route again without thinking about it at all. You made a "slip of action": instead of doing what is most expedient, given your current beliefs and goals, you did what you were used to doing. Instead of acting intentionally, you acted merely out of habit.

This distinction between intentional and unintentional, or goal-directed and habitual behavior, is a prominent part of "folk psychology": the common-sense explanations of cognition and behavior which people use in their daily lives to interpret the actions of the people around them (de Wit & Dickinson, 2009; Knobe, 2006; Gordon, 1986). Furthermore, this distinction has been widely used in philosophy and behavioral science, for instance to determine if an action is subject to moral evaluation. Thus, Aristotle already contrasted voluntary and involuntary actions, regarding only the former as the proper subject of "praise and blame" (350 BC). It is also fundamental to modern social science. For instance, the famous sociologist Max Weber (1922) distinguished "action", which is subjectively meaningful, from "behavior", which is regarded as reactive activity "to which no meaning is attached".

In this study, a novel task is tested which attempts to distinguish between intentional and habitual behavior. This task, called the Sneaky Snack Game, is a computer-based stimulus-response task which allows responses to be compared across different outcome values, different previously established response patterns, and different amounts of previous training. Several hypotheses about the performance of participants on this task have been established on the basis of earlier research about behavior. These are discussed in the following paragraphs.

The psychology of intentional action

In modern cognitive psychology, goal-directed action is explained through the "belief-desire theory" (Stich, 1978; Georgeff et al. 1998). This theory has been succinctly summarized by De Wit and Dickinson (2009) as the idea that goal-directed action is performed because "an agent desires the goal and believes that the behavior in question will achieve the goal". Thus, a goal-directed behavior has to fulfill two criteria: it must be "controlled by a belief that the action will cause the goal" (the belief criterion), and it must be "controlled by the affective or motivational value of the outcome at the time that the action is performed" (the desire criterion). An example from daily life is studying for an exam: if you want to get a high grade on the test, and you believe that studying will allow you to get a high grade, the rational thing to do is to start studying. Another one is going to the grocery store: if you are hungry, and you know that you can buy something to eat at the grocery store, the rational thing to do is to go there and buy it.

It should be clear from the definition of the "belief criterion" featured above that the beliefs which are involved in goal-directed actions are of a special kind: they are beliefs that a certain action will lead to a certain outcome, or response-outcome (R-O) associations. Such associations are acquired through instrumental learning: a behavior is consistently coupled to a certain outcome, after which this outcome comes to be expected after performance of the behavior. They are not only central to goal-directed action, but also more generally to what Skinner (1938) called operant conditioning, in which behavior is coupled with outcomes with positive values (rewards) or negative values (punishments), after which performance of the behavior becomes respectively more or less likely. On the basis of these theories, the following hypothesis can be established:

H1: Participants will be sensitive to the value of an outcome. Thus, they will more often respond for valuable (go) outcomes than for non-valuable (no-go) outcomes.

The psychology of habitual action

In contrast to goal-directed action, habitual behavior is not determined by a current expectation of a certain desired outcome: it is neither controlled by a belief in the efficacy of the action in bringing about a certain goal, nor by a desire for a certain goal. As it fulfills neither the belief criterion nor the desire criterion, it is not goal-directed. It is not emitted with conscious intention, deliberation or awareness, but rather elicited through direct stimulus-response (S-R) associations, simply "triggered by the stimuli in whose presence it has been repeatedly performed" (Dickinson, 1985). In other words, such behavior possesses "behavioral autonomy" or "automaticity" (Bargh, 1994; Wood & Neal, 2007). An example from daily life is always going to bed when the hands of your watch indicate it is 10 PM. In this case, you do not go to bed because you are tired and decide you need to sleep, but because the position of the hands on your watch triggers a behavior which you always perform at that moment. Another example is buying icecream from a vendor every time you hear the jingle of his van. In this case, you do not buy the icecream because you were already craving it and then noticed him, but because hearing the jingle triggers a behavior you have performed many times before.

Still, like goal-directed actions, habitual actions are instrumental. Although they do not involve conscious deliberation about a goal, they usually lead to positively valued outcomes. For instance, going to bed allows one to sleep and gain new energy, and someone normally enjoys the icecream he buys from the vendor. Indeed, habitual actions originate as goal-directed actions, which become automatic as a result of Thorndike's (1905, 1911) “law of association” or “law of effect”: if a behavior has an outcome which is positively valued, and this behavior is performed in the same context over and over again, eventually this contextual stimulus comes to directly trigger the behavior.

The automaticity of habitual behavior allows for acting in a way that leads to desired outcomes effortlessly and smoothly, without the necessity of determining one's goals and calculating which action is most likely to bring these about. But this can also backfire: someone may have a certain goal, while his habitual way of acting is not the most efficient way to reach it. Also, as in the example of taking the wrong route to your work, habits might lead to "slips of action" (de Wit & Dickinson, 2009): although the outcome does not currently have value, the action is still performed. In such cases, a habit does not correspond to a currently held goal. Based on these theories, the following hypothesis can be proposed:

H2: Participants will respond for valuable outcomes more often if they are used to doing so (congruent valuable) than if they are not used to doing so (incongruent valuable). Also, they will more often fail to withhold responding for non-valuable outcomes (i.e. make slips of action) if they are not used to withholding (incongruent non-valuable) than if they are used to withholding (congruent non-valuable).

Outcome devaluation and overtraining studies

The fact that habitual behavior and goal-directed action are often indistinguishable poses a problem for psychologists, who have both a scientific and a clinical interest in explicating the causes of behavior. One popular research design used to distinguish goal-directed action from habitual behavior is the "outcome devaluation task", which was articulated in its classical form by Adams and Dickinson in the 1980s (cf. Adams & Dickinson, 1981; Adams, 1982; Dickinson, 1985). The outcome devaluation task involves three stages. First, in the training phase a subject is taught to perform a behavior by coupling it to a reinforcer. This is followed by a devaluation phase, in which the value of the reinforcer for the subject is removed. Such devaluation must occur in the absence of performance of the behavior, for only the value of the outcome is supposed to change, while the relation between the action and the outcome must remain constant (Adams, 1982). Thus, devaluation is conducted in an "offline" condition (Dezfouli, Lingawi & Balleine, 2014). Outcome devaluation can be achieved in multiple ways, such as satiation, i.e. giving the subject free access to the reinforcer, and conditioned aversion, i.e. pairing the outcome with an aversive event (for a study comparing these procedures, see Colwill & Rescorla, 1985). In human subjects, outcome devaluation is often accomplished through explicit instruction, for instance by telling the participant that a response is no longer worth points, and thus has no rewarding outcome anymore (cf. de Wit et al., 2007). Finally, in the testing phase, the subject is given the chance to perform the behavior again, but under extinction: it does not lead to the outcome anymore. The behavior is then assumed to be goal-directed to the extent that it is immediately performed less after devaluation of its outcome.

Such a change in performance after outcome devaluation implies that action is goal-directed for two reasons (De Wit & Dickinson, 2009). First, a change in the value of an outcome can only immediately influence behavior if an agent believes his action will have this outcome (and thus fulfills the belief criterion). Second, such a change in value can only immediately influence behavior if the agent selects the action because this outcome has a certain value for him (and thus fulfills the desire criterion). This is also why it is important that the testing phase is conducted in extinction, for otherwise a decrease in responding could still be the result of a gradual weakening of the stimulus-response relation through punishment, rather than of a deliberative process taking into account its diminished value. Through such procedures, the hypotheses which have been established in the previous paragraphs can be empirically tested. However, previous versions of the outcome devaluation task have only included two response types: congruent valuable (still-valued) and incongruent non-valuable (devalued). This means that these tasks are unable to test the second hypothesis as it has been formulated here. Therefore, the Sneaky Snack Game also includes congruent non-valuable (never-valued) and incongruent valuable (revalued) outcomes.
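The four response types named above follow from crossing an outcome's value during training with its value at test. The Python sketch below is illustrative only: the function name, the boolean encoding, and the lookup table are assumptions made for this example and are not taken from the task software.

```python
def response_type(valuable_in_training: bool, valuable_at_test: bool) -> str:
    """Classify a test trial by congruence with training and current value."""
    # A response is congruent when the outcome value did not change.
    congruence = (
        "congruent" if valuable_in_training == valuable_at_test else "incongruent"
    )
    value = "valuable" if valuable_at_test else "non-valuable"
    return f"{congruence} {value}"


# Shorthand labels used in the text for each response type.
SHORT_LABEL = {
    "congruent valuable": "still-valued",
    "incongruent non-valuable": "devalued",
    "congruent non-valuable": "never-valued",
    "incongruent valuable": "revalued",
}
```

For instance, an outcome that was valuable in training but is non-valuable at test is classified as "incongruent non-valuable", i.e. devalued; responding to it would count as a slip of action.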

The outcome devaluation model can also be extended to include various other conditions, thus allowing the study of action selection and habit formation in different circumstances. One such extension is the "overtraining paradigm", in which different levels of training are included. Thus, it has been shown that when rats are trained over a great many trials, e.g. 1000 rather than 100, the overtrained group keeps performing the devalued behavior much more often than the moderately trained group (Adams, 1982). In this group, the propensity to respond is not influenced by the value of the outcome anymore, and is therefore not goal-directed but rather habitual. However, the overtraining paradigm has not yet been studied extensively in human subjects: as of yet, there has been only one demonstration of overtraining leading to S-R habits in human subjects (Tricomi, Balleine & O'Doherty, 2009).

However, it makes intuitive sense that in humans, too, habits become stronger when an action is repeated more often. To use the icecream example again, it is likely that someone who is currently not hungry at all will not approach the icecream vendor as often as someone who is, even if being reminded of buying icecream by the jingle. But if this is something he has done countless times, never deciding not to buy icecream, the chance is higher that he will still approach the icecream vendor, maybe only afterwards realizing that he did not want any icecream in the first place. Based on these studies, the following hypothesis can be proposed:

H3: After a large amount of training subjects will respond more often for congruent valuable outcomes and respond less often for incongruent valuable outcomes than after a small amount of training. Also, after a large amount of training they will more often fail to withhold responding for incongruent non-valuable outcomes, and less often fail to withhold responding for congruent non-valuable outcomes than after a small amount of training.

Habits and psychopathology

Earlier, it was already mentioned that reliance on habits does not always have positive consequences. Indeed, the inability to take into account the current value of an outcome is the very characteristic of habitual action that allows distinguishing it from intentional action through use of the outcome devaluation design. And whether an action is intentional or habitual is not merely of theoretical interest, but also has practical importance: in clinical psychology, overreliance on habits is often associated with maladaptive behavior (de Wit & Dickinson, 2009). For instance, Hogarth, Chase and Baess (2010) have shown that overreliance on habits is linked to high impulsivity, which according to these authors is in turn associated with a wide array of disorders and pathological behaviors, such as ADHD, gambling, cocaine use, binge drinking, binge eating, depression, mania, and suicidality. Indeed, one of the very criteria of substance abuse is the inability to quit despite knowledge of adverse physical and psychological consequences (APA, 2013), which closely matches the description of overreliance on habits as continuing to do something despite its outcome having no current value. Likewise, animal research has shown that although substance-seeking behavior is at first often intentional, at a certain point a loss of flexible control occurs and a switch is made to a habit-based system (Corbit & Janak, 2016).

Rigid, inflexible behavior is also a prominent feature of OCD, in the form of repetitive, ritualistic behaviors (compulsions) and thought patterns (obsessions) (APA, 2013). Indeed, it has been shown that people with OCD are more sensitive to the formation of habits (Gillan et al., 2011; 2014). A third disorder which is often associated with an impaired capacity for goal-directed action is autism spectrum disorder (ASD). One core aspect of ASD is rigid, repetitive, stereotyped behaviors (RRBs). This characteristic was already included in its first conceptualizations (Asperger, 1944; Kanner, 1943), and is still regarded as a central symptom dimension in the current DSM (APA, 2013). Such RRBs are independent from outcome expectations, and thus appear to fit the description of habitual behavior offered above. In earlier research, it was found that scores on the Adult Routines Inventory (ARI; Evans, Uljarevic & Lusk, 2016), a questionnaire designed to measure the amount of RRBs someone exhibits, are significantly higher for subjects with ASD or OCD than for subjects drawn from the general population. These are the same groups for which a decreased sensitivity to outcome devaluation has been found.

RRBs are not limited to autism: they also include the compulsions mentioned just before, and other behaviors which are found in a broad spectrum of disorders, including Tourette, [...] (Leekam, Uljarevic & Prior, 2011). Furthermore, they occur as traits among the normal population as well. RRBs are often divided into two categories, lower-level and higher-level RRBs (Turner, 1999). Lower-level RRBs comprise repetitive motor behaviors and preoccupation with objects, while higher-level RRBs consist of restricted interests and nonfunctional routines. The last of these four subtypes, nonfunctional routines, includes "inflexible adherence to specific routines or rituals, insistence on particular foods, wearing only certain items of clothing, and resistance to change in the environment" (Leekam, Uljarevic & Prior, 2011). Such behaviors seem to be non-intentional according to the criteria discussed earlier: they are neither determined by a desire for an outcome, nor by a belief in the effectiveness of the behavior in bringing about the outcome. And as could be expected, it has been demonstrated through outcome devaluation studies that many of the disorders thought to involve RRBs indeed involve overreliance on habits, for instance in the case of OCD (Gillan et al., 2011; 2014), high motor impulsivity (Hogarth, Chase & Baess, 2010), substance abuse (Ersche et al., 2016), Tourette (Delorme, 2016), and Parkinson (De Wit, Barker, Dickinson & Cools, 2011).

However, not every disorder which involves RRBs has been conclusively shown to involve decreased sensitivity to outcome devaluation. Most importantly, Geurts and De Wit (2014) were unable to find a difference in sensitivity to outcome devaluation between children with ASD and children without any diagnosis. Thus, it could be the case that the rigid and inflexible behavior involved in disorders such as ASD is not related to an overreliance on habitual behavior. If such a relation does not exist, this means that RRBs are not the same as habitual behaviors, but are governed by other processes. However, in another study (Alvares et al., 2016), researchers found that participants with ASD and social anxiety disorder (SAD) kept responding to devalued outcomes far more often than participants without disorders. This could mean that ASD is related to decreased outcome devaluation sensitivity after all. Based on these studies, the following hypothesis can be proposed:

H4: Participants with more RRBs will respond more often for congruent valuable outcomes and less often for incongruent valuable outcomes than people with fewer RRBs. Also, participants with more RRBs will more often fail to withhold responding for incongruent non-valuable outcomes, and less often fail to withhold responding for congruent non-valuable outcomes than people with fewer RRBs.

Research question, operationalization and hypotheses

My research is about the relation between RRBs and habitual behavior. Specifically, I investigate if there is a relation between the amount of RRBs someone exhibits and his or her ability to withhold a response after outcome devaluation. If such a relation exists, this might provide a mechanism which explains how behavioral repetition renders maladaptive behavior compulsive. Furthermore, as RRBs are a prominent part of autism, it might provide indirect support for the idea that certain aspects of autism involve overreliance on habits after all. I will also investigate the more general relations between outcome value, outcome congruence, amount of training, and response selection, which is the topic of the broader study of which my research is a part.

The data of this study were gathered in a lab experiment which makes use of the Sneaky Snack Game, an instrumental computer task which will be described in the following section. Using this task, response rates for several response types can be compared. Outcome value was operationalized by instructing the participants that certain outcomes were valuable and that responding to the stimuli associated with these outcomes would earn them points, while also instructing them that other outcomes were non-valuable, and that responding to the stimuli associated with these outcomes would lead to the subtraction of points. Outcome congruence was operationalized by instructing the participants after the training phase that certain outcomes which were valuable before were now non-valuable (thus constituting an outcome devaluation condition), while other outcomes which were first non-valuable were now valuable. Overtraining was operationalized as a within-subject variable, by training responses to some stimuli seven times as much as responses to other stimuli. Amount of RRBs was included to investigate individual differences in sensitivity to outcome devaluation, and was measured through use of the Adult Routines Inventory (ARI), a self-report questionnaire which will be described more in depth in the methodology section.

Based on the studies which have been discussed, several hypotheses were proposed about the behavior participants would show during the test phase. The first hypothesis was that there would be an effect of outcome value, i.e. that responses of the subjects would be sensitive to the value of their outcomes. The second hypothesis was that there would be an interaction between outcome value and outcome congruence, i.e. that responses of the subjects would be influenced by previous knowledge of the value of their outcomes. The third hypothesis was that there would be an interaction between outcome value, outcome congruence, and training amount, i.e. that the differences in response rates between congruent and incongruent responses would be greater after a larger amount of training than after a smaller amount of training. The fourth hypothesis was that there would be an interaction between outcome value, outcome congruence, and amount of RRBs, i.e. that the differences in response rates between congruent and incongruent responses would be greater for participants exhibiting more RRBs and smaller for participants exhibiting fewer RRBs.

Methodology

Sample characteristics

For this research, 53 participants were recruited. Participants were selected by putting a description of the study on the website of the psychology lab of the University of Amsterdam (UvA). There were no exclusion criteria: everyone who wanted to participate in the study was allowed to do so. However, because the website used to recruit the participants is mostly viewed by students at the UvA, it could not be presupposed that the research sample is representative of the general adult population. In the results section, the gender ratio, mean age and age range of the sample population are described. Participants were told that they could participate either for a financial reward of 50 euros, or to earn "participation points", which is a requirement for first-year psychology students at the UvA. Furthermore, they were told that in addition to the standard reimbursement, they could earn a financial bonus by collecting more points. The bonus amounted to one eurocent per point which the participant earned in the game.

Procedure

After participants made an appointment online, they visited the psychology lab of the UvA. During the first lab session, the Sneaky Snack Game was installed on their laptop, and participants practiced the task for half an hour. They also received instructions to do the task at home for half an hour on each of the six following days. On the eighth day, the participants returned to the psychology lab to participate in the testing phase. Finally, the participants were asked to fill in several questionnaires, including the ARI. In total, it took the participants 5 hours to complete all lab sessions and at-home practice.

Materials

Sneaky Snack Game - The Sneaky Snack Game is a computer task which distinguishes between goal-directed and habitual behavior. Participants have to collect icecreams by responding to certain stimuli. This earns them points, for which the participants receive a financial reward. The game features eight different stimuli, all colored abstract symbols. These are a blue square, a yellow triangle, a pink circle, a red crown, a purple star, a green crescent moon, an orange flag, and a brown clover leaf. These are divided into two sets: the first four symbols are shown on top of a van, the other four on top of a scooter. Each stimulus is associated with one of four icecreams, a [...] stimuli. The relation between these stimuli and outcomes remains the same throughout the whole experiment. The value of the outcomes depends on the stimulus set: in the first set, two outcomes are always valuable and the other two are always non-valuable, while in the second set their values are reversed. Thus, in each stimulus set the participant must only respond to two stimuli. In every stimulus set, two stimuli are shown seven times as often as the other ones. These constitute the overtraining condition. Below, the eight stimuli and four outcomes included in the task are shown.

Figure 1. The eight stimuli included in the Sneaky Snack Game.

As was mentioned, the stimuli are presented in separate trials, each showing only one stimulus. Each trial starts with a two-dimensional representation of an empty, grey road which runs from the left to the right side of the screen. The trial scene is shown in the figure below. In the top right corner of the screen, there is a figure of a skater, also shown in a figure below. Then, the sound of a jingle is played, signaling to the participant that a stimulus will appear. Afterwards, a van or a scooter enters from the left side of the screen, on top of which is shown a stimulus (an abstract symbol) which signals that a certain outcome (a certain type of icecream) can be collected in the trial. The participant has 600 ms to respond to the stimulus by pressing the spacebar, which moves the skater downwards to the road in front of the vehicle, where he collects the icecream it transports. If the player responds too late, the skater moves down but does not collect the icecream. After 600 ms, the icecream which corresponds to the stimulus featured in the trial is shown on top of the vehicle, regardless of whether the player collected it. The vehicle is at the middle of the screen at that point, and continues to move to the right of the screen for another 600 ms with the icecream on top of it, after which it disappears and the trial ends. If the player chose to collect the icecream, the icecream remains stationary from the point where the vehicle reaches the skater, while the vehicle continues to move. If the player responded to a stimulus with a valuable outcome, a green "+1" is shown and a "kaching" sound is played, indicating to the player that a point has been added to his total score. If he responded to a stimulus corresponding to a non-valuable outcome, a red "-1" is shown and a "horn" sound is played, indicating to the player that a point has been subtracted from his total score. If the player did not respond or responded too late, nothing happens, and no points are added or subtracted. A counter at the top of the screen shows the player's current number of points.
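The scoring rule for a single training trial, as described above, can be summarized in a few lines of Python. This is a minimal sketch rather than the actual game code: the function and constant names are hypothetical, and treating a response at exactly 600 ms as still on time is an assumption, since the text does not specify the boundary case.

```python
from typing import Optional

RESPONSE_WINDOW_MS = 600  # response deadline stated in the task description


def score_trial(outcome_valuable: bool, reaction_time_ms: Optional[float]) -> int:
    """Return the change in score (+1, -1 or 0) for one training trial.

    reaction_time_ms is None when the spacebar was not pressed at all.
    """
    # No response, or a response after the deadline: no points change.
    if reaction_time_ms is None or reaction_time_ms > RESPONSE_WINDOW_MS:
        return 0
    # A timely response earns a point for a valuable outcome
    # and costs a point for a non-valuable outcome.
    return 1 if outcome_valuable else -1
```

Under this rule, withholding a response is never penalized directly; a participant only loses points by responding in time to a stimulus with a non-valuable outcome.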

Figure 4. The trial screen.

The trials are administered in separate blocks, consisting of 32 trials, which include only one of the stimulus sets. Thus, in each block only two of the four outcomes are valuable. At the beginning of each trial block, a "value instruction screen" is shown for 4 seconds, which gives information about which outcomes are currently valuable, and which outcomes are not. Non-valuable outcomes are shown at the left side of the screen, encompassed by a red line, and with a minus sign beneath them. Valuable outcomes are shown at the right side of the screen, and are encompassed by a green line, with a plus sign beneath them. After a trial block is finished, participants must indicate which stimuli correspond to which outcome by clicking first on the stimuli and then on the corresponding outcomes. Also, they must indicate how confident they are of their answer on a confidence bar. The purpose of this is to see if participants actually learn to associate the outcomes with the stimuli.

The participants play this version of the game for seven days. On the first day, the participants visit the psychology lab at the University of Amsterdam, where the researchers install the Sneaky Snack Game on the laptop of the participant and instruct the participant about the goal of the game. They first play four trials of a demo version of the game, which features pizza slices instead of icecreams. This keeps the associative learning which occurs in the instruction phase distinct from that in the training phase. Then, the participants play ten trial blocks of the game, which means they play 320 trials. This takes them approximately 30 minutes. Afterwards, the participants are told they are finished for this day, and an appointment is made for the same day one week later. Then, the participants play the game at home for six days. Each day, they play ten trial blocks.
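The block size and presentation ratio given in the task description imply a particular composition of each training block. The sketch below works this out in Python under one explicit assumption: that the seven-to-one ratio holds exactly within every 32-trial block (the text only states that two stimuli per set are shown seven times as often as the other two). All names are hypothetical.

```python
TRIALS_PER_BLOCK = 32   # from the task description
BLOCKS_PER_DAY = 10     # from the task description
TRAINING_DAYS = 7       # one lab day plus six days at home
OVERTRAINING_RATIO = 7  # overtrained stimuli appear seven times as often


def block_composition(trials: int = TRIALS_PER_BLOCK,
                      ratio: int = OVERTRAINING_RATIO,
                      n_overtrained: int = 2,
                      n_other: int = 2) -> dict:
    """Trials per stimulus in one block, assuming an exact ratio per block."""
    weight_total = n_overtrained * ratio + n_other  # 2 * 7 + 2 = 16
    if trials % weight_total != 0:
        raise ValueError("block size is not compatible with an exact ratio")
    unit = trials // weight_total
    return {"overtrained": unit * ratio, "other": unit}


def total_training_trials() -> int:
    """Total number of training trials across all days."""
    return TRIALS_PER_BLOCK * BLOCKS_PER_DAY * TRAINING_DAYS
```

Under this assumption, each overtrained stimulus would appear 14 times and each other stimulus twice per block (14 + 14 + 2 + 2 = 32), and the training phase as a whole would comprise 2240 trials.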

On the final day of the experiment, participants come to the psychology lab again, where the researchers install the test phase version of the game on their laptops. In this part of the experiment, the goal is to see if participants can change their response to a stimulus when the value of the outcome corresponding to the stimulus is changed. The game is almost exactly the same as in the training phase, except for four changes. First, there are now four possible combinations of outcomes which can be featured in a trial block. Two of these are the same as in the training phase, while in the other two trial blocks two outcomes which were never valuable together in the training phase are now valuable together, while two outcomes which were never non-valuable together are now non-valuable together. This means that in these blocks, the participants have to respond to stimuli to which they had to withhold responding during the training phase, and that they have to withhold responding to stimuli to which they did have to respond during the training phase. Responses of which the outcome has the same value in the training phase and in the testing phase are called congruent responses, and responses which have different values are called incongruent responses.

Secondly, all stimuli are now featured in one trial block, which means that every outcome now corresponds to two different stimuli, one displayed on top of a van and one displayed on top of a scooter. As the stimuli which were displayed on the vans always had the opposite value of the stimuli displayed on the scooters in the training phase, the players again have to respond to stimuli to which they had to withhold responding during the training phase, and withhold responding to stimuli to which they had to respond during the training phase. Thus, even the trial blocks which feature the same combinations of valuable and non-valuable outcomes as in the training phase now include congruent and incongruent responses.

Thirdly, the game is now played under a condition of nominal extinction: participants no longer receive any feedback about whether their response was correct. Their current number of points is not shown at the top of the screen, nor is it shown whether a point is added or subtracted after a trial. They are, however, instructed that their performance is assessed and does contribute to their final score at the end of the experiment. Fourthly, the outcomes which correspond to the stimuli are no longer shown at the end of the trial: the ice creams are now hidden by an advertisement.

Before the participants play the test version of the game, the researchers first explain the differences between this version and the training version. As on the first day, the players first play four demo trials with pizza slices to show them what the game looks like. Then, they play four testing blocks of 32 trials each. At the end of a testing block, the four outcomes are again shown, and the participant must indicate which outcomes were valuable during the block by clicking on them. After finishing the testing blocks, the participants are told that the experiment is now finished, and that they can go home after filling out three questionnaires.

Adult Routines Inventory - "Amount of RRBs" is assessed independently of the outcome devaluation task, by asking the participants to complete a self-report questionnaire measuring RRBs: the Adult Routines Inventory (ARI; Evans, Uljarevic & Lusk, 2016). The ARI consists of 55 items about self-perception of a wide range of RRBs. It includes both physical repetitive sensory motor behaviors/compulsions (RSMBC, e.g. "Do you chew non-edible objects (like pens or any other objects?)"), and mental rigidity/insistence on sameness (RIS, e.g. "Do you insist that certain activities need to take place at a certain time?"), which have been shown by the creators of the ARI to load on separate factors. Participants indicate on a five-point Likert scale ranging from 0 to 4 to what extent each item applies to them, where the lowest score (0) stands for "Not at all / Never" and the highest score (4) for "Very much / Always". These scores are added together, resulting in a total score ranging from 0 to 220. The higher the score, the higher the self-perception of RRBs. The ARI has a Cronbach's alpha of 0.96, which means its internal consistency is excellent. Its convergent and discriminant validity and its test-retest reliability have also been shown to be very good. However, as the ARI has not been used extensively yet, it cannot be presumed that it has the same psychometric properties in the current population. Therefore, its internal consistency in the current research population was verified independently.
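The scoring rule described above can be sketched as follows. This is only an illustration: the function name and the validation checks are hypothetical, and only the item count (55), the response range (0 to 4), and the resulting total range (0 to 220) are taken from the text.

```python
# Hypothetical helper illustrating the ARI total score described in the text.
N_ITEMS = 55                   # number of ARI items (from the text)
MIN_SCORE, MAX_SCORE = 0, 4    # Likert range per item (from the text)

def ari_total(responses):
    """Sum 55 item responses (each 0..4) into a total score (0..220)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not MIN_SCORE <= r <= MAX_SCORE:
            raise ValueError(f"response out of range: {r}")
    return sum(responses)
```

A participant who answers every item with the middle option would obtain `ari_total([2] * 55)`, half of the maximum score.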

Data-analysis

The Sneaky Snack Game allows investigating the relation between three categorical within-subject independent variables, i.e. outcome value, outcome congruence, and training amount, and one continuous dependent variable, the response rate of participants during the test phase. All these variables will be entered in a three-way repeated-measures ANOVA with a 2 x 2 x 2 design. This allows investigating the hypothesized main effect of value, two-way interaction between value and congruence, and three-way interaction between value, congruence, and amount of training.
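When every factor in a fully within-subject design has two levels, each effect has one numerator degree of freedom, and its F value equals the square of a one-sample t statistic computed on a per-participant contrast score. The sketch below illustrates this for the 2 x 2 x 2 design. It is not the analysis script used in this study: the data layout (one dictionary per participant, keyed by +1/-1 codes for value, congruence, and training) and the function names are assumptions made for illustration.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

# Each cell is (value, congruence, training); the two levels of every
# factor are coded +1 and -1 (an assumed coding, for illustration only).
CELLS = list(product((1, -1), repeat=3))

def contrast_score(rates, factors):
    """Contrast for one participant.

    rates:   dict mapping each cell to that participant's response rate
    factors: indices of the factors in the effect, e.g. (0,) for the main
             effect of value, (0, 1) for value x congruence
    """
    total = 0.0
    for cell in CELLS:
        weight = 1
        for f in factors:
            weight *= cell[f]
        total += weight * rates[cell]
    return total

def effect_f(participants, factors):
    """Return F and its degrees of freedom (1, n - 1) for one effect."""
    scores = [contrast_score(p, factors) for p in participants]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))  # one-sample t against 0
    return t * t, (1, n - 1)
```

Under this coding, `effect_f(data, (0,))` would give the main effect of value, `effect_f(data, (0, 1))` the value by congruence interaction, and `effect_f(data, (0, 1, 2))` the three-way interaction.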

For my own research, I added one more independent variable, amount of RRBs, and investigated the relation between the amount of RRBs a participant exhibits and their sensitivity to outcome devaluation. To this end, I calculated a “devaluation sensitivity index” (DSI; Snorrason et al., 2016) score for each subject, by subtracting the response rate for incongruent devalued outcomes from the response rate for congruent valued outcomes. This creates a measure of the balance between goal-directed and habitual behavior in a subject. A score of 1 is interpreted as complete intentional control over behavior, and a score of 0 or below as complete absence of such control. Then, I calculated the strength of the correlation between a participant's DSI and their score on the ARI in order to investigate how the ability to act in a goal-directed way changes as the amount of RRBs increases. One disadvantage of the DSI is that it only compares congruent valued and incongruent non-valuable responses, while it does not take into account incongruent valued and congruent non-valuable responses. However, the ARI measures the propensity to exhibit a certain behavior without concern for its outcome value, not a propensity to inhibit a behavior without concern for its outcome value. This is reflected by the response types included in the DSI.
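As a minimal sketch of the index described above, assuming response rates are expressed as proportions between 0 and 1 (the function and parameter names are hypothetical):

```python
def devaluation_sensitivity_index(rate_congruent_valued, rate_incongruent_devalued):
    """DSI = response rate for congruent valued outcomes
             minus response rate for incongruent devalued outcomes.

    Both rates are assumed to be proportions in [0, 1], so the index
    lies in [-1, 1]; 1 is read as full goal-directed control and
    0 or below as its absence.
    """
    for rate in (rate_congruent_valued, rate_incongruent_devalued):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate must be a proportion in [0, 1]: {rate}")
    return rate_congruent_valued - rate_incongruent_devalued
```

A participant who always responds to congruent valued outcomes and never to incongruent devalued ones scores 1, while a participant who responds equally often to both scores 0.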

Results

Sample characteristics

Fifty-three participants entered the study. Of this group, 46 completed the whole experiment. Six participants were excluded because their response rate on incongruent valued responses was less than 25%, which was taken to mean they did not understand the task. One other participant was excluded because she admitted to cheating on the task. Thus, 39 participants remained for the analysis: 30 female and nine male. The mean age of the participants was 23 years (range 18 to 50).

Figure 5. Mean response rates for each different response type for each training day.

Response rates during the training phase

Figure 5 shows mean response rates during the training phase for each response type for each day. As the graph shows, response rates for the normal training condition and the overtraining condition differed markedly only during the first days that participants played the Sneaky Snack Game. Also, a clear difference is visible only in the case of valuable outcomes, while response rates for non-valuable outcomes were almost equal from the start. At the end of the training phase, performance was almost perfect, with response rates for valuable outcomes approaching 100% and response rates for non-valuable outcomes near 0%. This means that during the training phase participants successfully learned to which stimuli they had to respond and to which stimuli they had to withhold responding.

Internal consistency of the ARI

To see whether the ARI is a reliable measure of RRBs in the current research population, the internal consistency of the ARI and its subscales was assessed. The ARI was found to have a very high internal consistency, Cronbach's alpha (CA) = 0.91. The RIS-subscale likewise had a very high internal consistency, CA = 0.90. The RSMBC-subscale had a somewhat lower, but still high internal consistency, CA = 0.83. This means that both the ARI and its subscales can be assumed to be reliable and internally consistent measures of an underlying construct.
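For reference, Cronbach's alpha for k items is k / (k - 1) multiplied by (1 - sum of item variances / variance of the total scores). The sketch below implements this standard formula; the input layout (one row of item scores per participant) and the function name are assumptions, and it is not the software used to obtain the values reported here.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per participant).

    Each row holds that participant's score on every item, in the same
    item order. Sample variances are used throughout.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two participants")
    columns = list(zip(*item_scores))          # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

When all items rank participants identically, alpha equals 1; when items are unrelated, the item variances account for most of the total variance and alpha approaches 0.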

Response rates during the test phase

A three-way repeated measures ANOVA was conducted to investigate the effect of outcome value, outcome congruence, and training amount on the response rate of participants in the test phase. All relevant assumptions were met, which means that the test may be interpreted. The results show that there was a significant main effect of outcome value, F(1,38) = 1963.4, p < 0.001. There was a higher response rate for valued responses than for non-valuable responses. This means that the first hypothesis can be accepted: participants were able to discriminate between valuable and non-valuable outcomes and respond or withhold responding accordingly.

Furthermore, there was a significant interaction effect between outcome value and outcome congruence, F(1,38) = 24.3, p < 0.001. Bonferroni corrected t-tests showed that this effect consisted of a significantly higher response rate for congruent valued responses than for incongruent valued responses, t(39) = 2.58, p = 0.01, and a significantly lower response rate for congruent non-valuable responses than for incongruent non-valuable responses, t(39) = 7.07, p < 0.001. These results are shown in Figure 6 below. This means that the second hypothesis can also be accepted: when participants have to respond in the same way they always had to, they make fewer mistakes in responding or withholding their response than when they have to respond differently.

Figure 6. Mean response rates and standard deviations for four different response types.

Thirdly, there was no significant interaction between outcome value, outcome congruence and training amount, F(1,38) < 1. This means that there was no difference in response rate between the overtrained and normally trained versions of each response type. After a large amount of training, participants did not make more congruent valuable or incongruent non-valuable responses, or fewer incongruent valuable or congruent non-valuable responses, than after a small amount of training. Therefore, the third hypothesis cannot be accepted: the effect of outcome congruence does not become stronger after a larger amount of training.

The relation between ARI-scores and test performance

A correlation test was conducted to investigate the relation between participants' scores on the ARI and its subscales, the RSMBC and the RIS, and participants' DSI scores. All of these variables were normally distributed, ARI: Shapiro-Wilk (S-W) = 0.96, p = 0.172; RSMBC: S-W = 0.96, p = 0.1; RIS: S-W = 0.2, p = 0.57; DSI: S-W = 0.96, p = 0.09. This means that Pearson's correlation coefficient may be used. There was no significant relation between participants' ARI-scores and their DSI, r(39) = -0.104, p = 0.53, nor between their RSMBC-scores and their DSI, r(39) = -0.106, p = 0.52, nor between their RIS-scores and their DSI, r(39) = -0.105, p = 0.53.
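The correlation coefficient used here can be sketched as below. This is a generic implementation of Pearson's r, not the statistical package used in this study; the function name and the returned degrees of freedom (conventionally n - 2 for a Pearson correlation) are included for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Return Pearson's r and the conventional degrees of freedom (n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy), n - 2
```

Here `x` would hold the ARI (or subscale) scores and `y` the DSI scores, in the same participant order.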

Discussion

In this study, the relationship between outcome devaluation, overtraining and RRBs was investigated. The first hypothesis was that participants would be able to adjust their response to a stimulus to the value of the associated outcome. A main effect of outcome value was found, which means that this hypothesis was confirmed: when playing the Sneaky Snack Game, participants were able to discriminate between valuable and non-valuable outcomes and respond accordingly. Thus, they exhibited goal-directed behavior. The second hypothesis was that participants would more often correctly respond or withhold their response if the value of the outcome in the testing phase was the same as the value of the outcome in the training phase. An interaction effect was found between outcome value and outcome congruence, which means that this hypothesis was likewise confirmed: participants more often correctly responded or withheld responding when the value of an outcome was the same in the test phase and the training phase, and made more mistakes when the value of an outcome was different in the test phase and the training phase. Thus, participants also exhibited habitual behavior. This means that the Sneaky Snack Game is able to detect the effect of both current outcome value and earlier response knowledge on human behavior. It can be used to show that people can adjust their behavior when the value of an action changes, but also that they will make more mistakes when the value of an action is different from what it was at a previous time. This is in accordance with a large body of earlier research on the effect of outcome devaluation in both human and animal subjects.

The third hypothesis was that the difference between congruent and incongruent responses would be greater after a large amount of training than after a normal amount of training. However, contrary to expectations, no interaction effect was found between outcome value, outcome congruence and amount of training. Thus, this hypothesis cannot be confirmed. This could mean that there is in fact no such effect among healthy human subjects. However, previous studies have provided support for the existence of an overtraining effect, both in animals (cf. Adams, 1982; Dickinson et al., 1995) and in humans (cf. Tricomi, Balleine & O'Doherty, 2009). This might mean that in the current task overtraining is not correctly operationalized. Indeed, studies on overtraining do not always find an effect of training amount, which means that the occurrence and strength of this effect might depend on the way in which overtraining is established. For instance, in one fairly recent study (Jonkman et al., 2010) the duration of training had no effect on responding after outcome devaluation. One possible explanation offered by the authors was that responding was already controlled by a habitual process in the normal training condition, rendering any extra training superfluous. Likewise, in the current study the effect of habit learning might already have reached a plateau in the normal training condition. In this case, participants might have responded differently after a smaller amount of training. As of yet, empirical support for such differential effects of different amounts of training is lacking: studies focusing on the effect of overtraining on sensitivity to outcome devaluation have mostly included just two levels of training intensity. However, such procedures have in fact been used before. For instance, Matyniak and Stettner (1970) investigated the effect of different levels of overtraining on the reversal learning capability of birds. This study included five levels of overtraining, ranging from 80 to 2000 trials, in which responses to one stimulus were reinforced and responses to another stimulus were not reinforced. Then, in the test phase, the values of the stimuli were reversed. The researchers found that the number of errors the birds made after reversal of the values of the two stimuli increased as a function of the number of trials in which they participated during the training phase. Future research could investigate whether such a differential effect of training amount on response patterns exists in the case of humans by including more levels of training, both less and more extensive than the two conditions featured in this study. There were also other differences between this study and earlier research on the effect of overtraining in human subjects by Tricomi, Balleine and O'Doherty (2009). This earlier study used food as reward and achieved outcome devaluation through satiation, while in the current investigation the reward consisted of money and outcome devaluation was achieved by explicit instruction.
Also, in this earlier study participants had to choose between two types of responses, of which one represented a choice to collect the outcome and the other a choice not to collect the outcome. In contrast, in the Sneaky Snack Game participants have only one response option, and the choice not to collect the outcome is made by withholding their response. Because no other studies about overtraining in human subjects have been published, it is impossible to say whether these differences caused the lack of an overtraining effect in the current study. In future research, this could be investigated by including different rewards, devaluation procedures, and response methods.

The fourth hypothesis was that people with more RRBs would be less sensitive to outcome devaluation than people with fewer RRBs. This hypothesis could also not be confirmed: contrary to expectations, people with more RRBs did not respond to congruent valuable outcomes more often than people with fewer RRBs, nor did they fail to withhold responding to an incongruent non-valuable outcome more often. However, sensitivity to outcome devaluation has been shown in earlier research to be related to various disorders which are also associated with high amounts of RRBs, such as OCD and ASD. The current result could mean there is in fact no relation between RRBs and habitual behavior. Alternatively, it could mean that such a relation only exists in clinical populations, and that the ARI is not sensitive enough to pick out differences between healthy subjects. However, the ARI was explicitly constructed as a dimensional scale which is valid at all levels of RRBs, and not just as an instrument to discriminate clinical from non-clinical levels (Evans, Uljarevic & Lusk, 2016). Likewise, Gillan et al. (2016) found compulsive behavior and related deficits in goal-directed control over action in a general population sample. A third possibility is that RRBs are actually associated with habitual behavior, but that the sample was too small for the test to have enough statistical power to detect any differences in behavior. In future research, more participants could be included in order to increase the statistical power of the study. Also, more participants with extreme scores on the ARI could be included to assess whether a relation emerges when ARI scores vary more widely. A fourth and last possibility is that RRBs are in fact driven by overreliance on habits, but that people who perform many RRBs do not suffer from impaired goal-directed control of behavior with regard to other actions not related to these RRBs. Gillan et al. (2011) have already noted that people with OCD do not develop compulsions in every aspect of their lives, but only in the case of specific obsessions. Likewise, Geurts and de Wit (2014) argue that although they did not find a general overreliance on habits in children with ASD, it still might be the case that such an overreliance exists in the case of outcomes which are directly related to the very specific interests of such children. In future research, more personally relevant stimuli and outcomes may be used by asking people with RRBs about the specific object of such actions.

In conclusion, this study shows that the Sneaky Snack Game is able to manipulate the responses of participants by changing the value of outcomes, and thus is a valid means of assessing the effect of outcome devaluation. Also, it shows that the ARI, a questionnaire which has not been widely used in published studies yet, has a very high internal consistency when administered in this research population. However, the study was unable to provide evidence for the presence of an overtraining effect among human subjects, nor was it able to show a relation between RRBs and sensitivity to outcome devaluation.

References

Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77-98.

Adams, C. D., & Dickinson, A. (1981). Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology, 33, 109–122.

Alvares, G., Balleine, B., Whittle, L., & Guastella, A. (2016). Reduced goal-directed action control in autism spectrum disorder. Autism Research, 9, 1285-1293.

American Psychiatric Association. (2013). Diagnostic and Statistical Manual, 5th edition (DSM-V). Arlington: American Psychiatric Publishing.

Aristotle. (350BC). Nicomachean Ethics. Retrieved from: http://classics.mit.edu/Aristotle/nicomachaen.html

Asperger, H. (1944). Die "Autistischen Psychopathen" im Kindesalter. European Archives of Psychiatry and Clinical Neuroscience, 117, 76-136.

Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer & T. K. Srull (Eds.), Handbook of Social Cognition, Vol. 1: Basic Processes. Hillsdale: Erlbaum.

Colwill, R., & Rescorla, R. (1985). Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120-132.

Corbit, L. H., & Janak, P. H. (2016). Habitual alcohol seeking: Neural bases and possible relations to alcohol use disorders. Alcoholism, 40, 1380-1389.

De Wit, S., Barker, R.A., Dickinson, T., & Cools, R. (2011). Habitual versus goal-directed action control in Parkinson’s disease. Journal of Cognitive Neuroscience, 23, 1218–1229.

De Wit, S., & Dickinson, A. (2009). Associative theories of goal-directed behaviour: a case for animal–human translational models. Psychological Research PRPF, 73, 463-476.

De Wit, S., Niry, D., Wariyar, R., Aitken, M., & Dickinson, A. (2007). Stimulus-outcome interactions during conditional discrimination learning by rats and humans. Journal of Experimental Psychology: Animal Behavior Processes, 33, 1–11.

Delorme, C., Salvador, A., Valabrègue, R., Roze, E., Palminteri, S., Vidailhet, M., ... & Worbe, Y. (2016). Enhanced habit formation in Gilles de la Tourette syndrome. Brain, 139, 605-615.

Dezfouli, A., Lingawi, N., & Balleine, B. (2014). Habits as action sequences: hierarchical action control and changes in outcome value. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369.

Dickinson, A. (1985). Actions and habits: the development of behavioral autonomy. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 308, 67-78.

Dickinson, A., Balleine, B., Watt, A., Gonzalez, F., & Boakes, R. (1995). Motivational control after extended instrumental training. Animal Learning and Behavior, 23, 197–206.

Ersche, K.D., Gillan, C.M., Jones, P.S., Williams, G.B., Ward, L.H.E, Luijten, M., ... & Robbins, T. W. (2016). Carrots and sticks fail to change behavior in cocaine addiction. Science, 352, 1468-1471.

Georgeff, M., Pell, B., Pollack, M., Tambe, M., & Wooldridge, M. (2003). The belief-desire-intention model of agency. Lecture Notes in Computer Science, 1555, 1-10.

Geurts, H., & De Wit, S. (2014). Goal-directed action control in children with autism spectrum disorders. Autism, 18, 409-418.

Gillan, C. M., Papmeyer, M., Morein-Zamir, S., Sahakian, B. J., Fineberg, N. A., Robbins, T. W., & de Wit, S. (2011). Disruption in the balance between goal-directed behavior and habit learning in obsessive-compulsive disorder. American Journal of Psychiatry, 168, 718-726.

Gillan, C. M., Morein-Zamir, S., Urcelay, G. P., Sule, A., Voon, V., Apergis-Schoute, A. M., ... & Robbins, T. W. (2014). Enhanced avoidance habits in obsessive-compulsive disorder. Biological Psychiatry, 75, 631-638.

Gillan, C., Kosinski, M., Whelan, R., Phelps, E., & Daw, N. (2016). Characterizing a psychiatric symptom dimension related to deficits in goal directed control. eLife, 5, e11305.

Gordon, R. M. (1986). Folk psychology as simulation. Mind & Language, 1, 158-171.

Hogarth, L., Chase, H. W., & Baess, K. (2012). Impaired goal-directed behavioural control in human impulsivity. Quarterly Journal of Experimental Psychology, 65, 305-316.

Jonkman, S., Kosaki, Y., Everitt, B. J., & Dickinson, A. (2010). The role of contextual conditioning in the effect of reinforcer devaluation on instrumental performance by rats. Behavioural Processes, 83, 276-281.

Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217-250.

Knobe, J. (2006). The concept of intentional action: A case study in the uses of folk psychology. Philosophical Studies, 130, 203-301.

Leekam, S. R., Prior, M. R., & Uljarevic, M. (2011). Restricted and repetitive behaviors in autism spectrum disorders: a review of research in the last decade. Psychological Bulletin, 137, 562-593.

overtraining. Psychonomic Science, 21, 308-309.

Skinner, B. F. (1938). The Behavior of Organisms: An Experimental Analysis. New York: Appleton-Century.

Snorrason, I., Lee, H. J., de Wit, S., & Woods, D. W. (2016). Are nonclinical obsessive-compulsive symptoms associated with bias toward habits? Psychiatry Research, 241, 221-223.

Solway, A., & Botvinick, M. (2012). Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates. Psychological Review, 119, 120-154.

Stich, S. (1978). Autonomous psychology and the belief-desire thesis. The Monist, 61, 573-591.

Thorndike, E. L. (1905). The Elements of Psychology. New York: A. G. Seiler.

Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.

Tricomi, E., Balleine, B., & O’Doherty, J. (2009). A specific role for posterior dorsolateral striatum in human habit learning. European Journal of Neuroscience, 29, 2225-2232.

Weber, M. (1922). Economy and Society. Retrieved from: https://archive.org/stream/MaxWeberEconomyAndSociety/

Wood, W., & Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychological Review, 114, 843-863.
