Investigating the sources of expectations using fMRI decoding analyses for real-world object images

Jennifer Fielder

October 2020

Research Project 1

MSc (Research) Brain and Cognitive Sciences

University of Amsterdam

Student Number: 12773913

Research Group: Cognition and Plasticity Laboratory, Vrije Universiteit Amsterdam

Daily Supervisor: Chris Jungerius MSc

Internal Assessor: prof. dr. Heleen Slagter (UvA)

Examiner: dr. Steven Scholte

Abstract

A predictive processing framework argues that the contents of perception are determined by both top-down expectations and bottom-up stimulus information. The brain regions implicated as the sources of these top-down expectations for real-world object images, and whether their involvement is affected by attention, have not yet been researched. Prior research has found that the hippocampus and dorsal striatum may act as the sources of expectations, yet this research is limited to abstract shapes or gratings and does not manipulate attention. To study the brain regions associated with the sources of expectations for more ecologically valid stimuli, this study analyses publicly available fMRI data for a task that generates expectations of real-world object images and manipulates attention (Richter & de Lange, 2019a). Decoding analyses were performed for both attended and unattended trials to 1) decode the object identity of a second image that was either expected or unexpected, and 2) decode expectation information (whether a trial was expected or unexpected, thus collapsing across object identity), from visual regions, the hippocampus and the dorsal striatum. Decoding was not possible from any of these regions in either attention condition. These results could be due to differences in how expectations are represented for more naturalistic stimuli compared with the abstract shapes or gratings used in prior research. It is also possible there were methodological limitations regarding the training data used for the decoding model, the time-scale of the fMRI session, task constraints, or an error in the analysis pipeline.

Introduction

A predictive processing framework regards perception as an active process whereby bottom-up stimulus information is met by top-down expectations to jointly determine the contents of perception (de Lange et al., 2018; Friston, 2005; Friston, 2010; Hohwy, 2013; Kok & de Lange, 2014). Expectations have been shown to bias perception when sensory input is ambiguous (Chalk et al., 2010; Kok et al., 2013; Sterzer et al., 2008). On a behavioural level, expectations can also make perception more efficient. For example, when stimuli are validly predicted they are detected faster (Stein & Peelen, 2015; Pinto et al., 2015) and more accurately (Wyart et al., 2012). Expectations may be formed via statistical learning, a process that occurs automatically and without awareness in which statistical regularities are extracted from the environment (Brady & Oliva, 2008; Turk-Browne et al., 2009). Where there is a mismatch between expectations (top-down) and stimulus information (bottom-up), a prediction error is propagated forward (Friston, 2010). This is computationally efficient, since only the error is passed on. There are also feedback connections that carry predictions to update the generative model over time, at all levels of the hierarchy. The model thus changes to fit with perceptual input through a learning process. Perceptual inference proceeds as the expected (top-down) and observed (bottom-up) information are reconciled iteratively at each step of the processing hierarchy. This leads to a reduction of prediction error at each step until a single perceptual interpretation of the sensory input is settled upon (Hohwy, 2013). This study focusses on the top-down effect.
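The iterative reconciliation described above can be sketched as a toy computation. This is an illustrative caricature (not taken from the thesis or any cited model): a single "level" nudges its prediction toward the observed input, shrinking the prediction error on every step until the two agree.

```python
# Toy sketch of iterative prediction-error minimisation at one level of a
# processing hierarchy. All names and values here are illustrative.

def reconcile(prediction: float, observation: float,
              learning_rate: float = 0.5, steps: int = 20) -> float:
    """Iteratively reduce the prediction error (observation - prediction)."""
    for _ in range(steps):
        error = observation - prediction      # bottom-up prediction error
        prediction += learning_rate * error   # top-down model update
    return prediction

final = reconcile(prediction=0.0, observation=1.0)
print(round(final, 4))  # converges to the observed value: 1.0
```

Each pass multiplies the remaining error by (1 - learning_rate), so the prediction settles on the input exponentially fast; in the full framework this happens at every level of the hierarchy simultaneously.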

Neural Response to Expectations

On a neural level, expectations lead to a reduced neural response to expected stimuli compared to unexpected stimuli, termed ‘expectation suppression’ (de Lange et al., 2018). This has been shown across modalities (Kok et al., 2012; Parras et al., 2017; Todorovic & de Lange, 2012) and using both non-invasive techniques in humans such as fMRI (Alink et al., 2010; Kok et al., 2012; Egner et al., 2010) and EEG/MEG (Todorovic & de Lange, 2012; Wacongne et al., 2011) as well as electrophysiology in non-human primates (Kumar et al., 2017; Meyer & Olson, 2011; Schwiedrzik & Freiwald, 2017) and rodents (Rummell et al., 2016). Within a predictive processing framework, there are two proposed mechanisms to explain how top-down expectations modulate sensory cortex resulting in stimulus-specific neural suppression. The dampening account posits that neurons most strongly tuned to the expected stimulus are dampened, because expected information that is ‘uninformative’ is suppressed. Conversely, the sharpening account posits that neurons encoding the unexpected stimulus are suppressed, resulting in a sharper, more selective response (de Lange et al., 2018). In line with the sharpening account, evidence has shown that expectations facilitate a stimulus-specific template before the stimulus is shown. Kok et al. (2014) paired specific auditory tones with a visual grating of a specific orientation. Using fMRI, they found that prior expectation of a visual stimulus (just playing a tone) evoked a pattern of activity in V1 similar to the pattern of activity evoked by the actual stimulus. This suggests that the expectation led to pre-stimulus activity changes in early visual cortex and that a pre-stimulus template may be set up. Given the low temporal resolution of fMRI, it is notable that these findings were replicated by Kok et al. (2017), who used MEG and multivariate decoding techniques. They found that the auditory expectation cues induced orientation-specific neural signals, already present 40ms before stimulus presentation and continuing into the post-stimulus period. This further suggests that expectations induce the activation of stimulus templates. Notably, the strength of the pre-stimulus expectation templates correlated with behavioural improvement in distinguishing between grating orientations, further reflecting a sharpening of the representation.

Evidence for expectation facilitating pre-stimulus templates extends beyond the use of gratings or arbitrary shapes to more ecologically valid stimuli such as images of real-life objects. Puri et al. (2009) used images of faces and houses. Participants were cued with what to expect (the word ‘face’ or ‘house’) and an image of a face or a house was then shown. They found that expectation of faces and houses resulted in category-selective increases in pre-stimulus activity in the fusiform face area (FFA) and parahippocampal place area (PPA), respectively. A limitation of the evidence from fMRI studies, however, is that it cannot establish causal evidence linking pre-stimulus activity with expectation effects. Gandolfo and Downing (2019) addressed this limitation by using fMRI-guided online transcranial magnetic stimulation (TMS). They used stimuli of body and scene images and investigated the extrastriate body area (EBA) and occipital place area (OPA). Before the target image, a verbal cue validly or invalidly predicted whether the image would be a body (‘m’ or ‘f’ to represent the body’s gender) or a scene (‘garden’ or ‘kitchen’). After the target image, a binary judgement about the target image was required (‘heavy’ or ‘slim’ for the body task, ‘inverted’ or ‘upright’ for the scene task). Without TMS, they found that valid verbal cues improved the efficiency (speed and accuracy) of judgements about the target image. They then applied TMS during the cue period to either EBA or OPA and found a double dissociation. When TMS was applied to EBA, valid predictions showed no enhancement of judgements, but only in the body task. When TMS was applied to OPA, there was no behavioural enhancement from valid cues, but only in the scene task. This provides causal evidence that perceptual expectations are linked to the selective activation of brain regions that encode the target.

Source of Expectations

The studies mentioned so far, however, do not investigate the source of these expectations. Expectations can be learned rapidly and are very flexible, since they can be highly context-specific (de Lange et al., 2018). On a neural level, therefore, the hippocampus is ideally suited to subserve sensory prediction, since it has bidirectional connectivity with all sensory cortices (Lavenex & Amaral, 2000) and has been shown to be involved in rapidly learning associations between arbitrary stimuli (Schapiro et al., 2012). Furthermore, the hippocampal mechanism of pattern completion, subserved by the CA3 and CA1 subfields, enables items to be retrieved from memory from a partial cue and reinstated in sensory cortex (Bosch et al., 2014; Gordon et al., 2014; Ji & Wilson, 2007).

Hindy et al. (2016) investigated pattern completion and predictive coding in the hippocampus and early visual cortex using fMRI and multivariate pattern analysis. Participants were shown a visual cue and pressed either a left or right button, which determined the likelihood of a second shape’s appearance. The authors hypothesised that the hippocampus would be more involved in pattern completion (decoding the full sequence of cue + action + outcome from each cue) and the visual cortex would be more involved in predictive coding (decoding the outcome of the two sequences). They found that outcome decoding performance (predictive coding) was significantly better than chance in the visual cortex, but not in the hippocampus. In contrast, sequence decoding performance (indicative of pattern completion) was significantly better than chance in the hippocampus but not in the visual cortex. This dissociation suggests that memory-based expectations in the visual cortex are related to pattern completion in the hippocampus.

Kok and Turk-Browne (2018) also investigated the role of the hippocampus in predictive processing using fMRI and multivariate pattern analysis. Unlike in Hindy et al. (2016), the participants did not perform an action but learnt associations between a certain auditory tone and a shape that appeared afterwards (75% of the time). The participants then saw a second shape and were asked if the two shapes were the same or different. Decoding models of the BOLD signal were used to reconstruct the presented shapes in both validly and invalidly predicted trials. They found that in the primary visual cortex, representations were dominated by the presented shape. In contrast, in the hippocampus the representations were dominated by the predicted shape rather than the presented shape. This dissociation further supports the notion that the hippocampus is involved in cross-modal predictions.

Another candidate region for the source of expectations is the dorsal striatum, including the caudate and putamen. Through the striatum, the basal ganglia receive input from most cortical areas (Hintiryan et al., 2016), which feed back to the cortex via recurrent pathways (Redgrave et al., 2010), making the dorsal striatum a plausible region for shaping cortical activity based on behavioural experience (Stalnaker et al., 2019). Indeed, the caudate and putamen have been implicated in associative learning and prediction (Poldrack et al., 2001; den Ouden et al., 2009; Turk-Browne et al., 2008; Shohamy & Turk-Browne, 2013). The study by Kok and Turk-Browne (2018) discussed above also found that the caudate represented the predicted shape but not the presented shape (though no effect was found in the putamen). Regarding the putamen, Hindy et al. (2019) found that after a 3-day delay, predictive actions enhanced background connectivity of the hippocampus with the putamen as well as the early visual cortex, suggesting that the putamen could also be of interest with regard to the source of predictions.

There are still open questions regarding the source of predictions. First, are the hippocampus and dorsal striatum implicated as a source of predictions for more ecologically valid stimuli? This is important to establish, firstly because object images are a better representation of the visual stimuli seen and expected in the real world, and secondly because it is feasible that expectations could operate differently than for simpler stimuli such as gratings. Kok et al. (2019) compared neural responses in the hippocampus when viewing and expecting gratings versus complex abstract shapes (defined as having conjunctions of features, such as oriented lines, that cannot be reduced to the sum of their parts; for example, a triangle cannot be reduced to just three separate lines). They found that for complex shapes, the hippocampus represented which shape was expected, but for the gratings, it represented only unexpected gratings (more like a prediction error). Since object images are more akin to complex shapes than gratings, this suggests the hippocampus may be a source of expectations for more ecologically valid stimuli too, but this needs to be tested.

Attention and Expectation

Another remaining question from the literature discussed so far is how attention influences expectation. In the studies discussed, only one stimulus was shown at a time (whether a face, a grating, or a complex shape). Attention was therefore never disengaged by other stimuli, and so the effects of attention on expectation were not tested. This is important to manipulate experimentally because selective attention may play a role in some tasks without being explicitly measured. For example, in the study by Gandolfo and Downing (2019) that measured the effect of cuing for a body or a place, the verbal cue in the body task indicated the gender of the target body. It is therefore possible that for the target images, attention was directed to the body parts necessary for the binary judgement of ‘heavy’ or ‘slim’ depending on the body’s gender, but this was never measured. Addressing this gap in the literature, Richter and de Lange (2019a) studied how attention affects the neural response to expectations using images of real-life objects (e.g. a chair, a clock). Attention was manipulated by requiring either the object to be categorised as electronic or non-electronic (requiring the object to be attended) or a character in the centre of the image to be categorised as a letter or non-letter (object therefore unattended). They investigated the ventral visual stream, including early visual cortex (V1 and V2), object-selective lateral occipital cortex (LOC), and temporal occipital fusiform cortex (TOFC). They found that object identity could be decoded in both attention conditions in the ventral visual stream, suggesting that the visual information was present in these ROIs in both conditions. Regarding the effects of expectation, they found expectation suppression in the ventral visual stream, but only when objects were attended. This builds on the previous literature by showing how the integration of prior knowledge with sensory information depends on attention. A remaining question is whether attention influences expectation effects in brain regions beyond the ventral visual stream. Specific to this study: are the regions implicated as the sources of expectations (hippocampus, dorsal striatum) affected by attention, and if so, how? This project uses previously collected fMRI data from the Richter and de Lange (2019a) study to investigate these questions.

The Present Study

The overarching research question of this study is whether hippocampal and dorsal striatal regions are implicated as sources of visual predictions for more ecologically valid stimuli (object images) and whether this is affected by attention. The regions of interest are the ventral visual stream, including early visual cortex (V1 and V2) and object-selective lateral occipital cortex (LOC); the hippocampus, with the CA1 and CA3 subfields investigated separately due to their involvement in pattern completion; and the dorsal striatum, including the caudate and putamen. Participants learned pairs of object images and completed a task requiring information about the trailing (second) image to be categorised while attention was either engaged or disengaged. The trailing image was either expected or unexpected, depending on the learned pair. In this study, decoding models are used to decode the trailing object identity from all our ROIs, and to test whether this is affected by attention. To test whether decoding the trailing object depends on whether the trial was expected, these decoding models will also be run on expected and unexpected trials separately. Finally, to test whether expectation information is represented in our ROIs, a decoding model is used to decode whether a trial was expected or unexpected (regardless of object identity).

It is hypothesised that the trailing object identity will be decoded from visual cortex and LOC in both attention conditions. This is because Richter and de Lange (2019a) report that object identity could be successfully decoded whether the object was attended or unattended (despite finding expectation suppression in the ventral visual stream only when objects were attended). Regarding the effects of expectation on decoding object identity, it is hypothesised that decoding accuracy will be higher in the visual cortex and LOC for expected trials compared to unexpected trials. This is because previous literature suggests that more information about the target stimulus will be present in expected trials due to the category-selective activity for expected targets (reflecting a pre-stimulus template).

Regarding the hippocampal and dorsal striatal regions, it is hypothesised that the trailing object identity will be decoded from these regions, but only when it is expected (since Kok and Turk-Browne, 2018, found the predicted object could be decoded from hippocampus and caudate). If this is possible in the hippocampus and dorsal striatum, it suggests they are a source of visual expectations. Regarding the effect of attention, it is possible that in the hippocampus and dorsal striatum, object identity will only be decoded from attended expected trials, since Richter and de Lange (2019a) found expectation suppression only on attended trials. On the other hand, this expectation suppression was in sensory cortex, and so it is possible that attention may not have the same effect on expectation in hippocampal or dorsal striatal regions, since these represent the sources of expectations rather than the sensory cortex that is modulated by expectation. For the final analysis, which asks whether it is possible to decode expectation information from our ROIs, it is hypothesised that whether a trial was expected or unexpected will be decoded from the ventral visual areas, since these areas are modulated by expectation. It is hypothesised that this would occur only in the attended condition, since Richter and de Lange (2019a) found expectation effects in the ventral visual stream for the attended condition only. On the other hand, it is also possible that expectation may not be decoded from the visual areas, since sensory cortex is modulated by expected stimuli but may not represent the expectation itself. Regarding the hippocampus and dorsal striatum, it is hypothesised that decoding expectation will be possible in these regions, since they are proposed to represent expectation information. The effect of attention, however, is less clear.

Materials and Methods

Data

The present study analyses data provided by the Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen (Richter & de Lange, 2019b). Full methods of the original study are published in Richter and de Lange (2019a). In the original study, eye tracking data were also collected inside the MRI scanner to measure pupillometry. Since this is not relevant to the research questions of the current paper, it will not be discussed further, and only methods relevant to the present study are included here.

Participants

Data analysed in this study are from 34 healthy, right-handed participants (25 females, age 24.9 ± 4.8 years, mean ± SD). An additional four participants took part in the original study but were excluded from analysis (three due to excessive motion during MRI scanning and one due to incomplete data). Recruitment was conducted and ethical approval obtained at Radboud University Nijmegen (Richter & de Lange, 2019a).

Stimuli and Experimental Paradigm

The study was conducted over two consecutive days, with the learning session on day one (approximately 60 minutes) and the fMRI session on day two (approximately 150 minutes in total, broken down into 2 x 15-minute behavioural sessions, 90 minutes in the scanner, plus 30 minutes preparation time). Different tasks were conducted on each day, but the stimuli for each participant were the same on both days.

Day 1 - Learning Session

Each trial in the learning session consisted of two images of objects that were presented immediately after each other (no interstimulus interval) for 500ms each. The first image is referred to as the leading image and the second image as the trailing image. There was an intertrial interval of 1000-2000ms (see figure 1). Each participant saw 24 different object images (12 leading, 12 trailing) which were randomly selected from a set of 64 object stimuli taken from Brady et al. (2008) that were clearly electronic or non-electronic in nature (stimuli can be found here: https://osf.io/36te7/, see figure 2 for examples).

There were 12 image pairs during the learning session, and the probability of the trailing image appearing given the leading image was 1. The leading image was therefore perfectly predictive of the identity of the trailing image. Participants were made aware of these regularities. There were also 12 leading characters and 12 trailing characters (randomly assigned, see figure 2 for characters used), but their occurrence was randomised and therefore unpredictable. There was a fixation bull’s-eye present in the centre of the screen for the entire trial (outer circle 0.7° visual angle). Within the inner circle of the bull’s-eye (0.6° visual angle) a character (letter or symbol) was presented (~0.4° visual angle). Participants were instructed they could ignore these characters, but to maintain fixation on the bull’s-eye. The same 24 images and 24 characters were used for the learning session and the fMRI session per participant. Images spanned 5° x 5° visual angle on a mid-grey background. The mean relative luminance of the object images was calculated per image by Richter & de Lange (2019a). The stimulus images were converted from sRGB to linear RGB and the relative luminance was calculated for all pixels (where relative luminance Y = 0.2126*R + 0.7152*G + 0.0722*B; Stokes et al., 1996). The average relative luminance of the stimulus set was 0.225 and the relative luminance of the mid-grey background shown during the ITI was 0.216, whereby 0 would be a completely black image and 1 would be completely white. During the learning session stimuli were presented on an LCD screen (BenQ XL2420T, 1920 x 1080 pixel resolution, 60 Hz refresh rate).
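The luminance computation described above can be sketched per pixel as follows. This is a minimal sketch (not the original analysis script): each sRGB channel is linearised with the standard sRGB transfer function (Stokes et al., 1996) before applying the weighted sum.

```python
# Relative luminance of an sRGB pixel, following the formula in the text:
# linearise each channel, then Y = 0.2126*R + 0.7152*G + 0.0722*B.

def srgb_to_linear(c: float) -> float:
    """Convert one sRGB channel value in [0, 1] to linear RGB."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r: float, g: float, b: float) -> float:
    """Relative luminance Y of one pixel, computed on linearised channels."""
    R, G, B = (srgb_to_linear(c) for c in (r, g, b))
    return 0.2126 * R + 0.7152 * G + 0.0722 * B

# A mid-grey pixel (sRGB value 128/255) comes out at ~0.216, matching the
# value reported for the ITI background.
print(round(relative_luminance(128/255, 128/255, 128/255), 3))  # 0.216
```

Averaging this value over all pixels of an image gives the per-image mean relative luminance reported for the stimulus set.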

Learning session task. On 20% of trials (randomly determined) one of the images (either leading or trailing) was presented upside-down. Participants were instructed to press a button as soon as they detected the upside-down image. Because the upside-down image was randomly determined, it was unpredictable, and therefore the known predictiveness of the trailing image following the leading image was not task-relevant. On the 80% of trials that did not have an upside-down image, no response was required. Each of the image pairs occurred 80 times during the whole session. Each participant therefore performed 960 trials, split into four runs with a short break between runs.

Figure 1. Top – learning session (day 1), during which the leading image identity was perfectly predictive of the trailing image identity. On 20% of trials the trailing image was presented upside down and participants had to identify when this occurred. Bottom – fMRI session (day 2), during which the probability of the trailing image occurring after its paired leading image was 50%. In object categorisation blocks (objects attended) participants identified whether the trailing object was electronic or non-electronic. In character categorisation blocks (objects unattended) participants identified whether the trailing character was a letter or non-letter. Speed and accuracy were both emphasised for responses. Note – in the fMRI session, 6 image pairs were used for the object categorisation task and 6 different image pairs were used for the character categorisation task (the schematic above is for illustrative purposes only).


Figure 2. Examples of the object images and characters shown to participants.

Day 2 - fMRI Session

Procedure. Participants first performed an additional 240 trials of the same upside-down task from the learning session to refresh the learned object image associations. The fMRI session included an object categorisation task, a character categorisation task, and a localiser. The procedure was as follows. First there was a brief practice run (of either the object categorisation or character categorisation task, counterbalanced across participants, consisting of 50 trials and lasting ~5 minutes), during which the anatomical image was acquired. After the practice run, two runs of that same task were performed. Each run consisted of 120 trials and seven null events of 12 seconds (~14 minutes). A practice run of the other task followed (object or character categorisation, whichever was not completed first), followed by two full runs. Participants then performed a functional localiser. Finally, participants completed a pair recognition task outside the scanner to assess the learning of the object pairs before being fully debriefed. During MRI scanning, stimuli were back-projected (EIKI LC-XL100 projector, 1024 x 768 pixel resolution, 60 Hz refresh rate) on an MRI-compatible screen, visible via an adjustable mirror mounted on the head coil.

The original set of 12 image pairs from the learning session (12 leading, 12 trailing images) was split into two sets of 6 image pairs. One of these sets was used for the object categorisation task and the other for the character categorisation task. Each set contained three electronic and three non-electronic objects as trailing and leading images, ensuring an equal base rate of both categories. In the fMRI session, the probability of the expected trailing image appearing after its paired leading image was 0.5 (unlike the learning session, where this probability was 1). In the remaining 50% of trials, one of the five other trailing images appeared; each unexpected image therefore had a probability of 0.1 of occurring after a given leading image. In both the object categorisation and character categorisation tasks, characters were displayed in the central bull’s-eye. Like the object images, six characters were assigned as leading characters and six as trailing characters (three letters and three non-letters in each set) for each task. The tasks during the fMRI session had longer intertrial intervals (4000-6000ms, randomly sampled from a uniform distribution) compared to the learning session.
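The transition structure described above can be sketched as a small simulation. This is an illustrative sketch, not the original experiment code; the pair names are invented for the example. Given a leading image, its paired trailing image appears with probability 0.5, and each of the five other trailing images in the task set with probability 0.1.

```python
# Sketch of the fMRI-session trailing-image probabilities described in the
# text. Pair names ("lead0", "trail0", ...) are hypothetical placeholders.
import random

def trailing_probability(leading: str, trailing: str, pairs: dict) -> float:
    """Probability of a given trailing image following a given leading image."""
    return 0.5 if pairs[leading] == trailing else 0.1

def sample_trailing(leading: str, pairs: dict, rng: random.Random) -> str:
    """Draw one trailing image for a trial with the given leading image."""
    expected = pairs[leading]
    if rng.random() < 0.5:
        return expected                    # expected trial (p = 0.5)
    others = [t for t in pairs.values() if t != expected]
    return rng.choice(others)              # unexpected trial (p = 0.1 each)

pairs = {f"lead{i}": f"trail{i}" for i in range(6)}  # one 6-pair task set
rng = random.Random(0)
trials = [sample_trailing("lead0", pairs, rng) for _ in range(10_000)]
print(round(trials.count("trail0") / len(trials), 2))  # ~0.5 expected trials
```

The six alternatives exhaust the possibilities, so the probabilities sum to 1 (0.5 + 5 × 0.1).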

Object categorisation task (objects attended). This task required participants to categorise the trailing object on each trial as electronic or non-electronic. Accuracy and speed were both emphasised. For this task it was therefore beneficial to be able to predict the identity of the trailing object using the learned associations. Feedback on behavioural performance was provided at the end of each run. Before performing the task, ‘electronic’ was explained to be any object that contains any electronic components or requires electricity to be used. Participants’ understanding was tested before entering the MRI scanner by having them verbally categorise and name each object on screen to the experimenter.

Character categorisation task (objects unattended). Trials were identical to the object categorisation task, but participants were required to categorise the trailing character on each trial as a letter or non-letter. The identity of the character was not predictable. The purpose of this task was to be as similar to the object categorisation task as possible while drawing attention away from the object identity and towards the characters, without imposing a heavy attentional or cognitive load. Both tasks were designed for performance to be at ceiling level. Feedback on behavioural performance was provided at the end of each run.

Localiser. The localiser was run to obtain a functionally defined object-selective LOC mask for each participant and to obtain data from an fMRI run, independent of the expectation manipulation, that could be used to constrain the ROI masks to the voxels most informative about the object images. The localiser consisted of repeated presentation of the previously seen trailing images and their phase-scrambled versions (see figure 3 for examples). Stimuli were shown for 12s at a time, flashing at 2 Hz (300ms on, 200ms off). To ensure participants stayed engaged, the middle of the fixation bull’s-eye would sometimes dim and participants were required to press a button as soon as they detected this dimming. Each trailing image was presented six times and a phase-scrambled version of each image was presented three times. Twelve null events, each with a duration of 12s, were also presented. The presentation order was randomised, but direct repetition of the same image was not allowed, and each trailing image once preceded and once followed a null event.
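A randomised order with no direct repetitions, as used in the localiser, can be generated with a simple retry-shuffle. This is an illustrative sketch under stated assumptions, not the original presentation script, and it ignores the additional null-event constraints: reshuffle until no stimulus immediately follows itself.

```python
# Illustrative sketch: randomised presentation order in which no stimulus
# directly repeats. Retry-shuffling is the simplest way to meet the
# constraint; a rejected shuffle is just drawn again.
import random

def order_without_repeats(items: list, rng: random.Random,
                          max_tries: int = 100_000) -> list:
    """Shuffle `items` until no element directly follows an identical one."""
    for _ in range(max_tries):
        candidate = items[:]
        rng.shuffle(candidate)
        if all(a != b for a, b in zip(candidate, candidate[1:])):
            return candidate
    raise RuntimeError("no valid order found")

# 12 trailing images, each shown six times, as in the localiser
stimuli = [f"img{i}" for i in range(12)] * 6
order = order_without_repeats(stimuli, random.Random(0))
print(all(a != b for a, b in zip(order, order[1:])))  # True
```

With 12 stimulus types at equal frequency, a random shuffle satisfies the constraint often enough that the retry loop terminates quickly in practice.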

Figure 3. Example object images (top) with their phase-scrambled versions (bottom) shown in the localiser.

fMRI data acquisition

Anatomical and functional images were acquired on a 3T Prisma scanner (Siemens, Erlangen, Germany) using a 32-channel head coil. Anatomical images were acquired using a T1-weighted magnetisation-prepared rapid gradient echo sequence (MP-RAGE; GRAPPA acceleration factor = 2, TR/TE = 2300/3.03ms, voxel size 1 mm isotropic, 8° flip angle). Functional images were acquired using a whole-brain T2*-weighted multiband-6 sequence (repetition time [TR]/echo time [TE] = 1000/34.0ms, 66 slices, voxel size 2 mm isotropic, 75° flip angle, A/P phase encoding direction, FOV = 210mm, BW = 2090 Hz/Px). The first five volumes of each run were discarded to allow for signal stabilisation.

Data Analysis

Univariate Analysis

Pre-processing. Data were pre-processed and analysed using FSL 5.0 (FMRIB Software Library; Oxford, UK; Smith et al., 2004). Pre-processing included brain extraction (BET), motion correction (MCFLIRT), spatial smoothing (using a Gaussian kernel, full-width at half-maximum of 5mm), and high-pass temporal filtering (128s). Functional images were registered to a standard anatomical image (MNI152 T1 2mm template) using Boundary-Based Registration (BBR) from FSL FLIRT (Jenkinson & Smith, 2001; Jenkinson et al., 2002) and linear registration with 12 degrees of freedom. Analyses were performed using FEAT (FMRI Expert Analysis Tool) version 6 (part of FSL; Woolrich et al., 2001).

Localiser analysis. fMRI data from the localiser task were analysed to functionally define the lateral occipital cortex (LOC) as the clusters that showed a significant preference for object stimuli compared to phase-scrambled images. This method followed the Richter and de Lange (2019b) pipeline and used their scripts. The first-level analysis modelled the time-series data using a general linear model (GLM). The two regressors of interest modelled intact objects and scrambled objects; a regressor of no interest modelled the instruction screens. The first temporal derivatives of these three regressors (objects, scrambled objects, instructions) were added to the GLM. Additionally, 24 motion regressors (FSL's standard and extended set of motion parameters) were added to account for head motion: the six standard motion parameters, their squares, their derivatives, and the squares of the derivatives. Regressors were convolved with a double-gamma HRF. Higher-level analyses for the localiser were then run to investigate brain regions selective for object images versus scrambled images across participants, using the first-level data from the individual subjects and FSL's Local Analysis of Mixed Effects (FLAME 1, part of FSL; Woolrich et al., 2004).
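
As an illustration, the 24-column motion confound set described above can be built from the six rigid-body parameters as follows. This is a sketch with synthetic data, not the study's actual code; note that FSL's extended set uses backward differences for the temporal derivative, whereas `np.gradient` below uses central differences.

```python
import numpy as np

def extended_motion_regressors(motion, dt=1.0):
    """Build the 24-column confound set from six rigid-body motion
    parameters: the parameters themselves, their squares, their
    temporal derivatives, and the squares of the derivatives
    (6 x 4 = 24). `motion` has shape (n_volumes, 6)."""
    motion = np.asarray(motion, dtype=float)
    deriv = np.gradient(motion, dt, axis=0)   # approximate derivative
    return np.column_stack([motion, motion ** 2, deriv, deriv ** 2])

# Example: 10 volumes of simulated motion traces -> (10, 24) confounds
rng = np.random.default_rng(0)
confounds = extended_motion_regressors(rng.normal(size=(10, 6)))
print(confounds.shape)  # (10, 24)
```

Each of these 24 columns would enter the GLM design matrix alongside the task regressors.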

Multi-Voxel Pattern Analysis (MVPA) - Searchlight

This followed the pipeline and scripts of Richter and de Lange (2019b). Decoding analyses of the localiser fMRI data were used to find voxels that represented the object images. The first step used the least squares single (LSS) method, which was introduced for fast event-related designs (Mumford et al., 2012). It fits a separate GLM for each trial (the regressor of interest), with the remaining trials combined into a regressor of no interest, to estimate the activity associated with that trial. This provided single-trial parameter estimates for each voxel (not spatially smoothed). These parameter estimate maps were then used to decode the 12 trailing images seen in the localiser (12 classes). The decoding used a linear support vector machine (SVM; from the SVC function in Scikit-learn, Pedregosa et al., 2011) and stratified 4-fold cross-validation (from the “model_selection” module in Scikit-learn, Pedregosa et al., 2011). The analysis was performed across the whole brain using a searchlight (6mm radius). The output of this analysis was the average decoding accuracy for each voxel across all searchlight spheres that contained it. These values were used to constrain the ROI masks to the most informative voxels representing the object images (see Constraining ROI Masks).
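
The LSS step can be sketched as follows. This is a simplified ordinary-least-squares illustration on synthetic design matrices (assuming HRF-convolved trial regressors are already available), not the scripts used in the study.

```python
import numpy as np

def lss_betas(Y, trial_regressors):
    """Least squares single (LSS): fit one GLM per trial, with that
    trial's regressor as the regressor of interest and the sum of all
    remaining trial regressors as a single nuisance regressor.
    Y: (n_timepoints, n_voxels) BOLD data; trial_regressors:
    (n_timepoints, n_trials). Returns betas of shape (n_trials, n_voxels)."""
    n_timepoints, n_trials = trial_regressors.shape
    intercept = np.ones((n_timepoints, 1))
    betas = np.empty((n_trials, Y.shape[1]))
    for i in range(n_trials):
        interest = trial_regressors[:, [i]]
        nuisance = trial_regressors.sum(axis=1, keepdims=True) - interest
        X = np.hstack([interest, nuisance, intercept])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        betas[i] = coef[0]   # parameter estimate for the trial of interest
    return betas
```

The resulting single-trial betas are the patterns fed to the searchlight classifier.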

ROI Definition

Anatomically defined ROIs (V1, V2, caudate, putamen, hippocampus, CA1 and CA3 hippocampal subfields). Each participant's T1-weighted anatomical scan was segmented using FreeSurfer 6.0.0 (using the “recon-all” command for cortical reconstruction; Fischl, 2012). V1 and V2: This method followed the pipeline of Richter and de Lange (2019b). To extract V1 and V2 masks, the labels produced by FreeSurfer were converted to volume files (using the “mri_label2vol” command from FreeSurfer; Fischl, 2012). Masks were then transformed from FreeSurfer space to EPI (functional) space using the “applyxfm” command of FSL 5.0 FLIRT (FMRIB's Linear Image Registration Tool; Jenkinson et al., 2002) to multiply the FreeSurfer-segmented volume by the transformation matrix from the functional localiser output. Masks for each hemisphere were then added together to create bilateral masks (using the “fslmaths” function from FSLUTILS, FSL 5.0; Smith et al., 2004). Dorsal striatum masks: Labelled volumes from the FreeSurfer segmentation output were extracted using the FreeSurfer (Fischl, 2012) command “mri_binarize” with the flag “match” to extract a separate mask for each structure using its label number. This was done for the caudate and putamen (left hemisphere labels 11 and 12, right hemisphere labels 50 and 51, respectively). Masks were transformed to EPI space using the same method described above (FLIRT, “applyxfm”). The left and right hemisphere masks for each structure were then added together to create bilateral masks (using “fslmaths”; Smith et al., 2004). Hippocampal masks: To extract the hippocampus masks, the FreeSurfer command “recon-all” with the flag “-hippocampal-subfields-T1” was used (Iglesias et al., 2015). This segments hippocampal subfields from each participant's T1-weighted anatomical scan. From this output, the subfields CA1 and CA3 were extracted for each hemisphere using FreeSurfer's “mri_binarize” with the flag “match” (labels 206 and 208 for CA1 and CA3, respectively). Masks were transformed to EPI space using the same method described above (FLIRT, “applyxfm”) and added together (using “fslmaths”; Smith et al., 2004) to create bilateral masks. These bilateral masks were also combined to create a CA1 + CA3 mask. A total hippocampus mask was also created from the total hippocampal volume in the segmentation output, using the FreeSurfer (Fischl, 2012) command “mri_binarize” with the flag “min 1” to select all non-zero voxels.

Functionally defined ROI (LOC). This method followed the Richter and de Lange (2019b) pipeline and scripts. The localiser run from the fMRI session was used to create a LOC mask. The first stage was to manually select the four largest clusters in each participant's native space that showed a significant preference for object images compared to scrambled images, using the first-level localiser FEAT analysis (described in Univariate analysis: Localiser analysis). This was achieved by setting a Z threshold and visually inspecting the resulting significant clusters in FSLView (Smith et al., 2004). These clusters were individually extracted and then added together for each participant (using “fslmaths” from FSLUTILS; Smith et al., 2004). The default threshold was Z ≥ 5. If the total number of voxels in the extracted mask was less than 400 (calculated using “fslstats” from FSLUTILS; Smith et al., 2004), a lower Z threshold was used to extract significant clusters until the resulting mask contained 400 or more voxels (this step was added to the Richter and de Lange (2019b) scripts to automate entering a lower Z threshold when required). A lower threshold (than Z ≥ 5) was required for 14 participants (41%): Z ≥ 4 for seven participants, Z ≥ 3 for four, Z ≥ 2 for two, and Z ≥ 1 for one participant. The second step was to constrain each participant's LOC mask to where it overlapped with the group-wise LOC analysis (higher-level analysis in Univariate analysis: Localiser analysis). This was achieved using “fslmaths” with the flag “-mul” (FSLUTILS; Smith et al., 2004).
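
The automated threshold-lowering step can be sketched as follows. This is a schematic Python illustration of the logic only; the actual step operated on thresholded FSL cluster output rather than a raw Z-statistic map as assumed here.

```python
import numpy as np

def lower_threshold_until(zstat, min_voxels=400, z_start=5, z_floor=1):
    """Step the Z threshold down from z_start until the thresholded
    map contains at least min_voxels voxels; stop at z_floor."""
    for z in range(z_start, z_floor - 1, -1):
        mask = zstat >= z
        if mask.sum() >= min_voxels:
            return mask, z
    return zstat >= z_floor, z_floor
```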

Constraining ROI masks (all ROIs). Bilateral masks of each ROI were constrained to the 150 voxels carrying the most information about the trailing object images, based on the decoding analysis of the localiser data (see Multi-Voxel Pattern Analysis (MVPA) - Searchlight), using the pipeline and script from Richter and de Lange (2019a). This was achieved by ranking the voxels by their searchlight decoding accuracy for the trailing objects and selecting the top 150. By restricting the masks to voxels that showed object-specific information, we can be more confident that differences found in subsequent analyses are related to the representation of objects. This was especially important for the caudate and putamen masks, since even within the same region of the basal ganglia there can be different functional properties (Pauli et al., 2016).
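
A minimal sketch of this voxel-selection step, assuming the ROI mask and the searchlight accuracy map are available as NumPy arrays (synthetic data here, not the study's scripts):

```python
import numpy as np

def constrain_mask(roi_mask, searchlight_scores, n_voxels=150):
    """Restrict a binary ROI mask to its n_voxels most informative
    voxels, ranked by the searchlight decoding-accuracy map computed
    on the independent localiser data."""
    n_voxels = min(n_voxels, int(roi_mask.sum()))
    scores = np.where(roi_mask, searchlight_scores, -np.inf)
    top = np.argsort(scores, axis=None)[::-1][:n_voxels]
    constrained = np.zeros(roi_mask.shape, dtype=bool)
    constrained.flat[top] = True
    return constrained
```

Because the ranking uses only localiser data, the selection is independent of the main-task data later decoded within the mask.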

Multi-Voxel Pattern Analysis (MVPA) – Decoding From ROIs

Decoding object identity. To test whether information about the trailing object identity was represented in the different ROIs, decoding analyses were performed. We first used the script from Richter and de Lange (2019b) and the visual cortex masks they provided (constrained to the most informative 300 voxels). As above, parameter estimate maps were obtained, this time for the main task runs (object categorisation task for the attended condition, character categorisation task for the unattended condition). The voxel-wise GLMs followed the same pre-processing as stated under Univariate analysis and included a regressor for each trailing image per expectation condition, plus the 24 motion regressors and temporal derivatives (as in the univariate analysis). It is important to note that this pre-processing included spatial smoothing (Gaussian kernel, full-width at half-maximum of 5mm), whereas the localiser parameter estimates in the Richter and de Lange (2019b) pipeline come from data that were not spatially smoothed. The analysis was run using Python 3.7.7 (Van Rossum & Drake, 2009) and decoding used a linear support vector machine (SVM; from the SVC function in Scikit-learn, Pedregosa et al., 2011) trained on the six relevant localiser image identity labels for that run (y) and the localiser single-trial parameter estimates (X). Decoder performance was assessed by testing on parameter estimates per trailing image from the main MRI tasks to predict image identity labels.

To better understand and control this analysis, a new script was written. As mentioned, the Richter and de Lange (2019b) pipeline used non-spatially-smoothed data for the localiser (training data) and spatially smoothed data for the main runs (test data). Since smoothing homogenises the data and pattern information that MVPA could detect is lost, this new analysis did not spatially smooth the data when extracting the parameter estimates for the main MRI tasks. An additional difference was that the new script used single-trial parameter estimates for the test data, rather than per-trailing-image parameter estimates as in the Richter and de Lange (2019b) script. In this new analysis, the parameter estimate maps (for both the main MRI tasks and the localiser) were concatenated into a 4D matrix (3D voxel maps over all trials; using the “concat_images” function in NiBabel 3.1.1; Brett et al., 2020) and masked to our constrained ROIs (using the “apply_mask” function in Nilearn 0.6.2; Abraham et al., 2014). This new analysis, like the Richter and de Lange (2019b) analysis, used a linear SVM for decoding (the SVC function in Scikit-learn 0.22.1, Pedregosa et al., 2011, with the arguments “kernel='linear', probability=True”). As in the Richter and de Lange (2019b) analysis, the decoder was trained on the localiser image identity labels (y) and single-trial parameter estimates (X) (using the “fit” method in Scikit-learn; Pedregosa et al., 2011). The model was then tested on the single-trial parameter estimates of the main MRI tasks and the accuracy was obtained (using the “score” method in Scikit-learn, which reports the proportion of predicted labels that match the ground-truth labels; Pedregosa et al., 2011). Loading text files (e.g. of the image identities per run) into arrays and saving the decoding accuracy results to a text file used the Python package NumPy 1.18.1 (Harris et al., 2020; van der Walt et al., 2011).
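
The train-on-localiser, test-on-main-task decoding can be illustrated as follows. The arrays are synthetic stand-ins for the masked single-trial parameter estimates, so the resulting accuracy is illustrative only, not the study's result.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n_voxels = 150                          # voxels in a constrained ROI mask

# Synthetic stand-ins for masked single-trial parameter estimates:
# 36 localiser trials (6 images x 6 repetitions) and 60 main-task trials.
centres = rng.normal(size=(6, n_voxels))          # one pattern per image
y_train = np.repeat(np.arange(6), 6)              # localiser image labels
X_train = centres[y_train] + rng.normal(scale=0.5, size=(36, n_voxels))
y_test = rng.integers(0, 6, size=60)              # main-task trailing images
X_test = centres[y_test] + rng.normal(scale=0.5, size=(60, n_voxels))

# Train on localiser trials, test on main-task trials (chance = 1/6)
clf = SVC(kernel="linear", probability=True)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

With clearly separable synthetic patterns the decoder performs far above chance, which is the outcome the real analysis tested for.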

Both analyses to decode object identity discussed so far used all trials from the main MRI tasks, thus pooling together both expected and unexpected trials. It is possible, however, that there are differences in how the trailing image is represented in our ROIs depending on whether the trial was expected or unexpected, and that these differences are cancelled out when considering all trials. To test whether the trailing image can be decoded from visual cortex regardless of expectation or attention, and to test whether the trailing image can be decoded in the hippocampus and dorsal striatum only when it is expected, additional decoding analyses were performed that split the main experiment trials (test data) into expected and unexpected trials. The decoding method was the same as the new script described above (trained on localiser data), but the decoder was tested only on the main MRI task parameter estimates for the relevant (expected or unexpected) trials to predict the image identity.

Decoding expectation. A final decoding analysis was run to decode whether the trial was expected or unexpected. This addresses potential limitations of the object identity decoding analyses. The first limitation is that there may not be enough trials for training when considering the six object identities from the localiser: the localiser consists of six presentations of each image, which may be too few for meaningful patterns to be extracted (36 trials in total for training). Second, the localiser data consist of presentations of just one image, whereas each main MRI task trial consists of the presentation of two images (leading and trailing) for 0.5 seconds each. Since we only have fMRI data for the trial overall, we cannot split the main task trials into leading and trailing portions, and it is therefore possible the signal is not precise enough to decode the trailing object identity, since it represents a blurred signal of both images. For these reasons, a new analysis was run. To avoid having too few trials, this analysis collapses across image identity to decode whether the trial was expected or unexpected. To avoid the limitation of the localiser data being too different from the main tasks, it uses only data from the main MRI tasks (the first run of each task is the training data and the second run is the test data). The decoding analysis used the same methods as the new script in Decoding object identity, except that the training data consisted of the expectation condition labels (expected or unexpected, represented as 1s or 0s) (y) and the single-trial parameter estimates (X) of the first run of each attention task (120 trials in total for training). The model was tested using the single-trial parameter estimates of the second run of each task to predict whether the trial was expected or unexpected. Plots of the decoding accuracy results for all analyses were made using the Python package Matplotlib 3.1.3 (Hunter, 2007).
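
The expectation decoding scheme (train on run 1, test on run 2) can be sketched in the same way, again with synthetic stand-ins for the masked parameter estimates; the expectation-related signal injected below is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_voxels = 150

# Hypothetical expectation-related signal carried by a subset of voxels.
shift = np.zeros(n_voxels)
shift[:20] = 0.8

def simulate_run(n_trials=120):
    """One run of 120 trials labelled expected (1) or unexpected (0)."""
    y = rng.integers(0, 2, size=n_trials)
    X = rng.normal(size=(n_trials, n_voxels)) + np.outer(y, shift)
    return X, y

X_run1, y_run1 = simulate_run()   # first run: training data
X_run2, y_run2 = simulate_run()   # second run: test data

clf = SVC(kernel="linear", probability=True).fit(X_run1, y_run1)
accuracy = clf.score(X_run2, y_run2)   # chance level is 0.5
```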

Statistical Analyses

Statistical analysis was performed on the output of the decoding scripts. For output from the Richter and de Lange (2019b) script, data were imported into JASP (version 0.13.1; JASP Team, 2020) statistical software for analysis. For the results of the new scripts, data were imported into a Jupyter notebook (version 6.0.3; Project Jupyter, 2020) for statistical analysis using Python 3.7.6 (Van Rossum & Drake, 2009), including NumPy 1.18.1 (van der Walt et al., 2011) and the “stats” module from SciPy 1.3.2 (Virtanen et al., 2020). Normality was assessed using the Shapiro-Wilk test. To assess whether there were significant differences between attended and unattended conditions per ROI, paired-samples t-tests were run if the data were normally distributed, or Wilcoxon signed-rank tests if not. To assess whether decoding accuracy was significantly above chance, one-sample t-tests were run for normally distributed data, or Wilcoxon signed-rank tests for non-normally distributed data.

Results

Univariate Analyses

Results for the object images versus scrambled images contrast using the functional localiser data are shown in figure 4. Z (Gaussianised T/F) statistic images were thresholded using clusters determined by Z > 3.1 and a (corrected) cluster significance threshold of p = 0.05 (Worsley, 2001). As can be seen, the lateral occipital cortex (LOC) region is significantly activated by object images. This group-level analysis was used to define LOC masks per participant.

Figure 4. Thresholded activation images for the group mean of the object versus scrambled images contrast (colour bar: Z = 3.1 to 7.6).

Masks

ROI masks were defined in each participant’s native space and used for the decoding. Figures 5, 6, and 7 show an example of these masks for one participant (constrained to the most informative 150 voxels) for each ROI in light blue superimposed onto that participant’s functional scan. The same participant is shown for examples of all the ROIs.

Figure 5. Example masks for the visual cortex (V1 and V2) and LOC, constrained to the most informative 150 voxels. Plots generated using the Python package Nilearn 0.6.2 (Abraham et al., 2014).

Figure 6. Example masks for the hippocampal subfields involved in pattern completion (CA1 and CA3) and the total hippocampus, constrained to the most informative 150 voxels. Plots generated using the Python package Nilearn 0.6.2 (Abraham et al., 2014).

Figure 7. Example masks for the dorsal striatum regions (caudate and putamen), constrained to the most informative 150 voxels. Plots generated using the Python package Nilearn 0.6.2 (Abraham et al., 2014).

Decoding Trailing Object Identity

To test whether it is possible to decode the trailing image from our ROIs, including the visual cortex, hippocampus, and dorsal striatum, decoding models were trained on the localiser image identity labels (y) and parameter estimates (X). Decoder performance was assessed by testing on parameter estimates from the main MRI tasks to predict image identity labels. Chance level was therefore 1/6 (0.1667), since there were six trailing images.

The first step for testing whether trailing object identity could be decoded was to replicate the Richter and de Lange (2019a) analyses. The mean decoding accuracy for each attention condition and ROI using the Richter and de Lange (2019b) script and masks (primary visual cortex (V1), lateral occipital complex (LOC) and temporal occipital fusiform gyrus (TOFC)) is shown in figure 8.

Figure 8. Mean accuracy for decoding object identity using the Richter and de Lange (2019b) script and masks. Masks for the ROIs used the most informative 300 voxels. Dotted line represents chance level. Error bars represent standard error of the mean.

Shapiro-Wilk tests of normality indicated that for each ROI per attention condition the data significantly deviated from a normal distribution (p < .01 for all analyses). Non-parametric tests were therefore run to analyse the decoding accuracy. Paired-samples Wilcoxon signed-rank tests showed no statistical differences between the attended and unattended conditions for any ROI (V1: Z = 261.5, p = .082; LOC: Z = 120.5, p = .061; TOFC: Z = 149.5, p = .347). One-sample Wilcoxon signed-rank tests also showed that the median decoding accuracy was not significantly greater than chance in either attention condition for any ROI (V1 attended: Z = 335, p = .262; V1 unattended: Z = 147, p = .996; LOC attended: Z = 161, p = .991; LOC unattended: Z = 269, p = .691; TOFC attended: Z = 160, p = .992; TOFC unattended: Z = 231, p = .876).

These results are surprising, since the original paper by Richter and de Lange (2019a) reports above-chance accuracy for decoding the trailing object identity. To better understand and control this analysis, a new script was written. Accuracy results for decoding trailing object identity with the new script (using masks constrained to the most informative 150 voxels) are shown in figure 9.

Figure 9. Mean accuracy for decoding object identity across all trials (both expected and unexpected). Masks for the ROIs used the most informative 150 voxels. Dotted line represents chance level. Error bars represent standard deviation from the mean.

Shapiro-Wilk tests of normality showed that the data significantly deviated from a normal distribution in the LOC attended (W = 0.932, p = .001) and V2 unattended (W = 0.963, p = .041) conditions; non-parametric tests (Wilcoxon signed-rank) were therefore run in these instances. For all other ROIs and attention conditions the data did not significantly deviate from a normal distribution (Shapiro-Wilk ps > .05), so parametric tests (t-tests) were used.

Paired-samples t-tests/Wilcoxon signed-rank tests (as appropriate) showed no statistical differences in decoding accuracy between the attended and unattended conditions for any ROI (V1: t(33) = -0.409, p = .684; V2: W = 1042, p = .684; LOC: W = 831, p = .307; CA1+CA3: t(33) = 0.321, p = .749; Hippocampus: t(33) = -0.659, p = .512; Caudate: t(33) = -0.043, p = .966; Putamen: t(33) = -1.014, p = .314).

One-sample t-tests/Wilcoxon signed-rank tests (as appropriate) showed that decoding accuracy was not significantly different from chance for any ROI in either attention condition (V1 attended: t(33) = -0.313, p = .756; V1 unattended: t(33) = 0.254, p = .800; V2 attended: t(33) = -0.527, p = .600; V2 unattended: W = 813, p = .249; LOC attended: W = 911, p = .505; LOC unattended: t(33) = 0.705, p = .484; CA1+CA3 attended: t(33) = 1.208, p = .231; CA1+CA3 unattended: t(33) = 0.626, p = .533; Hippocampus attended: t(33) = 0.120, p = .905; Hippocampus unattended: t(33) = 1.033, p = .305; Caudate attended: t(33) = -0.350, p = .728; Caudate unattended: t(33) = -0.239, p = .812; Putamen attended: t(33) = -0.827, p = .411; Putamen unattended: t(33) = 0.569, p = .571).

These results also fail to replicate the Richter and de Lange (2019a) finding that the trailing object identity can be decoded. However, these analyses pool together both expected and unexpected trials. To test whether the trailing object can always be decoded from visual cortex regardless of attention condition, but only in expected trials from the hippocampus and dorsal striatum, additional decoding analyses were performed that split the main experiment trials (test data) into expected and unexpected trials. The mean decoding accuracies when testing on expected trials only are shown in figure 10.

Figure 10. Mean accuracy for decoding object identity in expected trials only. Masks for the ROIs used the most informative 150 voxels. Dotted line represents chance level. Error bars represent standard deviation from the mean.

Shapiro-Wilk tests of normality showed that the data significantly deviated from a normal distribution for CA1+CA3 unattended (W = 0.964, p = .049); a non-parametric test (Wilcoxon signed-rank) was therefore run in this instance. For all other ROIs and attention conditions the data did not significantly deviate from a normal distribution (Shapiro-Wilk ps > .05), so parametric tests (t-tests) were used.

Paired-samples t-tests/Wilcoxon signed-rank tests (as appropriate) showed no statistical difference in decoding accuracy between the attended and unattended conditions for any ROI in the expected trials (V1: t(33) = -0.359, p = .721; V2: t(33) = -0.138, p = .890; LOC: t(33) = -1.404, p = .165; CA1+CA3: W = 1014, p = .425; Hippocampus: t(33) = 0.201, p = .841; Caudate: t(33) = -0.199, p = .843; Putamen: t(33) = -0.130, p = .897).

One-sample t-tests/Wilcoxon signed-rank tests (as appropriate) showed that decoding accuracy was not significantly different from chance in the expected trials for any ROI in either attention condition (V1 attended: t(33) = -0.512, p = .610; V1 unattended: t(33) = 0.049, p = .961; V2 attended: t(33) = -0.044, p = .965; V2 unattended: t(33) = 0.128, p = .898; LOC attended: t(33) = 0.606, p = .547; CA1+CA3 unattended: W = 906.5, p = .464; Hippocampus attended: t(33) = 0.160, p = .873; Hippocampus unattended: t(33) = -0.109, p = .914; Caudate attended: t(33) = -0.150, p = .881; Caudate unattended: t(33) = 0.148, p = .883; Putamen attended: t(33) = -0.248, p = .805; Putamen unattended: t(33) = -0.112, p = .911).

The mean decoding accuracies when testing on unexpected trials only are shown in figure 11.

Figure 11. Mean accuracy for decoding object identity in unexpected trials only. Masks for the ROIs used the most informative 150 voxels. Dotted line represents chance level. Error bars represent standard deviation from the mean.

Shapiro-Wilk tests of normality showed that the data significantly deviated from a normal distribution for LOC unattended (W = 0.958, p = .021) and CA1+CA3 unattended (W = 0.953, p = .012); non-parametric tests (Wilcoxon signed-rank) were therefore run in these instances. For all other ROIs and attention conditions the data did not significantly deviate from a normal distribution (Shapiro-Wilk ps > .05), so parametric tests (t-tests) were used.

Paired-samples t-tests/Wilcoxon signed-rank tests (as appropriate) showed no statistical difference in decoding accuracy between the attended and unattended conditions for any ROI in the unexpected trials (V1: t(33) = -0.180, p = .858; V2: t(33) = 0.782, p = .437; LOC: W = 1029, p = .625; CA1+CA3: W = 1109.5, p = .854; Hippocampus: t(33) = -0.604, p = .548; Caudate: t(33) = 0.155, p = .877; Putamen: t(33) = -1.148, p = .255).

One-sample t-tests/Wilcoxon signed-rank tests (as appropriate) showed that decoding accuracy was not significantly different from chance in the unexpected trials for any ROI in either attention condition (V1 attended: t(33) = -0.873, p = .386; V1 unattended: t(33) = -0.658, p = .513; V2 attended: t(33) = -0.151, p = .881; V2 unattended: t(33) = -1.242, p = .218; LOC attended: t(33) = -0.437, p = .664; LOC unattended: W = 905.5, p = .619; CA1+CA3 attended: t(33) = 0.905, p = .369; CA1+CA3 unattended: W = 962, p = .602; Hippocampus attended: t(33) = 0.396, p = .693; Hippocampus unattended: t(33) = 1.261, p = .212; Caudate attended: t(33) = 0.060, p = .952; Caudate unattended: t(33) = -0.150, p = .881; Putamen attended: t(33) = -0.783, p = .436; Putamen unattended: t(33) = 0.743, p = .460).

The results from these analyses indicate that it was not possible to decode the trailing object from any of our ROIs in either the attended or unattended condition, even when considering expected and unexpected trials separately. Since no condition showed decoding accuracy above chance, no further analyses were run (e.g. whether there were differences in decoding object identity between expected and unexpected trials for any of the ROIs).

Decoding Expectation

To test whether our ROIs represent expectation information regardless of object identity, a new decoder was trained on the expectation condition labels (y) and single-trial parameter estimates (X) of the first run of the main MRI task, and tested on its ability to predict whether a trial was expected or unexpected in the second run. Since there were two expectation conditions, chance level was 0.5. Mean decoding accuracy in each ROI for this analysis is shown in figure 12.

(30)

30

Figure 12. Mean accuracy for decoding whether the trial was expected or unexpected. Masks for the ROIs used the most informative 150 voxels. Dotted line represents chance level. Error bars represent standard deviation from the mean.

Shapiro-Wilk tests of normality showed that the data significantly deviated from a normal distribution for CA1+CA3 attended (W = 0.882, p = .0016); a non-parametric test (Wilcoxon signed-rank) was therefore run in this instance. For all other ROIs and attention conditions the data did not significantly deviate from a normal distribution (Shapiro-Wilk ps > .05), so parametric tests (t-tests) were used.

Paired-samples t-tests/Wilcoxon signed-rank tests (as appropriate) showed no statistical difference in decoding accuracy between the attended and unattended conditions for any ROI (V1: t(33) = 1.677, p = .103; V2: t(33) = -0.763, p = .451; LOC: t(33) = -1.180, p = .246; CA1+CA3: W = 238, p = .845; Hippocampus: t(33) = 0.085, p = .933; Caudate: t(33) = 1.503, p = .142; Putamen: t(33) = 0.298, p = .767).

One-sample t-tests/Wilcoxon signed-rank tests (as appropriate) showed that decoding accuracy was not significantly greater than chance for any ROI in either attention condition (V1 attended: t(33) = 1.698, p = .099; V1 unattended: t(33) = -0.851, p = .401; V2 attended: t(33) = 0.501, p = .620; V2 unattended: t(33) = 1.382, p = .176; LOC attended: t(33) = 0.104, p = .918; LOC unattended: t(33) = -1.486, p = .147; CA1+CA3 attended: W = 225, p = .877; CA1+CA3 unattended: t(33) = -0.347, p = .731; Hippocampus attended: t(33) = -0.604, p = .550; Hippocampus unattended: t(33) = -0.768, p = .448; Caudate attended: t(33) = 1.911, p = .065; Caudate unattended: t(33) = -0.361, p = .720; Putamen attended: t(33) = 1.430, p = .162; Putamen unattended: t(33) = 0.921, p = .364).

These results indicate that it was not possible to decode expectation information from any of our ROIs in either the attended or unattended conditions.

Discussion

This study found that the identity of the trailing object could not be decoded from the ROI masks provided by Richter and de Lange (2019b) (V1, LOC, TOFC) when the object was attended or unattended, or from any of the ROIs defined in the present study (V1, V2, LOC, CA1 and CA3 hippocampal subfields, hippocampus, caudate or putamen) when the trailing object was attended or unattended. Whether the trailing object was expected or unexpected could also not be decoded from any of our ROIs when the trial was attended or unattended. Interpretations of these analyses will be discussed.

Regarding the decoding of object identity, the first result of this study is the failure to replicate the finding by Richter and de Lange (2019a) that the trailing object could be decoded in both attended and unattended conditions, even when using their scripts and provided masks constrained to the most informative 300 voxels (Richter & de Lange, 2019b). This was surprising, and a new script for this decoding analysis was therefore written, which improved the previous analysis by using unsmoothed parameter estimates for both the main MRI task and the localiser. It was also surprising that, with the new script, the trailing object could not be decoded from visual areas including the LOC. Firstly, object images were presented at fixation, suggesting the early visual cortex should be engaged. However, when inspecting the distribution of the most informative 150 voxels in V1, as seen in figure 5, these are not centred on the posterior part of the occipital lobe, which would be expected in correspondence with the foveal part of the visual field. It is especially surprising that the trailing object could not be decoded from the LOC, since the images used in the study were everyday objects, and the LOC has been found to respond to higher-level shape information (Kourtzi & Kanwisher, 2001). Indeed, the univariate analyses of the localiser revealed LOC activity for object images versus their phase-scrambled versions, and so the inability to decode object image identity from this area was unexpected. Given the inability to replicate the Richter and de Lange (2019a) results in the visual areas, as well as the distribution of informative voxels shown in the V1 mask, this raises the possibility of an error in our analysis pipeline.

One potential reason for the inability to decode object identity from our ROIs is that 150 voxels did not provide enough power or information for the decoding analysis. The new masks were constrained to the most informative 150 voxels because smaller ROIs, such as the hippocampus and the CA1 and CA3 subfields, were included in the new analyses. While it is a strength of this study that masks were defined in each participant's native space, thus accounting for subtle individual differences in brain structure (Kanai & Rees, 2011), as well as being constrained to voxels that contained information about the object images, it is possible the smaller masks reduced power. However, this is unlikely, since visual cortex masks constrained to the most informative 150 voxels have yielded meaningful patterns in previous MVPA literature (e.g. V1, V2, V3 in Kok et al., 2012).

It is possible that the trailing object identity could not be decoded because each trial in the main MRI task consisted of the presentation of two images, leading and trailing, for only 0.5 seconds each. The BOLD signal associated with each trial therefore included both images, potentially representing a blur of the two. Research using rapid serial visual presentation (RSVP; typically 10 images shown per second) shows that brain responses to successive stimuli overlap in time (Marti & Dehaene, 2017). This overlapping brain activity contains decodable information, but extracting it requires techniques with high temporal resolution such as magnetoencephalography (Marti & Dehaene, 2017) or electrophysiology (Nikolić et al., 2009). Given the relatively poor temporal resolution of fMRI, the information we have covers both images, meaning the neural response for each trailing image would differ depending on its leading image. This limitation was overcome in the analysis that split trials into expected and unexpected conditions, since the expected trials would contain the same two images (leading and trailing) each time. However, the trailing image could not be decoded from the expected trials either. This may be because the decoding model was trained on the localiser task, which involved the presentation of one image, while the fMRI response from the main task still represented a blurred neural response to two images (despite being the same two images in the expected condition), thus limiting the decoding model's ability to generalise to the test data. Future research should therefore use stimulus pairs of different modalities. For example, certain tones could be associated with different object images; tone-image pairings have been used in previous research (e.g. Kok et al., 2014; Kok et al., 2017; Kok et al., 2019; Kok & Turk-Browne, 2018). This would ensure that information in the visual areas represents the trailing image, and not an overlapping response to a leading image.

Another reason the trailing object may not have been decoded from the ROIs is that in the localiser (used as training data) each trailing image was shown, flashing, for 12 seconds, whereas the test data came from very brief image presentations (including the leading image, as discussed). It is possible that these very different presentation durations resulted in different patterns of activity, further limiting the decoder's ability to generalise to the main MRI task. Furthermore, since the localiser task required the detection of a dimming fixation dot, the object image was unattended, which may have limited the decoder's ability to generalise to the attended condition of the main MRI task. Future research should therefore use a localiser whose presentation more closely matches the main task (e.g. shorter presentations, objects both attended and unattended).
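The cross-task generalisation problem described above can be sketched as follows: a classifier fitted on localiser patterns and tested on main-task patterns will perform at chance whenever the two tasks evoke systematically different activity patterns, even if both contain stimulus information. This is a minimal sketch with simulated data, assuming scikit-learn; all sizes, labels and variable names are hypothetical, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_voxels = 150

# Hypothetical patterns: train on localiser trials (single, long image
# presentations), test on main-task trials labelled by trailing image.
X_localiser = rng.normal(size=(36, n_voxels))
y_localiser = np.repeat(np.arange(6), 6)      # e.g. 6 object identities
X_main = rng.normal(size=(240, n_voxels))     # brief, paired presentations
y_main = rng.integers(0, 6, size=240)

# Standardise features, fit on the localiser, test on the main task.
# With unrelated train/test distributions (as simulated here), accuracy
# stays near the 1-in-6 chance level.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_localiser, y_localiser)
accuracy = clf.score(X_main, y_main)
print(f"cross-task decoding accuracy: {accuracy:.3f} (chance = {1/6:.3f})")
```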

Regarding decoding object identity from the hippocampus, it is possible that, unlike in previous research using abstract shapes (e.g. Kok et al., 2019; Kok & Turk-Browne, 2018), the hippocampus is not implicated as the source of learned predictions for more ecologically valid stimuli (here, everyday objects). However, this seems unlikely, since the hippocampus has been linked to memory retrieval of arbitrary learned associations between higher-level items, such as locations, famous people, objects and animals (Horner et al., 2015). This study by Horner et al. (2015), however, displayed the items as words rather than images, and so may not apply to the current study, although participants were asked to imagine the items as vividly as possible, and imagery has been shown to mimic perception at the neural level (Sunday et al., 2018).

It is also possible that the trailing object could not be decoded from the hippocampus or dorsal striatum because these regions were not engaged during the localiser task. The decoder was trained on the localiser task, which involved detecting a dimming fixation dot while viewing one image at a time. The hippocampus and dorsal striatum may not have been engaged during this task, since there was no learning taking place and no relationship between stimuli, both of which have been shown to engage the hippocampus (Schapiro et al., 2012) and striatum (Turk-Browne et al., 2008). Thus, when the decoder was tested on the main MRI task involving the presentation of learned pairs, the hippocampus and dorsal striatum may have been engaged, but the decoder could not detect the representation of object identity due to constraints of the training data.

Regarding the dorsal striatum specifically, another potential reason the trailing object identity could not be decoded is that the main MRI task took place on day 2, whereas learning the object pairs took place on day 1. It is possible that this region was engaged during the learning phase (day 1) but not after learning (day 2). For example, Neely et al. (2018) found that optogenetically inhibiting the dorsomedial striatum in rodents during predictive learning (associating primary visual cortex activity with an auditory cursor to receive a reward) prevented learning. However, inhibiting the dorsomedial striatum after learning did not affect the learned associations, and rewarded patterns were still produced. This suggests that the dorsomedial striatum is implicated during the learning phase, but not in retrieval. Since fMRI data were only collected on day 2, this may explain why the dorsal striatum did not represent information regarding the trailing object or expectation: it may not have been engaged after learning. Future research should therefore consider a broader timescale of learning and ideally collect neuroimaging data on the first day of learning as well as during the final task. Collecting neuroimaging data at various time points is also relevant to the hippocampus, since Hindy et al. (2019) found that background connectivity between the hippocampus and V1/V2 sharpened after a 3-day delay compared to immediately after learning.

The final analysis addressed the limitations of the first analyses to decode the trailing object identity, namely that 1) the localiser data may have been too different from the main MRI task, 2) the training data (localiser) consisted of only 36 trials, 3) the hippocampus and dorsal striatum may not have been engaged during the localiser, and 4) each trailing image would be preceded by a different leading image. This analysis collapsed across image identity and tested whether it was possible to decode whether a trial was expected or unexpected, using half of the main MRI task as training data and the other half as test data (therefore 120 trials for training). It revealed that mean decoding accuracy was not significantly above chance in any ROI in either the attended or unattended condition. This was surprising, since the hippocampus and dorsal striatum were hypothesised to contain information regarding expectation based on previous research (Hindy et al., 2016; Kok & Turk-Browne, 2018; Kok et al., 2019).
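The split-half expectation-decoding analysis described above can be sketched as a binary classification (expected vs. unexpected, collapsing across object identity) whose accuracy is compared against the 50% chance level, for instance with a permutation test. This is a minimal sketch with simulated data, assuming scikit-learn; the trial counts and variable names are hypothetical and the inference step is only one possible choice, not necessarily the procedure used in this study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.default_rng(2)

# Hypothetical ROI patterns for 240 main-task trials, labelled only as
# expected (1) or unexpected (0), collapsing across object identity.
X = rng.normal(size=(240, 150))
y = np.tile([0, 1], 120)

# Two-fold split (train on one half of the trials, test on the other),
# with a permutation test of whether accuracy exceeds 50% chance.
cv = StratifiedKFold(n_splits=2)
clf = LogisticRegression(max_iter=1000)
score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=cv, n_permutations=200, random_state=0)
print(f"accuracy = {score:.3f}, permutation p = {p_value:.3f}")
```

With unstructured data, as simulated here, accuracy remains near 0.5 and the permutation p-value is non-significant, mirroring the null result reported above.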

However, this study used more complex, naturalistic stimuli than previous studies, so it is possible that the representation of expectation also differs. Another possibility is that the training time was too short, or that the 'refresher' task on day 2 was not extensive enough. However, this is unlikely, since participants showed a behavioural benefit when categorising expected versus unexpected pairs (responding more quickly and accurately to expected pairs), suggesting that they had still learnt the object pairs at the time of scanning (Richter & de Lange, 2019a). However, it is
