
The role of recurrent processing in figure/ground segmentation and object completion in natural stimuli


Academic year: 2021


Master Programme Brain and Cognitive Science

University of Amsterdam

Report, Research Project 2017

Supervisor: Dr. Steven Scholte

Author: Lucas Weber

The role of recurrent processing in figure/ground segmentation and object completion in natural stimuli

Abstract

Humans are extraordinarily good at recognizing objects in their everyday surroundings, even though viewing conditions are suboptimal most of the time. Examples of suboptimal conditions for recognition are occlusion by other objects or placement against a cluttered background. Prior research on simplistic stimuli has shown that, under complex circumstances, a pure feedforward sweep of information processing in the ventral visual stream might not be enough to sustain robust object recognition. Instead, evidence is accumulating that the visual system uses recurrent processing from higher to lower areas to amplify low-level features. The current study aims to unveil the role of recursion in natural object recognition. We hypothesized that the suppression of recurrent processing through backward masking has a bigger impact on the recognition of naturally occluded objects than of non-occluded objects. Further, we assumed that suppression of recurrent processing has a bigger effect on the recognition of objects embedded in their natural background than of the same objects segmented from their surroundings. Using behavioural and electroencephalogram (EEG) measures, we found that recognition of objects in their natural background depends on recurrent processing to a larger degree than recognition of segmented objects. This can be traced back to a larger need for surface segregation. For occlusion, our behavioural and EEG data contradict each other; further research is therefore needed to clarify the role of recurrent processing.


Introduction

In daily life we encounter objects in conditions that are suboptimal for recognition. They are in the midst of cluttered surroundings or occluded by other objects. This is in fact not the exception, but rather the rule. Despite these suboptimal circumstances, humans are extraordinarily good at grouping the fragmented surfaces that make up an object into meaningful units, distinguishing them from other objects and correctly classifying them into different categories. How do humans achieve such a complicated task?

For a long time, researchers have assumed that visual object recognition is accomplished by a feedforward sweep of information processing in the ventral visual stream (Riesenhuber & Poggio, 1999; Milner & Goodale, 1992). In this view, information flows from low-level structures such as V1 and V2, which are specialized in simple feature detection tasks like edge detection, to higher-level structures like the lateral occipital area (LO) (Grill-Spector et al., 2001; Konen & Kastner, 2008). Along the hierarchically organized ventral visual stream, the receptive fields of single units grow in size and their responses become increasingly invariant, culminating in units as specialized as the famous Halle Berry cells in the medial temporal lobe (MTL) (Quiroga, 2005). At first glance, a feedforward model sounds reasonable, since the visual system has to operate in a constantly changing environment: if processing were not strictly feedforward, older information would interfere with information arising from subsequent changes in the scene and render a reliable representation of the surroundings impossible. On closer inspection, however, the information in the visual field is usually more stable than expected, with only small details changing over time. Further, pure feedforward vision cannot account for object recognition in more complex situations such as the ones described above (e.g. Wyatte, Curran & O’Reilly, 2012). Indeed, prior research suggests that in more complicated visual scenes, for example with degraded stimuli, so-called recurrent processing is necessary to enable object recognition (Johnson & Olshausen, 2005). Recurrent processing describes feedback from higher- to lower-level visual areas that amplifies the initial activity in lower-level areas, so that object-related signals grow stronger until saturation is reached and recognition is accomplished. This is plausible, since anatomical studies of the visual pathways have shown that the densities of backward- and forward-projecting neurons are almost equal (Sporns & Zwi, 2004; Felleman & Van Essen, 1991). On top of this, Drewes et al. (2016) demonstrated that shape percepts are optimally reinforced when they are flashed at 60 ms intervals. They conclude that recursion arises after this period, so that new incoming stimulation and recursion match and reinforce each other.

As mentioned earlier, natural object recognition takes place in suboptimal conditions, exemplified by occlusion or cluttered scenes. For occlusion, research on simplistic Kanizsa figures (Kanizsa, 1976) has shown that low-level visual areas (V2: Ramsden, Hung & Roe, 2001; von der Heydt et al., 1984; V1: Halgren et al., 2003; Lee & Nguyen, 2001) as well as high-level areas (Harris et al., 2011) are involved in object completion and therefore in object recognition. It has been suggested that the necessary interaction between higher and lower areas comes about through recurrent processing (Halgren et al., 2003; Harris et al., 2011; Murray et al., 2002). Wokke et al. (2012) tested this assumption by disrupting higher and lower visual areas with transcranial magnetic stimulation (TMS) at different timepoints while their subjects had to identify an illusory square in a Kanizsa figure. Disrupting V1 and V2 at timepoints at which feedforward-related activation had already vanished impaired subjects’ performance. Additionally, disrupting activity in the higher visual area LO had its biggest effect when applied earlier in time than the stimulation of the lower-level visual areas. This indicates that completion of occluded objects depends on feedback mechanisms. Further, Wyatte, Curran and O’Reilly (2012) showed that this also holds true for more complex stimuli, such as three-dimensional object models. However, in this type of perceptual completion, parts of the percept are not occluded but simply missing (replaced with background). Gestalt psychologists call this modal completion (Michotte, Thinès, & Crabbé, 1964/1991), and it is fairly uncommon in daily life. A more common form is amodal completion, in which an occluder explains the missing parts of an object. It has been suggested that modal and amodal completion are based on the same neural mechanisms (Kellman, Yin & Shipley, 1998).

Besides occlusion, scene segmentation seems to be a process that depends on recursion. Scene segmentation describes the process in which textures in the visual field are organized into separate units and is therefore essential for making objects identifiable. It depends on two processes: boundary detection between two different textures, and surface segregation, which describes the figure-ground segregation itself. Several studies (Lamme, Rodriguez-Rodriguez, & Spekreijse, 1999; Hupé et al., 1998; Lamme, 1995) show that the activity of neurons in V1 differs depending on where their receptive field lies in the visual scene: a neuron responds more strongly to elements located on the figure than to the same elements located on the background (Zipser, Lamme & Schiller, 1996). This discrepancy in activity cannot arise from the feedforward sweep, since it relies on information from outside the neuron’s receptive field. Higher-level visual areas must therefore provide selective feedback to lower-level neurons whose receptive fields are located on the object of interest. The same study shows that boundary detection, as a prerequisite of scene segmentation, does not rely on feedback. In this view, an object that is segmented from its natural background and placed on a plain monochrome background should require no (or less) surface segregation, and therefore less recurrent processing, to be recognized successfully.

One way to study the effects of recurrent processing is masking (Lamme & Roelfsema, 2000; Breitmeyer, 1984). Masks are stimuli that are shown either immediately before (forward masking) or immediately after (backward masking) the presentation of a target. Both types of masks decrease the visibility of the target stimulus in visual tasks. While forward masking suppresses the neural onset response to the target (Judge, Wurtz, & Richmond, 1980; Macknik & Livingstone, 1998; Macknik & Martinez-Conde, 2007), backward masking suppresses the target’s after-discharge (Macknik & Livingstone, 1998; Macknik & Martinez-Conde, 2004). There is evidence that the backward mask’s suppression of the target’s after-discharge corresponds to the disruption of recurrent processing from higher to lower visual areas (Fahrenfort, Lamme & Scholte, 2007). In this view, the presentation of a backward mask interrupts feedback processing whilst leaving the initial feedforward sweep of information intact (Lamme, Zipser, & Spekreijse, 2002; Di Lollo, Enns & Rensink, 2000). Habak, Wilkinson & Wilson (2006) showed that the effectiveness of backward masks peaks at 53 to 107 ms after stimulus onset, which is in line with the previously mentioned 60 ms until the onset of recurrent processing (Drewes et al., 2016).

The research reviewed so far collectively points towards a necessity of recurrent processing for recognizing objects in complex scenes (difficult figure/ground segmentation or occlusion by other objects). However, all cited evidence is based on research conducted on simplistic stimuli, which do not resemble the complexity of real-world situations. It remains unclear whether these assumptions hold for stimuli with greater ecological validity. This study therefore aims to test whether the assumption that recognition of occluded and unsegmented objects needs recurrent processing extends to natural stimuli. If that is the case, the expected drop in human subjects’ performance due to masking should be greater for more difficult images (occluded objects / unsegmented objects) than for their easier counterparts (non-occluded objects / segmented objects). From these assumptions we formulated the following hypotheses: (1) Backward masking impairs human performance in an object recognition task to a greater extent when objects are partially occluded compared to non-occluded objects. (2) Backward masking impairs human performance in an object recognition task to a greater extent when objects occur in their natural background compared to the same objects segmented and placed on a plain background. (3) Backward masking compromises event-related potentials (ERPs) in an electroencephalogram (EEG) recording over early visual (occipital) areas to a greater extent while human subjects try to recognize occluded / unsegmented objects compared to non-occluded / segmented objects. The ERPs over higher visual (lateral/peri-occipital) areas are expected to be relatively unaffected.

Methods

Participants

To test our hypotheses, seventy-eight participants (n = 78; 21 male, 58 female, 0 other) were recruited through the recruitment page for psychological research of the University of Amsterdam (UvA) (https://www.lab.uva.nl/lab/). All participants were residents of the Netherlands; 86% were of Dutch nationality and 14% reported other nationalities. All participants were right-handed, reported normal or corrected-to-normal vision, provided written informed consent and were compensated either monetarily or with credit points, which are required of first-year students at the UvA. The experimental design was approved by the ethics committee of the University of Amsterdam.

Stimuli & apparatus

Task. Participants completed a four-option decision task in which they had to categorize backward-masked and unmasked images of different objects. Because the experiment was embedded in a larger project with different research questions, categorizations were made on both an ordinate and a subordinate level. However, the ordinate and subordinate categorization tasks were balanced over all conditions and performed on the same stimuli, which is why we do not expect classification level to interfere with our research question. The ordinate and subordinate categories used are listed in Table 1. All object categories were controlled for context (objects appear in the same kinds of scenes, which makes classification on the basis of background impossible) and predominant colour (objects do not exhibit a salient colour, which makes classification solely on the basis of colour impossible).

Stimuli. Stimuli were obtained from Microsoft’s Common Objects in Context database (MSCOCO). MSCOCO’s advantage over other databases like ImageNet (Deng et al., 2009) is that its images are taken in natural scenes. A single picture can contain multiple objects, and images are not cropped and centred on the object of interest, which means that the position and size of the object can vary. This, however, makes a confound through object size possible, which we prevented by requiring objects to be no smaller than 10% and no larger than 80% of the picture (based on the number of pixels specified in the MSCOCO object annotations). Half of the target objects were occluded (estimated degree of occlusion ranged from 30 to 60%), while the other half were not occluded or occluded only to a negligible degree (<5%). We chose this degree of occlusion based on Johnson and Olshausen (2005), who report a medium accuracy rate for occlusion in this range. This led us to expect an effect without participants’ performance dropping to chance level. Additionally, every image was used with its natural background as well as in a segmented version, in which the object’s background was replaced with plain white. Thus, we used exactly the same objects in the segmented and the unsegmented condition. We used the object annotations of MSCOCO to segment the objects (examples of stimuli from the different conditions are shown in figure 1).
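The size control and the segmentation step described above can be illustrated with a minimal sketch. It assumes that each object annotation has already been converted into a boolean pixel mask; the type aliases and the function names `object_fraction`, `passes_size_control` and `segment_on_white` are hypothetical and not taken from the original pipeline.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # RGB triple
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[bool]]        # True where the annotated object is (assumed input)

WHITE: Pixel = (255, 255, 255)


def object_fraction(mask: Mask) -> float:
    """Share of image pixels covered by the annotated object."""
    total = sum(len(row) for row in mask)
    covered = sum(sum(1 for inside in row if inside) for row in mask)
    return covered / total


def passes_size_control(mask: Mask, lo: float = 0.10, hi: float = 0.80) -> bool:
    """Keep objects that cover between 10% and 80% of the picture."""
    return lo <= object_fraction(mask) <= hi


def segment_on_white(image: Image, mask: Mask) -> Image:
    """Replace every background pixel with plain white; object pixels stay unchanged."""
    return [
        [pixel if inside else WHITE for pixel, inside in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

Because the segmented version is derived from the same image and mask, the object pixels are identical in the segmented and the unsegmented condition.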

Table 1

Ordinate and subordinate categories

Ordinate category    Subordinate categories
dog                  German shepherd, Labrador
cat                  American shorthair, Calico
bicycle              race bicycle, ladies bicycle
motorcycle           Chopper, Touring
bus                  single-deck bus, double-decker bus
car                  station wagon, Sedan
bag                  hand bag, backpack
suitcase             traditional suitcase


Figure 1: Examples of different target stimuli and masks. (a) shows a non-occluded double-decker bus, whilst the single-deck bus in (b) is occluded. The two images in the middle show examples of an (c) unsegmented and a (d) segmented German shepherd dog. The image on the right shows one of the (e) masks used in the experiment, consisting of randomly sampled 80x80-pixel tiles from the respective category pair.

Masks. The masks used in the experiment consisted of 80x80-pixel squares randomly sampled from the target pictures and scrambled together into 800x800-pixel masking stimuli (see figure 1e). A tile size of 80x80 pixels was chosen since it had the greatest masking effect without leading to confusion between target and mask. For every ordinate categorization we composed 10 unique masks (10 masks containing tiles of cats and dogs, 10 masks containing tiles of buses and cars and so forth), resulting in 40 different masks in total.
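The mask construction can be sketched as follows. This is only an illustration under stated assumptions: images are represented as nested lists of grey values, tiles are drawn from uniformly random positions in randomly chosen source images, and the function names `random_tile` and `compose_mask` are hypothetical. The original sampling procedure may have differed in these details.

```python
import random
from typing import List, Sequence

Image = List[List[int]]  # grey values; a stand-in for real image data

TILE = 80        # tile edge length in pixels (from the text)
MASK_SIZE = 800  # mask edge length in pixels (from the text)


def random_tile(image: Image, rng: random.Random, tile: int = TILE) -> Image:
    """Cut one tile x tile square from a random position in the image."""
    top = rng.randrange(len(image) - tile + 1)
    left = rng.randrange(len(image[0]) - tile + 1)
    return [row[left:left + tile] for row in image[top:top + tile]]


def compose_mask(images: Sequence[Image], rng: random.Random,
                 tile: int = TILE, size: int = MASK_SIZE) -> Image:
    """Fill a size x size canvas with tiles sampled from randomly chosen images."""
    per_side = size // tile
    mask: Image = []
    for _ in range(per_side):
        # One horizontal band of tiles, then stitch the band row by row.
        tiles = [random_tile(rng.choice(images), rng, tile) for _ in range(per_side)]
        for y in range(tile):
            mask.append([value for t in tiles for value in t[y]])
    return mask
```

With the sizes given in the text, each mask holds 10 x 10 = 100 tiles.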

Apparatus. The experiment was performed on a Dell P2412H monitor measuring 51.7 cm x 32.3 cm with a frame rate of 60 Hz and a screen resolution of 1920 x 1080 pixels. Participants were seated at a distance of 70 cm from the screen. Stimuli were presented using the Psychophysics Toolbox Version 3 for MATLAB. Participants indicated their response using the keys ‘a’, ‘d’, ‘j’ and ‘l’ on a standard US-international layout computer keyboard positioned on their lap. The outer keys (‘a’ and ‘l’) indicated a certain response, while the inner keys (‘d’ and ‘j’) indicated an uncertain response.

EEG-setup. Our EEG-setup consisted of a 64-channel ActiveTwo EEG system (BioSemi, Amsterdam, the Netherlands) sampling at a rate of 512 Hz. Additionally, we placed external electrodes on each temple and above and under the left eye for ocular correction, and on each mastoid for later referencing. Electrodes were attached following the international 10/5 system. Since we were especially interested in occipital lobe activity, we moved the frontal electrodes F5 and F6 to positions I1 and I2 respectively.

Procedure

After signing the informed consent form, participants were instructed to complete practice trials while the experimenters set up the EEG. Participants could perform the task in Dutch or English.

Paradigm. A single trial (see also figure 2) consisted of a fixation cross for 500 to 1500 ms (the timing for every trial was drawn from a uniform distribution), followed by a forward mask for 50 ms and the target presentation for 16 ms. In masked trials a backward masking stimulus was presented for 400 ms, followed by a blank screen for 500 ms, whilst in the unmasked condition the blank screen was presented for the whole 900 ms interval. Finally, the participants were prompted to respond. Responses were only possible during a 2000 ms interval after prompt onset, to avoid motor-related confounds in the EEG signal through earlier responses. During the inter-trial interval of 1000 to 2000 ms (again sampled from a uniform distribution) the screen was cleared. For every trial, masking stimuli were randomly sampled from the 10 potential candidates of the corresponding category.
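The trial timeline can be expressed in screen refreshes, which is how durations are realised on a 60 Hz monitor. The sketch below is an illustration only: the function names and the rounding rule (nearest whole frame, at least one) are assumptions, not the original presentation code.

```python
import random
from typing import List, Tuple

REFRESH_HZ = 60  # monitor frame rate reported in the text


def ms_to_frames(duration_ms: float, refresh_hz: int = REFRESH_HZ) -> int:
    """Round a nominal duration to whole screen refreshes (at least one)."""
    return max(1, round(duration_ms * refresh_hz / 1000))


def build_trial(masked: bool, rng: random.Random) -> List[Tuple[str, int]]:
    """Return (event, frames) pairs for one trial up to the response prompt."""
    events = [
        ("fixation", ms_to_frames(rng.uniform(500, 1500))),  # jittered fixation
        ("forward_mask", ms_to_frames(50)),
        ("target", ms_to_frames(16)),
    ]
    if masked:
        events += [("backward_mask", ms_to_frames(400)), ("blank", ms_to_frames(500))]
    else:
        events += [("blank", ms_to_frames(900))]  # blank fills the whole interval
    return events
```

At 60 Hz one frame lasts about 16.7 ms, so the nominal 16 ms target corresponds to a single frame and the 50 ms forward mask to three frames. Masked and unmasked trials span the same number of frames between target offset and prompt.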

Practice. Practice consisted of 10 trials of cat/dog categorization, of which 8 were only forward masked (unmasked condition) and 2 were forward and backward masked. This practice resembled the real experiment and was meant to familiarize the participant with the procedure. Participants were able to continue once they performed with at least 70% accuracy. After this, 8 practice blocks for familiarization with the subordinate categories were carried out. Before every block, a verbal explanation of the differences between the categories and example images were provided on the screen. The trials in these blocks resembled the unmasked subordinate trials of the actual experiment, but target presentation time was extended to 2 seconds to render the objects clearly visible. After every trial, accuracy feedback was provided.

The actual experiment consisted of 1024 trials, divided into 64 blocks (16 trials/block). The second 512 trials were a repetition of the first 512 trials. Every block was made up of either only ordinate or only subordinate categorizations, and within a block only one categorization was asked (e.g. cat vs. dog or German shepherd vs. Labrador). Half of the participants completed 16 blocks of ordinate categorization first and continued with 16 blocks of subordinate categorization, while the other half followed the opposite pattern. Target stimuli of all conditions (segmentation, occlusion, masking) were equally distributed over all blocks.

Data Analysis

Behavioural Analysis. Analysis of behavioural data was conducted using MATLAB R2017a (The MathWorks Inc., Natick, MA, USA, www.mathworks.com) and IBM SPSS Statistics 24.0 (IBM, Armonk, USA). We performed two two-way repeated-measures analyses of variance (ANOVA) on our data, with the within-subject factors occlusion and masking for the first and segmentation and masking for the second ANOVA. The dependent variable for both procedures was the mean accuracy rate (percentage of correct responses) within each condition for each subject.
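For a fully within-subject 2 x 2 design, each effect has one degree of freedom, so each F value equals the squared one-sample t statistic of a per-subject contrast, with degrees of freedom (1, n - 1). The following sketch uses this equivalence to illustrate the analysis; it is not the SPSS procedure itself, and the cell labels (`a1b1` etc.) and function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence


def one_sample_t(values: Sequence[float]) -> float:
    """t statistic for testing whether the mean of the values differs from zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def rm_anova_2x2(acc: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """F values (df = 1, n - 1) for a 2 x 2 within-subject design.

    acc maps the cells 'a1b1', 'a1b2', 'a2b1', 'a2b2' to per-subject mean
    accuracies, e.g. factor A = occlusion (or segmentation), factor B = masking.
    """
    main_a: List[float] = []
    main_b: List[float] = []
    interaction: List[float] = []
    for a1b1, a1b2, a2b1, a2b2 in zip(acc["a1b1"], acc["a1b2"], acc["a2b1"], acc["a2b2"]):
        main_a.append((a1b1 + a1b2) - (a2b1 + a2b2))       # A1 vs A2
        main_b.append((a1b1 + a2b1) - (a1b2 + a2b2))       # B1 vs B2
        interaction.append((a1b1 - a1b2) - (a2b1 - a2b2))  # difference of differences
    return {
        "A": one_sample_t(main_a) ** 2,
        "B": one_sample_t(main_b) ** 2,
        "AxB": one_sample_t(interaction) ** 2,
    }
```

The interaction contrast is the quantity of interest for the hypotheses: it asks whether the masking-related drop differs between the two levels of occlusion (or segmentation).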

EEG processing. For processing of our EEG data we used Makoto’s pre-processing pipeline (Miyakoshi, 2017), which builds on the EEGLAB extension for MATLAB. Crucial steps in the pipeline are (1) down-sampling of the data to 256 Hz, (2) high- and low-pass filtering of the data at 1 and 40 Hz, (3) removal of sinusoidal line noise using the CleanLine plugin for EEGLAB (Mullen, 2012) and (4) rejection of outlier channels based on extreme amplitudes, lack of correlation with other channels (r < .8) and lack of predictability from other channels (using Artifact Subspace Reconstruction (ASR); explained in more detail by Bigdely-Shamlo et al., 2015). To avoid biasing our data through bad-channel rejection, rejected channels were (5) interpolated. After pre-processing, the data were (6) re-referenced and (7) epoched (100 ms before to 500 ms after forward-mask onset). Finally, we performed a (8) baseline correction based on the signal between -100 ms and 0 ms.
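Steps (7) and (8) can be illustrated for a single channel. This is a minimal sketch and not the EEGLAB implementation: the signal is a plain list of samples, the event onset is given as a sample index, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

SFREQ = 256                    # Hz, sampling rate after down-sampling (from the text)
TMIN_MS, TMAX_MS = -100, 500   # epoch window relative to forward-mask onset


def ms_to_samples(ms: float, sfreq: int = SFREQ) -> int:
    """Convert a latency in milliseconds to the nearest sample offset."""
    return round(ms * sfreq / 1000)


def epoch_and_baseline(signal: Sequence[float], onset_sample: int) -> List[float]:
    """Cut one epoch around the onset and subtract the mean pre-onset amplitude."""
    start = onset_sample + ms_to_samples(TMIN_MS)
    stop = onset_sample + ms_to_samples(TMAX_MS)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch window exceeds the recording")
    epoch = list(signal[start:stop])
    n_baseline = onset_sample - start        # samples between -100 ms and 0 ms
    baseline = mean(epoch[:n_baseline])
    return [value - baseline for value in epoch]
```

After this step, every epoch has a mean of zero in the pre-onset window, so later amplitude differences between conditions are not driven by slow offsets.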

EEG analysis. After pre-processing we conducted Wilcoxon signed-rank tests on the differences between ERPs of masked vs. unmasked trials within our experimental conditions (occluded/non-occluded and segmented/unsegmented). The results show to what extent the EEG signal was influenced by masking. We performed another set of Wilcoxon signed-rank tests on the difference between the difference waves (masked vs. unmasked) of the two levels of each condition (occluded vs. non-occluded and segmented vs. unsegmented). This uncovers any interaction between the levels and masking in the neural signal. Because of the large number of statistical tests conducted here, we used a Bonferroni correction to account for multiple comparisons (α = 9.8e-6). For our analysis we chose one electrode representative of low-level visual areas (Oz) and one representative of high-level visual areas (PO7) out of all available occipital and lateral peri-occipital electrodes (Iz, I1, I2, Oz, O1, O2, PO3, PO4, PO7, PO8). The chosen electrodes showed the highest amplitudes while still being highly correlated with their surroundings.
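The logic of the difference-wave analysis can be sketched as follows. The per-timepoint p-values are treated as given (for instance from Wilcoxon signed-rank tests across subjects); the function names are hypothetical, and the family-wise level of .05 used in the usage note is an assumption, since the text only reports the corrected threshold.

```python
from typing import List, Optional, Sequence


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


def difference_wave(unmasked: Sequence[float], masked: Sequence[float]) -> List[float]:
    """Effect of masking at every timepoint (unmasked minus masked ERP)."""
    return [u - m for u, m in zip(unmasked, masked)]


def interaction_wave(diff_a: Sequence[float], diff_b: Sequence[float]) -> List[float]:
    """Difference between two masking effects, e.g. unsegmented minus segmented."""
    return [a - b for a, b in zip(diff_a, diff_b)]


def first_significant_ms(p_values: Sequence[float], times_ms: Sequence[float],
                         alpha: float) -> Optional[float]:
    """Latency of the first timepoint whose p-value falls below the threshold."""
    for p, t in zip(p_values, times_ms):
        if p < alpha:
            return t
    return None
```

If the family-wise level was .05, the reported threshold of 9.8e-6 would correspond to roughly 5,100 individual tests (.05 / 9.8e-6). An interaction wave that never crosses the threshold indicates that masking affected both levels of a condition to a similar degree.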


Figure 2. Paradigm. (a) shows a single trial in the backwards-masked condition. (b) shows the same trial in the unmasked condition.

Results

Behavioural analysis

To test whether the task was actually manageable for the participants, we calculated the grand mean of accuracy (M = .71, SE = .31). We also assessed the effectiveness of our masks (masked: M = .61, SE = .44; unmasked: M = .79, SE = .44). Further, performance during the subordinate practice was recorded, and all participants exhibited sufficient ability to discriminate between subordinate categories.

Our first hypothesis assumed that objects that are occluded by another object in natural scenes need more recurrent processing to be recognized than their non-occluded counterparts. If that is the case, recognition of occluded objects will suffer relatively more under backward masking than recognition of non-occluded objects. The two-way repeated-measures ANOVA shows significant main effects of occlusion and masking (occlusion: F(1, 77) = 557.52, p < .05; masking: F(1, 77) = 1513.63, p < .05). However, we did not observe a significant interaction between occlusion and masking (F(1, 77) = 3.97, p > .05). This runs counter to our expectations, since we expected our occluded condition to be compromised to a higher degree by backward masking than our non-occluded condition.

In our second hypothesis we proposed that scene segmentation is dependent on recurrent processing in natural scenes. Therefore, we expected that recognition of objects that are already segmented from their background needs less recurrent processing than recognition of objects that are still embedded in their natural scene. Our two-way repeated-measures ANOVA returned significant main effects for both factors (segmentation: F(1, 77) = 333.63, p < .05; masking: F(1, 77) = 1511.19, p < .05), but no significant interaction (F(1, 77) = 0.013, p > .05). Again, this runs counter to our expectations, since our hypothesis assumes an interaction between the factors.

At first glance, our data do not seem to support our hypotheses. On closer inspection, however, our backward masking may have been too potent, resulting in a drastic decrease of mean accuracy in the masked condition for both hypotheses. Performance approaches chance level (masked/occluded: M = .57; masked/non-occluded: M = .65; masked/segmented: M = .64; masked/unsegmented: M = .58). This leads to floor-like effects, as performance becomes less likely to drop further when it approaches 50%. We tackled this issue by normalizing our data before interpretation (see figure 3). After normalizing, we performed two paired-sample t-tests on the relative changes in both conditions. Both tests showed that the normalized changes in performance were significantly different from each other (occluded vs. non-occluded: t = 7.57, p < .05; segmented vs. unsegmented: t = 6.44, p < .05). The mean decrease of performance within occluded trials (M = 69%) and the mean decrease within non-occluded trials (M = 55%) due to masking point in the expected direction. We observed the same pattern in the relative changes within our segmented and unsegmented conditions (segmented: M = 55%; unsegmented: M = 68%).
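The text does not spell out the normalization formula. One plausible reading, shown here purely as an assumption, expresses the masking-related drop as a share of the unmasked performance above chance (50% in a two-category decision), so that conditions with different unmasked baselines become comparable. The function name and the example accuracies are hypothetical.

```python
CHANCE = 0.5  # chance level for a two-category decision


def relative_drop(acc_unmasked: float, acc_masked: float, chance: float = CHANCE) -> float:
    """Drop in accuracy due to masking, relative to the above-chance headroom.

    0.0 means masking had no effect; 1.0 means masking pushed accuracy to chance.
    This normalization is an assumption, not the formula confirmed by the report.
    """
    headroom = acc_unmasked - chance
    if headroom <= 0:
        raise ValueError("unmasked accuracy must exceed chance")
    return (acc_unmasked - acc_masked) / headroom
```

For example, a hypothetical subject with 80% unmasked and 59% masked accuracy would lose 70% of the above-chance performance. The per-subject values of two conditions can then be compared with a paired-sample t-test.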

Figure 3. Relative drop of accuracy due to backwards-masking. The light grey bars show the relative drop of performance in the different experimental conditions.

EEG analysis

With our third hypothesis we aimed to support our behavioural findings with neural data. After pre-processing, we were left with EEG data of 57 participants. If occluded (or unsegmented) objects need more recurrent processing to be recognized than non-occluded (or segmented) objects, we expect them, when unmasked, to cause higher activity in early areas of the visual stream at timepoints at which the original feedforward sweep has already passed. Since masking prevents early visual areas from engaging in recurrent processing, we should find less late activity in the early visual areas when stimuli are masked. We expect that decrease to be greater in the conditions that presumably depend to a greater extent on recurrent processing (occluded / unsegmented objects).

Before the analysis, we checked whether backward masking compromised the feedforward sweep of processing. To do so, we determined the earliest onset of a difference between the ERPs of the masked and the unmasked condition. The first deflection of the signals was located at electrode Oz, with an onset at 176 ms. This late onset of the effect of backward masking is a good indicator that the initial feedforward sweep remained intact.

Our ERP analysis of occluded and non-occluded trials shows that neural activity at electrode Oz was affected by masking in both conditions (see figure 4a). The onset of the first significant difference between the ERP waves of masked vs. unmasked trials is at 219 ms after stimulus onset in both the non-occluded and the occluded condition. However, looking at the ERPs in figure 4a we also see that occluded and non-occluded trials seem to be affected to the same degree. By subtracting the difference waves from each other and comparing the result to 0, we found that the effect of masking does not differ significantly between the two conditions at any time (see also figure 4b). This indicates that masking affects the ERPs of occluded and non-occluded objects to the same degree.

The ERP analysis of unsegmented and segmented trials also shows an effect of masking on both conditions at electrode Oz (see figure 4c). The first significant differences between masked and unmasked trials occur at 176 ms for segmented and at 195 ms for unsegmented trials. In this case, however, segmented and unsegmented trials seem to be perturbed differently. Using the same procedure as in the previous paragraph, we found a significant difference between the effects of masking on unsegmented and segmented trials (see figure 4d). This difference in effects can be interpreted as an interaction between masking and segmentation.


The peri-occipital electrode PO7 shows only a late effect of masking in all experimental conditions (onset at 271 ms for non-occluded and at 305 ms for occluded trials; see figure 5a,c). Further, the interactions between masking and the experimental conditions are not significant (figure 5b,d), showing that the neural response at PO7 to occluded and non-occluded (and, respectively, unsegmented and segmented) objects is equally affected by backward masking.

Figure 4. ERPs and difference waves of electrode Oz in different conditions. (a) shows the ERPs of electrode Oz of occluded and non-occluded trials. In both conditions masking decreases neural activity significantly. This is also depicted by (b) which shows the significant difference of masked and unmasked trials within the levels of occlusion. However, it also shows that the difference between the effects of masking in both conditions is not significant. (c) shows the ERPs of unsegmented and segmented trials. Here also neural activity is significantly decreased in both conditions. Looking at (d) it becomes apparent that the effect of masking differs for segmented and unsegmented trials.


Figure 5. ERPs and difference waves of electrode PO7 in different conditions. In (a) we can see that the ERPs at electrode PO7 are also affected by masking in their late response. This becomes clearer in the difference waves in (b): the effects of masking on occluded trials and on non-occluded trials are both significant, but they do not differ significantly from each other. This indicates that there is no interaction between masking and occlusion in the signal of electrode PO7. We can observe the same pattern for segmentation in (c) and (d).

Discussion

The current study aimed to shed light on the role of recurrent processing in human object recognition in suboptimal, natural circumstances. We investigated to what extent recurrent processing is necessary for object recognition when objects are (1) occluded by other objects and when they have to be (2) segmented from a complex natural scene. Our behavioural results show a drop in accuracy for all experimental conditions when stimuli were backward-masked. However, the size of this drop differs between the levels of our independent variables (occlusion / segmentation): recognition of unsegmented and occluded objects was more compromised by backward masking than recognition of their segmented and non-occluded counterparts. According to Fahrenfort, Scholte and Lamme (2007), backward masking perturbs object recognition by suppressing recurrent processing from higher to lower visual areas. On the behavioural level, we can thus conclude that occluded objects and objects in their natural background (unsegmented) rely more on recurrent processing. However, because performance in our masking condition declined to almost chance level, we had a floor-like effect in which accuracy could not drop any further. This made it necessary to normalize our data. Even though normalization allowed us to uncover the aforementioned interaction despite the floor-like effect, it costs us explanatory power. This is why we lay more emphasis on our ERP analysis (3). We found that masking reduced activity in early visual cortex (electrode Oz) in all of our experimental conditions. However, this masking effect on Oz ERPs interacted only with the unsegmented / segmented condition and not with the occluded / non-occluded condition. Our neural data therefore support the notion that figure/ground segmentation does depend on feedback mechanisms (hypothesis 2), but that the completion of occluded objects in natural images does not (hypothesis 1).
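The interaction logic described above (a masking effect that differs between segmentation conditions) can be sketched as a difference of difference waves. The following is a minimal illustration only: the function names, the 50 ms sampling grid, the 200-300 ms window and all amplitude values are hypothetical and synthetic, and it does not reproduce the statistical testing or the data of the study.

```python
from statistics import mean


def difference_wave(unmasked, masked):
    """Pointwise (unmasked - masked) amplitude, one value per time sample."""
    if len(unmasked) != len(masked):
        raise ValueError("ERPs must have the same number of samples")
    return [u - m for u, m in zip(unmasked, masked)]


def interaction_wave(diff_a, diff_b):
    """Masking effect in condition A minus masking effect in condition B."""
    if len(diff_a) != len(diff_b):
        raise ValueError("Difference waves must have the same number of samples")
    return [a - b for a, b in zip(diff_a, diff_b)]


def window_mean(wave, times_ms, start_ms, end_ms):
    """Mean amplitude over samples with start_ms <= time <= end_ms."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not values:
        raise ValueError("No samples fall inside the requested window")
    return mean(values)


# Synthetic toy ERPs in microvolts (assumption: NOT data from the study).
times_ms = [0, 50, 100, 150, 200, 250, 300]
unseg_unmasked = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 2.0]
unseg_masked = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 1.0]
seg_unmasked = [0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 2.0]
seg_masked = [0.0, 1.0, 2.0, 3.0, 2.0, 2.0, 1.0]

# Masking effect per condition, then their difference (the interaction).
masking_unseg = difference_wave(unseg_unmasked, unseg_masked)
masking_seg = difference_wave(seg_unmasked, seg_masked)
interaction = interaction_wave(masking_unseg, masking_seg)

# Mean interaction amplitude in a hypothetical late window.
late_interaction = window_mean(interaction, times_ms, 200, 300)
print(late_interaction)
```

In this toy example a positive late-window value means that masking reduced the response more for unsegmented than for segmented objects. In real data, such a contrast would still have to be tested against zero across participants.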

One reason why we did not find the effect of masking on occluded object recognition reported in earlier studies might be the different kind of occlusion that was used. As mentioned in the introduction, Michotte, Thinès, & Crabbé (1964/1991) distinguish between object completion in which the occluded part is simply ‘deleted out’ of the object (replaced with background), which they call ‘modal completion’, and completion in which another object actually occludes the target and thereby explains the missing part of the object, termed ‘amodal completion’. While in prior studies the occluded part was predominantly ‘deleted out’, thus depending on modal completion (e.g. Johnson & Olshausen, 2005; Wyatte et al., 2012), in our study an occluder explained the missing part of the target, thus depending on amodal completion. Even though it has been hypothesized that modal and amodal completion are based on the same neural mechanisms (Kellman, Yin & Shipley, 1998), this notion is not undisputed (Lee, 2003; Sugita, 1999). The difference between modal and amodal completion might account for our divergent results. If that is the case, the results of previous studies cannot be transferred to naturally occluded stimuli, since in natural scenes occluded objects commonly have an occluder that explains their missing portion.

On the other hand, our investigation of the role of recursion in figure/ground segmentation in natural stimuli fits the prior research on simplistic stimuli. Not only do our behavioural results and our difference-wave analysis of the ERPs confirm the literature, but the timing of our EEG results also matches prior findings. Wokke et al. (2012) report that interrupting V1/V2 using TMS at 236-259 ms after stimulus onset interferes with recurrent processing and thereby with surface segregation. The perturbation of our EEG signal due to masking (the unmasked minus masked difference wave deviates significantly from zero) in electrode Oz falls into the same time period (218-261 ms after stimulus onset). Wokke et al. (2012) also show that late V1/V2 activity (due to recurrent processing) is necessary for surface segregation in their simplistic stimuli. This is in line with our finding that natural stimuli that need less surface segregation (segmented objects) suffer less from suppression of recursion through masking. Taken together, Wokke et al. and this study provide converging evidence, obtained with different methods, that surface segregation indeed depends on recurrent processing.

Based on our analysis, we can say that figure/ground segmentation in natural images does depend on recurrent processing from higher to lower visual areas. The current study therefore supports prior findings on more simplistic stimuli and extends their scope to natural images. However, the completion of occluded objects in natural scenes does not seem to rely on recurrent processing. Since our results regarding occlusion are not unequivocal, further research using less potent masks is needed to compensate for our compromised results and to confirm our interpretation.

References

BioSemi, B. V. (2011). BioSemi ActiveTwo [EEG system]. Amsterdam: BioSemi.

Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers in neuroinformatics, 9.

Fahrenfort, J. J., Scholte, H. S., & Lamme, V. A. (2007). Masking disrupts reentrant processing in human visual cortex. Journal of cognitive neuroscience, 19(9), 1488-1497.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex, 1(1), 1-47.

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248-255). IEEE.

Di Lollo, V., Enns, J. T., & Rensink, R. A. (2000). Competition for consciousness among visual events: the psychophysics of reentrant visual processes. Journal of Experimental Psychology: General, 129(4), 481.

Drewes, J., Goren, G., Zhu, W., & Elder, J. H. (2016). Recurrent processing in the formation of shape percepts. The Journal of Neuroscience, 36(1), 185-192.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in neurosciences, 15(1), 20-25.

Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision research, 41(10), 1409-1422.

MathWorks. (1998). MATLAB user's guide. Natick, MA: The MathWorks, Inc.

Habak, C., Wilkinson, F., & Wilson, H. R. (2006). Dynamics of shape interaction in human vision. Vision research, 46(26), 4305-4320.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

Hupé, J. M., James, A. C., Payne, B. R., & Lomber, S. G. (1998). Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature, 394(6695), 784.

Johnson, J. S., & Olshausen, B. A. (2005). The recognition of partially visible natural objects in the presence and absence of their occluders. Vision research, 45(25), 3262-3276.

Kanizsa, G. (1976). Subjective contours. Scientific American, 234(4), 48-52.

Kellman, P. J., Yin, C., & Shipley, T. F. (1998). A common mechanism for illusory and occluded object completion. Journal of Experimental Psychology: Human Perception and Performance, 24, 859–869.

Konen, C. S., & Kastner, S. (2008). Two hierarchically organized neural systems for object information in human visual cortex. Nature neuroscience, 11(2), 224.

Lamme, V. A. (1995). The neurophysiology of figure-ground segregation in primary visual cortex. Journal of Neuroscience, 15(2), 1605-1615.

Lamme, V. A., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in neurosciences, 23(11), 571-579.

Lamme, V. A., Rodriguez-Rodriguez, V., & Spekreijse, H. (1999). Separate processing dynamics for texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. Cerebral Cortex, 9(4), 406-413.

Lamme, V. A., Zipser, K., & Spekreijse, H. (2002). Masking interrupts figure-ground signals in V1. Journal of cognitive neuroscience, 14(7), 1044-1053.

Lee, T. S. (2003). Computations in the early visual cortex. Journal of Physiology, 97, 121–139.

Lee, T. S., & Nguyen, M. (2001). Dynamics of subjective contour formation in the early visual cortex. Proceedings of the National Academy of Sciences, USA, 98, 1907–1911.

Halgren, E., Mendola, J., Chong, C. D., & Dale, A. M. (2003). Cortical activation to illusory shapes as measured with magnetoencephalography. Neuroimage, 18(4), 1001-1009.

Harris, J. J., Schwarzkopf, D. S., Song, C., Bahrami, B., & Rees, G. (2011). Contextual illusions reveal the limit of unconscious visual processing. Psychological Science, 22, 399–405.

Judge, S. J., Wurtz, R. H., & Richmond, B. J. (1980). Vision during saccadic eye movements. I. Visual interactions in striate cortex. Journal of Neurophysiology, 43(4), 1133-1155.

Macknik, S. L., & Livingstone, M. S. (1998). Neuronal correlates of visibility and invisibility in the primate visual system. Nature neuroscience, 1(2).

Macknik, S. L., & Martinez-Conde, S. (2007). The role of feedback in visual masking and visual processing. Advances in cognitive psychology.

Martinez-Conde, S., Macknik, S. L., & Hubel, D. H. (2004). The role of fixational eye movements in visual perception. Nature reviews. Neuroscience, 5(3), 229.

Michotte, A., Thinès, G., & Crabbé, G. (1964). Amodal completion of perceptual structures. In: G. Thinès, A. Costall, & G. Butterworth (Eds.), Michotte’s experimental phenomenology of perception (pp. 140–167, published 1991). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

Mullen, T. (2012). Cleanline. Retrieved August 10, 2017, from https://www.nitrc.org/projects/cleanline/

Murray, M. M., Wylie, G. R., Higgins, B. A., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). The spatiotemporal dynamics of illusory contour processing: Combined high-density electrical mapping, source analysis, and functional magnetic resonance imaging. The Journal of Neuroscience, 22, 5055–5073.

Miyakoshi, M. (2017). Makoto's pre-processing pipeline. Retrieved August 05, 2017, from https://sccn.ucsd.edu/wiki/Makoto%27s_preprocessing_pipeline#Interpolate_all_the_removed_channels

Quiroga, R., et al. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435(7045), 1102–1107. doi:10.1038/nature03687

Ramsden, B. M., Hung, C. P., & Roe, A. W. (2001). Real and illusory contour processing in area V1 of the primate: a cortical balancing act. Cerebral Cortex, 11(7), 648-665.

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature neuroscience, 2(11).

Sporns, O., & Zwi, J. D. (2004). The small world of the cerebral cortex. Neuroinformatics, 2(2), 145-162.

Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature, 401, 269–272.

von der Heydt, R., Peterhans, E., & Baumgartner, G. (1984). Illusory contours and cortical neuron responses. Science, 224, 1260–1262.

Wokke, M. E., Sligte, I. G., Scholte, H. S., & Lamme, V. A. (2012). Two critical periods in early visual cortex during figure-ground segregation. Brain and behavior, 2(6), 763-777.

Wyatte, D., Curran, T., & O'Reilly, R. (2012). The limits of feedforward vision: Recurrent processing promotes robust object recognition when objects are degraded. Journal of Cognitive Neuroscience, 24(11), 2248-2261.

Zipser, K., Lamme, V. A., & Schiller, P. H. (1996). Contextual modulation in primary visual cortex. Journal of Neuroscience, 16(22), 7376-7389.
