
Graduate School of Psychology

RESEARCH MASTER’S PSYCHOLOGY THESIS

JESSICA LOKE

SUPERVISED BY H. STEVEN SCHOLTE

Dog or Poodle: Recurrent processing in object categorization

Abstract

Humans recognize visual scenes and objects with remarkable speed; visual stimuli flashed for only 30ms can already be recognized. Previous studies have attributed this categorization speed to the rapid feed-forward sweep, suggesting that feed-forward processing is sufficient for object recognition. What, then, are the roles of recurrent processing within object recognition, and what determines the extent of its occurrence? We hypothesize that the occurrence of recurrent processing is related to task difficulty: the more difficult the task, the more recurrent processing is needed. In our experiment, we examine the occurrence of recurrent processing within object categorization, specifically between ordinate- and subordinate-level categorization. Previous studies have revealed a robust ordinate-level advantage, indicating that categorization at the ordinate level is easier than categorization at the subordinate level. We found that participants performed more poorly on subordinate-level than on ordinate-level categorization. Furthermore, this deterioration in performance was greater in masked conditions, indicating that interference with recurrent processing was more detrimental for the more difficult task. However, our ERP analysis only revealed differences between ordinate and subordinate categorization, and only within a later time window (>330ms). In summary, our experiment provides behavioral evidence that recurrent processing within object recognition is related to task difficulty; our ERP analysis provides evidence that ordinate and subordinate categorization differ after the feed-forward sweep.


Introduction

We recognize objects with remarkable speed (Thorpe, Fize, & Marlot, 1996). Using a classifier-based readout technique, Hung et al. (2005) could identify object identity and category from neuronal population activity in inferior temporal cortex. This classification could be done within 125ms of stimulus onset with 70% accuracy. It is therefore believed that a rapid feed-forward sweep is responsible for this quick recognition (Serre et al., 2007), which also indicates that a feed-forward sweep is sufficient for some form of object recognition. However, studies have also shown that recurrent processing occurs during object recognition. Recurrent processing refers to the loops formed by feedback connections between higher and lower visual areas. The question therefore remains: what are the roles of recurrent processing, what determines the extent of its occurrence, and how far can object recognition extend in its absence?

Previous research has shown that recurrent processing is especially important for two purposes: scene segmentation (Fahrenfort, Scholte, & Lamme, 2007; Scholte et al., 2008; Wokke et al., 2012) and encoding finer object details to aid recognition (Wyatte, Curran, & O'Reilly, 2012; O'Reilly et al., 2013). For scene segmentation, Scholte et al. (2008) showed that this process happens in two steps: (1) texture boundary detection, and (2) surface segregation. Using electroencephalography (EEG), they were able to temporally distinguish the two processes, with texture boundary detection taking place before surface segregation. Both mechanisms distinguish objects from their background, facilitating object recognition. Additionally, using functional magnetic resonance imaging (fMRI), they showed a cascading activation change for both processes up the visual hierarchy, which suggests the presence of feedback signals. The necessity of this segmentation for object recognition is confirmed by studies showing that more time is needed for object detection in visual scenes with fewer clear edges (Groen et al., 2013), and for object categorization when the object is located within a scene compared to when it is presented in isolation (Vanmarcke, Calders, & Wagemans, 2016).

Similarly, studies have shown that object recognition often necessitates further encoding of finer object details (Hochstein & Ahissar, 2002; Hupé et al., 1998). Based on previous studies, this encoding happens in two ways. First, visual details are 'filled in' by higher visual areas during recurrent processing. This is demonstrated by several experiments using degraded or partially occluded visual stimuli (Wyatte, Curran, & O'Reilly, 2012; O'Reilly et al., 2013), which showed that interference with recurrent processing (using a mask) severely decreased object recognition performance for degraded or occluded items. These studies postulate that recurrent processing enables a top-down modulation that allows completion and recognition of incomplete objects. Second, visual details are further encoded beyond the initial feed-forward processes through reactivation of early visual areas (Scholte et al., 2008). Using transcranial magnetic stimulation, Koivisto et al. (2011) demonstrated that activity in early visual cortex continues to modulate categorization speed beyond the initial feed-forward sweep. This reactivation of early visual cortex suggests that further encoding of low-level visual features takes place during object recognition. Altogether, recurrent processing seems to be associated with task difficulty. As the visual scene becomes more cluttered, with fewer well-defined edges, identifying an object becomes more difficult because it is harder to distinguish the object from its background. In addition, if more detailed visual features are needed for object recognition, the task similarly becomes more difficult and requires further processing. Therefore, by varying task difficulty, we could observe


different degrees of recurrent processing. In this experiment, we varied task difficulty by comparing object categorization at the ordinate and subordinate levels.

Object recognition/categorization can be differentiated into three levels: superordinate, ordinate and subordinate. An example of a superordinate category is "animal", an ordinate example is "dog", and a subordinate example is "golden retriever". Rosch et al. (1976) were the first to provide robust evidence for an ordinate-level (commonly known as basic-level) advantage. Objects at the ordinate level are verified and named faster than objects at the superordinate or subordinate levels (Tanaka & Taylor, 1991), and naming objects at the ordinate level is preferred over the other levels. This ordinate-level advantage is attributed to the theory that human visual processing is biased towards ordinate-level features (Biederman, 1987). In the context of our experiment, an ordinate-level feature bias suggests that object categorization at the subordinate level would be more difficult than at the ordinate level, and would therefore call for additional processing.

Furthermore, categorizing subordinate objects requires finer details than categorizing ordinate objects (Collin & McMullen, 2005). This is consistent with a study by Yu et al. (2016) showing that, compared to objects at the ordinate level, objects at the subordinate level share more features and are perceptually more similar to other objects within the same group. This higher level of similarity requires finer processing to distinguish between different objects at the subordinate level. We can therefore reason that additional processing is necessary to fine-tune the activation of early visual cortex. This indicates that categorizing subordinate objects is more difficult than categorizing ordinate objects, linking task difficulty to the occurrence of recurrent processing.

In this study, we examined the occurrence of recurrent processing in relation to task difficulty using a masking paradigm with both forward and backward masks. Forward masks appeared immediately before the visual stimulus, whereas backward masks appeared immediately after stimulus presentation. Masking has been demonstrated to interfere with recurrent processing and reduce V1 responses (Lamme et al., 2002). All of our experimental trials were presented with forward masks, and half of the trials were also presented with backward masks. Stimuli presented with only a forward mask allow for a greater occurrence of recurrent processing than stimuli presented with both forward and backward masks. Our masks were created by scrambling the pixels of the original stimuli (a sketch of this procedure is given below). A backward mask presented immediately after the visual stimulus interferes with recurrent processing by limiting feedback of category-relevant features from higher to early visual areas during categorization (Fahrenfort, Scholte, & Lamme, 2007). This also limits the additional processing that would otherwise have been possible for object recognition.
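The thesis does not specify the scrambling beyond "scrambling the pixels"; the following is a minimal sketch assuming a random permutation of all pixel positions, using NumPy and Pillow (the file names are hypothetical).

import numpy as np
from PIL import Image

def scramble_pixels(image_path, seed=0):
    """Create a mask by randomly permuting the pixel positions of a stimulus image."""
    img = np.array(Image.open(image_path).convert("RGB"))
    h, w, c = img.shape
    flat = img.reshape(-1, c)                    # one row per pixel
    rng = np.random.default_rng(seed)
    perm = rng.permutation(flat.shape[0])        # random new order of pixels
    return Image.fromarray(flat[perm].reshape(h, w, c))

# Hypothetical usage: build a mask from one of the COCO stimuli.
scramble_pixels("stimulus_example.jpg").save("stimulus_example_mask.jpg")

The exact scrambling used in the experiment (for example block-wise rather than pixel-wise) is not reported; this sketch only illustrates the general idea of a mask that preserves low-level image statistics while destroying object structure.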

Convolutional Neural Networks (CNNs). Convolutional neural networks are a type of neural network that has proven useful for image classification. CNNs' hierarchical layers have been shown to share similar properties with the human visual system (DiCarlo et al., 2012); furthermore, their performance has matched (and even surpassed) human performance. Cognitive scientists have therefore used CNNs to understand the mechanisms underlying visual processing.

On the ImageNet 2015 classification task, the winning model, ResNet-152, had an error rate of 3.57%. This is a remarkable improvement over 2012, when the winning model, AlexNet, had an error rate of 16.4%. Notably, the improvement of the models over the years has come with increasing


depth: AlexNet consists of 8 layers, while ResNet-152, as its name suggests, has 152 layers.

In human visual processing the signal does not travel through 152 layers from the retina to higher visual areas; however, these additional layers within CNNs could be mimicking both the feed-forward and feedback processing of the human visual system. In a recent study comparing a 7-layer CNN with a 16-layer CNN, Ramakrishnan et al. (in press) found that the 16-layer model explained more variance in brain activity; crucially, this additional explained variance emerged after 120ms from stimulus onset, i.e. after the feed-forward sweep. We therefore suggest that the difference in model performance arises because deeper layers perform processes similar to recurrent processing in the human brain. In this experiment, we compare CNN performance with human performance on the categorization tasks. We expect models with more layers to outperform models with fewer layers at both categorization levels; furthermore, we expect a larger decrease in performance for shallower models than for deeper models on the more difficult (i.e. subordinate-level) categorization task. This expectation mirrors our prediction for human performance: that recurrent processing occurs to a larger degree for more difficult tasks.

Material and methods

Participants. Seventy-eight participants (74% female) were recruited from the University of Amsterdam. Participants were compensated either financially or with research credits required of first-year students. All participants provided written informed consent; all were right-handed and had normal or corrected-to-normal vision.

Visual Stimuli. The experiment was presented on a Dell P2412H monitor measuring 51.7cm x 32.3cm. The software used was Psychophysics Toolbox Version 3 within MATLAB.

In total, 256 unique images were selected from Microsoft COCO: Common Objects in Context, an online image database. These images were chosen for their naturalistic scenes, which are representative of everyday reality (see Figure 1 for examples). An equal number of images was selected from each of eight categories: (1) cats, (2) dogs, (3) bikes, (4) motorbikes, (5) cars, (6) busses, (7) bags, and (8) suitcases. Each image was shown four times: twice with only a forward mask and twice with both forward and backward masks.


Figure 1. Examples of images selected as experimental stimuli. (Left) The target is the double-decker bus occluded by passengers. (Right) The target is the suitcase next to the shrubs. Instead of choosing images in which the targets are salient, we chose these images because they are representative of everyday scenes.

Experimental procedures. All participants were seated 70cm from the screen. Before starting the experimental trials, they completed practice trials to familiarize themselves with the task and with stimuli from the different categories. Afterwards, participants completed the experimental categorization task in twelve different pairs: (1) cats vs. dogs, (2) bikes vs. motorbikes, (3) cars vs. busses, (4) bags vs. suitcases, (5) Calico vs. Tabby cats, (6) Labrador vs. German Shepherd dogs, (7) race bikes vs. lady bikes, (8) Touring vs. Chopper motorbikes, (9) sedans vs. station wagons, (10) single- vs. double-decker busses, (11) backpacks vs. handbags, and (12) traditional vs. trolley suitcases. In total, the practice trials and experimental task took approximately 100 minutes.

Within the experimental task, participants performed 1024 trials in total. Trials were grouped into 64 blocks (16 trials per block). Thirty-two blocks consisted of ordinate categorization trials and the other 32 blocks of subordinate categorization trials. All blocks included an equal number of forward-only masked and forward+backward masked trials. See Figure 2 for a depiction of the experimental paradigm.


Figure 2. Experimental design. Participants observed the following sequence: (1) fixation cross, 500-1000ms; (2) forward mask, 50ms; (3) stimulus, 17ms; (4) no mask (0ms) or backward mask (400ms); (5) blank screen, 500ms; (6) categorization prompt, 1500ms. EEG was recorded while participants completed the experiment.

EEG measurement. Recordings were made with a BioSemi 64-channel ActiveTwo EEG system using the standard 10-10 electrode placement. As we were primarily interested in visual processing, electrodes F5 and F6 were moved to occipital positions I1 and I2. Six external electrodes were used, with the electrodes on the mastoids serving as reference. The sampling rate was 512 Hz.

CNN training. We examined the extent to which human performance is captured by CNNs in categorization tasks of different difficulty. We selected three CNN models: ResNet-18, ResNet-34 and ResNet-152. These models were obtained from the PyTorch model zoo and are pre-trained to differentiate between 1000 categories. They differ in the number of processing layers and in object recognition performance (see Table 1).

Table 1.

Object recognition performance of ResNet-18, ResNet-34, and ResNet-152 on ImageNet. Top-1 error is the error rate when only the top prediction is used for validation; top-5 error is the error rate when the top five predictions are used.

Model        Number of layers   Top-1 error rate   Top-5 error rate

ResNet-18    18                 30.24%             10.92%

ResNet-34    34                 26.70%             8.58%

ResNet-152   152                21.69%             5.94%

In this experiment, we used the CNNs only as fixed feature extractors: we froze the parameters of all layers except the final fully connected layer, so that the weights of the frozen layers did not change during our estimation process. The final layer was reset to generate only two outputs (e.g. cat or dog, German Shepherd or Labrador) to make the CNNs' performance comparable


to human performance. The categorization task for the CNNs consisted of four ordinate pairs and six subordinate pairs of categories, all defined identically to those in our experimental task.
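A minimal sketch of this fixed-feature-extractor setup with torchvision is given below; the pre-trained ResNets are the ones named in the text, but the two-class head, optimizer and learning rate are illustrative assumptions rather than the thesis's exact training settings (newer torchvision versions use the weights= argument instead of pretrained=True).

import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained network; resnet18 and resnet34 were used in the same way.
model = models.resnet152(pretrained=True)

# Freeze all pre-trained weights so only the new output layer is learned.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way classification head with a 2-way head (e.g. cat vs. dog).
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, labels):
    """One training step; only the final layer's weights are updated."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()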

Data Analysis

Behavioral Data. All behavioral analyses were performed in R (version 3.2.2). Our behavioral data were used to examine the relationship between recurrent processing and task difficulty, with accuracy as the measure of performance. Scores were obtained and grouped by trial type: (1) unmasked ordinate categorization, (2) unmasked subordinate categorization, (3) masked ordinate categorization, and (4) masked subordinate categorization.

We performed a two-factor repeated-measures ANOVA to examine the differences in performance between ordinate and subordinate categorization and between masked and unmasked trials. Within the ANOVA, we examined the main effects of categorization level and masking, and their interaction.

EEG data. Preprocessing of the raw EEG data was completed in EEGLAB within MATLAB using the Makoto preprocessing pipeline. The steps were as follows: (1) down-sampling of the data to 256 Hz; (2) high-pass filtering at 1 Hz and low-pass filtering at 40 Hz; (3) removal of line noise at 50 Hz; (4) Artifact Subspace Reconstruction (ASR) to clean continuous data and reject bad channels (the coherence of a given electrode with its surrounding electrodes had to be at least 0.8); (5) interpolation of removed channels; (6) re-referencing of the data to the average; (7) segmentation into epochs from -200 to 500ms relative to stimulus onset; and (8) baseline correction based on the signal between -200 and 0ms relative to stimulus onset. An equivalent pipeline is sketched below.
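For readers working in Python rather than EEGLAB, the same steps can be approximated with MNE-Python. This is only a sketch under the assumption of a BioSemi .bdf recording with a standard status channel (the file name is hypothetical); the ASR step is indicated as a comment because it is not part of core MNE.

import mne

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)   # hypothetical file name
raw.resample(256)                                          # (1) down-sample to 256 Hz
raw.filter(l_freq=1.0, h_freq=40.0)                        # (2) 1-40 Hz band-pass
raw.notch_filter(freqs=50.0)                               # (3) remove 50 Hz line noise
# (4) ASR-based cleaning / bad-channel rejection would go here (external package)
raw.interpolate_bads()                                     # (5) interpolate removed channels
raw.set_eeg_reference("average")                           # (6) average reference

events = mne.find_events(raw)                              # stimulus-onset triggers
epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.5,      # (7) epochs -200 to 500 ms
                    baseline=(-0.2, 0.0), preload=True)    # (8) baseline correction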

Results

Behavioral analysis. All seventy-eight participants were included in the behavioral analyses. Participants were evaluated on their accuracy in the categorization task. Scores were grouped into four conditions: (1) unmasked ordinate categorization, (2) unmasked subordinate categorization, (3) masked ordinate categorization, and (4) masked subordinate categorization. Table 2 summarizes behavioral performance in the four conditions.

Table 2.

Experimental performance for N=78 across conditions.

Condition                 Accuracy

Ordinate, unmasked        M = 84.76%, SD = 4.72%

Ordinate, masked          M = 63.86%, SD = 4.54%

Subordinate, unmasked     M = 71.48%, SD = 5.36%

Subordinate, masked       M = 57.77%, SD = 4.18%

The accuracy for all participants across all four conditions was M = 69.76%, SD = 3.45%. The two-factor repeated-measures ANOVA revealed a significant main effect of categorization level, F(1,77) = 437, p < 0.001. As predicted, participants' accuracy on ordinate-level categorization was significantly higher (M = 74.31%, SD = 4.09%) than on subordinate-level categorization (M = 64.64%, SD = 4.15%). There was also a significant main effect of masking, F(1,77) = 1594, p < 0.001. Also as predicted, participants' accuracy on unmasked trials was significantly higher (M = 78.14%, SD = 4.48%) than on masked trials (M = 60.82%, SD = 3.58%) (see Figure 3). Importantly, there was a significant interaction between categorization level and masking, F(1,77) = 148, p < 0.001, indicating that masking influenced ordinate and subordinate categorization to different degrees. At first glance, the deterioration in raw accuracy appears larger for ordinate than for subordinate categorization. However, because chance performance is 50%, it is important to normalize performance relative to this floor before evaluating the interaction (a worked example follows below). After normalization, the decrease in performance due to masking is 60% in the ordinate condition and 64% in the subordinate condition. We can therefore interpret the significant interaction as showing that masking was more detrimental for subordinate than for ordinate categorization.
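As an illustration of this normalization, the proportional drop in above-chance accuracy can be computed as follows. The thesis does not state the exact formula, but this simple above-chance normalization reproduces the reported 60% and 64% values from the condition means in Table 2.

# Proportional decrease in above-chance accuracy due to masking (chance level = 50%).
def masking_drop(unmasked, masked, chance=50.0):
    """Fraction of the above-chance accuracy that is lost under masking."""
    return 1 - (masked - chance) / (unmasked - chance)

print(f"ordinate:    {masking_drop(84.76, 63.86):.0%}")   # ordinate:    60%
print(f"subordinate: {masking_drop(71.48, 57.77):.0%}")   # subordinate: 64%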

Figure 3. Participants' accuracy grouped by trial type. Participants performed significantly better on unmasked than on masked trials, and significantly better on ordinate than on subordinate categorization. Moving from unmasked to masked trials, the decrease in performance is greater for subordinate than for ordinate categorization, indicating that recurrent processing is more valuable for the more difficult categorization.

Convolutional Neural Network Performance. We also wanted to examine whether CNNs exhibit similar visual processing properties. To gauge CNN performance, we used three pre-trained


ResNet models: ResNet-18, ResNet-34 and ResNet-152. These models performed a categorization task similar to the one given to the experimental participants.

All three CNNs performed slightly better at ordinate-level categorization (an average of 95% accuracy) than at subordinate-level categorization (an average of 92%). Table 3 summarizes the CNNs' performance for each categorical pair.

We predicted that CNNs with more layers (e.g. ResNet-152) would outperform CNNs with fewer layers (e.g. ResNet-18) at both levels of categorization. This prediction was met in only one of the ordinate-level pairs (bags vs. suitcases); for all other ordinate-level pairs, deep and shallow models performed similarly (see Figure 4). In addition, we predicted that CNNs with fewer layers would suffer a larger decrease in performance at the subordinate level than CNNs with more layers. This predicted gradient in performance was found only for certain pairs: Calico versus Tabby (subordinate cats), German Shepherd versus Labrador (subordinate dogs), and single versus double decker (subordinate busses) (see Figure 5).

Additionally, we compared the performance of the three CNN models with that of our participants. Unfortunately, no meaningful comparison could be made because of the large difference in performance (see Figure 6).

Figure 4. Ordinate-level categorization performance for the three ResNet models. All models perform similarly except on the bags versus suitcases comparison, where we see the predicted gradient: the deeper model (ResNet-152) outperforms the shallower model (ResNet-18).


Figure 5. Subordinate-level categorization performance for the three ResNet models. Compared to ordinate-level categorization, there is more variance in performance between models; however, there is no discernible pattern across categories.


Figure 6. Comparison of performance between the CNNs and our experimental participants. (Top) Ordinate-level categorization: both the CNNs and our participants found the "Bag vs. Suitcase" categorization the most difficult of all ordinate-level categorizations. (Bottom) Subordinate-level categorization: there is no discernible consistent pattern between the CNNs' and our participants' performance. We observed a large decrease in accuracy when moving from


ordinate-level to subordinate-level categorization for the experimental participants, whereas this decrease is minor or even absent in the CNNs' performance.

Additional exploratory analysis. After observing the CNNs' performance, we reasoned that our predictions might not have been met because of a ceiling effect. We therefore added noise to the images to degrade the CNNs' performance; this noise could also mimic the effect of masking. Salt-and-pepper noise was distributed evenly over the images (a sketch is given below). CNN performance degraded rapidly with the addition of noise (see Table 3). However, this degradation was similar for all models, although ResNet-152 showed slightly worse performance than ResNet-18 as more noise was added to the images.
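A minimal sketch of adding salt-and-pepper noise with NumPy, assuming the stated noise level is the fraction of pixels replaced by black or white; the thesis does not specify the exact implementation.

import numpy as np

def salt_and_pepper(img, noise_level, seed=0):
    """img: uint8 array of shape (H, W, 3); noise_level: fraction of pixels corrupted."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    h, w, _ = img.shape
    n = int(noise_level * h * w)
    rows = rng.integers(0, h, size=n)
    cols = rng.integers(0, w, size=n)
    values = rng.choice([0, 255], size=n)    # pepper (black) or salt (white)
    noisy[rows, cols] = values[:, None]      # broadcast over the RGB channels
    return noisy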

Table 3.

Object categorization performance of ResNet-18, ResNet-34, and ResNet-152 on the ordinate pairs, the subordinate pairs, and the cat vs. dog pair with added salt-and-pepper noise.

Categorical pair                     ResNet-18   ResNet-34   ResNet-152

Cat vs. Dog                          99%         99%         99%
Bus vs. Car                          99%         99%         99%
Bicycle vs. Motorcycle               100%        99%         100%
Bag vs. Suitcase                     80%         83%         87%
Average, ordinate pairs              95%         95%         96%

Calico vs. Tabby                     88%         87%         92%
German Shepherd vs. Labrador         86%         98%         98%
Single vs. Double-decker Bus         92%         97%         97%
Sedan vs. Station Wagon              83%         83%         84%
Handbag vs. Backpack                 95%         92%         94%
Traditional vs. Trolley Suitcase     93%         98%         93%
Average, subordinate pairs           90%         93%         93%

Cat vs. Dog, 10% noise               85%         86%         86%
Cat vs. Dog, 25% noise               76%         75%         74%
Cat vs. Dog, 50% noise               56%         51%         55%


Event-related potential analysis. After preprocessing, EEG measurements from 69 participants remained; measurements from 8 participants were excluded from the analysis because of an insufficient number of trials, a problem that arose from technical issues during data collection. For these 69 participants, ERPs were constructed by averaging the epoched data across trials per condition, using EEGLAB in MATLAB (see Figure 7 and the sketch below). Peak activity was found at approximately 100, 200 and 350ms.
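Continuing the MNE-Python sketch from the preprocessing section, condition-wise ERPs could be formed by averaging the epochs per condition. The condition labels below are hypothetical and assume the events were tagged with these names when the Epochs object was created.

# Sketch: average epochs per condition to obtain condition ERPs (labels hypothetical).
conditions = ["ordinate/unmasked", "ordinate/masked",
              "subordinate/unmasked", "subordinate/masked"]
erps = {cond: epochs[cond].average() for cond in conditions}
erps["ordinate/masked"].plot()   # quick visual check of one condition's ERP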

Figure 7. Topographic map of peak activity at 100, 210 and 350ms. Electrodes over early visual areas were most active and were therefore selected for analysis.

We performed Wilcoxon signed-rank tests to examine differences in the ERPs for two factors: subordinate versus ordinate categorization within the masked and unmasked conditions. Additionally, we examined whether there was an interaction between the two factors. The comparison was done for each electrode and time sample, yielding 68 electrodes x 128 time samples = 8,704 comparisons. We therefore considered two corrected p-values: (i) 0.01/8,704 and (ii) 0.01/128. With the first, more conservative p-value, we found no significant differences for either factor and no significant interaction. With the second p-value, correcting only over time samples, we found significant differences at electrodes P5 and Fz within the masked condition. Table 4 summarizes the time windows in which significant differences were found.
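The mass-univariate comparison could be sketched as follows with SciPy, assuming per-participant condition ERPs stored as arrays of shape participants x electrodes x time samples; the variable names are hypothetical and the thesis's analysis was run in MATLAB/EEGLAB rather than Python.

import numpy as np
from scipy.stats import wilcoxon

# cond_a, cond_b: e.g. subordinate-masked and ordinate-masked ERPs,
# each an array of shape (n_participants, 68, 128).
def mass_wilcoxon(cond_a, cond_b, alpha=0.01):
    """Wilcoxon signed-rank test at every electrode and time sample."""
    n_elec, n_time = cond_a.shape[1], cond_a.shape[2]
    pvals = np.ones((n_elec, n_time))
    for e in range(n_elec):
        for t in range(n_time):
            pvals[e, t] = wilcoxon(cond_a[:, e, t], cond_b[:, e, t]).pvalue
    sig_full = pvals < alpha / (n_elec * n_time)   # correction over all 8,704 tests
    sig_time = pvals < alpha / n_time              # correction over time samples only
    return pvals, sig_full, sig_time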


Figure 8. Electrode P5: effect of categorization level within the masked and unmasked conditions. A noticeable difference between subordinate and ordinate categorization starts at approximately 300ms. Both masked and unmasked conditions show similar ERP components. Within the masked condition, there are significant differences between subordinate and ordinate categorization at 331-335ms; the asterisks indicate where the difference is significant.

Figure 9. Electrode Fz: effect of categorization level within the masked and unmasked conditions. As at electrode P5, there are no noticeable differences in the general ERP components between the masked and unmasked conditions. A difference between ordinate and subordinate categorization emerges at approximately 280ms; this difference was significant within the masked condition at 351ms, as indicated by the asterisk.


Table 4.

Time windows of significant differences in ERPs between subordinate and ordinate categorization.

Electrode   Subordinate masked vs. ordinate masked

P5          331-335ms

Fz          351-359ms; 382ms

Discussion & Conclusion

In this study, we hypothesized that the occurrence of recurrent processing is related to task difficulty. Our behavioral results showed that masking the stimuli and performing object categorization at the subordinate level both decreased participants' accuracy, a good indication that our manipulations were successful and that the task was indeed more difficult. Furthermore, our behavioral results supported our hypothesis that recurrent processing is related to task difficulty, as masking caused a larger deterioration in categorization performance at the subordinate level than at the ordinate level.

However, our ERP analysis did not confirm our hypothesis. Within the ERPs, we observed differences between ordinate and subordinate categorization in the masked condition at electrodes P5 and Fz. This difference is statistically significant only within the masked condition, and only under the more liberal p-value correction. Importantly, this difference emerged at approximately 330-360ms. According to Lamme and Roelfsema (2000), with the presentation of an optimal (i.e. high-contrast) stimulus, the feed-forward sweep can reach high visual areas within 100ms. The deflection at 330ms therefore provides evidence that object recognition itself happens later in the time course, plausibly facilitated by recurrent processing. A previous study by Johnson and Olshausen (2003) similarly showed that differences in ERPs within 300-500ms were related to object categorization and recognition. This differs from Tanaka et al. (1999), who found differences between subordinate- and ordinate-level categorization at approximately 140ms from stimulus onset over left posterior sites, a difference the researchers attributed to the increased visual demands of subordinate categorization. In our data, however, differences between categorization levels only emerged later in time (see Figure 10).


Figure 10. Electrode PO3: effect of categorization level within the masked and unmasked conditions. As at electrodes P5 and Fz, differences between categorization levels only emerged around 300ms; here, this difference is not statistically significant.

In our analysis, we considered two different p-values, using two different multiple-comparison corrections. Such correction is important because EEG data are frequently laden with noise, and apparent significant differences could simply reflect noise (Luck & Gaspelin, in press). With our mass-univariate approach, however, the traditional Bonferroni correction is arguably too conservative. This is evident in our analysis: the p-value of 0.01/(68*128) yielded null results, whereas the p-value of 0.01/128 yielded significance at two electrodes. A more reasonable approach would be to pre-determine latency windows and perform the corrections according to the number of samples within those windows. Based on our current knowledge, we would examine the samples within 300-500ms for ordinate versus subordinate categorization.

In the analysis above, we estimated the occurrence of recurrent processing by estimating the differences between categorization levels in the masked and unmasked conditions. This is only sensible if masking affected the subordinate and ordinate conditions similarly, i.e. decreased recurrent processing to similar degrees. However, if the ability of masking to interfere with recurrent processing is mediated by the strength of recurrent processing itself, our interpretation would be flawed. To properly disentangle the effects of masking (interfering with recurrent processing and causing a visual artifact), it would be necessary to obtain ERPs for the mask presentation alone; this would help us understand how the mask interacts with visual processing at baseline.

We also attempted to compare the CNNs' and participants' performance. However, the networks' performance far exceeded our participants' performance, so no reasonable comparison could be made. To degrade the CNNs' performance we added noise to the images, but we found the degradation to be similar for all models regardless of depth. CNNs are known to be highly susceptible to noise: although an image with 10% noise can still easily be recognized by humans, the networks' accuracies dropped to around 85%. Similar results have been reported in previous studies (Geirhos et al., in press). Surprisingly, we found that the deeper model (ResNet-152) was slightly more susceptible to noise than the shallower model (ResNet-18). In another study done in our lab,


CNNs were found to perform better when objects are embedded in scenes than when objects are presented alone; in fact, the objects' background was more important for deeper models than for shallower ones. We can therefore reason that "more noise" was captured by the deeper model, leading to poorer performance. This divergence in performance on noisy images could be studied further to reveal contrasts in visual processing between CNNs and humans.

For our experimental participants, subordinate-level categorization performance was poorer than ordinate-level categorization performance, with the exception of the German Shepherd versus Labrador categorization, on which participants performed as well as ResNet-18 in unmasked trials. Some participants remarked that they found this categorization easier than the other subordinate categorizations because German Shepherds always have their ears perked up whereas Labradors always have floppy ears. In selecting stimuli, we were careful to compare perceptually similar objects in terms of color and general shape/size, but our participants were quick to notice features we had overlooked. This simple distinction was already enough to boost subordinate categorization performance to the level of ordinate categorization, which demonstrates how important stimulus selection is and what precautions researchers should take when generalizing findings from an experimental task.

To summarize, our experiment investigated the occurrence of recurrent processing in relation to task difficulty. Our behavioral results supported our hypothesis that interference with recurrent processing is more detrimental to more difficult tasks, but our ERP analysis did not support this conclusion. Our experiment confirmed that backward masking is effective in interfering with feedback processes, and that this interference is detrimental to categorization performance. In contrast to many existing studies on object recognition, the stimuli used in our experiment depict natural scenes and are representative of reality, making our conclusions stronger and more generalizable to object recognition as we experience it in everyday life.


References

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115-147.

Collin, C. A., & McMullen, P. A. (2005). Subordinate-level categorization relies on high spatial frequencies to a greater degree than basic-level categorization. Perception & Psychophysics, 67(2), 354-364.

Curran, T., Tanaka, J. W., & Weiskopf, D. M. (2002). An electrophysiological comparison of visual categorization and recognition memory. Cognitive, Affective & Behavioral Neuroscience, 2(1), 1-18.

Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS Biology, 5(10), 260.

DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415-434.

Fahrenfort, J. J., Scholte, H. S., & Lamme, V. A. F. (2007). Masking disrupts reentrant processing in human visual cortex. Journal of Cognitive Neuroscience, 19(9), 1488-1497.

Geirhos, R., Janssen, D. H. J., Schutt, H. H., Rauber, J., Bethge, M., & Wichmann, F. A. (in press). Comparing deep neural networks against humans: Object recognition when the signal gets weaker.

Groen, I. I. A., Ghebreab, S., Lamme, V. A. F., & Scholte, H. S. (2012). Low-level contrast statistics are diagnostic of invariance of natural textures. Frontiers in Computational Neuroscience, 6:34. doi:10.3389/fncom.2012.00034

Groen, I. I. A., Ghebreab, S., Prins, H., Lamme, V. A. F., & Scholte, H. S. (2013). From image statistics to scene gist: Evoked neural activity reveals transition from low-level natural image structure to scene category. Journal of Neuroscience, 33(48), 18814-18824.

Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36(5), 791-804.

Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749), 863-866.

Hupé, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P., & Bullier, J. (1998). Cortical feedback improves discrimination between figure and background by V1, V2, and V3 neurons. Nature, 394(6695), 784-787.

Koivisto, M., Railo, H., Revonsuo, A., Vanni, S., & Salminen-Vaparanta, N. (2011). Recurrent processing in V1/V2 contributes to categorization of natural scenes. Journal of Neuroscience, 31(7), 2488-2492.

Lamme, V. A., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571-579.

Lamme, V. A., Zipser, K., & Spekreijse, H. (2002). Masking interrupts figure-ground signals in V1. Journal of Cognitive Neuroscience, 14(7), 1044-1053.

Lamy, D., & Zoaris, L. (2009). Task irrelevant stimulus salience affects visual search. Vision Research,

Luck, S. J., & Gaspelin, N. (in press). How to get statistically significant effects in any ERP experiment (and why you shouldn't).

O'Reilly, R. C., Wyatte, D., Herd, S., Mingus, B., & Jilk, D. J. (2013). Recurrent processing during object recognition. Frontiers in Psychology, 4(124), 1-14.

Ramakrishnan, K., Scholte, H. S., Groen, I. I. A., Smeulders, A. W. M., & Ghebreab, S. (in press). Deeper neural networks reveal temporal dynamics of human visual representations.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.

Scholte, H. S., Jolij, J., Fahrenfort, J. J., & Lamme, V. A. F. (2008). Feedforward and recurrent processing in scene segmentation: Electroencephalography and functional magnetic resonance imaging. Journal of Cognitive Neuroscience, 20(11), 2097-2109.

Scott, L. S., Tanaka, J. W., Sheinberg, D. L., & Curran, T. (2006). A reevaluation of electrophysiological correlates of expert object processing. Journal of Cognitive Neuroscience, 18(9), 1-13.

Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. PNAS, 104(15), 6424-6429.

Tanaka, J. W., & Taylor, M. (1991). Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23, 457-482.

Tanaka, J., Luu, P., Weisbrod, M., & Kiefer, M. (1999). Tracking the time course of object categorization using event-related potentials. NeuroReport, 10, 829-835.

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381(6582), 520-522.

Van der Wilt, L. (2017). An exploration of recurrent processing in convolutional neural networks using EEG data from a masked categorization task (Unpublished bachelor's thesis). University of Amsterdam, The Netherlands.

Vanmarcke, S., Calders, F., & Wagemans, J. (2016). The time-course of ultrarapid categorization: The influence of scene congruency and top-down processing. i-Perception, 0(0), 1-16.

Wokke, M. E., Sligte, I. G., Scholte, H. S., & Lamme, V. A. F. (2012). Two critical periods in early visual cortex during figure-ground segregation. Brain and Behavior, 2(6), 763-777.

Wyatte, D., Curran, T., & O'Reilly, R. (2012). The limits of feedforward vision: Recurrent processing promotes robust object recognition when objects are degraded. Journal of Cognitive Neuroscience,
