Spatial coherence and contrast energy of an image index complexity and signal the need for recurrent processing



Research Master thesis

Noortje Seijdel – 10092331

Supervisors: Steven Scholte & Sara Jahfari

20-08-2015


Noor Seijdel, University of Amsterdam
Supervised by Dr. H. Steven Scholte & Dr. Sara Jahfari

When observing the world around us, perception of everyday scenes feels effortless. Indeed, humans are incredibly fast in perceiving natural images, suggesting that this may involve only feed-forward visual processing. However, for complex tasks or under challenging conditions, neural feedback or recurrent processing is assumed to be required. In the current study, we investigated whether the need for recurrent processing during scene categorization depends on the complexity of an image. To index scene complexity, two image statistics were used: contrast energy (CE) and spatial coherence (SC). While EEG was recorded, participants performed a speed-accuracy task in which they had to categorize images based on the presence or absence of an animal. For simple images, differences in evoked activity between animal and non-animal pictures seem to arise early in time in early visual areas, indicating that feed-forward processing might be sufficient. For complex images, this difference in early visual areas arises late in time, after a difference in higher visual areas is observed, suggesting that neural feedback or recurrent processing is required. Together with previous findings, this pattern of results suggests that SC and CE index scene complexity and signal the need for recurrent processing. Using SC and CE, the brain may be able to efficiently determine the appropriate visual processing mode.

Keywords: object recognition, visual cortex, natural scenes, recurrent processing, image statistics, feed-forward sweep

Introduction

When you look around, perception of an everyday scene feels effortless. For example, right now, looking at your desk, out of your window or at your colleague feels automatic and undemanding. You were probably looking at what is called a ‘natural scene’: a scene that you typically encounter in your natural environment (e.g. people, streets, forests). Humans are indeed incredibly fast in perceiving and recognizing natural images (Potter, 1975; Intraub, 1981; Thorpe, Fize & Marlot, 1996; Fei-Fei, Iyer, Koch & Perona, 2007; Schyns & Oliva, 1994; Joubert, Rousselet, Fize, & Fabre-Thorpe, 2007), suggesting that this ability could result from a mainly feed-forward network (Serre, Oliva & Poggio, 2007; Delorme, Richard & Fabre-Thorpe, 2000; Thorpe, Fize & Marlot, 1996; VanRullen & Thorpe, 2002). However, many processes that are relevant for object recognition, such as figure-ground segmentation and object grouping, seem to arise from recurrent processing or feedback signals toward V1 (Lamme & Roelfsema, 2000; Roelfsema, Lamme, Spekreijse & Bosch, 2002; Scholte, Jolij, Fahrenfort & Lamme, 2008).

It could be that image complexity influences the level of neural feedback that is needed for object recognition. While target detection for simple objects is fast, this is not the case for complex objects (Treisman & Gelade, 1980). In a recent paper, Wyatte, Jilk and O’Reilly (2014) argue that neural feedback could be important for object identification under challenging conditions (e.g. when objects are partially masked or when the image is unclear). Possibly, simple images in which objects are easily segregated can be processed or categorized with a feed-forward sweep alone, while more complex images may require neural feedback because the feed-forward representation is too noisy to provide object information.

In 2009, Scholte, Ghebreab, Waldorp, Smeulders and Lamme showed that the visual system is able to employ a simple summary statistic during natural scene perception: the Weibull statistics of the contrast distribution. They showed that two parameters of the Weibull distribution, beta and gamma, explained 80% of the variance in the early ERP. Furthermore, these parameters can be well approximated from early visual responses in a biologically plausible way, using two measures: contrast energy (CE) and spatial coherence (SC) (Scholte et al., 2009; Ghebreab, Smeulders, Scholte & Lamme, 2009). Interestingly, these measures seem to contain information about the complexity of an image and arrange images in a meaningful way (Groen, Ghebreab, Lamme & Scholte, 2012; Groen, Ghebreab, Prins, Lamme & Scholte, 2013). Although the two statistics are significantly correlated, they represent different characteristics of the images. SC varies with the degree of correlation between local contrast values and gives an indication of the amount of clutter in an image: images with low SC values seem to be highly structured, while images with high SC values are more cluttered. CE varies with the range of contrast strengths in an image: images with low CE values seem to contain less depth, while images with high CE values contain more depth and a stronger figure-ground segmentation. Together, they describe an image space in which images with low CE/SC values seem to be simpler, containing clearly isolated objects and less depth, whereas images with high CE/SC values seem to be more complex and cluttered (see figure 1). These findings suggest that SC and CE could provide information about the level of neural feedback that is required.
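To make the two statistics more concrete, the sketch below computes crude proxies for them from a grayscale image. It is an illustration under explicit assumptions, not the computation of Scholte et al. (2009) or Ghebreab et al. (2009), which derive CE and SC from filter responses modelled on early visual receptive fields. Here local contrast is approximated by the gradient magnitude (`np.gradient`), the CE proxy is the mean local contrast (a scale-like quantity, analogous to the Weibull beta), and the SC proxy is the ratio of the mean to the standard deviation of local contrast (a shape-like quantity, analogous to the Weibull gamma), which is low for sparse, edge-dominated images and high for dense, cluttered ones. It does not capture the correlation structure between local contrast values described above. The function names `local_contrast`, `ce_proxy` and `sc_proxy` are hypothetical.

```python
import numpy as np

def local_contrast(img):
    """Approximate local contrast as gradient magnitude (a simplified stand-in
    for the receptive-field filters used in the original work)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def ce_proxy(img):
    """Hypothetical CE proxy: average strength of local contrast."""
    return local_contrast(img).mean()

def sc_proxy(img):
    """Hypothetical SC proxy: mean / standard deviation of local contrast.
    Sparse contrast (a few strong edges on a flat background) gives low values;
    dense, cluttered contrast gives high values."""
    c = local_contrast(img)
    return c.mean() / c.std()

# Usage: an isolated square on a flat background vs. a cluttered noise image
simple = np.zeros((64, 64))
simple[24:40, 24:40] = 1.0
rng = np.random.default_rng(0)
cluttered = rng.random((64, 64))

print(f"simple:    CE={ce_proxy(simple):.3f}  SC={sc_proxy(simple):.3f}")
print(f"cluttered: CE={ce_proxy(cluttered):.3f}  SC={sc_proxy(cluttered):.3f}")
```

In this toy comparison, both proxies are higher for the cluttered image, in line with the description of high CE/SC images as more complex.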

Groen et al. (2014) investigated this directly. CE and SC values were used to selectively sample scenes of varying complexity (three conditions: LOW, MED, HIGH; see figure 1). Participants performed a speed-accuracy task in which they had to categorize images based on the presence or absence of an animal. Subjects were slower and made more errors in the HIGH complexity condition, implying that this task was more difficult. Using fMRI, they also investigated brain activity in early visual areas. If SC/CE indeed affect the need for recurrent processing, HIGH complexity images should lead to elevated activity in early visual areas. Indeed, results revealed a difference in average brain activity in area V1 between animal and non-animal scenes, but only for complex images (HIGH).

Figure 1. Examples of the selected images from the LOW (bottom left), MED (center) and HIGH (top right) conditions.

In a recent behavioral study, we replicated these findings (Seijdel, Jahfari, Pelzer, Groen & Scholte, in prep). Participants responded slower and made more mistakes in the HIGH condition than in the LOW and MEDIUM conditions. Additionally, we investigated the differential influence of SC/CE on animal and non-animal images. Interestingly, the effects of SC/CE on task performance were most prominent for the non-animal pictures. A hierarchical version of the Drift Diffusion Model (HDDM; Wiecki, Sofer & Frank, 2013; Ratcliff, 1987; Ratcliff & McKoon, 2008) was then used to investigate the influence of image complexity on the ease of evidence accumulation and on strategic adjustments in the amount of evidence needed before making a decision. The DDM decomposes the distributions of correct and incorrect RTs into decision and non-decision components: it assumes that, from a starting point z, information accumulates in favor of one of the options. When the evidence accumulation process, quantified by drift rate v, reaches a boundary a, a choice is made and a response is initiated (see figure 2). For complex, non-animal pictures, the HDDM revealed a decreased rate of information processing and a lower decision boundary. Together, this could indicate that, because evidence accumulation is slow for complex images, participants lower the amount of evidence needed for the decision in order to still deliver a timely response.

Figure 2. Schematic representation of the Drift Diffusion Model. From a starting point z, information accumulates in favor of one of the options with drift rate v until it reaches a boundary a, and the decision is made. Non-decision time Terr captures processes that are unrelated to decision-making, such as response execution.
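The accumulation process in figure 2 can be made concrete with a small simulation. The sketch below is a plain Euler discretization of the diffusion process, not HDDM's fitting procedure; the function name `simulate_ddm`, the time step, the noise scale of 1 per second and all parameter values are assumptions chosen for illustration. The starting point is expressed as a fraction `z` of the boundary separation `a` (z = 0.5 means no bias), `t_err` is the non-decision time Terr in seconds, and the upper boundary is taken to be the correct response.

```python
import numpy as np

def simulate_ddm(v, a, z=0.5, t_err=0.3, n_trials=2000, dt=0.001,
                 max_t=5.0, seed=0):
    """Simulate choices and RTs from a basic drift diffusion model.

    v: drift rate, a: boundary separation, z: relative starting point (0-1),
    t_err: non-decision time (s). Returns (choice, rt): choice is 1 for the
    upper boundary, 0 for the lower boundary, -1 if neither was reached
    within max_t (rt is NaN for those trials).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, z * a)            # current evidence on each trial
    choice = np.full(n_trials, -1)
    rt = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)  # trials still accumulating
    for step in range(1, int(round(max_t / dt)) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Drift plus Gaussian diffusion noise (unit variance per second)
        x[idx] += v * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        upper, lower = x[idx] >= a, x[idx] <= 0.0
        hit = upper | lower
        done = idx[hit]
        choice[done] = upper[hit].astype(int)
        rt[done] = step * dt + t_err        # decision time + non-decision time
        active[done] = False
    return choice, rt

def accuracy(choice):
    """Proportion of upper-boundary (correct) responses among finished trials."""
    return np.mean(choice[choice >= 0] == 1)

# "Simple" vs. "complex" images, and complex images with a lowered boundary
c_simple, rt_simple = simulate_ddm(v=1.5, a=1.5)
c_complex, rt_complex = simulate_ddm(v=0.5, a=1.5)
c_lowered, rt_lowered = simulate_ddm(v=0.5, a=1.0)
for name, c, rt in [("simple", c_simple, rt_simple),
                    ("complex", c_complex, rt_complex),
                    ("complex, lower a", c_lowered, rt_lowered)]:
    print(f"{name:17s} accuracy={accuracy(c):.2f}  mean RT={np.nanmean(rt):.3f} s")
```

In this toy example, a lower drift rate makes responses slower and less accurate, and a lower boundary brings RTs back down at the cost of some further accuracy, which is the trade-off used above to interpret the HDDM results for complex, non-animal images.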

Then, using the same paradigm, the effects of SC and CE were investigated separately. To do this, images were selectively sampled such that they either varied in their CE value (CE-task) or in their SC value (SC-task); the parameter that was not varied was set to medium values. Participants performed either the CE-task or the SC-task. Results showed a weaker and less consistent influence of CE and SC individually on task performance. Thus, the two parameters seem to complement each other.

Taking all these findings into account, there appears to be converging evidence that SC and CE together provide information about the complexity of an image and, in turn, about the level of neural feedback that is needed to process it.

In the current study, we aim to extend the outlined theoretical framework by further investigating the role of low-level image statistics (SC/CE) in natural scene perception. Specifically, we examine whether image complexity, as indicated by SC/CE values, influences the ‘visual processing mode’, that is, the need for recurrent processing. If early visual areas are involved in animal/non-animal categorization after higher visual areas have been activated, this would suggest a role for neural feedback. To investigate this, we attempt to replicate earlier findings and combine them with new EEG measurements. As previously found, we expect longer reaction times and more errors for complex images, indicating that these images are more difficult and thus require more extensive processing. In terms of DDM parameters, we additionally expect a lower drift rate and a lower response boundary for complex images. Using the ERPs, we investigate whether differences in evoked activity in early visual areas between animal and non-animal images arise later in time for complex images than for simple images (after ~170 ms). If differences in the ERPs in early visual areas occur after 200 ms, after activation of higher visual areas, it is unlikely that they result from feed-forward activity alone.


Materials and Methods

Subjects

Thirty subjects (9 male) participated in the experiment. All participants were aged between 18 and 25 years (M = 21.9, SD = 1.9) and reported normal or corrected-to-normal vision. All participants gave written informed consent prior to participation and were rewarded with research credits or monetary compensation (30 euro).

Stimuli

A selection of 480 images was obtained from a previous study by Groen, Jahfari, Ghebreab, Lamme & Scholte (2010). For each image, they computed one CE and one SC value and selectively sampled images for three conditions: LOW, MEDIUM, and HIGH (each condition defined by its CE/SC values). Each condition contained 160 images, half of which contained an animal. Within conditions, animal and non-animal images were matched in their CE and SC values such that these two categories did not differ from each other in their mean or median values.

Experimental procedure

During the experiment, participants performed an animal vs. non-animal categorization task in which they indicated whether the presented image contained an animal or not. There were two trial types: one in which participants had to respond as fast as possible (“speed trials”) and one in which they had to respond as accurately as possible (“accuracy trials”). Images were presented for 100 ms. Participants performed 960 trials in total; all 480 images were used for both trial types. The experiment consisted of two main blocks. Trials were presented in a randomized sequence and participants indicated their response using corresponding keyboard buttons, which were counterbalanced across blocks. Choices and RTs (measured from the onset of the image) were recorded. For both trial types, participants received feedback on their performance. On speed trials, participants were presented with “too slow” feedback when they failed to respond within 500 ms, and “on time” when they were quick enough. On accuracy trials, participants were presented with “correct” or “incorrect” feedback. Between trials, a fixation cross was presented for a semi-randomly chosen duration of 350, 400, 450, 500 or 550 ms (450 ms on average). Stimuli were presented on a 23-inch Asus LCD display (type) positioned at eye level with a refresh rate of 60 Hz, using Presentation software (version 17.0, Neurobehavioral Systems, Inc.).

Figure 3. Experimental paradigm. Animal/non-animal categorization task. During the experiment, participants had to categorize images based on the absence or presence of an animal. Images were shown for 100 ms. On half of the trials, participants were asked to respond as quickly as possible (“speed trials”). On the other half of the trials, participants had to respond as accurately as possible (“accuracy trials”).

EEG recordings

EEG recordings were done using a Biosemi 64-channel Active Two EEG system (Biosemi Instrumentation BV, www.biosemi.com). Electrodes were placed according to the international 10-10 system, with reference electrodes placed at the earlobes. Because we were mainly interested in the visual system, two frontal electrodes (F5, F6) were replaced by two extra occipital electrodes (I1, I2). Eye movements were monitored using electrodes placed lateral to both eyes (horizontal electro-oculogram, hEOG) and above and below the left eye (vertical electro-oculogram, vEOG), aligned with the pupil location when participants looked straight ahead.

Data analysis: general

Data of 27 participants were included in the analyses. Two participants were excluded due to technical problems and incorrect placement of the extra occipital electrodes, and one participant was excluded because of a high error rate (more than 2.5 SD above the mean). RTs faster than 100 ms were considered “fast guesses” and were removed. RTs more than 2.5 SD above the mean were discarded as outliers. For all statistical tests, an alpha level of 0.05 was used.
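The two RT exclusion rules can be expressed as a single mask. This is a minimal sketch; the text does not state whether trimming was applied per participant or on the pooled data, nor whether the mean and SD for the slow cutoff were computed before or after removing fast guesses, so the choices below (per participant, after removing fast guesses) are assumptions, and `trim_rts` is a hypothetical helper.

```python
import numpy as np

def trim_rts(rt_ms, fast_cutoff=100.0, n_sd=2.5):
    """Return a boolean mask of RTs (in ms) kept after trimming.

    Assumed to be applied to one participant's RTs at a time; the mean and
    SD for the slow cutoff are computed after removing fast guesses.
    """
    rt_ms = np.asarray(rt_ms, dtype=float)
    keep = rt_ms >= fast_cutoff                     # drop "fast guesses" (< 100 ms)
    mean, sd = rt_ms[keep].mean(), rt_ms[keep].std(ddof=1)
    keep &= rt_ms <= mean + n_sd * sd               # drop slow outliers (> mean + 2.5 SD)
    return keep

# Usage with made-up RTs for one participant
rts = np.array([80, 350, 420, 390, 410, 2500, 450, 380, 400, 360, 430, 370])
keep = trim_rts(rts)
print(rts[keep])
```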

Data analysis: behavioral

Task performance was assessed by computing the mean RT and percentage of errors for each subject. First, we tested for main or interaction effects of Speed-Accuracy using a repeated-measures ANOVA with Condition (3 levels), Animal-non-Animal (2 levels) and Speed-Accuracy (2 levels) as within-subject factors, and RT and percentage of errors as dependent variables. If there was no interaction with Speed-Accuracy, data of speed and accuracy trials were collapsed; if there was an interaction, speed and accuracy trials were analyzed separately.
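Effect sizes in the Results are reported as partial eta squared (η2par). For a repeated-measures ANOVA, η2par follows from a reported F statistic and its degrees of freedom as F·df1 / (F·df1 + df2). When the numerator has two degrees of freedom (a three-level factor such as Condition, or its interaction with a two-level factor), the p-value also has a closed form, p = (1 + 2F/df2)^(-df2/2). With 27 participants, such effects have df1 = 2 and df2 = 2 × 26 = 52. The helper names below are hypothetical; the sketch is only a consistency check on reported statistics, not the analysis pipeline used here.

```python
def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f * df1 / (f * df1 + df2)

def p_value_f_df1_2(f, df2):
    """Exact upper-tail p-value of an F(2, df2) statistic."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

# Checks against statistics reported in the Results (27 participants)
print(round(partial_eta_squared(93.754, 1, 26), 3))  # Speed-Accuracy on RT, reported .783
print(round(partial_eta_squared(10.714, 2, 52), 3))  # Condition on speed RTs, reported .292
print(round(p_value_f_df1_2(4.127, 52), 3))          # Condition x Speed-Accuracy on errors, reported .022
```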

Data analysis: Hierarchical Drift Diffusion Model

We fitted a hierarchical version of the Drift Diffusion Model (HDDM) to the RT distributions of correct and incorrect responses. The DDM assumes that, when making a decision, information accumulates from a starting point z in favor of one of the options with drift rate v until it reaches a boundary a, at which point the decision is made. Non-decision time Terr captures processes that are unrelated to decision-making, such as response execution. The full DDM also contains three parameters for between-trial variability: variability in starting point sz, variability in non-decision time st, and variability in evidence accumulation (drift rate) η.

Eleven different models, plus a baseline model (model 0) in which neither parameter varied, were tested, in which drift rate (v) and boundary separation (a) were either fixed or varied across conditions (Table 1). For each model, 80,000 samples were drawn from the posterior. To ensure chain convergence, the first 30,000 samples were discarded as burn-in, and every 10th remaining sample was kept, resulting in a trace of 5,000 samples. The optimal model (lowest DIC) was selected on the basis of the Deviance Information Criterion (DIC; Spiegelhalter, Best, Carlin & van der Linde, 2002). The selected model was then tested for convergence using the Gelman-Rubin statistic, which compares the within-chain variance of each run to the between-chain variance across different runs of the same model. All chains had converged: all values were close to 1.
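The sampling arithmetic and the convergence check can be sketched in a few lines. The Gelman-Rubin statistic (R-hat) below is the standard formula for one parameter traced in several independent runs; it is a numpy illustration only, not the HDDM toolbox's own diagnostic, and the number of runs (4) and the simulated traces are assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (m, n) holding m independent runs of n samples each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: between-chain variance
    pooled = (n - 1) / n * within + between / n       # pooled posterior variance estimate
    return np.sqrt(pooled / within)

# Trace length after burn-in and thinning, as described above
n_kept = (80_000 - 30_000) // 10

rng = np.random.default_rng(0)
converged = rng.normal(0.0, 1.0, size=(4, n_kept))          # runs agree
stuck = converged + np.array([[0.0], [0.0], [3.0], [3.0]])  # two runs offset
print(n_kept, gelman_rubin(converged), gelman_rubin(stuck))
```

Values close to 1 indicate that the runs sample the same posterior; the offset runs in the second example inflate the between-chain variance and push R-hat well above 1.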

To test our hypotheses with respect to the DDM parameters, the same ANOVAs were then performed with drift rate v and boundary a as the dependent variables.

Data analysis: EEG

The raw EEG signal was pre-processed using a high-pass filter at 0.01 Hz (12 dB/octave), a low-pass filter at 30 Hz (24 dB/octave), and a notch filter at 50 Hz. Next, the signal was epoched from -250 ms to 750 ms around stimulus onset. Epochs that contained artifacts due to muscle or eye movements were manually rejected. Artifacts caused by blinks were removed using independent component analysis (ICA). Data were baseline corrected (using the interval from -200 to 0 ms relative to stimulus onset) and downsampled to 256 Hz. To increase spatial localization, we applied a spline Laplacian transformation (Perrin, Pernier, Bertrand & Echallier, 1989). To reduce the multiple comparisons problem, we defined three poolings of interest based on the previous study by Groen et al. (in prep): an occipital pooling (channels O1, O2, I1, I2, Oz, Iz), a medial peri-occipital pooling (P1, P2, P3, P4, PO3, PO4, POz, Pz) and a lateral peri-occipital pooling (P5, P6, P7, P8, P9, P10, PO7, PO8).
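The epoching, baseline correction and pooling steps can be sketched with numpy. This is a simplified illustration: it assumes the continuous data are already filtered, cleaned and at the final 256 Hz sampling rate, and it omits ICA and the spline Laplacian. `epoch_and_baseline` and `pool` are hypothetical helpers, not functions of an EEG toolbox. At 256 Hz, the -250 to 750 ms epoch spans exactly 256 samples.

```python
import numpy as np

FS = 256  # Hz; assumes data are already at the final sampling rate

def epoch_and_baseline(data, onsets, fs=FS, tmin=-0.250, tmax=0.750,
                       base=(-0.200, 0.0)):
    """Cut epochs around stimulus onsets and subtract the pre-stimulus mean.

    data:   (n_channels, n_samples) continuous EEG
    onsets: sample indices of stimulus onset
    Returns an (n_epochs, n_channels, n_times) array and the time vector (s).
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    times = np.arange(start, stop) / fs
    epochs = np.stack([data[:, o + start:o + stop] for o in onsets])
    in_base = (times >= base[0]) & (times < base[1])
    # Baseline correction: subtract each channel's mean over -200..0 ms
    epochs = epochs - epochs[:, :, in_base].mean(axis=2, keepdims=True)
    return epochs, times

def pool(epochs, ch_names, pooling):
    """Average the channels of one pooling (e.g. the occipital pooling)."""
    idx = [ch_names.index(ch) for ch in pooling]
    return epochs[:, idx, :].mean(axis=1)

# Usage with simulated data: 7 channels, 10 s of noise plus a constant offset
ch_names = ["O1", "O2", "I1", "I2", "Oz", "Iz", "Pz"]
occipital = ["O1", "O2", "I1", "I2", "Oz", "Iz"]
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, (len(ch_names), 10 * FS)) + 5.0
epochs, times = epoch_and_baseline(data, onsets=[512, 1024, 1536])
occ = pool(epochs, ch_names, occipital)
print(epochs.shape, occ.shape)
```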

For the EEG analyses, we first computed, for each subject, the ERPs of all even animal and non-animal trials. The ERP difference (animal minus non-animal) was tested against zero at every time point between 40 and 250 ms using a permutation test based on the t-statistic (the first differences are unlikely to occur before 40 ms or after 250 ms). Based on these results, time-windows were defined in which there was a significant animal/non-animal difference; each window that consisted of more than one time point was retained. Within those time-windows, the ERP difference of all odd animal and non-animal trials was tested against zero for each condition. P-values were corrected for multiple comparisons (number of conditions, time-windows and poolings) using FDR correction (pFDR = 0.0140). This even-odd trial procedure ensures that the time-window selection does not bias our results, while reducing the multiple comparisons problem (see figure 4).
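The even-odd procedure can be sketched as follows. This is a minimal illustration, not the original analysis code: the number of permutations, the sign-flip scheme (randomly inverting each subject's difference wave, a common way to build a permutation null for a one-sample test against zero), and the use of uncorrected per-time-point p-values for window selection are assumptions, and the function names are hypothetical. `diff_even` holds one animal-minus-non-animal difference wave per subject for the time points between 40 and 250 ms (about 54 samples at 256 Hz).

```python
import numpy as np

def t_stat(x):
    """One-sample t-statistic against zero across subjects (axis 0)."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

def signflip_perm_test(diff, n_perm=1000, seed=0):
    """Per-time-point permutation p-values for diff (subjects x time points).

    Under the null hypothesis each subject's difference wave is equally likely
    to have either sign, so the null distribution is built by flipping the
    sign of whole subject waves at random.
    """
    rng = np.random.default_rng(seed)
    t_obs = t_stat(diff)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        exceed += np.abs(t_stat(diff * signs)) >= np.abs(t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)

def find_windows(significant, min_len=2):
    """Contiguous runs of significant time points as (start, stop) indices,
    keeping only runs that contain more than one time point."""
    windows, start = [], None
    for i, sig in enumerate(significant):
        if sig and start is None:
            start = i
        elif not sig and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    if start is not None and len(significant) - start >= min_len:
        windows.append((start, len(significant)))
    return windows

# Even trials (simulated): define windows
rng = np.random.default_rng(1)
diff_even = rng.normal(0.0, 1.0, size=(27, 54))  # 27 subjects x 54 time points
diff_even[:, 20:30] += 1.5                       # simulated animal/non-animal effect
t_obs, p = signflip_perm_test(diff_even)
windows = find_windows(p < 0.05)

# Odd trials (simulated): average within each window, then test against zero
diff_odd = rng.normal(0.0, 1.0, size=(27, 54))
diff_odd[:, 20:30] += 1.5
window_t = [t_stat(diff_odd[:, s:e].mean(axis=1, keepdims=True))[0]
            for s, e in windows]
print(windows, np.round(window_t, 2))
```

Because the windows are chosen on even trials and tested on odd trials, the selection step cannot inflate the odd-trial statistics; in the study itself, the odd-trial tests were run per condition and the resulting p-values were FDR-corrected.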

Figure 4. Cross-validation procedure. First, the average animal/non-animal difference on even trials was tested against zero for each time point between 40-250 ms using permutation testing. Time points at which the difference was significant (p < .05) were used to define time-windows. In those windows, the animal/non-animal difference on odd trials was tested against zero for each condition using t-tests.
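The FDR correction of the odd-trial tests can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not state which FDR variant was used, so Benjamini-Hochberg is an assumption here, and `fdr_threshold` is a hypothetical helper. It returns the largest p-value that survives correction, which is presumably how a single threshold such as the reported pFDR = 0.0140 arises. The p-values in the example are made up for illustration and are not data from this study.

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg: largest sorted p(k) with p(k) <= k/m * q, else 0.0."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    below = p <= np.arange(1, m + 1) / m * q
    if not below.any():
        return 0.0
    return p[np.flatnonzero(below).max()]

# Illustrative p-values, e.g. one per condition x time-window x pooling
p_vals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
          0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
thr = fdr_threshold(p_vals)
significant = [pv for pv in p_vals if pv <= thr]
print(thr, significant)
```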


Figure 5. Effects of spatial coherence and contrast energy values on animal vs. non-animal categorization. For non-animal trials, both RT and percentage of errors increased when images were high in their SC/CE values. For these images, the rate of evidence accumulation decreased, while the decision boundary was lowered. Error bars represent 1 SEM. * = p < .05, ** = p < .001.

Results

The repeated-measures ANOVA on RT with Condition (3 levels: LOW, MEDIUM, HIGH), Animal (2 levels: animal, non-animal) and Speed-Accuracy (2 levels: speed, accuracy) as within-subject factors revealed a main effect of Speed-Accuracy, F(1, 26) = 93.754, p < .001, η2par = .783. In line with the task instructions, subjects responded faster on speed trials than on accuracy trials. There were no interaction effects with Speed-Accuracy, indicating that the speed-accuracy trade-off did not differ between conditions. For the percentage of errors, results also revealed a main effect of Speed-Accuracy, F(1, 26) = 57.456, p < .001, η2par = .688: subjects made more errors on speed trials than on accuracy trials. Additionally, there was a Condition*Speed-Accuracy interaction effect, F(2, 52) = 4.127, p = .022, η2par = .137. Because of this interaction, speed and accuracy trials were analyzed separately in the following sections.

Behavioral results

Reaction times

In figure 5A, RTs are plotted for all conditions. The repeated-measures ANOVA on speed trials revealed main effects of Animal, F(1, 26) = 15.335, p = .001, η2par = .371, and Condition, F(2, 52) = 10.714, p < .001, η2par = .292. Subjects were faster on animal trials than on non-animal trials. No interaction effect was observed. For accuracy trials, the repeated-measures ANOVA on RT revealed main effects of Animal, F(1, 26) = 20.824, p < .001, η2par = .445, and Condition, F(2, 52) = 12.985, p < .001, η2par = .333, and a Condition by Animal interaction effect, F(2, 52) = 6.819, p = .002, η2par = .208. Bonferroni-corrected paired comparisons indicated no significant differences for animal trials. For non-animal trials, participants responded slower to complex images (HIGH) than to LOW and MED images (t(26) = 4.642, p < .001; t(26) = -4.983, p < .001).

Percentage errors

In figure 5B, the percentages of errors are shown. For speed trials, there was a main effect of Condition, F(2, 52) = 10.709, p < .001, η2par = .292, and an interaction effect, F(2, 52) = 6.928, p = .002, η2par = .210. Subsequent paired comparisons indicated that, for animal trials, the percentage of errors in the MED condition was significantly lower than in the LOW and HIGH conditions (t(26) = 3.099, p = .005; t(26) = -3.875, p = .001). For non-animal trials, the percentage of errors was significantly higher in the HIGH condition than in both the LOW and the MED condition (t(26) = -4.197, p < .001; t(26) = -3.027, p = .006). For accuracy trials, there was a main effect of Animal, F(1, 26) = 35.406, p < .001, η2par = .577.

HDDM results

As shown in table 1, model selection indicated that the observed data were best explained by model 11, in which both drift rate v and response boundary a were allowed to vary across Condition (LOW, MED, HIGH) and across Animal-non-Animal, and response boundary a was additionally allowed to vary across Speed-Accuracy.

Drift rate

In figure 5C, the drift rates are visualized. The repeated-measures ANOVA on drift rate v revealed main effects of Animal, F(1,26) = 28.219, p < .001, η2par = .520, and Condition, F(2,52) = 13.891, p < .001, η2par = .348. Additionally, there was an Animal*Condition interaction effect, F(2,52) = 19.662, p < .001, η2par = .431, indicating that the Condition effect was more apparent for non-animal trials. Paired comparisons revealed a higher drift rate in the MED condition than in the LOW and HIGH conditions for animal trials (t(26) = -4.445, p < .001; t(26) = 3.629, p = .001). For non-animal trials, the rate of evidence accumulation was lower in the HIGH condition than in the LOW and MEDIUM conditions (t(26) = 5.603, p < .001; t(26) = 4.171, p < .001).

Boundary

In figure 5D, the response boundaries are plotted. For speed trials, there were main effects of Animal, F(1,26) = 45.556, p < .001, η2par = .637, and Condition, F(2,52) = 3.772, p = .030, η2par = .127, and an Animal*Condition interaction, F(2,52) = 5.127, p = .009, η2par = .165. Subsequent pairwise comparisons for animal trials showed no significant effects. For non-animal trials, the response boundary was significantly higher in the LOW condition than in the HIGH condition (t(26) = 3.534, p = .002). For accuracy trials, there were main effects of Animal, F(1,26) = 58.849, p < .001, η2par = .694, and Condition, F(2,52) = 3.639, p = .033, η2par = .123, and an interaction effect, F(2,52) = 8.727, p = .001, η2par = .251. More evidence was needed when the image contained no animal. Subsequent pairwise comparisons showed that, for animal trials, less evidence was needed in the LOW condition than in the MED condition (t(26) = -2.901, p = .007). For non-animal trials, more evidence was needed in the LOW condition than in the HIGH condition (t(26) = 3.120, p = .004).

Table 1. HDDM model selection. For the tested models, drift rate v and response boundary a were either allowed to vary across Speed-Accuracy (S/A), Animal (A/NA), Condition (L/M/H) or a combination of those.

Model | v allowed to vary across | a allowed to vary across | DIC
11 | L/M/H, A/NA | S/A, A/NA, L/M/H | -13,281,193,702
10 | L/M/H, A/NA | S/A, A/NA | -13,230,310,375
9 | A/NA | S/A, A/NA | -13,235,637,507
8 | L/M/H | S/A, A/NA | -13,001,063,682
7 | L/M/H, A/NA | S/A | -12,759,698,288
6 | A/NA | S/A | -12,784,321,571
5 | L/M/H | S/A | -12,713,215,917
4 | - | S/A | -12,697,062,229
3 | - | A/NA | -8,315,598,798
2 | A/NA | - | -8,142,542,677
1 | L/M/H | - | -8,074,848,850
0 | - | - | -8,065,908,986

In sum, these results show that SC/CE differentially influence task performance on animal and non-animal trials. For animal trials, participants make the fewest mistakes on MEDIUM images, for which the rate of evidence accumulation is higher. For non-animal trials, performance is worse for HIGH complexity images, as indicated by longer response times and lower accuracy. For those images, the rate of evidence accumulation is lower, accompanied by a lowered response boundary.

EEG results

For each pooling, the ERPs of all even animal and non-animal trials were computed for each subject. The ERP difference (animal minus non-animal) was tested against zero at every time point between 40 and 250 ms using a permutation test based on the t-statistic. Based on these results, time-windows were defined. Within those time-windows, the ERP difference of all odd animal and non-animal trials was tested against zero for each condition (see Data analysis: EEG and figure 4).

Occipital pooling

As can be seen in figure 6A, three time-windows were selected for both speed and accuracy trials based on even trials. For speed trials, results showed no differences between animal and non-animal trials in the first time-window. In the second window, around 100 ms, there was an animal/non-animal difference on odd trials in the LOW and MED conditions (t(26) = 2.635, p = .014; t(26) = 3.169, p = .004). In the third window, from around 220 ms, there was a significant animal/non-animal difference in the HIGH condition (t(26) = 3.110, p = .005). These results in early visual areas show that, for simple images, the animal/non-animal difference emerges early in time, whereas for complex images it occurs relatively late. For accuracy trials, the animal/non-animal difference again occurred early in time in the MED condition (around 100 ms; t(26) = 3.676, p = .001). Then, during the second and third time-windows (125-145 and 165-185 ms), animal and non-animal trials differed from each other for simple images (LOW; t(26) = -4.516, p < .001; t(26) = -4.097, p < .001). Visually, the difference wave of the HIGH condition again seems to rise after 220 ms; however, this time-window was not selected based on even trials and was therefore not tested. Again, these results show an early animal/non-animal difference for simple images.

Medial peri-occipital pooling

As shown in figure 6B, one time-window (roughly 150-250 ms) was selected for both speed and accuracy trials. For both trial types, there was an enhanced animal/non-animal ERP difference in all conditions (speed: t(26) = 3.161, p = .004; t(26) = 3.931, p < .001; t(26) = 4.593, p < .001; accuracy: t(26) = 4.308, p < .001; t(26) = 5.300, p < .001; t(26) = 4.818, p < .001). These results show that, at medial peri-occipital channels, the animal/non-animal ERP difference is reliably established after 150 ms.


Lateral peri-occipital pooling

In figure 6C, the difference ERPs at lateral peri-occipital channels are shown. For speed trials, we tested one time-window ranging from around 160 to 210 ms. Results showed an enhanced ERP difference for simple images (LOW; t(26) = -4.928, p < .001). For accuracy trials, two time-windows were selected. In the first, ranging roughly from 45 to 65 ms, no differences were observed. In the second, ranging from 160 to 210 ms, results showed an enhanced ERP difference for images in the LOW and MED conditions (t(26) = -3.538, p = .002; t(26) = -3.013, p = .006).

Figure 6. Animal/non-animal differences on odd trials in the predefined time-windows. In the predefined time-windows, for each condition (LOW, MED, HIGH) the animal/non-animal differences were tested against zero using t-tests (FDR corrected). Significant differences from zero are displayed with solid lines at the bottom of the time-windows.


Discussion

In this study, we hypothesized that image complexity, as indexed by SC/CE values, influences the visual processing mode of natural images. Specifically, we investigated whether SC/CE could be used as a measure that indicates image complexity and subsequently signals the need for recurrent processing. A recent fMRI study by Groen et al. (in prep) provided the first evidence for such a role: using an animal/non-animal categorization task, they found elevated differential activity in early visual areas for images with high SC/CE values. In the current study, we aimed to replicate the earlier behavioral findings and to complement them with new EEG measurements.

First, as in previous studies, we found that images with high SC/CE values resulted in longer response times and lower accuracy. In our model-based analyses, we used a hierarchical version of the drift diffusion model to obtain estimates of participants’ rate of evidence accumulation and response caution (HDDM; Wiecki et al., 2013; Ratcliff, 1987; Ratcliff & McKoon, 2008). The rate of evidence accumulation was lower for the “high complex” images, accompanied by a lowered response boundary. Together, these results suggest that those trials were more difficult to categorize.

Second, using EEG, we found that for simple images, differences between the animal and non-animal ERPs start to emerge at occipital channels first, around 100 ms, followed by differences at medial and lateral peri-occipital sites. These results are consistent with a feed-forward sweep of information through the visual system (Roelfsema et al., 2002; Lamme & Roelfsema, 2000; Scholte et al., 2008; Serre et al., 2007; Thorpe et al., 1996; Thorpe & Fabre-Thorpe, 2001). For complex images, with HIGH SC/CE values, this process seems to unfold differently. In our results, the first animal/non-animal ERP differences for these images are found at the medial peri-occipital channels, in a time-window that starts around 150 ms. From there, object information seems to spread to the occipital pooling, where the ERP differences in the HIGH condition appeared only after 220 ms. This order of processing suggests that, for complex images, categorization depends on recurrent processing or feedback signals towards V1 (Bar, Kassam, Ghuman, Boshyan, Schmid, Dale, Hämäläinen, Marinkovic, Schacter, Rosen & Halgren, 2006; Koivisto et al., 2011; Roelfsema et al., 2002; Koivisto, Kastrati & Revonsuo, 2013; O'Reilly, Wyatte, Herd, Mingus, & Jilk, 2013).

Overall, these results show that SC/CE could be a measure by which the brain can efficiently determine which visual processing mode is appropriate or required.

Animal versus non-animal

Interestingly, the behavioral effects of SC/CE seem to differ between animal and non-animal pictures. One possible explanation is that search can be terminated earlier on trials in which an animal is present, simply because participants are likely to detect the animal before they have searched the complete image (even though the location of the animal is unknown). Participants therefore do not always have to perceive the complete image on trials in which an animal is present, possibly resulting in a differential influence of SC/CE values.

Performance on MEDIUM trials

Notably, participants performed very well on MEDIUM trials, sometimes even better than on LOW trials. One explanation could be that most of the scenes we encounter are of medium complexity, so that images with medium SC/CE values best reflect real life. Because the visual system has evolved with, and in response to, the natural environment, it is very likely tuned to the statistical regularities of natural scenes (Doi & Lewicki, 2005). It would be interesting to test this by computing SC/CE values for a very large ‘real-life’ image database, if such a database exists, and simply counting the number of scenes with medium values.

Another explanation could be that MEDIUM images have the best context-to-object ratio. In the real world, objects never occur in isolation: they co-vary with other objects and with specific environments. It is, for example, more likely that you will see a goat on a mountain top than in a grocery store. Objects appearing in a familiar background are detected faster and more accurately than objects in an unexpected environment (Oliva & Torralba, 2007; Neider & Zelinsky, 2006). Most LOW images contain little context and thus provide few additional cues about the presence or absence of an animal. HIGH images, on the other hand, may contain too much distracting information.

Task differences

The different trial types (animal, non-animal; LOW, MEDIUM, HIGH) may also have influenced the nature of the task. Simple images with clear figure-ground segmentation (LOW) may have acted more like ‘pop-out’ stimuli, whereas more complex images demanded visual search. In that sense, LOW and MEDIUM trials can be seen as categorization trials that do not require object recognition (‘there is an object on a background; is it an animal or not?’), while HIGH trials require both visual search and object recognition. Consequently, categorization on simple trials may not depend on processes that require recurrent processing, such as figure-ground segmentation and grouping (Oliva & Torralba, 2006; Koivisto et al., 2014).

Link between DDM and EEG

Given these results, a compelling next step would be to investigate the link between the drift diffusion parameters and the EEG activity. By correlating the parameters with ERP amplitude across subjects, we could potentially clarify at which time point in the categorization process a person's speed of evidence accumulation or decision boundary is reflected in the EEG. Unfortunately, this analysis was not possible in the current study. Drift diffusion model selection resulted in twelve different response boundary values across conditions and six drift rate values. Due to a lack of EEG data, we were unable to reliably link these results to each other.
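The proposed between-subject analysis could be sketched as a correlation time course: for each time point, the Pearson correlation across participants between a per-subject drift diffusion parameter (for example, a drift rate estimate from the hierarchical fit) and ERP amplitude at one channel group. The function name, the synthetic data, and the 200–300 ms window in which a simulated effect is planted are all hypothetical; this is a sketch under those assumptions, not an analysis that was run in this study.

```python
import numpy as np


def param_erp_correlation(param, erp):
    """Pearson r, across subjects, between a DDM parameter and ERP amplitude
    at every time point.

    param: shape (n_subjects,), e.g. one drift rate per participant
    erp:   shape (n_subjects, n_times), one ERP per participant
    returns: shape (n_times,), one correlation coefficient per time point
    """
    param = np.asarray(param, dtype=float)
    erp = np.asarray(erp, dtype=float)
    p = param - param.mean()          # centre the parameter across subjects
    e = erp - erp.mean(axis=0)        # centre each time point across subjects
    cov = p @ e                       # summed cross-products per time point
    norm = np.sqrt((p ** 2).sum() * (e ** 2).sum(axis=0))
    return cov / norm


# Synthetic data: 20 hypothetical participants, 4 ms steps from 0 to 396 ms.
rng = np.random.default_rng(0)
n_subjects = 20
times = np.arange(0, 400, 4)
drift = rng.normal(1.0, 0.3, n_subjects)
erp = rng.normal(0.0, 1.0, (n_subjects, len(times)))
window = (times >= 200) & (times < 300)
erp[:, window] += 3.0 * drift[:, None]  # plant a drift-related effect in 200-300 ms

r = param_erp_correlation(drift, erp)
print(f"mean r inside window: {r[window].mean():.2f}, outside: {r[~window].mean():.2f}")
```

With a modest number of participants such correlations are noisy, and testing them at every time point requires correction for multiple comparisons, which is why a limited EEG sample makes this link hard to establish reliably.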

Conclusion

In sum, this study suggests that the spatial coherence and contrast energy of an image index scene complexity and subsequently indicate whether recurrent processing is needed.

References

Bar, M., Kassam, K.S., Ghuman, A.S., Boshyan, J., Schmid, A.M., Dale, A.M., Hämäläinen, M.S., Marinkovic, K., Schacter, D.L., Rosen, B.R., & Halgren, E. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences, USA, 103, 449–454.

Doi, E., & Lewicki, M.S. (2005). Relations between the statistical regularities of natural images and the response properties of the early visual system. Japanese Cognitive Science Society, SIG P&P, pp. 1–8.

Fei-Fei, L., Iyer, A., Koch, C., & Perona, P. (2007). What do we perceive in a glance of a real-world scene? Journal of Vision, 7(1), Article 10.

Ghebreab, S., Smeulders, A.W.M., Scholte, H.S., & Lamme, V.A.F. (2009). A biologically plausible model for rapid natural image identification. Advances in Neural Information Processing Systems, 1–9.

Groen, I.I.A., Ghebreab, S., Lamme, V.A.F., & Scholte, H.S. (2010). The role of Weibull image statistics in rapid object detection in natural scenes. Journal of Vision, 10(7), 992.

Groen, I.I.A., Ghebreab, S., Lamme, V.A.F., & Scholte, H.S. (2012a). Spatially pooled contrast responses predict neural and perceptual similarity of naturalistic image categories. PLoS Computational Biology, 8.

Groen, I.I.A., Ghebreab, S., Prins, H., Lamme, V.A.F., & Scholte, H.S. (2013). From image statistics to scene gist: Evoked neural activity reveals transition from low-level natural image structure to scene category. Journal of Neuroscience, 33, 18814–18824.

Intraub, H. (1981). Rapid conceptual identification of sequentially presented pictures. Journal of Experimental Psychology: Human Perception and Performance, 7, 604–610.

Joubert, O., Rousselet, G., Fize, D., & Fabre-Thorpe, M. (2007). Processing scene context: Fast categorization and object interference. Vision Research, 47, 3286–3297.

Koivisto, M., Kastrati, G., & Revonsuo, A. (2014). Recurrent processing enhances visual awareness but is not necessary for fast categorization of natural scenes. Journal of Cognitive Neuroscience, 26(2), 223–231.

Lamme, V.A.F., & Roelfsema, P.R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579.

Neider, M.B., & Zelinsky, G.J. (2006). Scene context guides eye movements during visual search. Vision Research, 46, 614–621.

Oliva, A., & Schyns, P. (1997). Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli. Cognitive Psychology, 34, 72–107.

Oliva, A., & Torralba, A. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155, 23–36.

Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11(12), 520–527.

O'Reilly, R., Wyatte, D., Herd, S., Mingus, B., & Jilk, D. (2013). Recurrent processing during object recognition. Frontiers in Psychology, 4, 124.

Potter, M.C. (1975). Meaning in visual search. Science, 187, 965–966.

Roelfsema, P.R., Scholte, H.S., & Spekreijse, H. (1999). Temporal constraints on the grouping of contour segments into spatially extended objects. Vision Research, 39, 1509–1529.

Roelfsema, P.R., Lamme, V.A.F., & Spekreijse, H. (2000). The implementation of visual routines. Vision Research, 40, 1385–1411.

Roelfsema, P.R., Lamme, V.A.F., Spekreijse, H., & Bosch, H. (2002). Figure-ground segregation in a recurrent network architecture. Journal of Cognitive Neuroscience, 14(4), 525–537.

Rousselet, G.A., Husk, J.S., Bennett, P.J., & Sekuler, A.B. (2008). Time course and robustness of ERP object and face differences. Journal of Vision, 8, 1–18.

Rousselet, G.A., & Pernet, C.R. (2011). Quantifying the time course of visual object processing using ERPs: It's time to up the game. Frontiers in Psychology, 2, 1–6.

Scholte, H.S., Jolij, J., Fahrenfort, J.J., & Lamme, V.A.F. (2008). Feedforward and recurrent processing in scene segmentation: Electroencephalography and functional magnetic resonance imaging. Journal of Cognitive Neuroscience, 20(11), 2097–2109.

Schyns, P.G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5, 195–200.

Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, USA, 104, 6424–6429.

Simoncelli, E.P., & Olshausen, B.A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & Van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 583–639.

Thorpe, S.J., & Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science, 291, 260–263.

Thorpe, S.J., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.

VanRullen, R., & Thorpe, S.J. (2002). Surfing a spike wave down the ventral stream. Vision Research, 42, 2593–2615.

Wiecki, T., Sofer, I., & Frank, M. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14.

Wyatte, D., Jilk, D., & O'Reilly, R. (2014). Early recurrent feedback facilitates visual object recognition under challenging conditions. Frontiers in Psychology, 5(674).
