GRANDMOTHER CELL THEORY: On the impact of human single cell recordings

Academic year: 2021


The way we perceive our environment is characterized by a great sense of unity. The brain is continuously bombarded with an enormous amount of information from all five senses. It somehow manages these streams of information and integrates them into one homogeneous percept, which we call consciousness. We experience a conscious percept seemingly without any effort, which forms a stark contrast with the underlying computational difficulty of such a feat.

The visual system alone makes use of a series of complex computations to correctly identify individual objects in the visual field. The problem of how the brain is able to discriminate between objects and extract an object's identity from several thousand possibilities has been discussed for decades.

Around 60 years ago neuroscientists came up with a possible solution to this problem. They hypothesized that the brain might have highly specialized neurons that respond to only one individual object or person. These hypothetical neurons are popularly known as ‘grandmother cells’, as originally named by Jerry Lettvin (Rose, 1996).

The researchers who predicted the existence of such units in the brain had little or no empirical evidence to support their hypotheses. They came to their conclusions by means of pure speculation or by combining several lines of indirect evidence (Gross, 2002). Even though recordings from single units in mice, cats and primates had taken a large step forward in that period, the studies that were conducted focused primarily on more basic visual processes.

The notion of grandmother cells would probably have vanished quickly if not for the accumulating reports of cells that respond to highly complex shapes, such as hands and faces (Barlow, 1953; Gross, 1969). Due to the increasing number of observations of highly specialized perceptual neurons, the idea of grandmother cells never ceased to exist.

Guido T. Meijer

Institute of Interdisciplinary Studies, Cognitive Science Center Amsterdam, University of Amsterdam

Abstract | The grandmother cell theory was conceived in the 1960s to account for the way in which the brain might represent complex stimuli. It describes the possibility that the brain makes use of highly selective neurons that are tuned to one particular complex object, such as the appearance of one’s grandmother. This theory was developed with little or no empirical data to rely on. Nowadays, technological advances allow neuroscientists to record from single neurons in human epilepsy patients using clinically implanted electrodes. This method has revealed that there are highly selective neurons in the human brain that respond solely to one complex object, such as the appearance of actress Jennifer Aniston. Does the processing of information become so sparse that a one-to-one mapping of objects to neurons arises in higher levels of visual processing, or are there alternative explanations? What can our modern-day understanding of the visual system tell us about the relatively old grandmother cell theory?

At the end of the 1990s an important development occurred: the use of implanted intracranial EEG electrodes became a more widely used clinical method for determining the source of seizures in patients suffering from severe epilepsy. These implanted electrodes had the additional benefit that small wires could be co-implanted with the EEG electrodes, which made single cell recordings possible. This allowed neuroscientists to record from single neurons in awake and behaving humans, something that is otherwise impossible.

What have these studies contributed to the grandmother cell theory? To answer this question we must first explore the original definition of the grandmother cell theory. To fully understand the possible existence of grandmother cells we must then dive into the way information is processed by the visual system. Finally the influence of current research conducted using intracranial recordings on the grandmother cell theory will be discussed.

The Grandmother Cell; history and definition

The grandmother cell has many names and many similar but slightly different interpretations concerning its nature. It has also been called pontifical cell, cardinal cell and gnostic cell. All these names were coined by different scientists who each came to their definition by different means (for a review see Gross, 2002). In this chapter the origin of the idea of such a cell is discussed and the precise definition of its functional properties is explored.

The first documented description of a unit that meets the description of a grandmother cell dates from 1940, in Charles Scott Sherrington’s book Man on His Nature. In this book Sherrington discusses the unity of our perception and postulates that our feeling of ‘now’ is always a situation with a single meaning. He was aware that information from different modalities converges in the brain and hypothesized that this integration could ultimately converge onto a single pontifical cell. Sherrington’s pontifical neurons were multi-modal by definition: they respond to a complete sensory scene which is built up from information from all five senses. It is important to emphasize that Sherrington had no empirical evidence whatsoever supporting the existence of such cells; they were purely hypothetical. Even though Sherrington suggested the idea of “supreme convergence on one ultimate pontifical nerve cell”, he ultimately dismissed this idea in favor of his million-fold democracy concept, in which single units have no significant meaning.

The first empirical evidence in favor of single units that respond to complex stimuli came 13 years later in Barlow’s paper “Summation and Inhibition in the Frog’s Retina” (1953). He described an experiment in which he found ganglion cells that responded strongly to a small black disk moving quickly back and forth within the cell’s receptive field. When such a “fly detector” cell was stimulated in a living frog, the behavioral response was to move towards the stimulated region of visual space and exhibit feeding responses. However, since there was not yet a real theoretical framework in favor of the existence of such cells, the general scientific community did not take notice of this discovery.

The theoretical framework Barlow needed was eventually developed by Jerzy Konorski in his book “Integrative Activity of the Brain” (1967). Unaware of Barlow’s discovery and without anything in the literature that pointed him in this direction, he conceived the idea that the cortex might have highly specialized sensory neurons that respond uniquely to, for example, hands and faces. He called these cells gnostic neurons and proposed that they would be grouped on the cortex in gnostic fields. With this statement he was the first to predict the existence of a cortical area that responds to a specific class of visual objects (e.g. the fusiform face area).

The term grandmother cell was coined by Jerry Lettvin in a lecture which he gave for his M.I.T. course “Biological Foundations for Perception and Knowledge” around 1969. At this time Lettvin was only speculating on how the brain would represent individual objects; he was not aware that this problem had been thoroughly addressed by Konorski several years earlier. He told his students a made-up story about the neurosurgeon Akakhi Akakhievitch, who had identified 18,000 neurons that responded only to a mother, in every possible way she could be presented. After the doctor had removed all these neurons in his patient Portnoy, who suffered from a severe obsession with his mother, the concept of ‘mother’ had been completely wiped from Portnoy’s brain. Dr Akakhievitch then went on to search for grandmother cells. This interpretation is very similar to the gnostic cell theory: a grandmother cell is a highly specialized cell that responds uniquely to one complex visual stimulus, independent of the variations in which said stimulus may be presented.

From this point on the term grandmother cell spread quickly through the scientific community and a debate concerning the existence of such cells began. Barlow took this opportunity to put his previously ignored work back into the spotlight. In 1972 he published an elaborate paper concerning the involvement of single units in perception. He defined cardinal cells as a group of neurons that together respond to a single complex stimulus. In contrast to grandmother cells, cardinal cells each respond to a different feature of the stimulus and together they signal the presence of the stimulus they encode.

Nowadays, the most widely used term and definition is the grandmother cell, which is nearly identical to the gnostic cell in its functional description. The existence of grandmother cells is still under debate in the literature. What can modern-day science tell us about this relatively old theory? Since the first conception of the grandmother cell theory there has been a massive increase in knowledge concerning the way the visual system recognizes individual objects and faces.

Visual processing; from retina to recognition

The visual system is one of the most remarkable pieces of machinery the brain has at its disposal to generate our consciousness. Without any apparent effort the visual system recognizes multiple objects and persons in the visual field and classifies them into their corresponding categories. The brain is somehow capable of efficiently identifying an object irrespective of all the possible variations in which the object can appear in the visual field. A single object can produce an infinite number of images on the retina: the object can vary in size, illumination, orientation and structure, and it can be partially covered by a different object. No two presentations of the same object will produce the same image on the retina, and therefore the same activation of retinal ganglion cells. Still, we effortlessly use the information from the retina to recognize and classify all objects present in the visual field. The visual information is then somehow transformed from a purely visual sensation into a concept or an idea which can be associated with related concepts or ideas and subsequently be stored in memory for later recall.

How does the brain perform this extraordinary feat? The key seems to lie in the way the visual cortex is organized. Hubel and Wiesel (1962) were the first to suggest that the visual cortex might be hierarchically organized. They distinguished between simple cells and complex cells. Both types of cell respond to oriented edges and gratings; however, the complex cells show a degree of spatial invariance. Because the complex cells pool their inputs from a number of simple cells, they are less sensitive to where an object is in the visual field. From this Hubel and Wiesel reasoned that this might be the first step in a hierarchically organized system in which every level shows a higher degree of invariance towards, for example, location on the retina.
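The pooling idea described above can be made concrete with a small toy model. The sketch below is only an illustration under stated assumptions: the 3x3 vertical-bar kernel, the rectified linear response and the use of a maximum as the pooling operation are all hypothetical choices made here, not properties reported by Hubel and Wiesel. It shows that the map of simple-cell responses changes when a bar moves, whereas the pooled complex-cell response does not.

```python
import numpy as np

def simple_cell_responses(image, kernel):
    """Rectified response of one simple-cell type at every valid image position."""
    kh, kw = kernel.shape
    rows = image.shape[0] - kh + 1
    cols = image.shape[1] - kw + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = max(0.0, float(np.sum(patch * kernel)))
    return out

def complex_cell_response(simple_map):
    """Pool over simple cells sharing an orientation (max pooling is an assumption)."""
    return float(simple_map.max())

# Hypothetical vertical-bar detector: excitatory centre column, inhibitory flanks.
vertical_kernel = np.array([[-1.0, 2.0, -1.0],
                            [-1.0, 2.0, -1.0],
                            [-1.0, 2.0, -1.0]])

def image_with_vertical_bar(column, size=9):
    """Blank image containing a single full-height vertical bar."""
    img = np.zeros((size, size))
    img[:, column] = 1.0
    return img

simple_left = simple_cell_responses(image_with_vertical_bar(2), vertical_kernel)
simple_right = simple_cell_responses(image_with_vertical_bar(6), vertical_kernel)

# The simple-cell maps peak at different positions for the two bars,
# while the pooled complex-cell output is the same for both.
complex_left = complex_cell_response(simple_left)
complex_right = complex_cell_response(simple_right)
print(complex_left, complex_right)
```

Stacking several such pooling stages is one way to picture the increasing invariance that the hierarchical view proposes, although real cortical pooling need not take this form.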

And indeed, a vast body of research is consistent with this view. Visual information from the retina is sent to the lateral geniculate nucleus of the thalamus, from which it is conveyed to V1, the first cortical area that receives visual information (Felleman & Van Essen, 1991). V1 seems to stand at the base of a set of cortical areas that are arranged in a pyramid fashion and are interconnected by both feedforward and feedback connections (Lamme & Roelfsema, 2000; Supér et al., 2001; Van Rullen, 2008; Van Kleef et al., 2010). From V1 we can distinguish two general streams of information: the dorsal and the ventral stream (Goodale et al., 1991; Goodale & Milner, 1992; Mishkin et al., 1983; Wang et al., 2012). The dorsal stream is a set of areas that directs information dorsally towards the parietal lobe. This stream is implicated in the encoding of movement and spatial relationships and is therefore also called the ‘where’ pathway. The ventral stream sends information ventrally towards the temporal lobe. This stream is thought to be responsible for object recognition and is also called the ‘what’ pathway. Since the focus of this paper is on the recognition of objects, attention will be directed towards the ventral stream of processing.

The ventral stream; sparseness of representation

The visual system applies a processing method in which information ascends through multiple levels. At each level computations take place that are meaningful for the level above. One would logically assume that the lowest level carries out a large number of relatively simple computations, resulting in a non-specific global representation. This information can be conveyed to the level above, which applies more complex computations to further process the visual scene. The result is that at each subsequent level the computations become more complex but fewer computations are needed. And indeed, if we look at the visual system we can observe such a phenomenon.

Consider the three visual areas V1, V2 and V4. V1 is the first cortical area that receives input from the retina and should therefore carry out the simplest computations of the three. Hubel and Wiesel (1959, 1962) already showed that V1 responds most strongly to light stimuli, oriented bars and moving gratings. These are all relatively simple stimuli if we consider the geometrical richness of any visual scene.

Following the reasoning set out before, V2 should then use the information from V1 in a meaningful way. It has been shown that V2 responds most strongly to complex stimuli such as arcs, intersecting lines and angles (Hegdé & Van Essen, 2000). By making use of the detection of oriented bars by V1, V2 can successfully decide whether a corner is present. This information is subsequently propagated to V4, which in turn makes even higher-order decisions. From the already somewhat complex shapes that have been found in V2, V4 constructs objects. V4 is also implicated in the maintenance of color constancy, figure-ground segmentation and covert attention (for a review see Roe et al., 2012). Furthermore, lesions to V4 result in deficits in the detection of identity-preserving transformations of objects. A monkey with a V4 lesion is unable to correctly identify a previously learned object when it is changed in, for example, size or contrast (Schiller, 1995). This indicates that V4 already shows a degree of invariance in the detection of objects and is able to signal the presence of an object irrespective of the way in which it is presented.

The literature provides us with a large body of research that points towards a view in which the decisions that are made become more and more complex at each subsequent level of processing. Consider for example the image of a table. If V1 were presented with this picture it could detect eight lines: its four sides and four legs. From four of these lines V2 could decide that they form a square, for they are connected to each other by four corners. This combination of geometrical elements can be grouped into a single entity, although it is unlikely that the identity of the object is determined at this stage. This last, crucial step of grouping results in a massive reduction of redundancy. A combination of geometrical patterns requires a large amount of cortical processing, while a single object can be represented by a far smaller number of neurons.

When looking at the anatomical organization of the visual cortex we can indeed see that the cortical areas become smaller when following the ventral stream. V1 and V2 do not differ much in size; V1 consists of approximately 190 million neurons while V2 uses ~150 million neurons (Felleman & Van Essen, 1991). Some accounts even claim that V2 is larger than V1 (Levitt et al., 1994). However, based upon the functions V1 and V2 carry out, a large reduction in cortical surface is not expected. Because V2 extracts more information from the simple line segments detected by V1 and does not group these segments into concepts, it would not be surprising if V2 were in fact larger than V1: it has to contain more information without a reduction of redundancy.

However, a large reduction of cortical surface is found between V2 and V4: V4 is less than half the size of V2. This is a sensible reduction in size given the grouping of features into overarching objects. Also, the receptive fields of V4 neurons are larger than those of V1 and V2 neurons. V4 mainly projects to the posterior inferotemporal cortex (pIT). Following this line of reasoning it is sensible to assume that the pIT pools over the complex shapes detected by V4 to enable an even higher order of object recognition. If V4 were to detect a door, a window and a roof, the pIT might group these objects into the concept of a house. This claim is supported by the observation that the pIT is again half the size of V4, indicating that its specificity is of a higher level and that fewer neurons are therefore required. This is also reflected in the finding that the receptive fields of IT neurons are very large and almost always include the fovea (Gross et al., 1969).

So how specific is the information conveyed by each neuron in the IT? A distinction can be made between two mutually exclusive theories regarding neuronal representations. The ‘distributed population coding’ view states that objects or concepts are represented by a large number of neurons. Only when taken together do these populations signal meaningful information concerning object identity (deCharms & Zador, 2000). Alternatively, in the ‘sparse coding’ view objects are represented by far smaller neuronal ensembles. This way of encoding could be beneficial because it is a more efficient way to process complex data. It allows higher levels of processing to read out information from lower levels more easily (Olshausen & Field, 2004).

Which of the two theories most accurately describes the representation of objects and concepts in the IT? And is it possible that at this stage of processing the representation of information has become so sparse that neurons have become ‘grandmother cells’?

Are there Grandmother Cells in the IT?

The idea that the IT would contain highly specific ‘object detectors’ originates from the early accounts of Gross et al. (1969, 1972). They described the discovery of a cell that selectively responded to a hand and a face. The discovery of these highly specific cells argued in favor of the grandmother cell theory that had been formulated several years earlier. The empirical evidence supporting the notion of grandmother cells received a warm welcome amongst its supporters. However, these were case studies of individual neurons. It was unknown whether these neurons were prevalent or just incidental. In the years that followed, more and more evidence came to light that the neurons described by Gross et al. were the exception and not the rule.

Desimone et al. (1984) claim in “the first systematic survey of stimulus selectivity in IT cortex” that IT neurons respond to many different complex shapes and are therefore not highly selective detectors of specific objects. They measured 151 single units in the macaque IT cortex while the animal was passively viewing images presented on a screen. Of the 151 neurons they recorded, twenty were selective for faces and only two responded solely to hands. The remainder of the responsive neurons responded to a large variety of objects.

Since then a large number of studies have been conducted regarding the level of sparseness of representation in the IT cortex. Zoccolan et al. (2007) were interested in the relationship between the specificity of a neuron and its tolerance to object-preserving transformations. From a grandmother cell perspective, one would expect that neurons that are highly selective also show a high level of tolerance. A neuron responding to a specific face should reliably respond to that face regardless of, for example, facial expression.

By presenting a large set of objects, and variations of these objects in size, position, contrast and visual clutter, the selectivity and the tolerance could be determined for each neuron. The majority of the recorded neurons were not selective for one object but responded to a large variety of objects (Fig. 1). More surprisingly, there was a negative correlation between selectivity and tolerance. The neurons that were most specific in their response were most likely to have a low tolerance to object-preserving transformations. For example, a highly specific neuron that responds to a car would only respond to that specific presentation of the car and would cease to respond if the car varied in size. This argues against the concept of functionally important grandmother cells.

Figure 1 | Sparseness of 92 IT neurons; information is not represented in a sparse coding fashion in the IT. Firing rates are normalized to the response to the object the neuron responded to most strongly, and are plotted for each object in order of elicited firing rate. Edited from
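The normalized, rank-ordered firing rate curves described for Figure 1 can be sketched as follows. This is a minimal illustration only: the firing rates are invented, and the half-maximum summary at the end is a crude hypothetical measure introduced here, not the selectivity index used by Zoccolan et al. (2007).

```python
import numpy as np

def rank_ordered_tuning(firing_rates):
    """Normalize rates to the best object and sort them in descending order."""
    rates = np.asarray(firing_rates, dtype=float)
    peak = rates.max()
    if peak <= 0:
        raise ValueError("neuron did not respond to any object")
    return np.sort(rates / peak)[::-1]

def fraction_above_half(curve):
    """Fraction of objects driving the neuron to at least half its peak rate."""
    return float(np.mean(curve >= 0.5))

# Hypothetical firing rates (spikes/s) to six objects, invented for illustration.
broadly_tuned = [20.0, 18.0, 15.0, 12.0, 10.0, 8.0]
grandmother_like = [20.0, 1.0, 0.5, 0.5, 0.0, 0.0]

broad_curve = rank_ordered_tuning(broadly_tuned)
sparse_curve = rank_ordered_tuning(grandmother_like)

# A broadly tuned neuron yields a slowly decaying curve; a grandmother-like
# neuron yields a curve that drops to near zero after the first object.
print(broad_curve, fraction_above_half(broad_curve))
print(sparse_curve, fraction_above_half(sparse_curve))
```

On this toy measure, five of the six objects drive the broadly tuned neuron to at least half its peak rate, against one of six for the grandmother-like neuron.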

The representation of objects using object-manifolds

Presenting a set of images and probing the responses of individual neurons can be considered a bottom-up approach to inferring how the IT represents sensory information. A top-down approach might be a better way of doing this. In this case a theoretical framework concerning the functioning of the visual system first needs to be developed. One can then investigate the nature of the representation of stimuli in the IT, reasoning from the perspective of this theoretical framework.

Such a framework has been developed by DiCarlo and Cox (2007, 2010). They propose that objects in visual space are represented as object manifolds by the neuronal population. Consider the image of a car that is presented to the retina. This event will trigger a specific response in a large number of neurons; this response can be represented as a point in a multi-dimensional space in which the number of dimensions is equal to the number of neurons in the population. Now consider the presentation of the same car in a different orientation (e.g. seen from the side instead of the front). This will elicit a different response in the population, corresponding to a different point in the multi-dimensional space. All the different ways in which the car can be perceived due to changes in size, orientation, illumination, visual clutter, and so on, produce a set of points which together form a sheet (Fig. 2a). This sheet is called an object manifold.
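The construction of an object manifold can be illustrated with a toy population. The sketch below makes several assumptions that are not part of the DiCarlo and Cox framework: 50 neurons, Gaussian tuning to a single viewing angle, and randomly drawn preferred angles. Because only one degree of freedom (viewing angle) is varied, the resulting manifold is a curve rather than a sheet; adding a second variable such as size would turn it into a surface.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50                         # dimensionality of the response space
views = np.linspace(-90.0, 90.0, 37)   # hypothetical viewing angles (degrees)

# Hypothetical tuning: each neuron prefers one viewing angle of the object.
preferred = rng.uniform(-90.0, 90.0, size=n_neurons)
width = 30.0

def population_response(view):
    """One point in the n_neurons-dimensional space for one view of the object."""
    return np.exp(-0.5 * ((view - preferred) / width) ** 2)

# Each row is the population response to one view: one point on the manifold.
manifold = np.stack([population_response(v) for v in views])

# Neighbouring views map to nearby points, distant views to distant points,
# so the points trace out a continuous curve through the response space.
step_distance = float(np.linalg.norm(manifold[1] - manifold[0]))
far_distance = float(np.linalg.norm(manifold[-1] - manifold[0]))
print(manifold.shape, step_distance < far_distance)
```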

DiCarlo and Cox use this approach to object recognition to tackle its biggest challenge: the invariance problem. This is the problem pointed out earlier: one object can produce an infinite number of images on the retina, while the brain still assigns all these projections to the same object.

So how is the brain capable of doing this? In V1, the manifolds of multiple objects are extremely tangled, like sheets of paper crumpled together (Fig. 2b). This has been derived by taking the response functions of typical V1 neurons and presenting them with the faces of two individuals from all different viewpoints. However, if this is done for a group of typical IT neurons a completely different picture emerges. The manifolds for the two individuals are flattened and untangled at this stage of visual processing (Fig. 2c). This allows a simple linear classifier to quickly and accurately decide which category an object belongs to (Hung et al., 2005).
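Why untangling matters for a linear classifier can be shown with a two-dimensional toy example. Everything in this sketch is an assumption made for illustration: the two concentric rings stand in for tangled manifolds, the re-description of each point by its radius stands in for an untangling transformation, and the least-squares fit is just one simple way to obtain a linear read-out. None of this is claimed to be the computation performed between V1 and IT.

```python
import numpy as np

def fit_linear_classifier(points, labels):
    """Least-squares linear read-out; the last entry of the solution is the bias."""
    design = np.hstack([points, np.ones((len(points), 1))])
    solution, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return solution

def accuracy(points, labels, solution):
    """Fraction of points whose predicted sign matches the label."""
    design = np.hstack([points, np.ones((len(points), 1))])
    predicted = np.where(design @ solution >= 0.0, 1.0, -1.0)
    return float(np.mean(predicted == labels))

angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
labels = np.concatenate([np.ones(100), -np.ones(100)])  # object A = +1, object B = -1

# "Tangled" toy representation: two concentric rings, one per object.
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3.0 * inner
tangled = np.vstack([inner, outer])

# "Untangled" toy representation: the same points described by their radius,
# which places the two objects on opposite sides of a single threshold.
untangled = np.linalg.norm(tangled, axis=1, keepdims=True)

acc_tangled = accuracy(tangled, labels, fit_linear_classifier(tangled, labels))
acc_untangled = accuracy(untangled, labels, fit_linear_classifier(untangled, labels))
print(acc_tangled, acc_untangled)
```

In the tangled layout no straight line can separate the rings (the best possible line classifies roughly 70% of the points correctly), whereas in the untangled layout a single threshold separates the two objects perfectly.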

DiCarlo and Cox argue that this way of interpreting visual processing will lead to an understanding of the computational algorithms that underlie object recognition. The question that remains is of what order of magnitude the population encoding an individual object is. The theoretical framework itself makes no statement on this issue. An object manifold can be formed by twelve neurons, but it can also be the result of the orchestrated activity of a thousand neurons.

Figure 2 | Object manifolds, edited from DiCarlo and Cox (2007). An object manifold is a 2-dimensional sheet in a multi-dimensional space in which the number of dimensions is defined by the number of neurons in the population that encodes said object. a Every variation in pose of a face produces a different point in the multi-dimensional space; taken together, these points form a 2-dimensional surface. When more degrees of freedom are added (e.g. size, color), the surface becomes more complex. b The manifolds of two faces in the V1 representation, resulting from the simulation of 500 V1 neurons. The object manifolds of the two faces (red and blue) are tangled together and are impossible to separate. c However, the simulation of 500 IT neurons resulted in two manifolds that were untangled in the 500-dimensional space. These manifolds can be easily separated by a simple linear classifier.

The idea of a linear classifier that decides which category an object belongs to was tested in practice by Hung et al. (2005). They used the information derived from a large number of multi-unit recording sites in the macaque IT and tested whether a linear classifier could reliably perform this categorization.

When the linear classifier received the information from 100 multi-unit recording sites in the macaque IT it was able to correctly identify a presented object out of 77 possibilities with a hit rate of 49%. However, when the number of sites went up to 256, the hit rate rose to 72%. The classifier’s performance increased more or less linearly with the logarithm of the number of recording sites it used (Fig. 3). This indicates that object recognition is supported by a distributed population of at least several hundred neurons.
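The log-linear relationship can be made explicit with a short worked calculation. The sketch below fits a straight line through only the two operating points quoted above (100 sites at 49% and 256 sites at 72%); the model form, the function name and the clipping to the range 0 to 1 are assumptions made here, and a two-point fit says nothing about how well the relationship holds elsewhere.

```python
import math

# The two operating points reported in the text (number of sites, hit rate).
sites_a, hit_a = 100, 0.49
sites_b, hit_b = 256, 0.72

# Assumed model: hit_rate = intercept + slope * log10(number_of_sites).
slope = (hit_b - hit_a) / (math.log10(sites_b) - math.log10(sites_a))
intercept = hit_a - slope * math.log10(sites_a)

def predicted_hit_rate(n_sites):
    """Hit rate under the assumed log-linear model, clipped to [0, 1]."""
    raw = intercept + slope * math.log10(n_sites)
    return min(1.0, max(0.0, raw))

chance_level = 1.0 / 77.0                   # guessing among 77 objects
gain_per_doubling = slope * math.log10(2.0)  # increase in hit rate per doubling

print(round(slope, 3), round(gain_per_doubling, 3), round(chance_level, 3))
```

Under this assumed model each doubling of the number of recording sites adds roughly 17 percentage points to the hit rate, well above the chance level of about 1.3% for 77 objects.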

Taken together, these lines of evidence argue against the existence of grandmother-cell-like neurons in the IT. However, following the reasoning that the sparseness of representation increases when ascending the levels of visual processing, it is sensible to turn our attention to the level above the IT. The IT primarily projects to the medial temporal lobe (MTL), making this the next most likely candidate to contain grandmother cells.

Grandmother cells in the medial temporal lobe

The medial temporal lobe consists of the perirhinal cortex, parahippocampal cortex, entorhinal cortex, the amygdala and the hippocampus (Suzuki, 1996). The perirhinal and parahippocampal cortex receive widespread inputs from sensory cortices and convey this information to the entorhinal cortex, the gateway to the hippocampus (for a review see Squire et al., 2004).

Patients who suffer from severe epilepsy may be implanted with electrodes to determine the precise origin of their seizures. These electrodes are implanted solely on clinical grounds and are often placed in the MTL, for this is the most common source of epileptic seizures. With the patient’s consent these electrodes can be used to obtain single cell recordings while the patient is awake and passively viewing images on a screen. For the first time, this allowed neuroscientists to probe the responses of individual neurons in humans (Fried et al., 1997).

The first notable finding that arose from this methodological approach was the discovery of the popularly named ‘Jennifer Aniston neuron’. Quiroga et al. (2005) recorded 993 units from eight patients suffering from epilepsy while they were viewing images of well-known actors, animals and landmarks. They found a neuron that responded solely to pictures of actress Jennifer Aniston, regardless of facial expression, pose, illumination or other identity-preserving transformations (Fig. 4). The properties of this neuron closely resemble the postulated properties of a grandmother cell. The neuron is highly invariant to object-preserving transformations and is extremely selective in its response. The neuron even ceased to respond to pictures of Jennifer Aniston together with her husband at the time, Brad Pitt.

Figure 3 | The performance of a linear classifier as a function of the number of multi-unit recording sites it received information from. The red line depicts performance in the categorization into eight possible groups and the blue line the identification among 77 possible objects.

Of the 993 units Quiroga and colleagues measured, 132 responded significantly to at least one image that was presented. Among these 132 responsive units, they found 51 units that were selective and invariant towards one individual, landmark, animal or object. The majority of these neurons responded to individuals (38 out of 51) as opposed to objects and landmarks.
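To put these counts in proportion, the fractions can be computed directly from the numbers quoted above. The variable names are chosen here for illustration; only the four counts come from the text.

```python
# Counts as quoted in the text for Quiroga et al. (2005).
recorded_units = 993
responsive_units = 132
selective_invariant_units = 51
person_selective_units = 38

# Share of all recorded units that responded to at least one image.
responsive_fraction = responsive_units / recorded_units

# Share of selective and invariant units among responsive and among all units.
selective_fraction_of_responsive = selective_invariant_units / responsive_units
selective_fraction_of_all = selective_invariant_units / recorded_units

# Share of the selective and invariant units that were tuned to individuals.
person_fraction_of_selective = person_selective_units / selective_invariant_units

for name, value in [
    ("responsive / recorded", responsive_fraction),
    ("selective / responsive", selective_fraction_of_responsive),
    ("selective / recorded", selective_fraction_of_all),
    ("persons / selective", person_fraction_of_selective),
]:
    print(f"{name}: {value:.1%}")
```

Roughly 13% of the recorded units were responsive, about 39% of those were selective and invariant (about 5% of all recorded units), and about three quarters of the selective units were tuned to individuals.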

The discovery of these neurons seems to directly support the grandmother cell theory. However, remember that Gross’ discovery of face-selective cells was also received as evidence for grandmother cells, and that this claim was weakened after more careful research. Does something similar apply to this study? To further explore the notion of grandmother cells residing in the MTL we shall first go over the theoretical side of the question. Is the existence of grandmother cells biologically plausible given our current understanding of information processing in the brain?

Arguments in favor of grandmother-cell-like coding

Taking into account the hierarchical fashion in which visual information is processed, the existence of grandmother cells is a logical conclusion. At each level the specialization of neuronal representations increases by pooling over the inputs from the level below. Following this line of reasoning it is not unimaginable that ultimately there is one neuron that flags the presence of a specific object by pooling its inputs over the neurons that encode the individual features of said object.

In the view of DiCarlo & Cox (2007) a simple linear classifier is used to identify and categorize objects. As mentioned, the linear classifier needs the input of several hundred neurons to reliably make a decision concerning object identity. However, no claim is made regarding the number of neurons this linear classifier consists of. Since the classifier must take into account all information about the object’s characteristics, it is a logical assumption that this classifier consists of only one neuron. This is the most efficient way of making a decision. The single neuron can simultaneously read out the activity of all the neurons that are involved in coding the characteristics of the objects present in the visual field and make a decision. The single neuron would be tuned to one specific object or individual, and by becoming active it would signal that this object is present in the visual scene.

Another advantage of this type of coding is that it saves energy. Reasoning from an evolutionary position, a solution that requires the least amount of energy would always be favored, provided that it has no other disadvantages.

Arguments against the existence of grandmother cells

Storing information in a grandmother cell like fashion results in a situation in which there has to be a single cell for every object that has been encountered in one’s life. Even if we only take into account the objects or persons that can be recognized again at a later point in time, this number will rise into the millions. Considering the total number of neurons in the brain this would not be problematic in itself, but the real problem resides in the fact that this large fraction of neurons would be relatively useless. A neuron potentially has much more use than solely recognizing one object.

Figure 4 | Highly selective and invariant neurons found in the medial temporal lobe of humans. a Firing rates of an individual neuron elicited by the presentation of different images. The neuron responded strongly to presentations of actress Jennifer Aniston as opposed to being almost completely silent during the presentation of other well-known people. b Depiction of the number of spikes to all images that were shown. Red bars represent the number of spikes elicited by the presentation of pictures of Jennifer Aniston and the blue bars the response to all other images.

If information is stored on the level of one neuron, a problem arises if this neuron gets lost. For various reasons neurons can die or stop functioning altogether. If this happens to a neuron that was supposed to recognize Barack Obama, a perfectly healthy person would suddenly lose the ability to recognize the president of the United States. Not only is this situation highly unlikely, there are, to the author’s knowledge, no documented cases of such occurrences.

Another practical problem with the grandmother cell theory is that, with rare exceptions, one neuron is physically incapable of exciting other neurons. For this, the simultaneous input from many neurons is needed. If one neuron had to signal the presence of an individual in a sensory scene it would simply not be able to do so.

Finally, for the conscious perception of a visual scene all the elements that are present in the scene must be extracted. Consider a situation in which a population of hundreds of millions would contain fifty units that signal the individual elements that compose the current visual scene. This extremely small fraction of neurons must somehow be found amongst millions of others. This is a situation that is almost impossible to maintain.

The paradox

The arguments against grandmother cell coding are so strong that this way of processing information becomes highly unlikely. However, grandmother cell like neurons have been found in the human MTL. There are two possible explanations for this situation. Firstly, it is possible that the arguments used to rule out grandmother cells are flawed and the brain does work in such a way. The other possibility is that the neurons that have been found seem to have grandmother cell properties at first glance but that their true function actually extends beyond this.

The latter seems the most plausible possibility because the existence of grandmother cells is in complete contradiction with our current understanding of how the brain processes information. Moreover, because one neuron simply cannot excite other neurons, this way of representing information is even physically impossible.

Is the Jennifer Aniston neuron a Grandmother Cell?

From a theoretical viewpoint it is unlikely that the highly specialized cells that have been found in the MTL are actual grandmother cells as described in the 60’s. And indeed, although the cells that have been found seem to resemble grandmother cells in every aspect, several arguments can be put forth to greatly weaken this claim.

The neurons that have been found reside exclusively in the medial temporal lobe. This collection of areas is closely linked with memory function (Squire et al., 2004). The original idea of grandmother cells was that they mediated recognition of an object or individual. The anatomical location where Quiroga and his co-workers found this ‘Jennifer Aniston neuron’ seems to suggest that this particular neuron serves a different purpose than recognition per se.

Although the very first account of a cell with grandmother cell like properties defined this cell as multi-modal, all later notions describe it as responding primarily to visual information. Since the general idea was that grandmother cells mediate recognition this is a sensible assumption. However, the medial temporal lobe receives input from all sensory areas and is not a purely visual area. It could therefore very well be that the highly selective neurons that have been found in the MTL also respond to other modalities. For example, it is not unlikely that the ‘Jennifer Aniston neuron’ also responds to hearing the voice of Jennifer Aniston, for face and voice both belong to the same individual. This has, to the author’s knowledge, not been tested and can therefore not be confirmed.

Furthermore, it is very likely that neurons that were found to be selective for one individual would also respond to other persons. During a recording session, which lasts on average 30 minutes, only a limited number of pictures can be shown. It is impossible to present pictures of all individuals known to the subject, therefore one cannot rule out the possibility that neurons that seem to respond solely to one individual actually respond to a wide variety of individuals.
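The sampling argument above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the pictures shown in a session are a random draw without replacement from all individuals known to the subject, and asks how likely it is that none of a neuron's other preferred individuals happen to be shown. The numbers used (10,000 known individuals, 50 other preferred individuals, 100 pictures) are hypothetical and are not taken from the recordings.

```python
from math import comb


def prob_miss_all(known, other_preferred, shown):
    """Probability that a random set of `shown` individuals, drawn without
    replacement from `known` individuals, contains none of the
    `other_preferred` individuals the neuron would also respond to."""
    return comb(known - other_preferred, shown) / comb(known, shown)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the argument.
    p = prob_miss_all(known=10_000, other_preferred=50, shown=100)
    print(f"chance the neuron looks selective for one individual: {p:.2f}")
```

Under these assumed numbers the probability is well above one half, so a neuron that in fact responds to dozens of individuals could easily appear to respond to only one within a single session.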

Among the selective neurons there were actually a few that did respond to more than one object or person. A neuron that responded to pictures of Jennifer Aniston also responded to Lisa Kudrow, another actress from the TV series ‘Friends’. Another neuron responded to pictures of the Eiffel Tower and the Tower of Pisa. Yet another responded to different images of characters from the cartoon ‘The Simpsons’. It is important to note that this overlap only occurs between entities that are related to each other.

Finally, Waydo et al. (2006) used the data from Quiroga and colleagues to estimate the level of sparseness using a Bayesian probability argument. They defined the sparseness a as the probability that a neuron responds to a random stimulus drawn from a population of U possible stimuli. If stimuli were represented in a grandmother cell fashion then a = 1/U. From the data set of 1425 MTL units from 34 experimental sessions, a was estimated at 0.54%. This seems a small number, but considering the number of neurons in the MTL and the possible number of stimuli in the world it would still result in a situation in which one stimulus is represented by many neurons. Moreover, this value predicts that one neuron would respond to 50 - 150 different entities.
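The step from a sparseness of 0.54% to a number of entities per neuron can be sketched as a simple multiplication: if a neuron responds to a fraction a of U stimuli, it responds to roughly a × U of them. The repertoire sizes used below (10,000 and 30,000 recognizable entities) are assumptions made for this illustration and are not stated in the text; with them the result is of the same order as the 50 to 150 entities quoted above.

```python
# Sparseness reported in the text (Waydo et al., 2006): a = 0.54 %.
SPARSENESS = 0.0054


def entities_per_neuron(sparseness, repertoire_size):
    """Expected number of stimuli a neuron responds to: a * U."""
    return sparseness * repertoire_size


def grandmother_sparseness(repertoire_size):
    """Sparseness expected under strict grandmother cell coding: a = 1/U."""
    return 1.0 / repertoire_size


if __name__ == "__main__":
    # Hypothetical repertoire sizes U, assumed for illustration only.
    for repertoire_size in (10_000, 30_000):
        n = entities_per_neuron(SPARSENESS, repertoire_size)
        a_gm = grandmother_sparseness(repertoire_size)
        print(f"U = {repertoire_size}: about {n:.0f} entities per neuron; "
              f"grandmother coding would need a = {a_gm:.6f}")
```

For either assumed repertoire size, the measured sparseness is far larger than the 1/U that strict grandmother cell coding would require.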

From these arguments it can be reasoned that the neurons that have been found by Quiroga et al. are not grandmother cells in the classical interpretation. Although the representation of stimuli seems to be very sparse, a representation as described by the grandmother cell theory is highly implausible. The function of these neurons also seems to extend beyond the sole representation of stimuli.

What is the function of these highly selective units?

An interesting phenomenon can be observed in the timing of the responses elicited by images projected onto the retina. From the retina the visual information has to ascend through various stages of processing, and each stage requires time to process the information before it can be sent to the next level. The time it takes for each stage to start responding to a presented visual stimulus can be measured (Fig. 5). In macaques, the neurons in the IT start responding to a presented visual scene after ~100 ms. The IT is the last purely visual area and can be considered the end-point of visual processing. This time period is the summation of all the delays of the various stages of processing. The delay each stage adds to the total amount of time it takes to process a visual scene has been estimated at approximately 10 ms (Thorpe & Fabre-Thorpe, 2001).

Figure 5 | The timing of responses as measured in macaque studies. There is a successive lag of approximately 10 ms between the areas that compose the ventral stream of visual processing. An excessively long time delay is observed in the propagation of information from the anterior inferotemporal cortex (aIT) to the medial temporal lobe (MTL). Adapted from Thorpe &

It is interesting to note that the times reported in Figure 5 are slightly higher in humans than in monkeys. In humans the IT and the prefrontal cortex (PFC) are reached only after a delay of ~150 ms, compared to ~100 ms in monkeys. One possible explanation for this difference is that the human brain is simply bigger than the monkey brain and that information just takes more time to travel from area to area.

The reaction times of monkeys in go/no-go categorization tasks are also faster than those of humans. In an average time of 250 to 260 ms monkeys can report whether an animal is present in a briefly flashed image, and some responses are as quick as 180 ms. In humans the fastest recorded responses occur only at 230 ms after stimulus presentation, which is 50 ms slower than in monkeys (Fabre-Thorpe et al., 1998).

Since the time each level of processing requires to propagate its information to the level above is roughly 10 ms, one would expect that MTL neurons would start to fire approximately 160 ms after stimulus onset. This is because the IT is reached after ~150 ms and the IT directly outputs to the MTL. However, the first observed responses in the MTL are around 300 ms after stimulus onset. This is an excessively long lag compared to all the delays between the preceding visual areas.
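The latency argument amounts to a short piece of arithmetic, sketched below using only the approximate values quoted in the text. The variable names are chosen for this illustration.

```python
# Approximate latencies quoted in the text, in milliseconds.
IT_ONSET_MS = 150            # human IT starts responding after ~150 ms
PER_STAGE_DELAY_MS = 10      # estimated delay added by each processing stage
OBSERVED_MTL_ONSET_MS = 300  # first observed MTL responses

# If the IT-to-MTL step were an ordinary stage, the MTL should respond here.
expected_mtl_onset_ms = IT_ONSET_MS + PER_STAGE_DELAY_MS

# Extra delay that the ordinary per-stage estimate does not account for.
unexplained_lag_ms = OBSERVED_MTL_ONSET_MS - expected_mtl_onset_ms

if __name__ == "__main__":
    print(f"expected MTL onset: {expected_mtl_onset_ms} ms")
    print(f"unexplained lag: {unexplained_lag_ms} ms")
```

The roughly 140 ms that remain unexplained are about fourteen times the ordinary per-stage delay.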

What is happening during this time span? Quiroga et al. (2008) argue that this additional time is being used to transform purely visual percepts into concepts that can be stored in memory. It is sensible to memorize concepts (e.g. Steve was playing soccer) instead of the entire visual scene with all its details. The visual scene can then be retrieved again from the concepts that were present in the scene (namely the concept of Steve and the concept of soccer).

Their argument is based upon the fact that humans recognize faces in ~230 ms, which is even before the first MTL neurons start to fire. This would entail that the MTL neurons are not involved in recognition, because recognition takes place before the neurons even start firing.

However, these behavioral responses are measured using go/no-go tasks in which subjects have to decide whether a face is present in the picture. It is likely that the processes that decide whether a face is present act much faster than the process that determines the precise identity of the face. Therefore, the decision whether there is a face present in the scene can be taken very quickly and without the information concerning its precise identity.

Still, the notion that the additional lag between the IT and the MTL is caused by the processes that transform visual information into concepts is a very sensible one. The abstraction of visual input is crucial for all subsequent brain processes, including the storing and retrieving of memories.

Consider the visual presentation of your car. The decision that it is a car can be made by the visual system, most likely on the level of the IT. However, this information is useless to the brain unless it is linked with the concept of a car and more specifically the concept of your car. The concept of a car would for example state that it is a means of transportation, you can get in it, it runs on fuel, etcetera. The concept of your car would tell the brain that it is your property, you have paid for it, it is your responsibility, etcetera. Without this kind of information the brain cannot make sense of the world. In other words, the transformation into concepts gives meaning to visual stimuli.

An indication that the highly selective MTL neurons encode concepts is, for example, that the ‘Halle Berry’ neuron found by Quiroga et al. (2005) not only responded to pictures of her but also to her written name. This could be because her name is part of the concept of Halle Berry and therefore also elicits a response.

Another advantage of abstraction is that it enables the formation of associations and higher order concepts. The MTL neurons are probably involved in this as well. We have seen that there were neurons in the MTL that responded to both Jennifer Aniston and Lisa Kudrow. These two persons can be linked with each other on several different levels. They can be classified into the category of actresses; this kind of category specific activation has indeed been found in single units in the MTL (Kreiman et al., 2000). More specifically, they can be grouped in the higher order concept of ‘Friends’. This concept would contain the information that it is a television show and would trigger memories concerning this show together with your personal opinion. By associating concepts and grouping them in higher order concepts the process of storage and retrieval of memories can be facilitated.

In short, the highly selective MTL neurons most likely respond to concepts of objects and persons. This enables the brain to attribute meaning to visual input and establish associations between different concepts. These concepts can be stored in memory but can also trigger memory traces to link current events with previous occurrences.

Conclusion

Grandmother cells or gnostic cells are neurons that respond uniquely to one object or individual that is present in the visual field. Until recently single units could only be recorded in monkeys but nowadays this is also possible in patients who are implanted with clinical electrodes to determine the source of their epilepsy. The use of this technique led to the discovery of selective and invariant neurons in the MTL that resemble grandmother cells to a high degree. For example, a neuron has been found that responds solely to the presentation of the face of actress Jennifer Aniston.

The representation of objects or individuals becomes sparser when ascending the levels of visual processing. However, that this sparseness would increase to the level of a single neuron is unlikely for several reasons. (i) There would have to be a single neuron for every object we have ever encountered in life. (ii) If one neuron gets lost we would suddenly lose the capability of identifying a certain object. (iii) One neuron is incapable of exciting other neurons. (iv) This one neuron would have to be found amongst millions of others.

Herein resides a paradox: on the one hand we have a number of arguments that would render the existence of grandmother cells impossible, and on the other hand we have the discovery of precisely those neurons that theoretically could not exist.

The most likely explanation is that the neurons that have been found in the MTL closely resemble grandmother cells but are not actual grandmother cells. Several observations support this notion. (i) The MTL is closely linked with memory function and not the representation of visual input. It also receives widespread input from different modalities and is not a purely visual area. (ii) Some of the neurons that have been found responded to more than one individual or object. (iii) Probabilistic estimations based upon empirical data indicate that one neuron would respond to 50 to 150 different entities.

It is evident that the representation of stimuli in the MTL is very sparse. However, a combination of theoretical and empirical arguments negates a one-to-one mapping of stimuli to neurons.

Instead of representing one single object in the visual field, the neurons that have been found in the MTL are probably involved in the encoding of concepts. They can respond to a wide variety of concepts that are linked together to form high level associations. The transformation of visual percepts into concepts and associations enables the brain to store visual events in memory and to link them with previous events. Moreover, linking a perceived object to the abstract concept of that object gives meaning to the presence of the object in the visual field. The brain cannot use the visual information that there is a chair present in the visual field if it is not linked with the concept of a chair, which tells the brain that you can sit on it.

In summary, the recent human single cell recordings are not compatible with the view described by the grandmother cell theory. The highly selective neurons that have been found using this technique are most likely not grandmother cells, for they respond to more than one stimulus and they serve a different purpose.

References

Barlow, H.B. (1953). Summation and Inhibition in the Frog’s Retina. J Physiol 119, 69–88.

Barlow, H.B. (1972). Single units and sensation: a neuron doctrine for perceptual psychology. Perception 1, 371–394.

Boussaoud, D., Desimone, R., and Ungerleider, L.G. (1991). Visual topography of area TEO in the macaque. J. Comp. Neurol. 306, 554–575.

Bowers, J.S. (2009). On the biological plausibility of grandmother cells: Implications for neural network theories in psychology and neuroscience. Psychological Review 116, 220–251.

Cowey, A., and Gross, C.G. (1970). Effects of foveal prestriate and inferotemporal lesions on visual discrimination by rhesus monkeys. Experimental Brain Research 11, 128–144.

deCharms, R.C., and Zador, A. (2000). Neural Representation and the Cortical Code. Annual Review of Neuroscience 23, 613–647.

Desimone, R., Albright, T.D., Gross, C.G., and Bruce, C. (1984). Stimulus-Selective Properties of Inferior Temporal Neurons in the Macaque. J. Neurosci. 4, 2051–2062.

DiCarlo, J.J., and Cox, D.D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences 11, 333–341.

DiCarlo, J.J., Zoccolan, D., and Rust, N.C. (2012). How Does the Brain Solve Visual Object Recognition? Neuron 73, 415–434.

Fabre-Thorpe, M., Delorme, A., Marlot, C., and Thorpe, S. (2001). A Limit to the Speed of Processing in Ultra-Rapid Visual Categorization of Novel Natural Scenes. Journal of Cognitive Neuroscience 13, 171–180.

Fabre-Thorpe, M., Richard, G., and Thorpe, S.J. (1998). Rapid categorization of natural images by rhesus monkeys. Neuroreport 9, 303–308.

Felleman, D.J., and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1, 1–47.

Fried, I., MacDonald, K.A., and Wilson, C.L. (1997). Single Neuron Activity in Human Hippocampus and Amygdala during Recognition of Faces and Objects. Neuron 18, 753–765.

Goodale, M.A., Milner, A.D., Jakobson, L.S., and Carey, D.P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature 349, 154–156.

Goodale, M.A., and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences 15, 20–25.

Gross, C.G. (1994). How Inferior Temporal Cortex Became a Visual Area. Cerebral Cortex 4, 455–469.

Gross, C.G. (2002). Genealogy of the “Grandmother Cell.” The Neuroscientist 8, 512–518.

Gross, C.G., Bender, D.B., and Rocha-Miranda, C.E. (1969). Visual Receptive Fields of Neurons in Inferotemporal Cortex of the Monkey. Science 166, 1303–1306.

Gross, C.G., Rocha-Miranda, C.E., and Bender, D.B. (1972). Visual Properties of Neurons in Inferotemporal Cortex of the Macaque. J Neurophysiol 35, 96–111.

Gross, C.G., and Sergent, J. (1992). Face recognition. Current Opinion in Neurobiology 2, 156–161.

Hegdé, J., and Van Essen, D.C. (2000). Selectivity for Complex Shapes in Primate Visual Area V2. J. Neurosci. 20, RC61–RC61.

Hubel, D.H., and Wiesel, T.N. (1959). Receptive fields of single neurons in the cat’s striate cortex. The Journal of Physiology 148, 574–591.

Hubel, D.H., and Wiesel, T.N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology 160, 106–154.

Hung, C.P., Kreiman, G., Poggio, T., and DiCarlo, J.J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866.

Iwai, E., and Mishkin, M. (1969). Further evidence on the locus of the visual area in the temporal lobe of the monkey. Experimental Neurology 25, 585–594.

Kiani, R., Esteky, H., Mirpour, K., and Tanaka, K. (2007). Object Category Structure in Response Patterns of Neuronal Population in Monkey Inferior Temporal Cortex. Journal of Neurophysiology 97, 4296–4309.

Konorski, J. (1967). Integrative Activity of the Brain. An Interdisciplinary Approach. (University of Chicago Press, Chicago).

Kreiman, G., Koch, C., and Fried, I. (2000). Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neuroscience 3, 946–953.

Lamme, V.A.F., and Roelfsema, P.R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences 23, 571–579.

Lehky, S.R., Kiani, R., Esteky, H., and Tanaka, K. (2011). Statistics of visual responses in primate inferotemporal cortex to object stimuli. Journal of Neurophysiology 106, 1097–1117.

Leveroni, C.L., Seidenberg, M., Mayer, A.R., Mead, L.A., Binder, J.R., and Rao, S.M. (2000). Neural Systems Underlying the Recognition of Familiar and Newly Learned Faces. J. Neurosci. 20, 878–886.

Levitt, J.B., Kiper, D.C., and Movshon, J.A. (1994). Receptive Fields and Functional Architecture of Macaque V2. J Neurophysiol 71, 2517–2542.

Mishkin, M., Ungerleider, L.G., and Macko, K.A. (1983). Object vision and spatial vision: two cortical pathways. Tins 6, 414–417.

Miyashita, Y., and Chang, H.S. (1988). Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature 331, 68–70.

Olshausen, B.A., and Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609.

Olshausen, B.A., and Field, D.J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology 14, 481–487.

Orban, G.A. (2008). Higher Order Visual Processing in Macaque Extrastriate Cortex. Physiol Rev 88, 59–89.

Quiroga, R.Q., Kreiman, G., Koch, C., and Fried, I. (2008a). Sparse but not “Grandmother-cell” coding in the medial temporal lobe. Trends in Cognitive Sciences 12, 87–91.

Quiroga, R.Q., Mukamel, R., Isham, E.A., Malach, R., and Fried, I. (2008b). Human single-neuron responses at the threshold of conscious recognition. Proc Natl Acad Sci U S A 105, 3599–3604.

Quiroga, R.Q., Reddy, L., Koch, C., and Fried, I. (2007). Decoding Visual Inputs From Multiple Neurons in the Human Temporal Lobe. Journal of Neurophysiology 98, 1997–2007.

Quiroga, R.Q., Reddy, L., Kreiman, G., Koch, C., and Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature 435, 1102–1107.

Roe, A.W., Chelazzi, L., Connor, C.E., Conway, B.R., Fujita, I., Gallant, J.L., Lu, H., and Vanduffel, W. (2012). Toward a Unified Theory of Visual Area V4. Neuron 74, 12–29.

Schiller, P.H. (1995). Effect of lesions in visual cortical area V4 on the recognition of transformed objects. Nature 376, 342–344.

Sherrington, C.S. (1941). Man on His Nature (Cambridge University Press).

Shipp, S., Watson, J.D.G., Frackowiak, R.S.J., and Zeki, S. (1995). Retinotopic Maps in Human Prestriate Visual Cortex: The Demarcation of Areas V2 and V3. NeuroImage 2, 125–132.

Smith, S., and Häusser, M. (2010). Parallel processing of visual space by neighboring neurons in mouse visual cortex. Nat Neurosci 13, 1144–1149.

Squire, L.R., Stark, C.E.L., and Clark, R.E. (2004). The Medial Temporal Lobe. Annual Review of Neuroscience 27, 279–306.

Sugase, Y., Yamane, S., Ueno, S., and Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature 400, 869–873.

Suzuki, W.A. (1996). Neuroanatomy of the monkey entorhinal, perirhinal and parahippocampal cortices: Organization of cortical inputs and interconnections with amygdala and striatum. Seminars in Neuroscience 8, 3–12.

Thorpe, S.J., and Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science 291, 260–263.

Wang, G., Obama, S., Yamashita, W., Sugihara, T., and Tanaka, K. (2005). Prior experience of rotation is not required for recognizing objects seen from different angles. Nat Neurosci 8, 1768–1775.

Wang, Q., Sporns, O., and Burkhalter, A. (2012). Network Analysis of Corticocortical Connections Reveals Ventral and Dorsal Processing Streams in Mouse Visual Cortex. J. Neurosci. 32, 4386–4399.

Waydo, S., and Koch, C. (2008). Unsupervised Learning of Individuals and Categories from Images. Neural Computation 20, 1165–1178.

Waydo, S., Kraskov, A., Quian Quiroga, R., Fried, I., and Koch, C. (2006). Sparse Representation in the Human Medial Temporal Lobe. J. Neurosci. 26, 10232–10234.

Weiskrantz, L., and Saunders, R.C. (1984). Impairments of visual object transforms in monkeys. Brain 107 (Pt 4), 1033–1072.

Westerberg, C.E., Voss, J.L., Reber, P.J., and Paller, K.A. (2011). Medial temporal contributions to successful face-name learning. Human Brain Mapping.

Zoccolan, D., Kouh, M., Poggio, T., and DiCarlo, J.J. (2007). Trade-Off Between Object Selectivity and Tolerance in Monkey Inferotemporal Cortex. J. Neurosci. 27, 12292–12307.
