Publications by authors named "Tomaso Poggio"

41 Publications

Theoretical issues in deep networks.

Proc Natl Acad Sci U S A 2020 12 9;117(48):30039-30045. Epub 2020 Jun 9.

Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139.

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A theoretical characterization of deep networks should answer questions about their approximation power, the dynamics of optimization, and good out-of-sample performance, despite overparameterization and the absence of explicit regularization. We review our recent results toward this goal. In approximation theory, both shallow and deep networks are known to approximate any continuous function at an exponential cost. However, we proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can avoid the curse of dimensionality. In characterizing minimization of the empirical exponential loss we consider the gradient flow of the weight directions rather than the weights themselves, since the relevant function underlying classification corresponds to normalized networks. The dynamics of normalized weights turn out to be equivalent to those of the constrained problem of minimizing the loss subject to a unit norm constraint. In particular, the dynamics of typical gradient descent have the same critical points as the constrained problem. Thus there is implicit regularization in training deep networks under exponential-type loss functions during gradient flow. As a consequence, the critical points correspond to minimum norm infima of the loss. This result is especially relevant because it has been recently shown that, for overparameterized models, selection of a minimum norm solution optimizes cross-validation leave-one-out stability and thereby the expected error. Thus our results imply that gradient descent in deep networks minimizes the expected error.
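The normalized-weight view described above can be illustrated in a toy setting. The sketch below uses a linear classifier under the exponential loss on an invented 2D dataset (not the deep-network case the paper treats): gradient descent keeps growing the weight norm, while the weight direction, the quantity relevant for classification, converges.

```python
import numpy as np

# Toy illustration of the normalized-weight view, for a *linear* classifier
# under the exponential loss. The 2D dataset and all parameters are invented
# for the demo; the paper's analysis concerns deep networks.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.3, (20, 2)),     # class +1
               rng.normal([-2.0, -2.0], 0.3, (20, 2))])  # class -1
y = np.array([1.0] * 20 + [-1.0] * 20)

w = np.array([0.1, -0.05])
lr = 0.01
directions = []
for t in range(2000):
    margins = y * (X @ w)
    grad = -(y * np.exp(-margins)) @ X   # gradient of sum_i exp(-y_i w.x_i)
    w = w - lr * grad
    directions.append(w / np.linalg.norm(w))

# The norm keeps growing, but the direction w/||w|| settles down:
final_norm = np.linalg.norm(w)
drift = np.linalg.norm(directions[-1] - directions[-100])
```

On separable data the exponential loss has no finite minimizer, so the norm diverges slowly while the direction approaches a fixed classifier; that separation of norm and direction is what makes the normalized dynamics the natural object to study.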
http://dx.doi.org/10.1073/pnas.1907369117
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7720221
December 2020

Complexity control by gradient descent in deep networks.

Nat Commun 2020 02 24;11(1):1027. Epub 2020 Feb 24.

Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, USA.

Overparametrized deep networks predict well, despite the lack of an explicit complexity control during training, such as an explicit regularization term. For exponential-type loss functions, we solve this puzzle by showing an effective regularization effect of gradient descent in terms of the normalized weights that are relevant for classification.
http://dx.doi.org/10.1038/s41467-020-14663-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7039878
February 2020

Scale and translation-invariance for novel objects in human vision.

Sci Rep 2020 Jan 29;10(1):1411. Epub 2020 Jan 29.

Center for Brains, Minds and Machines, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America.

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of presented objects. To understand the underlying brain computation associated with the invariance properties, we compared experimental data with computational modeling results. Our results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye-movements.
http://dx.doi.org/10.1038/s41598-019-57261-6
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989457
January 2020

Invariant Recognition Shapes Neural Representations of Visual Input.

Annu Rev Vis Sci 2018 09 27;4:403-422. Epub 2018 Jul 27.

Center for Brains, Minds and Machines, MIT, Cambridge, Massachusetts 02139, USA.

Recognizing the people, objects, and actions in the world around us is a crucial aspect of human perception that allows us to plan and act in our environment. Remarkably, our proficiency in recognizing semantic categories from visual input is unhindered by transformations that substantially alter their appearance (e.g., changes in lighting or position). The ability to generalize across these complex transformations is a hallmark of human visual intelligence, which has been the focus of wide-ranging investigation in systems and computational neuroscience. However, while the neural machinery of human visual perception has been thoroughly described, the computational principles dictating its functioning remain unknown. Here, we review recent results in brain imaging, neurophysiology, and computational neuroscience in support of the hypothesis that the ability to support the invariant recognition of semantic entities in the visual world shapes which neural representations of sensory input are computed by human visual cortex.
http://dx.doi.org/10.1146/annurev-vision-091517-034103
September 2018

Invariant recognition drives neural representations of action sequences.

PLoS Comput Biol 2017 12 18;13(12):e1005859. Epub 2017 Dec 18.

Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, United States.

Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying what representations of action sequences are constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.
http://dx.doi.org/10.1371/journal.pcbi.1005859
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5749869
December 2017

A fast, invariant representation for human action in the visual system.

J Neurophysiol 2018 02 8;119(2):631-640. Epub 2017 Nov 8.

Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts.

Humans can effortlessly recognize others' actions in the presence of complex transformations, such as changes in viewpoint. Several studies have located the regions in the brain involved in invariant action recognition; however, the underlying neural computations remain poorly understood. We use magnetoencephalography decoding and a data set of well-controlled, naturalistic videos of five actions (run, walk, jump, eat, drink) performed by different actors at different viewpoints to study the computational steps used to recognize actions across complex transformations. In particular, we ask when the brain discriminates between different actions, and when it does so in a manner that is invariant to changes in 3D viewpoint. We measure the latency difference between invariant and noninvariant action decoding when subjects view full videos as well as form-depleted and motion-depleted stimuli. We were unable to detect a difference in decoding latency or temporal profile between invariant and noninvariant action recognition in full videos. However, when either form or motion information is removed from the stimulus set, we observe a decrease and delay in invariant action decoding. Our results suggest that the brain recognizes actions and builds invariance to complex transformations at the same time and that both form and motion information are crucial for fast, invariant action recognition. NEW & NOTEWORTHY The human brain can quickly recognize actions despite transformations that change their visual appearance. We use neural timing data to uncover the computations underlying this ability. We find that within 200 ms action can be read out of magnetoencephalography data and that this representation is invariant to changes in viewpoint. We find form and motion are needed for this fast action decoding, suggesting that the brain quickly integrates complex spatiotemporal features to form invariant action representations.
http://dx.doi.org/10.1152/jn.00642.2017
February 2018

View-Tolerant Face Recognition and Hebbian Learning Imply Mirror-Symmetric Neural Tuning to Head Orientation.

Curr Biol 2017 Jan 1;27(1):62-67. Epub 2016 Dec 1.

Center for Brains, Minds, and Machines and McGovern Institute for Brain Research at MIT, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and robust against identity-preserving transformations, like depth rotations [1, 2]. Current computational models of object recognition, including recent deep-learning networks, generate these properties through a hierarchy of alternating selectivity-increasing filtering and tolerance-increasing pooling operations, similar to simple-complex cells operations [3-6]. Here, we prove that a class of hierarchical architectures and a broad set of biologically plausible learning rules generate approximate invariance to identity-preserving transformations at the top level of the processing hierarchy. However, all past models tested failed to reproduce the most salient property of an intermediate representation of a three-level face-processing hierarchy in the brain: mirror-symmetric tuning to head orientation [7]. Here, we demonstrate that one specific biologically plausible Hebb-type learning rule generates mirror-symmetric tuning to bilaterally symmetric stimuli, like faces, at intermediate levels of the architecture and show why it does so. Thus, the tuning properties of individual cells inside the visual stream appear to result from group properties of the stimuli they encode and to reflect the learning rules that sculpted the information-processing system within which they reside.
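The mechanism sketched in this abstract can be illustrated numerically. In the toy sketch below, random vectors stand in for face views, "mirroring" is modeled as reversing the feature vector, and the Hebbian/Oja-type learned unit is approximated by the top principal component of the mirror-symmetric stimulus set; these modeling choices are invented for the demo, not taken from the paper. Because the covariance commutes with the reflection, eigenvectors come out symmetric or antisymmetric, and squared tuning is mirror-symmetric.

```python
import numpy as np

# Toy demo: mirror-symmetric stimulus statistics + PCA-like (Hebbian/Oja)
# learning imply mirror-symmetric tuning.
rng = np.random.default_rng(1)
d = 16
R = np.eye(d)[::-1]                      # reflection operator (R @ R = I)

views = rng.normal(size=(10, d))         # hypothetical views at +theta ...
data = np.vstack([views, views @ R])     # ... paired with mirrors at -theta

# Oja-type Hebbian learning converges to principal components, so take the
# top eigenvector of the (mirror-symmetric) covariance as the learned unit.
C = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs[:, -1]

# Since R C R = C, eigenvectors are symmetric or antisymmetric under R ...
sym_err = min(np.linalg.norm(R @ v - v), np.linalg.norm(R @ v + v))

# ... so the squared response to a view equals the squared response to its
# mirror image: mirror-symmetric tuning.
x = views[0]
tuning_gap = abs((v @ x) ** 2 - (v @ (R @ x)) ** 2)
```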
http://dx.doi.org/10.1016/j.cub.2016.10.015
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5319833
January 2017

Neural Tuning Size in a Model of Primate Visual Processing Accounts for Three Key Markers of Holistic Face Processing.

PLoS One 2016 Mar 17;11(3):e0150980. Epub 2016 Mar 17.

McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.

Faces are an important and unique class of visual stimuli, and have been of interest to neuroscientists for many years. Faces are known to elicit certain characteristic behavioral markers, collectively labeled "holistic processing", while non-face objects are not processed holistically. However, little is known about the underlying neural mechanisms. The main aim of this computational simulation work is to investigate the neural mechanisms that make face processing holistic. Using a model of primate visual processing, we show that a single key factor, "neural tuning size", is able to account for three important markers of holistic face processing: the Composite Face Effect (CFE), Face Inversion Effect (FIE) and Whole-Part Effect (WPE). Our proof-of-principle specifies the precise neurophysiological property that corresponds to the poorly-understood notion of holism, and shows that this one neural property controls three classic behavioral markers of holism. Our work is consistent with neurophysiological evidence, and makes further testable predictions. Overall, we provide a parsimonious account of holistic face processing, connecting computation, behavior and neurophysiology.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0150980
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795648
August 2016

The Invariance Hypothesis Implies Domain-Specific Regions in Visual Cortex.

PLoS Comput Biol 2015 Oct 23;11(10):e1004390. Epub 2015 Oct 23.

Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America; McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America; Istituto Italiano di Tecnologia, Genova, Italy.

Is visual cortex made up of general-purpose information processing machinery, or does it consist of a collection of specialized modules? If prior knowledge, acquired from learning a set of objects, is only transferable to new objects that share properties with the old, then the recognition system's optimal organization must be one containing specialized modules for different object classes. Our analysis starts from a premise we call the invariance hypothesis: that the computational goal of the ventral stream is to compute an invariant-to-transformations and discriminative signature for recognition. The key condition enabling approximate transfer of invariance without sacrificing discriminability turns out to be that the learned and novel objects transform similarly. This implies that the optimal recognition system must contain subsystems trained only with data from similarly-transforming objects and suggests a novel interpretation of domain-specific regions like the fusiform face area (FFA). Furthermore, we can define an index of transformation-compatibility, computable from videos, that can be combined with information about the statistics of natural vision to yield predictions for which object categories ought to have domain-specific regions, in agreement with the available data. The result is a unifying account linking the large literature on view-based recognition with the wealth of experimental evidence concerning domain-specific regions.
http://dx.doi.org/10.1371/journal.pcbi.1004390
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4619805
October 2015

DONALD ARTHUR GLASER: 21 SEPTEMBER 1926 - 28 FEBRUARY 2013.

Proc Am Philos Soc 2014 Sep;158(3):311-5

September 2014

The dynamics of invariant object recognition in the human visual system.

J Neurophysiol 2014 Jan 2;111(1):91-102. Epub 2013 Oct 2.

Center for Biological and Computational Learning, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts.

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of human-invariant object recognition, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition where invariance increases at each successive visual area along the ventral stream.
http://dx.doi.org/10.1152/jn.00394.2013
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4280161
January 2014

Vision: are models of object recognition catching up with the brain?

Ann N Y Acad Sci 2013 Dec 17;1305:72-82. Epub 2013 Jun 17.

Department of Brain and Cognitive Sciences, McGovern Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts.

Object recognition has been a central yet elusive goal of computational vision. For many years, computer performance seemed highly deficient and unable to emulate the basic capabilities of the human recognition system. Over the past decade or so, computer scientists and neuroscientists have developed algorithms and systems-and models of visual cortex-that have come much closer to human performance in visual identification and categorization. In this personal perspective, we discuss the ongoing struggle of visual models to catch up with the visual cortex, identify key reasons for the relatively rapid improvement of artificial systems and models, and identify open problems for computational vision in this domain.
http://dx.doi.org/10.1111/nyas.12148
December 2013

Donald Arthur Glaser (1926-2013).

Authors:
Tomaso Poggio

Nature 2013 Apr;496(7443):32

Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

http://dx.doi.org/10.1038/496032a
April 2013

The Levels of Understanding framework, revised.

Authors:
Tomaso Poggio

Perception 2012;41(9):1017-23

McGovern Institute for Brain Research, Center for Biological & Computational Learning, Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

I discuss the "levels of understanding" framework described in Marr's Vision and propose an updated version to capture the changes in computation and neuroscience over the last 30 years.
http://dx.doi.org/10.1068/p7299
April 2013

Learning and disrupting invariance in visual recognition with a temporal association rule.

Front Comput Neurosci 2012 Jun 25;6:37. Epub 2012 Jun 25.

Center for Biological and Computational Learning, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge MA, USA.

Learning by temporal association rules such as Foldiak's trace rule is an attractive hypothesis that explains the development of invariance in visual recognition. Consistent with these rules, several recent experiments have shown that invariance can be broken at both the psychophysical and single cell levels. We show (1) that temporal association learning provides appropriate invariance in models of object recognition inspired by the visual cortex, (2) that we can replicate the "invariance disruption" experiments using these models with a temporal association learning rule to develop and maintain invariance, and (3) that despite dramatic single cell effects, a population of cells is very robust to these disruptions. We argue that these models account for the stability of perceptual invariance despite the underlying plasticity of the system, the variability of the visual world and expected noise in the biological mechanisms.
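A minimal single-unit sketch of a temporal-association rule of this kind might look as follows. The Hebbian update is gated by a decaying trace of recent output, so inputs that occur close together in time come to drive the same unit; the parameters (eta, delta, epochs) and the toy two-view "object" are illustrative choices, not values from the paper.

```python
import numpy as np

# Single-unit sketch of a trace rule: invariance from temporal contiguity.
def train_trace(sequences, d, eta=0.5, delta=0.2, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 0.1, size=d)    # small nonnegative initial weights
    for _ in range(epochs):
        for seq in sequences:            # one temporally contiguous object
            trace = 0.0
            for x in seq:                # successive "views" of the object
                y = w @ x
                trace = (1 - delta) * trace + delta * y
                w = w + eta * trace * x  # Hebbian update gated by the trace
        w = w / np.linalg.norm(w)        # keep the weights bounded
    return w

view_a, view_b = np.eye(4)[0], np.eye(4)[1]  # two orthogonal "views"
w = train_trace([[view_a, view_b]], d=4)

# Temporal contiguity bound both views to the unit: it now responds to each,
# while inputs never seen in the sequence stay near zero.
resp_a, resp_b = w @ view_a, w @ view_b
```

Swapping which views co-occur in time would rebind the unit, which is the kind of "invariance disruption" the altered-exposure experiments exploit.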
http://dx.doi.org/10.3389/fncom.2012.00037
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3385587
August 2012

Object decoding with attention in inferior temporal cortex.

Proc Natl Acad Sci U S A 2011 May 9;108(21):8850-5. Epub 2011 May 9.

Department of Brain and Cognitive Sciences, McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Recognizing objects in cluttered scenes requires attentional mechanisms to filter out distracting information. Previous studies have found several physiological correlates of attention in visual cortex, including larger responses for attended objects. However, it has been unclear whether these attention-related changes have a large impact on information about objects at the neural population level. To address this question, we trained monkeys to covertly deploy their visual attention from a central fixation point to one of three objects displayed in the periphery, and we decoded information about the identity and position of the objects from populations of ∼ 200 neurons from the inferior temporal cortex using a pattern classifier. The results show that before attention was deployed, information about the identity and position of each object was greatly reduced relative to when these objects were shown in isolation. However, when a monkey attended to an object, the pattern of neural activity, represented as a vector with dimensionality equal to the size of the neural population, was restored toward the vector representing the isolated object. Despite this nearly exclusive representation of the attended object, an increase in the salience of nonattended objects caused "bottom-up" mechanisms to override these "top-down" attentional enhancements. The method described here can be used to assess which attention-related physiological changes are directly related to object recognition, and should be helpful in assessing the role of additional physiological changes in the future.
http://dx.doi.org/10.1073/pnas.1100999108
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102370
May 2011

Automated home-cage behavioural phenotyping of mice.

Nat Commun 2010 Sep 7;1:68. Epub 2010 Sep 7.

Department of Brain and Cognitive Sciences, McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Neurobehavioural analysis of mouse phenotypes requires the monitoring of mouse behaviour over long periods of time. In this study, we describe a trainable computer vision system enabling the automated analysis of complex mouse behaviours. We provide software and an extensive manually annotated video database used for training and testing the system. Our system performs on par with human scoring, as measured from ground-truth manual annotations of thousands of clips of freely behaving mice. As a validation of the system, we characterized the home-cage behaviours of two standard inbred and two non-standard mouse strains. From these data, we were able to predict in a blind test the strain identity of individual animals with high accuracy. Our video-based software will complement existing sensor-based automated approaches and enable an adaptable, comprehensive, high-throughput, fine-grained, automated analysis of mouse behaviour.
http://dx.doi.org/10.1038/ncomms1064
September 2010

Prefrontal cortex activity during flexible categorization.

J Neurosci 2010 Jun;30(25):8519-28

The Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Items are categorized differently depending on the behavioral context. For instance, a lion can be categorized as an African animal or a type of cat. We recorded lateral prefrontal cortex (PFC) neural activity while monkeys switched between categorizing the same image set along two different category schemes with orthogonal boundaries. We found that each category scheme was largely represented by independent PFC neuronal populations and that activity reflecting a category distinction was weaker, but not absent, when that category was irrelevant. We suggest that the PFC represents competing category representations independently to reduce interference between them.
http://dx.doi.org/10.1523/JNEUROSCI.4837-09.2010
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3709835
June 2010

What and where: a Bayesian inference theory of attention.

Vision Res 2010 Oct 20;50(22):2233-47. Epub 2010 May 20.

McGovern Institute for Brain Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, United States.

In the theoretical framework of this paper, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a computational process solving the problem of recognizing and localizing objects, especially in difficult recognition tasks such as in cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena--including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves and shift in contrast responses--all emerge naturally as predictions of the model. We also show that the Bayesian model predicts well human eye fixations (considered as a proxy for shifts of attention) in natural scenes.
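The what/where inference can be sketched with a toy discrete model (two identities, four locations; all likelihood numbers are invented for the demo): spatial attention enters as a prior over location, and concentrating that prior reduces the posterior uncertainty about identity.

```python
import numpy as np

# Toy "what and where" inference: one object identity o at one location l
# explains the image; recognition is the posterior over (o, l).
lik = np.array([[0.30, 0.10, 0.05, 0.05],   # p(features | o=0, l)
                [0.05, 0.25, 0.05, 0.05]])  # p(features | o=1, l)

def posterior(lik, loc_prior):
    joint = lik * loc_prior            # uniform prior over identity assumed
    joint = joint / joint.sum()
    p_what = joint.sum(axis=1)         # marginalize location -> identity
    p_where = joint.sum(axis=0)        # marginalize identity -> location
    return p_what, p_where

def entropy(p):
    return float(-(p * np.log(p)).sum())

uniform = np.full(4, 0.25)
p_what_u, _ = posterior(lik, uniform)

# Spatial attention = a prior concentrated on one location; it sharpens the
# posterior over *what* is at the attended location.
attend = np.array([0.85, 0.05, 0.05, 0.05])
p_what_a, _ = posterior(lik, attend)
```

Feature-based attention is the symmetric move: a concentrated prior over identity that sharpens the posterior over location.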
http://dx.doi.org/10.1016/j.visres.2010.05.013
October 2010

Dynamic population coding of category information in inferior temporal and prefrontal cortex.

J Neurophysiol 2008 Sep 18;100(3):1407-19. Epub 2008 Jun 18.

Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA.

Most electrophysiology studies analyze the activity of each neuron separately. While such studies have given much insight into properties of the visual system, they have also potentially overlooked important aspects of information coded in changing patterns of activity that are distributed over larger populations of neurons. In this work, we apply a population decoding method to better estimate what information is available in neuronal ensembles and how this information is coded in dynamic patterns of neural activity in data recorded from inferior temporal cortex (ITC) and prefrontal cortex (PFC) as macaque monkeys engaged in a delayed match-to-category task. Analyses of activity patterns in ITC and PFC revealed that both areas contain "abstract" category information (i.e., category information that is not directly correlated with properties of the stimuli); however, in general, PFC has more task-relevant information, and ITC has more detailed visual information. Analyses examining how information is coded in these areas show that almost all category information is available in a small fraction of the neurons in the population. Most remarkably, our results also show that category information is coded by a nonstationary pattern of activity that changes over the course of a trial, with individual neurons containing information on much shorter time scales than the population as a whole.
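The population-decoding approach can be sketched as follows, with a nearest-centroid rule standing in for the pattern classifier and synthetic "neurons" replacing the recorded data; the neuron and trial counts and the noise level are invented for the demo.

```python
import numpy as np

# Sketch of population decoding: train a classifier on population response
# vectors, read out the stimulus category on held-out trials.
rng = np.random.default_rng(2)
n_neurons, n_trials = 50, 40
tuning = rng.normal(size=(2, n_neurons))   # mean response per category

def simulate(cat, n):
    # noisy population response vectors for n trials of category `cat`
    return tuning[cat] + rng.normal(scale=1.0, size=(n, n_neurons))

train = [simulate(c, n_trials) for c in (0, 1)]
test = [simulate(c, n_trials) for c in (0, 1)]
centroids = np.stack([t.mean(axis=0) for t in train])

def decode(x):
    # assign each population vector to the nearest training centroid
    dists = np.linalg.norm(x[:, None, :] - centroids[None], axis=2)
    return dists.argmin(axis=1)

accuracy = np.mean([np.mean(decode(test[c]) == c) for c in (0, 1)])
```

Training and testing the decoder in different time bins of the trial is how the nonstationary ("dynamic") population code can be exposed: a decoder fit at one time generalizes poorly to another even though accuracy at each time is high.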
http://dx.doi.org/10.1152/jn.90248.2008
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2544466
September 2008

A canonical neural circuit for cortical nonlinear operations.

Neural Comput 2008 Jun;20(6):1427-51

Center for Biological and Computational Learning, and McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

A few distinct cortical operations have been postulated over the past few years, suggested by experimental data on nonlinear neural response across different areas in the cortex. Among these, the energy model proposes the summation of quadrature pairs following a squaring nonlinearity in order to explain phase invariance of complex V1 cells. The divisive normalization model assumes a gain-controlling, divisive inhibition to explain sigmoid-like response profiles within a pool of neurons. A gaussian-like operation hypothesizes a bell-shaped response tuned to a specific, optimal pattern of activation of the presynaptic inputs. A max-like operation assumes the selection and transmission of the most active response among a set of neural inputs. We propose that these distinct neural operations can be computed by the same canonical circuitry, involving divisive normalization and polynomial nonlinearities, for different parameter values within the circuit. Hence, this canonical circuit may provide a unifying framework for several circuit models, such as the divisive normalization and the energy models. As a case in point, we consider a feedforward hierarchical model of the ventral pathway of the primate visual cortex, which is built on a combination of the gaussian-like and max-like operations. We show that when the two operations are approximated by the circuit proposed here, the model is capable of generating selective and invariant neural responses and performing object recognition, in good agreement with neurophysiological data.
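One common instance of such a circuit is a divisive-normalization ratio of polynomial nonlinearities, y = sum_j x_j^(p+1) / (k + sum_j x_j^p); the exponents and constants below are illustrative, not the paper's parameter values. Varying a single exponent moves the same circuit between a max-like operation and linear pooling.

```python
import numpy as np

# Divisive normalization with polynomial nonlinearities: one circuit,
# different canonical operations for different parameter values.
def circuit(x, p, k=1e-9):
    x = np.asarray(x, dtype=float)
    return float((x ** (p + 1)).sum() / (k + (x ** p).sum()))

x = [0.2, 0.5, 1.0, 0.7]

# Large p makes the ratio select the strongest input (max-like operation);
# p = 0 reduces it to the arithmetic mean of the inputs (linear pooling).
max_like = circuit(x, p=20)    # close to max(x) = 1.0
mean_like = circuit(x, p=0)    # equals mean(x) = 0.6 (up to k)
```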
http://dx.doi.org/10.1162/neco.2008.02-07-466
June 2008

Trade-off between object selectivity and tolerance in monkey inferotemporal cortex.

J Neurosci 2007 Nov;27(45):12292-307

McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.

Object recognition requires both selectivity among different objects and tolerance to vastly different retinal images of the same object, resulting from natural variation in (e.g.) position, size, illumination, and clutter. Thus, discovering neuronal responses that have object selectivity and tolerance to identity-preserving transformations is fundamental to understanding object recognition. Although selectivity and tolerance are found at the highest level of the primate ventral visual stream [the inferotemporal cortex (IT)], both properties are highly varied and poorly understood. If an IT neuron has very sharp selectivity for a unique combination of object features ("diagnostic features"), this might automatically endow it with high tolerance. However, this relationship cannot be taken as given; although some IT neurons are highly object selective and some are highly tolerant, the empirical connection of these key properties is unknown. In this study, we systematically measured both object selectivity and tolerance to different identity-preserving image transformations in the spiking responses of a population of monkey IT neurons. We found that IT neurons with high object selectivity typically have low tolerance (and vice versa), regardless of how object selectivity was quantified and the type of tolerance examined. The discovery of this trade-off illuminates object selectivity and tolerance in IT and unifies a range of previous, seemingly disparate results. This finding also argues against the idea that diagnostic conjunctions of features guarantee tolerance. Instead, it is naturally explained by object recognition models in which object selectivity is built through AND-like tuning mechanisms.
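The two axes of the reported trade-off can each be quantified in several ways; the pair below is one common (assumed, not necessarily the paper's) choice: a sparseness index for object selectivity and a cross-transformation correlation of object preferences for tolerance.

```python
import numpy as np

def sparseness(responses):
    """Selectivity as response sparseness: 0 when a neuron responds
    equally to all objects, 1 when it responds to exactly one
    (Vinje-Gallant form, used here as an illustrative measure)."""
    r = np.asarray(responses, dtype=float)
    n = r.size
    a = (r.sum() / n) ** 2 / (np.square(r).sum() / n)
    return (1.0 - a) / (1.0 - 1.0 / n)

def tolerance(resp_reference, resp_transformed):
    """Tolerance as the correlation of a neuron's object preferences
    across an identity-preserving transformation (e.g., a position
    or size change)."""
    return np.corrcoef(resp_reference, resp_transformed)[0, 1]
```

The reported trade-off would then appear as a negative relationship, across neurons, between `sparseness` at the reference condition and `tolerance` across transformations.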
http://dx.doi.org/10.1523/JNEUROSCI.1897-07.2007
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6673257
November 2007

A quantitative theory of immediate visual recognition.

Prog Brain Res 2007 ;165:33-56

Center for Biological and Computational Learning, McGovern Institute for Brain Research, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge 02139, USA.

Human and non-human primates excel at visual recognition tasks. The primate visual system exhibits a strong degree of selectivity while at the same time being robust to changes in the input image. We have developed a quantitative theory to account for the computations performed by the feedforward path in the ventral stream of the primate visual cortex. Here we review recent predictions by a model instantiating the theory about physiological observations in higher visual areas. We also show that the model can perform recognition tasks on datasets of complex natural images at a level comparable to psychophysical measurements on human observers during rapid categorization tasks. In sum, the evidence suggests that the theory may provide a framework to explain the first 100-150 ms of visual object recognition. The model also constitutes a vivid example of how computational models can interact with experimental observations in order to advance our understanding of a complex phenomenon. We conclude by suggesting a number of open questions, predictions, and specific experiments for visual physiology and psychophysics.
http://dx.doi.org/10.1016/S0079-6123(06)65004-8
July 2008

A model of V4 shape selectivity and invariance.

J Neurophysiol 2007 Sep 27;98(3):1733-50. Epub 2007 Jun 27.

Center for Biological and Computational Learning, McGovern Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Object recognition in primates is mediated by the ventral visual pathway and is classically described as a feedforward hierarchy of increasingly sophisticated representations. Neurons in macaque monkey area V4, an intermediate stage along the ventral pathway, have been shown to exhibit selectivity to complex boundary conformation and invariance to spatial translation. How could such a representation be derived from the signals in lower visual areas such as V1? We show that a quantitative model of hierarchical processing, which is part of a larger model of object recognition in the ventral pathway, provides a plausible mechanism for the translation-invariant shape representation observed in area V4. Simulated model neurons successfully reproduce V4 selectivity and invariance through a nonlinear, translation-invariant combination of locally selective subunits, suggesting that a similar transformation may occur or culminate in area V4. Specifically, this mechanism models the selectivity of individual V4 neurons to boundary conformation stimuli, exhibits the same degree of translation invariance observed in V4, and produces observed V4 population responses to bars and non-Cartesian gratings. This work provides a quantitative model of the widely described shape selectivity and invariance properties of area V4 and points toward a possible canonical mechanism operating throughout the ventral pathway.
http://dx.doi.org/10.1152/jn.01265.2006
September 2007

A feedforward architecture accounts for rapid categorization.

Proc Natl Acad Sci U S A 2007 Apr 2;104(15):6424-9. Epub 2007 Apr 2.

Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Primates are remarkably good at recognizing objects. The level of performance of their visual system and its robustness to image degradations still surpasses the best computer vision systems despite decades of engineering effort. In particular, the high accuracy of primates in ultra-rapid object categorization and rapid serial visual presentation tasks is remarkable. Given the number of processing stages involved and typical neural latencies, such rapid visual processing is likely to be mostly feedforward. Here we show that a specific implementation of a class of feedforward theories of object recognition (that extend the Hubel and Wiesel simple-to-complex cell hierarchy and account for many anatomical and physiological constraints) can predict the level and the pattern of performance achieved by humans on a rapid masked animal vs. non-animal categorization task.
http://dx.doi.org/10.1073/pnas.0700622104
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1847457
April 2007

Robust object recognition with cortex-like mechanisms.

IEEE Trans Pattern Anal Mach Intell 2007 Mar;29(3):411-26

Massachusetts Institute of Technology, Center for Biological and Computational Learning, McGovern Institute for Brain Research and Brain & Cognitive Sciences Department, MA 02139, USA.

We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: from invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based and texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: it has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
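The alternation described above, template matching followed by maximum pooling, can be sketched in a few lines. The layer names follow the Hubel and Wiesel simple/complex convention; the Gaussian tuning width and the toy patches are our own assumptions, not values from the paper:

```python
import numpy as np

def s_layer(patches, templates, sigma=0.5):
    """'Simple' stage: Gaussian template matching at every position,
    giving one tuning value per (position, template) pair."""
    return np.array([[np.exp(-np.sum((p - t) ** 2) / (2 * sigma ** 2))
                      for t in templates] for p in patches])

def c_layer(s_responses):
    """'Complex' stage: max pooling over positions, yielding tolerance
    to where the preferred pattern appeared."""
    return s_responses.max(axis=0)

# The preferred pattern at two different positions gives the same C response.
t = np.array([1.0, 0.0])
zero = np.zeros(2)
left = c_layer(s_layer([t, zero, zero], [t]))
right = c_layer(s_layer([zero, zero, t], [t]))
```

Stacking such S/C pairs is what builds the "increasingly complex and invariant" representation: selectivity grows in the S stages, invariance in the C stages.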
http://dx.doi.org/10.1109/TPAMI.2007.56
March 2007

Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex.

Neuron 2006 Feb;49(3):433-45

McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Local field potentials (LFPs) arise largely from dendritic activity over large brain regions and thus provide a measure of the input to and local processing within an area. We characterized LFPs and their relationship to spikes (multi- and single-unit) in monkey inferior temporal cortex (IT). LFP responses in IT to complex objects showed strong selectivity at 44% of the sites and tolerance to retinal position and size. The LFP preferences were poorly predicted by the spike preferences at the same site but were better explained by averaging spikes within approximately 3 mm. A comparison of separate sites suggests that selectivity is similar on a scale of approximately 800 µm for spikes and approximately 5 mm for LFPs. These observations imply that inputs to IT neurons convey selectivity for complex shapes and that such input may have an underlying organization spanning several millimeters.
http://dx.doi.org/10.1016/j.neuron.2005.12.019
February 2006

Experience-dependent sharpening of visual shape selectivity in inferior temporal cortex.

Cereb Cortex 2006 Nov 28;16(11):1631-44. Epub 2005 Dec 28.

The Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Whereas much is known about the visual shape selectivity of neurons in the inferior temporal cortex (ITC), less is known about the role of visual learning in the development and refinement of ITC shape selectivity. To address this, we trained monkeys to perform a visual categorization task with a parametric set of highly familiar stimuli. During training, the stimuli were always presented at the same orientation. In this experiment, we recorded from ITC neurons while monkeys viewed the trained stimuli in addition to image-plane rotated versions of those stimuli. We found that, concomitant with the monkeys' behavioral performance, neuronal stimulus selectivity was stronger for stimuli presented at the trained orientation than for rotated versions of the same stimuli. We also recorded from ITC neurons while monkeys viewed sets of novel and familiar (but not explicitly trained) randomly chosen complex stimuli. We again found that ITC stimulus selectivity was sharper for familiar than novel stimuli, suggesting that enhanced shape tuning in ITC can arise for both passively experienced and explicitly trained stimuli.
http://dx.doi.org/10.1093/cercor/bhj100
November 2006

Fast readout of object identity from macaque inferior temporal cortex.

Science 2005 Nov;310(5749):863-6

McGovern Institute for Brain Research, Cambridge, MA 02139, USA.

Understanding the brain computations leading to object recognition requires quantitative characterization of the information represented in inferior temporal (IT) cortex. We used a biologically plausible, classifier-based readout technique to investigate the neural coding of selectivity and invariance at the IT population level. The activity of small neuronal populations (approximately 100 randomly selected cells) over very short time intervals (as small as 12.5 milliseconds) contained unexpectedly accurate and robust information about both object "identity" and "category." This information generalized over a range of object positions and scales, even for novel objects. Coarse information about position and scale could also be read out from the same population.
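A minimal version of such a population readout can be sketched as follows. The study used regularized linear classifiers over short spike-count bins; this toy substitutes a nearest-centroid decoder on synthetic "population" vectors, so the cell count, noise level, and data here are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_centroids(population_responses, labels):
    """Learn one class 'template': the mean population vector per object."""
    classes = np.unique(labels)
    centroids = np.array([population_responses[labels == c].mean(axis=0)
                          for c in classes])
    return classes, centroids

def readout(classes, centroids, test_responses):
    """Decode object identity as the nearest class template in
    population space."""
    d = ((test_responses[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

# Toy population: 100 'cells', 3 objects; each trial is the object's
# response template plus independent noise.
templates = rng.normal(size=(3, 100))
labels = np.repeat(np.arange(3), 20)
trials = templates[labels] + 0.3 * rng.normal(size=(60, 100))

classes, centroids = train_centroids(trials, labels)
accuracy = (readout(classes, centroids, trials) == labels).mean()
```

The point of the exercise mirrors the paper's: even a simple decoder applied to a modest population recovers object identity reliably, and the same machinery can be applied bin-by-bin in time or across positions and scales to probe invariance.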
http://dx.doi.org/10.1126/science.1117593
November 2005

Identification and analysis of alternative splicing events conserved in human and mouse.

Proc Natl Acad Sci U S A 2005 Feb 11;102(8):2850-5. Epub 2005 Feb 11.

Department of Biology and Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Alternative pre-mRNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few of such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call alternative conserved exons (ACEs), from other orthologous human/mouse exons and integrate these features into an exon classification algorithm, ACEScan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified approximately 2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues by using an RT-PCR-sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of ACEScan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons that had EST or cDNA evidence of AS in human but were not predicted ACEs, and AS was never observed for 11 negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve the reading frame and less likely to disrupt protein domains than other AS events and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing, and development. Our results also imply that the vast majority of AS events represented in the human EST database are not conserved in mouse.
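The frame-preservation feature mentioned above reduces to simple arithmetic: skipping or including an exon shifts the downstream reading frame unless the exon's length is a multiple of 3 nucleotides. A one-line check (our illustration, not part of the published classifier):

```python
def preserves_reading_frame(exon_length_nt):
    """An alternatively spliced exon leaves the downstream reading frame
    intact iff its length is a multiple of 3 nucleotides (one codon unit)."""
    return exon_length_nt % 3 == 0
```

For example, a 93-nt exon is frame-preserving, while a 94-nt exon shifts the frame by one base when included or skipped.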
http://dx.doi.org/10.1073/pnas.0409742102
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC548664
February 2005