Similar Literature (20 results)
1.
Inferior temporal (IT) cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total), testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet) along with several models from computer vision (e.g. SIFT, GIST, self-similarity features, and a deep convolutional neural network). We compared the representational dissimilarity matrices (RDMs) of the model representations with the RDMs obtained from human IT (measured with fMRI) and monkey IT (measured with cell recording) for the same set of stimuli (not used in training the models). Better performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data. Overall, our results suggest that explaining IT requires computational features trained through supervised learning to emphasize the behaviorally important categorical divisions prominently reflected in IT.
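The core RDM comparison described here can be sketched in a few lines. The data below are random stand-ins: the stimulus count, feature dimensionality, and noise level are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(patterns):
    # patterns: (n_stimuli, n_features); correlation distance for each stimulus pair
    return squareform(pdist(patterns, metric="correlation"))

rng = np.random.default_rng(0)
model_acts = rng.normal(size=(24, 100))                        # hypothetical model features
it_acts = model_acts + rng.normal(scale=0.5, size=(24, 100))   # noisy stand-in for "IT"

# Compare only the unique off-diagonal pairs of the two RDMs
iu = np.triu_indices(24, k=1)
rho, _ = spearmanr(rdm(model_acts)[iu], rdm(it_acts)[iu])
print(f"model-IT RDM correlation: {rho:.2f}")
```

The Spearman correlation over RDM upper triangles is the standard RSA comparison; with real data one would also estimate a noise ceiling from between-subject RDM agreement.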

2.
Why is Real-World Visual Object Recognition Hard?
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
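A V1-like "null" model of the kind challenged here is essentially a bank of oriented Gabor filters followed by rectification and pooling. A minimal sketch (the filter size, spatial frequency, envelope width, and mean-magnitude pooling are illustrative choices, not the paper's exact model):

```python
import numpy as np

def gabor(size, theta, freq=0.2, sigma=3.0):
    # Real part of a Gabor filter at orientation theta
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr)

def v1_features(img, n_orient=4, size=9):
    # Rectified filter responses, pooled by mean magnitude per orientation
    feats = []
    for k in range(n_orient):
        f = gabor(size, np.pi * k / n_orient)
        h, w = img.shape
        out = np.zeros((h - size + 1, w - size + 1))
        for i in range(out.shape[0]):        # explicit valid-mode correlation,
            for j in range(out.shape[1]):    # kept small for clarity
                out[i, j] = np.sum(img[i:i + size, j:j + size] * f)
        feats.append(np.mean(np.abs(out)))
    return np.array(feats)

img = np.zeros((32, 32))
img[:, 16] = 1.0                 # a vertical bar
resp = v1_features(img)
print(resp.argmax())             # index 0 (the vertical-orientation filter) dominates
```

Real V1-like baselines add multiple scales and local normalization, but even this skeleton shows why such a model captures oriented low-level structure that uncontrolled "natural" test sets reward.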

3.
4.
We propose a conceptual framework for artificial object recognition systems based on findings from neurophysiological and neuropsychological research on the visual system in primate cortex. We identify some essential questions that have to be addressed in the course of designing object recognition systems. As answers, we review some major aspects of biological object recognition, which are then translated into the technical field of computer vision. The key suggestions are the use of incremental and view-based approaches, together with online feature selection and the interconnection of object views to form an overall object representation. The effectiveness of the computational approach is evaluated by testing a possible realization in various tasks and conditions explicitly designed to allow a direct comparison with the biological counterpart. The results exhibit excellent performance with regard to recognition accuracy, the creation of sparse models, and the selection of appropriate features.

5.
While many models of biological object recognition share a common set of “broad-stroke” properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model—e.g., the number of units per layer, the size of pooling kernels, exponents in normalization operations, etc. Since the number of such parameters (explicit or implicit) is typically large and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea or because the correct “parts” have not been tuned correctly, assembled at sufficient scale, or provided with enough training. Here, we present a high-throughput approach to the exploration of such parameter sets, leveraging recent advances in stream processing hardware (high-end NVIDIA graphics cards and the PlayStation 3's IBM Cell Processor). In analogy to high-throughput screening approaches in molecular biology and genetics, we explored thousands of potential network architectures and parameter instantiations, screening those that show promising object recognition performance for further analysis. We show that this approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature. As the scale of available computational power continues to expand, we argue that this approach has the potential to greatly accelerate progress in both artificial vision and our understanding of the computational underpinning of biological vision.

6.
The automatic place recognition problem is one of the key challenges in SLAM approaches for loop closure detection. Most appearance-based solutions to this problem share the idea of image feature extraction, memorization, and matching search. The weakness of these solutions is their storage and computational costs, which increase drastically with the environment size. In this regard, the major constraints to overcome are the required visual information storage and the complexity of similarity computation. In this paper, a novel formulation is proposed that reduces computation time while no visual information is stored or matched explicitly. The proposed solution relies on the incremental building of a bio-inspired visual memory using a Fuzzy ART network, which incorporates properties discovered in the primate brain. The performance of the proposed method has been evaluated using two datasets representing different large-scale outdoor environments. The method has been compared with the RatSLAM and FAB-MAP approaches and demonstrated decreased time and storage costs with broadly comparable precision-recall performance.

7.
Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically, in several processing stages; along these stages, features of increasing complexity are extracted by different parts of the visual system. Elementary features such as bars and edges are processed in earlier levels of the visual pathway, and progressively more complex features are detected at higher levels. An important open question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, called patches, is extracted in the intermediate stages. These object parts are used in the model's training procedure and play an important role in object recognition. Because these patches are selected indiscriminately from different positions in an image, non-discriminative patches may be extracted, which can reduce performance. In the proposed model, we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, where it outperforms the original model. The experiments show that the selected features are generally particular parts of the target images. Our results suggest that selected features that are parts of the target objects provide an efficient set for robust object recognition.
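Evolutionary selection of informative patches can be illustrated with a toy stand-in: the candidate "patch" responses below are synthetic, and a nearest-centroid rule stands in for the model's actual recognition stage; the GA parameters (population size, mutation rate) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: 20 candidate "patches"; only the first 5 carry class information.
n, d = 100, 20
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :5] += y[:, None] * 2.0            # informative patch responses

def fitness(mask):
    # Nearest-centroid accuracy using only the selected patches
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return (pred == y).mean()

# Minimal genetic algorithm over binary selection masks
pop = rng.random((30, d)) < 0.5
for gen in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]            # the 10 fittest masks
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(0, 10, 2)]
        child = np.where(rng.random(d) < 0.5, a, b)    # uniform crossover
        child ^= rng.random(d) < 0.05                  # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])               # elitism: parents survive

best = max(pop, key=fitness)
print(f"best mask accuracy: {fitness(best):.2f}")
```

The same loop applies unchanged when `fitness` is replaced by the recognition accuracy of a patch-based model evaluated with a candidate patch subset.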

8.
Growing evidence indicates a moderate but significant relationship between processing speed in visuo-cognitive tasks and general intelligence. On the other hand, findings from neuroscience suggest that the primate visual system consists of two major pathways: the ventral pathway for object recognition and the dorsal pathway for spatial processing and attentive analysis. Previous studies seeking visuo-cognitive factors of human intelligence indicated a significant correlation between fluid intelligence and inspection time (IT), an index of the speed of object recognition performed in the ventral pathway. We therefore examined the possibility that neural processing speed in the dorsal pathway also represents a factor of intelligence. Specifically, we used the mental rotation (MR) task, a popular psychometric measure of the mental speed of spatial processing in the dorsal pathway. We found that the speed of MR was significantly correlated with intelligence scores, while it showed no correlation with inspection time (the recognition speed of visual objects). Our results support the new possibility that intelligence can be explained by two types of mental speed, one related to object recognition (IT) and another to the manipulation of mental images (MR).

9.
An object in the peripheral visual field is more difficult to recognize when surrounded by other objects. This phenomenon is called “crowding”. Crowding places a fundamental constraint on human vision that limits performance on numerous tasks. It has been suggested that crowding results from spatial feature integration necessary for object recognition. However, in the absence of convincing models, this theory has remained controversial. Here, we present a quantitative and physiologically plausible model for spatial integration of orientation signals, based on the principles of population coding. Using simulations, we demonstrate that this model coherently accounts for fundamental properties of crowding, including critical spacing, “compulsory averaging”, and a foveal-peripheral anisotropy. Moreover, we show that the model predicts increased responses to correlated visual stimuli. Altogether, these results suggest that crowding has little immediate bearing on object recognition but is a by-product of a general, elementary integration mechanism in early vision aimed at improving signal quality.
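The "compulsory averaging" behavior of such a population-coding integration model can be sketched directly: when the responses to two nearby orientations are pooled before readout, the decoded orientation is their average. The tuning width and population-vector readout below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

prefs = np.linspace(0, np.pi, 180, endpoint=False)   # preferred orientations

def pop_response(theta, width=0.3):
    # Circular Gaussian tuning over orientation (period pi)
    d = np.angle(np.exp(2j * (prefs - theta))) / 2
    return np.exp(-d**2 / (2 * width**2))

def decode(r):
    # Population-vector readout on the doubled-angle circle
    return np.angle(np.sum(r * np.exp(2j * prefs))) / 2 % np.pi

# Two crowded orientations whose population responses are integrated
r = pop_response(0.4) + pop_response(0.8)
print(round(decode(r), 2))    # ≈ 0.6: the readout reports the average orientation
```

With a single stimulus the same readout recovers the true orientation, so the averaging arises purely from integration before decoding, not from the decoder itself.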

10.
The recognition of object categories is effortlessly accomplished in everyday life, yet its neural underpinnings remain not fully understood. In this electroencephalography (EEG) study, we used single-trial classification to perform a Representational Similarity Analysis (RSA) of categorical representation of objects in human visual cortex. Brain responses were recorded while participants viewed a set of 72 photographs of objects with a planned category structure. The Representational Dissimilarity Matrix (RDM) used for RSA was derived from confusions of a linear classifier operating on single EEG trials. In contrast to past studies, which used pairwise correlation or classification to derive the RDM, we used confusion matrices from multi-class classifications, which provided novel self-similarity measures that were used to derive the overall size of the representational space. We additionally performed classifications on subsets of the brain response in order to identify spatial and temporal EEG components that best discriminated object categories and exemplars. Results from category-level classifications revealed that brain responses to images of human faces formed the most distinct category, while responses to images from the two inanimate categories formed a single category cluster. Exemplar-level classifications produced a broadly similar category structure, as well as sub-clusters corresponding to natural language categories. Spatiotemporal components of the brain response that differentiated exemplars within a category were found to differ from those implicated in differentiating between categories. Our results show that a classification approach can be successfully applied to single-trial scalp-recorded EEG to recover fine-grained object category structure, as well as to identify interpretable spatiotemporal components underlying object processing. Finally, object category can be decoded from purely temporal information recorded at single electrodes.
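The confusion-matrix-to-RDM step can be sketched with synthetic "trials" and a simple nearest-centroid classifier standing in for the linear classifier used in the study; the category count, trial counts, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cat, n_trials, d = 4, 60, 30

# Synthetic "EEG" trials: category means plus noise; categories 2 and 3 are similar.
means = rng.normal(size=(n_cat, d))
means[3] = means[2] + 0.3 * rng.normal(size=d)
X = np.repeat(means, n_trials, axis=0) + rng.normal(scale=1.5, size=(n_cat * n_trials, d))
y = np.repeat(np.arange(n_cat), n_trials)

# Split, fit a nearest-centroid classifier, accumulate a confusion matrix.
train = np.tile(np.arange(n_trials) < n_trials // 2, n_cat)
cents = np.array([X[train & (y == c)].mean(0) for c in range(n_cat)])
pred = np.argmin(((X[~train, None] - cents) ** 2).sum(-1), axis=1)
conf = np.zeros((n_cat, n_cat))
for t, p in zip(y[~train], pred):
    conf[t, p] += 1
conf /= conf.sum(1, keepdims=True)

# Confusions are read as similarity: frequently confused categories lie close together.
rdm = 1 - (conf + conf.T) / 2
print(rdm[2, 3] < rdm[0, 1])   # the similar pair is less dissimilar
```

Symmetrizing the confusion matrix is what turns classifier errors into a proper dissimilarity structure; the diagonal entries (self-confusions) carry the self-similarity information the abstract highlights.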

11.
Recognizing an object takes just a fraction of a second, less than the blink of an eye. Applying multivariate pattern analysis, or “brain decoding”, methods to magnetoencephalography (MEG) data has allowed researchers to characterize, in high temporal resolution, the emerging representation of object categories that underlie our capacity for rapid recognition. Shortly after stimulus onset, object exemplars cluster by category in a high-dimensional activation space in the brain. In this emerging activation space, the decodability of exemplar category varies over time, reflecting the brain’s transformation of visual inputs into coherent category representations. How do these emerging representations relate to categorization behavior? Recently it has been proposed that the distance of an exemplar representation from a categorical boundary in an activation space is critical for perceptual decision-making, and that reaction times should therefore correlate with distance from the boundary. The predictions of this distance hypothesis have been borne out in human inferior temporal cortex (IT), an area of the brain crucial for the representation of object categories. When viewed in the context of a time varying neural signal, the optimal time to “read out” category information is when category representations in the brain are most decodable. Here, we show that the distance from a decision boundary through activation space, as measured using MEG decoding methods, correlates with reaction times for visual categorization during the period of peak decodability. Our results suggest that the brain begins to read out information about exemplar category at the optimal time for use in choice behaviour, and support the hypothesis that the structure of the representation for objects in the visual system is partially constitutive of the decision process in recognition.
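The distance hypothesis makes a simple testable prediction: trials whose neural pattern lies far from the category boundary should yield fast responses. A toy simulation (the decision axis is known by construction here, whereas the study estimates it from MEG decoding; the reaction-time model is an illustrative diffusion-style assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 50

# Synthetic single-trial "MEG" patterns for two categories along one signal axis
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
labels = rng.integers(0, 2, n) * 2 - 1                 # -1 / +1 category labels
evidence = np.abs(rng.normal(1.0, 0.5, n))             # per-trial signal strength
X = labels[:, None] * evidence[:, None] * w_true + 0.3 * rng.normal(size=(n, d))

# Distance from the category boundary: projection onto the decision axis
dist = np.abs(X @ w_true)

# Diffusion-style assumption: stronger evidence accumulates faster
rt = 400 + 300 / (evidence + 0.5) + rng.normal(0, 20, n)

r = np.corrcoef(dist, rt)[0, 1]
print(f"distance-RT correlation: {r:.2f}")   # negative, as the hypothesis predicts
```

In the study, this correlation is computed time point by time point, and the key finding is that it peaks where category decodability peaks.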

12.
This paper introduces a new approach to assess visual representations underlying the recognition of objects. Human performance is modeled by CLARET, a machine learning and matching system, based on inductive logic programming and graph matching principles. The model is applied to data of a learning experiment addressing the role of prior experience in the ontogenesis of mental object representations. Prior experience was varied in terms of sensory modality, i.e. visual versus haptic versus visuohaptic. The analysis revealed distinct differences between the representational formats used by subjects with haptic versus those with no prior object experience. These differences suggest that prior haptic exploration stimulates the evolution of object representations which are characterized by an increased differentiation between attribute values and a pronounced structural encoding.

13.
BACKGROUND: When we view static scenes that imply motion - such as an object dropping off a shelf - recognition memory for the position of the object is extrapolated forward. It is as if the object in our mind's eye comes alive and continues on its course. This phenomenon is known as representational momentum and results in a distortion of recognition memory in the implied direction of motion. Representational momentum is modifiable; simply labelling a drawing of a pointed object as 'rocket' will facilitate the effect, whereas the label 'steeple' will impede it. We used functional magnetic resonance imaging (fMRI) to explore the neural substrate for representational momentum. RESULTS: Subjects participated in two experiments. In the first, they were presented with video excerpts of objects in motion (versus the same objects in a resting position). This identified brain areas responsible for motion perception. In the second experiment, they were presented with still photographs of the same target items, only some of which implied motion (representational momentum stimuli). When viewing still photographs of scenes implying motion, activity was revealed in secondary visual cortical regions that overlap with areas responsible for the perception of actual motion. Additional bilateral activity was revealed within a posterior satellite of V5 for the representational momentum stimuli. Activation was also engendered in the anterior cingulate cortex. CONCLUSIONS: Considering the implicit nature of representational momentum and its modifiability, the findings suggest that higher-order semantic information can act on secondary visual cortex to alter perception without explicit awareness.

14.
We generated panoramic imagery by simulating a fly-like robot carrying an imaging sensor, moving in free flight through a virtual arena bounded by walls, and containing obstructions. Flight was conducted under closed-loop control by a bio-inspired algorithm for visual guidance with feedback signals corresponding to the true optic flow that would be induced on an imager (computed by known kinematics and position of the robot relative to the environment). The robot had dynamics representative of a housefly-sized organism, although simplified to two-degree-of-freedom flight to generate uniaxial (azimuthal) optic flow on the retina in the plane of travel. Surfaces in the environment contained images of natural and man-made scenes that were captured by the moving sensor. Two bio-inspired motion detection algorithms and two computational optic flow estimation algorithms were applied to sequences of image data, and their performance as optic flow estimators was evaluated by estimating the mutual information between outputs and true optic flow in an equatorial section of the visual field. Mutual information for individual estimators at particular locations within the visual field was surprisingly low (less than 1 bit in all cases) and considerably poorer for the bio-inspired algorithms than for the man-made computational algorithms. However, mutual information between weighted sums of these signals and comparable sums of the true optic flow showed significant increases for the bio-inspired algorithms, whereas such improvement did not occur for the computational algorithms. Such summation is representative of the spatial integration performed by wide-field motion-sensitive neurons in the third optic ganglia of flies.
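The evaluation metric here, mutual information between an estimator's output and the true optic flow, can be estimated with a plug-in histogram method. The Gaussian signal and noise below are illustrative stand-ins for true and estimated flow, and the averaging step is only loosely analogous to wide-field spatial integration:

```python
import numpy as np

def mutual_info(a, b, bins=16):
    # Plug-in (histogram) estimate of I(a;b) in bits
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab /= p_ab.sum()
    p_a, p_b = p_ab.sum(1), p_ab.sum(0)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / np.outer(p_a, p_b)[mask])))

rng = np.random.default_rng(4)
true_flow = rng.normal(size=5000)
noisy_est = true_flow + rng.normal(scale=1.0, size=5000)   # one poor local estimator
print(round(mutual_info(true_flow, noisy_est), 2))          # well under 1 bit

# Averaging several independent noisy estimators recovers information
pooled = true_flow + rng.normal(scale=1.0, size=(8, 5000)).mean(0)
print(mutual_info(true_flow, pooled) > mutual_info(true_flow, noisy_est))
```

Plug-in estimates are biased upward for small samples, so studies of this kind typically apply a bias correction or shuffle-based baseline before comparing estimators.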

15.
Computations in the early visual cortex.
This paper reviews some of the recent neurophysiological studies that explore the variety of visual computations in the early visual cortex in relation to geometric inference, i.e. the inference of contours, surfaces and shapes. It attempts to draw connections between ideas from computational vision and findings from awake primate electrophysiology. In the classical feed-forward, modular view of visual processing, the early visual areas (LGN, V1 and V2) are modules that serve to extract local features, while higher extrastriate areas are responsible for shape inference and invariant object recognition. However, recent findings in primate early visual systems reveal that the computations in the early visual cortex are rather complex and dynamic, as well as interactive and plastic, subject to influence from global context, higher order perceptual inference, task requirement and behavioral experience. The evidence argues that the early visual cortex does not merely participate in the first stage of visual processing, but is involved in many levels of visual computation.

16.
Understanding the neural mechanisms of object and face recognition is one of the fundamental challenges of visual neuroscience. The neurons in inferior temporal (IT) cortex have been reported to exhibit dynamic responses to face stimuli. However, little is known about how the dynamic properties of IT neurons emerge in face information processing. To address this issue, we made a model of IT cortex that performs face perception via an interaction between different IT networks. The model was based on the face information processed by three resolution maps in early visual areas. The network model of IT cortex consists of four kinds of networks, in which the information about a whole face is combined with the information about its face parts and their arrangements. We show here that the learning of face stimuli creates functional connections between these IT networks, producing high spike correlations between pairs of IT neurons. A dynamic property of the subthreshold membrane potential of IT neurons, produced by a Hodgkin–Huxley model, enables the coordination of temporal information without changing the firing rate, providing the basis of the mechanism underlying face perception. We also show that the hierarchical processing of face information allows IT cortex to perform a “coarse-to-fine” processing of face information. The results presented here appear to be compatible with experimental data on the dynamic properties of IT neurons.

17.
Inferior temporal (IT) cortex, as the final stage of the ventral visual pathway, is involved in visual object recognition. In our everyday life we need to recognize visual objects that are degraded by noise. Psychophysical studies have shown that the accuracy and speed of object recognition decrease as the amount of visual noise increases. However, the neural representation of ambiguous visual objects and the neural mechanisms underlying such changes in behavior are not known. Here, by recording the neuronal spiking activity of macaque monkeys' IT, we explored the relationship between stimulus ambiguity and IT neural activity. We found smaller amplitude, later onset, earlier offset and shorter duration of the response as visual ambiguity increased. All of these modulations were gradual and correlated with the level of stimulus ambiguity. We found that while the category selectivity of IT neurons decreased with noise, it was preserved over a large extent of visual ambiguity. This noise tolerance of category selectivity in IT was lost at the 60% noise level. Interestingly, while the response of IT neurons to visual stimuli at the 60% noise level was significantly larger than their baseline activity and their response to full (100%) noise, it was no longer category selective. The latter finding shows a neural representation that signals the presence of a visual stimulus without signaling what it is. In general these findings, in the context of a drift diffusion model, explain the neural mechanisms of the changes in perceptual accuracy and speed in the process of recognizing ambiguous objects.

18.
Lateral and recurrent connections are ubiquitous in biological neural circuits. Yet while the strong computational abilities of feedforward networks have been extensively studied, our understanding of the role and advantages of recurrent computations that might explain their prevalence remains an important open challenge. Foundational studies by Minsky and Roelfsema argued that computations that require propagation of global information for local computation to take place would particularly benefit from the sequential, parallel nature of processing in recurrent networks. Such “tag propagation” algorithms perform repeated, local propagation of information and were originally introduced in the context of detecting connectedness, a task that is challenging for feedforward networks. Here, we advance the understanding of the utility of lateral and recurrent computation by first performing a large-scale empirical study of neural architectures for the computation of connectedness to explore feedforward solutions more fully and establish robustly the importance of recurrent architectures. In addition, we highlight a tradeoff between computation time and performance and construct hybrid feedforward/recurrent models that perform well even in the presence of varying computational time limitations. We then generalize tag propagation architectures to propagating multiple interacting tags and demonstrate that these are efficient computational substrates for more general computations of connectedness by introducing and solving an abstracted biologically inspired decision-making task. Our work thus clarifies and expands the set of computational tasks that can be solved efficiently by recurrent computation, yielding hypotheses for structure in population activity that may be present in such tasks.
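A minimal tag-propagation algorithm for connectedness, in the spirit described above: a tag spreads one step per iteration to 4-neighbours that lie on the figure, so deciding connectedness takes a number of recurrent steps that grows with path length - exactly the sequential cost feedforward networks must pay for in depth. The grid and markings below are an illustrative toy stimulus:

```python
import numpy as np

def connected(grid, start, goal, max_steps=None):
    # Repeated local propagation of a "tag" from start; connected iff it reaches goal.
    tag = np.zeros_like(grid, dtype=bool)
    tag[start] = True
    for _ in range(max_steps or grid.size):
        spread = tag.copy()
        spread[1:, :] |= tag[:-1, :]     # each step consults only 4-neighbours
        spread[:-1, :] |= tag[1:, :]
        spread[:, 1:] |= tag[:, :-1]
        spread[:, :-1] |= tag[:, 1:]
        spread &= grid.astype(bool)      # tags may only occupy figure pixels
        if spread[goal]:
            return True
        if (spread == tag).all():        # propagation converged without reaching goal
            return False
        tag = spread
    return False

curve = np.zeros((8, 8), dtype=int)
curve[0, :] = 1                          # one connected stroke...
curve[4, 2:6] = 1                        # ...and a separate segment
print(connected(curve, (0, 0), (0, 7)))  # True: same stroke
print(connected(curve, (0, 0), (4, 3)))  # False: different segments
```

Generalizing to multiple interacting tags, as the paper does, amounts to propagating several boolean maps with rules coupling them at each local step.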

19.
Object recognition is achieved through neural mechanisms reliant on the activity of distributed, coordinated neural assemblies. In the initial steps of this process, an object's features are thought to be coded very rapidly in distinct neural assemblies. These features play different functional roles in the recognition process: while colour facilitates recognition, additional contours and edges delay it. Here, we selectively varied the amount and role of object features in an entry-level categorization paradigm and related them to the electrical activity of the human brain. We found that early synchronizations (approx. 100 ms) increased quantitatively when more image features had to be coded, without reflecting their qualitative contribution to the recognition process. Later activity (approx. 200–400 ms) was modulated by the representational role of object features. These findings demonstrate that although early synchronizations may be sufficient for relatively crude discrimination of objects in visual scenes, they cannot support entry-level categorization. This was subserved by later processes of object model selection, which utilized the representational value of object features such as colour or edges to select the appropriate model and achieve identification.

20.
In this article, we present a neurologically motivated computational architecture for visual information processing. The computational architecture's focus lies in multiple strategies: hierarchical processing, parallel and concurrent processing, and modularity. The architecture is modular and expandable in both hardware and software, so that it can also cope with multisensory integration, making it an ideal tool for validating and applying computational neuroscience models in real time under real-world conditions. We apply our architecture in real time to validate a long-standing biologically inspired visual object recognition model, HMAX. In this context, the overall aim is to supply a humanoid robot with the ability to perceive and understand its environment, with a focus on the active aspect of real-time spatiotemporal visual processing. We show that our approach is capable of simulating information processing in the visual cortex in real time and that our entropy-adaptive modification of HMAX has a higher efficiency and classification performance than the standard model (up to ~+6%).
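The shift tolerance that HMAX-style models gain from their C-layers can be shown in isolation: local max pooling over the S-layer response map leaves the output unchanged under small translations. The pooling size below is an arbitrary illustrative choice, not the validated model's parameter:

```python
import numpy as np

def c1_pool(s1, pool=4):
    # HMAX-style C1 stage: non-overlapping local max pooling over an S1 response map
    h, w = s1.shape
    h2, w2 = h // pool, w // pool
    return s1[:h2 * pool, :w2 * pool].reshape(h2, pool, w2, pool).max(axis=(1, 3))

s1 = np.zeros((16, 16))
s1[6, 6] = 1.0                           # a feature response at one position
shifted = np.roll(s1, 1, axis=1)         # the same feature shifted by one pixel

print(np.array_equal(c1_pool(s1), c1_pool(shifted)))  # True: pooled map is unchanged
```

Alternating such selectivity (S) and invariance (C) stages is what lets the full model trade positional precision for robustness, layer by layer.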
