Similar articles
 20 similar articles found (search time: 218 ms)
1.
Recognizing an object takes just a fraction of a second, less than the blink of an eye. Applying multivariate pattern analysis, or “brain decoding”, methods to magnetoencephalography (MEG) data has allowed researchers to characterize, in high temporal resolution, the emerging representation of object categories that underlies our capacity for rapid recognition. Shortly after stimulus onset, object exemplars cluster by category in a high-dimensional activation space in the brain. In this emerging activation space, the decodability of exemplar category varies over time, reflecting the brain’s transformation of visual inputs into coherent category representations. How do these emerging representations relate to categorization behavior? Recently it has been proposed that the distance of an exemplar representation from a categorical boundary in an activation space is critical for perceptual decision-making, and that reaction times should therefore correlate with distance from the boundary. The predictions of this distance hypothesis have been borne out in human inferior temporal cortex (IT), an area of the brain crucial for the representation of object categories. When viewed in the context of a time-varying neural signal, the optimal time to “read out” category information is when category representations in the brain are most decodable. Here, we show that the distance from a decision boundary through activation space, as measured using MEG decoding methods, correlates with reaction times for visual categorization during the period of peak decodability. Our results suggest that the brain begins to read out information about exemplar category at the optimal time for use in choice behavior, and they support the hypothesis that the structure of the representation of objects in the visual system is partially constitutive of the decision process in recognition.
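The core analysis, measuring each trial's distance from a decision boundary and correlating it with reaction times, can be sketched with synthetic data. This is a minimal illustration of the distance hypothesis rather than the authors' pipeline: a least-squares linear discriminant stands in for the MEG decoder, all numbers are made up, and the reaction times are simulated under the hypothesis itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for single-trial MEG sensor patterns at the time
# point of peak decodability (200 trials, 30 "sensors", two categories).
n_trials, n_sensors = 200, 30
X = rng.normal(size=(n_trials, n_sensors))
labels = np.repeat([-1.0, 1.0], n_trials // 2)
X[labels == 1.0] += 0.5  # category separation in activation space

# Least-squares linear discriminant (a stand-in for the decoder).
Xb = np.hstack([X, np.ones((n_trials, 1))])
w = np.linalg.lstsq(Xb, labels, rcond=None)[0]

# Distance of each trial's pattern from the decision boundary.
dist = np.abs(Xb @ w) / np.linalg.norm(w[:-1])

# Distance hypothesis: exemplars far from the boundary are categorized
# faster, so simulated RTs decrease with distance (plus decision noise).
rt_ms = 600.0 - 40.0 * dist + rng.normal(scale=20.0, size=n_trials)

r = np.corrcoef(dist, rt_ms)[0, 1]  # clearly negative by construction
```

Because the RTs here are generated from the distances, the correlation comes out strongly negative; on real data its sign and strength are the empirical question.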

2.
Why is Real-World Visual Object Recognition Hard?
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
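A "V1-like" null model of the kind described is, at its core, a bank of oriented Gabor filters whose rectified, pooled outputs serve as features. The sketch below is a toy stand-in, not the paper's model; the filter size, frequency, and number of orientations are arbitrary choices for illustration.

```python
import numpy as np

def gabor(size, theta, freq, sigma):
    """One Gabor kernel: an oriented grating in a Gaussian envelope."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def v1_energy(img, size=9, freq=0.2, n_orient=4):
    """Mean squared filter response per orientation, pooled over position."""
    h, w = img.shape
    feats = []
    for theta in np.linspace(0, np.pi, n_orient, endpoint=False):
        k = gabor(size, theta, freq, sigma=size / 4)
        resp = np.zeros((h - size + 1, w - size + 1))
        for i in range(h - size + 1):
            for j in range(w - size + 1):  # valid cross-correlation
                resp[i, j] = np.sum(img[i:i + size, j:j + size] * k)
        feats.append(np.mean(resp ** 2))
    return np.array(feats)

# A grating whose intensity varies down the rows should drive the channel
# tuned to vertical spatial variation (theta = pi/2) the hardest.
rows = np.sin(2 * np.pi * 0.2 * np.arange(24))
grating = np.outer(rows, np.ones(24))
energy = v1_energy(grating)
```

Such pooled filter energies are deliberately simple; the paper's point is that even features this crude can look competitive on uncontrolled "natural" test sets.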

3.
An object in the peripheral visual field is more difficult to recognize when surrounded by other objects. This phenomenon is called “crowding”. Crowding places a fundamental constraint on human vision that limits performance on numerous tasks. It has been suggested that crowding results from spatial feature integration necessary for object recognition. However, in the absence of convincing models, this theory has remained controversial. Here, we present a quantitative and physiologically plausible model for spatial integration of orientation signals, based on the principles of population coding. Using simulations, we demonstrate that this model coherently accounts for fundamental properties of crowding, including critical spacing, “compulsory averaging”, and a foveal-peripheral anisotropy. Moreover, we show that the model predicts increased responses to correlated visual stimuli. Altogether, these results suggest that crowding has little immediate bearing on object recognition but is a by-product of a general, elementary integration mechanism in early vision aimed at improving signal quality.
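The population-coding account of compulsory averaging can be illustrated in a few lines: if the integration stage pools the orientation-tuned population responses to target and flanker before readout, the decoded orientation lands between the two. The tuning width, population size, and vector-average readout below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

prefs = np.linspace(0, np.pi, 60, endpoint=False)  # preferred orientations (rad)

def population_response(theta, kappa=8.0):
    """Von-Mises-like tuning curves on the orientation circle (period pi)."""
    return np.exp(kappa * (np.cos(2 * (prefs - theta)) - 1))

def population_vector_decode(resp):
    """Vector-average readout on the doubled-angle circle."""
    z = np.sum(resp * np.exp(2j * prefs))
    return (np.angle(z) / 2) % np.pi

target, flanker = np.deg2rad(20.0), np.deg2rad(40.0)

# "Crowded" condition: the integration stage pools the two population
# responses before readout, so the decoder reports their average.
pooled = population_response(target) + population_response(flanker)
decoded_deg = np.rad2deg(population_vector_decode(pooled))  # ≈ 30.0
```

In isolation the same readout recovers the target orientation accurately; the averaging appears only once the flanker response is pooled in.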

4.
A much-debated question in object recognition is whether expertise for faces and expertise for non-face objects utilize common perceptual information. We investigated this issue by assessing the diagnostic information required for different types of expertise. Specifically, we asked whether face categorization and expert car categorization at the subordinate level rely on the same spatial frequency (SF) scales. Fifteen car experts and fifteen novices performed a category verification task with spatially filtered images of faces, cars, and airplanes. Images were categorized based on their basic (e.g. “car”) and subordinate level (e.g. “Japanese car”) identity. The effect of expertise was not evident when objects were categorized at the basic level. However, when the car experts categorized faces and cars at the subordinate level, the two types of expertise required different kinds of SF information. Subordinate categorization of faces relied on low SFs more than on high SFs, whereas subordinate expert car categorization relied on high SFs more than on low SFs. These findings suggest that expertise in the recognition of objects and expertise in the recognition of faces do not utilize the same type of information; rather, different types of expertise require different types of diagnostic visual information.
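Spatial-frequency filtering of stimulus images of the kind used here can be sketched with a radial mask in the Fourier domain. The cutoff below is arbitrary and the mask is ideal (hard-edged), whereas published studies typically use smoother filters; this is a sketch of the idea, not the study's exact filtering.

```python
import numpy as np

def sf_filter(img, cutoff, keep="low"):
    """Keep only spatial frequencies below (or above) `cutoff` cycles/image."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    fy = np.arange(h) - h // 2
    fx = np.arange(w) - w // 2
    radius = np.hypot(*np.meshgrid(fx, fy))          # radial frequency grid
    mask = radius <= cutoff if keep == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))                      # stand-in for a face/car image
low = sf_filter(img, 8, "low")                       # low-SF version
high = sf_filter(img, 8, "high")                     # high-SF version
```

Because the two masks partition the spectrum, the low-pass and high-pass versions sum back to the original image, which is a convenient sanity check on the filter.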

5.
Mechanisms of explicit object recognition are often difficult to investigate and require stimuli with controlled features whose expression can be manipulated in a precise quantitative fashion. Here, we developed a novel method (called "Dots") for generating visual stimuli, based on the progressive deformation of a regular lattice of dots, driven by local contour information from images of objects. By applying progressively larger deformation to the lattice, the latter conveys progressively more information about the target object. Stimuli generated with the presented method enable precise control of object-related information content while globally preserving low-level image statistics and affecting them only slightly at the local level. We show that such stimuli are useful for investigating object recognition in a naturalistic setting (free visual exploration), enabling a clear dissociation between object detection and explicit recognition. Using the introduced stimuli, we show that top-down modulation induced by previous exposure to target objects can greatly influence perceptual decisions, lowering perceptual thresholds not only for object recognition but also for object detection (visual hysteresis). Visual hysteresis is target-specific, its expression and magnitude depending on the identity of individual objects. Relying on the particular features of dot stimuli and on eye-tracking measurements, we further demonstrate that top-down processes guide visual exploration, controlling how visual information is integrated by successive fixations. Prior knowledge about objects can guide saccades and fixations to sample locations that are expected to be highly informative, even when the actual information is missing from those locations in the stimulus. The duration of individual fixations is modulated by the novelty and difficulty of the stimulus, likely reflecting cognitive demand.

6.
Once people perceive what is in a hidden figure such as Dallenbach's cow or the Dalmatian, they seldom seem to return to the previous state in which they were ignorant of the answer. This special type of learning can be accomplished in a short time, with the effect lasting for a long time (visual one-shot learning). Although it is an intriguing cognitive phenomenon, the inability to control the difficulty of the presented stimuli has been a problem for research. Here we propose a novel paradigm for creating new hidden figures systematically using a morphing technique. Through gradual changes from a blurred, binarized two-tone image to a blurred grayscale image of the original photograph, containing objects in a natural scene, spontaneous one-shot learning can occur at a certain stage of morphing, when a sufficient amount of information has been restored to the degraded image. A negative correlation between confidence levels and reaction times is observed, supporting the fluency theory of one-shot learning. The correlation between confidence ratings and correct recognition rates indicates that participants had accurate introspective ability (metacognition). The learning effect could be tested later by verifying whether the target object was recognized more quickly on second exposure. The present method opens a way for the systematic production of “good” hidden figures, which can be used to demystify the nature of visual one-shot learning.
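The morphing sequence from a two-tone image toward the grayscale original can be sketched as simple alpha blending. The blurring step described above is omitted here, and the threshold and number of frames are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def morph_sequence(gray, steps=8, thresh=0.5):
    """Frames running from a binarized two-tone image to the grayscale original."""
    two_tone = (gray >= thresh).astype(float)        # degraded starting image
    alphas = np.linspace(0.0, 1.0, steps)            # morphing stages
    return [(1 - a) * two_tone + a * gray for a in alphas]

rng = np.random.default_rng(2)
gray = rng.random((16, 16))                          # stand-in for a photograph
frames = morph_sequence(gray)
```

Presenting the frames in order restores information gradually, so the stage at which recognition first occurs gives a per-stimulus difficulty measure.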

7.
8.
Yao JG, Gao X, Yan HM, Li CY. PLoS ONE 2011, 6(1): e16343

Background

Instantaneous object discrimination and categorization are fundamental cognitive capacities performed with the guidance of visual attention. Visual attention enables selection of a salient object within a limited area of the visual field, which we refer to as the “field of attention” (FA). Though there is some evidence concerning the spatial extent of object recognition, the following questions remain open: (a) how large is the FA for rapid object categorization, (b) how is the accuracy of attention distributed over the FA, and (c) how fast can complex objects be categorized when presented against backgrounds formed by natural scenes?

Methodology/Principal Findings

To answer these questions, we used a visual perceptual task in which subjects were asked to focus their attention on a point while categorizing briefly flashed (20 ms) photographs of natural scenes by indicating whether or not these contained an animal. By measuring the accuracy of categorization at different eccentricities from the fixation point, we were able to determine the spatial extent of the FA and the distribution of accuracy over it, as well as the speed of categorization, using stimulus onset asynchrony (SOA). Our results revealed that subjects are able to rapidly categorize complex natural images within about 0.1 s without eye movement, that the FA for instantaneous image categorization covers a visual field extending 20°×24°, and that accuracy was highest (>90%) at the center of the FA and declined with increasing eccentricity.

Conclusions/Significance

In conclusion, human beings are able to categorize complex natural images at a glance over a large extent of the visual field without eye movement.

9.
10.
Viewpoint-dependent recognition performance for 3-D objects has often been taken as an indication of a viewpoint-dependent object representation. This viewpoint dependence is most often found using metrically manipulated objects. We investigate whether these results can instead be explained by viewpoint and object property (e.g. curvature) information not being processed independently at a lower level, prior to object recognition itself. Multidimensional signal detection theory offers a useful framework, allowing us to model this as a low-level correlation between the internal noise distributions of the viewpoint and object property dimensions. In Experiment 1, we measured these correlations using both Yes/No and adjustment tasks. We found a good correspondence across tasks, but large individual differences. In Experiment 2, we compared these results to the viewpoint dependence of object recognition in a Yes/No categorization task. We found that viewpoint-independent object recognition could not be fully reached using our stimuli, and that the pattern of viewpoint dependence was strongly correlated with the low-level correlations we had measured earlier. To some extent, however, viewpoint was abstracted despite these correlations. We conclude that low-level correlations do exist prior to object recognition, and can offer an explanation for some viewpoint effects on the discrimination of metrically manipulated 3-D objects.
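The low-level correlation between internal noise distributions can be modeled as a bivariate Gaussian, as in the sketch below; the correlation value is made up for illustration. Conditioning the "curvature" percept on the "viewpoint" noise then shows how such a correlation alone can produce viewpoint-dependent responses.

```python
import numpy as np

rng = np.random.default_rng(6)

# Bivariate internal noise on the viewpoint and curvature dimensions,
# with a hypothetical correlation rho (the quantity measured in Exp. 1).
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
noise = rng.normal(size=(100_000, 2)) @ np.linalg.cholesky(cov).T
view_noise, curv_noise = noise[:, 0], noise[:, 1]

emp_rho = np.corrcoef(view_noise, curv_noise)[0, 1]  # recovers rho

# A curvature decision made against a fixed criterion is no longer
# independent of viewpoint: trials with large viewpoint noise are
# biased toward "more curved" responses.
p_all = (curv_noise > 0).mean()                      # ~= 0.5 overall
p_high_view = (curv_noise[view_noise > 1] > 0).mean()
```

The conditional response rate is pulled well above chance even though the curvature signal itself has not changed, which is the mechanism the abstract proposes for apparent viewpoint effects.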

11.
Scene analysis, the process of converting sensory information from peripheral receptors into a representation of objects in the external world, is central to our human experience of perception. Through our efforts to design systems for object recognition and for robot navigation, we have come to appreciate that a number of common themes apply across the sensory modalities of vision, audition, and olfaction, and many apply across species ranging from invertebrates to mammals. These themes include the need for adaptation in the periphery, and trade-offs between selectivity for frequency or molecular structure and resolution in time or space. In addition, neural mechanisms involving coincidence detection are found in many different subsystems that appear to implement cross-correlation or autocorrelation computations.

12.
Perception and encoding of object size is an important feature of sensory systems. In the visual system object size is encoded by the visual angle (visual aperture) on the retina, but the aperture depends on the distance of the object. As object distance is not unambiguously encoded in the visual system, higher computational mechanisms are needed. This phenomenon is termed “size constancy”. It is assumed to reflect an automatic re-scaling of visual aperture with perceived object distance. Recently, it was found that in echolocating bats, the ‘sonar aperture’, i.e., the range of angles from which sound is reflected from an object back to the bat, is unambiguously perceived and neurally encoded. Moreover, it is well known that object distance is accurately perceived and explicitly encoded in bat sonar. Here, we addressed size constancy in bat biosonar, recruiting virtual-object techniques. Bats of the species Phyllostomus discolor learned to discriminate two simple virtual objects that only differed in sonar aperture. Upon successful discrimination, test trials were randomly interspersed using virtual objects that differed in both aperture and distance. It was tested whether the bats spontaneously assigned absolute width information to these objects by combining distance and aperture. The results showed that while the isolated perceptual cues encoding object width, aperture, and distance were all perceptually well resolved by the bats, the animals did not assign absolute width information to the test objects. This lack of sonar size constancy may result from the bats relying on different modalities to extract size information at different distances. Alternatively, it is conceivable that familiarity with a behaviorally relevant, conspicuous object is required for sonar size constancy, as it has been argued for visual size constancy. Based on the current data, it appears that size constancy is not necessarily an essential feature of sonar perception in bats.

13.
Rainer G, Miller EK. Neuron 2000, 27(1): 179-189
The perception and recognition of objects are improved by experience. Here, we show that monkeys' ability to recognize degraded objects was improved by several days of practice with these objects. This improvement was reflected in the activity of neurons in the prefrontal (PF) cortex, a brain region critical for a wide range of visual behaviors. Familiar objects activated fewer neurons than did novel objects, but these neurons were more narrowly tuned, and the object representation was more resistant to the effects of degradation, after experience. These results demonstrate a neural correlate of visual learning in the PF cortex of adult monkeys.

14.
It is well known that motion facilitates the visual perception of solid object shape, particularly when surface texture or other identifiable features (e.g., corners) are present. Conventional models of structure-from-motion require the presence of texture or identifiable object features in order to recover 3-D structure. Is the facilitation in 3-D shape perception similar in magnitude when surface texture is absent? On any given trial in the current experiments, participants were presented with a single randomly selected solid object (bell pepper or randomly shaped “glaven”) for 12 seconds and were required to indicate which of 12 (for bell peppers) or 8 (for glavens) simultaneously visible objects possessed the same shape. The initial single object's shape was defined either by boundary contours alone (i.e., presented as a silhouette), by specular highlights alone, by specular highlights combined with boundary contours, or by texture. In addition, there was a haptic condition: the participants haptically explored with both hands (but could not see) the initial single object for 12 seconds; they then performed the same shape-matching task used in the visual conditions. For both the visual and haptic conditions, motion (rotation in depth or active object manipulation) was present in half of the trials and absent in the remaining trials. The effect of motion was quantitatively similar for all of the visual and haptic conditions; for example, the participants' performance in Experiment 1 was 93.5 percent higher in the motion and active haptic manipulation conditions than in the static conditions. The current results demonstrate that deforming specular highlights or boundary contours facilitate 3-D shape perception as much as the motion of objects that possess texture. The current results also indicate that the improvement with motion that occurs for haptics is similar in magnitude to that which occurs for vision.

15.
Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically, in several processing stages. Along these stages, a set of features of increasing complexity is extracted by different parts of the visual system: elementary features such as bars and edges are processed at early levels of the visual pathway, and increasingly complex features are detected at higher levels. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model to different object recognition tasks. In this model, a set of object parts, called patches, is extracted in the intermediate stages. These object parts are used in the model's training procedure and play an important role in object recognition. In the original model, patches are selected indiscriminately from different positions of an image, which can lead to the extraction of non-discriminative patches that may ultimately reduce performance. In the proposed model, we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, where it outperforms the original model. The experiments show that the selected features generally correspond to particular parts of the target objects. Our results suggest that selected features that are parts of the target objects provide an efficient set for robust object recognition.
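The patch-selection step can be sketched as a simple genetic algorithm over binary masks of candidate patches. Everything below is synthetic and illustrative: the "patch features" are random vectors with a few informative dimensions, and the fitness function (between-class separation minus a size penalty) is an assumption, not the fitness used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic patch features: 20 candidate patches per image, of which only
# the first 5 carry category information (both counts are made up).
n, n_patches = 200, 20
X = rng.normal(size=(n, n_patches))
y = np.repeat([0, 1], n // 2)
X[y == 1, :5] += 1.5  # informative patches

def fitness(mask):
    """Between-class separation of selected patches, minus a size penalty."""
    diff = X[y == 1][:, mask].mean(0) - X[y == 0][:, mask].mean(0)
    return np.sum(diff ** 2) - 0.1 * mask.sum()

pop = rng.random((30, n_patches)) < 0.5              # random binary genomes
for _ in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]          # truncation selection
    children = parents[rng.integers(0, 10, 20)].copy()
    children ^= rng.random(children.shape) < 0.05    # bit-flip mutation
    pop = np.vstack([parents, children])             # elitist replacement

best = pop[np.argmax([fitness(m) for m in pop])]     # selected patch mask
```

On this toy problem the evolved mask concentrates on the informative patches, which mirrors the paper's claim that evolved patches beat randomly selected ones.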

16.

Background

How do people sustain a visual representation of the environment? Currently, many researchers argue that a single visual working memory system sustains non-spatial object information such as colors and shapes. However, previous studies tested visual working memory for two-dimensional objects only. In consequence, the nature of visual working memory for three-dimensional (3D) object representation remains unknown.

Methodology/Principal Findings

Here, I show that when sustaining information about 3D objects, visual working memory clearly divides into two separate, specialized memory systems, rather than one system, as was previously thought. One memory system gradually accumulates sensory information, forming an increasingly precise view-dependent representation of the scene over the course of several seconds. A second memory system sustains view-invariant representations of 3D objects. The view-dependent memory system has a storage capacity of 3–4 representations and the view-invariant memory system has a storage capacity of 1–2 representations. These systems can operate independently from one another and do not compete for working memory storage resources.

Conclusions/Significance

These results provide evidence that visual working memory sustains object information in two separate, specialized memory systems. One memory system sustains view-dependent representations of the scene, akin to the view-specific representations that guide place recognition during navigation in humans, rodents and insects. The second memory system sustains view-invariant representations of 3D objects, akin to the object-based representations that underlie object cognition.

17.
Harris IM, Dux PE, Benito CT, Leek EC. PLoS ONE 2008, 3(5): e2256

Background

An ongoing debate in the object recognition literature centers on whether the shape representations used in recognition are coded in an orientation-dependent or orientation-invariant manner. In this study, we asked whether the nature of the object representation (orientation-dependent vs orientation-invariant) depends on the information-processing stages tapped by the task.

Methodology/Findings

We employed a repetition priming paradigm in which briefly presented masked objects (primes) were followed by an upright target object which had to be named as rapidly as possible. The primes were presented for variable durations (ranging from 16 to 350 ms) and in various image-plane orientations (from 0° to 180°, in 30° steps). Significant priming was obtained for prime durations above 70 ms, but not for prime durations of 16 ms and 47 ms, and did not vary as a function of prime orientation. In contrast, naming the same objects that served as primes resulted in orientation-dependent reaction time costs.

Conclusions/Significance

These results suggest that initial processing of object identity is mediated by orientation-independent information and that orientation costs in performance arise when objects are consolidated in visual short-term memory in order to be reported.

18.
Objective and effective image quality assessment (IQA) is directly relevant to the application of optical remote sensing images (ORSI). In this study, a new IQA method is presented that uses a standardized target object recognition rate (ORR) to reflect quality. First, several quality-degradation treatments are applied to high-resolution ORSIs to model images obtained under different imaging conditions; then, a machine learning algorithm is used in recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators is performed to reveal their applicability and limitations. The ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. These data reflect the relative suitability of different images for object identification and information extraction more accurately than conventional digital image assessment indexes. By judging image quality from the perspective of application performance, using a machine learning algorithm to extract regional gray-scale features of typical objects in the image for analysis, and quantitatively assessing ORSI quality according to the resulting differences, this method provides a new approach to objective ORSI assessment.
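The pipeline (degrade, recognize, compare ORRs) can be sketched end to end on synthetic data. The nearest-centroid classifier, the feature vectors, and the noise model below are all illustrative stand-ins; the paper's actual imagery, degradation treatments, and learning algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins for target-object image chips from an ORSI:
# two object classes, 64-dimensional gray-scale feature vectors.
def make_chips(n):
    X = rng.normal(scale=0.5, size=(n, 64))
    y = rng.integers(0, 2, n)
    X[y == 1, :32] += 1.0
    return X, y

X_train, y_train = make_chips(400)
X_test, y_test = make_chips(200)

# A nearest-centroid classifier stands in for the machine learning
# algorithm used in the recognition experiments.
centroids = np.stack([X_train[y_train == c].mean(0) for c in (0, 1)])

def orr(X, y):
    """Object recognition rate: fraction of chips classified correctly."""
    pred = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return float((pred == y).mean())

orr_original = orr(X_test, y_test)
# Simulated quality degradation: additive sensor noise.
orr_degraded = orr(X_test + rng.normal(scale=3.0, size=X_test.shape), y_test)
quality_index = orr_degraded / orr_original  # the ORR ratio used as the score
```

The ratio of degraded to original ORR plays the role of the quality index: the harder the degradation makes recognition, the lower the score.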

19.
The relation of gamma-band synchrony to holistic perception, as distinct from the effects of sensory processing, high-level perceptual gestalt formation, and motor planning and response, is still controversial. To provide a more direct link to emergent perceptual states, we used holistic EEG/ERP paradigms in which the moment of perceptual “discovery” of a global pattern was variable. Using rapid visual presentation of short-lived Mooney objects, we found an increase of gamma-band activity locked to perceptual events. Additional experiments using dynamic Mooney stimuli showed that gamma activity increases well before the report of an emergent holistic percept. To confirm these findings in a data-driven manner, we further used a support vector machine classification approach to distinguish between perceptual and non-perceptual states, based on time-frequency features. Sensitivity, specificity, and accuracy were all above 95%. Modulations in the 30–75 Hz range were larger for perception states. Interestingly, phase synchrony was also larger for perception states in high frequency bands. By focusing on global gestalt mechanisms instead of local processing, we conclude that gamma-band activity and synchrony provide a signature of holistic perceptual states of variable onset, which are separable from sensory and motor processing.
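The classification step can be sketched as follows. Logistic regression trained by gradient descent stands in for the study's SVM, and the band-power features are synthetic, so the numbers will not match the reported >95% figures; only the structure of the analysis (train on time-frequency features, report sensitivity and specificity) is illustrated.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic time-frequency features: mean power in a few frequency bands.
# Perception trials carry extra gamma-band (30-75 Hz) power; all values
# here are illustrative, not the study's data.
band_hz = np.array([5, 10, 20, 40, 60, 80])
n = 400
y = rng.integers(0, 2, n)                     # 1 = holistic percept reported
X = rng.normal(size=(n, band_hz.size))
gamma = (band_hz >= 30) & (band_hz <= 75)
X[np.ix_(y == 1, gamma)] += 1.5               # gamma power increase

# Logistic regression by gradient descent (stand-in for the SVM).
train, test = np.arange(n) < 300, np.arange(n) >= 300
Xb = np.hstack([X, np.ones((n, 1))])
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb[train] @ w))
    w -= 0.1 * Xb[train].T @ (p - y[train]) / train.sum()

pred = (Xb[test] @ w) > 0
tp = np.sum(pred & (y[test] == 1))
tn = np.sum(~pred & (y[test] == 0))
sensitivity = tp / np.sum(y[test] == 1)
specificity = tn / np.sum(y[test] == 0)
```

Held-out sensitivity and specificity quantify how well perceptual states separate from non-perceptual ones; on real data the learned weights would also indicate which bands carry the discriminative signal.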

20.
In this article we review current literature on cross-modal recognition and present new findings from our studies on object and scene recognition. Specifically, we address two questions: what is the nature of the representation underlying each sensory system that facilitates convergence across the senses, and how is perception modified by the interaction of the senses? In the first set of experiments, the recognition of unfamiliar objects within and across the visual and haptic modalities was investigated under changes in orientation (0° or 180°). An orientation change increased recognition errors within each modality, but this effect was reduced across modalities. Our results suggest that cross-modal object representations are mediated by surface-dependent representations. In a second series of experiments, we investigated how spatial information is integrated across modalities and viewpoint, using scenes of familiar 3D objects as stimuli. We found that scene recognition performance was less efficient when there was either a change in modality or a change in orientation between learning and test. Furthermore, haptic learning was selectively disrupted by a verbal interpolation task. Our findings are discussed with reference to separate spatial encoding of visual and haptic scenes. We conclude by discussing a number of constraints under which cross-modal integration is optimal for object recognition. These constraints include the nature of the task and the amount of spatial and temporal congruency of information across the modalities.
