Similar literature
 20 similar articles found
1.
Xu J  Yang Z  Tsien JZ 《PloS one》2010,5(12):e15796
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.

2.
The visual system is the most studied sensory pathway, partly because visual stimuli have rather intuitive properties. There are reasons to think, however, that the principle governing coding is the same for vision and any other type of sensory signal: the code has to satisfy some notion of optimality, understood as minimum redundancy or as maximum transmitted information. Given the huge variability of natural stimuli, it would seem that attaining an optimal code is almost impossible; however, regularities and symmetries in the stimuli can be used to simplify the task: symmetries allow predicting one part of a stimulus from another, that is, they imply a structured type of redundancy. Optimal coding can only be achieved once the intrinsic symmetries of natural scenes are understood and exploited to optimize the performance of the neural encoder. In this paper, we review the concepts of optimal coding and discuss the known redundancies and symmetries of visual scenes. We discuss in depth the only approach known so far that implements three of them: translational invariance, scale invariance and multiscaling. Not surprisingly, the resulting code possesses features observed in real visual systems in mammals.

3.
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.

4.
Graham DJ  Field DJ 《Spatial Vision》2007,21(1-2):149-164
Paintings are the product of a process that begins with ordinary vision in the natural world and ends with manipulation of pigments on canvas. Because artists must produce images that can be seen by a visual system that is thought to take advantage of statistical regularities in natural scenes, artists are likely to replicate many of these regularities in their painted art. We have tested this notion by computing basic statistical properties and modeled cell response properties for a large set of digitized paintings and natural scenes. We find that both representational and non-representational (abstract) paintings from our sample (124 images) show basic similarities to a sample of natural scenes in terms of their spatial frequency amplitude spectra, but the paintings and natural scenes show significantly different mean amplitude spectrum slopes. We also find that the intensity distributions of paintings show a lower skewness and sparseness than natural scenes. We account for this by considering the range of luminances found in the environment compared to the range available in the medium of paint. A painting's range is limited by the reflective properties of its materials. We argue that artists do not simply scale the intensity range down but use a compressive nonlinearity. In our studies, modeled retinal and cortical filter responses to the images were less sparse for the paintings than for the natural scenes. But when a compressive nonlinearity was applied to the images, both the paintings' sparseness and the modeled responses to the paintings showed the same or greater sparseness compared to the natural scenes. This suggests that artists achieve some degree of nonlinear compression in their paintings. Because paintings have captivated humans for millennia, finding basic statistical regularities in paintings' spatial structure could grant insights into the range of spatial patterns that humans find compelling.

5.
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, they can each provide important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static and this can be obscured by fMRI’s poor temporal resolution. In this study, we leveraged the high temporal resolution of EEG to develop an encoding technique based on the distribution of responses generated by a population of real-world scenes. This approach maps neural signals to each pixel within a given image and reveals location-specific transformations of the visual code, providing a spatiotemporal signature for the image at each electrode. Our analyses of the mapping results revealed that scenes undergo a series of nonuniform transformations that prioritize different spatial frequencies at different regions of scenes over time. This mapping technique offers a potential avenue for future studies to explore how dynamic feedforward and recurrent processes inform and refine high-level representations of our visual world.

6.
Conventional imagers and almost all vision processes rely on theories that are based on the principle of static image frames. A frame is a 2D matrix that represents the spatial locations of intensities of a scene projected on the imager. The notion of a frame is so embedded in machine vision that it is usually taken for granted that this is how biological systems store light information. This paper presents a bioinspired event-based image formation principle whose output data rely on an asynchronous acquisition process. The generated information is stored in temporal volumes whose size and information depend only on the dynamic content of observed scenes. Practical analysis of such information shows that the processing of visual information can only be based on a semiotic process. The paper also provides a general definition of the notion of visual features as the interpretation of signs according to different possible readings of the codified visual signal.

7.
Because of the limited processing capacity of eyes, retinal networks must adapt constantly to best present the ever-changing visual world to the brain. However, we still know little about how adaptation in retinal networks shapes neural encoding of changing information. To study this question, we recorded voltage responses from photoreceptors (R1–R6) and their output neurons (LMCs) in the Drosophila eye to repeated patterns of contrast values, collected from natural scenes. By analyzing the continuous photoreceptor-to-LMC transformations of these graded-potential neurons, we show that the efficiency of coding is dynamically improved by adaptation. In particular, adaptation enhances both the frequency and amplitude distribution of LMC output by improving sensitivity to under-represented signals within seconds. Moreover, the signal-to-noise ratio of LMC output increases on the same time scale. We suggest that these coding properties can be used to study network adaptation using the genetic tools in Drosophila, as shown in a companion paper (Part II).

8.
Both natural scenes and visual art are often perceived as esthetically pleasing. It is therefore conceivable that the two types of visual stimuli share statistical properties. For example, natural scenes display a Fourier power spectrum that tends to fall with spatial frequency according to a power law. This result indicates that natural scenes have fractal-like, scale-invariant properties. In the present study, we asked whether visual art displays similar statistical properties by measuring its Fourier power spectra. Our analysis was restricted to graphic art from the Western hemisphere. For comparison, we also analyzed images that generally display relatively low or no esthetic quality (household and laboratory objects, parts of plants, and scientific illustrations). Graphic art, but not the other image categories, resembles natural scenes in showing fractal-like, scale-invariant statistics. This property is universal in our sample of graphic art; it is independent of cultural variables, such as century and country of origin, techniques used or subject matter. We speculate that both graphic art and natural scenes share statistical properties because visual art is adapted to the structure of the visual system which, in turn, is adapted to process optimally the image statistics of natural scenes.

9.
The human body is a highly familiar and socially very important object. Does this mean that the human body has a special status with respect to visual attention? In the current paper we tested whether people in natural scenes attract attention and “pop out” or, alternatively, are at least searched for more efficiently than targets of another category (machines). Observers in our study searched a visual array for dynamic or static scenes containing humans amidst scenes containing machines and vice versa. The arrays consisted of 2, 4, 6 or 8 scenes arranged in a circular array, with targets being present or absent. Search times increased with set size for dynamic and static human and machine targets, arguing against pop out. However, search for human targets was more efficient than for machine targets as indicated by shallower search slopes for human targets. Eye tracking further revealed that observers made more first fixations to human than to machine targets and that their on-target fixation durations were shorter for human compared to machine targets. In summary, our results suggest that searching for people in natural scenes is more efficient than searching for other categories even though people do not pop out.

10.
The primary visual cortex (V1) is the first cortical area to receive visual input, and inferior temporal (IT) areas are among the last along the ventral visual pathway. We recorded, in area V1 of anaesthetized cats and area IT of awake macaque monkeys, responses of neurons to videos of natural scenes. Responses were analysed to test various hypotheses concerning the nature of neural coding in these two regions. A variety of spike-train statistics were measured including spike-count distributions, interspike interval distributions, coefficients of variation, power spectra, Fano factors and different sparseness measures. All statistics showed non-Poisson characteristics and several revealed self-similarity of the spike trains. Spike-count distributions were approximately exponential in both visual areas for eight different videos and for counting windows ranging from 50 ms to 5 seconds. The results suggest that the neurons maximize their information carrying capacity while maintaining a fixed long-term-average firing rate, or equivalently, minimize their average firing rate for a fixed information carrying capacity.

11.
Mante V  Bonin V  Carandini M 《Neuron》2008,58(4):625-638
Functional models of the early visual system should predict responses not only to simple artificial stimuli but also to sequences of complex natural scenes. An ideal testbed for such models is the lateral geniculate nucleus (LGN). Mechanisms shaping LGN responses include the linear receptive field and two fast adaptation processes, sensitive to luminance and contrast. We propose a compact functional model for these mechanisms that operates on sequences of arbitrary images. With the same parameters that fit the firing rate responses to simple stimuli, it predicts the bulk of the firing rate responses to complex stimuli, including natural scenes. Further improvements could result by adding a spiking mechanism, possibly one capable of bursts, but not by adding mechanisms of slow adaptation. We conclude that up to the LGN the responses to natural scenes can be largely explained through insights gained with simple artificial stimuli.

12.
Felsen G  Touryan J  Han F  Dan Y 《PLoS biology》2005,3(10):e342
A central hypothesis concerning sensory processing is that the neuronal circuits are specifically adapted to represent natural stimuli efficiently. Here we show a novel effect in cortical coding of natural images. Using spike-triggered average or spike-triggered covariance analyses, we first identified the visual features selectively represented by each cortical neuron from its responses to natural images. We then measured the neuronal sensitivity to these features when they were present in either natural images or random stimuli. We found that in the responses of complex cells, but not of simple cells, the sensitivity was markedly higher for natural images than for random stimuli. Such elevated sensitivity leads to increased detectability of the visual features and thus an improved cortical representation of natural scenes. Interestingly, this effect is due not to the spatial power spectra of natural images, but to their phase regularities. These results point to a distinct visual-coding strategy that is mediated by contextual modulation of cortical responses tuned to the spatial-phase structure of natural scenes.

13.
Hu M  Wang Y  Wang Y 《PloS one》2011,6(10):e25410
The visual information we receive during natural vision changes rapidly and continuously. The visual system must adapt to the spatiotemporal contents of the environment in order to efficiently process the dynamic signals. However, neuronal responses to luminance contrast are usually measured using drifting or stationary gratings presented for a prolonged duration. Since motion in our visual field is continuous, the signals received by the visual system contain an abundance of transient components in the contrast domain. Here using a modified reverse correlation method, we studied the properties of responses of neurons in the cat primary visual cortex to different contrasts of grating stimuli presented statically and transiently for 40 ms, and showed that neurons can effectively discriminate the rapidly changing contrasts. The change in the contrast response function (CRF) over time mainly consisted of an increment in contrast gain (CRF shifts to left) in the developing phase of temporal responses and a decrement in response gain (CRF shifts downward) in the decay phase. When the distribution range of stimulus contrasts was increased, neurons demonstrated decrement in contrast gain and response gain. Our results suggest that contrast gain control (contrast adaptation) and response gain control mechanisms are well established during the first tens of milliseconds after stimulus onset and may cooperatively mediate the rapid dynamic responses of visual cortical neurons to the continuously changing contrast. This fast contrast adaptation may play a role in detecting contrast contours in the context of visual scenes that are varying rapidly.

14.
Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically in several processing stages. Along these stages, features of increasing complexity are extracted by different parts of the visual system. Elementary features like bars and edges are processed at earlier levels of the visual pathway, and more complex features are detected at higher levels. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the model's training procedure and play an important role in object recognition. These patches are selected indiscriminately from different positions of an image, which can lead to the extraction of non-discriminating patches that may eventually reduce performance. In the proposed model, we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, in which it outperforms the original model. The experiments show that the selected features are generally particular parts of target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

15.
Lightness illusions are fundamental to human perception, and yet why we see them is still the focus of much research. Here we address the question by modelling not human physiology or perception directly, as is typically the case, but our natural visual world and the need for robust behaviour. Artificial neural networks were trained to predict the reflectance of surfaces in a synthetic ecology consisting of 3-D “dead-leaves” scenes under non-uniform illumination. The networks learned to solve this task accurately and robustly given only ambiguous sense data. In addition, and as a direct consequence of their experience, the networks also made systematic “errors” in their behaviour commensurate with human illusions, including brightness contrast and assimilation, although assimilation (specifically White's illusion) only emerged when the virtual ecology included 3-D, as opposed to 2-D, scenes. Subtle variations in these illusions, also found in human perception, were observed, such as the asymmetry of brightness contrast. These data suggest that “illusions” arise in humans because (i) natural stimuli are ambiguous, and (ii) this ambiguity is resolved empirically by encoding the statistical relationship between images and scenes in past visual experience. Since resolving stimulus ambiguity is a challenge faced by all visual systems, a corollary of these findings is that human illusions must be experienced by all visual animals regardless of their particular neural machinery. The data also provide a more formal definition of illusion: the condition in which the true source of a stimulus differs from its most likely (and thus perceived) source. As such, illusions are not fundamentally different from non-illusory percepts, all being direct manifestations of the statistical relationship between images and scenes.

16.
Ma LB  Wu S 《生理学报》2011,63(5):463-471
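The efficient coding theory holds that, over a long course of evolution, the brain's sensory systems have adapted effectively to the natural environment. Computational modeling of the statistical regularities of natural images greatly benefits our understanding of the mechanisms of visual information processing. This paper briefly reviews recent progress on efficient coding of natural images by the visual system.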

17.
Yao JG  Gao X  Yan HM  Li CY 《PloS one》2011,6(1):e16343

Background

Instantaneous object discrimination and categorization are fundamental cognitive capacities performed with the guidance of visual attention. Visual attention enables selection of a salient object within a limited area of the visual field, which we refer to as the “field of attention” (FA). Though there is some evidence concerning the spatial extent of object recognition, the following questions remain open: (a) how large the FA is for rapid object categorization, (b) how accuracy of attention is distributed over the FA, and (c) how fast complex objects can be categorized when presented against backgrounds formed by natural scenes.

Methodology/Principal Findings

To answer these questions, we used a visual perceptual task in which subjects were asked to focus their attention on a point while being required to categorize briefly flashed (20 ms) photographs of natural scenes by indicating whether or not these contained an animal. By measuring the accuracy of categorization at different eccentricities from the fixation point, we were able to determine the spatial extent and the distribution of accuracy over the FA, as well as the speed of categorizing objects using stimulus onset asynchrony (SOA). Our results revealed that subjects are able to rapidly categorize complex natural images within about 0.1 s without eye movement, and showed that the FA for instantaneous image categorization covers a visual field extending 20°×24°, and accuracy was highest (>90%) at the center of FA and declined with increasing eccentricity.

Conclusions/Significance

In conclusion, human beings are able to categorize complex natural images at a glance over a large extent of the visual field without eye movement.

18.
He X  Yang Z  Tsien JZ 《PloS one》2011,6(5):e20002
Humans can categorize objects in complex natural scenes within 100-150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ~6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (~100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.

19.
《Journal of Physiology》2013,107(5):369-398
An important property of visual systems is to be simultaneously both selective to specific patterns found in the sensory input and invariant to possible variations. Selectivity and invariance (tolerance) are opposing requirements. It has been suggested that they could be joined by iterating a sequence of elementary selectivity and tolerance computations. It is, however, unknown what should be selected or tolerated at each level of the hierarchy. We approach this issue by learning the computations from natural images. We propose and estimate a probabilistic model of natural images that consists of three processing layers. Two natural image data sets are considered: image patches, and complete visual scenes downsampled to the size of small patches. For both data sets, we find that in the first two layers, simple and complex cell-like computations are performed. In the third layer, we mainly find selectivity to longer contours; for patch data, we further find some selectivity to texture, while for the downsampled complete scenes, some selectivity to curvature is observed.

20.
For a long time, the images of the Upper Paleolithic were represented in a way that emphasized their aesthetic qualities and ignored the specific characteristics of their natural supports. This is all the more regrettable because the architecture of the decorated caves plays a real part, at different levels, in the elaboration of the parietal devices. As omnipresent elements of these devices, the signs play a determining part in this architecture. Analysing the links between the signs and their direct supports brings to light the various ways in which parietal devices can be used, as well as the integration of, and search for, volumes and framing. A study conducted inside the caves on a sample of 692 signs, distributed across the Franco-Cantabrian Paleolithic region, reveals the variety of graphic choices open to the artists standing in front of the wall.
