首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Scene content selected by active vision   总被引:5,自引:0,他引:5  
The primate visual system actively selects visual information from the environment for detailed processing through mechanisms of visual attention and saccadic eye movements. This study examines the statistical properties of the scene content selected by active vision. Eye movements were recorded while participants free-viewed digitized images of natural and artificial scenes. Fixation locations were determined for each image and image patches were extracted around the observed fixation locations. Measures of local contrast, local spatial correlation and spatial frequency content were calculated on the extracted image patches. Replicating previous results, local contrast was found to be greater at the points of fixation when compared to either the contrast for image patches extracted at random locations or at the observed fixation locations using an image-shuffled database. Contrary to some results and in agreement with other results in the literature, a significant decorrelation of image intensity is observed between the locations of fixation and other neighboring locations. A discussion and analysis of methodological techniques is given that provides an explanation for the discrepancy in results. The results of our analyses indicate that both the local contrast and correlation at the points of fixation are a function of image type and, furthermore, that the magnitude of these effects depend on the levels of contrast and correlation present overall in the images. Finally, the largest effect sizes in local contrast and correlation are found at distances of approximately 1 deg of visual angle, which agrees well with measures of optimal spatial scale selectivity in the visual periphery where visual information for potential saccade targets is processed.  相似文献   

2.
Both natural scenes and visual art are often perceived as esthetically pleasing. It is therefore conceivable that the two types of visual stimuli share statistical properties. For example, natural scenes display a Fourier power spectrum that tends to fall with spatial frequency according to a power-law. This result indicates that natural scenes have fractal-like, scale-invariant properties. In the present study, we asked whether visual art displays similar statistical properties by measuring their Fourier power spectra. Our analysis was restricted to graphic art from the Western hemisphere. For comparison, we also analyzed images, which generally display relatively low or no esthetic quality (household and laboratory objects, parts of plants, and scientific illustrations). Graphic art, but not the other image categories, resembles natural scenes in showing fractal-like, scale-invariant statistics. This property is universal in our sample of graphic art; it is independent of cultural variables, such as century and country of origin, techniques used or subject matter. We speculate that both graphic art and natural scenes share statistical properties because visual art is adapted to the structure of the visual system which, in turn, is adapted to process optimally the image statistics of natural scenes.  相似文献   

3.
The visual system has a remarkable ability to extract categorical information from complex natural scenes. In order to elucidate the role of low-level image features for the recognition of objects in natural scenes, we recorded saccadic eye movements and event-related potentials (ERPs) in two experiments, in which human subjects had to detect animals in previously unseen natural images. We used a new natural image database (ANID) that is free of some of the potential artifacts that have plagued the widely used COREL images. Color and grayscale images picked from the ANID and COREL databases were used. In all experiments, color images induced a greater N1 EEG component at earlier time points than grayscale images. We suggest that this influence of color in animal detection may be masked by later processes when measuring reation times. The ERP results of go/nogo and forced choice tasks were similar to those reported earlier. The non-animal stimuli induced bigger N1 than animal stimuli both in the COREL and ANID databases. This result indicates ultra-fast processing of animal images is possible irrespective of the particular database. With the ANID images, the difference between color and grayscale images is more pronounced than with the COREL images. The earlier use of the COREL images might have led to an underestimation of the contribution of color. Therefore, we conclude that the ANID image database is better suited for the investigation of the processing of natural scenes than other databases commonly used.  相似文献   

4.
Graham DJ  Field DJ 《Spatial Vision》2007,21(1-2):149-164
Paintings are the product of a process that begins with ordinary vision in the natural world and ends with manipulation of pigments on canvas. Because artists must produce images that can be seen by a visual system that is thought to take advantage of statistical regularities in natural scenes, artists are likely to replicate many of these regularities in their painted art. We have tested this notion by computing basic statistical properties and modeled cell response properties for a large set of digitized paintings and natural scenes. We find that both representational and non-representational (abstract) paintings from our sample (124 images) show basic similarities to a sample of natural scenes in terms of their spatial frequency amplitude spectra, but the paintings and natural scenes show significantly different mean amplitude spectrum slopes. We also find that the intensity distributions of paintings show a lower skewness and sparseness than natural scenes. We account for this by considering the range of luminances found in the environment compared to the range available in the medium of paint. A painting's range is limited by the reflective properties of its materials. We argue that artists do not simply scale the intensity range down but use a compressive nonlinearity. In our studies, modeled retinal and cortical filter responses to the images were less sparse for the paintings than for the natural scenes. But when a compressive nonlinearity was applied to the images, both the paintings' sparseness and the modeled responses to the paintings showed the same or greater sparseness compared to the natural scenes. This suggests that artists achieve some degree of nonlinear compression in their paintings. Because paintings have captivated humans for millennia, finding basic statistical regularities in paintings' spatial structure could grant insights into the range of spatial patterns that humans find compelling.  相似文献   

5.
Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of visual system. Elementary features like bars and edges are processed in earlier levels of visual pathway and as far as one goes upper in this pathway more complex features will be spotted. It is an important interrogation in the field of visual processing to see which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical model, which is motivated by biology, for different object recognition tasks. In this model, a set of object parts, named patches, extracted in the intermediate stages. These object parts are used for training procedure in the model and have an important role in object recognition. These patches are selected indiscriminately from different positions of an image and this can lead to the extraction of non-discriminating patches which eventually may reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that selected features are generally particular parts of target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.  相似文献   

6.
In recent years, there has been considerable interest in visual attention models (saliency map of visual attention). These models can be used to predict eye fixation locations, and thus will have many applications in various fields which leads to obtain better performance in machine vision systems. Most of these models need to be improved because they are based on bottom-up computation that does not consider top-down image semantic contents and often does not match actual eye fixation locations. In this study, we recorded the eye movements (i.e., fixations) of fourteen individuals who viewed images which consist natural (e.g., landscape, animal) and man-made (e.g., building, vehicles) scenes. We extracted the fixation locations of eye movements in two image categories. After extraction of the fixation areas (a patch around each fixation location), characteristics of these areas were evaluated as compared to non-fixation areas. The extracted features in each patch included the orientation and spatial frequency. After feature extraction phase, different statistical classifiers were trained for prediction of eye fixation locations by these features. This study connects eye-tracking results to automatic prediction of saliency regions of the images. The results showed that it is possible to predict the eye fixation locations by using of the image patches around subjects’ fixation points.  相似文献   

7.
Orientation selectivity is the most striking feature of simple cell coding in V1 that has been shown to emerge from the reduction of higher-order correlations in natural images in a large variety of statistical image models. The most parsimonious one among these models is linear Independent Component Analysis (ICA), whereas second-order decorrelation transformations such as Principal Component Analysis (PCA) do not yield oriented filters. Because of this finding, it has been suggested that the emergence of orientation selectivity may be explained by higher-order redundancy reduction. To assess the tenability of this hypothesis, it is an important empirical question how much more redundancy can be removed with ICA in comparison to PCA or other second-order decorrelation methods. Although some previous studies have concluded that the amount of higher-order correlation in natural images is generally insignificant, other studies reported an extra gain for ICA of more than 100%. A consistent conclusion about the role of higher-order correlations in natural images can be reached only by the development of reliable quantitative evaluation methods. Here, we present a very careful and comprehensive analysis using three evaluation criteria related to redundancy reduction: In addition to the multi-information and the average log-loss, we compute complete rate–distortion curves for ICA in comparison with PCA. Without exception, we find that the advantage of the ICA filters is small. At the same time, we show that a simple spherically symmetric distribution with only two parameters can fit the data significantly better than the probabilistic model underlying ICA. This finding suggests that, although the amount of higher-order correlation in natural images can in fact be significant, the feature of orientation selectivity does not yield a large contribution to redundancy reduction within the linear filter bank models of V1 simple cells.  相似文献   

8.
We investigated whether low-level processed image properties that are shared by natural scenes and artworks – but not veridical face photographs – affect the perception of facial attractiveness and age. Specifically, we considered the slope of the radially averaged Fourier power spectrum in a log-log plot. This slope is a measure of the distribution of special frequency power in an image. Images of natural scenes and artworks possess – compared to face images – a relatively shallow slope (i.e., increased high spatial frequency power). Since aesthetic perception might be based on the efficient processing of images with natural scene statistics, we assumed that the perception of facial attractiveness might also be affected by these properties. We calculated Fourier slope and other beauty-associated measurements in face images and correlated them with ratings of attractiveness and age of the depicted persons (Study 1). We found that Fourier slope – in contrast to the other tested image properties – did not predict attractiveness ratings when we controlled for age. In Study 2A, we overlaid face images with random-phase patterns with different statistics. Patterns with a slope similar to those in natural scenes and artworks resulted in lower attractiveness and higher age ratings. In Studies 2B and 2C, we directly manipulated the Fourier slope of face images and found that images with shallower slopes were rated as more attractive. Additionally, attractiveness of unaltered faces was affected by the Fourier slope of a random-phase background (Study 3). Faces in front of backgrounds with statistics similar to natural scenes and faces were rated as more attractive. We conclude that facial attractiveness ratings are affected by specific image properties. An explanation might be the efficient coding hypothesis.  相似文献   

9.
10.
Observers can rapidly perform a variety of visual tasks such as categorizing a scene as open, as outdoor, or as a beach. Although we know that different tasks are typically associated with systematic differences in behavioral responses, to date, little is known about the underlying mechanisms. Here, we implemented a single integrated paradigm that links perceptual processes with categorization processes. Using a large image database of natural scenes, we trained machine-learning classifiers to derive quantitative measures of task-specific perceptual discriminability based on the distance between individual images and different categorization boundaries. We showed that the resulting discriminability measure accurately predicts variations in behavioral responses across categorization tasks and stimulus sets. We further used the model to design an experiment, which challenged previous interpretations of the so-called “superordinate advantage.” Overall, our study suggests that observed differences in behavioral responses across rapid categorization tasks reflect natural variations in perceptual discriminability.  相似文献   

11.
Nuding U  Zetzsche C 《Bio Systems》2007,89(1-3):273-279
We investigate a non-linear network with two processing stages optimized to reduce the statistical dependencies in natural images. This network serves as a model for the neural information processing in the higher visual areas of primates (visual cortices V2-V4). The resulting population is analyzed with regard to non-linear selectivity and invariance properties. We find units that are very selective with respect to the space spanned by all possible input signals and units that are invariant with respect to certain stimulus classes. In comparison to the measured distribution of selectivity in V2 neurons, the selectivity histogram of the network units shows an even more pronounced tendency towards higher selectivities. A special property of the system is the emergence of non-linear interactions between coefficients from different scales and orientations, which are necessary for the exploitation of higher-order statistical redundancies of natural images. We extend the concept to multi-layer systems and present some simulation results.  相似文献   

12.
By learning to discriminate among visual stimuli, human observers can become experts at specific visual tasks. The same is true for Rhesus monkeys, the major animal model of human visual perception. Here, we systematically compare how humans and monkeys solve a simple visual task. We trained humans and monkeys to discriminate between the members of small natural-image sets. We employed the "Bubbles" procedure to determine the stimulus features used by the observers. On average, monkeys used image features drawn from a diagnostic region covering about 7% +/- 2% of the images. Humans were able to use image features drawn from a much larger diagnostic region covering on average 51% +/- 4% of the images. Similarly for the two species, however, about 2% of the image needed to be visible within the diagnostic region on any individual trial for correct performance. We characterize the low-level image properties of the diagnostic regions and discuss individual differences among the monkeys. Our results reveal that monkeys base their behavior on confined image patches and essentially ignore a large fraction of the visual input, whereas humans are able to gather visual information with greater flexibility from large image regions.  相似文献   

13.
Yao JG  Gao X  Yan HM  Li CY 《PloS one》2011,6(1):e16343

Background

Instantaneous object discrimination and categorization are fundamental cognitive capacities performed with the guidance of visual attention. Visual attention enables selection of a salient object within a limited area of the visual field; we referred to as “field of attention” (FA). Though there is some evidence concerning the spatial extent of object recognition, the following questions still remain unknown: (a) how large is the FA for rapid object categorization, (b) how accuracy of attention is distributed over the FA, and (c) how fast complex objects can be categorized when presented against backgrounds formed by natural scenes.

Methodology/Principal Findings

To answer these questions, we used a visual perceptual task in which subjects were asked to focus their attention on a point while being required to categorize briefly flashed (20 ms) photographs of natural scenes by indicating whether or not these contained an animal. By measuring the accuracy of categorization at different eccentricities from the fixation point, we were able to determine the spatial extent and the distribution of accuracy over the FA, as well as the speed of categorizing objects using stimulus onset asynchrony (SOA). Our results revealed that subjects are able to rapidly categorize complex natural images within about 0.1 s without eye movement, and showed that the FA for instantaneous image categorization covers a visual field extending 20°×24°, and accuracy was highest (>90%) at the center of FA and declined with increasing eccentricity.

Conclusions/Significance

In conclusion, human beings are able to categorize complex natural images at a glance over a large extent of the visual field without eye movement.  相似文献   

14.
Felsen G  Touryan J  Han F  Dan Y 《PLoS biology》2005,3(10):e342
A central hypothesis concerning sensory processing is that the neuronal circuits are specifically adapted to represent natural stimuli efficiently. Here we show a novel effect in cortical coding of natural images. Using spike-triggered average or spike-triggered covariance analyses, we first identified the visual features selectively represented by each cortical neuron from its responses to natural images. We then measured the neuronal sensitivity to these features when they were present in either natural images or random stimuli. We found that in the responses of complex cells, but not of simple cells, the sensitivity was markedly higher for natural images than for random stimuli. Such elevated sensitivity leads to increased detectability of the visual features and thus an improved cortical representation of natural scenes. Interestingly, this effect is due not to the spatial power spectra of natural images, but to their phase regularities. These results point to a distinct visual-coding strategy that is mediated by contextual modulation of cortical responses tuned to the spatial-phase structure of natural scenes.  相似文献   

15.
Natural visual scenes are rich in information, and any neural system analysing them must piece together the many messages from large arrays of diverse feature detectors. It is known how threshold detection of compound visual stimuli (sinusoidal gratings) is determined by their components' thresholds. We investigate whether similar combination rules apply to the perception of the complex and suprathreshold visual elements in naturalistic visual images. Observers gave magnitude estimations (ratings) of the perceived differences between pairs of images made from photographs of natural scenes. Images in some pairs differed along one stimulus dimension such as object colour, location, size or blur. But, for other image pairs, there were composite differences along two dimensions (e.g. both colour and object-location might change). We examined whether the ratings for such composite pairs could be predicted from the two ratings for the respective pairs in which only one stimulus dimension had changed. We found a pooling relationship similar to that proposed for simple stimuli: Minkowski summation with exponent 2.84 yielded the best predictive power (r=0.96), an exponent similar to that generally reported for compound grating detection. This suggests that theories based on detecting simple stimuli can encompass visual processing of complex, suprathreshold stimuli.  相似文献   

16.
Xu J  Yang Z  Tsien JZ 《PloS one》2010,5(12):e15796
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.  相似文献   

17.
A fundamental tenet of visual science is that the detailed properties of visual systems are not capricious accidents, but are closely matched by evolution and neonatal experience to the environments and lifestyles in which those visual systems must work. This has been shown most convincingly for fish and insects. For mammalian vision, however, this tenet is based more upon theoretical arguments than upon direct observations. Here, we describe experiments that require human observers to discriminate between pictures of slightly different faces or objects. These are produced by a morphing technique that allows small, quantifiable changes to be made in the stimulus images. The independent variable is designed to give increasing deviation from natural visual scenes, and is a measure of the Fourier composition of the image (its second-order statistics). Performance in these tests was best when the pictures had natural second-order spatial statistics, and degraded when the images were made less natural. Furthermore, performance can be explained with a simple model of contrast coding, based upon the properties of simple cells in the mammalian visual cortex. The findings thus provide direct empirical support for the notion that human spatial vision is optimised to the second-order statistics of the optical environment.  相似文献   

18.
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.  相似文献   

19.
In this paper, we focus on animal object detection and species classification in camera-trap images collected in highly cluttered natural scenes. Using a deep neural network (DNN) model training for animal- background image classification, we analyze the input camera-trap images to generate a multi-level visual representation of the input image. We detect semantic regions of interest for animals from this representation using k-mean clustering and graph cut in the DNN feature domain. These animal regions are then classified into animal species using multi-class deep neural network model. According the experimental results, our method achieves 99.75% accuracy for classifying animals and background and 90.89% accuracy for classifying 26 animal species on the Snapshot Serengeti dataset, outperforming existing image classification methods.  相似文献   

20.
We present a computational study of the formation of simple-cell receptive field patterns in the primary visual cortex. Based on the observation that the spatial frequency of the retinal filter increases postnatally, our results explain differences in the time course of the development of orientation selectivity in binocularly deprived and normally reared kittens. Development after eye-opening in normal animals is modelled by training with natural images, whereas in the case of binocular deprivation noise-like stimulation continues. Further, it is shown that different orientation selectivities are obtained for network models trained with natural images in contrast to random phase images of identical second order statistics. The latter finding suggests that higher-order statistics of the inputs influences development of primary visual cortex. Finally, we search for quantities that identify possible signatures of natural image statistics in order to specify the amount of constructiveness that visual experience has on the formation of receptive fields.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号