Similar Documents
 20 similar documents found
1.
Scene content selected by active vision
The primate visual system actively selects visual information from the environment for detailed processing through mechanisms of visual attention and saccadic eye movements. This study examines the statistical properties of the scene content selected by active vision. Eye movements were recorded while participants free-viewed digitized images of natural and artificial scenes. Fixation locations were determined for each image and image patches were extracted around the observed fixation locations. Measures of local contrast, local spatial correlation and spatial frequency content were calculated on the extracted image patches. Replicating previous results, local contrast was found to be greater at the points of fixation than for image patches extracted at random locations, or at the observed fixation locations applied to an image-shuffled database. Contrary to some results in the literature, and in agreement with others, a significant decorrelation of image intensity is observed between the locations of fixation and neighboring locations. A discussion and analysis of methodological techniques is given that explains this discrepancy in results. The results of our analyses indicate that both the local contrast and correlation at the points of fixation are a function of image type and, furthermore, that the magnitude of these effects depends on the levels of contrast and correlation present overall in the images. Finally, the largest effect sizes in local contrast and correlation are found at distances of approximately 1 deg of visual angle, which agrees well with measures of optimal spatial scale selectivity in the visual periphery, where visual information for potential saccade targets is processed.
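The fixated-versus-random patch comparison described in this abstract is straightforward to compute. Below is a minimal sketch (not the authors' code), using a random array as a stand-in image and made-up fixation coordinates:

```python
import numpy as np

def rms_contrast(patch):
    """Local RMS contrast: intensity standard deviation over mean luminance."""
    m = patch.mean()
    return patch.std() / m if m > 0 else 0.0

def patch_contrasts(image, points, radius=16):
    """RMS contrast of square patches centred on (row, col) points."""
    out = []
    for r, c in points:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, image.shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, image.shape[1])
        out.append(rms_contrast(image[r0:r1, c0:c1]))
    return np.array(out)

rng = np.random.default_rng(0)
image = rng.random((256, 256))         # stand-in for a natural image
fixations = [(64, 64), (128, 200)]     # hypothetical fixation coordinates
controls = list(zip(rng.integers(0, 256, 50).tolist(),
                    rng.integers(0, 256, 50).tolist()))
print(patch_contrasts(image, fixations).mean())
print(patch_contrasts(image, controls).mean())
```

In the study's setting, the fixated-patch distribution would be compared against the random-control distribution; here both come from the same synthetic image, so no difference is expected.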

2.
In this study we investigated visual attention properties of freely behaving barn owls, using a miniature wireless camera attached to their heads. The tubular eye structure of barn owls makes them ideal subjects for this research since it limits their eye movements. Video sequences recorded from the owl’s point of view capture part of the visual scene as seen by the owl. Automated analysis of video sequences revealed that during an active search task, owls repeatedly and consistently direct their gaze in a way that brings objects of interest to a specific retinal location (retinal fixation area). Using a projective model that captures the geometry between the eye and the camera, we recovered the corresponding location in the recorded images (image fixation area). Recording in various types of environments (aviary, office, outdoors) revealed significant statistical differences of low level image properties at the image fixation area compared to values extracted at random image patches. These differences are in agreement with results obtained in primates in similar studies. To investigate the role of saliency and its contribution to drawing the owl’s attention, we used a popular bottom-up computational model. Saliency values at the image fixation area were typically greater than at random patches, yet reached only 20% of the maximal saliency value, suggesting a top-down modulation of gaze control.

3.
During free-viewing of natural scenes, eye movements are guided by bottom-up factors inherent to the stimulus, as well as top-down factors inherent to the observer. The question of how these two different sources of information interact and contribute to fixation behavior has recently received a lot of attention. Here, a battery of 15 visual stimulus features was used to quantify the contribution of stimulus properties during free-viewing of 4 different categories of images (Natural, Urban, Fractal and Pink Noise). Behaviorally relevant information was estimated in the form of topographical interestingness maps by asking an independent set of subjects to click at image regions that they subjectively found most interesting. Using a Bayesian scheme, we computed saliency functions that described the probability of a given feature to be fixated. In the case of stimulus features, the precise shape of the saliency functions was strongly dependent upon image category and overall the saliency associated with these features was generally weak. When testing multiple features jointly, a linear additive integration model of individual saliencies performed satisfactorily. We found that the saliency associated with interesting locations was much higher than any low-level image feature and any pair-wise combination thereof. Furthermore, the low-level image features were found to be maximally salient at those locations that had already high interestingness ratings. Temporal analysis showed that regions with high interestingness ratings were fixated as early as the third fixation following stimulus onset. Paralleling these findings, fixation durations were found to be dependent mainly on interestingness ratings and to a lesser extent on the low-level image features. 
Our results suggest that both low- and high-level sources of information play a significant role during exploration of complex scenes, with behaviorally relevant information being more effective than stimulus features.
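The Bayesian scheme mentioned above, which estimates the probability that a given feature value is fixated, reduces with histogram counts to a simple per-bin ratio. The following is an illustrative sketch, not the paper's implementation, run on toy data:

```python
import numpy as np

def feature_saliency(feature_map, fixated_mask, n_bins=10):
    """Estimate p(fixated | feature value) per histogram bin via Bayes' rule.
    With raw counts, p(fix | f in bin) reduces to n_fixated(bin) / n_total(bin)."""
    f = np.asarray(feature_map, dtype=float).ravel()
    fix = np.asarray(fixated_mask, dtype=bool).ravel()
    edges = np.linspace(f.min(), f.max(), n_bins + 1)
    n_total, _ = np.histogram(f, edges)
    n_fix, _ = np.histogram(f[fix], edges)
    # guard empty bins; np.where keeps them at zero saliency
    sal = np.where(n_total > 0, n_fix / np.maximum(n_total, 1), 0.0)
    return edges, sal

# toy data: a feature that ramps from 0 to 1; only high values get "fixated"
feature = np.linspace(0.0, 1.0, 1000)
fixated = feature > 0.8
edges, sal = feature_saliency(feature, fixated)
print(sal)
```

On this toy input, the saliency function is flat at zero for low feature values and rises to one where fixations concentrate, mirroring the shape analysis the abstract describes.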

4.
Multimedia analysis benefits from understanding the emotional content of a scene in a variety of tasks such as video genre classification and content-based image retrieval. Recently, there has been increasing interest in applying human bio-signals, particularly eye movements, to recognize the emotional gist of a scene, such as its valence. To determine the emotional category of images from eye movements, existing methods typically learn a classifier over several features extracted from eye movements. Although it has been shown that eye movements are potentially useful for recognizing scene valence, the contribution of each feature is not well studied. To address this issue, we study the contribution of features extracted from eye movements in classifying images into pleasant, neutral, and unpleasant categories. We assess ten features and their fusion. The features are histogram of saccade orientation, histogram of saccade slope, histogram of saccade length, histogram of saccade duration, histogram of saccade velocity, histogram of fixation duration, fixation histogram, top-ten salient coordinates, and saliency map. We utilize a machine learning approach to analyze the performance of the features, learning a support vector machine and exploiting various feature fusion schemes. The experiments reveal that ‘saliency map’, ‘fixation histogram’, ‘histogram of fixation duration’, and ‘histogram of saccade slope’ are the most contributing features. The selected features signify the influence of fixation information and of the angular behavior of eye movements in recognizing the valence of images.
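Several of the listed features are histograms over saccade geometry. The sketch below (illustrative only, not the authors' pipeline) extracts three such histograms from a toy scanpath and concatenates them, a simple early-fusion scheme; the classifier itself (an SVM in the paper) is omitted:

```python
import numpy as np

def saccade_features(fixations, n_bins=8):
    """Histogram features from a scanpath: saccade orientation, length, slope.
    `fixations` is an (N, 2) sequence of (x, y) fixation coordinates."""
    fx = np.asarray(fixations, dtype=float)
    d = np.diff(fx, axis=0)                    # saccade vectors between fixations
    length = np.hypot(d[:, 0], d[:, 1])
    orient = np.arctan2(d[:, 1], d[:, 0])      # radians in [-pi, pi]
    slope = np.divide(d[:, 1], d[:, 0],
                      out=np.zeros_like(d[:, 1]), where=d[:, 0] != 0)
    h_orient, _ = np.histogram(orient, bins=n_bins, range=(-np.pi, np.pi))
    h_len, _ = np.histogram(length, bins=n_bins, range=(0, length.max() + 1e-9))
    h_slope, _ = np.histogram(np.arctan(slope), bins=n_bins,
                              range=(-np.pi / 2, np.pi / 2))
    # early fusion: concatenate the histograms into one normalized feature vector
    feats = np.concatenate([h_orient, h_len, h_slope]).astype(float)
    return feats / max(feats.sum(), 1)

scanpath = [(10, 10), (50, 20), (60, 80), (20, 70)]   # hypothetical fixations
print(saccade_features(scanpath).shape)
```

Each histogram contributes `n_bins` dimensions, so three features with 8 bins each yield a 24-dimensional vector that could then be fed to any classifier.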

5.
Saliency maps produced by different algorithms are often evaluated by comparing their output to fixated image locations from human eye-tracking data. Such evaluation is challenging because the fixation data are biased: properties of eye movement patterns that are independent of image content, including spatial bias, may limit the validity of the results. To address this problem, we present modeling and evaluation results for data derived from different perceptual tasks related to the concept of saliency. We also present a novel approach to benchmarking that deals with some of the challenges posed by spatial bias. The results establish the value of alternatives to fixation data in driving the improvement and development of models. We also demonstrate an approach to approximating the output of alternative perceptual tasks based on computational saliency and/or eye gaze data. As a whole, this work presents novel benchmarking results and methods, establishes a new performance baseline for perceptual tasks that provide an alternative window into visual saliency, and demonstrates the capacity for saliency to approximate human behaviour in one visual task given data from another.

6.
Top-down information processing in eye movements during visual image recognition
During visual image recognition, the eye does not scan the whole image uniformly; instead, it shifts the point of gaze through a series of rapid saccades, selectively sampling the key information in the image during fixation pauses. We recorded and analyzed eye-movement trajectories for different image stimuli and found that: (1) for simple geometric figures, fixation pauses cluster at the geometric features of the image, i.e., at singular points that differ from their surroundings; (2) for complex image stimuli, fixation locations are determined by the observer's existing conceptual model and interests; (3) when recognizing single Chinese characters, the eye-movement pattern likewise depends on the observer's knowledge of the character (i.e., a conceptual model). These results suggest that visual image recognition is accomplished mainly through top-down information processing: the central nervous system controls the eye movements, directing the point of gaze to the image singularities it has selected, and extracts what it deems the key information during fixation pauses. This processing mode does not depend solely on the input image, nor does it require processing every pixel of the target image; only a small number of key regions are inspected and processed, which improves the capacity and efficiency of image information processing.

8.
An important requirement for vision is to identify interesting and relevant regions of the environment for further processing. Some models assume that salient locations from a visual scene are encoded in a dedicated spatial saliency map [1, 2]. Then, a winner-take-all (WTA) mechanism [1, 2] is often believed to threshold the graded saliency representation and identify the most salient position in the visual field. Here we aimed to assess whether neural representations of graded saliency and the subsequent WTA mechanism can be dissociated. We presented images of natural scenes while subjects were in a scanner performing a demanding fixation task, and thus their attention was directed away. Signals in early visual cortex and posterior intraparietal sulcus (IPS) correlated with graded saliency as defined by a computational saliency model. Multivariate pattern classification [3, 4] revealed that the most salient position in the visual field was encoded in anterior IPS and frontal eye fields (FEF), thus reflecting a potential WTA stage. Our results thus confirm that graded saliency and WTA-thresholded saliency are encoded in distinct neural structures. This could provide the neural representation required for rapid and automatic orientation toward salient events in natural environments.

9.
Airport detection in remote sensing images: a method based on saliency map
The detection of airports has attracted much attention recently because of its applications and importance in military and civil aviation. However, the complicated background around airports makes detection difficult. This paper presents a new method for airport detection in remote sensing images. Unlike methods that analyze images pixel by pixel, we introduce a visual attention mechanism into airport detection, which greatly improves detection efficiency. First, the Hough transform is used to judge whether an airport exists in an image. Then an improved graph-based visual saliency model is applied to compute the saliency map and extract regions of interest (ROIs). The airport target is finally detected using scale-invariant feature transform features extracted from each ROI and classified by a hierarchical discriminant regression tree. Experimental results show that the proposed method is faster and more accurate than existing methods, with a lower false-alarm rate and better noise robustness.
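As a rough illustration of the first stage, a Hough line transform accumulates votes for collinear edge pixels; a long runway-like line then produces a dominant accumulator cell. This is a minimal numpy sketch with a synthetic edge mask and a made-up vote threshold, not the paper's implementation:

```python
import numpy as np

def hough_line_votes(edge_mask, n_theta=180, n_rho=200):
    """Minimal Hough line transform: accumulate (rho, theta) votes
    over all edge pixels, with rho = x*cos(theta) + y*sin(theta)."""
    rows, cols = np.nonzero(edge_mask)
    diag = np.hypot(*edge_mask.shape)
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rho_edges = np.linspace(-diag, diag, n_rho + 1)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for t_idx, th in enumerate(thetas):
        rho = cols * np.cos(th) + rows * np.sin(th)
        hist, _ = np.histogram(rho, bins=rho_edges)
        acc[:, t_idx] += hist
    return acc

def has_long_line(edge_mask, min_votes=30):
    """Runway-presence test: is there a (rho, theta) cell with enough
    collinear edge pixels?"""
    return bool(hough_line_votes(edge_mask).max() >= min_votes)

img = np.zeros((64, 64), dtype=bool)
img[32, 10:50] = True        # a horizontal "runway" of 40 edge pixels
print(has_long_line(img))    # True
```

A real system would run this on an edge map of the remote sensing image and would still need the saliency and classification stages to reject other long linear structures such as roads.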

10.
Computational modelling of visual attention
Five important trends have emerged from recent work on computational models of focal visual attention that emphasize the bottom-up, image-based control of attentional deployment. First, the perceptual saliency of stimuli critically depends on the surrounding context. Second, a unique 'saliency map' that topographically encodes for stimulus conspicuity over the visual scene has proved to be an efficient and plausible bottom-up control strategy. Third, inhibition of return, the process by which the currently attended location is prevented from being attended again, is a crucial element of attentional deployment. Fourth, attention and eye movements tightly interplay, posing computational challenges with respect to the coordinate system used to control attention. And last, scene understanding and object recognition strongly constrain the selection of attended locations. Insights from these five key areas provide a framework for a computational and neurobiological understanding of visual attention.
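The saliency-map, winner-take-all and inhibition-of-return ideas in this review can be caricatured in a few lines. The sketch below is illustrative only (real models use neural dynamics, not an argmax): attention goes to the peak of the map, and previously visited locations are suppressed so attention moves on.

```python
import numpy as np

def winner_take_all(saliency, visited=(), inhibition_radius=1):
    """Select the most salient location from a graded saliency map.
    Previously attended locations are suppressed (crude inhibition of return)."""
    s = saliency.astype(float).copy()
    for r, c in visited:
        r0, c0 = max(r - inhibition_radius, 0), max(c - inhibition_radius, 0)
        s[r0:r + inhibition_radius + 1, c0:c + inhibition_radius + 1] = -np.inf
    idx = np.unravel_index(np.argmax(s), s.shape)
    return tuple(int(i) for i in idx)

sal = np.zeros((5, 5))
sal[1, 3], sal[4, 0] = 0.9, 0.7      # two conspicuous locations
first = winner_take_all(sal)
second = winner_take_all(sal, visited=[first])
print(first, second)                 # (1, 3) (4, 0)
```

With the strongest peak inhibited, the second call attends the next-most-salient location, which is the scanpath-generating loop most bottom-up models implement.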

11.
Saccadic target selection as a function of time
Recent evidence indicates that stimulus-driven and goal-directed control of visual selection operate independently and in different time windows (van Zoest et al., 2004). The present study further investigates how eye movements are affected by stimulus-driven and goal-directed control. Observers were presented with search displays consisting of one target, multiple non-targets and one distractor element. The task of observers was to make a fast eye movement to a target immediately following the offset of a central fixation point, an event that either co-occurred with or soon followed the presentation of the search display. Distractor saliency and target-distractor similarity were independently manipulated. The results demonstrated that the effect of distractor saliency was transient and only present for the fastest eye movements, whereas the effect of target-distractor similarity was sustained and present in all but the fastest eye movements. The results support an independent timing account of visual selection.

12.
In this paper we propose a computational model of bottom-up visual attention based on a pulsed principal component analysis (PCA) transform, which simply exploits the signs of the PCA coefficients to generate spatial and motion saliency. We further extend the pulsed PCA transform to a pulsed cosine transform that is not only data-independent but also very fast to compute. The proposed model has the following biological plausibilities. First, the PCA projection vectors in the model can be obtained using the Hebbian rule in neural networks. Second, the outputs of the pulsed PCA transform, which are inherently binary, simulate neuronal pulses in the human brain. Third, like many Fourier transform-based approaches, our model accomplishes cortical center-surround suppression in the frequency domain. Experimental results on psychophysical patterns and natural images show that the proposed model is more effective in saliency detection and predicts human eye fixations better than state-of-the-art attention models.
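The pulsed cosine transform described here amounts to keeping only the signs of the image's DCT coefficients and reconstructing. The following is an approximate sketch of that idea for illustration (not the authors' code), using an explicit orthonormal DCT matrix and a crude box-filter blur in place of a proper Gaussian:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k, column j = cos(pi*(2j+1)*k / 2n)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

def pulsed_cosine_saliency(image, sigma=3):
    """Saliency from the signs of the 2D DCT coefficients:
    S = blur( IDCT( sign( DCT(image) ) )^2 )."""
    Mr, Mc = dct_matrix(image.shape[0]), dct_matrix(image.shape[1])
    coeffs = Mr @ image @ Mc.T                       # 2D DCT
    # keep signs; zero out numerically-negligible coefficients
    signs = np.where(np.abs(coeffs) > 1e-9, np.sign(coeffs), 0.0)
    sal = (Mr.T @ signs @ Mc) ** 2                   # inverse DCT, squared
    k = np.ones(2 * sigma + 1) / (2 * sigma + 1)     # repeated box blur ~ Gaussian
    for _ in range(3):
        sal = np.apply_along_axis(np.convolve, 0, sal, k, mode="same")
        sal = np.apply_along_axis(np.convolve, 1, sal, k, mode="same")
    return sal / sal.max()

x = np.linspace(0, 0.3, 64)
img = np.tile(x, (64, 1))          # smooth background ramp
img[20:30, 20:30] += 0.7           # a conspicuous square "object"
sal = pulsed_cosine_saliency(img)
print(np.unravel_index(sal.argmax(), sal.shape))
```

The binary sign map plays the role of the model's "pulses": a spatially sparse object on a spectrally sparse background dominates the reconstruction, so saliency concentrates on the object.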

13.
Among the various possible criteria guiding eye movement selection, we investigate the role of position uncertainty in the peripheral visual field. In particular, we suggest that, in everyday situations of object tracking, eye movement selection probably includes a principle of reduction of uncertainty. To evaluate this hypothesis, we compare the movement predictions of computational models with human results from a psychophysical task. This task is a freely moving eye version of the multiple object tracking task, where eye movements may be used to compensate for low peripheral resolution. We design several Bayesian models of eye movement selection with increasing complexity, whose layered structures are inspired by the neurobiology of the brain areas implicated in this process. Finally, we compare the relative performances of these models in predicting the recorded human movements, and show the advantage of explicitly taking uncertainty into account for the prediction of eye movements.

14.
Interacting in the peripersonal space requires coordinated arm and eye movements to visual targets in depth. In primates, the medial posterior parietal cortex (PPC) represents a crucial node in the process of visual-to-motor signal transformations. The medial PPC area V6A is a key region engaged in the control of these processes because it jointly processes visual information, eye position and arm movement related signals. However, to date, there is no evidence in the medial PPC of spatial encoding in three dimensions. Here, using single neuron recordings in behaving macaques, we studied the neural signals related to binocular eye position in a task that required the monkeys to perform saccades and fixate targets at different locations in peripersonal and extrapersonal space. A significant proportion of neurons were modulated by both gaze direction and depth, i.e., by the location of the foveated target in 3D space. The population activity of these neurons displayed a strong preference for peripersonal space in a time interval around the saccade that preceded fixation and during fixation as well. This preference for targets within reaching distance during both target capturing and fixation suggests that binocular eye position signals are implemented functionally in V6A to support its role in reaching and grasping.

15.
Visual saliency is a fundamental yet hard to define property of objects or locations in the visual world. In a context where objects and their representations compete to dominate our perception, saliency can be thought of as the "juice" that makes objects win the race. It is often assumed that saliency is extracted and represented in an explicit saliency map, which serves to determine the location of spatial attention at any given time. It is then by drawing attention to a salient object that it can be recognized or categorized. I argue against this classical view that visual "bottom-up" saliency automatically recruits the attentional system prior to object recognition. A number of visual processing tasks are clearly performed too fast for such a costly strategy to be employed. Rather, visual attention could simply act by biasing a saliency-based object recognition system. Under natural conditions of stimulation, saliency can be represented implicitly throughout the ventral visual pathway, independent of any explicit saliency map. At any given level, the most activated cells of the neural population simply represent the most salient locations. The notion of saliency itself grows increasingly complex throughout the system, mostly based on luminance contrast until information reaches visual cortex, gradually incorporating information about features such as orientation or color in primary visual cortex and early extrastriate areas, and finally the identity and behavioral relevance of objects in temporal cortex and beyond. Under these conditions the object that dominates perception, i.e. the object yielding the strongest (or the first) selective neural response, is by definition the one whose features are most "salient", without the need for any external saliency map. In addition, I suggest that such an implicit representation of saliency can be best encoded in the relative times of the first spikes fired in a given neuronal population.
In accordance with our subjective experience that saliency and attention do not modify the appearance of objects, the feed-forward propagation of this first spike wave could serve to trigger saliency-based object recognition outside the realm of awareness, while conscious perception could be mediated by the remaining discharges of longer neuronal spike trains.
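The proposed first-spike code can be caricatured very simply: map each unit's activation to a latency inversely proportional to its drive, and read out the earliest spike. An illustrative sketch with made-up activations (not a biophysical model):

```python
import numpy as np

def first_spike_latencies(drive, t_scale=10.0):
    """Rank-order latency code: the stronger a unit's drive, the earlier its
    first spike (latency proportional to 1/drive); silent units never spike."""
    drive = np.asarray(drive, dtype=float)
    return np.where(drive > 0, t_scale / np.maximum(drive, 1e-12), np.inf)

def most_salient(drive):
    """No explicit saliency map: the 'winner' is simply the first unit to spike."""
    return int(np.argmin(first_spike_latencies(drive)))

drive = [0.2, 0.9, 0.5, 0.1]     # hypothetical population activations
print(most_salient(drive))       # unit 1 fires first
```

The readout needs only the order of first spikes, not their exact times, which is why a single feed-forward wave suffices to signal the most salient item.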

16.
A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: 1) select more informative image locations upon which to fixate their eyes, or 2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: The number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

17.
Microsaccades are the largest and fastest of the fixational eye movements; they counteract the visual fading caused by neural adaptation and play an important role in visual information processing. Building on the link between microsaccades and visual perception, we designed experiments to study how microsaccades during visual fixation differ while macaques perform overt attention tasks, covert attention tasks, and overt attention tasks of varying difficulty. Comparing microsaccade parameters across difficulty levels of the overt attention task, we found that microsaccade amplitude, velocity and rate were all suppressed as task difficulty increased. Comparing the two types of visual perception task (overt versus covert attention) under similar paradigms, covert attention markedly suppressed microsaccade rate, whereas no consistent effects were found for amplitude and velocity, suggesting that different types of visual attention task may lead the monkeys to adopt different task strategies. This work lays a foundation for further study of the neural mechanisms that generate microsaccades and of the role of eye movements in visual attention.

18.
The goal of the current study is to clarify the relationship between social information processing (e.g., visual attention to cues of hostility, hostility attribution bias, and labeling of facial-expression emotions) and aggressive tendencies. Thirty adults were recruited for an eye-tracking study that measured various components of social information processing. Baseline aggressive tendencies were measured using the Buss-Perry Aggression Questionnaire (AQ). Visual attention toward hostile objects was measured as the proportion of eye-gaze fixation duration on cues of hostility. Hostility attribution bias was measured from ratings of the emotions of characters in the images. The results show that eye-gaze duration on hostile characters was significantly inversely correlated with the AQ score, as was eye contact with an angry face. Eye-gaze duration on hostile objects was not significantly associated with hostility attribution bias, although hostility attribution bias was significantly positively associated with the AQ score. Our findings suggest that eye-gaze fixation time toward non-hostile cues may predict aggressive tendencies.

19.
This paper evaluates the degree of saliency of text in natural scenes using visual saliency models. A large-scale scene image database with pixel-level ground truth was created for this purpose. Using this database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of objects are calculated. The receiver operating characteristic curve is employed to evaluate the saliency of scene text as computed by the visual saliency models. A visualization of the distribution of scene text and non-text in the space constructed from three kinds of saliency maps, calculated using Itti's visual saliency model with intensity, color and orientation features, is given. This visualization indicates that text characters are more salient than their non-text neighbors and can be distinguished from the background; scene text can therefore be extracted from scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. It is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene text.
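The first stage relies on Otsu's global threshold, which picks the histogram split that maximizes between-class variance. A self-contained sketch of that step on toy saliency values (illustrative only; the full two-stage model also needs Itti's saliency computation, which is omitted here):

```python
import numpy as np

def otsu_threshold(values, n_bins=256):
    """Otsu's global threshold: choose the histogram bin that maximizes the
    between-class variance  (mu_T*w0 - mu)^2 / (w0*(1 - w0))."""
    hist, edges = np.histogram(values, bins=n_bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                # class-0 probability up to each bin
    mu = np.cumsum(p * centers)      # cumulative mean up to each bin
    mu_t = mu[-1]                    # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    between[~np.isfinite(between)] = 0.0
    return centers[np.argmax(between)]

# toy saliency values: a large dim background mode and a small bright mode
sal = np.concatenate([np.full(900, 0.1), np.full(100, 0.9)])
t = otsu_threshold(sal)
print(t)   # falls between the two modes
```

Pixels above `t` would form the salient region passed to the second-stage saliency computation.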

20.
Visual attention: the where, what, how and why of saliency
Attention influences the processing of visual information even in the earliest areas of primate visual cortex. There is converging evidence that the interaction of bottom-up sensory information and top-down attentional influences creates an integrated saliency map, that is, a topographic representation of relative stimulus strength and behavioral relevance across visual space. This map appears to be distributed across areas of the visual cortex, and is closely linked to the oculomotor system that controls eye movements and orients the gaze to locations in the visual scene characterized by high salience.


Copyright © 北京勤云科技发展有限公司  京ICP备09084417号