Similar literature
 20 similar documents found
1.
In recent years, there has been considerable interest in visual attention models (saliency maps of visual attention). These models can be used to predict eye fixation locations and thus have many applications in fields where they can improve the performance of machine vision systems. Most of these models need to be improved because they are based on bottom-up computation that does not consider top-down image semantic content and often does not match actual eye fixation locations. In this study, we recorded the eye movements (i.e., fixations) of fourteen individuals who viewed images consisting of natural (e.g., landscape, animal) and man-made (e.g., building, vehicle) scenes. We extracted the fixation locations of eye movements in the two image categories. After extracting the fixation areas (a patch around each fixation location), we compared the characteristics of these areas with those of non-fixation areas. The features extracted from each patch included orientation and spatial frequency. After the feature extraction phase, different statistical classifiers were trained on these features to predict eye fixation locations. This study connects eye-tracking results to automatic prediction of salient regions in images. The results showed that it is possible to predict eye fixation locations using the image patches around subjects' fixation points.

2.
Attention is intrinsic to our perceptual representations of sensory inputs. Best characterized in the visual domain, it is typically depicted as a spotlight moving over a saliency map that topographically encodes strengths of visual features and feedback modulations over the visual scene. By introducing smells to two well-established attentional paradigms, the dot-probe and the visual-search paradigms, we find that a smell reflexively directs attention to the congruent visual image and facilitates visual search of that image without the mediation of visual imagery. Furthermore, this effect is independent of, and can override, top-down bias. We thus propose that smell quality acts as an object feature whose presence enhances the perceptual saliency of that object, thereby guiding the spotlight of visual attention. Our discoveries provide robust empirical evidence for a multimodal saliency map that weighs not only visual but also olfactory inputs.

3.
In this study we investigated visual attention properties of freely behaving barn owls, using a miniature wireless camera attached to their heads. The tubular eye structure of barn owls makes them ideal subjects for this research since it limits their eye movements. Video sequences recorded from the owl's point of view capture part of the visual scene as seen by the owl. Automated analysis of video sequences revealed that during an active search task, owls repeatedly and consistently direct their gaze in a way that brings objects of interest to a specific retinal location (retinal fixation area). Using a projective model that captures the geometry between the eye and the camera, we recovered the corresponding location in the recorded images (image fixation area). Recording in various types of environments (aviary, office, outdoors) revealed significant statistical differences in low-level image properties at the image fixation area compared to values extracted at random image patches. These differences are in agreement with results obtained in primates in similar studies. To investigate the role of saliency and its contribution to drawing the owl's attention, we used a popular bottom-up computational model. Saliency values at the image fixation area were typically greater than at random patches, yet were only 20% of the maximal saliency value, suggesting a top-down modulation of gaze control.

4.
An important requirement for vision is to identify interesting and relevant regions of the environment for further processing. Some models assume that salient locations from a visual scene are encoded in a dedicated spatial saliency map [1, 2]. Then, a winner-take-all (WTA) mechanism [1, 2] is often believed to threshold the graded saliency representation and identify the most salient position in the visual field. Here we aimed to assess whether neural representations of graded saliency and the subsequent WTA mechanism can be dissociated. We presented images of natural scenes while subjects were in a scanner performing a demanding fixation task, and thus their attention was directed away. Signals in early visual cortex and posterior intraparietal sulcus (IPS) correlated with graded saliency as defined by a computational saliency model. Multivariate pattern classification [3, 4] revealed that the most salient position in the visual field was encoded in anterior IPS and frontal eye fields (FEF), thus reflecting a potential WTA stage. Our results thus confirm that graded saliency and WTA-thresholded saliency are encoded in distinct neural structures. This could provide the neural representation required for rapid and automatic orientation toward salient events in natural environments.

5.
Xu J, Yang Z, Tsien JZ. PLoS ONE, 2010, 5(12): e15796
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.

6.
Saliency detection is widely used in many visual applications like image segmentation, object recognition and classification. In this paper, we introduce a new method to detect salient objects in natural images. The approach is based on a regional principal color contrast model, which incorporates low-level and medium-level visual cues. The method allows a simple computation of color features and two categories of spatial relationships to a saliency map, achieving higher F-measure rates. At the same time, we present an interpolation approach to evaluate resulting curves, and analyze parameter selection. Our method enables the effective computation of arbitrary resolution images. Experimental results on a saliency database show that our approach produces high quality saliency maps and performs favorably against ten saliency detection algorithms.

7.
Perceptual tasks such as edge detection, image segmentation, lightness computation and estimation of three-dimensional structure are considered to be low-level or mid-level vision problems and are traditionally approached in a bottom-up, generic and hard-wired way. An alternative to this would be to take a top-down, object-class-specific and example-based approach. In this paper, we present a simple computational model implementing the latter approach. The results generated by our model when tested on edge-detection and view-prediction tasks for three-dimensional objects are consistent with human perceptual expectations. The model's performance is highly tolerant to the problems of sensor noise and incomplete input image information. Results obtained with conventional bottom-up strategies show much less immunity to these problems. We interpret the encouraging performance of our computational model as evidence in support of the hypothesis that the human visual system may learn to perform supposedly low-level perceptual tasks in a top-down fashion.

8.
Multimedia analysis benefits from understanding the emotional content of a scene in a variety of tasks such as video genre classification and content-based image retrieval. Recently, there has been an increasing interest in applying human bio-signals, particularly eye movements, to recognize the emotional gist of a scene such as its valence. In order to determine the emotional category of images using eye movements, the existing methods often learn a classifier using several features that are extracted from eye movements. Although it has been shown that eye movement is potentially useful for recognition of scene valence, the contribution of each feature is not well-studied. To address the issue, we study the contribution of features extracted from eye movements in the classification of images into pleasant, neutral, and unpleasant categories. We assess ten features and their fusion. The features are histogram of saccade orientation, histogram of saccade slope, histogram of saccade length, histogram of saccade duration, histogram of saccade velocity, histogram of fixation duration, fixation histogram, top-ten salient coordinates, and saliency map. We use a machine learning approach to analyze the performance of the features by training a support vector machine and exploiting various feature fusion schemes. The experiments reveal that 'saliency map', 'fixation histogram', 'histogram of fixation duration', and 'histogram of saccade slope' are the features that contribute most. The selected features signify the influence of fixation information and angular behavior of eye movements in the recognition of the valence of images.

9.
A simple instance of parallel computation in neural networks occurs when the eye orients to a novel visual target. Consideration of target-elicited saccadic eye movements opens the question of how spatial position is represented in the visual pathways involved in this response. It is argued that a point-for-point retinotopic coding of spatial position (the 'local sign' approach) is inadequate to account for the characteristics of the response. An alternative approach based on distributed coding is developed.

10.
Visual illusions are valuable tools for the scientific examination of the mechanisms underlying perception. In the peripheral drift illusion special drift patterns appear to move although they are static. During fixation small involuntary eye movements generate retinal image slips which need to be suppressed for stable perception. Here we show that the peripheral drift illusion reveals the mechanisms of perceptual stabilization associated with these micromovements. In a series of experiments we found that illusory motion was only observed in the peripheral visual field. The strength of illusory motion varied with the degree of micromovements. However, drift patterns presented in the central (but not the peripheral) visual field modulated the strength of illusory peripheral motion. Moreover, although central drift patterns were not perceived as moving, they elicited illusory motion of neutral peripheral patterns. Central drift patterns modulated illusory peripheral motion even when micromovements remained constant. Interestingly, perceptual stabilization was only affected by static drift patterns, but not by real motion signals. Our findings suggest that perceptual instabilities caused by fixational eye movements are corrected by a mechanism that relies on visual rather than extraretinal (proprioceptive or motor) signals, and that drift patterns systematically bias this compensatory mechanism. These mechanisms may be revealed by utilizing static visual patterns that give rise to the peripheral drift illusion, but remain undetected with other patterns. Accordingly, the peripheral drift illusion is of unique value for examining processes of perceptual stabilization.

11.
The question of whether perceptual illusions influence eye movements is critical for the long-standing debate regarding the separation between action and perception. To test the role of auditory context on a visual illusion and on eye movements, we took advantage of the fact that the presence of an auditory cue can successfully modulate illusory motion perception of an otherwise static flickering object (sound-induced visual motion effect). We found that illusory motion perception modulated by an auditory context consistently affected saccadic eye movements. Specifically, the landing positions of saccades performed towards flickering static bars in the periphery were biased in the direction of illusory motion. Moreover, the magnitude of this bias was strongly correlated with the effect size of the perceptual illusion. These results show that both an audio-visual and a purely visual illusion can significantly affect visuo-motor behavior. Our findings are consistent with arguments for a tight link between perception and action in localization tasks.

12.
Saccadic target selection as a function of time   Total citations: 2, self-citations: 0, citations by others: 2
Recent evidence indicates that stimulus-driven and goal-directed control of visual selection operate independently and in different time windows (van Zoest et al., 2004). The present study further investigates how eye movements are affected by stimulus-driven and goal-directed control. Observers were presented with search displays consisting of one target, multiple non-targets and one distractor element. The task of observers was to make a fast eye movement to a target immediately following the offset of a central fixation point, an event that either co-occurred with or soon followed the presentation of the search display. Distractor saliency and target-distractor similarity were independently manipulated. The results demonstrated that the effect of distractor saliency was transient and only present for the fastest eye movements, whereas the effect of target-distractor similarity was sustained and present in all but the fastest eye movements. The results support an independent timing account of visual selection.

13.
During free-viewing of natural scenes, eye movements are guided by bottom-up factors inherent to the stimulus, as well as top-down factors inherent to the observer. The question of how these two different sources of information interact and contribute to fixation behavior has recently received a lot of attention. Here, a battery of 15 visual stimulus features was used to quantify the contribution of stimulus properties during free-viewing of 4 different categories of images (Natural, Urban, Fractal and Pink Noise). Behaviorally relevant information was estimated in the form of topographical interestingness maps by asking an independent set of subjects to click on image regions that they subjectively found most interesting. Using a Bayesian scheme, we computed saliency functions that described the probability of a given feature to be fixated. In the case of stimulus features, the precise shape of the saliency functions was strongly dependent upon image category and overall the saliency associated with these features was generally weak. When testing multiple features jointly, a linear additive integration model of individual saliencies performed satisfactorily. We found that the saliency associated with interesting locations was much higher than any low-level image feature and any pair-wise combination thereof. Furthermore, the low-level image features were found to be maximally salient at those locations that already had high interestingness ratings. Temporal analysis showed that regions with high interestingness ratings were fixated as early as the third fixation following stimulus onset. Paralleling these findings, fixation durations were found to be dependent mainly on interestingness ratings and to a lesser extent on the low-level image features. Our results suggest that both low- and high-level sources of information play a significant role during exploration of complex scenes, with behaviorally relevant information being more effective than stimulus features.
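The abstract above mentions a Bayesian scheme for computing saliency functions but does not give its details. As a minimal sketch only, one way to estimate the probability that a location is fixated given a binned feature value is Bayes' rule, p(fix | bin) = p(bin | fix) p(fix) / p(bin). The function name `saliency_function`, the equal-width binning, and the labelled-sample input format are all assumptions made for illustration, not the authors' method.

```python
def saliency_function(feature_values, fixated, n_bins, lo, hi):
    """Estimate p(fixated | feature bin) for each bin via Bayes' rule.

    feature_values: feature value sampled at each image location (hypothetical input).
    fixated: parallel list of booleans, True where the location was fixated.
    Returns one probability per bin, or None for bins with no samples.
    """
    n_total = len(feature_values)
    n_fix = sum(1 for f in fixated if f)
    if n_total == 0 or n_fix == 0:
        return [None] * n_bins

    def bin_index(x):
        # Equal-width bins over [lo, hi]; out-of-range values are clamped.
        i = int((x - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)

    count_all = [0] * n_bins
    count_fix = [0] * n_bins
    for x, is_fix in zip(feature_values, fixated):
        b = bin_index(x)
        count_all[b] += 1
        if is_fix:
            count_fix[b] += 1

    prior = n_fix / n_total                    # p(fix)
    result = []
    for b in range(n_bins):
        if count_all[b] == 0:
            result.append(None)                # no evidence for this bin
            continue
        likelihood = count_fix[b] / n_fix      # p(bin | fix)
        evidence = count_all[b] / n_total      # p(bin)
        result.append(likelihood * prior / evidence)
    return result
```

With counts as inputs, this reduces to the fraction of samples in each bin that were fixated; writing it as likelihood, prior and evidence simply makes the Bayesian reading explicit.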

14.
Scene content selected by active vision   Total citations: 5, self-citations: 0, citations by others: 5
The primate visual system actively selects visual information from the environment for detailed processing through mechanisms of visual attention and saccadic eye movements. This study examines the statistical properties of the scene content selected by active vision. Eye movements were recorded while participants free-viewed digitized images of natural and artificial scenes. Fixation locations were determined for each image and image patches were extracted around the observed fixation locations. Measures of local contrast, local spatial correlation and spatial frequency content were calculated on the extracted image patches. Replicating previous results, local contrast was found to be greater at the points of fixation when compared to either the contrast for image patches extracted at random locations or at the observed fixation locations using an image-shuffled database. Contrary to some results and in agreement with other results in the literature, a significant decorrelation of image intensity is observed between the locations of fixation and other neighboring locations. A discussion and analysis of methodological techniques is given that provides an explanation for the discrepancy in results. The results of our analyses indicate that both the local contrast and correlation at the points of fixation are a function of image type and, furthermore, that the magnitude of these effects depends on the levels of contrast and correlation present overall in the images. Finally, the largest effect sizes in local contrast and correlation are found at distances of approximately 1 deg of visual angle, which agrees well with measures of optimal spatial scale selectivity in the visual periphery where visual information for potential saccade targets is processed.
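The patch-based comparison described above can be illustrated with a short sketch. The abstract does not say which contrast measure was used, so RMS contrast (standard deviation of intensity divided by mean intensity) is assumed here; `extract_patch` and `rms_contrast` are hypothetical helper names, and the image is taken to be a plain list of rows of grey-level values.

```python
import math


def extract_patch(image, row, col, half):
    """Return the square patch of side 2*half+1 centred on (row, col).

    The patch is clipped at the image borders, so patches near an edge
    are smaller than patches in the interior.
    """
    rows = range(max(0, row - half), min(len(image), row + half + 1))
    return [image[r][max(0, col - half): col + half + 1] for r in rows]


def rms_contrast(patch):
    """RMS contrast: population standard deviation divided by the mean.

    Returns 0.0 for an all-zero patch to avoid dividing by zero.
    """
    values = [v for line in patch for v in line]
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean
```

In an analysis of this kind, the same measure would be computed at fixated locations and at control locations (random or image-shuffled), and the two distributions compared.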

15.
The automatic computerized detection of regions of interest (ROI) is an important step in the process of medical image processing and analysis. The reasons are many, and include an increasing amount of available medical imaging data, the existence of inter-observer and inter-scanner variability, and the need to improve the accuracy of automatic detection in order to help doctors diagnose faster and on time. A novel algorithm, based on visual saliency, is developed here for the identification of tumor regions from MR images of the brain. The GBM saliency detection model is designed by taking a cue from the concept of visual saliency in natural scenes. A visually salient region is typically rare in an image, and contains highly discriminating information, with attention getting immediately focused upon it. Although color is typically considered as the most important feature in a bottom-up saliency detection model, we circumvent this issue in the inherently gray scale MR framework. We develop a novel pseudo-coloring scheme, based on the three MRI sequences, viz. FLAIR, T2 and T1C (contrast enhanced with Gadolinium). A bottom-up strategy, based on a new pseudo-color distance and spatial distance between image patches, is defined for highlighting the salient regions in the image. This multi-channel representation of the image and saliency detection model help in automatically and quickly isolating the tumor region, for subsequent delineation, as is necessary in medical diagnosis. The effectiveness of the proposed model is evaluated on MRI of 80 subjects from the BRATS database in terms of the saliency map values. Using ground truth of the tumor regions for both high- and low-grade gliomas, the results are compared with four widely cited saliency detection models from the literature. In all cases the AUC scores from the ROC analysis are found to be more than 0.999 ± 0.001 over different tumor grades, sizes and positions.

16.
During steady fixation, observers make small fixational saccades at a rate of around 1–2 per second. Presentation of a visual stimulus triggers a biphasic modulation in fixational saccade rate: an initial inhibition followed by a period of elevated rate and a subsequent return to baseline. Here we show that, during passive viewing, this rate signature is highly sensitive to small changes in stimulus contrast. By training a linear support vector machine to classify trials in which a stimulus is either present or absent, we directly compared the contrast sensitivity of fixational eye movements with individuals' psychophysical judgements. Classification accuracy closely matched psychophysical performance, and predicted individuals' threshold estimates with less bias and overall error than those obtained using specific features of the signature. Performance of the classifier was robust to changes in the training set (novel subjects and/or contrasts) and good prediction accuracy was obtained with a practicable number of trials. Our results indicate a tight coupling between the sensitivity of visual perceptual judgements and fixational eye control mechanisms. This raises the possibility that fixational saccades could provide a novel and objective means of estimating visual contrast sensitivity without the need for observers to make any explicit judgement.

17.
Training has been shown to improve perceptual performance on limited sets of stimuli. However, whether training can generally improve top-down biasing of visual search in a target-nonspecific manner remains unknown. We trained subjects over ten days on a visual search task, challenging them with a novel target (top-down goal) on every trial, while bottom-up uncertainty (distribution of distractors) remained constant. We analyzed the changes in saccade statistics and visual behavior over the course of training by recording eye movements as subjects performed the task. Subjects became experts at this task, with twofold increased performance, decreased fixation duration, and stronger tendency to guide gaze toward items with color and spatial frequency (but not necessarily orientation) that resembled the target, suggesting improved general top-down biasing of search.

18.
Localization of objects and events in the environment is critical for survival, as many perceptual and motor tasks rely on estimation of spatial location. Therefore, it seems reasonable to assume that spatial localizations should generally be accurate. Curiously, some previous studies have reported biases in visual and auditory localizations, but these studies have used small sample sizes and the results have been mixed. Therefore, it is not clear (1) if the reported biases in localization responses are real (or due to outliers, sampling bias, or other factors), and (2) whether these putative biases reflect a bias in sensory representations of space or a priori expectations (which may be due to the experimental setup, instructions, or distribution of stimuli). Here, to address these questions, a dataset of unprecedented size (obtained from 384 observers) was analyzed to examine presence, direction, and magnitude of sensory biases, and quantitative computational modeling was used to probe the underlying mechanism(s) driving these effects. Data revealed that, on average, observers were biased towards the center when localizing visual stimuli, and biased towards the periphery when localizing auditory stimuli. Moreover, quantitative analysis using a Bayesian Causal Inference framework suggests that while pre-existing spatial biases for central locations exert some influence, biases in the sensory representations of both visual and auditory space are necessary to fully explain the behavioral data. How are these opposing visual and auditory biases reconciled in conditions in which both auditory and visual stimuli are produced by a single event? Potentially, the bias in one modality could dominate, or the biases could interact/cancel out. The data revealed that when integration occurred in these conditions, the visual bias dominated, but the magnitude of this bias was reduced compared to unisensory conditions. Therefore, multisensory integration not only improves the precision of perceptual estimates, but also the accuracy.

19.
Covert spatial attention produces biases in perceptual and neural responses in the absence of overt orienting movements. The neural mechanism that gives rise to these effects is poorly understood. Here we report the relation between fixational eye movements, namely eye vergence, and covert attention. Visual stimuli modulate the angle of eye vergence as a function of their ability to capture attention. This illustrates the relation between eye vergence and bottom-up attention. In visual and auditory cue/no-cue paradigms, the angle of vergence is greater in the cue condition than in the no-cue condition. This shows a top-down attention component. In conclusion, observations reveal a close link between covert attention and modulation in eye vergence during eye fixation. Our study suggests a basis for the use of eye vergence as a tool for measuring attention and may provide new insights into attention and perceptual disorders.

20.
Computational modelling of visual attention   Total citations: 3, self-citations: 0, citations by others: 3
Five important trends have emerged from recent work on computational models of focal visual attention that emphasize the bottom-up, image-based control of attentional deployment. First, the perceptual saliency of stimuli critically depends on the surrounding context. Second, a unique 'saliency map' that topographically encodes for stimulus conspicuity over the visual scene has proved to be an efficient and plausible bottom-up control strategy. Third, inhibition of return, the process by which the currently attended location is prevented from being attended again, is a crucial element of attentional deployment. Fourth, attention and eye movements tightly interplay, posing computational challenges with respect to the coordinate system used to control attention. And last, scene understanding and object recognition strongly constrain the selection of attended locations. Insights from these five key areas provide a framework for a computational and neurobiological understanding of visual attention.
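The second and third trends above (a saliency map, plus inhibition of return) can be sketched together as a simple attention scan. This is an illustrative toy under stated assumptions, not the model reviewed in the abstract: the saliency map is assumed to be a precomputed grid of numbers, the most salient location is chosen by a plain maximum (standing in for a winner-take-all stage), and inhibition of return is modelled by permanently suppressing a square neighbourhood around each attended location. The name `scan_path` and the `ior_radius` parameter are hypothetical.

```python
def scan_path(saliency, n_fixations, ior_radius=1):
    """Return successive attended (row, col) locations on a saliency grid.

    saliency: list of rows of saliency values (assumed precomputed).
    ior_radius: half-width of the square region suppressed after each
        selection (a simplified, non-decaying inhibition of return).
    """
    # Work on a copy so the caller's map is left unchanged.
    grid = [list(row) for row in saliency]
    suppressed = float("-inf")
    path = []
    for _ in range(n_fixations):
        # Winner-take-all stand-in: first location with the highest value.
        winner = None
        for r, row in enumerate(grid):
            for c, val in enumerate(row):
                if winner is None or val > grid[winner[0]][winner[1]]:
                    winner = (r, c)
        if winner is None or grid[winner[0]][winner[1]] == suppressed:
            break  # empty map, or every location has already been inhibited
        path.append(winner)
        # Inhibition of return: block the winner and its neighbourhood.
        wr, wc = winner
        for r in range(max(0, wr - ior_radius), min(len(grid), wr + ior_radius + 1)):
            for c in range(max(0, wc - ior_radius), min(len(grid[r]), wc + ior_radius + 1)):
                grid[r][c] = suppressed
    return path
```

Without the suppression step the maximum would select the same location on every iteration, which is why inhibition of return is needed for attention to move on to the next most conspicuous region.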
