Similar Articles
20 similar articles retrieved.
1.
This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. A large-scale scene image database with pixel-level ground truth is created for this purpose. Using this scene image database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic curve is employed to evaluate the saliency of scene texts as calculated by the visual saliency models. A visualization of the distribution of scene texts and non-texts is given in the space constructed by three kinds of saliency maps, calculated using Itti's visual saliency model with intensity, color and orientation features. This visualization indicates that text characters are more salient than their non-text neighbors and can be separated from the background; scene texts can therefore be extracted from scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. The hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
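As a rough sketch of the two-stage idea (not the authors' code, which builds on Itti's model), the snippet below substitutes OpenCV's spectral-residual saliency for Itti's model; it requires opencv-contrib-python for the cv2.saliency module, and the function name and region handling are mine.

```python
import cv2
import numpy as np

def hierarchical_saliency(image_bgr):
    """Two-stage saliency sketch: stage 1 computes a global saliency map and
    isolates the salient region with Otsu's threshold; stage 2 recomputes
    saliency inside that region only (spectral residual stands in for Itti's model)."""
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()

    # Stage 1: global saliency map, then Otsu thresholding to get the salient region.
    _, smap = sal.computeSaliency(image_bgr)
    smap8 = (smap * 255).astype(np.uint8)
    _, mask = cv2.threshold(smap8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    x, y, w, h = cv2.boundingRect(mask)            # bounding box of the salient region

    # Stage 2: recompute saliency restricted to the salient region.
    _, local = sal.computeSaliency(image_bgr[y:y + h, x:x + w])

    # Paste the refined map back into a full-size final saliency map.
    final = np.zeros(smap.shape, dtype=smap.dtype)
    final[y:y + h, x:x + w] = local
    return final
```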

2.
Our ability to interact with the environment hinges on creating a stable visual world despite the continuous changes in retinal input. To achieve visual stability, the brain must distinguish retinal image shifts caused by eye movements from shifts due to movements of the visual scene. This process appears not to be flawless: during saccades, we often fail to detect whether visual objects remain stable or move, a phenomenon called saccadic suppression of displacement (SSD). How does the brain weigh the memorized information of the presaccadic scene against the actual visual feedback of the postsaccadic visual scene in the computations for visual stability? Using an SSD task, we tested how participants localize the presaccadic position of the fixation target, the saccade target, or a peripheral non-foveated target that was displaced parallel or orthogonal to the direction of a horizontal saccade and subsequently viewed for three different durations. Results showed different localization errors for the three targets, depending on the viewing time of the postsaccadic stimulus and its spatial separation from the presaccadic location. We modeled the data through a Bayesian causal inference mechanism, in which, at the trial level, an optimal mixing of two possible strategies, integration vs. separation of the presaccadic memory and the postsaccadic sensory signals, is applied. Fits of this model generally outperformed other plausible decision strategies for producing SSD. Our findings suggest that humans exploit a Bayesian inference process with two causal structures to mediate visual stability.
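A minimal sketch of the causal-inference mixing described above, in the standard Körding-style formulation; the simplified likelihoods, the uniform displacement range, and all names are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np
from scipy.stats import norm

def causal_inference_estimate(x_mem, x_vis, sigma_mem, sigma_vis,
                              p_common=0.5, disp_range=20.0):
    """Sketch: localize the presaccadic target by mixing two strategies,
    integration (one cause: no displacement) and separation (two causes:
    displacement), weighted by the posterior probability of a common cause."""
    # Likelihood of the memory/vision discrepancy under a common cause...
    like_c1 = norm.pdf(x_vis - x_mem, loc=0.0,
                       scale=np.sqrt(sigma_mem**2 + sigma_vis**2))
    # ...and under independent causes (displacement uniform over a range).
    like_c2 = 1.0 / disp_range
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Integration: reliability-weighted fusion of memory and visual feedback.
    w_vis = sigma_mem**2 / (sigma_mem**2 + sigma_vis**2)
    est_integrate = w_vis * x_vis + (1 - w_vis) * x_mem
    # Separation: rely on the presaccadic memory alone.
    est_separate = x_mem

    # Model averaging over the two causal structures.
    return post_c1 * est_integrate + (1 - post_c1) * est_separate
```

For example, `causal_inference_estimate(0.0, 1.5, 1.0, 0.5)` returns a localization pulled partway toward the postsaccadic feedback, with the pull shrinking as the discrepancy grows.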

3.
When subjects direct attention to a particular location in a visual scene, responses in the visual cortex to stimuli presented at that location are enhanced, and the suppressive influences of nearby distractors are reduced. What is the top-down signal that modulates the response to an attended versus an unattended stimulus? Here, we demonstrate increased activity related to attention in the absence of visual stimulation in extrastriate cortex when subjects covertly directed attention to a peripheral location expecting the onset of visual stimuli. Frontal and parietal areas showed a stronger signal increase during this expectation than did visual areas. The increased activity in visual cortex in the absence of visual stimulation may reflect a top-down bias of neural signals in favor of the attended location, which derives from a fronto-parietal network.

4.
This paper presents a novel object detection method that uses a single instance from the object category. Our method uses biologically inspired global scene context criteria to check whether each individual location of the image can be naturally replaced by the query instance, which indicates whether there is a similar object at that location. Unlike traditional detection methods that only look at individual locations for the desired objects, our method evaluates the consistency of the entire scene. It is therefore robust to large intra-class variations, occlusions, minor pose variations, low-resolution conditions, background clutter, etc., and requires no off-line training. The experimental results on four datasets and two video sequences clearly show the superior robustness of the proposed method, suggesting that global scene context is important for visual detection/localization.
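A toy sketch of the replace-and-check idea: paste the query instance at each location and score how little a global scene descriptor changes. The coarse grid descriptor below is a crude stand-in for the biologically inspired scene-context features in the paper, and all names are mine.

```python
import numpy as np

def coarse_descriptor(gray, grid=4):
    """Crude global scene descriptor: mean intensity pooled over a grid x grid layout."""
    h, w = gray.shape
    cells = [gray[i * h // grid:(i + 1) * h // grid,
                  j * w // grid:(j + 1) * w // grid].mean()
             for i in range(grid) for j in range(grid)]
    return np.array(cells)

def consistency_map(scene_gray, query_gray, stride=8):
    """For every location, replace the patch with the query instance and measure
    how little the global descriptor changes; higher scores suggest the location
    could naturally contain a similar object."""
    H, W = scene_gray.shape
    h, w = query_gray.shape
    base = coarse_descriptor(scene_gray)
    scores = np.full((H - h, W - w), np.nan)
    for y in range(0, H - h, stride):
        for x in range(0, W - w, stride):
            patched = scene_gray.copy()
            patched[y:y + h, x:x + w] = query_gray   # replace this location with the query
            scores[y, x] = -np.linalg.norm(coarse_descriptor(patched) - base)
    return scores
```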

5.
We present toBeeView, a program that produces from a digital photograph, or a set of photographs, an approximation of the image formed at the sampling-station stage in the eye of an animal. toBeeView is freely available from https://github.com/EEZA-CSIC/compound-eye-simulator . toBeeView assumes that sampling stations in the retina are distributed on a hexagonal grid. Each sampling station computes the weighted average of the color of the part of the visual scene projecting onto its photoreceptors, and the hexagon of the output image associated with the sampling station is filled with this average color. Users can specify the visual angle subtended by the scene and the basic parameters determining the spatial resolution of the eye: the photoreceptor spatial distribution and the optic quality of the eye. The photoreceptor distribution is characterized by the vertical and horizontal interommatidial angles, which can vary along the retina. The optic quality is characterized by the acceptance angle, which determines the section of the visual scene projecting onto each sampling station. The output of toBeeView provides a first approximation to the amount of visual information available at the retina for subsequent processing, summarizing in an intuitive way the interaction between eye optics and receptor density. This tool can be used whenever it is important to determine the visual acuity of a species, and will be particularly useful for studying processes where object detection and identification are important, such as visual displays, camouflage, and mimicry.
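A minimal sketch of this sampling pipeline for an RGB image, with the acceptance angle treated as the full width at half maximum of a Gaussian blur; the parameter names and simplifications are mine, not toBeeView's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compound_eye_view(image_rgb, fov_deg, delta_phi_deg, delta_rho_deg):
    """Sketch of toBeeView-style resampling of an RGB scene: blur according to
    the acceptance angle (delta_rho), then sample colors on a hexagonal grid
    spaced by the interommatidial angle (delta_phi)."""
    h, w = image_rgb.shape[:2]
    px_per_deg = w / fov_deg                           # pixels subtended per degree
    sigma_px = (delta_rho_deg * px_per_deg) / 2.355    # acceptance angle (FWHM) -> Gaussian sigma
    spacing = delta_phi_deg * px_per_deg               # interommatidial angle -> grid pitch

    blurred = gaussian_filter(image_rgb.astype(float), sigma=(sigma_px, sigma_px, 0))

    samples = []                                       # average color seen by each sampling station
    row_step = spacing * np.sqrt(3) / 2                # vertical pitch of a hexagonal grid
    for r, y in enumerate(np.arange(0, h, row_step)):
        x_offset = spacing / 2 if r % 2 else 0.0       # shift every other row
        for x in np.arange(x_offset, w, spacing):
            samples.append(blurred[int(y), int(x)])
    return np.array(samples)
```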

6.
A procedure for selecting the better of two treatments which allows for the possibility of non-selection when the treatments appear to be equivalent (that is, similar) is presented. The proposed procedure is a modification of the indifference zone approach. It is assumed that the treatments are compared with respect to a continuous response variable, which has a normal or a two-parameter exponential distribution. For the normal distribution, each of the parameters is considered as the ranking parameter. For the two-parameter exponential distribution, the guarantee time (location parameter) is the ranking parameter. The values of the estimates of the ranking parameters and the observed distance between these estimates are used in this selection procedure.
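A minimal sketch of the resulting decision rule (select the apparently better treatment only when the observed distance between the ranking-parameter estimates exceeds an indifference threshold); the threshold is taken as given here rather than derived as in the paper.

```python
def select_treatment(est_1, est_2, indifference_delta):
    """Return 1 or 2 for the treatment with the larger ranking-parameter
    estimate, or None (no selection) when the estimates are closer than the
    indifference threshold, i.e. the treatments appear equivalent."""
    if abs(est_1 - est_2) < indifference_delta:
        return None                      # treatments look similar: do not select
    return 1 if est_1 > est_2 else 2

# Example: sample means 4.2 vs. 4.9 with an indifference threshold of 1.0 -> no selection.
print(select_treatment(4.2, 4.9, indifference_delta=1.0))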

7.
We present three experiments on horizon estimation. In Experiment 1 we verify the human ability to estimate the horizon in static images from visual input alone. Estimates are given without time constraints, with emphasis on precision. The resulting estimates are used as a baseline to evaluate horizon estimates from early visual processes. In Experiment 2, stimuli are presented only briefly and then masked to purge visual short-term memory, forcing estimates to rely on early processes only. The high agreement between estimates and the lack of a training effect show that enough information about viewpoint is extracted in the first few hundred milliseconds to make accurate horizon estimation possible. In Experiment 3 we investigate several strategies for estimating the horizon computationally and compare human with machine “behavior” for different image manipulations and image scene types.
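One simple machine strategy of the kind that could be compared against human estimates (my illustration, not necessarily one of the strategies tested in Experiment 3): take the image row with the strongest average vertical intensity gradient as the horizon.

```python
import numpy as np
from scipy.ndimage import sobel

def estimate_horizon_row(gray):
    """Return the row index with the largest mean vertical intensity gradient,
    a crude proxy for the horizon line in an upright outdoor image."""
    grad_y = sobel(gray.astype(float), axis=0)   # gradient along the row direction
    row_energy = np.abs(grad_y).mean(axis=1)     # average edge strength per row
    return int(np.argmax(row_energy))
```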

8.
In this study we investigated the visual attention properties of freely behaving barn owls, using a miniature wireless camera attached to their heads. The tubular eye structure of barn owls makes them ideal subjects for this research since it limits their eye movements. Video sequences recorded from the owl’s point of view capture part of the visual scene as seen by the owl. Automated analysis of the video sequences revealed that during an active search task, owls repeatedly and consistently direct their gaze in a way that brings objects of interest to a specific retinal location (retinal fixation area). Using a projective model that captures the geometry between the eye and the camera, we recovered the corresponding location in the recorded images (image fixation area). Recording in various types of environments (aviary, office, outdoors) revealed significant statistical differences in low-level image properties at the image fixation area compared to values extracted at random image patches. These differences are in agreement with results obtained in primates in similar studies. To investigate the role of saliency and its contribution to drawing the owl’s attention, we used a popular bottom-up computational model. Saliency values at the image fixation area were typically greater than at random patches, yet reached only 20% of the maximal saliency value, suggesting a top-down modulation of gaze control.

9.
How do we tell how many objects there are in a visual scene? A recent study has shown that the numerousness of objects is a 'primary visual property' of the scene, just like the objects' colour, shape or location.

10.
Models of the visual cortex are based on image decomposition according to the Fourier spectrum (amplitude and phase). On one hand, it is commonly believed that phase information is necessary to identify a scene. On the other hand, it is known that the complex cells of the visual cortex, the most numerous ones, code only the amplitude spectrum. This raises the question of whether these cells carry sufficient information to allow visual scene categorization. In this work, using the same experiments in computer simulation and in psychophysics, we provide arguments showing that the amplitude spectrum alone is sufficient for the categorization task.
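A sketch of how amplitude-only features could drive a scene categorizer; the feature normalization and the choice of classifier are my assumptions, not the paper's pipeline.

```python
import numpy as np

def amplitude_spectrum_features(gray_image, size=64):
    """Feature vector from the amplitude spectrum only: the Fourier phase is
    discarded, keeping just the (log) amplitude as the basis for categorization."""
    f = np.fft.fftshift(np.fft.fft2(gray_image, s=(size, size)))
    amp = np.log1p(np.abs(f))              # log amplitude; phase information dropped
    return (amp / amp.max()).ravel()

# Hypothetical usage with any off-the-shelf classifier, e.g. scikit-learn:
#   from sklearn.svm import LinearSVC
#   X = np.stack([amplitude_spectrum_features(im) for im in train_images])
#   clf = LinearSVC().fit(X, train_labels)
```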

11.
A general problem in learning is how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound-location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound, but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3–1.7 degrees, or 22–28% of the original 6-degree visual-auditory mismatch. In contrast, when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.

12.
When confronted with complex visual scenes in daily life, how do we know which visual information represents our own hand? We investigated the cues used to assign visual information to one's own hand. Wrist tendon vibration elicits an illusory sensation of wrist movement. The intensity of this illusion is attenuated when the actual, motionless hand is visually presented. Testing what kind of visual stimuli attenuate this illusion will elucidate the factors contributing to the visual detection of one's own hand. The illusion was reduced when a stationary object was shown, but only when participants knew it was controllable with their hands. In contrast, the visual image of their own hand attenuated the illusion even when participants knew that it was not controllable. We suggest that long-term knowledge about the appearance of the body and short-term knowledge about the controllability of a visual object are combined to robustly extract our own body from a visual scene.

13.
Shading (variations of image intensity) provides an important cue for understanding the shape of three-dimensional surfaces from monocular views. On the other hand, texture (the distribution of discontinuities on the surface) is a strong cue for recovering surface orientation from monocular images. But given the image of an object or scene, what technique should we use to recover the shape of what is imaged? Recovering shape from shading requires knowledge of the reflectance of the imaged surface and, usually, the fact that it is smooth (i.e. it shows no discontinuities). Determining shape from texture requires knowledge of the distribution of surface markings (i.e. discontinuities). One might expect that one method would work when the other does not. I present a theory of how an active observer can determine shape from the image of an object or scene regardless of whether the image is shaded, textured, or both, and without any knowledge of reflectance maps or the distribution of surface markings. The approach is successful because the active observer is able to manipulate the constraints behind the perceptual phenomenon at hand and thus derive a simple solution. Several experimental results are presented with real and synthetic images.

14.
Insects can remember and return to a place of interest using the surrounding visual cues. In previous experiments, we showed that crickets could home to an invisible cool spot in a hot environment. They did so most effectively with a natural-scene surround, though they were also able to home with distinct landmarks or blank walls. Homing was not successful, however, when visual cues were removed in a dark control. Here, we compare six different models of visual homing using the same visual environments. Only models deemed biologically plausible for use by insects were implemented. The average landmark vector model and first-order differential optic flow are unable to home better than chance in at least one of the visual environments. Second-order differential optic flow and GradDescent on image differences can home better than chance in all visual environments, and best in the natural-scene environment, but do not quantitatively match the distributions of the cricket data. Two models—centre-of-mass average landmark vector and RunDown on image differences—could produce the same pattern of results as observed for crickets. Both models performed best using simple binary images and were robust to changes in resolution and image smoothing.
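A minimal sketch of image-difference homing in the spirit of the RunDown model; `render_view` is a hypothetical callback returning the view at a position, and the move/turn rule is a simplification of the models compared in the paper.

```python
import numpy as np

def image_difference(view_a, view_b):
    """Root-mean-square pixel difference between two panoramic views."""
    return np.sqrt(np.mean((view_a.astype(float) - view_b.astype(float)) ** 2))

def rundown_step(agent_pos, heading, home_view, render_view, step=1.0):
    """One homing step: keep moving along the current heading while the image
    difference to the home snapshot decreases; otherwise pick a new heading."""
    current_diff = image_difference(render_view(agent_pos), home_view)
    candidate = agent_pos + step * np.array([np.cos(heading), np.sin(heading)])
    if image_difference(render_view(candidate), home_view) < current_diff:
        return candidate, heading                  # difference ran down: keep going
    new_heading = np.random.uniform(0, 2 * np.pi)  # otherwise try a new direction
    return agent_pos, new_heading
```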

15.
A multichannel model incorporating visual inhomogeneity is presented in this paper. The parameters that describe inhomogeneity have been experimentally obtained both at threshold and in several suprathreshold conditions. At threshold, probability summation is taken into account in order to determine the spatial extent of visual channels from experimental data showing an asymptotic increase in sensitivity with increasing grating area. At suprathreshold contrast, the region where luminance variations at several scales are visible has also been found. The results support a spatially limited multichannel model of early visual processing and set out a basis for studying perceptual phenomena from the viewpoint of linear space-variant visual processing.
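For reference, probability summation across channels is commonly modeled with Quick (Minkowski) pooling, which predicts a gradual, asymptotic growth of sensitivity as more channels are stimulated by a larger grating; the notation below is the conventional one and not necessarily the paper's.

```latex
% Quick pooling / probability summation over N channels with individual
% sensitivities S_i and summation exponent \beta: overall sensitivity grows
% as more channels (larger grating area) contribute.
S_{\text{overall}} \;=\; \Bigl(\sum_{i=1}^{N} S_i^{\beta}\Bigr)^{1/\beta}
```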

16.
Recalling information from visual short-term memory (VSTM) involves the same neural mechanisms as attending to an actually perceived scene. In particular, retrieval from VSTM has been associated with orienting of visual attention towards a location within a spatially-organized memory representation. However, an open question concerns whether spatial attention is also recruited during VSTM retrieval even when performing the task does not require access to spatial coordinates of items in the memorized scene. The present study combined a visual search task with a modified, delayed central probe protocol, together with EEG analysis, to answer this question. We found a temporal contralateral negativity (TCN) elicited by a centrally presented go-signal which was spatially uninformative and featurally unrelated to the search target and informed participants only about a response key that they had to press to indicate a prepared target-present vs. -absent decision. This lateralization during VSTM retrieval (TCN) provides strong evidence of a shift of attention towards the target location in the memory representation, which occurred despite the fact that the present task required no spatial (or featural) information from the search to be encoded, maintained, and retrieved to produce the correct response and that the go-signal did not itself specify any information relating to the location and defining feature of the target.

17.
A theory of early motion processing in the human and primate visual system is presented which is based on the idea that spatio-temporal retinal image data is represented in primary visual cortex by a truncated 3D Taylor expansion that we refer to as a jet vector. This representation allows all the concepts of differential geometry to be applied to the analysis of visual information processing. We show in particular how the generalised Stokes theorem can be used to move from the calculation of derivatives of image brightness at a point to the calculation of image brightness differences on the boundary of a volume in space-time and how this can be generalised to apply to integrals of products of derivatives. We also provide novel interpretations of the roles of direction selective, bi-directional and pan-directional cells and of type I and type II cells in V5/MT.
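As an illustration of what a truncated spatio-temporal Taylor expansion looks like (a second-order truncation in my notation; the paper's order and conventions may differ), with the derivative coefficients forming the jet vector:

```latex
% Second-order Taylor expansion of image brightness I(x, y, t) about
% (x_0, y_0, t_0); the coefficients (I, I_x, I_y, I_t, I_{xx}, ...) make up
% the jet vector representing the local spatio-temporal structure.
\begin{aligned}
I(x_0+\delta x,\, y_0+\delta y,\, t_0+\delta t) \approx{}& I
  + I_x\,\delta x + I_y\,\delta y + I_t\,\delta t \\
 &+ \tfrac{1}{2}\bigl(I_{xx}\,\delta x^2 + I_{yy}\,\delta y^2 + I_{tt}\,\delta t^2\bigr) \\
 &+ I_{xy}\,\delta x\,\delta y + I_{xt}\,\delta x\,\delta t + I_{yt}\,\delta y\,\delta t
\end{aligned}
```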

18.
This paper presents a computational model that addresses a prominent psychological behavior of human beings in recognizing images. The basic premise of our method is that differences among multiple images help visual recognition. Generally speaking, we propose a statistical framework to distinguish which image features capture sufficient category information and which are common features shared across multiple classes. Mathematically, the whole formulation is cast as a generative probabilistic model, and a discriminative functionality is incorporated into the model to interpret the differences among all kinds of images. The whole Bayesian formulation is solved in an Expectation-Maximization paradigm. After finding the discriminative patterns among different images, we design an image categorization algorithm to show how these differences help visual recognition within the bag-of-features framework. The proposed method is verified on a variety of image categorization tasks including outdoor scene images, indoor scene images, as well as airborne SAR images from different perspectives.

19.
The processes underlying object recognition are fundamental for the understanding of visual perception. Humans can recognize many objects rapidly even in complex scenes, a task that still presents major challenges for computer vision systems. A common experimental demonstration of this ability is the rapid animal detection protocol, where human participants' earliest responses to report the presence/absence of animals in natural scenes are observed at 250–270 ms latencies. One of the hypotheses to account for such speed is that people would not actually recognize an animal per se, but rather base their decision on global scene statistics. These global statistics (also referred to as the spatial envelope or gist) have been shown to be computationally easy to process and could thus be used as a proxy for coarse object recognition. Here, using a saccadic choice task, which allows us to investigate a previously inaccessible temporal window of visual processing, we showed that animal – but not vehicle – detection clearly precedes scene categorization. This asynchrony is additionally validated by a late contextual modulation of animal detection, starting simultaneously with the availability of the scene category. Interestingly, the advantage of animal detection over scene categorization is in opposition to the results of simulations using standard computational models. Taken together, these results challenge the idea that rapid animal detection might be based on early access to global scene statistics, and rather suggest a process based on the extraction of specific local complex features that might be hardwired in the visual system.

20.
Although the visual flight control strategies of flying insects have evolved to cope with the complexity of the natural world, studies investigating this behaviour have typically been performed indoors using simplified two-dimensional artificial visual stimuli. How well do the results from these studies reflect the natural behaviour of flying insects, considering the radical differences in contrast, spatial composition, colour and dimensionality between these visual environments? Here, we aim to answer this question by investigating the effect of three- and two-dimensional naturalistic and artificial scenes on bumblebee flight control in an outdoor setting, and compare the results with those of similar experiments performed in an indoor setting. In particular, we focus on investigating the effect of axial (front-to-back) visual motion cues on ground speed and centring behaviour. Our results suggest that, in general, ground speed control and centring behaviour in bumblebees are not affected by whether the visual scene is two- or three-dimensional, naturalistic or artificial, or whether the experiment is conducted indoors or outdoors. The only effect that we observe between naturalistic and artificial scenes on flight control is that when the visual scene is three-dimensional and the visual information on the floor is minimised, bumblebees fly further from the midline of the tunnel. The findings presented here have implications not only for understanding the mechanisms of visual flight control in bumblebees, but also for the results of past and future investigations into visually guided flight control in other insects.
