Similar Documents
20 similar records found.
1.
Humans use various cues to understand the structure of the world from images. One such cue is the contour of an object, formed by occlusion or by surface discontinuities. Contours in the image of an object are known to provide varying amounts of information about the shape of the object in view, depending on the assumptions the observer makes. Another powerful cue is motion. The ability of the human visual system to discern structure from a motion stimulus is well known and has a solid theoretical and experimental foundation. However, when humans interpret a visual scene they use several cues at once, and the interpretation comes from combining the information acquired by the modules devoted to the individual cues. In such an integration, each cue appears to carry a different weight and importance. We performed several experiments in which we ensured that the only cues available to the observer were contour and motion. It turns out that when the outputs of the two modules, shape from contour and structure from motion, are inconsistent, observers experience a percept that combines the two, with the contour cue dominating; this conflict gives rise to an illusion. We describe examples of such illusions and identify the conditions under which they occur. Finally, we introduce a computational theory for combining contour and motion based on regularization theory. The theory explains these illusions and predicts many more. (ABSTRACT TRUNCATED AT 250 WORDS)
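To make the regularization idea concrete, here is a minimal sketch, assuming a simple quadratic energy of our own construction (the abstract does not specify the authors' actual functional). Two depth maps, one per module, are fused by minimizing a weighted data term plus a smoothness term; the larger weight on the contour term reflects the observed dominance of that cue.

```python
import numpy as np

def combine_cues(z_contour, z_motion, w_c=2.0, w_m=1.0, lam=0.5,
                 n_iter=500, step=0.05):
    """Fuse two depth maps by gradient descent on the regularized energy
    E(z) = w_c*||z - z_contour||^2 + w_m*||z - z_motion||^2 + lam*||grad z||^2.
    Setting w_c > w_m encodes the dominance of the contour cue."""
    z = 0.5 * (z_contour + z_motion)       # start from the unweighted average
    for _ in range(n_iter):
        # Discrete Laplacian (periodic boundaries, adequate for a sketch).
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
               np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        grad = 2 * w_c * (z - z_contour) + 2 * w_m * (z - z_motion) - 2 * lam * lap
        z -= step * grad
    return z

# Conflicting cues: contour says "dome", motion says "flat plane".
y, x = np.mgrid[-1:1:64j, -1:1:64j]
dome = np.clip(1 - x**2 - y**2, 0, None)
fused = combine_cues(dome, np.zeros_like(dome))
print(fused.max())   # closer to the dome's peak than to zero: contour dominates
```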

2.
Random-dot stereograms were generated with a blank area placed in part of the right-hand image, making a patchwork of monocular and binocular areas. The perceived depth and shape of the monocular region, where depth was not explicitly marked, depended in part on the depth and surface orientation of adjacent binocular areas. Thus a monocular rectangle flanked by two binocular rectangles placed in different fronto-parallel planes was seen as a sloping surface spanning the depth between the binocular regions, and, under some conditions, the gradient of a sloping binocular plane extended into a neighbouring monocular area. Division of the monocular region into two by textural discontinuities or discontinuities of motion sometimes altered the shape of the extrapolated surface. Often, though, the shape was unchanged by such discontinuities, implying that both two- and three-dimensional features are used to segment a scene into separate surfaces. Pictorial cues also contribute to the shape and apparent depth of the monocular surface. For instance, when subjects viewed a display consisting of portions of a cube of which two ends were shown stereoscopically and one side monocularly, the monocular side was seen in three dimensions, filling the gap between the ends. When stereo cues were pitted against pictorial cues, sometimes the pictorial cues dominated, sometimes the stereo cues did, and sometimes the surface contained sharp discontinuities that allowed both to be accommodated.

3.
Harding G, Harris JM, Bloj M. PLoS ONE 2012;7(4):e35950
The luminance and colour gradients across an image are the result of complex interactions between object shape, material, and illumination. Using such variations to infer object shape or surface colour is therefore a difficult problem for the visual system. We know that changes to the shape of an object can affect its perceived colour, and that shading gradients confer a sense of shape. Here we investigate whether the visual system is able to use these gradients effectively as a cue to shape perception, even when additional cues are not available. We tested shape perception of a folded-card object that contained illumination gradients in the form of shading and more subtle effects such as inter-reflections. Our results suggest that observers are able to use the gradients to make consistent shape judgements. To do so, observers must be given the opportunity to learn suitable assumptions about the lighting and scene. Using a variety of training conditions, we demonstrate that learning can occur quickly and requires only coarse information. We also establish that learning does not deliver a trivial mapping between gradient and shape; rather, learning leads to the acquisition of assumptions about lighting and scene parameters that subsequently allow gradients to be used as a shape cue. The perceived shape is shown to be consistent for convex and concave versions of the object that exhibit very different shading, and is also similar to that delivered by outline, a largely unrelated shape cue. Overall our results indicate that, although gradients are less reliable than some other cues, the relationship between gradients and shape can be assessed quickly, and the gradients can therefore be used effectively as a visual shape cue.

4.
Shape from texture
A central goal for visual perception is the recovery of the three-dimensional structure of the surfaces depicted in an image. Crucial information about three-dimensional structure is provided by the spatial distribution of surface markings, particularly for static monocular views: projection distorts texture geometry in a manner that depends systematically on surface shape and orientation. To isolate and measure this projective distortion in an image is to recover the three-dimensional structure of the textured surface. For natural textures, we show that the uniform-density assumption (texels are uniformly distributed) is enough to recover the orientation of a single textured plane in view under perspective projection. Furthermore, when texels cannot be found, the edges of the image are enough to determine shape under a more general assumption: that the sum of the lengths of the contours on the world plane is about the same everywhere. Finally, several experimental results for synthetic and natural images are presented.
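As a toy illustration of the uniform-density assumption (our own construction, not the paper's estimator): when uniformly distributed texels on a slanted plane are viewed under perspective projection, their image density varies systematically across the image, and the gradient of log-density indicates the plane's tilt direction.

```python
import numpy as np

# Texels uniform on the plane z = 10 + 0.8*yw, viewed through a pinhole camera.
rng = np.random.default_rng(1)
xw, yw = rng.uniform(-5, 5, 5000), rng.uniform(-5, 5, 5000)
z = 10.0 + 0.8 * yw                     # plane recedes with world y (slanted)
u, v = xw / z, yw / z                   # perspective projection, focal length 1

# Bin texel counts on the image plane and regress log-density on (u, v).
H, ue, ve = np.histogram2d(u, v, bins=12, range=[[-0.4, 0.4], [-0.4, 0.4]])
uc, vc = 0.5 * (ue[:-1] + ue[1:]), 0.5 * (ve[:-1] + ve[1:])
U, V = np.meshgrid(uc, vc, indexing="ij")
mask = H > 0
A = np.c_[U[mask], V[mask], np.ones(mask.sum())]
coef, *_ = np.linalg.lstsq(A, np.log(H[mask]), rcond=None)
tilt = np.degrees(np.arctan2(coef[1], coef[0]))
print(f"estimated tilt direction: {tilt:.0f} deg")  # ~90 deg: recedes along v
```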

5.
Spatial analysis of objects often requires significant image simplification prior to information extraction and application of a decision-making algorithm. Much decision making based on images (e.g., histologic diagnoses) requires identifying patterns in complex backgrounds (image simplification) and comparing those patterns to other patterns (decision making). Automated extraction of information from images commonly requires the extraction system to recognize the edges (contours) of structures and their internal discontinuities (such as gradations in density), and to selectively suppress irrelevant data in order to conserve memory and speed computation; data from homogeneous image areas occupy memory but are noncontributory or redundant. This paper describes the development of a microcomputer-based algorithm that deletes all homogeneous information from overlaid digitized images, generating contours in place of nonhomogeneities. Contours corresponding to different areas or objects depend on color differences between an object and its surroundings. Any set of contours can be deleted almost instantaneously, leaving only those of interest. Contours can be highlighted by an operator-driven interactive process if desired, and can be deleted and retrieved until an appropriate image is obtained. This contour-generating and image-simplification algorithm facilitates three-dimensional reconstruction of an object from serial images by reducing the number of calculations required and yielding a cleaner final image.
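A minimal sketch of this style of image simplification, under our own assumptions (the paper's exact procedure is not given in the abstract): delete every pixel whose neighbourhood is homogeneous and keep only pixels where the local colour difference exceeds a threshold, leaving a contour map.

```python
import numpy as np

def simplify_to_contours(img, thresh=12.0):
    """Keep only pixels with a large colour difference to a 4-neighbour;
    homogeneous regions are suppressed, leaving contours."""
    img = img.astype(float)
    # Per-pixel colour difference to the right/below neighbour (max over channels).
    dx = np.abs(np.diff(img, axis=1)).max(axis=-1)
    dy = np.abs(np.diff(img, axis=0)).max(axis=-1)
    edge = np.zeros(img.shape[:2], dtype=bool)
    edge[:, :-1] |= dx > thresh     # mark both pixels flanking a colour step
    edge[:, 1:]  |= dx > thresh
    edge[:-1, :] |= dy > thresh
    edge[1:, :]  |= dy > thresh
    return edge

# A brighter square on a flat background leaves only the square's outline.
img = np.zeros((64, 64, 3)); img[20:44, 20:44] = 120
print(simplify_to_contours(img).sum())   # nonzero only along the border
```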

6.
Texture discontinuities are a fundamental cue by which the visual system segments objects from their background. The neural mechanisms supporting texture-based segmentation are therefore critical to visual perception and cognition. In the present experiment we employ an EEG source-imaging approach to study the time course of texture-based segmentation in the human brain. Visual evoked potentials were recorded in response to four types of stimuli in which periodic temporal modulation of a central 3° figure region could either support figure-ground segmentation, or contain identical local texture modulations without producing changes in global image segmentation. The image discontinuities were defined either by orientation or by phase differences across image regions. Evoked responses to these four stimuli were analyzed both at the scalp and on the cortical surface in retinotopic and functional regions of interest (ROIs) defined separately with fMRI on a subject-by-subject basis. Texture-segmentation (tsVEP: segmenting versus non-segmenting) and cue-specific (csVEP: orientation versus phase) responses exhibited distinctive patterns of activity. Alternations between uniform and segmented images produced highly asymmetric responses that were larger after transitions from the uniform to the segmented state. Texture modulations that signaled the appearance of a figure evoked a pattern of increased activity starting at ~143 ms that was larger in the V1 and LOC ROIs, relative to identical modulations that did not signal figure-ground segmentation. This segmentation-related activity occurred after an initial response phase that did not depend on the global segmentation structure of the image. The two cue types evoked similar tsVEPs up to ~230 ms, at which point they diverged in the V4 and LOC ROIs. The evolution of the response proceeded largely in the feed-forward direction, with only weak evidence for feedback-related activity.

7.
8.
As we move through the world, our eyes acquire a sequence of images. The information from this sequence is sufficient to determine the structure of a three-dimensional scene, up to a scale factor determined by the distance that the eyes have moved. Previous evidence shows that the human visual system accounts for the distance the observer has walked and the separation of the eyes when judging the scale, shape, and distance of objects. However, in an immersive virtual-reality environment, observers failed to notice when a scene expanded or contracted, despite having consistent information about scale from both distance walked and binocular vision. This failure led to large errors in judging the size of objects. The pattern of errors cannot be explained by assuming a visual reconstruction of the scene with an incorrect estimate of interocular separation or distance walked. Instead, it is consistent with a Bayesian model of cue integration in which the efficacy of motion and disparity cues is greater at near viewing distances. Our results imply that observers are more willing to adjust their estimate of interocular separation or distance walked than to accept that the scene has changed in size.
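The reliability-weighted combination at the heart of such Bayesian models can be sketched as follows; the numbers are hypothetical and the distance dependence is our own illustrative choice, not the authors' fitted model.

```python
import numpy as np

def fuse(estimates, variances):
    """Minimum-variance (Gaussian/Bayesian) cue combination:
    each cue is weighted by its inverse variance."""
    w = 1.0 / np.asarray(variances)
    w /= w.sum()
    return float(np.dot(w, estimates)), w

# Both cues report "scene scale unchanged" (1.0), but assume their noise
# grows with viewing distance d at different rates.
for d in (1.0, 4.0):
    sigma_disp = 0.05 * d**2    # disparity reliability falls quickly with distance
    sigma_walk = 0.10 * d       # distance-walked cue degrades more slowly
    est, w = fuse([1.0, 1.0], [sigma_disp**2, sigma_walk**2])
    print(f"d={d}: disparity weight={w[0]:.2f}, walk weight={w[1]:.2f}")
# Near: the disparity cue dominates; far: its weight collapses, consistent
# with greater efficacy of motion and disparity cues at near distances.
```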

9.
When confronted with complex visual scenes in daily life, how do we know which visual information represents our own hand? We investigated the cues used to assign visual information to one's own hand. Wrist tendon vibration elicits an illusory sensation of wrist movement. The intensity of this illusion attenuates when the actual motionless hand is visually presented. Testing what kind of visual stimuli attenuate this illusion will elucidate factors contributing to visual detection of one's own hand. The illusion was reduced when a stationary object was shown, but only when participants knew it was controllable with their hands. In contrast, the visual image of their own hand attenuated the illusion even when participants knew that it was not controllable. We suggest that long-term knowledge about the appearance of the body and short-term knowledge about controllability of a visual object are combined to robustly extract our own body from a visual scene.

10.
Does becoming aware of a change to a purely visual stimulus necessarily mean the observer can identify or localise the change, or can change detection occur in the absence of identification or localisation? Several theories of visual awareness stress that we are aware of more than just the few objects to which we attend. In particular, it is clear that to some extent we are also aware of the global properties of the scene, such as the mean luminance or the distribution of spatial frequencies. It follows that we may be able to detect a change to a visual scene by detecting a change to one or more of these global properties. However, detecting a change to a global property may not supply enough information to accurately identify or localise which object in the scene has changed. Thus, it may be possible to reliably detect the occurrence of changes without being able to identify or localise what has changed. Previous attempts to show that this can occur with natural images have produced mixed results. Here we use a novel analysis technique to provide additional evidence that changes can be detected in natural images without being identified or localised. It is likely that this occurs through observers monitoring the global properties of the scene.
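A sketch of how detection from global properties alone might operate (a toy of our own, not the authors' novel analysis technique): compare summary statistics of two images; a shift signals that something changed without indicating where.

```python
import numpy as np

def global_signature(img):
    """Global scene statistics: mean luminance plus a coarse radially
    averaged power spectrum (distribution of spatial frequencies)."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(img)))**2
    cy, cx = np.array(f.shape) // 2
    yy, xx = np.indices(f.shape)
    r = np.hypot(yy - cy, xx - cx).astype(int)
    spectrum = np.bincount(r.ravel(), weights=f.ravel()) / np.bincount(r.ravel())
    return img.mean(), spectrum[:20]

rng = np.random.default_rng(0)
scene = rng.uniform(size=(128, 128))
changed = scene.copy(); changed[40:48, 40:48] += 0.5   # small local change

m0, s0 = global_signature(scene)
m1, s1 = global_signature(changed)
print(f"luminance shift: {abs(m1 - m0):.4f}")           # change is detectable...
print(f"spectrum shift:  {np.abs(s1 - s0).sum():.2f}")  # ...without localising it
```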

11.
We view the world with two eyes and yet are typically aware of only a single, coherent image. Arguably the simplest explanation for this is that the visual system unites the two monocular stimuli into a common stream that eventually leads to a single coherent sensation. However, this notion is inconsistent with the well-known phenomenon of rivalry: when physically different stimuli project to the same retinal location, the ensuing perception alternates between the two monocular views in space and time. Although fundamental for understanding the principles of binocular vision and visual awareness, the mechanisms underlying binocular rivalry remain controversial. Specifically, there is uncertainty about what determines whether monocular images undergo fusion or rivalry. By taking advantage of the perceptual phenomenon of color contrast, we show that physically identical monocular stimuli tend to rival, not fuse, when they signify different objects at the same location in visual space. Conversely, when physically different monocular stimuli are likely to represent the same object at the same location in space, fusion is more likely to result. The data suggest that what competes for visual awareness in the two eyes is not the physical similarity between images but the similarity in their perceptual/empirical meaning.

12.

Background

Optic flow is an important cue for object detection. Humans are able to perceive objects in a scene using only kinetic boundaries, and can perform the task even when other shape cues are not provided. These kinetic boundaries are characterized by the presence of motion discontinuities in a local neighbourhood. In addition, temporal occlusions appear along the boundaries as the object in front covers the background and any objects spatially behind it.

Methodology/Principal Findings

From a technical point of view, detecting motion boundaries for segmentation based on optic flow is a difficult task, because flow estimated along such boundaries is generally unreliable. We propose a model derived from mechanisms found in visual areas V1, MT, and MSTl of human and primate cortex that achieves robust detection along motion boundaries. It includes two separate mechanisms for the detection of motion discontinuities and of occlusion regions, based on how neurons respond to spatial and temporal contrast, respectively. The mechanisms are embedded in a biologically inspired architecture that integrates information from the different model components of visual processing via feedback connections. In particular, mutual interactions between the detection of motion discontinuities and of temporal occlusions considerably improve kinetic boundary detection.
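As a rough illustration of the first mechanism only (a simplification of ours; the model itself is a multi-area neural architecture), motion discontinuities can be marked wherever the spatial contrast of the flow field is high:

```python
import numpy as np

def motion_discontinuities(flow, thresh=0.5):
    """Mark kinetic boundaries as locations of high spatial contrast in the
    optic-flow field (magnitude of the flow's spatial gradient)."""
    u, v = flow[..., 0], flow[..., 1]
    gy_u, gx_u = np.gradient(u)
    gy_v, gx_v = np.gradient(v)
    contrast = np.sqrt(gx_u**2 + gy_u**2 + gx_v**2 + gy_v**2)
    return contrast > thresh

# Toy scene: a square moving rightward over a static background.
flow = np.zeros((64, 64, 2))
flow[20:44, 20:44, 0] = 1.0          # object region: horizontal motion
boundary = motion_discontinuities(flow)
print(boundary.sum())                # nonzero only around the square's outline
```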

Conclusions/Significance

A new model is proposed that uses optic-flow cues to detect motion discontinuities and object occlusions. We suggest that by combining the results for motion discontinuities and object occlusion, object segmentation within the model can be improved. This idea could also be applied in other models of object segmentation. In addition, we discuss how the model relates to neurophysiological findings. The model was successfully tested with both artificial and real sequences, including self-motion and object motion.

13.
Skilled object manipulation requires knowledge, or internal models, of object dynamics relating applied force to motion, and our ability to handle myriad objects indicates that the brain maintains multiple models. Recent behavioral studies have shown that once learned, an internal model of an object with novel dynamics can be rapidly recruited and derecruited as the object is grasped and released. We used event-related fMRI to investigate neural activity linked to grasping an object with recently learned dynamics in preparation for moving it after a delay. Subjects also performed two control tasks in which they either moved without the object in hand or applied isometric forces to the object. In all trials, subjects received a cue indicating which task to perform in response to a go signal delivered 5-10 s later. We examined BOLD responses during the interval between the cue and go and assessed the conjunction of the two contrasts formed by comparing the primary task to each control. The analysis revealed significant activity in the ipsilateral cerebellum and the contralateral and supplementary motor areas. We propose that these regions are involved in internal-model recruitment in preparation for movement execution.

14.
Ninio J. Spatial Vision 2007;21(1-2):185-200
Autostereograms, or SIRDS (single-image random-dot stereograms), are camouflaged stereograms that combine the Julesz random-dot stereogram principle with the wallpaper effect. They can represent any 3D shape in a single image with a quasi-periodic appearance. Rather large SIRDS can be interpreted in depth with unaided eyes. In the hands of computer graphic designers, SIRDS spread all over the world in 1992-1994, and these images, it was claimed, opened a new era of stereoscopic art. Some scientific, algorithmic, and artistic aspects of these images are reviewed here. Scientifically, these images provide interesting cues on stereoscopic memory and on the roles of monocular regions and texture boundaries in stereopsis. Algorithmically, problems arising with early SIRDS, such as internal texture repeats or ghost images, are discussed, and recommendations are made for gaining better control over the construction of SIRDS. Problems of graphic quality (smoothness of the represented surfaces, elimination of internal texture repeats) are discussed and possible solutions proposed. Artistically, it is proposed that SIRDS should become less anecdotal and more oriented towards simple geometric effects, which could be implemented on large panels in natural surroundings.
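For readers unfamiliar with how such images are built, here is a minimal single-image random-dot stereogram generator in the spirit of the classic constraint-linking algorithm of Thimbleby, Inglis, and Witten; it is a generic sketch, not Ninio's own algorithm, and omits the texture-quality refinements the paper discusses.

```python
import numpy as np

def sirds(depth, eye_sep=90, mu=1/3):
    """Row-by-row SIRDS: pixel pairs that the two eyes would fuse at the
    surface depth are constrained to share a colour; rows are then filled
    with random dots that respect those constraints."""
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.uint8)
    rng = np.random.default_rng(0)
    for y in range(h):
        same = np.arange(w)            # union-find: root = leftmost linked pixel
        def root(i):
            while same[i] != i:
                i = same[i]
            return i
        for x in range(w):
            # Image-plane separation for this pixel's depth (0 = far, 1 = near).
            s = int(round(eye_sep * (1 - mu * depth[y, x]) / (2 - mu * depth[y, x])))
            left, right = x - s // 2, x - s // 2 + s
            if left >= 0 and right < w:
                a, b = root(left), root(right)
                if a != b:
                    same[max(a, b)] = min(a, b)   # both pixels get one colour
        colors = rng.integers(0, 2, w).astype(np.uint8) * 255
        for x in range(w):
            colors[x] = colors[root(x)]
        out[y] = colors
    return out

# A raised square on a flat background, encoded as a 256x256 autostereogram.
depth = np.zeros((256, 256)); depth[96:160, 96:160] = 0.6
img = sirds(depth)
```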

15.
Anderson BL. Neuron 1999;24(4):919-928
Physiological, computational, and psychophysical studies of stereopsis have assumed that the perceived surface structure of binocularly viewed images is primarily specified by the pattern of binocular disparities in the two eyes' views. A novel set of stereoscopic phenomena is reported that demonstrates the insufficiency of this view. It is shown that the visual system computes the contrast relationships along depth discontinuities to infer the depth, lightness, and opacity of stereoscopically viewed surfaces. A novel theoretical framework is introduced to explain these results. It is argued that the visual system contains mechanisms that enforce two principles of scene interpretation: a generic-view principle that determines qualitative scene geometry, and anchoring principles that determine how image data are quantitatively partitioned among different surface attributes.

16.
When a voluntary action is followed by an unexpected stimulus, a late positive potential (LPP) with a posterior scalp distribution is elicited in a latency range of 500–700 ms. In the present study, we examined what type of mismatch between expectations and action outcomes is reflected by the LPP. Twelve student volunteers participated in a task simulating the choice of TV programs. After choosing one of three options displayed as a cue stimulus, they viewed a second stimulus (a still TV image). To manipulate the type of expectation, three cue conditions were used: a thumbnail-image condition (three small TV images), a category-label condition (three words), and a no-cue condition (three question marks). Over trials, the second stimulus either matched (p = .80) or mismatched (p = .20) the chosen option. Compared with matched TV images, mismatched TV images elicited a larger LPP (500–700 ms) in the thumbnail-image and category-label conditions. In addition, a larger centroparietal P3 (400–450 ms) was elicited by mismatched TV images in the thumbnail-image condition alone. The LPP reflects a conceptual mismatch between a category-based expectation and an ensuing action outcome, whereas the P3 reflects a perceptual mismatch between an image-based expectation and an action outcome.

17.
This paper presents a computational model to address one prominent psychological behavior of human beings in recognizing images. The basic premise of our method is that differences among multiple images help visual recognition. Generally speaking, we propose a statistical framework to distinguish image features that carry category-specific information from common features shared across classes. Mathematically, the formulation is cast as a generative probabilistic model, and a discriminative component is incorporated into the model to capture the differences among image classes. The whole Bayesian formulation is solved in an Expectation-Maximization paradigm. After finding the discriminative patterns among different images, we design an image-categorization algorithm to interpret how these differences help visual recognition within the bag-of-features framework. The proposed method is verified on a variety of image-categorization tasks, including outdoor scenes, indoor scenes, and airborne SAR images captured from different perspectives.
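The shared-versus-discriminative decomposition can be caricatured with a toy EM on one-dimensional features (our own construction; the paper's model operates on bag-of-features image representations): each feature is drawn either from a single Gaussian shared by all classes or from a class-specific Gaussian, and EM estimates each feature's probability of being class-specific, i.e., discriminative.

```python
import numpy as np

def em_shared_vs_specific(x, labels, n_iter=100):
    """Toy EM: each 1-D feature comes from either a shared Gaussian (common
    across classes) or a class-specific Gaussian. Returns the posterior
    probability that each feature is class-specific."""
    classes = np.unique(labels)
    pi = 0.5                                # prior prob. of "class-specific"
    mg, sg = x.mean(), x.std()              # shared component
    mc = {c: x[labels == c].mean() for c in classes}
    sc = {c: x[labels == c].std() for c in classes}
    gauss = lambda v, m, s: np.exp(-0.5*((v-m)/s)**2) / (s*np.sqrt(2*np.pi))
    for _ in range(n_iter):
        # E-step: responsibility of the class-specific component per feature.
        p_spec = np.array([gauss(v, mc[c], sc[c]) for v, c in zip(x, labels)])
        p_shar = gauss(x, mg, sg)
        r = pi*p_spec / (pi*p_spec + (1-pi)*p_shar + 1e-12)
        # M-step: re-estimate mixing weight and both components.
        pi = r.mean()
        mg = np.average(x, weights=1-r+1e-12)
        sg = np.sqrt(np.average((x-mg)**2, weights=1-r+1e-12)) + 1e-6
        for c in classes:
            m = labels == c
            mc[c] = np.average(x[m], weights=r[m]+1e-12)
            sc[c] = np.sqrt(np.average((x[m]-mc[c])**2, weights=r[m]+1e-12)) + 1e-6
    return r

rng = np.random.default_rng(0)
labels = np.tile([0, 1], 50)                   # alternating class labels
shared = rng.normal(0, 2, 50)                  # same distribution in both classes
specific = rng.normal(np.where(labels[50:] == 0, -3, 3), 0.3)  # class-informative
x = np.concatenate([shared, specific])
r = em_shared_vs_specific(x, labels)
print(r[:50].mean(), r[50:].mean())  # responsibility: low for shared, high for specific
```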

18.
This article deals with the role of the fish's body and the object's geometry in determining the spatial shape of the electric image in pulse Gymnotiforms. The problem was explored by measuring local electric fields along a line on the skin in the presence and absence of objects. We mapped objects' electric images at different regions of the electrosensory mosaic, paying particular attention to the perioral region, where a fovea has been described. When the curvature of the sensory surface increases relative to the object's curvature, the image details that depend on the object's shape are blurred and finally disappear. The remaining effect of the object on the stimulus profile depends on the strength of its global polarization, which depends on the length of the object's axis aligned with the field and, in turn, on the geometry of the fish's body. Thus the geometries of the fish's body and of the self-generated electric field are embodied in this "global effect" of the object. The presence of edges, or local changes in impedance at the nearest surface of closely located objects, adds peaks to the image profiles (the "local effect", or the object's "electric texture"). It is concluded that active electroreceptive animals may use two cues for object recognition: global effects (informing on the object's dimension along the field lines, its conductance, and its position) and local effects (informing on the object's surface). Since the field is anchored in fish-centered coordinates, and the electrosensory fovea is used for exploring surfaces, fine movements of the fish are essential to electric perception. We conclude that fish may explore adjacent objects by combining active movements and electrogenesis, representing the objects through electrosensory information.

19.
Based on an information-theoretical approach, we investigate feature-selection processes in saccadic object and scene analysis. Saccadic eye movements of human observers are recorded for a variety of natural and artificial test images. These experimental data are used for a statistical evaluation of the fixated image regions. Analysis of second-order statistics indicates that regions with higher spatial variance have a higher probability of being fixated, but no significant differences beyond these variance effects could be found at the level of power spectra. By contrast, an investigation with higher-order statistics, as reflected in the bispectral density, yielded clear structural differences between the image regions selected by saccadic eye movements and regions selected by a random process. These results indicate that nonredundant, intrinsically two-dimensional image features, such as curved lines and edges, occlusions, and isolated spots, play an important role in the saccadic selection process, which must be integrated with top-down knowledge to fully predict object and scene analysis by human observers.
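The second-order result, that higher-variance regions attract fixations, has a direct computational reading. A toy comparison (hypothetical data; the study's images and fixation records are not reproduced here):

```python
import numpy as np

def patch_variance(img, points, radius=8):
    """Local luminance variance in a square patch around each (y, x) point."""
    return np.array([img[max(y-radius, 0):y+radius,
                         max(x-radius, 0):x+radius].var() for y, x in points])

rng = np.random.default_rng(0)
img = rng.uniform(size=(256, 256))
img[:, :128] *= 0.1          # left half nearly homogeneous (low variance)

# Hypothetical "fixations" landing on the textured half, vs. random control points.
fix = np.column_stack([rng.integers(16, 240, 100), rng.integers(144, 240, 100)])
rand = np.column_stack([rng.integers(16, 240, 100), rng.integers(16, 240, 100)])
print(patch_variance(img, fix).mean(),    # higher variance at "fixated" regions
      patch_variance(img, rand).mean())   # than at randomly selected regions
```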

20.
A key challenge underlying theories of vision is how spatially restricted, retinotopically represented feature analysis can be integrated to form abstract, coordinate-free object models. A resolution likely depends on the use of intermediate-level representations that can, on the one hand, be populated by local features and, on the other, be used as atomic units underlying the formation of, and interaction with, object hypotheses. The precise structure of this intermediate representation derives from the varied requirements of a range of visual tasks, which motivate a significant role for a geometry of visual form. The need to integrate input from features capturing surface properties such as texture, shading, motion, and color, as well as from features capturing surface discontinuities such as silhouettes and T-junctions, implies a geometry that captures both regional and boundary aspects. Curves, as a geometric model of boundaries, have been used extensively as an intermediate representation in computational, perceptual, and physiological studies, while the medial axis (MA) has been popular mainly in computer vision as a geometric, region-based model of the interior of closed boundaries. We extend the traditional model of the MA to represent images, where each MA segment represents a region of the image that we call a visual fragment. We present a unified theory of perceptual grouping and object recognition in which, through various sequences of transformations of the MA representation, visual fragments are grouped in various configurations to form object hypotheses and are related to stored models. The mechanism underlying both the computation and the transformation of the MA is a lateral wave-propagation model. Recent psychophysical experiments showing that contrast-sensitivity maps peak at the medial axes of stimuli, together with experiments on perceptual filling-in and on brightness induction and modulation, are consistent with both the use of an MA representation and a propagation-based scheme. Recent neurophysiological recordings in V1 also correlate with the MA hypothesis and a horizontal propagation scheme. This evidence supports a geometric computational paradigm for processing sensory data in which both dynamic in-plane propagation and feedforward-feedback connections play an integral role.
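For a concrete feel of the region-based representation, the medial axis of a closed shape can be computed with standard tools; the snippet below illustrates the MA itself, not the authors' wave-propagation scheme or grouping transformations.

```python
import numpy as np
from skimage.morphology import medial_axis

# A filled rectangle: its medial axis is the familiar spine plus diagonals.
shape = np.zeros((64, 128), dtype=bool)
shape[16:48, 16:112] = True

skeleton, distance = medial_axis(shape, return_distance=True)
# Each skeleton pixel paired with its distance value (radius of the maximal
# inscribed disc) is a compact region descriptor: axis position plus width,
# the kind of unit the abstract calls a "visual fragment".
print(skeleton.sum(), distance.max())
```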
