Similar Articles
20 similar articles retrieved.
1.

Background

Optic flow is an important cue for object detection. Humans are able to perceive objects in a scene using only kinetic boundaries, and can perform the task even when other shape cues are not provided. These kinetic boundaries are characterized by the presence of motion discontinuities in a local neighbourhood. In addition, temporal occlusions appear along the boundaries as the object in front covers the background and the objects that are spatially behind it.

Methodology/Principal Findings

From a technical point of view, detecting motion boundaries for optic-flow-based segmentation is a difficult task, because the flow estimated along such boundaries is generally unreliable. We propose a model, derived from mechanisms found in visual areas V1, MT, and MSTl of human and primate cortex, that achieves robust detection along motion boundaries. It includes two separate mechanisms, one for detecting motion discontinuities and one for detecting occlusion regions, based on neural responses to spatial and temporal contrast, respectively. These mechanisms are embedded in a biologically inspired architecture that integrates information from different model components of visual processing via feedback connections. In particular, mutual interactions between the detection of motion discontinuities and of temporal occlusions considerably improve kinetic boundary detection.

Conclusions/Significance

A new model is proposed that uses optic flow cues to detect motion discontinuities and object occlusions. We suggest that combining the results for motion discontinuities and object occlusion improves object segmentation within the model, an idea that could also be applied in other models for object segmentation. In addition, we discuss how this model relates to neurophysiological findings. The model was successfully tested with both artificial and real sequences, including self-motion and object motion.
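A minimal sketch of the core signal this line of work builds on — spatial contrast of the flow field, with kinetic boundaries flagged where the local optic flow changes abruptly. This illustrates the principle only, not the authors' cortical model; the threshold is an assumed free parameter.

```python
import numpy as np

def kinetic_boundaries(u, v, thresh=1.0):
    """Flag pixels where the optic-flow field (u, v) changes abruptly.

    u, v   : 2-D arrays of horizontal/vertical flow (pixels per frame).
    thresh : assumed free parameter, not taken from the paper.
    Returns a boolean map of candidate motion discontinuities.
    """
    # Spatial contrast of the flow: gradient magnitude of each component.
    du_y, du_x = np.gradient(u)
    dv_y, dv_x = np.gradient(v)
    contrast = np.sqrt(du_x**2 + du_y**2 + dv_x**2 + dv_y**2)
    return contrast > thresh
```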

2.
We are surrounded by surfaces that we perceive by visual means. Understanding the basic principles behind this perceptual process is a central theme in visual psychology, psychophysics, and computational vision. In many of the computational models employed in the past, it has been assumed that a metric representation of physical space can be derived by visual means. Psychophysical experiments, as well as computational considerations, can convince us that the perception of space and shape has a much more complicated nature, and that only a distorted version of actual, physical space can be computed. This paper develops a computational geometric model that explains why such distortion might take place. The basic idea is that, both in stereo and motion, we perceive the world from multiple views. Given the rigid transformation between the views and the properties of the image correspondence, the depth of the scene can be obtained. Even a slight error in the rigid transformation parameters causes distortion of the computed depth of the scene. The unified framework introduced here describes this distortion in computational terms. We characterize the space of distortions by its level sets, that is, we characterize the systematic distortion via a family of iso-distortion surfaces which describes the locus over which depths are distorted by some multiplicative factor. Given that humans' estimation of egomotion or estimation of the extrinsic parameters of the stereo apparatus is likely to be imprecise, the framework is used to explain a number of psychophysical experiments on the perception of depth from motion or stereo. Received: 9 January 1997 / Accepted in revised form: 8 July 1997  相似文献   
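A worked toy case of the multiplicative distortion (a simplified fronto-parallel stereo rig, assumed here for illustration; the paper's framework covers general rigid motions): with focal length \(f\), true baseline \(b\), and disparity \(d\), misestimating the baseline as \(\hat{b}\) scales every computed depth by the same factor,

\[
Z = \frac{fb}{d}, \qquad \hat{Z} = \frac{f\hat{b}}{d} = \frac{\hat{b}}{b}\,Z .
\]

In this degenerate case the factor \(\hat{b}/b\) is constant everywhere; errors in the rotational or more general translational parameters make the factor position-dependent, and its level sets \(\hat{Z}/Z = k\) are exactly the iso-distortion surfaces described above.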

3.
The human visual system utilizes depth information as a major cue to group together visual items constituting an object and to segregate them from items belonging to other objects in the visual scene. Depth information can be inferred from a variety of different visual cues, such as disparity, occlusions and perspective. Many of these cues provide only local and relative information about the depth of objects. For example, at occlusions, T-junctions indicate the local relative depth precedence of surface patches. However, in order to obtain a globally consistent interpretation of the depth relations between the surfaces and objects in a visual scene, a mechanism is necessary that globally propagates such local and relative information. We present a computational framework in which depth information derived from T-junctions is propagated along surface contours using local recurrent interactions between neighboring neurons. We demonstrate that within this framework a globally consistent depth sorting of overlapping surfaces can be obtained on the basis of local interactions. Unlike previous approaches in which locally restricted cell interactions could merely distinguish between two depths (figure and ground), our model can also represent several intermediate depth positions. Our approach is an extension of a previous model of recurrent V1–V2 interaction for contour processing and illusory contour formation. Based on the contour representation created by this model, a recursive scheme of local interactions subsequently achieves a globally consistent depth sorting of several overlapping surfaces. Within this framework, the induction of illusory contours by the model of recurrent V1–V2 interaction gives rise to the figure-ground segmentation of illusory figures such as a Kanizsa square.
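A toy relaxation scheme (not the paper's recurrent neural circuit) showing how purely local interactions can spread sparse, relative T-junction evidence into a globally consistent depth assignment along a contour. The clamping and averaging rules are illustrative assumptions.

```python
import numpy as np

def propagate_depth(n_points, junction_depth, n_iter=500):
    """Propagate local depth-order evidence along a closed contour.

    n_points       : number of sample points along the contour.
    junction_depth : dict {index: depth} of T-junction depth votes.
    Each point repeatedly averages its depth with its two contour
    neighbours while junction points stay clamped to their evidence,
    so local, relative cues spread into a global ordering.
    """
    depth = np.zeros(n_points)
    for i, d in junction_depth.items():
        depth[i] = d
    for _ in range(n_iter):
        # Local recurrent interaction: nearest-neighbour smoothing.
        depth = 0.5 * (np.roll(depth, 1) + np.roll(depth, -1))
        for i, d in junction_depth.items():
            depth[i] = d  # re-clamp the T-junction evidence
    return depth

# Example: one junction votes "near" (2.0), another "far" (1.0).
print(propagate_depth(12, {0: 2.0, 6: 1.0}))
```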

4.
Flying insects are able to fly deftly through unpredictable environments. Their tiny brains contain neurons that are sensitive to visual motion, known as optic flow, and they rely mainly on this cue during flight maneuvers such as takeoff and landing, terrain following, tunnel crossing, lateral and frontal obstacle avoidance, and adjusting flight speed in cluttered environments. Optic flow can be defined as the vector field of the apparent motion of objects, surfaces, and edges in a visual scene generated by the relative motion between an observer (an eye or a camera) and the scene. Translational optic flow is particularly interesting for short-range navigation because it depends on the ratio between (i) the relative linear speed of the visual scene with respect to the observer and (ii) the distance of the observer from obstacles in the surrounding environment, without any direct measurement of either speed or distance. In flying insects, the roll stabilization reflex and yaw saccades attenuate any rotation at eye level in roll and yaw, respectively (i.e., they cancel rotational optic flow), ensuring purely translational optic flow between two successive saccades. Our survey focuses on the feedback loops using translational optic flow that insects employ for collision-free navigation. Over the next decade, optic flow is likely to be one of the most important visual cues for explaining flying insects' behavior in short-range navigation maneuvers through complex tunnels. Conversely, the biorobotic approach can help to develop innovative flight control systems for flying robots, with the aim of mimicking flying insects' abilities and better understanding their flight.
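The textbook geometry behind this speed/distance ratio (standard optic-flow math, consistent with but not quoted from the survey): for pure translation at speed \(v\), a contrast feature at distance \(D\) and eccentricity \(\theta\) from the heading direction sweeps across the eye at angular speed

\[
\omega = \frac{v}{D}\,\sin\theta ,
\]

so a feedback loop that holds \(\omega\) constant automatically couples ground or wall clearance \(D\) to flight speed \(v\), with no separate measurement of either quantity.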

5.
We describe psychophysical evidence that the human visual system contains information-processing channels for motion in depth in addition to those for position in depth. These motion-in-depth channels include some that are selectively sensitive to the relative velocities of the left and right retinal images. We propose that the visual pathway contains stereoscopic (cyclopean) motion filters that respond to only a narrow range of directions of motion in depth. Turning to the single-neuron level, we report that, in addition to neurons tuned to position in depth, cat visual cortex contains neurons that emphasize information about the direction of motion at the expense of positional information. We describe psychophysical evidence for the existence of channels that are sensitive to changing size, and are separate from the channels for both motion and flicker. These changing-size channels respond independently of whether the stimulus is a bright square on a dark ground or a dark square on a bright ground. At the physiological level, we report single neurons in cat visual cortex that respond selectively to increasing or to decreasing size, independently of the sign of stimulus contrast. Adaptation to a changing-size stimulus produces two separable after-effects: an illusion of changing size, and an illusion of motion in depth. These after-effects have different decay time constants. We propose a psychophysical model in which changing-size filters feed a motion-in-depth stage, and suppose that the motion-in-depth after-effect is due to activity at the motion-in-depth stage, while the changing-size after-effect is due to activity at the changing-size and more peripheral stages. The motion-in-depth after-effect can be cancelled either by a changing-size test stimulus or by relative motion of the left and right retinal images. Opposition of these two cues can also cancel the impression of motion in depth produced by the adapting stimulus. These findings link the stereoscopic (cyclopean) motion filters and the changing-size filters: both feed the same motion-in-depth stage.

6.
The projected pattern of retinal-image motion supplies the human visual system with valuable information about properties of the three-dimensional environment. How well three-dimensional properties can be recovered depends both on the accuracy with which the early motion system estimates retinal motion, and on the way later processes interpret this retinal motion. Here we combine both early and late stages of the computational process to account for the hitherto puzzling phenomenon of systematic biases in three-dimensional shape perception. We present data showing how the perceived depth of a hinged plane ("an open book") can be systematically biased by the extent over which it rotates. We then present a Bayesian model that combines early measurement noise with geometric reconstruction of the three-dimensional scene. Although this model has no in-built bias towards particular three-dimensional shapes, it accounts for the data well. Our analysis suggests that the biases stem largely from the geometric constraints imposed on what three-dimensional scenes are compatible with the (noisy) early motion measurements. Given these findings, we suggest that the visual system may act as an optimal estimator of three-dimensional structure-from-motion.
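A schematic sketch of the estimator class described, under an assumed Gaussian model of early measurement noise. The renderer, candidate set, and noise level are illustrative placeholders, not the authors' implementation; the point is that no shape prior is needed — biases can emerge from which geometrically valid scenes the noisy flow happens to favor.

```python
import numpy as np

def log_posterior(flow_obs, flow_pred, sigma):
    """Gaussian log-likelihood of observed retinal flow given a candidate
    3-D scene/motion interpretation (flat prior over interpretations)."""
    return -np.sum((flow_obs - flow_pred) ** 2) / (2 * sigma ** 2)

def map_estimate(flow_obs, candidates, render, sigma=0.1):
    """Pick the candidate scene whose predicted flow best explains the data.

    candidates : list of hypothetical (shape, rotation) hypotheses.
    render     : function mapping a hypothesis to predicted retinal flow.
    """
    scores = [log_posterior(flow_obs, render(c), sigma) for c in candidates]
    return candidates[int(np.argmax(scores))]
```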

7.
Anderson BL. Neuron, 1999, 24(4): 919-928
Physiological, computational, and psychophysical studies of stereopsis have assumed that the perceived surface structure of binocularly viewed images is primarily specified by the pattern of binocular disparities in the two eyes' views. A novel set of stereoscopic phenomena is reported that demonstrates the insufficiency of this view. It is shown that the visual system computes the contrast relationships along depth discontinuities to infer the depth, lightness, and opacity of stereoscopically viewed surfaces. A novel theoretical framework is introduced to explain these results. It is argued that the visual system contains mechanisms that enforce two principles of scene interpretation: a generic-view principle that determines qualitative scene geometry, and anchoring principles that determine how image data are quantitatively partitioned among different surface attributes.

8.
Form and motion perception rely upon the visual system’s capacity to segment the visual scene based upon local differences in luminance or wavelength. It is not clear whether polarization contrast is a sufficient basis for motion detection. Here we show that crayfish optomotor responses elicited by the motion of images derived from spatiotemporal variations in e-vector angles are comparable to contrast-elicited responses. Response magnitude increases with the difference in e-vector angles in adjacent segments of the scene and with the degree of polarization, but the response is relatively insensitive to the absolute values of the e-vector angles that compose the stimulus. The results indicate that polarization contrast can support visual motion detection.
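A hedged sketch of the idea being tested: feed a standard correlation-type (Hassenstein–Reichardt) motion detector with polarization contrast instead of luminance contrast. Both the detector and the cos(2θ) encoding are illustrative assumptions; the abstract does not specify the crayfish circuitry.

```python
import numpy as np

def reichardt_response(evector_angles, delay=1):
    """Correlation-type (Hassenstein-Reichardt) motion detector driven by
    polarization rather than luminance contrast.

    evector_angles : 2-D array (time, space) of e-vector angles in radians.
    cos(2*theta) maps the pi-periodic angle onto a contrast-like signal
    (an assumed encoding). Positive output = net rightward motion.
    """
    s = np.cos(2.0 * np.asarray(evector_angles, dtype=float))
    past, now = s[:-delay], s[delay:]      # delayed and current frames
    right = past[:, :-1] * now[:, 1:]      # subunit preferring rightward
    left = past[:, 1:] * now[:, :-1]       # mirror-symmetric subunit
    return float(np.mean(right - left))
```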

9.
In contradistinction to conventional wisdom, we propose that retinal image slip of a visual scene (optokinetic pattern, OP) does not constitute the only crucial input for visually induced percepts of self-motion (vection). Instead, we investigate the hypothesis that there are three input factors: 1) OP retinal image slip, 2) motion of the ocular orbital shadows across the retinae, and 3) smooth pursuit eye movements (efference copy). To test this hypothesis, we visually induced percepts of sinusoidal rotatory self-motion (circular vection, CV) in the absence of vestibular stimulation. Subjects were presented with three concurrent stimuli: a large visual OP, a fixation point to be pursued with the eyes (both projected in superposition on a semi-circular screen), and a dark window frame placed close to the eyes to create artificial visual field boundaries that simulate ocular orbital rim boundary shadows, but which could be moved across the retinae independently of eye movements. In different combinations these stimuli were independently moved or kept stationary. When moved together (horizontally and sinusoidally around the subject's head), they did so in precise temporal synchrony at 0.05 Hz. The results show that the occurrence of CV requires retinal slip of the OP and/or relative motion between the orbital boundary shadows and the OP. On the other hand, CV does not develop when the two retinal slip signals equal each other (no relative motion) and concur with pursuit eye movements (as is the case, for example, when our eyes follow a target moving across a stationary visual scene). The findings were formalized in a simulation model. In the model, two signals coding relative motion between OP and head are fused and fed into the mechanism for CV: a visuo-oculomotor signal, derived from OP retinal slip and the eye-movement efference copy, and a purely visual signal of relative motion between the orbital rims (head) and the OP. The latter signal is also used, together with a version of the oculomotor efference copy, by a mechanism that suppresses CV at a later stage of processing in conditions where the retinal slip signals are self-generated by smooth pursuit eye movements.

10.
It is well known that the human postural control system responds to motion of the visual scene, but the implicit assumptions it makes about the visual environment, and what quantities, if any, it estimates about the visual environment, are unknown. This study compares the behavior of four models of the human postural control system to experimental data. Three include internal models that estimate the state of the visual environment, implicitly assuming its dynamics to be that of a linear stochastic process (respectively, a random walk, a general first-order process, and a general second-order process). In each case, all of the coefficients that describe the process are estimated by an adaptive scheme based on maximum likelihood. The fourth model does not estimate the state of the visual environment; it adjusts sensory weights to minimize the mean square of the control signal without making any specific assumptions about the dynamic properties of the environmental motion. We find that both having an internal model of the visual environment and its type make a significant difference in how the postural system responds to motion of the visual scene. Notably, the second-order process model outperforms the human postural system in its response to sinusoidal stimulation: it can correctly identify the frequency of the stimulus and compensate completely, so that the motion of the visual scene has no effect on sway. In this case the postural control system extracts the same information from the visual modality as it does when the visual scene is stationary. The fourth model, which does not simulate the motion of the visual environment, is the only one that reproduces the experimentally observed result that, across different frequencies of sinusoidal stimulation, the gain with respect to the stimulus drops as the amplitude of the stimulus increases while the phase remains roughly constant. Our results suggest that the human postural control system does not estimate the state of the visual environment in order to respond to sinusoidal stimuli.
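A hedged sketch of the simplest of the three internal models, the random-walk case: a scalar Kalman-style estimator of visual-scene motion. In the paper's scheme the noise coefficients are adapted by maximum likelihood; the fixed values here are placeholders.

```python
def random_walk_estimator(measurements, q=0.01, r=0.1):
    """Scalar Kalman filter: scene state assumed to follow a random walk.

    measurements : sequence of noisy visual measurements of scene motion.
    q, r         : assumed process / measurement noise variances.
    Returns the filtered state estimates a controller could subtract out.
    """
    x, p = 0.0, 1.0            # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + q              # predict: random walk adds process noise
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # correct with the new measurement
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```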

11.
12.
Fang F  He S 《Current biology : CB》2004,14(3):247-251
3D structures can be perceived based on the patterns of 2D motion signals. With orthographic projection of a 3D stimulus onto a 2D plane, the kinetic information can give a vivid impression of depth, but the depth order is intrinsically ambiguous, resulting in bistable or even multistable interpretations. For example, an orthographic projection of dots on the surface of a rotating cylinder is perceived as a rotating cylinder with ambiguous direction of rotation. We show that the bistable rotation can be stabilized by adding information, not to the dots themselves, but to their spatial context. More interestingly, the stabilized bistable motion can generate consistent rotation aftereffects. The rotation aftereffect can only be observed when the adapting and test stimuli are presented at the same stereo depth and the same retinal location, and it is not due to attentional tracking. The observed rotation aftereffect is likely due to direction-contingent disparity adaptation, implying that stimuli with kinetic depth may have activated neurons sensitive to different disparities, even though the stimuli have zero relative disparity. Stereo depth and kinetic depth may be supported by a common neural mechanism at an early stage in the visual system.
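The ambiguity has a simple formal source (standard kinetic-depth geometry, not taken from the paper): under orthographic projection, a dot at phase \(\phi\) on a cylinder of radius \(r\) rotating at angular speed \(\omega\) projects to

\[
x(t) = r\cos(\omega t + \phi), \qquad z(t) = r\sin(\omega t + \phi),
\]

and replacing \((\omega, \phi)\) with \((-\omega, -\phi)\) leaves the visible \(x(t)\) unchanged while flipping the sign of the unseen depth \(z(t)\): the two rotation directions produce identical images.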

13.
Auditory cues can create the illusion of self-motion (vection) in the absence of visual or physical stimulation. The present study aimed to determine whether auditory cues alone can also elicit motion sickness and how auditory cues contribute to motion sickness when added to visual motion stimuli. Twenty participants were seated in front of a curved projection display and were exposed to a virtual scene that constantly rotated around the participant's vertical axis. The virtual scene contained either visual-only, auditory-only, or a combination of corresponding visual and auditory cues. All participants performed all three conditions in a counterbalanced order. Participants tilted their heads alternately towards the right or left shoulder in all conditions during stimulus exposure in order to create pseudo-Coriolis effects and to maximize the likelihood of motion sickness. Measurements of motion sickness (onset, severity), vection (latency, strength, duration), and postural steadiness (center of pressure) were recorded. Results showed that adding auditory cues to the visual stimuli did not, on average, affect motion sickness and postural steadiness, but it did reduce vection onset times and increased vection strength compared to pure visual or pure auditory stimulation. Eighteen of the 20 participants reported at least slight motion sickness in the two conditions including visual stimuli. More interestingly, six participants also reported slight motion sickness during pure auditory stimulation, and two of the six stopped the pure auditory test session due to motion sickness. The present study is the first to demonstrate that motion sickness may be caused by pure auditory stimulation, which we refer to as “auditorily induced motion sickness”.

14.
Olveczky BP, Baccus SA, Meister M. Neuron, 2007, 56(4): 689-700
Due to fixational eye movements, the image on the retina is always in motion, even when one views a stationary scene. When an object moves within the scene, the corresponding patch of retina experiences a different motion trajectory than the surrounding region. Certain retinal ganglion cells respond selectively to this condition, when the motion in the cell's receptive field center is different from that in the surround. Here we show that this response is strongest at the very onset of differential motion, followed by gradual adaptation with a time course of several seconds. Different subregions of a ganglion cell's receptive field can adapt independently. The circuitry responsible for differential motion adaptation lies in the inner retina. Several candidate mechanisms were tested, and the adaptation most likely results from synaptic depression at the synapse from bipolar to ganglion cell. Similar circuit mechanisms may act more generally to emphasize novel features of a visual stimulus.
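A hedged sketch of the proposed mechanism, synaptic depression, in the standard resource-depletion form (Tsodyks–Markram style). The parameters are illustrative, not fitted to the retinal data; the sketch shows why such a synapse produces a strong onset response that fades over seconds.

```python
import numpy as np

def depressing_synapse(spikes, dt=0.001, tau_rec=2.0, u=0.3):
    """Resource-depletion model of a depressing synapse.

    spikes  : binary array of presynaptic events per time step.
    tau_rec : recovery time constant in seconds (assumed value).
    u       : fraction of available resource released per event (assumed).
    Each event transmits u * x and depletes the resource x, which then
    recovers slowly -- yielding onset emphasis followed by adaptation.
    """
    x, out = 1.0, []
    for s in spikes:
        if s:
            out.append(u * x)   # transmitted amplitude
            x -= u * x          # deplete the vesicle resource
        else:
            out.append(0.0)
        x += (1.0 - x) * dt / tau_rec   # slow recovery toward 1
    return np.array(out)
```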

15.
In robot-assisted beating-heart surgery, motion of the heart surface can be virtually stabilized to let the surgeon work as in on-pump cardiac surgery. Virtual stabilization means physically compensating the relative motion between the instrument tool tip and the region of interest on the heart surface, and offering the surgeon a stable visual display of the scene. To this end, the motion of the heart must be estimated. This article focuses on motion estimation of the heart surface. Two approaches are considered. The first is based on landmark tracking, allowing 3D pose estimation; the second is based on texture tracking. Classical computer vision methods, as well as a new texture-based tracking scheme, have been applied to track the heart motion and, when possible, to reconstruct the 3D distance to the heart surface. Experimental results obtained on in vivo images show the estimated motion of heart surface points.
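A minimal sketch of texture tracking by normalized cross-correlation (a generic baseline, not the article's scheme): scan a search window around the previous position and keep the location where the patch best matches a stored template.

```python
import numpy as np

def track_patch(frame, template, prev_xy, search=20):
    """Locate a texture patch in a new frame by normalized cross-correlation.

    frame    : 2-D grayscale image.
    template : 2-D patch stored from an earlier frame.
    prev_xy  : (x, y) of the patch's top-left corner in the last frame.
    Returns the best-matching (x, y) and its NCC score.
    """
    h, w = template.shape
    t = (template - template.mean()) / (template.std() + 1e-9)
    x0, y0 = prev_xy
    best_score, best_xy = -np.inf, prev_xy
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue  # candidate window falls outside the frame
            patch = frame[y:y + h, x:x + w]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            score = float(np.mean(p * t))
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score
```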

16.
Motion of a visual scene (an optokinetic stimulus) projected on a wide screen frequently induces motion sickness. Rotational movements of 3D visual images were analyzed to examine which factors are effective in visually induced motion sickness and how gravity contributes to its induction. The severity of visually induced motion sickness was measured while two angles were varied: the angle between the rotational axis of the 3D visual image and the gravitational direction, and the axis's angle from the subjective vertical perceived by viewers through the 3D visual image.

17.
Segmenting meaningful targets from cluttered scenes is a fundamental function of the visual system. Evolution and development have been suggested to optimize the brain's solution to this computationally challenging task by tuning the visual system to features that co-occur frequently in natural scenes (e.g., collinear edges) [1, 2, 3]. However, the role of shorter-term experience in shaping the utility of scene statistics remains largely unknown. Here, we ask whether collinearity is a specialized case, or whether the brain can learn to recruit any image regularity for the purpose of target identification. Consistent with long-term optimization for typical scene statistics, observers were better at detecting collinear contours than configurations of elements oriented at orthogonal or acute angles to the contour path. However, training resulted in improved detection of orthogonal contours that lasted for several months, suggesting retuning rather than transient changes of visual sensitivity. Improvement was also observed for acute contours but only after longer training. These results demonstrate that the brain flexibly exploits image regularities and learns to use discontinuities typically associated with surface boundaries (orthogonal, acute alignments) for contour linking and target identification. Thus, short-term experience in adulthood shapes the interpretation of scenes by assigning new statistical utility to image regularities.

18.
Texture of various appearances, geometric distortions, spatial frequency content and densities is utilized by the human visual system to segregate items from background and to enable recognition of complex geometric forms. For automatic, or pre-attentive, segmentation of a visual scene, sophisticated analysis and comparison of surface properties over wide areas of the visual field are required. We investigated the neural substrate underlying human texture processing, particularly the computational mechanisms of texture boundary detection. We present a neural network model which uses as building blocks model cortical areas that are bi-directionally linked to implement cycles of feedforward and feedback interaction for signal detection, hypothesis generation and testing within the infero-temporal pathway of form processing. In the spirit of Jake Beck's early investigations our model particularly builds upon two key hypotheses, namely that (i) texture segregation is based on boundary detection, rather than clustering homogeneous items, and (ii) texture boundaries are detected mainly on the basis of larger scenic contexts mediated by higher cortical areas, such as area V4. The latter constraint provides a basis for element grouping in accordance to the Gestalt laws of similarity and good continuation. It is shown through simulations that the model integrates a variety of psychophysical findings on texture processing and provides a link to the underlying physiology. The functional role of feedback processing is demonstrated by context dependent modulation of V1 cell activation, leading to sharply localized detection of texture boundaries. It furthermore explains why pre-attentive processing in visual search tasks can be directly linked to texture boundary processing as revealed by recent EEG studies on visual search.
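A schematic sketch of the boundary-based view the model takes: texture boundaries as locations where locally pooled orientation-energy responses change abruptly. The filters, pooling size, and threshold are placeholder assumptions, not the model's V1–V4 circuitry.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def texture_boundaries(image, orientations=4, thresh=0.1):
    """Mark texture boundaries as spatial contrast in oriented-filter energy.

    Simple oriented derivatives stand in for V1-like filters; their
    squared responses are pooled locally, and boundary strength is the
    gradient magnitude of the pooled feature maps.
    """
    gy, gx = np.gradient(gaussian_filter(image.astype(float), 1.0))
    energies = []
    for k in range(orientations):
        theta = np.pi * k / orientations
        resp = np.cos(theta) * gx + np.sin(theta) * gy
        energies.append(uniform_filter(resp ** 2, size=9))  # local pooling
    strength = np.zeros(image.shape)
    for e in energies:
        ey, ex = np.gradient(e)
        strength += ex ** 2 + ey ** 2
    return np.sqrt(strength) > thresh
```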

19.
Pack CC, Livingstone MS, Duffy KR, Born RT. Neuron, 2003, 39(4): 671-680
Our perception of fine visual detail relies on small receptive fields at early stages of visual processing. However, small receptive fields tend to confound the orientation and velocity of moving edges, leading to ambiguous or inaccurate motion measurements (the aperture problem). Thus, it is often assumed that neurons in primary visual cortex (V1) carry only ambiguous motion information. Here we show that a subpopulation of V1 neurons is capable of signaling motion direction in a manner that is independent of contour orientation. Specifically, end-stopped V1 neurons obtain accurate motion measurements by responding only to the endpoints of long contours, a strategy which renders them largely immune to the aperture problem. Furthermore, the time course of end-stopping is similar to the time course of motion integration by MT neurons. These results suggest that cortical neurons might represent object motion by responding selectively to two-dimensional discontinuities in the visual scene.
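The formal core of the aperture problem (textbook brightness-constancy algebra, not quoted from the paper): a local detector sees only the flow component normal to a contour. From the constraint \(\nabla I \cdot \mathbf{v} + I_t = 0\), the recoverable part of the velocity is

\[
\mathbf{v}_\perp = -\frac{I_t}{\lVert \nabla I \rVert^{2}}\,\nabla I ,
\]

while the tangential component is invisible through a small aperture. Endpoints and other two-dimensional discontinuities constrain both components, which is why end-stopped responses can escape the ambiguity.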

20.
Grossberg S. Spatial Vision, 2008, 21(3-5): 463-486
The human urge to represent the three-dimensional world using two-dimensional pictorial representations dates back at least to Paleolithic times. Artists from ancient to modern times have struggled to understand how a few contours or color patches on a flat surface can induce mental representations of a three-dimensional scene. This article summarizes some of the recent breakthroughs in scientifically understanding how the brain sees that shed light on these struggles. These breakthroughs illustrate how various artists have intuitively understood paradoxical properties about how the brain sees, and have used that understanding to create great art. These paradoxical properties arise from how the brain forms the units of conscious visual perception; namely, representations of three-dimensional boundaries and surfaces. Boundaries and surfaces are computed in parallel cortical processing streams that obey computationally complementary properties. These streams interact at multiple levels to overcome their complementary weaknesses and to transform their complementary properties into consistent percepts. The article describes how properties of complementary consistency have guided the creation of many great works of art.
