Similar Documents
20 similar documents found.
1.
Airport detection in remote sensing images: a method based on saliency map
Airport detection has attracted considerable attention and become a hot research topic recently because of its applications and importance in military and civil aviation. However, the complicated background around airports makes detection difficult. This paper presents a new method for airport detection in remote sensing images. Unlike methods that analyze images pixel by pixel, we introduce a visual attention mechanism into airport detection, which greatly improves efficiency. First, the Hough transform is used to judge whether an airport exists in the image. Then an improved graph-based visual saliency model is applied to compute the saliency map and extract regions of interest (ROIs). The airport target is finally detected according to scale-invariant feature transform features extracted from each ROI and classified by a hierarchical discriminant regression tree. Experimental results show that the proposed method is faster and more accurate than existing methods, with a lower false alarm rate and better anti-noise performance.
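
A minimal sketch of the Hough-transform pre-check described above, using OpenCV. It only illustrates the idea of treating long straight edges as a runway cue for deciding whether an airport is present; the improved graph-based saliency model, SIFT features, and hierarchical discriminant regression tree are not reproduced here, and `min_runway_len` is an illustrative threshold rather than a value from the paper.

```python
import cv2
import numpy as np

def likely_contains_airport(image_bgr, min_runway_len=200):
    """Return True if the image contains long straight edges suggestive of runways."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Probabilistic Hough transform: long line segments are taken as a runway cue.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=min_runway_len, maxLineGap=10)
    return lines is not None and len(lines) > 0
```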

2.
This paper evaluates the degree of saliency of text in natural scenes using visual saliency models. A large-scale scene image database with pixel-level ground truth is created for this purpose. Using this database and five state-of-the-art models, visual saliency maps representing the degree of saliency of objects are calculated. The receiver operating characteristic curve is employed to evaluate the saliency of scene text as computed by the visual saliency models. A visualization of the distribution of scene texts and non-texts is given in the space constructed by three kinds of saliency maps, calculated using Itti's visual saliency model with intensity, color, and orientation features. This visualization indicates that text characters are more salient than their non-text neighbors and can be separated from the background; scene text can therefore be extracted from scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. It is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to that salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
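
A minimal two-stage sketch of the hierarchical idea described above: compute a saliency map, keep the region selected by Otsu's global threshold, then recompute saliency inside that region. OpenCV's spectral-residual saliency (from opencv-contrib) is used here only as a stand-in for Itti's model, which is not reimplemented.

```python
import cv2
import numpy as np

def hierarchical_saliency(image_bgr):
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()  # stand-in for Itti's model
    _, stage1 = sal.computeSaliency(image_bgr)                  # stage 1: global saliency map
    stage1_u8 = (stage1 * 255).astype(np.uint8)
    _, mask = cv2.threshold(stage1_u8, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu's global threshold
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return stage1                                           # nothing salient was found
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    crop = image_bgr[y0:y1 + 1, x0:x1 + 1]
    _, stage2 = sal.computeSaliency(crop)                       # stage 2: saliency of the salient region
    final = np.zeros_like(stage1)
    final[y0:y1 + 1, x0:x1 + 1] = stage2
    return final
```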

3.
The automatic computerized detection of regions of interest (ROIs) is an important step in medical image processing and analysis. The reasons are many: the amount of available medical imaging data is increasing, inter-observer and inter-scanner variability exists, and improved accuracy in automatic detection helps doctors diagnose faster and on time. A novel algorithm based on visual saliency is developed here for identifying tumor regions in MR images of the brain. The GBM saliency detection model is designed by taking a cue from the concept of visual saliency in natural scenes. A visually salient region is typically rare in an image and contains highly discriminating information, with attention getting immediately focused upon it. Although color is typically considered the most important feature in a bottom-up saliency detection model, we circumvent this issue in the inherently gray-scale MR framework. We develop a novel pseudo-coloring scheme based on the three MRI sequences FLAIR, T2 and T1C (contrast enhanced with Gadolinium). A bottom-up strategy, based on a new pseudo-color distance and the spatial distance between image patches, is defined for highlighting the salient regions in the image. This multi-channel representation of the image and the saliency detection model help to automatically and quickly isolate the tumor region for subsequent delineation, as is necessary in medical diagnosis. The effectiveness of the proposed model is evaluated on MRI of 80 subjects from the BRATS database in terms of the saliency map values. Using ground truth of the tumor regions for both high- and low-grade gliomas, the results are compared with four highly cited saliency detection models from the literature. In all cases the AUC scores from the ROC analysis are found to be more than 0.999 ± 0.001 over different tumor grades, sizes and positions.
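
A schematic sketch of the two ingredients described above: a pseudo-color image built by stacking the co-registered FLAIR, T2 and T1C slices as channels, and a patch-level contrast score in which a patch becomes more salient the more its pseudo-color differs from other patches and the closer those patches are. The patch size and distance weighting are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def pseudo_color(flair, t2, t1c):
    """Stack three co-registered 2-D MR slices into an RGB-like pseudo-color image."""
    chans = [(m - m.min()) / (np.ptp(m) + 1e-8) for m in (flair, t2, t1c)]
    return np.dstack(chans)

def patch_contrast_saliency(img, patch=16):
    """img: HxWx3 pseudo-color image. Returns a coarse (per-patch) saliency grid."""
    h, w, _ = img.shape
    gh, gw = h // patch, w // patch
    feats, centers = [], []
    for i in range(gh):
        for j in range(gw):
            block = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            feats.append(block.reshape(-1, 3).mean(axis=0))     # mean pseudo-color of the patch
            centers.append([(i + 0.5) / gh, (j + 0.5) / gw])    # normalized patch center
    feats, centers = np.array(feats), np.array(centers)
    color_d = np.linalg.norm(feats[:, None] - feats[None], axis=-1)
    spatial_d = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    scores = (color_d / (1.0 + spatial_d)).sum(axis=1)          # contrast weighted by proximity
    return scores.reshape(gh, gw)
```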

4.
Xu J, Yang Z, Tsien JZ. PLoS ONE 2010, 5(12): e15796.
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.
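
A heavily simplified sketch of the general idea above: learn ICA components from natural-image patches, model the empirical distribution of each component's responses, and score a new patch's saliency as the negative log-probability (rarity) of its responses. The paper's context-mediated conditioning is dropped and plain histograms stand in for the estimated PDs, so this is only an outline of the approach.

```python
import numpy as np
from sklearn.decomposition import FastICA

def fit_ica_model(patches, n_components=64, bins=64):
    """patches: (N, D) array of flattened, zero-mean natural-image patches."""
    # The string whiten option requires scikit-learn >= 1.1.
    ica = FastICA(n_components=n_components, whiten="unit-variance", random_state=0)
    responses = ica.fit_transform(patches)
    # Histogram-based probability model for each component's responses.
    hists = []
    for k in range(n_components):
        counts, edges = np.histogram(responses[:, k], bins=bins, density=True)
        hists.append((counts + 1e-8, edges))
    return ica, hists

def patch_saliency(ica, hists, patch):
    """Saliency = rarity of the patch's responses under the learned distributions."""
    r = ica.transform(patch.reshape(1, -1))[0]
    logp = 0.0
    for k, (counts, edges) in enumerate(hists):
        idx = int(np.clip(np.searchsorted(edges, r[k]) - 1, 0, len(counts) - 1))
        logp += np.log(counts[idx])
    return -logp                                   # rarer responses -> higher saliency
```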

5.
In this paper we propose a computational model of bottom-up visual attention based on a pulsed principal component analysis (PCA) transform, which simply exploits the signs of the PCA coefficients to generate spatial and motion saliency. We further extend the pulsed PCA transform to a pulsed cosine transform, which is not only data-independent but also very fast to compute. The proposed model has the following biological plausibilities. First, the PCA projection vectors in the model can be obtained by using the Hebbian rule in neural networks. Second, the outputs of the pulsed PCA transform, which are inherently binary, simulate neuronal pulses in the human brain. Third, like many Fourier-transform-based approaches, our model also accomplishes cortical center-surround suppression in the frequency domain. Experimental results on psychophysical patterns and natural images show that the proposed model is more effective in saliency detection and predicts human eye fixations better than state-of-the-art attention models.
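
A compact sketch of the data-independent pulsed cosine transform mentioned above: keep only the signs of the 2-D DCT coefficients (the binary "pulses"), invert the transform, square, and smooth. The PCA-based variant and the motion channel are not shown, and the smoothing width is an illustrative choice.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def pulsed_cosine_saliency(gray, sigma=5):
    """gray: 2-D float array. Returns a normalized saliency map of the same size."""
    pulses = np.sign(dctn(gray, norm="ortho"))         # binary (+1/-1) transform coefficients
    recon = idctn(pulses, norm="ortho")
    sal = gaussian_filter(recon ** 2, sigma=sigma)     # squared reconstruction, smoothed
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)
```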

6.
Saliency detection has attracted the attention of many researchers and has become a very active area of research. Recently, many saliency detection models have been proposed and have achieved excellent performance in various fields. However, most of these models consider only low-level features. This paper proposes a novel saliency detection model that uses both color and texture features and incorporates higher-level priors. The SLIC superpixel algorithm is applied to form an over-segmentation of the image. A color saliency map and a texture saliency map are calculated based on the region-contrast method and adaptive weights. Higher-level priors, including a location prior and a color prior, are incorporated into the model to achieve better performance, and a full-resolution saliency map is obtained by up-sampling. Experimental results on three datasets demonstrate that the proposed saliency detection model outperforms state-of-the-art models.
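
A minimal sketch of the color branch of the model described above: SLIC superpixels plus a region-contrast score in Lab space, where each superpixel's saliency is its color distance to all other superpixels weighted by spatial proximity. The texture branch, the adaptive weighting, the higher-level priors and the up-sampling step are omitted, and the weighting constant is illustrative.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def region_contrast_saliency(image_rgb, n_segments=200):
    labels = slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(image_rgb)
    h, w = labels.shape
    yy, xx = np.mgrid[0:h, 0:w] / float(max(h, w))
    ids = np.unique(labels)
    colors = np.array([lab[labels == i].mean(axis=0) for i in ids])          # mean Lab color
    centers = np.array([[yy[labels == i].mean(), xx[labels == i].mean()] for i in ids])
    color_d = np.linalg.norm(colors[:, None] - colors[None], axis=-1)
    spatial_d = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    scores = (color_d * np.exp(-spatial_d / 0.25)).sum(axis=1)               # region contrast
    scores = (scores - scores.min()) / (np.ptp(scores) + 1e-8)
    lut = np.zeros(labels.max() + 1)
    lut[ids] = scores
    return lut[labels]                                # paint each superpixel with its score
```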

7.
In this paper we propose a novel saliency-based computational model of visual attention. The model processes both top-down (goal-directed) and bottom-up information. Processing in the top-down channel creates the so-called skin conspicuity map and emulates the visual search for human faces performed by humans; this is clearly a goal-directed task but is generic enough to be context-independent. Processing in the bottom-up channel follows the principles set by Itti et al., but deviates from them by computing the orientation, intensity and color conspicuity maps within a unified multi-resolution framework based on wavelet subband analysis. In particular, we apply a wavelet-based approach for efficient computation of the topographic feature maps. Given that wavelets and multiresolution theory are naturally connected, the use of wavelet decomposition to mimic the center-surround process in humans is an obvious choice. However, our implementation goes further: we utilize the wavelet decomposition for inline computation of the features (such as orientation angles) that are used to create the topographic feature maps. The bottom-up topographic feature maps and the top-down skin conspicuity map are then combined through a sigmoid function to produce the final saliency map. A prototype of the proposed model was realized on the TMDSDMK642-0E DSP platform as an embedded system allowing real-time operation. For evaluation purposes, in terms of perceived visual quality and video compression improvement, an ROI-based video compression setup was followed. Extensive experiments with both MPEG-1 and low bit-rate MPEG-4 video encoding show a significant improvement in video compression efficiency without perceived deterioration in visual quality.
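
A small sketch of the fusion step described above: a crude skin "conspicuity" map obtained from Cr/Cb thresholds is combined with any bottom-up saliency map through a sigmoid. The wavelet-based computation of the bottom-up conspicuity maps is not reproduced, and the skin thresholds and fusion weights are illustrative assumptions rather than the paper's values.

```python
import cv2
import numpy as np

def skin_conspicuity(image_bgr):
    """Very rough skin map from fixed Cr/Cb ranges (top-down face-search channel)."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr > 135) & (cr < 180) & (cb > 85) & (cb < 135)).astype(np.float32)

def fuse_maps(bottom_up, skin, w_bu=4.0, w_skin=4.0):
    """Sigmoid combination of the bottom-up and top-down channels, both in [0, 1]."""
    z = w_bu * (bottom_up - 0.5) + w_skin * (skin - 0.5)
    return 1.0 / (1.0 + np.exp(-z))
```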

8.
In recent years there has been considerable interest in visual attention models (saliency maps of visual attention). These models can be used to predict eye fixation locations and thus have many applications in various fields, leading to better performance in machine vision systems. Most of these models need improvement because they are based on bottom-up computation that does not consider the top-down semantic content of images and often does not match actual eye fixation locations. In this study, we recorded the eye movements (i.e., fixations) of fourteen individuals who viewed images of natural (e.g., landscape, animal) and man-made (e.g., building, vehicle) scenes. We extracted the fixation locations of eye movements in the two image categories. After extracting the fixation areas (a patch around each fixation location), the characteristics of these areas were evaluated and compared to non-fixation areas. The features extracted from each patch included orientation and spatial frequency. After the feature extraction phase, different statistical classifiers were trained to predict eye fixation locations from these features. This study connects eye-tracking results to automatic prediction of salient regions of images. The results showed that it is possible to predict eye fixation locations from image patches around subjects' fixation points.
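
A hedged sketch of the kind of classification setup described above: simple orientation and spatial-frequency descriptors are computed for fixated and non-fixated patches and fed to an off-the-shelf classifier. The exact features and classifiers of the study are not specified here; the descriptor and the SVM are illustrative choices, and patches are assumed to be square grayscale arrays of at least 8x8 pixels.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def patch_features(patch):
    """Orientation histogram plus a low/high spatial-frequency energy split."""
    gy, gx = np.gradient(patch.astype(float))
    ori = np.arctan2(gy, gx)
    ori_hist, _ = np.histogram(ori, bins=8, range=(-np.pi, np.pi), density=True)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    r = spectrum.shape[0] // 4
    low = spectrum[r:-r, r:-r].sum() / (spectrum.sum() + 1e-8)  # fraction of low-frequency energy
    return np.concatenate([ori_hist, [low, 1.0 - low]])

def fixation_prediction_accuracy(fix_patches, nonfix_patches):
    """Cross-validated accuracy of separating fixated from non-fixated patches."""
    X = np.array([patch_features(p) for p in list(fix_patches) + list(nonfix_patches)])
    y = np.array([1] * len(fix_patches) + [0] * len(nonfix_patches))
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```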

9.
Many computational saliency models have been proposed to simulate the bottom-up visual attention mechanism of the human visual system. However, most of them only deal with certain kinds of images or aim at specific applications, whereas human beings can correctly select attentive focuses on objects of arbitrary size within any scene. This paper proposes a new bottom-up computational model, formulated in the frequency domain, based on the biological discovery of the non-classical receptive field (nCRF) in the retina. A saliency map is obtained following the idea of the extended classical receptive field. The model consists of three major steps. First, the input image is decomposed with a Gabor wavelet into several feature maps representing different frequency bands that cover the whole frequency domain. Second, the feature maps are whitened to highlight the embedded saliency information. Third, some optimal maps are selected, simulating the response of the receptive field and especially the nCRF, to generate the saliency map. Experimental results show that the proposed algorithm works stably and performs well in a variety of situations, as human beings do, and is adaptive to both psychological patterns and natural images. Beyond that, the biological plausibility of the nCRF and the Gabor wavelet transform makes this approach reliable.
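
A simplified sketch of the first two steps described above: decompose the image with a small Gabor filter bank covering several orientations and frequency bands, then whiten each response map before combining. The nCRF-inspired selection of optimal maps is replaced here by a plain sum, so this is only an outline of the pipeline; wavelengths, orientations and kernel size are illustrative.

```python
import cv2
import numpy as np

def gabor_whitened_saliency(gray):
    gray = gray.astype(np.float32)
    maps = []
    for lam in (4, 8, 16):                              # wavelengths ~ frequency bands
        for theta in np.arange(0, np.pi, np.pi / 4):    # four orientations
            # getGaborKernel(ksize, sigma, theta, lambda, gamma)
            kernel = cv2.getGaborKernel((31, 31), lam / 2.0, theta, lam, 0.5)
            resp = cv2.filter2D(gray, cv2.CV_32F, kernel)
            resp = (resp - resp.mean()) / (resp.std() + 1e-8)   # per-map whitening
            maps.append(resp ** 2)
    sal = np.sum(maps, axis=0)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)
```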

10.
A unique vertical bar among horizontal bars is salient and pops out perceptually. Physiological data have suggested that mechanisms in the primary visual cortex (V1) contribute to the high saliency of such a unique basic feature, but indicated little regarding whether V1 plays an essential or peripheral role in input-driven or bottom-up saliency. Meanwhile, a biologically based V1 model has suggested that V1 mechanisms can also explain bottom-up saliencies beyond the pop-out of basic features, such as the low saliency of a unique conjunction feature such as a red vertical bar among red horizontal and green vertical bars, under the hypothesis that the bottom-up saliency at any location is signaled by the activity of the most active cell responding to it regardless of the cell's preferred features such as color and orientation. The model can account for phenomena such as the difficulties in conjunction feature search, asymmetries in visual search, and how background irregularities affect ease of search. In this paper, we report nontrivial predictions from the V1 saliency hypothesis, and their psychophysical tests and confirmations. The prediction that most clearly distinguishes the V1 saliency hypothesis from other models is that task-irrelevant features could interfere in visual search or segmentation tasks which rely significantly on bottom-up saliency. For instance, irrelevant colors can interfere in an orientation-based task, and the presence of horizontal and vertical bars can impair performance in a task based on oblique bars. Furthermore, properties of the intracortical interactions and neural selectivities in V1 predict specific emergent phenomena associated with visual grouping. Our findings support the idea that a bottom-up saliency map can be at a lower visual area than traditionally expected, with implications for top-down selection mechanisms.
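
The readout rule stated above ("saliency is signaled by the most active cell responding to a location, regardless of its preferred features") can be written compactly; the notation below is introduced here for illustration, not taken from the abstract.

```latex
% O_i : response of V1 cell i;  RF_i : its receptive field;  S(x) : saliency of location x
S(x) \;=\; \max_{\,i \,:\, x \in \mathrm{RF}_i} O_i
```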

11.
An important requirement for vision is to identify interesting and relevant regions of the environment for further processing. Some models assume that salient locations from a visual scene are encoded in a dedicated spatial saliency map [1, 2]. Then, a winner-take-all (WTA) mechanism [1, 2] is often believed to threshold the graded saliency representation and identify the most salient position in the visual field. Here we aimed to assess whether neural representations of graded saliency and the subsequent WTA mechanism can be dissociated. We presented images of natural scenes while subjects were in a scanner performing a demanding fixation task, and thus their attention was directed away. Signals in early visual cortex and posterior intraparietal sulcus (IPS) correlated with graded saliency as defined by a computational saliency model. Multivariate pattern classification [3, 4] revealed that the most salient position in the visual field was encoded in anterior IPS and frontal eye fields (FEF), thus reflecting a potential WTA stage. Our results thus confirm that graded saliency and WTA-thresholded saliency are encoded in distinct neural structures. This could provide the neural representation required for rapid and automatic orientation toward salient events in natural environments.

12.
It has been hypothesized that neural activities in the primary visual cortex (V1) represent a saliency map of the visual field to exogenously guide attention. This hypothesis has so far provided only qualitative predictions and their confirmations. We report this hypothesis’ first quantitative prediction, derived without free parameters, and its confirmation by human behavioral data. The hypothesis provides a direct link between V1 neural responses to a visual location and the saliency of that location to guide attention exogenously. In a visual input containing many bars, one of them saliently different from all the other bars which are identical to each other, saliency at the singleton’s location can be measured by the shortness of the reaction time in a visual search for singletons. The hypothesis predicts quantitatively the whole distribution of the reaction times to find a singleton unique in color, orientation, and motion direction from the reaction times to find other types of singletons. The prediction matches human reaction time data. A requirement for this successful prediction is a data-motivated assumption that V1 lacks neurons tuned simultaneously to color, orientation, and motion direction of visual inputs. Since evidence suggests that extrastriate cortices do have such neurons, we discuss the possibility that the extrastriate cortices play no role in guiding exogenous attention so that they can be devoted to other functions like visual decoding and endogenous attention.

13.
The neural correlates of visual awareness are elusive because of the fleeting nature of awareness. Here we have addressed this issue by using single-trial statistical “brain reading” of neurophysiological event-related potential (ERP) signatures of conscious perception of visual attributes with different levels of saliency. Behavioral reports were taken on every trial in four experiments addressing conscious access to color, luminance, and local phase-offset cues. We found that single-trial neurophysiological signatures of target presence can be observed around 300 ms at central parietal sites. Such signatures are significantly related to conscious perception, and their probability is related to sensory saliency levels. These findings identify a general neural correlate of conscious perception at the single-trial level, since conscious perception can be decoded as such independently of stimulus salience and fluctuations of threshold levels. This approach can be generalized to successfully detect target presence in other individuals.
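
A generic sketch of single-trial decoding in the spirit of the description above: mean amplitudes from centro-parietal channels in a window around 300 ms feed a linear classifier that predicts target presence trial by trial. The channel selection, time window and classifier are illustrative assumptions, not the study's exact analysis.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def decode_target_presence(epochs, labels, times, picks, t_min=0.25, t_max=0.35):
    """epochs: (n_trials, n_channels, n_times) array; labels: 1 = target present, 0 = absent;
    times: (n_times,) array in seconds; picks: indices of centro-parietal channels."""
    window = (times >= t_min) & (times <= t_max)
    X = epochs[:, picks][:, :, window].mean(axis=2)     # mean amplitude per channel in the window
    clf = LinearDiscriminantAnalysis()
    return cross_val_score(clf, X, labels, cv=5).mean() # cross-validated single-trial accuracy
```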

14.
White blood cell (WBC) detection plays a vital role in peripheral blood smear analysis. However, cell detection remains a challenging task due to multi-cell adhesion and varying staining and imaging conditions. Owing to the powerful feature extraction capability of deep learning, object detection methods based on convolutional neural networks (CNNs) have been widely applied in medical image analysis. Nevertheless, CNN training is time-consuming and can be inaccurate, especially for large-scale blood smear images, most of which are background. To address this problem, we propose a two-stage approach that treats WBC detection as a small salient object detection task. In the first, saliency detection, stage, we use Itti's visual attention model to locate the regions of interest (ROIs), based on the proposed adaptive center-surround difference (ACSD) operator. In the second, WBC detection, stage, a modified CenterNet model is applied to the ROI sub-images to obtain a more accurate localization and classification of each WBC. Experimental results show that our method exceeds the performance of several existing methods on two different data sets and achieves a state-of-the-art mAP of over 98.8%.
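
A rough sketch of the first (saliency) stage described above: a center-surround difference computed with two box filters highlights small stained blobs, and connected components above a size threshold become candidate ROIs. This is a generic center-surround operator rather than the paper's adaptive ACSD, the thresholds are illustrative, and the CenterNet stage is not shown.

```python
import cv2
import numpy as np

def candidate_wbc_rois(image_bgr, center=9, surround=51, diff_thresh=30, min_area=100):
    """Return a list of (x, y, w, h) boxes around high center-surround contrast regions."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    c = cv2.blur(gray, (center, center))                # local (center) mean
    s = cv2.blur(gray, (surround, surround))            # surround mean
    mask = (np.abs(c - s) > diff_thresh).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    # stats rows: x, y, width, height, area; component 0 is the background.
    return [tuple(stats[i, :4]) for i in range(1, n) if stats[i, 4] > min_area]
```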

15.
Visual saliency is a fundamental yet hard to define property of objects or locations in the visual world. In a context where objects and their representations compete to dominate our perception, saliency can be thought of as the "juice" that makes objects win the race. It is often assumed that saliency is extracted and represented in an explicit saliency map, which serves to determine the location of spatial attention at any given time. It is then by drawing attention to a salient object that it can be recognized or categorized. I argue against this classical view that visual "bottom-up" saliency automatically recruits the attentional system prior to object recognition. A number of visual processing tasks are clearly performed too fast for such a costly strategy to be employed. Rather, visual attention could simply act by biasing a saliency-based object recognition system. Under natural conditions of stimulation, saliency can be represented implicitly throughout the ventral visual pathway, independent of any explicit saliency map. At any given level, the most activated cells of the neural population simply represent the most salient locations. The notion of saliency itself grows increasingly complex throughout the system, mostly based on luminance contrast until information reaches visual cortex, gradually incorporating information about features such as orientation or color in primary visual cortex and early extrastriate areas, and finally the identity and behavioral relevance of objects in temporal cortex and beyond. Under these conditions the object that dominates perception, i.e. the object yielding the strongest (or the first) selective neural response, is by definition the one whose features are most "salient"--without the need for any external saliency map. In addition, I suggest that such an implicit representation of saliency can be best encoded in the relative times of the first spikes fired in a given neuronal population. In accordance with our subjective experience that saliency and attention do not modify the appearance of objects, the feed-forward propagation of this first spike wave could serve to trigger saliency-based object recognition outside the realm of awareness, while conscious perceptions could be mediated by the remaining discharges of longer neuronal spike trains.

16.
The notion of a saliency-based processing architecture [1] underlying human vision is central to a number of current theories of visual selective attention [e.g., 2]. On this view, focal-attention is guided by an overall-saliency map of the scene, which integrates (sums) signals from pre-attentive sensory feature-contrast computations (e.g., for color, motion, etc.). By linking the Posterior Contralateral Negativity (PCN) component to reaction time (RT) performance, we tested one specific prediction of such salience summation models: expedited shifts of focal-attention to targets with low, as compared to high, target-distracter similarity. For two feature-dimensions (color and orientation), we observed decreasing RTs with increasing target saliency. Importantly, this pattern was systematically mirrored by the timing, as well as amplitude, of the PCN. This pattern demonstrates that visual saliency is a key determinant of the time it takes for focal-attention to be engaged onto the target item, even when it is just a feature singleton.

17.
Coral reefs are rich in fisheries and aquatic resources, and the study and monitoring of coral reef ecosystems are of great economic value and practical significance. Due to complex backgrounds and low-quality videos, it is challenging to identify coral reef fish. This study proposed an image enhancement approach for fish detection in complex underwater environments. The method first uses a Siamese network to obtain a saliency map and then multiplies this saliency map by the input image to construct an image enhancement module. Applying this module to the existing mainstream one-stage and two-stage target detection frameworks can significantly improve their detection accuracy. Good detection performance was achieved in a variety of scenarios, such as those with luminosity variations, aquatic plant movements, blurred images, large targets and multiple targets, demonstrating the robustness of the algorithm. The best performance was achieved on the LCF-15 dataset when combining the proposed method with the cascade region-based convolutional neural network (Cascade-RCNN). The average precision at an intersection-over-union (IoU) threshold of 0.5 (AP50) was 0.843, and the F1 score was 0.817, exceeding the best reported results on this dataset. This study provides an automated video analysis tool for marine-related researchers and technical support for downstream applications.
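
A bare-bones sketch of the enhancement module described above: the input image is modulated by a saliency map before being passed to any off-the-shelf detector. How the Siamese network produces the saliency map is not shown, and `alpha`, which controls how strongly non-salient regions are suppressed, is an illustrative parameter.

```python
import numpy as np

def saliency_enhance(image, saliency, alpha=0.7):
    """image: HxWx3 float in [0, 1]; saliency: HxW float in [0, 1]."""
    weight = (1.0 - alpha) + alpha * saliency[..., None]   # keep some of the original everywhere
    return np.clip(image * weight, 0.0, 1.0)
```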

18.
Saliency maps produced by different algorithms are often evaluated by comparing output to fixated image locations appearing in human eye tracking data. There are challenges in evaluation based on fixation data due to bias in the data. Properties of eye movement patterns that are independent of image content may limit the validity of evaluation results, including spatial bias in fixation data. To address this problem, we present modeling and evaluation results for data derived from different perceptual tasks related to the concept of saliency. We also present a novel approach to benchmarking to deal with some of the challenges posed by spatial bias. The results presented establish the value of alternatives to fixation data to drive improvement and development of models. We also demonstrate an approach to approximate the output of alternative perceptual tasks based on computational saliency and/or eye gaze data. As a whole, this work presents novel benchmarking results and methods, establishes a new performance baseline for perceptual tasks that provide an alternative window into visual saliency, and demonstrates the capacity for saliency to serve in approximating human behaviour for one visual task given data from another.
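
A hedged sketch of fixation-based evaluation as discussed above: the ordinary AUC treats saliency values at fixated pixels as positives and values at sampled non-fixated pixels as negatives, while a "shuffled" variant draws the negatives from fixations recorded on other images, which discounts spatial (e.g. center) bias. This is a common benchmarking recipe, not necessarily the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fixation_auc(saliency, fixations, negatives):
    """saliency: HxW map; fixations / negatives: iterables of (row, col) pixel coordinates."""
    pos = [saliency[r, c] for r, c in fixations]
    neg = [saliency[r, c] for r, c in negatives]
    y = np.array([1] * len(pos) + [0] * len(neg))
    return roc_auc_score(y, np.array(pos + neg))

# Ordinary AUC:  negatives are sampled uniformly from non-fixated pixels of the same image.
# Shuffled AUC:  negatives are fixation locations taken from other images in the data set.
```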

19.

Background

Accurate evaluation of immunostained histological images is required for reproducible research in many different areas and forms the basis of many clinical decisions. The quality and efficiency of histopathological evaluation is limited by the information content of a histological image, which is primarily encoded as perceivable contrast differences between objects in the image. However, the colors of chromogen and counterstain used for histological samples are not always optimally distinguishable, even under optimal conditions.

Methods and Results

In this study, we present a method to extract the bivariate color map inherent in a given histological image and to retrospectively optimize this color map. We use a novel, unsupervised approach based on color deconvolution and principal component analysis to show that the commonly used blue and brown color hues in Hematoxylin-3,3’-Diaminobenzidine (DAB) images are poorly suited for human observers. We then demonstrate that it is possible to construct improved color maps according to objective criteria and that these color maps can be used to digitally re-stain histological images.

Validation

To validate whether this procedure improves distinguishability of objects and background in histological images, we re-stain phantom images and N = 596 large histological images of immunostained samples of human solid tumors. We show that perceptual contrast is improved by a factor of 2.56 in phantom images and up to a factor of 2.17 in sets of histological tumor images.

Context

Thus, we provide an objective and reliable approach to measure object distinguishability in a given histological image and to maximize visual information available to a human observer. This method could easily be incorporated in digital pathology image viewing systems to improve accuracy and efficiency in research and diagnostics.
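
A simplified sketch of the unmixing starting point described above: a standard Hematoxylin-DAB color deconvolution (as implemented in scikit-image) recovers per-pixel stain intensities, and a PCA over the two stain channels summarizes how the image's information is distributed across them. The paper's objective contrast criterion, color-map optimization and digital re-staining are not reproduced here.

```python
import numpy as np
from skimage.color import rgb2hed
from sklearn.decomposition import PCA

def stain_pca(image_rgb):
    """image_rgb: HxWx3 image of an H-DAB stained section (uint8 or float)."""
    hed = rgb2hed(image_rgb)                           # channels: Hematoxylin, Eosin, DAB
    stains = np.stack([hed[..., 0].ravel(), hed[..., 2].ravel()], axis=1)
    pca = PCA(n_components=2).fit(stains)
    # Principal axes of the bivariate (Hematoxylin, DAB) map and their variance shares.
    return pca.components_, pca.explained_variance_ratio_
```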

20.
In this paper we propose a fast frequency-domain saliency detection method that is also biologically plausible, referred to as frequency domain divisive normalization (FDN). We show that the initial feature extraction stage, common to all spatial-domain approaches, can be simplified to a Fourier transform with a contourlet-like grouping of coefficients, so that saliency detection can be achieved in the frequency domain. Specifically, we show that divisive normalization, a model of cortical surround inhibition, can be conducted in the frequency domain. Since Fourier coefficients are global in space, we extend this model by conducting piecewise FDN (PFDN) on overlapping local patches to provide better biological plausibility. Not only do FDN and PFDN outperform current state-of-the-art methods in eye fixation prediction, they are also faster. Speed and simplicity are advantages of our frequency-domain approach, and its biological plausibility is the main contribution of our paper.
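
A coarse sketch of divisive normalization carried out in the frequency domain, in the spirit of the description above: each Fourier coefficient's magnitude is divided by the average magnitude in its spectral neighborhood, the phase is kept, and the normalized spectrum is transformed back and smoothed. The contourlet-like grouping of coefficients and the piecewise (PFDN) variant are not reproduced, and the neighborhood size and constants are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def fdn_like_saliency(gray, neighborhood=9, eps=1e-3, sigma=4):
    """gray: 2-D float array. Returns a normalized saliency map of the same size."""
    F = np.fft.fft2(gray)
    mag, phase = np.abs(F), np.angle(F)
    surround = uniform_filter(mag, size=neighborhood)   # pooled magnitude of nearby coefficients
    norm_mag = mag / (surround + eps)                   # divisive normalization in frequency domain
    recon = np.real(np.fft.ifft2(norm_mag * np.exp(1j * phase)))
    sal = gaussian_filter(recon ** 2, sigma=sigma)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)
```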
