Similar articles
1.
Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. In particular, image analysis is a key step in microarray analysis, and its accuracy strongly depends on segmentation. Pioneering work on clustering-based segmentation has shown that the k-means clustering algorithm and the moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they often give unsatisfactory results because real microarray images contain noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combined clustering-based segmentation approach that may be more reliable and able to segment spots automatically. First, the new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, within each spot region, moving k-means clustering is first conducted to separate the spot from the background, and then the k-means clustering algorithms are combined for those spots for which the entire boundary was not obtained. Finally, a refinement step is used to replace the false segmentations and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and four other segmentation algorithms (edge detection, thresholding, k-means clustering and moving k-means clustering) are carried out on cDNA microarray images from six different data sets.
Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi’s individual TIFF files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is more robust and more sensitive to weak spots. More importantly, it can obtain higher segmentation accuracy in the presence of noise, artifacts and weakly expressed spots compared with the other four methods.
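
The "maximum between-class variance" criterion mentioned in this abstract is commonly known as Otsu's method. The following is only a minimal sketch of that criterion on a flat list of integer intensities, not the authors' gridding code; the function name `otsu_threshold` and the sample values are hypothetical.

```python
from collections import Counter

def otsu_threshold(values):
    """Return the level t that maximizes between-class variance.

    Values <= t form the low class, values > t the high class.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    levels = sorted(counts)
    total = len(values)
    total_sum = sum(v * c for v, c in counts.items())
    best_t, best_var = levels[0], -1.0
    w0 = 0      # number of samples in the low class so far
    sum0 = 0.0  # intensity sum of the low class so far
    # The highest level is skipped: it would leave the high class empty.
    for t in levels[:-1]:
        w0 += counts[t]
        sum0 += t * counts[t]
        w1 = total - w0
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Hypothetical intensities: four dark pixels and three bright ones.
print(otsu_threshold([10, 12, 11, 13, 200, 210, 205]))  # 13
```

In a gridding context the same criterion would presumably be applied to row or column intensity profiles rather than to raw pixels, but the abstract does not give that detail.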

2.
Little consideration has been given to the effect of different segmentation methods on the variability of data derived from microarray images. Previous work has suggested that a significant source of variability in microarray image analysis is the estimation of local background. In this study, we used Analysis of Variance (ANOVA) models to investigate the effect of methods of segmentation on the precision of measurements obtained from replicate microarray experiments. We used four different methods of spot segmentation (adaptive, fixed circle, histogram and GenePix) to analyse a total of 156,172 spots from 12 microarray experiments. Using a two-way ANOVA model and the coefficient of repeatability, we show that the method of segmentation significantly affects the precision of the microarray data. The histogram method gave the lowest variability across replicate spots compared to other methods, and had the lowest pixel-to-pixel variability within spots. This effect on precision was independent of background subtraction. We show that these findings have direct, practical implications as the variability in precision between the four methods resulted in different numbers of genes being identified as differentially expressed. Segmentation method is an important source of variability in microarray data that directly affects precision and the identification of differentially expressed genes.

3.
Spot Detection and Image Segmentation in DNA Microarray Data   Cited by: 3 (self-citations: 0, citations by others: 3)
Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
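
To illustrate the one-dimensional k-means view described in this abstract, here is a minimal sketch that clusters pixel intensities into background and foreground with squared Euclidean distance. It is an illustration under assumptions, not code from the review: the function name `kmeans_1d`, the evenly spaced initial centroids and the sample intensities are all hypothetical choices.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Standard (Lloyd) k-means on scalar values with Euclidean distance."""
    lo, hi = min(values), max(values)
    # Assumption: initial centroids are spread evenly over the intensity range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign(cs):
        # Label each value with the index of its nearest centroid.
        return [min(range(k), key=lambda j: (v - cs[j]) ** 2) for v in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)

# Hypothetical spot region: low values are background, high values are spot.
centroids, labels = kmeans_1d([10, 12, 11, 200, 210, 205, 13])
print(centroids)  # [11.5, 205.0]
print(labels)     # [0, 0, 0, 1, 1, 1, 0]
```

With k = 2 the cluster with the higher centroid can be read as the spot foreground and the other as background.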

4.
A new integrated image analysis package with quantitative quality control schemes is described for cDNA microarray technology. The package employs an iterative algorithm that utilizes both intensity characteristics and spatial information of the spots on a microarray image for signal–background segmentation and defines five quality scores for each spot to record irregularities in spot intensity, size and background noise levels. A composite score q_com is defined based on these individual scores to give an overall assessment of spot quality. Using q_com we demonstrate that the inherent variability in intensity ratio measurements is closely correlated with spot quality, namely spots with higher quality give less variable measurements and vice versa. In addition, gauging data by q_com can improve data reliability dramatically and efficiently. We further show that the variability in ratio measurements drops exponentially with increasing q_com and, for the majority of spots at the high quality end, this improvement is mainly due to an improvement in correlation between the two dyes. Based on these studies, we discuss the potential of quantitative quality control for microarray data and the possibility of filtering and normalizing microarray data using a quality metrics-dependent scheme.

5.
MOTIVATION: Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods do not pay them enough attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. The first step applies model-based clustering to the distribution of pixel intensities, using the Bayesian Information Criterion (BIC) to choose the number of groups up to a maximum of three. The second step is spatial, finding the large spatially connected components in each cluster of pixels. The method thus combines the strengths of the histogram-based and spatial approaches. It deals effectively with inner holes in spots and with artifacts. It also provides a formal inferential basis for deciding when the spot is blank, namely when the BIC favors one group over two or three. RESULTS: We apply our methods for gridding and segmentation to cDNA microarray images from an HIV infection experiment. In these experiments, our method had better stability across replicates than a fixed-circle segmentation method or the seeded region growing method in the SPOT software, without introducing noticeable bias when estimating the intensities of differentially expressed genes. AVAILABILITY: spotSegmentation, an R language package implementing both the gridding and segmentation methods, is available through the Bioconductor project (http://www.bioconductor.org). The segmentation method requires the contributed R package MCLUST for model-based clustering (http://cran.us.r-project.org). CONTACT: fraley@stat.washington.edu.
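
The spatial second step described above (finding large connected components within a cluster of pixels) can be sketched as a breadth-first search over a binary mask. This is a hedged illustration, not the spotSegmentation implementation: the function name `largest_component`, the use of 4-connectivity and the toy mask are assumptions.

```python
from collections import deque

def largest_component(mask):
    """Return the set of (row, col) pixels in the largest 4-connected region of truthy cells."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one component starting from (r, c).
            component = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best

# Hypothetical cluster mask: a 2x2 spot plus two isolated artifact pixels.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
print(sorted(largest_component(mask)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Keeping only the largest component discards small isolated pixels that share the spot's intensity cluster, which is one plausible way such a step could suppress artifacts.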

6.
7.

Background

Processing cDNA microarray images is a crucial step in gene expression analysis, since any errors in early stages affect subsequent steps, leading to possibly erroneous biological conclusions. When processing the underlying images, accurately separating the sub-grids and spots is extremely important for subsequent steps that include segmentation, quantification, normalization and clustering.

Results

We propose a parameterless and fully automatic approach that first detects the sub-grids given the entire microarray image, and then detects the locations of the spots in each sub-grid. The approach first detects and corrects rotations in the images by applying an affine transformation, and then uses a polynomial-time optimal multi-level thresholding algorithm to find the positions of the sub-grids in the image and the positions of the spots in each sub-grid. Additionally, a new validity index is proposed in order to find the correct number of sub-grids in the image, and the correct number of spots in each sub-grid. Moreover, a refinement procedure is used to correct possible misalignments and increase the accuracy of the method.

Conclusions

Extensive experiments on real-life microarray images and a comparison to other methods show that the proposed method performs these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approach can be used on various types of microarray images with different resolutions and spot sizes and does not need any parameter to be adjusted.

8.
Image and statistical analysis are two important stages of cDNA microarray experiments. Of these, gridding is necessary to accurately identify the location of each spot while extracting spot intensities from the microarray images, and automating this procedure permits high-throughput analysis. Due to the deficiencies of the equipment used to print the arrays, rotations, misalignments, high contamination with noise and artifacts, and the enormous amount of data generated, solving the gridding problem by means of an automatic system is not trivial. Existing techniques to solve the automatic grid segmentation problem cover only limited aspects of this challenging problem and require the user to specify the size of the spots, the number of rows and columns in the grid, and boundary conditions. In this paper, a hill-climbing automatic gridding and spot quantification technique is proposed which takes a microarray image (or a subgrid) as input and makes no assumptions about the size of the spots, rows, and columns in the grid. The proposed method is based on a hill-climbing approach that utilizes different objective functions. The method has been found to effectively detect the grids on microarray images drawn from databases from GEO and the Stanford genomic laboratories.
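
The abstract does not state which objective functions the hill-climbing method uses. As a generic illustration only, the sketch below runs greedy hill climbing over a one-dimensional intensity profile (for example, hypothetical column sums of an image), moving to the brighter neighbour until a local maximum is reached. The function name `hill_climb` and the profile values are assumptions.

```python
def hill_climb(profile, start):
    """Climb from index `start` to a local maximum of `profile`."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(profile)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: profile[j])
        # Stop when no neighbour is strictly higher than the current position.
        if profile[best] <= profile[i]:
            return i
        i = best

# Hypothetical column-sum profile with two peaks (indices 3 and 7).
profile = [0, 2, 5, 9, 4, 1, 3, 8, 6]
print(hill_climb(profile, 0))  # 3
print(hill_climb(profile, 6))  # 7
```

Different start positions converge to different peaks, which is why such a search is typically launched from several starting points when locating every row or column of spots.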

9.
Quantifying interactions in DNA microarrays is of central importance for a better understanding of their functioning. Hybridization thermodynamics for nucleic acid strands in aqueous solution can be described by the so-called nearest neighbor model, which estimates the hybridization free energy of a given sequence as a sum of dinucleotide terms. Compared with its solution counterpart, hybridization in DNA microarrays may be hindered due to the presence of a solid surface and of a high density of DNA strands. We present here a study aimed at the determination of hybridization free energies in DNA microarrays. Experiments are performed on custom Agilent slides. The solution contains a single oligonucleotide. The microarray contains spots with a perfect matching (PM) complementary sequence and other spots with one or two mismatches (MM): in total 1006 different probe spots, each replicated 15 times per microarray. The free energy parameters are directly fitted from microarray data. The experiments demonstrate a clear correlation between hybridization free energies in the microarray and in solution. The experiments are fully consistent with the Langmuir model at low intensities, but show a clear deviation at intermediate (non-saturating) intensities. These results provide interesting new insights for the quantification of molecular interactions in DNA microarrays.
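
The two models named in this abstract can be sketched in a few lines: a nearest neighbor sum over dinucleotide terms, and a Langmuir isotherm mapping free energy to intensity. This is a simplified illustration under assumptions, not the authors' fitting procedure. It omits initiation and end corrections, the parameter table holds made-up placeholder values, and the function names, concentration and temperature are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def nearest_neighbor_dg(seq, dinucleotide_dg):
    """Sum the free-energy term of every overlapping dinucleotide in seq."""
    return sum(dinucleotide_dg[seq[i:i + 2]] for i in range(len(seq) - 1))

def langmuir_intensity(conc, delta_g, temp_k, saturation=1.0):
    """Langmuir isotherm: I = A * cK / (1 + cK), with K = exp(-dG / (R T))."""
    x = conc * math.exp(-delta_g / (R_KCAL * temp_k))
    return saturation * x / (1.0 + x)

# Placeholder dinucleotide parameters in kcal/mol; NOT measured values.
toy_params = {"AC": -1.0, "CG": -2.0, "GT": -1.0}
dg = nearest_neighbor_dg("ACGT", toy_params)
print(dg)  # -4.0
# Hypothetical target concentration (1 nM) and temperature (338.15 K).
print(langmuir_intensity(1e-9, dg, 338.15))
```

When cK is much smaller than 1 the isotherm reduces to I ≈ A·c·exp(-ΔG/RT), so the logarithm of intensity is linear in ΔG; this is the low-intensity regime in which the abstract reports agreement with the Langmuir model.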

10.
Automatic analysis of DNA microarray images using mathematical morphology   Cited by: 10 (self-citations: 0, citations by others: 10)
MOTIVATION: DNA microarrays are an experimental technology consisting of arrays of thousands of discrete DNA sequences that are printed on glass microscope slides. Image analysis is an important aspect of microarray experiments. The aim of this step is to reduce an image of spots into a table with a measure of the intensity for each spot. Efficient, accurate and automatic analysis of DNA spot images is essential in order to use this technology in laboratory routines. RESULTS: We present an automatic non-supervised set of algorithms for fast and accurate spot data extraction from DNA microarrays using morphological operators which are robust to both intensity variation and artefacts. The approach can be summarised as follows. Initially, a gridding algorithm yields the automatic segmentation of the microarray image into spot quadrants which are later individually analysed. Then the analysis of the spot quadrant images is achieved in five steps. First, as a pre-quantification step, the spot size distribution law is calculated. Second, the background noise extraction is performed using a morphological filtering by area. Third, an orthogonal grid provides the first approach to the spot locus. Fourth, the spot segmentation or spot boundaries definition is carried out using the watershed transformation. And fifth, the outline of detected spots allows the signal quantification or spot intensities extraction; in this respect, a noise model has been investigated. The performance of the algorithm has been compared with two packages, ScanAlyze and Genepix, showing its robustness and precision.

11.
MOTIVATION: We present a new approach to the analysis of images for complementary DNA microarray experiments. The image segmentation and intensity estimation are performed simultaneously by adopting a two-component mixture model. One component of this mixture corresponds to the distribution of the background intensity, while the other corresponds to the distribution of the foreground intensity. The intensity measurement is a bivariate vector consisting of red and green intensities. The background intensity component is modeled by the bivariate gamma distribution, whose marginal densities for the red and green intensities are independent three-parameter gamma distributions with different parameters. The foreground intensity component is taken to be the bivariate t distribution, with the constraint that the mean of the foreground is greater than that of the background for each of the two colors. The degrees of freedom of this t distribution are inferred from the data but they could be specified in advance to reduce the computation time. Also, the covariance matrix is not restricted to being diagonal and so it allows for nonzero correlation between R and G foreground intensities. This gamma-t mixture model is fitted by maximum likelihood via the EM algorithm. A final step is executed whereby nonparametric (kernel) smoothing is undertaken of the posterior probabilities of component membership. 
The main advantages of this approach are: (1) it enjoys the well-known strengths of a mixture model, namely flexibility and adaptability to the data; (2) it considers the segmentation and intensity simultaneously and not separately as in commonly used existing software, and it also works with the red and green intensities in a bivariate framework as opposed to their separate estimation via univariate methods; (3) the use of the three-parameter gamma distribution for the background red and green intensities provides a much better fit than the normal (log normal) or t distributions; (4) the use of the bivariate t distribution for the foreground intensity provides a model that is less sensitive to extreme observations; (5) as a consequence of the aforementioned properties, it allows segmentation to be undertaken for a wide range of spot shapes, including doughnut, sickle shape and artifacts. RESULTS: We apply our method for gridding, segmentation and estimation to real cDNA microarray images and artificial data. Our method provides better spot shape segmentation and intensity estimation than the Spot and spotSegmentation R language software. It detected blank spots as well as bright artifacts for the real data, and estimated spot intensities with high accuracy for the synthetic data. AVAILABILITY: The algorithms were implemented in Matlab. The Matlab codes implementing both the gridding and segmentation/estimation are available upon request. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.

12.
We developed an R function named "microarray outlier filter" (MOF) to assist in the identification of failed arrays. In sorting a group of similar arrays by the likelihood of failure, two statistical indices were employed: the correlation coefficient and the percentage of outlier spots. MOF can be used to monitor the quality of microarray data both for troubleshooting and to eliminate bad datasets from downstream analysis. The function is freely available at http://www.wriwindber.org/applications/mof/.

13.
Pok G, Liu JC, Ryu KH. Bioinformation, 2010, 4(8): 385-389
The microarray technique has become a standard means of simultaneously examining the expression of all genes measured in different circumstances. As microarray data are typically characterized by high dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and accountable for the sample variation. In this article, we present a simple, yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.

14.
We describe a probabilistic approach to simultaneous image segmentation and intensity estimation for complementary DNA microarray experiments. The approach overcomes several limitations of existing methods. In particular, it (a) uses a flexible Markov random field approach to segmentation that allows for a wider range of spot shapes than existing methods, including relatively common 'doughnut-shaped' spots; (b) models the image directly as background plus hybridization intensity, and estimates the two quantities simultaneously, avoiding the common logical error that estimates of foreground may be less than those of the corresponding background if the two are estimated separately; and (c) uses a probabilistic modeling approach to simultaneously perform segmentation and intensity estimation, and to compute spot quality measures. We describe two approaches to parameter estimation: a fast algorithm, based on the expectation-maximization and the iterated conditional modes algorithms, and a fully Bayesian framework. These approaches produce comparable results, and both appear to offer some advantages over other methods. We use an HIV experiment to compare our approach to two commercial software products: Spot and Arrayvision.
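
For readers unfamiliar with iterated conditional modes (ICM), the sketch below shows the general idea on a binary Markov random field: each pixel is relabelled to minimize a Gaussian data term plus a penalty for disagreeing with its four neighbours. It is a textbook-style simplification, not the model of this paper. The class means, `sigma`, `beta`, the function name `icm_segment` and the toy image are all assumptions, and the means are taken as known rather than estimated by expectation-maximization.

```python
def icm_segment(image, mu_bg, mu_fg, sigma, beta=1.0, sweeps=10):
    """Binary MRF segmentation by ICM; returns labels (0 = background, 1 = foreground)."""
    rows, cols = len(image), len(image[0])
    means = (mu_bg, mu_fg)
    # Initialise each pixel with the class whose mean is nearest.
    labels = [[int(abs(p - mu_fg) < abs(p - mu_bg)) for p in row] for row in image]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nbrs = [labels[y][x]
                        for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= y < rows and 0 <= x < cols]
                energies = []
                for k in (0, 1):
                    data = (image[r][c] - means[k]) ** 2 / (2 * sigma ** 2)
                    smooth = beta * sum(1 for n in nbrs if n != k)
                    energies.append(data + smooth)
                new = 0 if energies[0] <= energies[1] else 1
                if new != labels[r][c]:
                    labels[r][c] = new
                    changed = True
        if not changed:
            break
    return labels

# Hypothetical 7x7 image: background 10, a 5x5 spot of 100 with a dim centre pixel (50).
image = [[100 if 1 <= r <= 5 and 1 <= c <= 5 else 10 for c in range(7)] for r in range(7)]
image[3][3] = 50
labels = icm_segment(image, mu_bg=10, mu_fg=100, sigma=30)
print(labels[3][3])  # 1
```

The dim centre pixel is closer to the background mean and would be labelled background by intensity alone; the neighbourhood term pulls it back into the spot, which is the kind of spatial regularization an MRF prior provides.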

15.
16.

Background  

Complementary DNA (cDNA) microarrays are a well established technology for studying gene expression. A microarray image is obtained by laser scanning a hybridized cDNA microarray, which consists of thousands of spots representing chains of cDNA sequences, arranged in a two-dimensional array. The separation of the spots into distinct cells is widely known as microarray image gridding.

17.
MOTIVATION: Although numerous algorithms have been developed for microarray segmentation, extensive comparisons between the algorithms have received far less attention. In this study, we evaluate the performance of nine microarray segmentation algorithms. Using both simulated and real microarray experiments, we overcome the challenges in performance evaluation, arising from the lack of ground-truth information. The usage of simulated experiments allows us to analyze the segmentation accuracy on a single pixel level as is commonly done in traditional image processing studies. With real experiments, we indirectly measure the segmentation performance, identify significant differences between the algorithms, and study the characteristics of the resulting gene expression data. RESULTS: Overall, our results show clear differences between the algorithms. The results demonstrate how the segmentation performance depends on the image quality, which algorithms operate on significantly different performance levels, and how the selection of a segmentation algorithm affects the identification of differentially expressed genes. AVAILABILITY: Supplementary results and the microarray images used in this study are available at the companion web site http://www.cs.tut.fi/sgn/csb/spotseg/

18.
Limited amounts of RNA are a major issue in cDNA microarray analysis, especially when one is dealing with fresh tissue samples. Here we describe a protocol based on template switch and T7 amplification that led to efficient and linear amplification of 1300x. Using a glass array containing 368 genes printed in three or six replicas covering a wide range of expression levels and ratios, we determined quality and reproducibility of the data obtained from one nonamplified and two independently amplified RNAs (aRNA) derived from normal and tumor samples using replicas with dye exchange (dye-swap measurements). Overall, signal-to-noise ratio improved when we used aRNA (1.45-fold for channel 1 and 2.02-fold for channel 2), increasing by 6% the number of spots with meaningful data. Measurements arising from independent aRNA samples showed strong correlation among themselves (r^2 = 0.962) and with those from the nonamplified sample (r^2 = 0.975), indicating the reproducibility and fidelity of the amplification procedure. Measurement differences, i.e., spots with poor correlation between amplified and nonamplified measurements, did not show association with gene sequence, expression intensity, or expression ratio and can, therefore, be compensated with replication. In conclusion, aRNA can be used routinely in cDNA microarray analysis, leading to improved quality of data with high fidelity and reproducibility.

19.
Segmentation of cDNA microarray spots using Markov random field modeling   Cited by: 3 (self-citations: 3, citations by others: 0)
Motivation: Spot segmentation is a critical step in microarray gene expression data analysis. Therefore, the performance of segmentation may substantially affect the results of subsequent stages of the analysis, such as the detection of differentially expressed genes. Several methods have been developed to segment microarray spots from the surrounding background. In this study, we have proposed a new approach based on Markov random field (MRF) modeling and tested its performance on simulated and real microarray images against a widely used segmentation method based on the Mann–Whitney test adopted by QuantArray software (Boston, MA). Spot addressing was performed using QuantArray. We have also devised a simulation method to generate microarray images with realistic features. Such images can be used as gold standards for the purposes of testing and comparing different segmentation methods, and optimizing segmentation parameters. Results: Experiments on simulated and 14 actual microarray image sets show that the proposed MRF-based segmentation method can detect spot areas and estimate spot intensities with higher accuracy. Availability: The algorithms were implemented in the Matlab™ (The MathWorks, Inc., Natick, MA) environment. The codes for MRF-based segmentation and image simulation methods are available upon request. Contact: demirkaya{at}ieee.org

20.

Background

The analysis of gene expression data shows that many genes display similarity in their expression profiles suggesting some co-regulation. Here, we investigated the co-expression patterns in gene expression data and proposed a correlation-based research method to stratify individuals.

Methodology/Principal Findings

Using blood from rheumatoid arthritis (RA) patients, we investigated the gene expression profiles from whole blood using Affymetrix microarray technology. Co-expressed genes were analyzed by a biclustering method, followed by gene ontology analysis of the relevant biclusters. Taking the type I interferon (IFN) pathway as an example, a classification algorithm was developed from the 102 RA patients and extended to 10 systemic lupus erythematosus (SLE) patients and 100 healthy volunteers to further characterize individuals. We developed a correlation-based algorithm referred to as Classification Algorithm Based on a Biological Signature (CABS), an alternative to other approaches focused specifically on the expression levels. This algorithm applied to the expression of 35 IFN-related genes showed that the IFN signature presented a heterogeneous expression between RA, SLE and healthy controls which could reflect the level of global IFN signature activation. Moreover, the monitoring of the IFN-related genes during the anti-TNF treatment identified changes in type I IFN gene activity induced in RA patients.

Conclusions

In conclusion, we have proposed an original method to analyze genes sharing an expression pattern and a biological function showing that the activation levels of a biological signature could be characterized by its overall state of correlation.
