Similar articles
20 similar articles found (search time: 31 ms)
1.
Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Image analysis in particular is a key step in microarray analysis, and its accuracy depends strongly on segmentation. Pioneering work on clustering-based segmentation has shown that the k-means clustering algorithm and the moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they often give unsatisfactory results because real microarray images contain noise, artifacts and spots that vary in size, shape and contrast. To improve segmentation accuracy, in this article we present a combined clustering-based segmentation approach that may be more reliable and is able to segment spots automatically. First, the new method applies a very simple but effective contrast enhancement operation to improve the image quality. Then, automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, within each spot region, moving k-means clustering is first conducted to separate the spot from the background, and k-means clustering is then combined with it for those spots that fail to yield an entire boundary. Finally, a refinement step is used to replace the false segmentations and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and four other segmentation algorithms (edge detection, thresholding, k-means clustering and moving k-means clustering) are carried out on cDNA microarray images from six different data sets. 
Experiments on six data sets, namely (1) the Stanford Microarray Database (SMD), (2) Gene Expression Omnibus (GEO), (3) Baylor College of Medicine (BCM), (4) the Swiss Institute of Bioinformatics (SIB), (5) Joe DeRisi’s individual TIFF files (DeRisi), and (6) the University of California, San Francisco (UCSF), indicate that the improved approach is more robust and more sensitive to weak spots. More importantly, it achieves higher segmentation accuracy than the other four methods in the presence of noise, artifacts and weakly expressed spots.
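The clustering step described in this abstract can be illustrated with a minimal sketch, assuming a spot region has already been cut out by gridding and flattened to a list of pixel intensities. It runs plain two-cluster k-means only, not the moving k-means variant or the paper's combination scheme; the function name `kmeans_two_clusters` and the min/max initialization are illustrative assumptions.

```python
def kmeans_two_clusters(pixels, max_iter=100):
    """Label each intensity 0 (background) or 1 (foreground) with 2-means."""
    # Assumption: start the centroids at the darkest and brightest pixel.
    lo, hi = float(min(pixels)), float(max(pixels))
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: each pixel joins the nearer centroid (ties go to background).
        labels = [1 if abs(p - hi) < abs(p - lo) else 0 for p in pixels]
        bg = [p for p, lab in zip(pixels, labels) if lab == 0]
        fg = [p for p, lab in zip(pixels, labels) if lab == 1]
        # Update step: move each centroid to the mean of its members.
        new_lo = sum(bg) / len(bg) if bg else lo
        new_hi = sum(fg) / len(fg) if fg else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)


# Hypothetical spot region: four dark background pixels, four bright spot pixels.
region = [10, 12, 11, 200, 210, 205, 9, 198]
labels, centroids = kmeans_two_clusters(region)
print(labels, centroids)
```

On real images, noise and weak spots blur the gap between the two clusters, which is the situation the abstract's additional refinement steps are meant to address.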

2.
MOTIVATION: We present a new approach to the analysis of images for complementary DNA microarray experiments. The image segmentation and intensity estimation are performed simultaneously by adopting a two-component mixture model. One component of this mixture corresponds to the distribution of the background intensity, while the other corresponds to the distribution of the foreground intensity. The intensity measurement is a bivariate vector consisting of red and green intensities. The background intensity component is modeled by the bivariate gamma distribution, whose marginal densities for the red and green intensities are independent three-parameter gamma distributions with different parameters. The foreground intensity component is taken to be the bivariate t distribution, with the constraint that the mean of the foreground is greater than that of the background for each of the two colors. The degrees of freedom of this t distribution are inferred from the data but they could be specified in advance to reduce the computation time. Also, the covariance matrix is not restricted to being diagonal and so it allows for nonzero correlation between R and G foreground intensities. This gamma-t mixture model is fitted by maximum likelihood via the EM algorithm. A final step is executed whereby nonparametric (kernel) smoothing is undertaken of the posterior probabilities of component membership. 
The main advantages of this approach are: (1) it enjoys the well-known strengths of a mixture model, namely flexibility and adaptability to the data; (2) it considers the segmentation and intensity simultaneously and not separately as in commonly used existing software, and it also works with the red and green intensities in a bivariate framework as opposed to their separate estimation via univariate methods; (3) the use of the three-parameter gamma distribution for the background red and green intensities provides a much better fit than the normal (log normal) or t distributions; (4) the use of the bivariate t distribution for the foreground intensity provides a model that is less sensitive to extreme observations; (5) as a consequence of the aforementioned properties, it allows segmentation to be undertaken for a wide range of spot shapes, including doughnut, sickle shape and artifacts. RESULTS: We apply our method for gridding, segmentation and estimation to real cDNA microarray images and artificial data. Our method provides better segmentation results in spot shapes as well as intensity estimation than the Spot and spotSegmentation R software packages. It detected blank spots as well as bright artifacts in the real data, and estimated spot intensities with high accuracy for the synthetic data. AVAILABILITY: The algorithms were implemented in Matlab. The Matlab codes implementing both the gridding and segmentation/estimation are available upon request. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
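The idea of fitting a two-component mixture by EM and reading segmentation off the posterior probabilities can be sketched as follows. This is a deliberately simplified stand-in, not the authors' model: it swaps in two univariate Gaussians for the bivariate gamma and t components, works on a single channel, and omits the kernel smoothing step. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, n_iter=50):
    """Return, per pixel, the posterior probability of the brighter component."""
    mean = sum(xs) / len(xs)
    var0 = sum((x - mean) ** 2 for x in xs) / len(xs) or 1.0
    mu = [float(min(xs)), float(max(xs))]  # component 0 = background, 1 = foreground
    var = [var0, var0]
    w = [0.5, 0.5]
    post = [0.5] * len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that each pixel belongs to the foreground.
        post = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], var[0])
            p1 = w[1] * normal_pdf(x, mu[1], var[1])
            post.append(p1 / (p0 + p1))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            resp = [p if k == 1 else 1.0 - p for p in post]
            nk = sum(resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            # Floor the variance so a very tight component cannot collapse to zero.
            var[k] = max(sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return post


# Hypothetical single-channel intensities: four background and four spot pixels.
pixels = [10, 12, 11, 9, 200, 210, 205, 198]
posterior = em_two_gaussians(pixels)
foreground = [p > 0.5 for p in posterior]
print(foreground)
```

Thresholding the posterior at 0.5 gives a hard segmentation, while the posterior itself can serve as a soft weight when estimating intensities.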

3.
Image and statistical analysis are two important stages of cDNA microarrays. Of these, gridding is necessary to accurately identify the location of each spot while extracting spot intensities from the microarray images, and automating this procedure permits high-throughput analysis. Due to the deficiencies of the equipment used to print the arrays, rotations, misalignments, high contamination with noise and artifacts, and the enormous amount of data generated, solving the gridding problem by means of an automatic system is not trivial. Existing techniques to solve the automatic grid segmentation problem cover only limited aspects of this challenging problem and require the user to specify the size of the spots, the number of rows and columns in the grid, and boundary conditions. In this paper, a hill-climbing automatic gridding and spot quantification technique is proposed which takes a microarray image (or a subgrid) as input and makes no assumptions about the size of the spots or the number of rows and columns in the grid. The proposed method is based on a hill-climbing approach that utilizes different objective functions. The method has been found to effectively detect the grids on microarray images drawn from databases from GEO and the Stanford genomic laboratories.

4.
Improving gene quantification by adjustable spot-image restoration   (Cited by: 1; self-citations: 0; other citations: 1)
MOTIVATION: One of the major factors that complicate the task of microarray image analysis is that microarray images are distorted by various types of noise. In this study a robust framework is proposed, designed to take into account the effect of noise in microarray images in order to assist the demanding task of microarray image analysis. The proposed framework incorporates in the microarray image processing pipeline a novel combination of spot-adjustable image analysis and processing techniques and consists of the following stages: (1) gridding for facilitating spot identification, (2) clustering (unsupervised discrimination between spot and background pixels) applied to the spot image for automatic local noise assessment, (3) modeling of the local image restoration process for spot image conditioning (adjustable Wiener restoration using an empirically determined degradation function), (4) automatic spot segmentation employing seeded region growing, (5) intensity extraction and (6) assessment of the reproducibility (real data) and the validity (simulated data) of the extracted gene expression levels. RESULTS: Both simulated and real microarray images were employed in order to assess the performance of the proposed framework against well-established methods implemented in publicly available software packages (Scanalyze and SPOT). On simulated images, the novel combination of techniques introduced in the proposed framework made the detection of spot areas and the extraction of spot intensities more accurate. Furthermore, on real images the proposed framework showed better stability across replicates. Results indicate that the proposed framework improves spot segmentation and, consequently, quantification of gene expression levels. AVAILABILITY: All algorithms were implemented in the Matlab (The Mathworks, Inc., Natick, MA, USA) environment. 
The codes that implement microarray gridding, adaptive spot restoration and segmentation/intensity extraction are available upon request. Supplementary results and the simulated microarray images used in this study are available for download from: ftp://users:bioinformatics@mipa.med.upatras.gr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
A comparison of background correction methods for two-colour microarrays   (Cited by: 7; self-citations: 0; other citations: 7)
MOTIVATION: Microarray data must be background corrected to remove the effects of non-specific binding or spatial heterogeneity across the array, but this practice typically causes other problems such as negative corrected intensities and high variability of low intensity log-ratios. Different estimators of background, and various model-based processing methods, are compared in this study in search of the best option for differential expression analyses of small microarray experiments. RESULTS: Using data where some independent truth in gene expression is known, eight different background correction alternatives are compared, in terms of precision and bias of the resulting gene expression measures, and in terms of their ability to detect differentially expressed genes as judged by two popular algorithms, SAM and limma eBayes. A new background processing method (normexp) is introduced which is based on a convolution model. The model-based correction methods are shown to be markedly superior to the usual practice of subtracting local background estimates. Methods which stabilize the variances of the log-ratios along the intensity range perform the best. The normexp+offset method is found to give the lowest false discovery rate overall, followed by morph and vsn. Like vsn, normexp is applicable to most types of two-colour microarray data. AVAILABILITY: The background correction methods compared in this article are available in the R package limma (Smyth, 2005) from http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Supplementary data are available from http://bioinf.wehi.edu.au/resources/webReferences.html.
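The problem this abstract names, negative corrected intensities after subtracting a local background estimate, can be shown with a toy sketch. It is not the normexp convolution model and does not call limma; it only illustrates plain subtraction and the effect of adding a constant offset before taking log-ratios. The function name `log_ratio` and all intensity values are hypothetical.

```python
import math


def log_ratio(r_fg, r_bg, g_fg, g_bg, offset=0.0):
    """log2(R/G) after local background subtraction, or None if undefined."""
    r = r_fg - r_bg + offset
    g = g_fg - g_bg + offset
    if r <= 0 or g <= 0:
        return None  # a non-positive corrected intensity has no logarithm
    return math.log2(r / g)


# A bright spot: subtraction works without trouble.
print(log_ratio(500, 100, 200, 100))             # 2.0
# A weak red signal below its local background: the log-ratio is lost.
print(log_ratio(100, 120, 300, 100))             # None
# The same spot with a hypothetical offset of 50 keeps a finite value.
print(log_ratio(100, 120, 300, 100, offset=50))
```

An offset shrinks low-intensity log-ratios toward zero, which is one simple way to damp the variability the abstract describes; the model-based methods compared in the paper go further than this.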

6.
MOTIVATION: Grouping genes that have similar expression patterns is called gene clustering, which has proved to be a useful tool for extracting underlying biological information from gene expression data. Many clustering procedures have shown success in microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are alternative clustering algorithms, which are based on the assumption that the whole set of microarray data is a finite mixture of a certain type of distributions with different parameters. Application of the model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of the model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The SVM software is available on the website http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi

7.
Automatic analysis of DNA microarray images using mathematical morphology   (Cited by: 10; self-citations: 0; other citations: 10)
MOTIVATION: DNA microarrays are an experimental technology which consists of arrays of thousands of discrete DNA sequences that are printed on glass microscope slides. Image analysis is an important aspect of microarray experiments. The aim of this step is to reduce an image of spots into a table with a measure of the intensity for each spot. Efficient, accurate and automatic analysis of DNA spot images is essential in order to use this technology in laboratory routines. RESULTS: We present an automatic non-supervised set of algorithms for fast and accurate spot data extraction from DNA microarrays using morphological operators which are robust to both intensity variation and artefacts. The approach can be summarised as follows. Initially, a gridding algorithm yields the automatic segmentation of the microarray image into spot quadrants which are later individually analysed. Then the analysis of the spot quadrant images is achieved in five steps. First, as a pre-quantification, the spot size distribution law is calculated. Second, the background noise extraction is performed using a morphological filtering by area. Third, an orthogonal grid provides the first approach to the spot locus. Fourth, the spot segmentation or spot boundaries definition is carried out using the watershed transformation. And fifth, the outline of detected spots allows the signal quantification or spot intensities extraction; in this respect, a noise model has been investigated. The performance of the algorithm has been compared with two packages, ScanAlyze and Genepix, showing its robustness and precision.

8.
MOTIVATION: Microarray images challenge existing analytical methods in many ways given that gene spots often show characteristic imperfections. Irregular contours, donut shapes, artifacts, and low or heterogeneous expression impair corresponding values for red and green intensities as well as their ratio R/G. New approaches are needed to ensure accurate data extraction from these images. RESULTS: Herein we introduce a novel method for intensity assessment of gene spots. The technique is based on clustering pixels of a target area into foreground and background. For this purpose we implemented two clustering algorithms derived from k-means and Partitioning Around Medoids (PAM), respectively. Results from the analysis of real gene spots indicate that our approach outperforms other existing analytical methods. This is particularly true for spots generally considered as problematic due to imperfections or almost absent expression. Both PX(PAM) and PX(KMEANS) prove to be highly robust against various types of artifacts through adaptive partitioning, which more correctly assesses expression intensity values. AVAILABILITY: The implementation of this method is a combination of two complementary tools, Extractiff (Java) and Pixclust (free statistical language R), which are available upon request from the authors.

9.
Mfuzz: a software package for soft clustering of microarray data   (Cited by: 1; self-citations: 0; other citations: 1)
For the analysis of microarray data, clustering techniques are frequently used. Most such methods are based on hard clustering of data wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient Tcl/Tk based graphical user interface. AVAILABILITY: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to GPL version 2 license.
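The difference between hard and soft assignment can be made concrete with a short sketch of the fuzzy c-means membership formula, which gives each gene a graded membership in every cluster. This is an illustration in Python, not the Mfuzz R API; the function name `fuzzy_memberships`, the fuzzifier value `m=2.0` and the example centroids are assumptions.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Return one membership per centroid; the values sum to 1."""
    dists = [math.dist(point, c) for c in centroids]
    if 0.0 in dists:
        # The point sits exactly on a centroid: give it full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical expression profile and two cluster centroids.
profile = (0.0, 0.0)
centroids = [(1.0, 0.0), (3.0, 0.0)]
print(fuzzy_memberships(profile, centroids))
```

A hard clustering would simply report the nearest centroid; the graded values keep the information that the profile also lies within reach of the second cluster. Larger values of `m` make the memberships more even.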

10.
MOTIVATION: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. RESULTS: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. AVAILABILITY: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org

11.

Background

Processing cDNA microarray images is a crucial step in gene expression analysis, since any errors in early stages affect subsequent steps, leading to possibly erroneous biological conclusions. When processing the underlying images, accurately separating the sub-grids and spots is extremely important for subsequent steps that include segmentation, quantification, normalization and clustering.

Results

We propose a parameterless and fully automatic approach that first detects the sub-grids given the entire microarray image, and then detects the locations of the spots in each sub-grid. The approach first detects and corrects rotations in the images by applying an affine transformation, and then uses a polynomial-time optimal multi-level thresholding algorithm to find the positions of the sub-grids in the image and the positions of the spots in each sub-grid. Additionally, a new validity index is proposed in order to find the correct number of sub-grids in the image, and the correct number of spots in each sub-grid. Moreover, a refinement procedure is used to correct possible misalignments and increase the accuracy of the method.

Conclusions

Extensive experiments on real-life microarray images and a comparison to other methods show that the proposed method performs these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approach can be used on various types of microarray images with different resolutions and spot sizes and does not need any parameter to be adjusted.
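The general idea of locating spot columns from intensity profiles can be sketched as below. This is a heavily simplified illustration and not the method of this abstract: it uses a single hand-picked threshold on a column projection profile, whereas the paper describes an optimal multi-level thresholding algorithm with a validity index and no user parameters. The function name `spot_column_runs` and the tiny image are hypothetical.

```python
def spot_column_runs(image, threshold):
    """Return the column profile and the (start, end) runs above the threshold."""
    n_cols = len(image[0])
    # Projection profile: total intensity in each image column.
    profile = [sum(row[c] for row in image) for c in range(n_cols)]
    runs, start = [], None
    for c, value in enumerate(profile):
        if value > threshold and start is None:
            start = c                    # a run of spot columns begins
        elif value <= threshold and start is not None:
            runs.append((start, c - 1))  # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, n_cols - 1))
    return profile, runs


# Hypothetical 2 x 8 image containing two spot columns separated by a gap.
image = [
    [0, 5, 5, 0, 0, 6, 6, 0],
    [0, 4, 6, 0, 1, 7, 5, 0],
]
profile, runs = spot_column_runs(image, threshold=2)
print(profile, runs)
```

Applying the same routine to row sums gives the horizontal grid lines. Choosing the threshold automatically, and coping with rotated images, are exactly the parts the abstract's method addresses and this sketch leaves out.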

12.
limmaGUI: a graphical user interface for linear modeling of microarray data   (Cited by: 15; self-citations: 0; other citations: 15)
SUMMARY: limmaGUI is a graphical user interface (GUI) based on R-Tcl/Tk for the exploration and linear modeling of data from two-color spotted microarray experiments, especially the assessment of differential expression in complex experiments. limmaGUI provides an interface to the statistical methods of the limma package for R, and is itself implemented as an R package. The software provides point and click access to a range of methods for background correction, graphical display, normalization, and analysis of microarray data. Arbitrarily complex microarray experiments involving multiple RNA sources can be accommodated using linear models and contrasts. Empirical Bayes shrinkage of the gene-wise residual variances is provided to ensure stable results even when the number of arrays is small. Integrated support is provided for quantitative spot quality weights, control spots, within-array replicate spots and multiple testing. limmaGUI is available for most platforms on which R runs, including Windows, Mac and most flavors of Unix. AVAILABILITY: http://bioinf.wehi.edu.au/limmaGUI.

13.
The large variety of clustering algorithms and their variants can be daunting to researchers wishing to explore patterns within their microarray datasets. Furthermore, each clustering method has distinct biases in finding patterns within the data, and clusterings may not be reproducible across different algorithms. A consensus approach utilizing multiple algorithms can show where the various methods agree and expose robust patterns within the data. In this paper, we present a software package - Consense, written for R/Bioconductor - that utilizes such an approach to explore microarray datasets. Consense produces clustering results for each of the clustering methods and produces a report of metrics comparing the individual clusterings. A feature of Consense is identification of genes that cluster consistently with an index gene across methods. Utilizing simulated microarray data, sensitivity of the metrics to the biases of the different clustering algorithms is explored. The framework is easily extensible, allowing this tool to be used with other functional genomic data types, as well as other high-throughput OMICS data types generated from metabolomic and proteomic experiments. It also provides a flexible environment to benchmark new clustering algorithms. Consense is currently available as an installable R/Bioconductor package (http://www.ohsucancer.com/isrdev/consense/).
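The notion of genes that cluster consistently with an index gene across methods can be illustrated with a small sketch. It is not the Consense API; it assumes each clustering method's result is available as a mapping from gene name to cluster label, and the function name `consistent_partners` and the gene names are hypothetical.

```python
def consistent_partners(clusterings, index_gene):
    """Genes sharing a cluster with index_gene in every clustering."""
    partners = set(clusterings[0]) - {index_gene}
    for labels in clusterings:
        # Keep only the genes that this method also places with the index gene.
        partners = {g for g in partners if labels[g] == labels[index_gene]}
    return sorted(partners)


# Hypothetical results from two clustering methods on four genes.
method_a = {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 0}
method_b = {"geneA": 1, "geneB": 1, "geneC": 1, "geneD": 0}
print(consistent_partners([method_a, method_b], "geneA"))
```

Cluster labels only need to be comparable within one method, since the check is whether two genes share a label, not what the label is.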

14.
The Graphical Query Language (GQL) is a set of tools for the analysis of gene expression time-courses. They allow a user to pre-process the data, to query it for interesting patterns, to perform model-based clustering or mixture estimation, to include subsequent refinements of clusters and, finally, to use other biological resources to evaluate the results. Analyses are carried out in a graphical and interactive environment, allowing expert intervention in all stages of the data analysis. AVAILABILITY: The GQL package is freely available under the GNU general public license (GPL) at http://www.ghmm.org/gql

15.
IlluminaGUI is a graphical user interface implemented for analyzing microarray data from the Illumina BeadChip platform. All key components of a microarray experiment, including quality control, normalization, inference and classification methods, are provided in a 'point and click' approach. IlluminaGUI is implemented as an R package based on the R-Tcl/Tk interface and is available for platforms on which R runs, including Windows, Mac and Unix-type machines. AVAILABILITY: http://IlluminaGUI.dnsalias.org

16.
CRCView is a user-friendly point-and-click web server for analyzing and visualizing microarray gene expression data using a Dirichlet process mixture model-based clustering algorithm. CRCView is designed to cluster genes based on their expression profiles. It allows flexible input data formats, rich graphical illustration and integrated GO term based annotation/interpretation of clustering results. Availability: http://helab.bioinformatics.med.umich.edu/crcview/.

17.
Segmentation of cDNA microarray spots using Markov random field modeling   (Cited by: 3; self-citations: 3; other citations: 0)
Motivation: Spot segmentation is a critical step in microarray gene expression data analysis. Therefore, the performance of segmentation may substantially affect the results of subsequent stages of the analysis, such as the detection of differentially expressed genes. Several methods have been developed to segment microarray spots from the surrounding background. In this study, we have proposed a new approach based on Markov random field (MRF) modeling and tested its performance on simulated and real microarray images against a widely used segmentation method based on the Mann–Whitney test adopted by QuantArray software (Boston, MA). Spot addressing was performed using QuantArray. We have also devised a simulation method to generate microarray images with realistic features. Such images can be used as gold standards for the purposes of testing and comparing different segmentation methods, and optimizing segmentation parameters. Results: Experiments on simulated and 14 actual microarray image sets show that the proposed MRF-based segmentation method can detect spot areas and estimate spot intensities with higher accuracy. Availability: The algorithms were implemented in the Matlab (The Mathworks, Inc., Natick, MA) environment. The codes for MRF-based segmentation and image simulation methods are available upon request. Contact: demirkaya{at}ieee.org

18.
MOTIVATION: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. RESULTS: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression. AVAILABILITY: GaneSh, a Java package for coclustering, is available under the terms of the GNU General Public License from our website at http://bioinformatics.psb.ugent.be/software

19.
pcaMethods is a Bioconductor compliant library for computing principal component analysis (PCA) on incomplete data sets. The results can be analyzed directly or used to estimate missing values to enable the use of missing value sensitive statistical methods. The package was mainly developed with microarray and metabolite data sets in mind, but can be applied to any other incomplete data set as well. AVAILABILITY: http://www.bioconductor.org

20.
While meta-analysis provides a powerful tool for analyzing microarray experiments by combining data from multiple studies, it presents unique computational challenges. The Bioconductor package RankProd provides a new and intuitive tool for this purpose in detecting differentially expressed genes under two experimental conditions. The package modifies and extends the rank product method proposed by Breitling et al., [(2004) FEBS Lett., 573, 83-92] to integrate multiple microarray studies from different laboratories and/or platforms. It offers several advantages over t-test based methods and accepts pre-processed expression datasets produced from a wide variety of platforms. The significance of the detection is assessed by a non-parametric permutation test, and the associated P-value and false discovery rate (FDR) are included in the output alongside the genes that are detected by user-defined criteria. A visualization plot is provided to view actual expression levels for each gene with estimated significance measurements. AVAILABILITY: RankProd is available at Bioconductor http://www.bioconductor.org. A web-based interface will soon be available at http://cactus.salk.edu/RankProd
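The core of the rank product statistic can be sketched in a few lines: rank the genes by fold change within each replicate, then take the geometric mean of each gene's ranks. This is an illustration in Python and not the RankProd package interface; the permutation test, P-values and FDR mentioned in the abstract are omitted, and the function name `rank_products` and the fold-change values are hypothetical.

```python
import math


def rank_products(fold_changes):
    """fold_changes[g][i] is the log fold change of gene g in replicate i."""
    n_genes, k = len(fold_changes), len(fold_changes[0])
    ranks = [[0] * k for _ in range(n_genes)]
    for i in range(k):
        # Rank 1 goes to the most strongly up-regulated gene in replicate i.
        order = sorted(range(n_genes), key=lambda g: fold_changes[g][i], reverse=True)
        for rank, g in enumerate(order, start=1):
            ranks[g][i] = rank
    # Geometric mean of each gene's ranks across the k replicates.
    return [math.prod(r) ** (1.0 / k) for r in ranks]


# Hypothetical log fold changes for three genes in two replicates.
data = [
    [2.0, 0.2],    # gene 0: top in replicate 1, second in replicate 2
    [0.1, 1.8],    # gene 1: second in replicate 1, top in replicate 2
    [-1.0, -1.5],  # gene 2: last in both
]
print(rank_products(data))
```

A small rank product means a gene sits near the top of the list in most replicates. Because only ranks enter the statistic, replicates from different laboratories or platforms can be combined without putting their intensities on a common scale.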
