Similar Articles
1.
limmaGUI: a graphical user interface for linear modeling of microarray data   Cited: 15 (self-citations: 0, citations by others: 15)
SUMMARY: limmaGUI is a graphical user interface (GUI) based on R-Tcl/Tk for the exploration and linear modeling of data from two-color spotted microarray experiments, especially the assessment of differential expression in complex experiments. limmaGUI provides an interface to the statistical methods of the limma package for R, and is itself implemented as an R package. The software provides point-and-click access to a range of methods for background correction, graphical display, normalization, and analysis of microarray data. Arbitrarily complex microarray experiments involving multiple RNA sources can be accommodated using linear models and contrasts. Empirical Bayes shrinkage of the gene-wise residual variances is provided to ensure stable results even when the number of arrays is small. Integrated support is provided for quantitative spot quality weights, control spots, within-array replicate spots and multiple testing. limmaGUI is available for most platforms on which R runs, including Windows, Mac and most flavors of Unix. AVAILABILITY: http://bioinf.wehi.edu.au/limmaGUI.
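
The empirical Bayes shrinkage mentioned above can be illustrated with the variance-moderation formula used by limma. Each gene's residual variance s_g^2, on d residual degrees of freedom, is pulled toward a prior value s_0^2 that carries d0 prior degrees of freedom: s_tilde^2 = (d0 * s_0^2 + d * s_g^2) / (d0 + d). The Python sketch below is a minimal illustration under one stated assumption: d0 and s_0^2 are already known. limma estimates them from all genes, and that step is not reproduced here. The function names are hypothetical and are not part of limma or limmaGUI.

```python
import math

def moderated_variances(s2, d, d0, s0_sq):
    """Shrink gene-wise residual variances toward a common prior value.

    s2    : list of gene-wise residual variances s_g^2
    d     : residual degrees of freedom for each gene
    d0    : prior degrees of freedom (assumed already estimated)
    s0_sq : prior variance s_0^2 (assumed already estimated)
    """
    return [(d0 * s0_sq + d * v) / (d0 + d) for v in s2]

def moderated_t(coef, stdev_unscaled, s2_tilde):
    """t statistic using the moderated variance; it has d0 + d degrees of freedom."""
    return coef / (stdev_unscaled * math.sqrt(s2_tilde))

if __name__ == "__main__":
    raw = [0.01, 0.5, 2.0]  # hypothetical residual variances for 3 genes
    post = moderated_variances(raw, d=2, d0=4, s0_sq=0.2)
    print([round(v, 4) for v in post])  # [0.1367, 0.3, 0.8]
```

Both the very small and the very large variances are pulled toward 0.2. This is what keeps the t statistics stable when only a few arrays are available.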

2.
In this article we describe a new Bioconductor package, 'CALIB', for normalization of two-color microarray data. This approach is based on measurements of external controls and estimates an absolute target level for each gene and condition pair, as opposed to working with log-ratios as a relative measure of expression. Moreover, the method makes no assumptions about the distribution of gene expression divergence. AVAILABILITY: http://bioconductor.org/packages/2.0/bioc Open Source.

3.
MOTIVATION: Grouping genes with similar expression patterns is called gene clustering, which has proved to be a useful tool for extracting the biological information underlying gene expression data. Many clustering procedures have been applied successfully to microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are an alternative, based on the assumption that the whole set of microarray data is a finite mixture of distributions of a certain type with different parameters. Application of model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of a model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data, and showed that the supervised model-based algorithm is superior to both the unsupervised method and the support vector machine (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The SVM software is available at http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi

4.
Combinatorial image analysis of DNA microarray features   Cited: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: DNA and protein microarrays have become an established leading-edge technology for large-scale analysis of gene and protein content and activity. Contact-printed microarrays have emerged as a relatively simple and cost-effective method of choice, but their reliability is especially susceptible to the quality of pixel information obtained from digital scans of spotted features in the microarray image. RESULTS: We address the statistical computation requirements for optimizing data acquisition and processing of digital scans. We consider the use of median filters to reduce noise levels in images and top-hat filters to correct for trends in background values. We also consider, as alternative estimators of spot intensity, discs of fixed radius, proportions of histograms and k-means clustering, either with or without a square-root intensity transformation and background subtraction. Using combinatoric procedures, we identify the filter and estimator parameters that are optimal for achieving consistency among the replicates of a gene on each microarray. Our results, using test data from HCMV microarrays, indicate that a highly effective approach for improving the reliability and quality of microarray data is the following. First apply a 21 by 21 top-hat filter. Then estimate spot intensity as the mean of the largest 20% of pixel values in the target region after a square-root transformation, and correct for background by subtracting the mean of the smallest 70% of pixel values. AVAILABILITY: Fortran90 subroutines implementing these methods are available from the authors, or at http://www.bioss.ac.uk/~chris.
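
The recommended recipe above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Fortran90 code:

- The top-hat filter is taken to be a white top-hat, that is, the image minus its grayscale opening with a square window.
- Windows are simply clipped at the image edges.
- The pixels belonging to each target region are supplied by the caller.

The 20%/70% fractions and the default 21 by 21 window follow the abstract. The function names are hypothetical.

```python
import math

def _square_filter(img, size, reducer):
    """Apply a size x size min or max filter; windows are clipped at the image edges."""
    h, w = len(img), len(img[0])
    r = size // 2
    return [[reducer(img[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]

def top_hat(img, size=21):
    """White top-hat: the image minus its grayscale opening (erosion, then dilation).

    Features smaller than the window (spots) are kept, slowly varying
    background trends are removed, and the result is never negative.
    """
    opened = _square_filter(_square_filter(img, size, min), size, max)
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(img, opened)]

def spot_intensity(pixels, top_frac=0.2, bottom_frac=0.7):
    """Background-corrected intensity for the pixels of one target region.

    Square-root transform, then the mean of the largest top_frac of values
    minus the mean of the smallest bottom_frac of values.
    """
    vals = sorted(math.sqrt(max(p, 0.0)) for p in pixels)
    n = len(vals)
    k_top = max(1, round(top_frac * n))
    k_bot = max(1, round(bottom_frac * n))
    foreground = sum(vals[-k_top:]) / k_top
    background = sum(vals[:k_bot]) / k_bot
    return foreground - background

if __name__ == "__main__":
    # 7 x 7 toy image: flat background of 5 with one bright pixel
    image = [[5.0] * 7 for _ in range(7)]
    image[3][3] = 15.0
    filtered = top_hat(image, size=3)  # small window because the toy image is small
    print(filtered[3][3], filtered[0][0])  # 10.0 0.0
    print(spot_intensity([0.0] * 80 + [100.0] * 20))  # 10.0
```

The brute-force window scan is slow for a full 21 by 21 filter on a real scan. It is written this way only to keep the sketch dependency-free.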

5.
MOTIVATION: Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods pay them too little attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. The first step applies model-based clustering to the distribution of pixel intensities, using the Bayesian Information Criterion (BIC) to choose the number of groups up to a maximum of three. The second step is spatial, finding the large spatially connected components in each cluster of pixels. The method thus combines the strengths of the histogram-based and spatial approaches. It deals effectively with inner holes in spots and with artifacts. It also provides a formal inferential basis for deciding when the spot is blank, namely when the BIC favors one group over two or three. RESULTS: We apply our methods for gridding and segmentation to cDNA microarray images from an HIV infection experiment. In these experiments, our method had better stability across replicates than a fixed-circle segmentation method or the seeded region growing method in the SPOT software, without introducing noticeable bias when estimating the intensities of differentially expressed genes. AVAILABILITY: spotSegmentation, an R language package implementing both the gridding and segmentation methods, is available through the Bioconductor project (http://www.bioconductor.org). The segmentation method requires the contributed R package MCLUST for model-based clustering (http://cran.us.r-project.org). CONTACT: fraley@stat.washington.edu.

6.
Despite the tremendous growth of microarray usage in scientific studies, there is a lack of standards for background correction methodologies, especially for single-color microarray platforms. Traditional background subtraction methods often generate negative signals and thus cause large amounts of data loss. Hence, some researchers prefer to skip background correction, which typically results in underestimation of differential expression. Here, by utilizing the nonspecific negative control features integrated into Illumina whole genome expression arrays, we have developed a method of model-based background correction for BeadArrays (MBCB). We compared MBCB with a method adapted from the Affymetrix robust multi-array analysis algorithm and with no background subtraction, using a mouse acute myeloid leukemia (AML) dataset. We demonstrated that differential expression ratios obtained using MBCB had the best correlation with quantitative RT–PCR. MBCB also achieved better sensitivity in detecting differentially expressed genes of biological significance. For example, the differential regulation of Tnfr2, Ikk and NF-kappaB, the death receptor pathway, in the AML samples could be detected only by using data processed with MBCB. We conclude that MBCB is a robust background correction method that will lead to more precise determination of gene expression and better biological interpretation of Illumina BeadArray data.
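
Model-based background correction of this kind is commonly built on a normal-exponential convolution. The observed intensity x is the sum of two parts:

- a normally distributed background with mean mu and standard deviation sigma;
- an exponentially distributed true signal with mean alpha.

The corrected value is the conditional expectation of the signal given x, which is always positive. The Python sketch below assumes that model. It estimates mu and sigma from the negative control features and uses a simple moment estimate, alpha = mean(x) - mu. The abstract does not specify the estimation procedure, so treat this as an illustration rather than the MBCB implementation. The function names are hypothetical.

```python
import math
import statistics

def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normexp_correct(signals, neg_controls):
    """Background-correct intensities under a normal + exponential model.

    Background ~ N(mu, sigma^2), estimated from negative control features.
    Signal ~ Exponential with mean alpha, estimated by moments (an assumption).
    Returns E[signal | observed x] for every observed x.
    """
    mu = statistics.fmean(neg_controls)
    sigma = statistics.stdev(neg_controls)
    alpha = max(statistics.fmean(signals) - mu, 1e-6)  # guard against a non-positive mean
    corrected = []
    for x in signals:
        a = x - mu - sigma * sigma / alpha
        z = a / sigma
        # For extremely low x (z below about -37) the CDF underflows to 0;
        # production code would need an asymptotic fallback here.
        corrected.append(a + sigma * _norm_pdf(z) / _norm_cdf(z))
    return corrected

if __name__ == "__main__":
    negatives = [95.0, 100.0, 105.0, 98.0, 102.0]           # toy negative controls
    observed = [90.0, 100.0, 110.0, 150.0, 300.0, 1000.0]  # toy bead intensities
    print([round(v, 2) for v in normexp_correct(observed, negatives)])
    print([x - statistics.fmean(negatives) for x in observed])  # plain subtraction
```

Plain subtraction turns the lowest intensity into -10. In contrast, every model-based value stays positive, which avoids the data loss described in the abstract.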

7.
CRCView is a user-friendly point-and-click web server for analyzing and visualizing microarray gene expression data using a clustering algorithm based on a Dirichlet process mixture model. CRCView is designed to cluster genes based on their expression profiles. It accepts flexible input data formats and provides rich graphical illustrations as well as integrated GO term-based annotation and interpretation of clustering results. Availability: http://helab.bioinformatics.med.umich.edu/crcview/.

8.
Methods for identifying differentially expressed genes were compared on time-series microarray data simulated from artificial gene networks. Selected methods were further analyzed on the existing immune response data of Boldrick et al. (2002, Proc. Natl. Acad. Sci. USA 99, 972-977). Based on the simulations, we recommend the ANOVA variants of Cui and Churchill. Efron and Tibshirani's empirical Bayes Wilcoxon rank sum test is recommended when the background cannot be effectively corrected. Our proposed GSVD-based differential expression method was shown to detect subtle changes. ANOVA combined with GSVD was consistent on background-normalized simulation data, and GSVD with empirical Bayes was consistent without background correction. Based on the Boldrick et al. data, ANOVA is best suited to detecting changes in temporal data, while GSVD and empirical Bayes effectively detect individual spikes and overall shifts, respectively. For the methods tested on simulation data, lowess after background correction improved results. On simulation data without background correction, lowess decreased performance compared with median centering.

9.
MOTIVATION: High-throughput microarray technologies enable measurement of the expression levels of thousands of genes in parallel. However, microarray printing, hybridization and washing may create substantial variability in data quality. Erroneous measurements can drastically affect the results by disturbing normalization schemes and by introducing expression patterns that lead to incorrect conclusions. It is therefore crucial to discard low-quality observations in the early phases of a microarray experiment. A typical microarray carries tens of thousands of spots, which makes manual identification of poor-quality spots impossible. Thus, there is a need for a reliable and general strategy for microarray spot quality control. RESULTS: We suggest a novel strategy for spot quality control using Bayesian networks, which have many appealing properties in this context. We illustrate how a Gaussian fitting procedure based on non-linear least squares can be used to extract features for a spot on a microarray. The features used in this study are spot intensity, spot size, spot roundness, alignment error, background intensity, background noise, and bleeding. We conclude that Bayesian networks are a reliable and useful model for microarray spot quality assessment. SUPPLEMENTARY INFORMATION: http://sigwww.cs.tut.fi/TICSP/SpotQuality/.

10.
MOTIVATION: Data from one-channel cDNA microarray studies may exhibit poor reproducibility due to spatial heterogeneity, non-linear array-to-array variation and problems in correcting for background. Uncorrected, these phenomena can give rise to misleading conclusions. RESULTS: Spatial heterogeneity may be corrected using two-dimensional loess smoothing (Colantuoni et al., 2002). Non-linear between-array variation may be corrected using an iterative application of one-dimensional loess smoothing. A method for background correction using a smoothing function rather than simple subtraction is described. These techniques promote within-array spatial uniformity and between-array reproducibility. Their application is illustrated using data from a study of the effects of an insulin sensitizer, rosiglitazone, on gene expression in white adipose tissue in diabetic db/db mice. They may also be useful with data from two-channel cDNA microarrays and from oligonucleotide arrays. AVAILABILITY: R functions for the methods described are available on request from the author.

11.
Methods are presented for detecting differential expression using statistical hypothesis testing methods including analysis of variance (ANOVA). Practicalities of experimental design, power, and sample size are discussed. Methods for multiple testing correction and their application are described. Instructions for running typical analyses are given in the R programming environment. R code and the sample data set used to generate the examples are available at http://microarray.cpmc.columbia.edu/pavlidis/pub/aovmethods/.
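
As an illustration of multiple testing correction, here is a Python sketch of the Benjamini-Hochberg false discovery rate adjustment. It is applied to per-gene p-values such as those from an ANOVA. The abstract does not say which corrections it covers, so Benjamini-Hochberg is chosen here only as a common example. The sketch is intended to give the same values as R's p.adjust(p, method = "BH"). The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for step, i in enumerate(order):
        rank = m - step  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene ANOVA p-values
    print([round(q, 4) for q in bh_adjust(p)])  # [0.02, 0.04, 0.04, 0.02]
```

Genes whose adjusted value falls below a chosen FDR level, such as 0.05, would then be reported as differentially expressed.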

12.
MOTIVATION: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. RESULTS: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. AVAILABILITY: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org
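
The idea of one common correlation between duplicate spots can be sketched as follows. This is a simplified stand-in, not limma's duplicateCorrelation. It makes these assumptions:

- every array is a replicate of the same condition (an intercept-only model);
- there are exactly two spots per gene per array;
- each gene's intraclass correlation is estimated by one-way ANOVA rather than REML.

The gene-wise estimates are then averaged on the Fisher z (atanh) scale with a trimmed mean. The function names and the 15% trimming fraction are illustrative assumptions.

```python
import math

def duplicate_icc(pairs):
    """Intraclass correlation of duplicate spots for one gene.

    pairs: list of (spot1, spot2) log-ratios, one tuple per array (at least 2 arrays).
    One-way ANOVA with arrays as groups and k = 2 spots per group.
    """
    n = len(pairs)
    means = [(a + b) / 2.0 for a, b in pairs]
    grand = sum(means) / n
    ms_between = 2.0 * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)) / n
    denom = ms_between + ms_within
    return 0.0 if denom == 0 else (ms_between - ms_within) / denom

def consensus_correlation(genes, trim=0.15):
    """Trimmed mean of gene-wise correlations on the atanh scale, back-transformed."""
    # Clip to (-1, 1) so that atanh stays finite
    z = sorted(math.atanh(max(-0.999, min(0.999, duplicate_icc(p)))) for p in genes)
    k = int(trim * len(z))
    kept = z[k:len(z) - k] if len(z) > 2 * k else z
    return math.tanh(sum(kept) / len(kept))

if __name__ == "__main__":
    # Hypothetical data: 3 genes, each with duplicate spots on 3 arrays
    genes = [
        [(1.0, 1.2), (2.0, 2.1), (3.0, 2.9)],
        [(0.5, 0.4), (-0.2, 0.1), (0.9, 1.1)],
        [(0.0, 0.3), (0.2, -0.1), (0.1, 0.0)],
    ]
    print(round(consensus_correlation(genes), 3))
```

Each single gene offers only a handful of arrays for estimating its correlation. Sharing one value across all genes is what makes the estimate stable.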

13.

Background  

Non-linearities in observed log-ratios of gene expression, also known as intensity-dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets, and in this article we study the biases introduced by the microarray scanner and the image analysis software.
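
A short numerical illustration shows why a global offset in one channel produces intensity-dependent log-ratios. Suppose the red channel carries an additive bias c. The observed M = log2((G + c)/G) is then large at low intensity and vanishes at high intensity. A purely multiplicative bias, by contrast, shifts every M by the same constant. The Python sketch below uses hypothetical intensities and bias values, not data from the study, together with the usual M/A definitions.

```python
import math

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = 0.5 * log2(R * G) for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

if __name__ == "__main__":
    green = [50.0, 200.0, 1000.0, 5000.0, 20000.0]  # hypothetical spot intensities, true ratio 1
    offset = 40.0                                    # hypothetical additive red-channel bias
    scale = 1.5                                      # hypothetical multiplicative red-channel bias

    m_off, a_off = ma_values([g + offset for g in green], green)
    m_scl, _ = ma_values([g * scale for g in green], green)

    for a, mo, ms in zip(a_off, m_off, m_scl):
        print(f"A={a:6.2f}  M(offset)={mo:+.3f}  M(scale)={ms:+.3f}")
```

The offset column traces a curve that decays toward 0 as A grows. The scale column is flat at log2(1.5), which a single global normalization constant would remove.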

14.
MOTIVATION: Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarization methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires computationally intensive MCMC methods that are impractical for large datasets. An alternative, computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding, with a latent variable to capture variations in probe affinity. Although the model is promising, its main limitations are that it does not use information from multiple chips and does not account for specific binding to the mismatch (MM) probes. RESULTS: We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark datasets and a real time-course dataset. It is also much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions. AVAILABILITY: Both mgMOS and the new model multi-mgMOS have been implemented in an R package, which is available at http://www.bioinf.man.ac.uk/resources/puma.

15.
16.
SUMMARY: We introduce a novel Matlab toolbox for microarray data analysis. The toolbox performs normalization based on a normally distributed background and assesses differential gene expression with five statistical measures. The objects in this toolbox are open source and can be adapted to suit your application. AVAILABILITY: MDAT v1.0 is a Matlab toolbox and requires Matlab to run. MDAT is freely available at http://microarray.omrf.org/publications/2004/knowlton/MDAT.zip.

17.
MOTIVATION: Despite theoretical arguments that so-called 'loop designs' for two-channel DNA microarray experiments are more efficient, biologists continue to use 'reference designs'. We describe two sets of microarray experiments with RNA from two different biological systems (TPA-stimulated mammalian cells and Streptomyces coelicolor). In each case, both a loop and a reference design were used with the same RNA preparations with the aim of studying their relative efficiency. RESULTS: The results of these experiments show that (1) the loop design attains a much higher precision than the reference design, (2) multiplicative spot effects are a large source of variability, and if they are not accounted for in the mathematical model, for example, by taking log-ratios or including spot effects, then the model will perform poorly. The first result is reinforced by a simulation study. Practical recommendations are given on how simple loop designs can be extended to more realistic experimental designs and how standard statistical methods allow the experimentalist to use and interpret the results from loop designs in practice. AVAILABILITY: The data and R code are available at http://exgen.ma.umist.ac.uk CONTACT: veronica.vinciotti@brunel.ac.uk.
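
The precision argument for loop designs can be checked with ordinary least squares. Assume independent log-ratio errors of equal variance sigma^2. The variance of an estimated contrast c is then sigma^2 c'(X'X)^{-1}c, where X encodes which samples each array compares. The Python sketch below compares three samples A, B and C on three arrays under each design. It is a simplified illustration under those assumptions, with no dye or spot effects (which the study shows matter in practice), and is not the authors' R code. The function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def contrast_variance(design, contrast, sigma2=1.0):
    """Variance of the least-squares estimate of contrast' * beta for this design."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    inv = invert(xtx)
    return sigma2 * sum(contrast[i] * inv[i][j] * contrast[j]
                        for i in range(k) for j in range(k))

if __name__ == "__main__":
    # Reference design: parameters A, B, C relative to the reference R; arrays A/R, B/R, C/R
    reference = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Loop design: parameters B, C relative to A; arrays B/A, C/B, A/C
    loop = [[1, 0], [-1, 1], [0, -1]]
    print(contrast_variance(reference, [1, -1, 0]))     # A vs B: 2.0
    print(round(contrast_variance(loop, [1, 0]), 4))    # B vs A: 0.6667
```

Both designs use three arrays. Under these assumptions, however, the loop estimates the same pairwise comparison with one third of the variance.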

18.
ABSTRACT: BACKGROUND: A recent large-scale analysis of Gene Expression Omnibus (GEO) data found frequent evidence of spatial defects in a substantial fraction of the Affymetrix microarrays in GEO. Nevertheless, in contrast to quality assessment, artefact detection is not widely used in standard gene expression analysis pipelines. Furthermore, approaches have been proposed to detect diverse types of spatial noise on arrays. However, correcting these artefacts is mostly left to summarization methods, or the affected arrays are discarded entirely. RESULTS: We show that state-of-the-art robust summarization procedures are vulnerable to artefacts on arrays and cannot appropriately correct for them. To address this problem, we present a simple approach to detect artefacts with high recall and precision, which we further improve by taking into account the spatial layout of arrays. Finally, we propose two correction methods for these artefacts that either substitute values of defective probes using probeset information or filter out corrupted probes. We show that our approach can identify and correct defective probe measurements appropriately and outperforms existing tools. CONCLUSIONS: While summarization is insufficient to correct for defective probes, this problem can be addressed in a straightforward way by the methods we present for identification and correction of defective probes. These methods output CEL files with corrected probe values that serve as input to standard normalization and summarization procedures. They can therefore easily be integrated into existing microarray analysis pipelines as an additional pre-processing step. An R package is freely available from http://www.bio.ifi.lmu.de/artefact-correction.

19.
MOTIVATION: Two-dimensional Difference Gel Electrophoresis (DIGE) measures expression differences for thousands of proteins in parallel. In contrast to DNA microarray analysis, however, there have been few systematic studies on the validity of differential protein expression analysis, and the effects of normalization methods have not yet been investigated. To address this need, we assessed a series of same-same comparisons, evaluating how random experimental variance influenced differential expression analysis. RESULTS: The strong fluctuations observed were reflected in large discrepancies between the distributions of the spot intensities for different gels. Correct normalization for pooling of multiple gels for analysis is, therefore, essential. We show that both dye-specific background levels and the differences in scale of the spot intensity distributions must be accounted for. A variance stabilizing transform that had been developed for DNA microarray analysis combined with a robust Z-score allowed the determination of gel-independent signal thresholds based on the empirical distributions from same-same comparisons. In contrast, similar thresholds holding up to cross-validation could not be proposed for data normalized using methods established in the field of proteomics. AVAILABILITY: Software is available on request from the authors. SUPPLEMENTARY INFORMATION: There is supplementary material available online at http://www.flychip.org.uk/kreil/pub/2dgels/
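
The normalization idea can be sketched as a two-step transform:

1. An arsinh (generalized log) variance-stabilizing transform, of the kind used by the vsn method for microarrays.
2. A robust Z-score built from the median and the median absolute deviation (MAD).

The sketch assumes the transform parameters a and b are fixed rather than fitted per gel. The 1.4826 factor makes the MAD consistent with the standard deviation under normality. The names, the scale b = 0.01 and the |z| > 3 cut-off in the usage example are illustrative assumptions, not values from the study.

```python
import math
import statistics

def arsinh_transform(values, a=0.0, b=1.0):
    """Generalized-log style transform asinh(a + b*x); a and b are assumed known."""
    return [math.asinh(a + b * v) for v in values]

def robust_z(values):
    """(x - median) / (1.4826 * MAD); undefined when the MAD is zero."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

if __name__ == "__main__":
    # Hypothetical same-same comparison: one spot volume per protein on two gels
    gel1 = [120.0, 340.0, 95.0, 1500.0, 60.0, 800.0]
    gel2 = [110.0, 360.0, 100.0, 1450.0, 65.0, 2600.0]
    diffs = [y2 - y1 for y1, y2 in zip(arsinh_transform(gel1, b=0.01),
                                       arsinh_transform(gel2, b=0.01))]
    for i, z in enumerate(robust_z(diffs)):
        flag = "  <- beyond |z| > 3" if abs(z) > 3 else ""
        print(f"spot {i}: z = {z:+.2f}{flag}")
```

On a same-same comparison, the spread of these z-scores is what an empirical, gel-independent threshold would be read from.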

20.
MOTIVATION: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium. RESULTS: We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context. Based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. When recapitulating known biology, our method shows an average increase in accuracy of 273% compared with previous mega-clustering approaches. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology enables biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community. AVAILABILITY: Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL. SUPPLEMENTARY INFORMATION: Several additional data files, figures and discussions are available at http://function.princeton.edu/SPELL/supplement.
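
The query-driven weighting described above can be illustrated with a deliberately simplified Python sketch:

- Each dataset is weighted by how coherently the query genes are co-expressed in it, measured as their mean pairwise Pearson correlation, with negative values set to zero.
- Every other gene is then scored by its weighted average correlation with the query genes.

This is an assumption-laden illustration, not SPELL's published algorithm, whose exact weighting and normalization are not given here. The sketch further assumes that every dataset contains all query genes and that at least two query genes are given. The dataset and gene names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dataset_weight(data, query):
    """Mean pairwise correlation among the query genes in one dataset, clipped at zero."""
    pairs = list(combinations(query, 2))
    mean_corr = sum(pearson(data[g], data[h]) for g, h in pairs) / len(pairs)
    return max(mean_corr, 0.0)

def rank_genes(datasets, query):
    """Rank non-query genes by weighted mean correlation with the query set.

    datasets: list of dicts mapping gene name -> expression profile.
    """
    scores, total_weight = {}, 0.0
    for data in datasets:
        w = dataset_weight(data, query)
        total_weight += w
        for gene, profile in data.items():
            if gene in query:
                continue
            c = sum(pearson(profile, data[q]) for q in query) / len(query)
            scores[gene] = scores.get(gene, 0.0) + w * c
    if total_weight > 0:
        scores = {g: s / total_weight for g, s in scores.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    coherent = {"q1": [1, 2, 3, 4], "q2": [2, 4, 6, 8],
                "geneX": [1, 2, 3, 5], "geneY": [4, 3, 2, 1]}
    incoherent = {"q1": [1, 2, 3, 4], "q2": [4, 3, 2, 1],
                  "geneX": [4, 3, 2, 1], "geneY": [1, 2, 3, 4]}
    for gene, score in rank_genes([coherent, incoherent], ["q1", "q2"]):
        print(gene, round(score, 3))
```

In the toy data, the second dataset gets zero weight because the query genes disagree there. The ranking is therefore driven by the dataset in which the query context is coherent.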


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417