Similar articles
Found 20 similar articles (search time: 31 ms)
1.
High-throughput microarray technologies measure the abundance of thousands of mRNA targets simultaneously. Because only a few samples are usually available (from limited conditions or time course points) while many gene expression values are measured (entire genomes), a complex high-dimensional genomic system has to be analyzed, for instance by reverse engineering methods. These methods aim to reconstruct gene networks from experimentally observed expression changes caused by various kinds of perturbations. In particular, elucidating regulatory paths and assessing their reliability across replicates are central topics in this article. The reconstruction problem requires efficiency and accuracy from numerical optimization algorithms and statistical inference techniques. To this end, we focus not only on methods but also on the available experimental information produced in technical replicates. We propose a model-based approach consisting of a few steps. First, feature selection is performed by a projective method aimed at combining the gene measurements observed across replicates. Second, a heuristic sieving strategy is pursued to bypass the usual recourse to averaging. Third, the impact of dimensionality reduction on the biological system under study is evaluated. Evidence is obtained from the application of our approach to replicated microarray time course experimental data, and suggests that gene features, once identified, can be used for stabilization purposes relative to replicate variability. Both a quantitative representation and a qualitative assessment of the observed gene feature interference are reported in order to decipher the specific gene regulatory map and the pathway-associated dynamics.

2.
MOTIVATION: We present statistical methods for determining the number of replicate spots per gene required in microarray experiments. The purpose of these methods is to obtain an estimate of the sampling variability present in microarray data, and to determine the number of replicate spots required to achieve a high probability of detecting a significant fold change in gene expression, while maintaining a low error rate. Our approach is based on data from control microarrays, and involves the use of standard statistical estimation techniques. RESULTS: After analyzing two experimental data sets containing control array data, we were able to determine the statistical power available for the detection of significant differential expression given differing levels of replication. The inclusion of replicate spots on microarrays not only allows more accurate estimation of the variability present in an experiment, but more importantly increases the probability of detecting genes undergoing significant fold changes in expression, while substantially decreasing the probability of observing fold changes due to chance rather than true differential expression.
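The link between replicate spots and detection power described in the abstract above can be illustrated with a minimal sketch. It assumes a two-sided z-test on the mean log2 ratio of `n_spots` replicate spots, with a per-spot standard deviation `spot_sd` estimated beforehand (for example from control arrays). The function names, the z-test itself, and the default `alpha` and `target_power` are illustrative assumptions, not the procedure used in the article.

```python
from statistics import NormalDist


def detection_power(log2_fold_change, spot_sd, n_spots, alpha=0.001):
    """Approximate power of a two-sided z-test on the mean of n replicate spots.

    Assumption (not from the article): spot log2 ratios are independent and
    normally distributed with known standard deviation spot_sd.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized shift of the mean log2 ratio under the alternative.
    shift = abs(log2_fold_change) * n_spots ** 0.5 / spot_sd
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def spots_needed(log2_fold_change, spot_sd, target_power=0.95,
                 alpha=0.001, max_spots=100):
    """Smallest number of replicate spots reaching target_power, or None."""
    for n in range(1, max_spots + 1):
        if detection_power(log2_fold_change, spot_sd, n, alpha) >= target_power:
            return n
    return None
```

Calling `spots_needed(1.0, 0.5)` would give the replicate count needed to detect a two-fold change (log2 ratio of 1) under these assumptions; with real data, `spot_sd` would have to be estimated and a t-based calculation may be more appropriate.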

3.
Bayesian mixture model based clustering of replicated microarray data   (Total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability. RESULTS: We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed better overall than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters. AVAILABILITY: The MS Windows based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm SUPPLEMENTAL INFORMATION: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html

4.
5.
Bayesian hierarchical error model for analysis of gene expression data   (Total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Analysis of genome-wide microarray data requires the estimation of a large number of genetic parameters for individual genes and their interaction expression patterns under multiple biological conditions. The sources of microarray error variability comprise various biological and experimental factors, such as biological and individual replication, sample preparation, hybridization and image processing. Moreover, the same gene often shows quite heterogeneous error variability under different biological and experimental conditions, which must be estimated separately for evaluating the statistical significance of differential expression patterns. Widely used linear modeling approaches are limited because they do not allow simultaneous modeling and inference on the large number of these genetic parameters and heterogeneous error components on different genes, different biological and experimental conditions, and varying intensity ranges in microarray data. RESULTS: We propose a Bayesian hierarchical error model (HEM) to overcome the above restrictions. HEM accounts for heterogeneous error variability in an oligonucleotide microarray experiment. The error variability is decomposed into two components (experimental and biological errors) when both biological and experimental replicates are available. Our HEM inference is based on Markov chain Monte Carlo to estimate a large number of parameters from a single-likelihood function for all genes. An F-like summary statistic is proposed to identify differentially expressed genes under multiple conditions based on the HEM estimation. The performance of HEM and its F-like statistic was examined with simulated data and two published microarray datasets: primate brain data and mouse B-cell development data. HEM was also compared with ANOVA using simulated data. AVAILABILITY: The software for the HEM is available from the authors upon request.

6.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be planned carefully so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.

7.

Background  

The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework.

8.
Differential analysis of DNA microarray gene expression data   (Total citations: 6; self-citations: 0; citations by others: 6)
Here, we briefly review the sources of experimental and biological variance that affect the interpretation of high-dimensional DNA microarray experiments. We discuss methods using a regularized t-test based on a Bayesian statistical framework that allow the identification of differentially regulated genes with a higher level of confidence than a simple t-test when only a few experimental replicates are available. We also describe a computational method for calculating the global false-positive and false-negative levels inherent in a DNA microarray data set. This method provides a probability of differential expression for each gene based on experiment-wide false-positive and false-negative levels driven by experimental error and biological variance.
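To make the idea of a regularized t-test concrete, here is a minimal sketch in which each group's sample variance is shrunk toward a prior variance before the test statistic is computed. The specific weighting (prior degrees of freedom `prior_df` against `n - 1` observed degrees of freedom) and the names `prior_var` and `prior_df` are illustrative assumptions; the reviewed methods may use a different formula and typically estimate the prior variance from genes with similar expression levels.

```python
from statistics import mean, variance


def regularized_t(group_a, group_b, prior_var, prior_df=10):
    """Welch-style t statistic with variances shrunk toward prior_var.

    Assumption (illustrative, not taken from the reviewed article): the
    shrunken variance is a degrees-of-freedom-weighted average of the prior
    variance and the sample variance. With prior_df=0 this reduces to the
    ordinary Welch t statistic. Each group needs at least two values.
    """
    def shrunk_var(values):
        n = len(values)
        return (prior_df * prior_var + (n - 1) * variance(values)) / (prior_df + n - 1)

    se = (shrunk_var(group_a) / len(group_a)
          + shrunk_var(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se
```

With only two replicates per group, a gene whose replicates happen to agree closely gets a very large ordinary t statistic; shrinking the variance toward a background value damps such chance findings, which is the motivation given in the abstract.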

9.
Expression microarrays have great potential for clinical use, but variability of the results represents a challenge for reliable practical application. The amount of fluorescent dye used in microarray experiments is a significant source of variability that has not been systematically studied. Here we demonstrate that the quantity of Cy3 dye affects the results of microarray experiments performed on tumor specimens. Signal-to-noise ratios and coefficients of variation are significantly improved by increasing Cy3 to 150–180 pmol, but any further increase does not improve the data. In conclusion, optimal amounts of dye reduce variability and improve reliability of expression microarray experiments.

10.
Microarray experiments offer the ability to generate gene expression measurements for thousands of genes simultaneously. Work has begun recently on attempting to reconstruct genetic networks based on analyses of microarray experiments in time-course studies. An important tool in these analyses has been the singular value decomposition method. However, little work has been done on assessing the variability associated with singular value decomposition analyses. In this report, we discuss the use of the bootstrap as a method of obtaining standard errors for singular value decomposition analyses. We consider the use of this method both when there are replicates and when no replicates exist. The proposed methods are illustrated with an application to two datasets: one involving a human foreskin study, the other involving yeast.
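As a rough illustration of bootstrapping a singular value decomposition summary, the sketch below estimates a standard error for the leading singular value of a genes-by-time-points matrix. It resamples genes (rows) with replacement, which is an assumption made for this example; the abstract does not state the resampling unit, and a design with replicates would more naturally resample replicates. The function names are hypothetical, and the leading singular value is computed by plain power iteration to keep the example dependency-free.

```python
import random


def leading_singular_value(rows, iterations=200):
    """Largest singular value of a matrix (list of rows) via power iteration.

    Works on the small Gram matrix A^T A, whose top eigenvalue is the square
    of the leading singular value. Assumes the all-ones start vector is not
    orthogonal to the leading right singular vector.
    """
    n_cols = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n_cols)]
            for i in range(n_cols)]
    v = [1.0] * n_cols
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        eigenvalue = norm  # converges to the top eigenvalue of A^T A
    return eigenvalue ** 0.5


def bootstrap_se(rows, n_boot=200, seed=0):
    """Bootstrap standard error of the leading singular value (rows resampled)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        estimates.append(leading_singular_value(sample))
    centre = sum(estimates) / n_boot
    return (sum((e - centre) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5
```

In practice a linear algebra library would replace the power iteration, and the same resampling loop could be applied to any other quantity derived from the decomposition.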

11.
Evaluation of the gene-specific dye bias in cDNA microarray experiments   (Total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: In cDNA microarray experiments, all samples are labeled with either Cy3 or Cy5. Systematic and gene-specific dye bias effects have been observed in dual-color experiments. In contrast to systematic effects, which can be corrected by a normalization method, the gene-specific dye bias is not completely suppressed and may alter the conclusions about the differentially expressed genes. METHODS: The gene-specific dye bias is taken into account using an analysis of variance model. We propose an index, named the label bias index, to measure the gene-specific dye bias. It requires at least two self-self hybridization cDNA microarrays. RESULTS: After lowess normalization, we found that the gene-specific dye bias is the major source of experimental variability between replicates. The ratio (R/G) may exceed 2. As a consequence, false positive genes may be found in direct comparisons without dye-swap. The stability of this artifact and its consequences on gene variance and on direct or indirect comparisons are addressed. AVAILABILITY: http://www.inapg.inra.fr/ens_rech/mathinfo/recherche/mathematique

12.
High-density arrays of DNA bound to solid substrates offer a powerful approach to identifying changes in gene expression in response to toxicants. While DNA arrays have been used to explore qualitative changes in gene regulation, less attention has focused on the quantitative aspects of this technology. Arrays containing expressed sequence tags for xenobiotic metabolizing enzymes, proteins associated with glutathione regulation, DNA repair enzymes, heat shock proteins, and housekeeping genes were used to examine gene expression in response to beta-naphthoflavone (beta-NF). Upregulation of cytochrome P4501a1 (Cyp1a1) and 1a2 in mouse liver was maximal 8 h after beta-NF administration. Significant upregulation of Cyp1a2 was noted at beta-NF doses as low as 0.62 and 1.2 mg/kg when gene expression was measured by microarray or Northern blotting, respectively. Maximal Cyp1a2 induction was 5-fold by Northern analysis and 10-fold by microarray. Induction of Cyp1a1 was 15- and 20-fold by Northern and microarray analysis, respectively. The coefficient of variation for spot to spot and slide to slide comparisons was <15%; this variability was smaller than interanimal variability (18-60%). Comparison of mRNA expression in control animals indicated that there are differences in labeling/detection associated with Cy3/Cy5 dyes; accordingly, experiments must include methods for establishing baseline signals for all genes. We conclude that the dynamic range and sensitivity of DNA microarrays on glass slides are comparable to those of Northern blotting analysis and that variability of the data introduced during spotting and hybridization is less than the interanimal variability.

13.
Wang H, He X. Biometrics, 2008, 64(2): 449-457
Summary. Due to the small number of replicates in typical gene microarray experiments, the performance of statistical inference is often unsatisfactory without some form of information-sharing across genes. In this article, we propose an enhanced quantile rank score test (EQRS) for detecting differential expression in GeneChip studies by analyzing the quantiles of gene intensity distributions through probe-level measurements. A measure of sign correlation, δ, plays an important role in the rank score tests. By sharing information across genes, we develop a calibrated estimate of δ, which reduces the variability at small sample sizes. We compare the EQRS test with four other approaches for determining differential expression: the gene-specific quantile rank score test, the quantile rank score test assuming a common δ, a modified t-test using summarized probe-set-level intensities, and the Mack–Skillings rank test on probe-level data. The proposed EQRS is shown to be favorable for preserving false discovery rates and for being robust against outlying arrays. In addition, we demonstrate the merits of the proposed approach using a GeneChip study comparing gene expression in the livers of mice exposed to chronic intermittent hypoxia and of those exposed to intermittent room air.

14.
Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated based on a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes for ribosomal or structural proteins often have a small variance, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). The EVE algorithm performs better than the t-test and LIMMA on some real-world data, where external information from appropriate databases is available. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates can guarantee the identification of nearly all truly differentially expressed genes. However, our simulation studies suggest that even limited numbers of replicates will usually result in good coverage of strongly differentially expressed genes.

15.
Conventional statistical methods for interpreting microarray data require large numbers of replicates in order to provide sufficient levels of sensitivity. We recently described a method for identifying differentially-expressed genes in one-channel microarray data [1]. Based on the idea that the variance structure of microarray data can itself be a reliable measure of noise, this method allows statistically sound interpretation of as few as two replicates per treatment condition. Unlike the one-channel array, the two-channel platform simultaneously compares gene expression in two RNA samples. This leads to covariation of the measured signals. Hence, by accounting for covariation in the variance model, we can significantly increase the power of the statistical test. We believe that this approach has the potential to overcome limitations of existing methods. We present here a novel approach for the analysis of microarray data that involves modeling the variance structure of paired expression data in the context of a Bayesian framework. We also describe a novel statistical test that can be used to identify differentially-expressed genes. This method, bivariate microarray analysis (BMA), demonstrates dramatically improved sensitivity over existing approaches. We show that with only two array replicates, it is possible to detect gene expression changes that are at best detected with six array replicates by other methods. Further, we show that combining results from BMA with Gene Ontology annotation yields biologically significant results in a ligand-treated macrophage cell system.

16.

Background  

In a time-course microarray experiment, the expression level for each gene is observed across a number of time-points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data.

17.
18.
19.
The effect of replication on gene expression microarray experiments   (Total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria as well as the stability of the results in the face of changing levels of replication. RESULTS: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10-15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.
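The random sampling idea in the abstract above can be sketched as follows: repeatedly draw `k` replicates per group, select genes from each draw, and measure how much the selected gene lists agree. Everything specific here is an assumption made for illustration: the data layout (dictionaries mapping a gene name to its replicate values), ranking by absolute mean difference in place of the article's statistical criteria, and the Jaccard overlap of consecutive draws as the stability measure.

```python
import random
from statistics import mean


def top_genes(expr_a, expr_b, k, n_top, rng):
    """Top n_top genes by |mean difference| using k random replicates per group."""
    n_a = len(next(iter(expr_a.values())))
    n_b = len(next(iter(expr_b.values())))
    cols_a = rng.sample(range(n_a), k)
    cols_b = rng.sample(range(n_b), k)
    score = {
        gene: abs(mean(expr_a[gene][i] for i in cols_a)
                  - mean(expr_b[gene][j] for j in cols_b))
        for gene in expr_a
    }
    return set(sorted(score, key=score.get, reverse=True)[:n_top])


def stability(expr_a, expr_b, k, n_top=2, n_draws=50, seed=0):
    """Mean Jaccard overlap between gene lists from consecutive random draws."""
    rng = random.Random(seed)
    lists = [top_genes(expr_a, expr_b, k, n_top, rng) for _ in range(n_draws)]
    overlaps = [len(x & y) / len(x | y) for x, y in zip(lists, lists[1:])]
    return mean(overlaps)
```

Computing `stability` for increasing `k` on a real data set would show how quickly the selected gene list settles as replication grows, which is the kind of curve the abstract summarizes.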

20.