Similar articles
 Found 20 similar articles; search time 15 ms
1.
Identifying differentially expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is often needed for the assessment of statistical significance in most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq). Due to a common limit in sample size, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, hence improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.
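As a hedged illustration of the shrinkage idea in this abstract, the sketch below uses the standard conjugate form in which a gene's sample variance is averaged with a prior variance, weighted by degrees of freedom. The function names, the scaled inverse chi-square prior, and the notion of taking `prior_var` and `prior_df` from historical data on the same gene are assumptions made for illustration; the abstract does not state the authors' exact estimator.

```python
import math


def shrink_variance(sample_var, df, prior_var, prior_df):
    """Degrees-of-freedom weighted average of sample and prior variances.

    Hypothetical helper assuming a scaled inverse chi-square prior;
    not the authors' code.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


def moderated_t(mean_diff, shrunk_var, n1, n2):
    """Two-group t-like statistic that uses the shrunken variance."""
    return mean_diff / math.sqrt(shrunk_var * (1.0 / n1 + 1.0 / n2))


# Example: a noisy 2-df sample variance of 4.0 is pulled toward a prior
# variance of 1.0 that carries 6 prior degrees of freedom.
s2 = shrink_variance(sample_var=4.0, df=2, prior_var=1.0, prior_df=6)
t = moderated_t(mean_diff=1.5, shrunk_var=s2, n1=2, n2=2)
```

The larger `prior_df` is relative to `df`, the more the estimate leans on the prior, which is why a well-supported gene-specific prior could matter most in small experiments.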

2.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.

3.
MOTIVATION: The issue of high dimensionality in microarray data has been, and remains, a hot topic in statistical and computational analysis. Efficient gene filtering and differentiation approaches can reduce the dimensions of data, help to remove redundant genes and noise, and highlight the most relevant genes that are major players in the development of certain diseases or the effect of drug treatment. The purpose of this study is to investigate the efficiency of parametric (including Bayesian and non-Bayesian, linear and non-linear), non-parametric and semi-parametric gene filtering methods through the application of time course microarray data from multiple sclerosis patients being treated with interferon-beta-1a. The analysis of variance with bootstrapping (parametric), class dispersion (semi-parametric) and Pareto (non-parametric) with permutation methods are presented and compared for filtering and finding differentially expressed genes. The Bayesian linear correlated model, the Bayesian non-linear model and the non-Bayesian mixed effects model with bootstrap were also developed to characterize the differential expression patterns. Furthermore, trajectory-clustering approaches were developed in order to investigate the dynamic patterns and inter-dependency of drug treatment effects on gene expression. RESULTS: Results show that the presented methods performed significantly differently, but all were adequate in capturing a small number of the genes potentially relevant to the disease. The parametric methods, such as the mixed model and the two Bayesian approaches, proved to be more conservative. This may be because these methods are based on overall variation in expression across all time points. The semi-parametric (class dispersion) and non-parametric (Pareto) methods were appropriate in capturing variation in expression from time point to time point, thereby making them more suitable for investigating significant monotonic changes and trajectories of changes in gene expression in time course microarray data. Also, the non-linear Bayesian model proved to be less conservative than the linear Bayesian correlated growth model in filtering out the redundant genes, although the linear model showed a better fit than the non-linear model (smaller DIC). We also report the trajectories of significant genes, since we have been able to isolate trajectories of genes whose regulation appears to be inter-dependent.

4.
5.
Combining information across genes in the statistical analysis of microarray data is desirable because of the relatively small number of data points obtained for each individual gene. Here we develop an estimator of the error variance that can borrow information across genes using the James-Stein shrinkage concept. A new test statistic (FS) is constructed using this estimator. The new statistic is compared with other statistics used to test for differential expression: the gene-specific F test (F1), the pooled-variance F statistic (F3), a hybrid statistic (F2) that uses the average of the individual and pooled variances, the regularized t-statistic, the posterior odds statistic B, and the SAM t-test. The FS-test shows the best or nearly the best power for detecting differentially expressed genes over a wide range of simulated data in which the variance components associated with individual genes are either homogeneous or heterogeneous. Thus FS provides a powerful and robust approach to testing differential expression of genes that utilizes information not available in individual gene testing approaches and does not suffer from the biases of the pooled variance approach.
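The abstract names only "the James-Stein shrinkage concept", so the following is a generic sketch rather than the FS estimator itself. It applies positive-part James-Stein shrinkage toward the across-gene mean on the log-variance scale. The log scale, the function names, and the assumption that the sampling variance of each log-variance (`log_sampling_var`) is known are all illustrative choices, and no bias correction for the log transform is included.

```python
import math


def james_stein_shrink(values, sampling_var):
    """Positive-part James-Stein shrinkage of each value toward the mean."""
    g = len(values)
    if g < 4:
        raise ValueError("need at least 4 values to shrink toward the mean")
    mean = sum(values) / g
    ss = sum((v - mean) ** 2 for v in values)
    if ss == 0.0:
        # All values already agree, so there is nothing to shrink.
        return [mean] * g
    # Shrinkage factor, clipped at zero so values never overshoot the mean.
    factor = max(0.0, 1.0 - (g - 3) * sampling_var / ss)
    return [mean + factor * (v - mean) for v in values]


def shrink_variances(variances, log_sampling_var):
    """Shrink gene-wise variances on the log scale (hypothetical helper)."""
    logs = [math.log(v) for v in variances]
    return [math.exp(x) for x in james_stein_shrink(logs, log_sampling_var)]
```

When the observed spread of the log-variances is small relative to their sampling noise, the factor approaches zero and every gene receives nearly the pooled value; when the spread is large, the gene-specific estimates are left almost untouched.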

6.
The increased availability of microarray data has been calling for statistical methods to integrate findings across studies. A common goal of microarray analysis is to determine differentially expressed genes between two conditions, such as treatment vs control. A recent Bayesian meta-analysis model used a prior distribution for the mean log-expression ratios that was a mixture of two normal distributions. This model centered the prior distribution of differential expression at zero, and separated genes into two groups only: expressed and nonexpressed. Here, we introduce a Bayesian three-component truncated normal mixture prior model that more flexibly assigns prior distributions to the differentially expressed genes and produces three groups of genes: upregulated, downregulated, and nonexpressed. We found in simulations of two and five studies that the three-component model outperformed the two-component model using three comparison measures. When analyzing biological data of Bacillus subtilis, we found that the three-component model discovered more genes and omitted fewer genes for the same levels of posterior probability of differential expression than the two-component model, and discovered more genes for fixed thresholds of Bayesian false discovery. We assumed that the data sets were produced from the same microarray platform and were prescaled.

7.
SUMMARY: Using replicated human serum samples, we applied an error model for proteomic differential expression profiling for a high-resolution liquid chromatography-mass spectrometry (LC-MS) platform. The detailed noise analysis presented here uses an experimental design that separates variance caused by sample preparation from variance due to analytical equipment. An analytic approach based on a two-component error model was applied, and in combination with an existing data-driven technique that utilizes local sample averaging, we characterized and quantified the noise variance as a function of mean peak intensity. The results indicate that for processed LC-MS data a constant coefficient of variation is dominant for high intensities, whereas a model for low intensities explains Poisson-like variations. This result leads to a quadratic variance model which is used for the estimation of sample preparation noise present in LC-MS data.
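One way to read the quadratic variance model described here is as the sum of a Poisson-like term (variance proportional to the mean) and a constant-CV term (variance proportional to the squared mean). The sketch below encodes that reading; the functional form, the parameter names `poisson_scale` and `cv`, and the helper names are assumptions, since the abstract does not give the fitted model.

```python
def noise_variance(mean_intensity, poisson_scale, cv):
    """Assumed quadratic model: Poisson-like term plus constant-CV term."""
    return poisson_scale * mean_intensity + (cv * mean_intensity) ** 2


def crossover_intensity(poisson_scale, cv):
    """Mean intensity at which the two terms contribute equally."""
    return poisson_scale / cv ** 2


# Below the crossover the linear (Poisson-like) term dominates;
# above it the quadratic (constant-CV) term dominates.
low = noise_variance(10.0, poisson_scale=2.0, cv=0.1)
high = noise_variance(10000.0, poisson_scale=2.0, cv=0.1)
```

Under this form, the coefficient of variation tends to `cv` as the intensity grows, which matches the abstract's statement that a constant CV dominates at high intensities.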

8.
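To uncover genetic information that affects the rate of growth and development in fish, this study applied mRNA differential display to two groups of turbot hatched from the same batch of fertilized eggs and reared in the same pond, and examined differential gene expression in the muscle tissue of 3-month-old turbot whose body lengths differed greatly under identical growth conditions. Results: 18 primer combinations displayed a total of 723 bands, of which 527 were reproducible, giving a reproducibility rate of 72.89% for the differential display bands. Of the 527 stable bands, 21 were differential bands and the remaining 506 were shared bands (96.02%). Of the 21 differentially expressed bands, 16 (3.04%) were positive differentially expressed cDNA fragments. The presence of these fragments indicates that gene expression differs between the two groups of turbot with markedly different body lengths. The results lay a foundation for further analysis of the relationship between the differentially expressed genes and growth traits in turbot, and for in-depth study of the molecular genetic mechanisms underlying growth and development traits in turbot.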
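The percentages reported in this abstract can be re-derived from its band counts. The short check below does so; the variable names are illustrative, and the choice of 527 (the reproducible bands) as the denominator for the 3.04% figure is an inference from the numbers, not something the abstract states.

```python
# Band counts as reported in the abstract.
total_bands = 723
reproducible = 527
differential = 21
positive = 16

# Shared bands are the reproducible bands that are not differential.
shared = reproducible - differential

reproducibility_pct = round(100 * reproducible / total_bands, 2)
shared_pct = round(100 * shared / reproducible, 2)
# Assumed denominator: the 527 reproducible bands.
positive_pct = round(100 * positive / reproducible, 2)
```

All three recomputed values agree with the reported 72.89%, 96.02%, and 3.04%.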

9.
Fluorescent differential display (FDD) has been used to screen for cDNAs that are differentially up-regulated in male flowers of the dioecious plant Silene latifolia, in which an X/Y chromosome system of sex determination operates. To adapt FDD to the cloning of large numbers of differential cDNAs, a novel method of confirming their differential expression has been devised. FDD gels were Southern electro-blotted and probed with mixtures of individual cDNA clones derived from different FDD product ligation reactions. These Southern blots were then stripped and re-probed with further mixtures of individual cloned FDD products to identify the maximum number of recombinant clones carrying the true differential amplification products. Of 135 differential bands identified by FDD, 56 differential amplification products were confirmed; these represent 23 unique differentially expressed genes as determined by virtual Northern analysis and two genes expressed at or below the level of detection by virtual Northern analysis. These two weakly expressed genes show bands of hybridization on genomic Southern blots that are specific to male plants, indicating that they are derived from, or closely related to, Y chromosome genes.

10.
Wang J  Jia M  Zhu L  Yuan Z  Li P  Chang C  Luo J  Liu M  Shi T 《PloS one》2010,5(10):e13721
Many methods, including parametric, nonparametric, and Bayesian methods, have been used for detecting differentially expressed genes based on the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, those methods do not simultaneously consider means, variances, and higher moments, resulting in a relatively high false positive rate. To overcome these limitations, the SWang test is proposed to determine differentially expressed genes according to the equality of distributions between case and control. Our method not only latently incorporates functional relationships among genes to account for nonlinear biological systems but also considers the mean, variance, skewness, and kurtosis of expression profiles simultaneously. To illustrate the biological significance of higher moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis could contain useful information on functional association among genes in microarrays. Simulations and real microarray results show that the false positive rate of SWang is lower than that of currently popular methods (T-test, F-test, SAM, and Fold-change), with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data with imperceptible differential expression but greater variation in kurtosis and skewness. Those identified genes were confirmed with previously published literature or RT-PCR experiments performed in our lab.

11.
MOTIVATION: When analyzing microarray data, non-biological variation introduces uncertainty in the analysis and interpretation. In this paper we focus on the validation of significant differences in gene expression levels, or normalized channel intensity levels, with respect to different experimental conditions and with replicated measurements. A myriad of methods have been proposed to study differences in gene expression levels and to assign significance values as a measure of confidence. In this paper we compare several methods, including SAM, regularized t-test, mixture modeling, Wilks' lambda score and variance stabilization. From this comparison we developed a weighted resampling approach and applied it to gene deletions in Mycobacterium bovis. RESULTS: We discuss the assumptions, model structure, computational complexity and applicability to microarray data. The results of our study justified the theoretical basis of the weighted resampling approach, which clearly outperforms the others.

12.
In many applications, an understanding of genes that are differentially expressed in different tissues, or in response to an applied stimulus, is important. However, the wide use of two rather similar polymerase chain reaction (PCR)-based techniques for the identification of differentially expressed mRNAs (RNA fingerprinting by arbitrarily primed PCR [RAP-PCR] and differential display [DDRT-PCR]) has shown that reproducibility is still a problem. By combining features of both RAP-PCR and DDRT-PCR, a technique has recently been developed that avoids some of the disadvantages, but the use of radioisotopes for band detection still limits its application. We have improved this technique for analyzing differentially expressed mRNA by resolving the amplified products on nondenaturing polyacrylamide gels and subsequently staining the gels with silver nitrate. Our modification allows the identification of differentially expressed bands with very high accuracy. These bands can therefore be very easily reamplified and sequenced directly. Subsequently, the differential expression can be verified by semiquantitative RT-PCR with specific primers derived from sequence data. These improvements, together with nonradioactive sequencing techniques, make it possible to do DD analysis entirely without the health hazard posed by radioactivity. The nonradioisotopic differentially expressed mRNA-PCR (DEmRNA-PCR) is a reliable and useful modification of available differential expression methods.

13.
Differential analysis of DNA microarray gene expression data   Total citations: 6 (self-citations: 0; citations by others: 6)
Here, we briefly review the sources of experimental and biological variance that affect the interpretation of high-dimensional DNA microarray experiments. We discuss methods using a regularized t-test based on a Bayesian statistical framework that allow the identification of differentially regulated genes with a higher level of confidence than a simple t-test when only a few experimental replicates are available. We also describe a computational method for calculating the global false-positive and false-negative levels inherent in a DNA microarray data set. This method provides a probability of differential expression for each gene based on experiment-wide false-positive and -negative levels driven by experimental error and biological variance.

14.
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise but substantial challenges remain.

15.
16.
17.
MOTIVATION: Microarray experiments generate a high data volume. However, often due to financial or experimental considerations, e.g. lack of sample, there is little or no replication of the experiments or hybridizations. These factors, combined with the intrinsic variability associated with the measurement of gene expression, can result in an unsatisfactory detection rate of differential gene expression (DGE). Our motivation was to provide an easy-to-use measure of the success rate of DGE detection that could find routine use in the design of microarray experiments or in post-experiment assessment. RESULTS: In this study, we address the problem of both random errors and systematic biases in microarray experimentation. We propose a mathematical model for the measured data in microarray experiments and, on the basis of this model, present a t-based statistical procedure to determine DGE. We have derived a formula to determine the success rate of DGE detection that takes into account the number of microarrays, the number of genes, the magnitude of DGE, and the variance from biological and technical sources. The formula, and look-up tables based on it, can be used to assist in the design of microarray experiments. We also propose an ad hoc method for estimating the fraction of non-differentially expressed genes within a set of genes being tested. This will help to increase the power of DGE detection. AVAILABILITY: The functions to calculate the success rate of DGE detection have been implemented as a Java application, which is accessible at http://www.le.ac.uk/mrctox/microarray_lab/Microarray_Softwares/Microarray_Softwares.htm

18.
Hu J  Wright FA 《Biometrics》2007,63(1):41-49
The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.

19.
Combining multiple microarrays in the presence of controlling variables   Total citations: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Microarray technology enables the monitoring of expression levels for thousands of genes simultaneously. When the magnitude of the experiment increases, it becomes common to use the same type of microarrays from different laboratories or hospitals. Thus, it is important to analyze microarray data together to derive a combined conclusion after accounting for the differences. One of the main objectives of the microarray experiment is to identify differentially expressed genes among the different experimental groups. The analysis of variance (ANOVA) model has been commonly used to detect differentially expressed genes after accounting for the sources of variation commonly observed in the microarray experiment. RESULTS: We extended the usual ANOVA model to account for additional variability resulting from many confounding variables such as the effect of different hospitals. The proposed model is a two-stage ANOVA model. The first stage is the adjustment for the effects of no interest. The second stage is the detection of differentially expressed genes among the experimental groups using the residuals obtained from the first stage. Based on these residuals, we propose a permutation test to detect the differentially expressed genes. The proposed model is illustrated using the data from 133 microarrays collected at three different hospitals. The proposed approach is more flexible to use, and it is easier to accommodate the individual covariates in this model than using the meta-analysis approach. AVAILABILITY: A set of programs written in R will be electronically sent upon request.
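The two-stage procedure described in this abstract can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions, not the authors' R programs: stage 1 is reduced to subtracting each hospital's mean, stage 2 compares exactly two experimental groups by the absolute difference in mean residuals, and labels are permuted across all arrays without stratifying by hospital. The function names and example data are hypothetical.

```python
import random
from collections import defaultdict


def remove_site_effect(values, sites):
    """Stage 1 (assumed form): subtract each site's mean from its arrays."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, sites):
        totals[s] += v
        counts[s] += 1
    return [v - totals[s] / counts[s] for v, s in zip(values, sites)]


def permutation_pvalue(residuals, groups, n_perm=2000, seed=0):
    """Stage 2: Monte Carlo permutation test on the residuals.

    Assumes exactly two groups, each containing at least one array.
    """
    reference = groups[0]

    def abs_mean_diff(labels):
        a = [r for r, g in zip(residuals, labels) if g == reference]
        b = [r for r, g in zip(residuals, labels) if g != reference]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(groups)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the group sizes unchanged
        if abs_mean_diff(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene on 8 arrays from 2 hospitals.
expr = [5.1, 5.9, 7.0, 7.8, 8.1, 9.0, 10.2, 10.9]
hospital = ["A", "A", "A", "A", "B", "B", "B", "B"]
group = ["ctl", "ctl", "trt", "trt", "ctl", "ctl", "trt", "trt"]

resid = remove_site_effect(expr, hospital)
p_value = permutation_pvalue(resid, group)
```

Working on residuals means the large hospital-to-hospital offset in the example no longer inflates the within-group spread, so the treatment difference is easier to detect than it would be on the raw values.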

20.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates, leading to unsatisfactory variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared, it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ²‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. Applied to a real data set on maize endosperm, the method is shown to work well. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
