Similar literature
20 similar documents found.
1.
Identification of differentially expressed (DE) genes across two conditions is a common task with microarrays. Most existing approaches accomplish this goal by examining each gene separately based on a model and then controlling the false discovery rate over all genes. We took a different approach that employs a uniform platform to simultaneously depict the dynamics of the gene trajectories for all genes and select differentially expressed genes. A new Functional Principal Component (FPC) approach is developed for time-course microarray data to borrow strength across genes. The approach is flexible, as the temporal trajectory of the gene expressions is modeled nonparametrically through a set of orthogonal basis functions, and often fewer basis functions are needed to capture the shape of the gene expression trajectory than with existing nonparametric methods. These basis functions are estimated from the data and reflect the major modes of variation in the data. The correlation structure of the gene expressions over time is also incorporated without any parametric assumptions and is estimated from all genes, so that information from other genes can be shared to infer any individual gene. Estimation of the parameters is carried out by an efficient hybrid EM algorithm. In simulations across different scenarios, the proposed method compared favorably to two-way mixed-effects ANOVA and to the EDGE method using B-spline basis functions. Application to real data on C. elegans developmental stages also suggested that FPC analysis combined with the hybrid EM algorithm provides a computationally fast and efficient method for identifying DE genes based on time-course microarray data.
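The per-gene strategy mentioned at the start of this abstract (test each gene, then control the false discovery rate over all genes) can be sketched as follows. This is a minimal illustration, assuming the Benjamini-Hochberg step-up adjustment, which is one common FDR procedure; the abstract does not say which procedure the compared methods use, and the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Gene indices sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from any single-gene test.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)
# Genes whose adjusted p-value falls below the chosen FDR level are called DE.
called_de = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

Calling genes with an adjusted value at or below 0.05 aims for an expected false discovery proportion of at most 5% among the genes called.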

2.
Li X, Feltus FA, Sun X, Wang JZ, Luo F. Proteomics, 2011, 11(19): 3845-3852
Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by the Ising model can improve classification performance compared with DE genes identified by the Markov random field model.

3.
MOTIVATION: ANOVA is a technique that is frequently used in the analysis of microarray data, e.g. to assess the significance of treatment effects and to select interesting genes based on P-values. However, it does not give information about what exactly is causing the effect. Our purpose is to improve the interpretation of the results from ANOVA on large microarray datasets by applying PCA to the individual variance components. Interaction effects can be visualized by biplots, showing genes and variables in one plot, providing insight into the effect of e.g. treatment or time on gene expression. Because ANOVA has removed uninteresting sources of variance, the results are much more interpretable than without ANOVA. Moreover, the combination of ANOVA and PCA provides a simple way to select genes based on the interactions of interest. RESULTS: It is shown that the components from an ANOVA model can be summarized and visualized with PCA, which improves the interpretability of the models. The method is applied to a real time-course gene expression dataset of mesenchymal stem cells. The dataset was designed to investigate the effect of different treatments on osteogenesis. The biplots generated with the algorithm give specific information about the effects of specific treatments on genes over time. These results are in agreement with the literature. The biological validation with GO annotation of the genes present in the selections shows that biologically relevant groups of genes are selected. AVAILABILITY: R code with the implementation of the method for this dataset is available from http://www.cac.science.ru.nl under the heading "Software".

4.
5.
Identifying differentially expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is often needed for the assessment of statistical significance in most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq). Due to a common limit in sample size, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, hence improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.
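The shrinkage idea in this abstract can be illustrated with a short sketch. It assumes the standard conjugate-prior form, in which the shrunken variance is a degrees-of-freedom-weighted average of a prior variance and the sample variance; the abstract does not give the authors' actual estimator, and the function name, the prior values, and the sample values below are all hypothetical.

```python
def shrink_variance(sample_var, sample_df, prior_var, prior_df):
    """Weighted average of prior and sample variance, weighted by degrees of freedom."""
    return (prior_df * prior_var + sample_df * sample_var) / (prior_df + sample_df)


# Hypothetical gene: a small experiment (2 residual degrees of freedom) gives an
# unstable, very low sample variance; historical data on the same gene suggest a
# larger variance, here treated as a prior worth 4 degrees of freedom.
shrunken = shrink_variance(sample_var=0.04, sample_df=2, prior_var=0.25, prior_df=4)
```

The shrunken value always lies between the sample variance and the prior variance, and it moves toward the sample variance as the experiment gains replicates.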

6.
A common goal of microarray and related high-throughput genomic experiments is to identify genes that vary across biological conditions. Most often this is accomplished by identifying genes with changes in mean expression level, so-called differentially expressed (DE) genes, and a number of effective methods for identifying DE genes have been developed. Although useful, these approaches do not accommodate other types of differential regulation. An important example concerns differential coexpression (DC). Investigations of this class of genes are hampered by the large cardinality of the space to be interrogated as well as by influential outliers. As a result, existing DC approaches are often underpowered, exceedingly prone to false discoveries, and/or computationally intractable for even a moderately large number of pairs. To address this, an empirical Bayesian approach for identifying DC gene pairs is developed. The approach provides a false discovery rate controlled list of significant DC gene pairs without sacrificing power. It is applicable within a single study as well as across multiple studies. Computations are greatly facilitated by a modification to the expectation-maximization algorithm and a procedural heuristic. Simulations suggest that the proposed approach outperforms existing methods in far less computational time, and case study results suggest that the approach will likely prove to be a useful complement to current DE methods in high-throughput genomic studies.

7.
Statistical tests for differential expression in cDNA microarray experiments (total citations: 13; self-citations: 0; citations by others: 13)
Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.
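The two-condition t test mentioned above can be sketched per gene as follows. This is a minimal illustration, assuming Welch's unequal-variance form of the statistic; the abstract does not specify which variant is meant, and the gene names and log-expression values are hypothetical. Turning the statistic into a p-value needs the t distribution, which is outside the Python standard library (for example, `scipy.stats.ttest_ind` provides it).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical log-expression values: gene -> (condition A replicates, condition B replicates).
expression = {
    "gene1": ([8.1, 8.3, 7.9], [8.0, 8.2, 8.1]),
    "gene2": ([5.0, 5.2, 4.9], [7.1, 6.8, 7.3]),
}

t_stats = {gene: welch_t(a, b) for gene, (a, b) in expression.items()}
# Rank genes by the absolute size of the statistic, strongest evidence first.
ranked = sorted(t_stats, key=lambda gene: abs(t_stats[gene]), reverse=True)
```

Replication is what makes the statistic computable: with a single sample per condition the within-condition variance cannot be estimated.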

8.
Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent “noise” within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretations in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

9.
MOTIVATION: The field of microarray data analysis is shifting emphasis from methods for identifying differentially expressed genes to methods for identifying differentially expressed gene categories. The latter approaches utilize a priori information about genes to group genes into categories and enhance the interpretation of experiments aimed at identifying expression differences across treatments. While almost all of the existing approaches for identifying differentially expressed gene categories are practically useful, they suffer from a variety of drawbacks. Perhaps most notably, many popular tools are based exclusively on gene-specific statistics that cannot detect many types of multivariate expression change. RESULTS: We have developed a nonparametric multivariate method for identifying gene categories whose multivariate expression distribution differs across two or more conditions. We illustrate our approach and compare its performance to several existing procedures via the analysis of a real data set and a unique data-based simulation study designed to capture the challenges and complexities of practical data analysis. We show that our method has good power for differentiating between differentially expressed and non-differentially expressed gene categories, and we utilize a resampling based strategy for controlling the false discovery rate when testing multiple categories. AVAILABILITY: R code (www.r-project.org) for implementing our approach is available from the first author by request.

10.
This article focuses on microarray experiments with two or more factors in which treatment combinations of the factors corresponding to the samples paired together onto arrays are not completely random. A main effect of one (or more) factor(s) is confounded with arrays (the experimental blocks). This is called a split-plot microarray experiment. We utilise an analysis of variance (ANOVA) model to assess differentially expressed genes for between-array and within-array comparisons that are generic under a split-plot microarray experiment. Instead of standard t- or F-test statistics that rely on mean square errors of the ANOVA model, we use a robust method, referred to as 'a pooled percentile estimator', to identify genes that are differentially expressed across different treatment conditions. We illustrate the design and analysis of split-plot microarray experiments based on a case application described by Jin et al. A brief discussion of power and sample size for split-plot microarray experiments is also presented.

11.
Time course microarray experiments designed to characterize the dynamic regulation of gene expression in biological systems are becoming increasingly important. One critical issue that arises when examining time course microarray data is the identification of genes that show different temporal expression patterns among biological conditions. Here we propose a Bayesian hierarchical model to incorporate important experimental factors and to account for correlated gene expression measurements over time and over different genes. A new gene selection algorithm is also presented with the model to simultaneously identify genes that show changes in expression among biological conditions, in response to time and other experimental factors of interest. The algorithm performs well in terms of the false positive and false negative rates in simulation studies. The methodology is applied to a mouse model time course experiment to correlate temporal changes in azoxymethane-induced gene expression profiles with colorectal cancer susceptibility.

12.
13.
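When breast cancer metastasis-associated differentially expressed genes and their functions are identified from gene expression profiles, the differentially expressed genes found by different studies show low reproducibility, because inter-individual variation in gene expression is relatively high and sample sizes are relatively small. Based on two breast cancer metastasis gene expression profile data sets, this paper evaluates the reproducibility of the two sets of differentially expressed genes and of the functions in which they are enriched. The results show that the differentially expressed genes identified in the two data sets are highly consistent in their direction of expression change and show significant expression correlation. The metastasis-related functions identified from the two sets of differentially expressed genes are highly reproducible across the two data sets and mainly involve cell division, the cell cycle, DNA replication, chromosome segregation, phosphoinositide-mediated signaling, and the response to DNA damage stimulus.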

14.
15.
On gene ranking using replicated microarray time course data (total citations: 1; self-citations: 0; citations by others: 1)
Tai YC, Speed TP. Biometrics, 2009, 65(1): 40-51
Summary. Consider the ranking of genes using data from replicated microarray time course experiments, where there are multiple biological conditions, and the genes of interest are those whose temporal profiles differ across conditions. We derive a multisample multivariate empirical Bayes' statistic for ranking genes in the order of differential expression, from both longitudinal and cross-sectional replicated developmental microarray time course data. Our longitudinal multisample model assumes that time course replicates are independent and identically distributed multivariate normal vectors. On the other hand, we construct a cross-sectional model using a normal regression framework with any appropriate basis for the design matrices. In both cases, we use natural conjugate priors in our empirical Bayes' setting which guarantee closed form solutions for the posterior odds. The simulations and two case studies using published worm and mouse microarray time course datasets indicate that the proposed approaches perform satisfactorily.

16.
17.
Little consideration has been given to the effect of different segmentation methods on the variability of data derived from microarray images. Previous work has suggested that the significant source of variability from microarray image analysis is from estimation of local background. In this study, we used Analysis of Variance (ANOVA) models to investigate the effect of methods of segmentation on the precision of measurements obtained from replicate microarray experiments. We used four different methods of spot segmentation (adaptive, fixed circle, histogram and GenePix) to analyse a total number of 156,172 spots from 12 microarray experiments. Using a two-way ANOVA model and the coefficient of repeatability, we show that the method of segmentation significantly affects the precision of the microarray data. The histogram method gave the lowest variability across replicate spots compared to other methods, and had the lowest pixel-to-pixel variability within spots. This effect on precision was independent of background subtraction. We show that these findings have direct, practical implications as the variability in precision between the four methods resulted in different numbers of genes being identified as differentially expressed. Segmentation method is an important source of variability in microarray data that directly affects precision and the identification of differentially expressed genes.

18.
As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the optimal discovery procedure (ODP), which has recently been introduced and theoretically shown to optimally perform multiple significance tests. Whereas existing procedures essentially use data from only one feature at a time, the ODP approach uses the relevant information from the entire data set when testing each feature. In particular, we propose a generally applicable estimate of the ODP for identifying differentially expressed genes in microarray experiments. This microarray method consistently shows favorable performance over five highly used existing methods. For example, in testing for differential expression between two breast cancer tumor types, the ODP provides increases from 72% to 185% in the number of genes called significant at a false discovery rate of 3%. Our proposed microarray method is freely available to academic users in the open-source, point-and-click EDGE software package.

19.
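By comparing transcriptome data from dengue fever patients and healthy individuals, this study identifies differentially expressed genes, constructs a dysregulated ceRNA network, screens key genes for enrichment analysis, and interprets their potential biological functions, in support of research on diagnostic markers for dengue fever. Peripheral blood microarray data for dengue fever were downloaded from the GEO database, and differentially expressed genes were identified and subjected to enrichment analysis. Combined with miRNA-mRNA interaction data, dysregulated ceRNA interaction pairs in dengue fever were identified using the hypergeometric test and Pearson correlation. The ceRNA network was visualized and mined for modules with Cytoscape, and the network modules were subjected to functional enrichment analysis and to validation of their expression patterns in external data. A total of 251 differentially expressed genes were identified and found to be enriched in biological pathways such as the cell cycle. Validation in external data showed that the expression trends of the network module genes were broadly the same as in the training set, indicating the potential diagnostic value of the module genes in dengue fever. This study may provide ideas for identifying effective molecular markers for disease diagnosis.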
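The two statistics named in this abstract for identifying ceRNA pairs can be sketched as follows. This is a minimal illustration, assuming the usual ceRNA setup in which the hypergeometric test asks whether two transcripts share more regulating miRNAs than expected by chance and the Pearson correlation measures their co-expression; the abstract gives no thresholds or implementation details, so the function names, counts, and expression values are hypothetical.

```python
from math import comb, sqrt
from statistics import mean


def hypergeom_upper_tail(shared, n_a, n_b, total):
    """P(X >= shared) when transcript A is targeted by n_a of `total` miRNAs
    and transcript B by n_b of them, under random overlap."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k) for k in range(shared, upper + 1)
    )
    return favourable / comb(total, n_b)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return covariance / spread


# Hypothetical pair: each transcript is targeted by 5 of 20 miRNAs, 4 of them shared.
p_shared = hypergeom_upper_tail(shared=4, n_a=5, n_b=5, total=20)
# Hypothetical expression of the two transcripts across four samples.
r = pearson([2.0, 3.1, 4.2, 5.0], [1.1, 1.9, 3.2, 3.8])
```

A pair would typically be kept as a candidate ceRNA interaction only when the overlap p-value is small and the correlation is positive and strong; the cut-offs are a design choice not stated in the abstract.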

20.
The work presented here is a first step toward a long term goal of systems biology, the complete elucidation of the gene regulatory networks of a living organism. To this end, we have employed DNA microarray technology to identify genes involved in the regulatory networks that facilitate the transition of Escherichia coli cells from an aerobic to an anaerobic growth state. We also report the identification of a subset of these genes that are regulated by a global regulatory protein for anaerobic metabolism, FNR. Analysis of these data demonstrated that the expression of over one-third of the genes expressed during growth under aerobic conditions is altered when E. coli cells transition to an anaerobic growth state, and that the expression of 712 (49%) of these genes is either directly or indirectly modulated by FNR. The results presented here also suggest interactions between the FNR and the leucine-responsive regulatory protein (Lrp) regulatory networks. Because computational methods to analyze and interpret high dimensional DNA microarray data are still at an early stage, and because basic issues of data analysis are still being sorted out, much of the emphasis of this work is directed toward the development of methods to identify differentially expressed genes with a high level of confidence. In particular, we describe an approach for identifying gene expression patterns (clusters) obtained from multiple perturbation experiments based on a subset of genes that exhibit high probability for differential expression values.
