Similar Articles
1.
MOTIVATION: Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. In time-course experiments, in which gene expression is monitored over time, we are interested in comparing gene expression profiles across experimental groups. However, no sophisticated analytic methods have yet been proposed to handle time-course experiment data. RESULTS: We propose a statistical test procedure based on the ANOVA model to identify genes that have different gene expression profiles among experimental groups in time-course experiments. In particular, we propose a permutation test that does not require the normality assumption. For this test, we use residuals from an ANOVA model with time effects only. Using this test, we detect genes that have different gene expression profiles among experimental groups. The proposed model is illustrated using cDNA microarrays of 3840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells.
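The following is a minimal sketch of a residual permutation test for a single gene. It is not the authors' exact procedure. The test statistic is an assumption: the sum, over (group, time) cells, of n_cell times the squared mean residual. Residuals come from a time-only model, in which the fitted value is the mean at each time point, and they are permuted across observations. The names `cell_statistic` and `permutation_test` are hypothetical.

```python
import numpy as np


def cell_statistic(resid, groups, times):
    """Sum over (group, time) cells of n_cell * (mean residual in the cell)**2."""
    stat = 0.0
    for g in np.unique(groups):
        for t in np.unique(times):
            mask = (groups == g) & (times == t)
            if mask.any():
                stat += mask.sum() * resid[mask].mean() ** 2
    return float(stat)


def permutation_test(y, groups, times, n_perm=999, seed=0):
    """Permutation p-value for a group effect in one gene's time-course profile."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    times = np.asarray(times)
    # Residuals from a time-only ANOVA model: subtract the mean at each time point.
    resid = y.copy()
    for t in np.unique(times):
        mask = times == t
        resid[mask] -= y[mask].mean()
    observed = cell_statistic(resid, groups, times)
    # Under "no group effect" the residuals are treated as exchangeable.
    rng = np.random.default_rng(seed)
    exceed = sum(
        cell_statistic(rng.permutation(resid), groups, times) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only the time effect is removed, any systematic group or group-by-time pattern remains in the residual cell means. That is what the statistic picks up, and no normality assumption is involved.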

2.
We propose an algorithm for selecting and clustering genes according to their time-course or dose-response profiles using gene expression data. The proposed algorithm is based on the order-restricted inference methodology developed in statistics. We describe the methodology for time-course experiments although it is applicable to any ordered set of treatments. Candidate temporal profiles are defined in terms of inequalities among mean expression levels at the time points. The proposed algorithm selects genes when they meet a bootstrap-based criterion for statistical significance and assigns each selected gene to the best fitting candidate profile. We illustrate the methodology using data from a cDNA microarray experiment in which a breast cancer cell line was stimulated with estrogen for different time intervals. In this example, our method was able to identify several biologically interesting genes that previous analyses failed to reveal.
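Order-restricted fits for inequality-defined profiles can be computed with the pool-adjacent-violators algorithm (PAVA). The sketch below fits the time-point means under two simple candidate profiles, monotone increasing and monotone decreasing. It then assigns the gene to whichever fits better by weighted sum of squared errors. This is an illustrative assumption: the paper's full candidate set and its bootstrap significance criterion are not reproduced here. All function names are hypothetical.

```python
def pava_increasing(means, weights=None):
    """Weighted least-squares fit of means under mu_1 <= mu_2 <= ... <= mu_T
    (pool-adjacent-violators algorithm)."""
    if weights is None:
        weights = [1.0] * len(means)
    blocks = []  # each block: [pooled mean, total weight, number of time points]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool adjacent blocks while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for m, _, count in blocks:
        fitted.extend([m] * count)
    return fitted


def pava_decreasing(means, weights=None):
    """Fit under mu_1 >= ... >= mu_T by negating, fitting increasing, negating back."""
    return [-v for v in pava_increasing([-m for m in means], weights)]


def best_monotone_profile(means, weights=None):
    """Assign a profile to whichever monotone candidate has the smaller weighted SSE."""
    w = [1.0] * len(means) if weights is None else list(weights)
    candidates = {
        "increasing": pava_increasing(means, weights),
        "decreasing": pava_decreasing(means, weights),
    }
    sse = {
        name: sum(wi * (m - f) ** 2 for wi, m, f in zip(w, means, fit))
        for name, fit in candidates.items()
    }
    best = min(sse, key=sse.get)
    return best, candidates[best]
```

Weights would typically be the number of replicates at each time point, so that better-sampled time points pull harder on the pooled means.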

3.
MOTIVATION: Classifying genes into clusters depending on their expression profiles is one of the most important analysis techniques for microarray data. Because temporal gene expression profiles are indicative of the dynamic functional properties of genes, the application of clustering analysis to time-course data allows a more precise division of genes into functional classes. Conventional clustering methods treat the sampling data at each time point as data obtained under different experimental conditions, without considering the continuity of time-course data between time periods t and t+1. Here, we propose a method called mathematical model-based clustering (MMBC). RESULTS: MMBC was applied to artificial data and to time-course data obtained from Saccharomyces cerevisiae. Our method divides data into clusters more accurately and coherently than conventional clustering methods. Furthermore, MMBC is more tolerant of noise than conventional clustering methods. AVAILABILITY: Software is available upon request. CONTACT: taizo@brs.kyushu-u.ac.jp.

4.
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with combined statistical/bioinformatics analyses, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.

5.
MOTIVATION: Multi-series time-course microarray experiments are useful approaches for exploring biological processes. In this type of experiment, the researcher is frequently interested in studying gene expression changes over time and in evaluating trend differences between the various experimental groups. The large amount of data, the multiplicity of experimental conditions and the dynamic nature of the experiments pose great challenges to data analysis. RESULTS: In this work, we propose a statistical procedure to identify genes that show different gene expression profiles across analytical groups in time-course experiments. The method is a two-step regression approach in which the experimental groups are identified by dummy variables. The procedure first fits a global regression model with all the defined variables to identify differentially expressed genes. Second, a variable selection strategy is applied to study differences between groups and to find statistically significant different profiles. The methodology is illustrated on both a real and a simulated microarray dataset.
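To make the dummy-variable idea concrete, here is a simplified, per-gene sketch. It assumes a quadratic time trend; the abstract does not state the degree. The design has intercept and time columns, a group-B dummy, and dummy-by-time interaction columns. In place of the paper's variable selection step, it substitutes a nested F statistic comparing the full model with a time-only model. This is a swapped-in technique, not the authors' procedure, and all names are hypothetical.

```python
import numpy as np


def design_matrix(times, group_b, degree=2):
    """Columns: 1, t, ..., t^d, then D, D*t, ..., D*t^d, where D = 1 for group B."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(group_b, dtype=float)
    time_cols = [t ** k for k in range(degree + 1)]   # includes the intercept
    group_cols = [d * c for c in time_cols]           # dummy and its interactions
    return np.column_stack(time_cols + group_cols), len(time_cols)


def rss(X, y):
    """Least-squares fit; returns the residual sum of squares and coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def group_f_statistic(y, times, group_b, degree=2):
    """Nested F statistic: full dummy model versus time-only model for one gene."""
    y = np.asarray(y, dtype=float)
    X_full, n_time = design_matrix(times, group_b, degree)
    rss_full, beta = rss(X_full, y)
    rss_time, _ = rss(X_full[:, :n_time], y)
    df_num = X_full.shape[1] - n_time
    df_den = len(y) - X_full.shape[1]
    f = ((rss_time - rss_full) / df_num) / (rss_full / df_den)
    return f, beta
```

A large F suggests that the group dummies matter for this gene. The fitted coefficients on the interaction columns then indicate how the group-B trend departs from the reference group.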

6.

Background  

Time-course microarray experiments produce vector gene expression profiles across a series of time points. Clustering genes based on these profiles is important for discovering functionally related and co-regulated genes. Early clustering algorithms do not take advantage of the ordering in a time-course study, explicit use of which should allow more sensitive detection of genes that display a consistent pattern over time. Peddada et al. [1] proposed a clustering algorithm that can incorporate the temporal ordering using order-restricted statistical inference. This algorithm is, however, very time-consuming and hence inapplicable to most microarray experiments, which contain a large number of genes. Its computational burden also makes it difficult to assess the clustering reliability, which is a very important measure when clustering noisy microarray data.

7.
8.
Dynamic model-based clustering for time-course gene expression data   Times cited: 1 (self-citations: 0; citations by others: 1)
Microarray technology has produced a huge body of time-course gene expression data. Such data have proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in such data. Cluster analysis has played an important role in analyzing gene expression data. Many distance/correlation-based and static model-based clustering techniques have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterize the data, and they should be considered in cluster analysis in order to obtain high-quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The proposed method regards a time-course gene expression dataset as a set of time series generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. Computational experiments are performed on one synthetic and three real time-course gene expression datasets to investigate the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g., k-means) for time-course gene expression data, and thus it is a useful and powerful tool for analyzing such data.
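The adjusted Rand index that underlies the AARI quality measure compares two partitions of the same genes. It corrects the raw pair-agreement count for chance. Below is a self-contained sketch of the standard ARI formula. The `average_ari` helper is a hypothetical reading of "average over bootstrap clusterings"; the paper's exact AARI construction may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    # Pairs placed together in both partitions (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def average_ari(reference, resampled_labelings):
    """Hypothetical AARI: mean ARI of bootstrap clusterings against a reference."""
    scores = [adjusted_rand_index(reference, lab) for lab in resampled_labelings]
    return sum(scores) / len(scores)
```

The ARI is invariant to relabeling of clusters, so cluster IDs from different bootstrap runs do not need to be matched first.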

9.
TimeClust is a user-friendly software package to cluster genes according to their temporal expression profiles. It can be conveniently used to analyze data obtained from DNA microarray time-course experiments. It implements two original algorithms specifically designed for clustering short time series together with hierarchical clustering and self-organizing maps. AVAILABILITY: TimeClust executable files for Windows and LINUX platforms can be downloaded free of charge for non-profit institutions from the following web site: http://aimed11.unipv.it/TimeClust.

10.
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose two model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.

11.
On gene ranking using replicated microarray time course data   Times cited: 1 (self-citations: 0; citations by others: 1)
Tai YC, Speed TP. Biometrics, 2009, 65(1): 40-51
Summary. Consider the ranking of genes using data from replicated microarray time course experiments, where there are multiple biological conditions and the genes of interest are those whose temporal profiles differ across conditions. We derive a multisample multivariate empirical Bayes statistic for ranking genes in order of differential expression from both longitudinal and cross-sectional replicated developmental microarray time course data. Our longitudinal multisample model assumes that time course replicates are independent and identically distributed multivariate normal vectors. On the other hand, we construct a cross-sectional model using a normal regression framework with any appropriate basis for the design matrices. In both cases, we use natural conjugate priors in our empirical Bayes setting, which guarantee closed-form solutions for the posterior odds. The simulations and two case studies using published worm and mouse microarray time course datasets indicate that the proposed approaches perform satisfactorily.

12.
We study the effects of different normalization and pre-clustering techniques on clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). These are important issues to be addressed. DNA microarray experiments are informative tools for elucidating gene regulatory networks, but for gene expression levels to be comparable across microarrays, normalization procedures must be properly undertaken. The aim of pre-clustering is to use an adequate amount of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped together and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and symbolic representations such as {+, o, -}. As performance metrics, we use the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates a significant effect of normalization and pre-clustering methods on the clustering results. Hence, the outcome of this study is significant for fine-tuning the EP_GOS_Clust clustering approach.
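Symbolic pre-clustering can be sketched by encoding each step between consecutive time points as `+` (rise), `-` (fall) or `o` (roughly flat). Genes with identical symbol strings are then pre-grouped. An intra-cluster error sum can be computed as the squared distance of members to their centroid. The change threshold (0.5 here) and all function names are illustrative assumptions, not values or code from the study.

```python
from collections import defaultdict


def symbolize(profile, threshold=0.5):
    """Encode consecutive changes as '+', '-' or 'o' (|change| <= threshold)."""
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("+")
        elif delta < -threshold:
            symbols.append("-")
        else:
            symbols.append("o")
    return "".join(symbols)


def pregroup(profiles, threshold=0.5):
    """Group gene names whose profiles share the same symbol string."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[symbolize(profile, threshold)].append(gene)
    return dict(groups)


def intra_cluster_error(profiles, members):
    """Sum of squared Euclidean distances from member profiles to their centroid."""
    points = [profiles[g] for g in members]
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points)
```

Because the symbols depend only on changes between time points, two genes with the same shape but different baseline levels land in the same pre-group.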

13.
MOTIVATION: In haploinsufficiency profiling data, pleiotropic genes are often misclassified by clustering algorithms that impose the constraint that a gene or experiment belong to only one cluster. We have developed a general probabilistic model that clusters genes and experiments without requiring that a given gene or drug only appear in one cluster. The model also incorporates the functional annotation of known genes to guide the clustering procedure. RESULTS: We applied our model to the clustering of 79 chemogenomic experiments in yeast. Known pleiotropic genes PDR5 and MAL11 are more accurately represented by the model than by a clustering procedure that requires genes to belong to a single cluster. Drugs such as miconazole and fenpropimorph that have different targets but similar off-target genes are clustered more accurately by the model-based framework. We show that this model is useful for summarizing the relationship among treatments and genes affected by those treatments in a compendium of microarray profiles. AVAILABILITY: Supplementary information and computer code at http://genomics.lbl.gov/llda.

14.
Qin LX, Self SG. Biometrics, 2006, 62(2): 526-533
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method, the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied to data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, it yields an integrated framework for the analysis of microarray data. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.

15.
Hong F, Li H. Biometrics, 2006, 62(2): 534-544
Time-course studies of gene expression are essential in biomedical research to understand biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions for cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm was developed for estimating both the gene-specific parameters and the hyperparameters in the second level of modeling. We use a direct posterior probability approach to bound the rate of false discovery at a pre-specified level and evaluate the methods by simulations and application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggested that the procedure performs better than the two-way ANOVA in identifying TDE genes, resulting in both higher sensitivity and specificity. Genes identified from the C. elegans developmental data set show clear patterns of changes between the two experimental conditions.
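The abstract does not spell out how a direct posterior probability approach bounds the false discovery rate. The sketch below uses one common formulation, offered as an assumption. Given each gene's posterior probability of being non-TDE (null), genes are sorted from most to least convincing. The largest list whose average posterior null probability, the estimated FDR, stays at or below the target level is then called. The function name is hypothetical.

```python
def select_by_posterior(post_null, alpha=0.05):
    """Indices of genes called TDE such that the mean posterior null
    probability among called genes (estimated FDR) is <= alpha."""
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    selected = []
    running = 0.0
    for i in order:
        # Sorted ascending, so the running mean can only grow; stop at first breach.
        if (running + post_null[i]) / (len(selected) + 1) > alpha:
            break
        running += post_null[i]
        selected.append(i)
    return sorted(selected)
```

Unlike p-value thresholds, this rule works directly with the model's posterior probabilities, so no separate multiple-testing correction is applied.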

16.
17.
Microarray data acquired during time-course experiments allow the temporal variations in gene expression to be monitored. An original postprandial fasting experiment was conducted in the mouse, and the expression of 200 genes was monitored with a dedicated macroarray at 11 time points between 0 and 72 hours of fasting. The aim of this study was to provide a relevant clustering of gene expression temporal profiles. This was achieved by focusing on the shapes of the curves rather than on the absolute level of expression. Specifically, we combined spline smoothing and first-derivative computation with hierarchical and partitioning clustering. A heuristic approach was proposed to tune the spline smoothing parameter using both statistical and biological considerations. Clusters are illustrated a posteriori through principal component analysis and heatmap visualization. Most results were found to be in agreement with the literature on the effects of fasting on the mouse liver and provide promising directions for future biological investigations.
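The shape-focused idea can be sketched with SciPy as follows. Each profile is smoothed with a cubic smoothing spline, and its first derivative is evaluated on a fine grid. Average-linkage hierarchical clustering is then applied to the derivative curves. Because a constant offset has zero derivative, profiles with the same shape but different expression levels end up close together. The smoothing value, grid size, linkage method and function names are assumptions; the study tuned its smoothing parameter heuristically and also used partitioning clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.interpolate import UnivariateSpline


def derivative_features(times, profiles, smoothing=1.0, n_grid=50):
    """First derivative of a cubic smoothing spline per profile, on a common grid."""
    times = np.asarray(times, dtype=float)
    grid = np.linspace(times[0], times[-1], n_grid)
    feats = []
    for y in profiles:
        spline = UnivariateSpline(times, y, k=3, s=smoothing)
        feats.append(spline.derivative()(grid))
    return np.vstack(feats)


def cluster_shapes(times, profiles, n_clusters=2, smoothing=1.0):
    """Average-linkage hierarchical clustering of derivative curves."""
    feats = derivative_features(times, profiles, smoothing)
    Z = linkage(feats, method="average", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```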

18.
19.
MOTIVATION: Time series cDNA microarray experiments have been commonly used in various biological studies and conducted under many experimental factors. A popular approach to time series microarray analysis is to compare the expression profiles of one gene with those of another; clustering expression sequences is a typical example. On the other hand, a practically important issue in gene expression is to identify the general timing difference caused by experimental factors. This type of difference can be extracted by comparing a set of time series expression profiles under one factor with those under another, so it would be difficult to tackle this issue using only current approaches to time series microarray analysis. RESULTS: We have developed a systematic method, based on hidden Markov models, to capture the timing difference in gene expression under different experimental factors. Our model outputs a real-valued vector at each state and has a unique state transition diagram. The parameters of our model are trained from a given set of pairwise (generally multiplewise) expression sequences. We evaluated our model using synthetic as well as real microarray datasets. The results of our experiment indicate that our method worked well for identifying the timing order under different experimental factors; for example, gene expression under heat shock tended to start earlier than that under oxidative stress. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.
MOTIVATION: Cluster analysis of gene expression profiles has been widely applied to clustering genes for gene function discovery. Many approaches have been proposed. The rationale is that genes with the same biological function or involved in the same biological process are more likely to co-express, and hence more likely to form a cluster with similar gene expression patterns. However, most existing methods, including model-based clustering, ignore known gene functions in clustering. RESULTS: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions as prior probabilities in model-based clustering. In contrast to the global mixture model applied to all genes in standard model-based clustering, we use a stratified mixture model. One stratum corresponds to the genes of unknown function, while each of the other strata corresponds to the genes sharing the same biological function or pathway. Genes from the same stratum are assumed to have the same prior probability of coming from a cluster, while those from different strata are allowed to have different prior probabilities of coming from the same cluster. We derive a simple EM algorithm that can be used to fit the stratified model. A simulation study and an application to gene function prediction demonstrate the advantage of our proposal over the standard method. CONTACT: weip@biostat.umn.edu
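A minimal sketch shows how stratum-specific prior probabilities could enter an EM algorithm. It assumes spherical Gaussian components whose means and variances are shared by all genes, while each stratum has its own mixing proportions. This is an illustrative simplification, not the authors' derivation. The function name, the farthest-first initialization and the numerical floors are assumptions.

```python
import numpy as np


def stratified_em(X, strata, n_clusters, n_iter=100, seed=0):
    """EM for a spherical Gaussian mixture with stratum-specific mixing proportions."""
    X = np.asarray(X, dtype=float)
    strata = np.asarray(strata)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation of the component means.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    means = X[idx].copy()
    variances = np.full(n_clusters, X.var())
    levels = np.unique(strata)
    priors = {s: np.full(n_clusters, 1.0 / n_clusters) for s in levels}
    for _ in range(n_iter):
        # E-step: each gene's responsibilities use its own stratum's prior.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (n, K)
        log_dens = -0.5 * d * np.log(2 * np.pi * variances) - sq / (2 * variances)
        log_prior = np.vstack([np.log(np.maximum(priors[s], 1e-12)) for s in strata])
        log_post = log_prior + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: shared component parameters, per-stratum mixing proportions.
        weights = np.maximum(resp.sum(axis=0), 1e-12)
        means = (resp.T @ X) / weights[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = np.maximum((resp * sq).sum(axis=0) / (d * weights), 1e-6)
        for s in levels:
            priors[s] = resp[strata == s].mean(axis=0)
    return means, variances, priors, resp
```

The only change from a standard mixture EM is the prior update. Averaging responsibilities within each stratum lets a pathway whose genes concentrate in one cluster pull its remaining members toward that cluster.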
