Similar Articles
1.
We propose a new method for identifying and validating drug targets using gene networks estimated from cDNA microarray gene expression profile data. We created novel gene disruption and drug response microarray gene expression profile data libraries for the purpose of drug target elucidation. We use these two types of microarray gene expression profile data to estimate gene networks and then identify drug targets. The estimated gene networks play an essential role in understanding drug response data, and this information is unattainable from clustering methods, which are the standard for gene expression analysis. To construct the gene networks, we use the Bayesian network model. We use an actual example from the analysis of Saccharomyces cerevisiae gene expression profile data to illustrate a concrete strategy for applying gene network information to drug discovery.

2.
Dynamic model-based clustering for time-course gene expression data   Times cited: 1 (self-citations: 0; citations by others: 1)
Microarray technology has produced a huge body of time-course gene expression data. Such data have proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in them. Cluster analysis has played an important role in analyzing gene expression data, and many distance/correlation-based and static model-based clustering techniques have been applied to time-course expression data. However, these techniques cannot account for the dynamics of such data, even though it is the dynamics that characterize the data and that should be considered in cluster analysis to obtain high-quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The method regards a time-course gene expression data set as a set of time series generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are used to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are used to measure clustering quality. Computational experiments on one synthetic and three real time-course gene expression data sets are performed to investigate the method. The results show that it yields better-quality clustering than other clustering methods (e.g. k-means) for time-course gene expression data, and that it is a useful and powerful tool for analyzing such data.
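
The abstract does not spell out the relocation-iteration step. As a rough sketch only, assuming each cluster is an AR(1) process x_t = c + phi * x_{t-1} + noise with Gaussian noise and already-estimated parameters (the AR order, function names, and parameter values are hypothetical choices, not taken from the paper), the posterior-probability assignment of a single gene could look like this:

```python
import math

def ar1_loglik(series, c, phi, sigma):
    """Conditional Gaussian log-likelihood of a series under x_t = c + phi*x_{t-1} + eps."""
    ll = 0.0
    for prev, cur in zip(series, series[1:]):
        resid = cur - (c + phi * prev)
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return ll

def posterior_probs(series, models, priors):
    """Posterior probability that `series` was generated by each AR(1) cluster model."""
    logs = [math.log(p) + ar1_loglik(series, *m) for m, p in zip(models, priors)]
    top = max(logs)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def assign(series, models, priors):
    """Relocate a gene to the cluster with the highest posterior probability."""
    probs = posterior_probs(series, models, priors)
    return max(range(len(probs)), key=probs.__getitem__), probs

# Two hypothetical clusters: a persistent process and a rapidly oscillating one.
models = [(0.0, 0.9, 0.5), (0.0, -0.5, 0.5)]  # (c, phi, sigma) per cluster
priors = [0.5, 0.5]
gene = [1.0, 0.95, 0.9, 0.8, 0.75, 0.7]
label, probs = assign(gene, models, priors)
print(label, [round(p, 3) for p in probs])
```

In a full relocation-iteration loop, this assignment step would alternate with re-estimating each cluster's AR parameters from its current members until no gene changes cluster.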

3.
4.
5.
Tsai J  Sultana R  Lee Y  Pertea G  Karamycheva S  Antonescu V  Cho J  Parvizi B  Cheung F  Quackenbush J 《Genome Biology》2001,2(11):software0002.1-software0002.4
Microarray expression analysis is providing unprecedented data on gene expression in humans and mammalian model systems. Although such studies provide a tremendous resource for understanding human disease states, one of the significant challenges is cross-referencing the data derived from different species, across diverse expression analysis platforms, in order to properly derive inferences regarding gene expression and disease state. To address this problem, we have developed RESOURCERER, a microarray-resource annotation and cross-reference database built using the analysis of expressed sequence tags (ESTs) and gene sequences provided by the TIGR Gene Index (TGI) and TIGR Orthologous Gene Alignment (TOGA) databases [now called Eukaryotic Gene Orthologs (EGO)].

6.
Microarrays have become a standard tool for investigating gene function and more complex microarray experiments are increasingly being conducted. For example, an experiment may involve samples from several groups or may investigate changes in gene expression over time for several subjects, leading to large three-way data sets. In response to this increase in data complexity, we propose some extensions to the plaid model, a biclustering method developed for the analysis of gene expression data. This model-based method lends itself to the incorporation of any additional structure such as external grouping or repeated measures. We describe how the extended models may be fitted and illustrate their use on real data.
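
For readers unfamiliar with the plaid model: each layer adds an effect (mu + alpha_i + beta_j) to the genes i and samples j it covers. As a minimal sketch of the layer-parameter step in the original two-way model (not the three-way extension proposed here), assuming the layer's row and column memberships are already fixed, the least-squares estimates under the usual zero-sum constraints on alpha and beta reduce to submatrix means. The function names and the toy matrix are hypothetical.

```python
def fit_layer(Z, rows, cols):
    """Estimate (mu, alpha, beta) for one plaid layer on the residual matrix Z.

    rows/cols are the fixed gene and sample memberships of the layer.
    Under sum(alpha) = sum(beta) = 0 the least-squares solution is:
    mu = submatrix mean, alpha_i = row mean - mu, beta_j = column mean - mu.
    """
    sub = [[Z[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    mu = sum(sum(r) for r in sub) / (n_r * n_c)
    alpha = {i: sum(sub[a]) / n_c - mu for a, i in enumerate(rows)}
    beta = {j: sum(sub[a][b] for a in range(n_r)) / n_r - mu for b, j in enumerate(cols)}
    return mu, alpha, beta

def layer_effect(mu, alpha, beta, i, j):
    """Contribution of the layer to cell (i, j); zero outside the bicluster."""
    if i in alpha and j in beta:
        return mu + alpha[i] + beta[j]
    return 0.0

# Hypothetical 4 x 4 residual matrix with one additive bicluster on rows {0, 1}, cols {1, 2}.
Z = [
    [0.0, 3.5, 4.5, 0.0],
    [0.0, 2.5, 3.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
mu, alpha, beta = fit_layer(Z, rows=[0, 1], cols=[1, 2])
print(mu, alpha, beta)
```

Because the toy bicluster is exactly additive, the fitted layer reproduces it cell for cell; with real data the memberships themselves would also be updated between such steps.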

7.
8.
Lu T (卢汀)  《生物信息学》 (Chinese Journal of Bioinformatics) 2014,12(2):140-144
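Differential gene expression results from the combined action of many factors and is closely linked to the onset and progression of many diseases, so bioinformatic and biostatistical analysis of differentially expressed genes is important for studying cellular regulatory mechanisms and disease mechanisms. The current mainstream methods for studying differentially expressed genes are DNA microarrays, suppression subtractive hybridization (SSH), serial analysis of gene expression (SAGE), representational difference analysis (RDA), and mRNA differential display PCR (mRNA DDRT-PCR). Because much differential expression data is now collected as time series, analyzing how differential expression changes over time has become especially important. This paper describes these mainstream methods in detail and introduces a Fourier-function-based method for differential expression analysis of time-series gene expression data.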

9.
Kim S  Imoto S  Miyano S 《Bio Systems》2004,75(1-3):57-65
We propose a dynamic Bayesian network and nonparametric regression model for constructing a gene network from time-series microarray gene expression data. The proposed method overcomes a shortcoming of the Bayesian network model: its inability to construct cyclic regulations. It can analyze microarray data as continuous data and can capture even nonlinear relations among genes. This model can be expected to give deeper insight into complicated biological systems. We also derive a new criterion, from a Bayesian approach, for evaluating an estimated network. We conduct Monte Carlo experiments to examine the effectiveness of the proposed method and demonstrate it through the analysis of Saccharomyces cerevisiae gene expression data.

10.
A hidden-state Markov model for cell population deconvolution   Times cited: 1 (self-citations: 0; citations by others: 1)
Microarrays typically measure gene expression from a mixture of cell populations at different stages of a biological process. However, the specific effects of the distinct, pure populations on measured gene expression are difficult or impossible to determine. The ability to deconvolve measured gene expression into the contributions from pure populations is critical to maximizing the potential of microarray analysis for investigating complex biological processes. In this paper, we describe a novel approach called the multinomial hidden Markov model (MHMM) that produces: (i) a maximum a posteriori estimate of the fraction represented by each pure population and (ii) gene expression values for each pure population. Our method uses an unsupervised, probabilistic approach to handle missing data points and clusters genes based on their expression in pure populations. Applied to several yeast datasets, MHMM identified statistically significant temporal dynamics. Unlike the linear decomposition models used previously for deconvolution, this method can extract information from different types of data, does not require a priori identification of pure gene expression, exploits the temporal nature of time-series data, and is less affected by missing data.

11.
In recent years, variation in gene expression has been recognized as an important component of environmental adaptation in multiple model species, including a few fish species. There is, however, still little known about the genetic basis of adaptation in gene expression resulting from variation in the aquatic environment (e.g. temperature, salinity and oxygen) and the physiological effect and costs of such differences in gene expression. This review presents and discusses progress and pitfalls of applying gene expression analyses to fishes and suggests simple frameworks to get started with gene expression analysis. It is emphasized that well-planned gene expression studies can serve as an important tool for the identification of selection in local populations of fishes, even for non-traditional model species where limited genomic information is available. Recent studies focusing on gene expression variation among natural fish populations are reviewed, highlighting the latest applications that combine genetic evidence from neutral markers and gene expression data.

12.
Finite mixture models can provide insight into behavioral patterns, as a source of heterogeneity in the dynamics of time-course gene expression data, by reducing the high dimensionality and clarifying the major components of the data's underlying structure in terms of unobservable latent variables. The latent structure of the dynamic transition process of gene expression changes over time can be represented by Markov processes. This paper addresses key problems in the analysis of large gene expression data sets that describe systemic temporal response cascades and dynamic changes to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney, from the same animals. A Bayesian finite Markov mixture model with a Dirichlet prior is developed to identify differentially expressed time-related genes and dynamic clusters. The deviance information criterion (DIC) is used to determine the number of components for model comparison and selection. The proposed Bayesian models are applied to multiple-tissue polygenetic temporal gene expression data and compared with a Bayesian model-based clustering method named CAGED. The results show that the proposed Bayesian finite Markov mixture model captures the dynamic changes and patterns in irregular, complex temporal data well. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
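
The deviance information criterion is commonly defined as DIC = D_bar + p_D, where the deviance is D(theta) = -2 log p(y | theta), D_bar is its posterior mean, and p_D = D_bar - D(theta_bar) is the effective number of parameters. A minimal sketch, assuming posterior draws are already available and using a toy normal-mean model with known standard deviation in place of the Markov mixture (the data, draws, and function names are invented for illustration):

```python
import math

def normal_loglik(data, mean, sd):
    """Log-likelihood of i.i.d. normal data with the given mean and standard deviation."""
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)
               for x in data)

def dic(data, draws, loglik):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at the posterior mean."""
    deviances = [-2.0 * loglik(data, theta) for theta in draws]
    d_bar = sum(deviances) / len(deviances)
    theta_bar = sum(draws) / len(draws)
    p_d = d_bar - (-2.0 * loglik(data, theta_bar))
    return d_bar + p_d, p_d

# Toy example: unknown mean, known sd = 1; `draws` stand in for MCMC output.
data = [0.2, -0.1, 0.4, 0.0, 0.3]
draws = [0.1, 0.15, 0.2, 0.05, 0.25]
value, p_d = dic(data, draws, lambda y, m: normal_loglik(y, m, 1.0))
print(round(value, 4), round(p_d, 4))
```

When comparing mixtures with different numbers of components, the model with the smallest DIC would be preferred; for mixture models, evaluating the deviance at the posterior mean needs care because component labels can switch between draws.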

13.
14.
Hong H  Tong W  Perkins R  Fang H  Xie Q  Shi L 《DNA and cell biology》2004,23(10):685-694
The wealth of knowledge embedded in gene expression data from DNA microarrays portends rapid advances in both research and the clinic. Turning the prodigious and noisy data into knowledge is a challenge for bioinformatics, and developing classifiers with supervised learning techniques is the primary methodological approach for clinical applications of gene expression data. In this paper, we present a novel classification method, multiclass Decision Forest (DF), which is a direct extension of the two-class DF previously developed in our lab. Central to DF is the synergistic combination of multiple heterogeneous but comparable decision trees to reach a more accurate and robust classification model. The computationally inexpensive multiclass DF algorithm integrates gene selection and model development, and thus eliminates the bias of gene preselection in cross-validation. Importantly, the method provides several statistical means for assessing prediction accuracy, prediction confidence, and diagnostic capability. We demonstrate the method on gene expression data for 83 small round blue-cell tumor (SRBCT) samples, each belonging to one of four classes. Based on 500 runs of 10-fold cross-validation, tumor prediction accuracy was approximately 97%, sensitivity approximately 95%, diagnostic sensitivity approximately 91%, and diagnostic accuracy approximately 99.5%. Of the 25 genes selected to distinguish tumor classes, 12 have functional information in the literature implicating them in cancer. The four SRBCT types are also distinguishable in a clustering analysis based on the expression profiles of these 25 genes. The results demonstrate that multiclass DF is an effective method for classifying gene expression data for molecular diagnostics.
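
The abstract does not say how the heterogeneous trees are grown. A minimal sketch, assuming that each new tree may use only genes not used by earlier trees and that the forest averages the trees' class probabilities; one-split stumps stand in for full decision trees, and there is no check that each tree's quality is comparable to the first. All function names and the toy data are hypothetical.

```python
from collections import Counter

def leaf_probs(labels, classes):
    """Class frequencies in a leaf, including zero entries for absent classes."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}

def fit_stump(X, y, features, classes):
    """Best single split (feature, threshold) by training misclassification count."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            errors = (len(left) - Counter(left).most_common(1)[0][1]
                      + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or errors < best[0]:
                best = (errors, f, thr, leaf_probs(left, classes), leaf_probs(right, classes))
    return best[1:] if best else None

def fit_forest(X, y, n_trees):
    """Grow stumps one at a time, each restricted to genes no earlier stump used."""
    classes = sorted(set(y))
    unused = set(range(len(X[0])))
    trees = []
    for _ in range(n_trees):
        stump = fit_stump(X, y, sorted(unused), classes)
        if stump is None:
            break  # no genes left to split on
        trees.append(stump)
        unused.discard(stump[0])
    return classes, trees

def predict(forest, row):
    """Average the stumps' leaf probabilities and return the most probable class."""
    classes, trees = forest
    avg = {c: 0.0 for c in classes}
    for f, thr, left, right in trees:
        probs = left if row[f] <= thr else right
        for c in classes:
            avg[c] += probs[c] / len(trees)
    return max(classes, key=avg.get), avg

# Toy data: two genes, three classes; no single stump can separate all three.
X = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 1.1], [2.0, 0.0], [2.1, 0.1]]
y = ["A", "A", "B", "B", "C", "C"]
forest = fit_forest(X, y, n_trees=3)
print(predict(forest, [1.05, 1.05]))
```

In this toy case, the first stump on gene 0 separates A from {B, C} and the second stump, forced onto gene 1, separates B from {A, C}, so averaging them resolves all three classes.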

15.
Temporal gene expression data are of particular interest to researchers because they contain rich information for characterizing gene function, and they have been widely used in biomedical studies and early cancer detection. However, current temporal gene expression data usually have few time points, so extracting information and efficiently identifying treatment effects without temporal information remain problems. A dense temporal gene expression data set in bacteria shows that gene expression follows various patterns under different biological conditions. Instead of analyzing gene expression levels, in this paper we consider the relative change rates of genes over the observation period. We propose a nonlinear regression model to characterize the relative change rates of genes, in which each individual expression trajectory is modeled as longitudinal data with a changeable variance and covariance structure. Then, based on the parameter estimates, a chi-square test is proposed to test the equality of gene expression change rates. Furthermore, the Mahalanobis distance is used to classify genes. The proposed methods are applied to a data set of 18 genes in P. aeruginosa expressed under 24 biological conditions. Simulation studies show that our methods perform well for the analysis of temporal gene expression.
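
The abstract states that the Mahalanobis distance is used to classify genes from their parameter estimates but does not give the rule. One plausible reading, sketched here under assumptions: each gene is summarized by a two-dimensional vector of fitted change-rate parameters, each group has a centroid, the groups share a pooled 2 x 2 covariance matrix, and a gene goes to the group with the smallest distance. The group names and all numbers are hypothetical.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance sqrt((x - m)^T S^{-1} (x - m)) for a 2 x 2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))  # closed-form 2 x 2 inverse
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
    return math.sqrt(q)

def classify(x, centroids, cov):
    """Return the label of the centroid nearest to x in Mahalanobis distance."""
    return min(centroids, key=lambda label: mahalanobis_2d(x, centroids[label], cov))

# Hypothetical group centroids of two fitted change-rate parameters and a pooled covariance.
centroids = {"fast": (2.0, 0.5), "slow": (0.5, 0.2)}
pooled_cov = ((0.25, 0.05), (0.05, 0.04))
print(classify((1.6, 0.45), centroids, pooled_cov))
```

Unlike Euclidean distance, this rule down-weights directions in which the parameter estimates vary a lot and accounts for their correlation; with an identity covariance it reduces to the Euclidean distance.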

16.
Background: Studies show that thousands of genes are associated with breast cancer prognosis. To make use of available genetic data, efforts have been made to predict outcomes from gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction, although the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) They were not tested to see whether a complete set of clinical predictors could predict as well as the gene expression signatures.
Methodology/Findings: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using both the clinical data and the gene expression data with their performance using only the clinical data. We obtain significantly better results when we use both clinical and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data with the PAM50 subtype, our results are significant only for 5-year and 15-year prediction. We obtain significantly better results using the RSF model than the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival.
Conclusions/Significance: Our results indicate that combining clinical and gene expression data improves survival prediction compared with clinical data alone, and that the RSF model improves survival prediction compared with the Cox model. These results are significant because, by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better use heterogeneous information to improve outcome prediction and decision making.
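
The abstract compares Cox and RSF models but does not name the performance measure. Harrell's concordance index is a common choice for survival prediction; the sketch below assumes that metric, right-censored data, and risk scores where higher means shorter expected survival. The times, events, and scores are made up.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event (events[i] == 1)
    strictly before times[j]. It is concordant when the model gives i the higher risk;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up follow-up times (years), event indicators (1 = death observed), and risk scores.
times = [2.0, 5.0, 7.5, 10.0, 12.0]
events = [1, 1, 0, 1, 0]
model_a = [0.9, 0.7, 0.4, 0.3, 0.1]  # risks perfectly ordered by survival time
model_b = [0.5, 0.8, 0.2, 0.6, 0.4]
print(harrell_c(times, events, model_a), harrell_c(times, events, model_b))
```

In a comparison like the one described, each fitted model (Cox or RSF, with or without expression features) would score the same held-out patients, and the resulting indices would be compared; 0.5 corresponds to random ordering and 1.0 to perfect ordering.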

17.
MOTIVATION: Extracting useful information from the expression levels of thousands of genes generated with microarray technology requires a variety of analytical techniques. Mathematical programming approaches to classification outperform parametric methods when the data depart from the assumptions underlying those methods. Therefore, a mathematical programming approach is developed for gene selection and tissue classification using gene expression profiles. RESULTS: A new mixed integer programming model is formulated for this purpose. It simultaneously selects genes and constructs a classification model that classifies two groups of tissue samples as accurately as possible. Very encouraging results were obtained on two example data sets from the literature. These results show that the mathematical programming approach can rival or outperform traditional classification methods.

18.
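As gene chip technology spreads, ever more expression data must be processed and analyzed. Extracting a gene regulatory matrix from these data to construct a gene network is an important problem. A linear differential equation model can be used to build a preliminary gene network, reveal its structure, and extract the most salient information. However, constraints of molecular biology experiments or of the data sources leave the experimental data insufficient, so the system of equations cannot be solved. In this paper, cubic spline interpolation is applied to gene expression data from 26 breast cancer cases with lymph node metastasis and complete clinical and pathological records, making the expression data full rank; the weight matrix is then solved by least squares to build a preliminary gene regulatory network. Preliminary analysis of the resulting network indicates that breast cancer metastasis results from malignant transformation of cells caused by abnormalities in multiple signaling pathways, which in turn arise from abnormalities in multiple genes; this agrees with the generally accepted biological view. Analyzing gene expression profiles with this linear-model approach is therefore feasible to some extent and has theoretical and practical value for understanding the mechanism of breast cancer metastasis and for its diagnosis and treatment.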
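
The abstract describes interpolating expression profiles with cubic splines and then solving a linear differential equation model for the weight matrix by least squares. The following is an illustrative reading, not the authors' code: it assumes the model dx/dt = W x, uses a natural cubic spline, estimates derivatives by central differences on a dense time grid, and solves each row of W from the normal equations. All function names, the grid step, and the toy two-gene data are hypothetical.

```python
def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Thomas algorithm for the second derivatives m at interior knots;
    # the natural boundary conditions fix m[0] = m[n] = 0.
    m, cp, dp = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        a, b = h[i - 1], 2.0 * (h[i - 1] + h[i])
        d = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        denom = b - a * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (d - a * dp[i - 1]) / denom
    for i in range(n - 1, 0, -1):
        m[i] = dp[i] - cp[i] * m[i + 1]

    def evaluate(x):
        i = 0
        while i < n - 1 and x > xs[i + 1]:  # find the interval containing x
            i += 1
        hi, left, right = h[i], xs[i + 1] - x, x - xs[i]
        return (m[i] * left ** 3 / (6 * hi) + m[i + 1] * right ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * right)

    return evaluate

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: data are not full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_weights(X, dX):
    """Least-squares W minimizing ||dX - W X||^2; X and dX are genes x time points."""
    G = len(X)
    XXt = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(G)] for i in range(G)]
    return [solve(XXt, [sum(a * b for a, b in zip(X[i], dX[g])) for i in range(G)])
            for g in range(G)]

def estimate_network(times, profiles, step):
    """Interpolate each gene's profile on a dense grid, differentiate, then fit W."""
    splines = [natural_cubic_spline(times, p) for p in profiles]
    n_steps = int(round((times[-1] - times[0]) / step))
    grid = [times[0] + k * step for k in range(n_steps + 1)]
    dense = [[s(t) for t in grid] for s in splines]
    X = [row[1:-1] for row in dense]  # values at interior grid points
    dX = [[(row[k + 1] - row[k - 1]) / (2 * step) for k in range(1, len(grid) - 1)]
          for row in dense]
    return fit_weights(X, dX)

times = [0.0, 1.0, 2.0, 3.0]
profiles = [[1.0, 2.0, 4.0, 8.0], [1.0, 0.5, 0.25, 0.125]]  # toy two-gene data
W = estimate_network(times, profiles, step=0.25)
print([[round(w, 3) for w in row] for row in W])
```

The `solve` step raises an error when the normal-equation matrix is singular, which is the "no solution" situation the abstract describes for too few time points.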

19.
Microarrays can provide genome-wide expression patterns for various cancers, especially for tumor subtypes that may have substantially different patient prognoses. Using such gene expression data, several approaches have been proposed to classify tumor subtypes accurately. These classification methods are not robust and often depend on the particular training sample used for modeling, which raises concerns about using them to guide treatment for future patients. We propose to construct an optimal, robust prediction model for classifying cancer subtypes from gene expression data. The model is constructed stepwise using cross-validated quadratic discriminant analysis. At each step, all identified models are validated on an independent sample of patients to obtain a model that is robust for future data. We apply the proposed methods to two microarray cancer data sets: the acute leukemia data of Golub et al. and the colon cancer data of Alon et al. We found that the dimensionality of our optimal prediction models is relatively small in these cases, and that our models with one or two gene factors outperform, or perform competitively with, other methods based on 50 or more predictive gene factors, especially on independent samples. The methodology is implemented as procedures in R and S-PLUS. The source code can be obtained at http://hesweb1.med.virginia.edu/bioinformatics.
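
The first step of such a stepwise search, choosing the single gene whose one-dimensional quadratic discriminant rule has the best cross-validated accuracy, can be sketched as below. This is an illustrative simplification (leave-one-out cross-validation, Gaussian class densities with class-specific variances, no independent validation sample), not the authors' R/S-PLUS procedure; the function names and expression values are invented.

```python
import math

def qda_fit(values, labels):
    """Per-class prior, mean, and variance for a one-gene quadratic discriminant."""
    params = {}
    for cls in set(labels):
        xs = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / max(len(xs) - 1, 1)
        params[cls] = (len(xs) / len(values), mean, max(var, 1e-6))  # floor avoids zero variance
    return params

def qda_predict(params, x):
    """Class maximizing log prior + Gaussian log density (quadratic in x)."""
    def score(cls):
        prior, mean, var = params[cls]
        return math.log(prior) - 0.5 * math.log(var) - (x - mean) ** 2 / (2 * var)
    return max(params, key=score)

def loocv_accuracy(values, labels):
    """Leave-one-out accuracy of the one-gene QDA rule."""
    hits = 0
    for k in range(len(values)):
        train_v = values[:k] + values[k + 1:]
        train_y = labels[:k] + labels[k + 1:]
        hits += qda_predict(qda_fit(train_v, train_y), values[k]) == labels[k]
    return hits / len(values)

def best_single_gene(expr, labels):
    """expr maps gene name -> expression values across samples."""
    return max(expr, key=lambda g: loocv_accuracy(expr[g], labels))

labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # separates the two classes
    "gene_b": [2.0, 1.0, 3.0, 2.1, 0.9, 3.1],  # uninformative
}
print(best_single_gene(expr, labels))
```

A second step would repeat the search over pairs that include the selected gene, using a bivariate quadratic rule, and keep the pair only if it validates better on the independent sample.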

20.
Summary: Gene co-expression has been widely used in the analysis of microarray gene expression data. However, the co-expression pattern between two genes can be mediated by cellular states, as reflected by the expression of other genes, single nucleotide polymorphisms, and the activity of protein kinases. In this article, we introduce a bivariate conditional normal model for identifying the variables that can mediate the co-expression pattern between two genes. Based on this model, we introduce a likelihood ratio (LR) test and a penalized likelihood procedure for identifying the mediators that affect gene co-expression patterns. We propose an efficient computational algorithm based on iteratively reweighted least squares and cyclic coordinate descent, and show that when the tuning parameter in the penalized likelihood is appropriately selected, the procedure has the oracle property in selecting variables. We present simulation results comparing our approach with existing methods, showing that the LR-based approach performs similarly to or better than the existing liquid association method and that the penalized likelihood procedure can be quite effective in selecting mediators. We apply the proposed method to yeast gene expression data to identify the kinases or single nucleotide polymorphisms that mediate the co-expression patterns between genes.
