Similar articles
1.
MOTIVATION: The rapid accumulation of microarray datasets provides unique opportunities to perform systematic functional characterization of the human genome. We designed a graph-based approach to integrate cross-platform microarray data and extract recurrent expression patterns. A series of microarray datasets can be modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. The integrative approach provides three major advantages over the commonly used microarray analysis methods: (1) enhanced signal-to-noise separation, (2) identification of functionally related genes without co-expression, and (3) a way to predict gene functions in a context-specific manner. RESULTS: We integrate 65 human microarray datasets, comprising 1105 experiments and over 11 million expression measurements. We develop a data mining procedure based on frequent itemset mining and biclustering to systematically discover network patterns that recur in at least five datasets. This resulted in 143,401 potential functional modules. Subsequently, we design a network topology statistic based on graph random walk that effectively captures characteristics of a gene's local functional environment. Function annotations based on this statistic are then assessed using the random forest method, combining six other attributes of the network modules. We assign 1126 functions to 895 genes, 779 known and 116 unknown, with a validation accuracy of 70%. Among our assignments, 20% of genes are assigned multiple functions based on different network environments. AVAILABILITY: http://zhoulab.usc.edu/ContextAnnotation.
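
The idea of modeling each dataset as a co-expression network and keeping only patterns that recur across datasets can be illustrated with a minimal sketch. It is not the authors' pipeline (which uses frequent itemset mining and biclustering); it only counts single edges, and the function names, the correlation threshold of 0.8, the support cutoff and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_edges(dataset, threshold):
    """Return the gene pairs whose absolute correlation meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(dataset), 2):
        if abs(pearson(dataset[g1], dataset[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def recurrent_edges(datasets, threshold=0.8, min_support=2):
    """Count each edge once per dataset; keep edges seen in >= min_support datasets."""
    support = Counter()
    for dataset in datasets:
        support.update(coexpression_edges(dataset, threshold))
    return {edge: n for edge, n in support.items() if n >= min_support}


# Hypothetical toy data: each dataset maps a gene name to its expression profile.
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]},
    {"A": [5, 3, 1], "B": [10, 6, 2], "C": [1, 5, 2]},
    {"A": [1, 2, 3], "B": [3, 1, 2], "C": [2, 3, 1]},
]
print(recurrent_edges(datasets))
```

Because support is counted per dataset rather than per experiment, datasets from different platforms and of different sizes contribute equally, which is one plausible reading of why such integration improves signal-to-noise separation.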

2.
MOTIVATION: Multi-series time-course microarray experiments are useful approaches for exploring biological processes. In this type of experiment, the researcher is frequently interested in studying gene expression changes over time and in evaluating trend differences between the various experimental groups. The large amount of data, multiplicity of experimental conditions and the dynamic nature of the experiments pose great challenges to data analysis. RESULTS: In this work, we propose a statistical procedure to identify genes that show different gene expression profiles across analytical groups in time-course experiments. The method is a two-step regression approach in which the experimental groups are identified by dummy variables. The procedure first fits a global regression model with all the defined variables to identify differentially expressed genes, and then applies a variable selection strategy to study differences between groups and to find statistically significant different profiles. The methodology is illustrated on both a real and a simulated microarray dataset.

3.
MOTIVATION: Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure the gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known in other applications that valid statistical analyses have to appropriately take account of the possible within-subject correlation in longitudinal data. RESULTS: We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels with the proposed statistic by either incorporating the idea of the significance analysis of microarrays method or using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.

4.
There is currently great interest in chromosome- and pathway-based techniques for genomics data analysis as a means of understanding the mechanisms of disease. However, few studies have addressed the ability of machine learning methods to incorporate pathway information when analyzing microarray data. In this paper, we identified the characteristic pathways by combining the out-of-bag (OOB) classification error rates of random forests with pathway information. For each characteristic pathway, the correlation of gene expression was studied and the co-regulated gene patterns in different biological conditions were mined by the Mining Attribute Profile (MAP) algorithm. The discovered co-regulated gene patterns were clustered by the average-linkage hierarchical clustering technique. The results showed that the expression levels of genes in the same characteristic pathway were similar. Furthermore, two characteristic pathways were found to present co-regulated gene patterns, of which one contained 108 patterns and the other contained one pattern. The results of cluster analysis showed that the smallest similarity coefficient of the clusters was more than 0.623, which indicated that the co-regulated patterns in different biological conditions were more similar within the same characteristic pathway. The methods discussed in this paper can provide additional insight into the study of microarray data.

5.
MOTIVATION: Microarray and gene chip technology provide high-throughput tools for measuring gene expression levels in a variety of circumstances, including cellular response to drug treatment, cellular growth and development, and tumorigenesis, among many other processes. In order to interpret the large data sets generated in experiments, data analysis techniques that consider biological knowledge during analysis will be extremely useful. We present here results showing the application of such a tool to expression data from yeast cell cycle experiments. RESULTS: Originally developed for spectroscopic analysis, Bayesian Decomposition (BD) includes two features which make it useful for microarray data analysis: the ability to assign genes to multiple coexpression groups and the ability to encode biological knowledge into the system. Here we demonstrate the ability of the algorithm to provide insight into the yeast cell cycle, including identification of five temporal patterns tied to cell cycle phases as well as the identification of a pattern tied to an approximately 40 min cell cycle oscillator. The genes are simultaneously assigned to the patterns, including partial assignment to multiple patterns when this is required to explain the expression profile. AVAILABILITY: The application is available free to academic users under a material transfer agreement. Go to http://bioinformatics.fccc.edu/ for more details.

6.
Time course microarray experiments designed to characterize the dynamic regulation of gene expression in biological systems are becoming increasingly important. One critical issue that arises when examining time course microarray data is the identification of genes that show different temporal expression patterns among biological conditions. Here we propose a Bayesian hierarchical model to incorporate important experimental factors and to account for correlated gene expression measurements over time and over different genes. A new gene selection algorithm is also presented with the model to simultaneously identify genes that show changes in expression among biological conditions, in response to time and other experimental factors of interest. The algorithm performs well in terms of the false positive and false negative rates in simulation studies. The methodology is applied to a mouse model time course experiment to correlate temporal changes in azoxymethane-induced gene expression profiles with colorectal cancer susceptibility.

7.
We have combined DNA microarray experiments with novel computational methods as a means of defining the topology of a biological signal transduction pathway. By DNA microarray techniques, we previously acquired data on expression over time of all genes in the yeast Saccharomyces following addition of glucose to wild-type cells and to cells mutated in one or more components of the Ras signaling network. In addition, we examined the time course of expression following activation of components of the Ras signaling network in the absence of glucose addition. In this current study, we have applied a novel theoretical and computational framework to these data to identify the network topology of the glucose signaling pathway in yeast and the role of Ras components in that network. The computational approach involves clustering genes by expression pattern, postulating a signaling network topology superstructure that includes all possible component interconnections and then evaluating the feasibility of the superstructure interconnections by optimization methods using Mixed Integer Linear Programming techniques. This approach is the first rigorous mathematical framework for addressing the biological network topology issue, and the novel formulation features the introduction of discrete variables for the connectivity and logical expressions that connect the experimental observations to the network structure. This analysis yields a topology for the glucose signaling pathway that is consistent with, and an extension of, known biological interactions in glucose signaling.

8.
The detection of genes that show similar profiles under different experimental conditions is often an initial step in inferring the biological significance of such genes. Visualization tools are used to identify genes with similar profiles in microarray studies. Given the large number of genes recorded in microarray experiments, gene expression data are generally displayed on a low dimensional plot, based on linear methods. However, microarray data show nonlinearity, due to high-order terms of interaction between genes, so alternative approaches, such as kernel methods, may be more appropriate. We introduce a technique that combines kernel principal component analysis (KPCA) and Biplot to visualize gene expression profiles. Our approach relies on the singular value decomposition of the input matrix and incorporates an additional step that involves KPCA. The main properties of our method are the extraction of nonlinear features and the preservation of the input variables (genes) in the output display. We apply this algorithm to colon tumor, leukemia and lymphoma datasets. Our approach reveals the underlying structure of the gene expression profiles and provides a more intuitive understanding of the gene and sample association.

9.
The application of DNA microarray technology for analysis of gene expression creates enormous opportunities to accelerate the pace in understanding living systems and identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced base for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets: Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of peculiar associations among genes and physiological states. At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.

10.
The comparison of gene expression profiles among DNA microarray experiments enables the identification of unknown relationships among experiments to uncover the underlying biological relationships. Despite the ongoing accumulation of data in public databases, detecting biological correlations among gene expression profiles from multiple laboratories on a large scale remains difficult. Here, we applied a correlation analysis based on modules (sets of genes working in the same biological action), in combination with a network analysis, to Arabidopsis data and developed a 'module-based correlation network' (MCN) which represents relationships among DNA microarray experiments on a large scale. We developed a Web-based data analysis tool, 'AtCAST' (Arabidopsis thaliana: DNA Microarray Correlation Analysis Tool), which enables browsing of an MCN or mining of users' microarray data by mapping the data into an MCN. AtCAST can help researchers to find novel connections among DNA microarray experiments, which in turn will help to build new hypotheses to uncover physiological mechanisms or gene functions in Arabidopsis.

11.
Summary. Time course microarray data consist of mRNA expression from a common set of genes collected at different time points. Such data are thought to reflect underlying biological processes developing over time. In this article, we propose a model that allows us to examine differential expression and gene network relationships using time course microarray data. We model each gene-expression profile as a random functional transformation of the scale, amplitude, and phase of a common curve. Inferences about the gene-specific amplitude parameters allow us to examine differential gene expression. Inferences about measures of functional similarity based on estimated time-transformation functions allow us to examine gene networks while accounting for features of the gene-expression profiles. We discuss applications to simulated data as well as to microarray data on prostate cancer progression.

12.
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.

13.
Clustering methods for microarray gene expression data (total citations: 1; self-citations: 0; citations by others: 1)
Within the field of genomics, microarray technologies have become a powerful technique for simultaneously monitoring the expression patterns of thousands of genes under different sets of conditions. A main task now is to propose analytical methods to identify groups of genes that manifest similar expression patterns and are activated by similar conditions. The corresponding analysis problem is to cluster multi-condition gene expression data. The purpose of this paper is to present a general view of clustering techniques used in microarray gene expression data analysis.

14.
Kim S, Imoto S, Miyano S. Bio Systems, 2004, 75(1-3): 57-65
We propose a dynamic Bayesian network and nonparametric regression model for constructing a gene network from time series microarray gene expression data. The proposed method can overcome a shortcoming of the Bayesian network model, namely its inability to represent cyclic regulations. The proposed method can analyze the microarray data as continuous data and can capture even nonlinear relations among genes. It can be expected that this model will give a deeper insight into complicated biological systems. We also derive a new criterion for evaluating an estimated network from a Bayesian approach. We conduct Monte Carlo experiments to examine the effectiveness of the proposed method. We also demonstrate the proposed method through the analysis of the Saccharomyces cerevisiae gene expression data.

15.
16.
17.
BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS: We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. We summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models. CONCLUSIONS: We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.

18.
MOTIVATION: Grouping genes that have similar expression patterns is called gene clustering, which has proven to be a useful tool for extracting the underlying biological information from gene expression data. Many clustering procedures have shown success in microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are alternative clustering algorithms, which are based on the assumption that the whole set of microarray data is a finite mixture of a certain type of distribution with different parameters. Application of the model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of the model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The SVM software is available at the website http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi

19.
Fuzzy J-Means and VNS methods for clustering genes from microarray data (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: In the interpretation of gene expression data from a group of microarray experiments that include samples from either different patients or conditions, special consideration must be given to the pleiotropic and epistatic roles of genes, as observed in the variation of gene coexpression patterns. Crisp clustering methods assign each gene to one cluster, thereby omitting information about the multiple roles of genes. RESULTS: Here, we present the application of a local search heuristic, Fuzzy J-Means, embedded into the variable neighborhood search metaheuristic for the clustering of microarray gene expression data. We show that for all the datasets studied this algorithm outperforms the standard Fuzzy C-Means heuristic. Different methods for the utilization of cluster membership information in determining gene coregulation are presented. The clustering and data analyses were performed on simulated datasets as well as experimental cDNA microarray data for breast cancer and human blood from the Stanford Microarray Database. AVAILABILITY: The source code of the clustering software (C programming language) is freely available from Nabil.Belacel@nrc-cnrc.gc.ca
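
The contrast between crisp and fuzzy assignment can be made concrete with the membership update of standard Fuzzy C-Means, the baseline named in the abstract. This sketch does not implement Fuzzy J-Means or the variable neighborhood search; the function names, the fuzzifier m = 2 and the toy points and centroids are assumptions chosen for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fuzzy_memberships(points, centroids, m=2.0):
    """Standard Fuzzy C-Means membership update for a fuzzifier m > 1.

    u[k][i] = d(k, i)^(-2/(m-1)) / sum_j d(k, j)^(-2/(m-1)),
    where d(k, i) is the distance from point k to centroid i.
    """
    exponent = 1.0 / (m - 1.0)  # applied to squared distances
    memberships = []
    for p in points:
        d2 = [squared_distance(p, c) for c in centroids]
        if 0.0 in d2:
            # The point sits exactly on one or more centroids: split membership among them.
            zeros = d2.count(0.0)
            memberships.append([1.0 / zeros if d == 0.0 else 0.0 for d in d2])
            continue
        inverse = [(1.0 / d) ** exponent for d in d2]
        total = sum(inverse)
        memberships.append([v / total for v in inverse])
    return memberships


def crisp_assignment(memberships):
    """Collapse fuzzy memberships to one cluster index per point (first maximum wins)."""
    return [row.index(max(row)) for row in memberships]


# Hypothetical toy data: three points on a line and two fixed centroids.
centroids = [[0.0, 0.0], [4.0, 0.0]]
points = [[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
u = fuzzy_memberships(points, centroids)
print(u)
print(crisp_assignment(u))
```

The middle point receives equal membership in both clusters, yet the crisp assignment forces it into the first one. That discarded information is what the abstract refers to when it says crisp methods omit the multiple roles of genes.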

20.
Lack of adequate statistical methods for the analysis of microarray data remains the most critical deterrent to uncovering the true potential of these promising techniques in basic and translational biological studies. The popular practice of drawing important biological conclusions from just one replicate (slide) should be discouraged. In this paper, we discuss some modern trends in statistical analysis of microarray data with a special focus on statistical classification (pattern recognition) and variable selection. In addressing these issues we consider the utility of some distances between random vectors and their nonparametric estimates obtained from gene expression data. Performance of the proposed distances is tested by computer simulations and analysis of gene expression data on two different types of human leukemia. In experimental settings, the error rate is estimated by cross-validation, while a control sample is generated in computer simulation experiments aimed at testing the proposed gene selection procedures and associated classification rules.
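
Estimating a classification error rate by cross-validation can be sketched as follows. The classifier is a plain nearest-centroid rule with Euclidean distance, used as a stand-in; it is not the distance or the classification rule proposed in the paper, and the function names, class labels and toy samples are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(rows):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    by_class = {}
    for row, label in zip(train_x, train_y):
        by_class.setdefault(label, []).append(row)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return min(centroids, key=lambda label: squared_distance(x, centroids[label]))


def loocv_error(xs, ys):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Hypothetical toy samples: two expression values per sample, two classes.
xs = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.0], [3.2, 2.9], [1.1, 1.0]]
ys = ["a", "a", "a", "b", "b", "b"]
print(loocv_error(xs, ys))
```

Any gene selection step would have to be repeated inside each fold as well; selecting genes on the full data before cross-validating tends to give an optimistically biased error estimate.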
