Similar Literature
1.
Recently, a state-space model with time delays for inferring gene regulatory networks was proposed. It was assumed that each regulation between two internal state variables had multiple time delays. This assumption caused underestimation of the model with many current gene expression datasets. In biological reality, a regulatory relationship may have just a single time delay rather than multiple time delays. This study employs Boolean variables to capture the existence of time-delayed regulatory relationships in gene regulatory networks within the state-space model. As the solution space of time-delayed relationships is too large for an exhaustive search, a genetic algorithm (GA) is proposed to determine the optimal Boolean variables (the optimal time-delayed regulatory relationships). Coupled with the proposed GA, the Bayesian information criterion (BIC) and probabilistic principal component analysis (PPCA) are employed to infer gene regulatory networks with time delays. Computational experiments are performed on two real gene expression datasets. The results show that the GA is effective at finding time-delayed regulatory relationships. Moreover, the gene regulatory networks with time delays inferred from the datasets improve prediction accuracy and possess more of the expected properties of a real network than a gene regulatory network without time delays.
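The Boolean-variable search described above can be illustrated with a minimal sketch. It assumes a much simpler setting than the state-space model with PPCA: each candidate (regulator, delay) pair is one bit, a target gene is regressed by ordinary least squares on the delayed regulators that are switched on, and the BIC of that fit is the fitness the GA minimizes. The function names, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are hypothetical choices for illustration, not the authors' method.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def bic(mask, candidates, expr, target, max_lag):
    """BIC of an OLS fit of expr[target][t] on the (gene, delay) pairs switched on in mask."""
    rows, ys = [], []
    for t in range(max_lag, len(expr[target])):
        feats = [expr[g][t - d] for (g, d), on in zip(candidates, mask) if on]
        rows.append([1.0] + feats)  # intercept plus the selected delayed regulators
        ys.append(expr[target][t])
    n, k = len(ys), len(rows[0])
    # Normal equations with a tiny ridge term for numerical safety.
    xtx = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

def ga_select(fitness, n_bits, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Minimize fitness over Boolean masks with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [tuple(rng.random() < 0.5 for _ in range(n_bits)) for _ in range(pop_size)]
    cache = {}

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask)
        return cache[mask]

    for _ in range(generations):
        pop.sort(key=score)          # lower BIC is better
        next_pop = pop[:2]           # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            a, b = (min(rng.sample(pop, 3), key=score) for _ in range(2))  # tournament selection
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]                                       # one-point crossover
            child = tuple(bit != (rng.random() < p_mut) for bit in child)   # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=score)
    return best, score(best)

# Hypothetical toy data: gene 2 is driven by gene 0 with a single delay of 2 steps.
rng = random.Random(1)
T = 60
expr = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(2)]
expr.append([0.0, 0.0] + [2.0 * expr[0][t - 2] + rng.gauss(0, 0.1) for t in range(2, T)])
candidates = [(g, d) for g in (0, 1) for d in (1, 2, 3)]
best, best_bic = ga_select(lambda m: bic(m, candidates, expr, 2, 3), len(candidates))
chosen = [c for c, on in zip(candidates, best) if on]
print(chosen, best_bic)
```

Caching fitness by mask avoids refitting the same regression for repeated individuals, since the BIC evaluation dominates the cost of each generation.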

2.
The primary goal of this article is to infer genetic interactions based on gene expression data. A new method for multiorganism Bayesian gene network estimation, based on multitask learning, is presented. When the input datasets are sparse, as is the case in microarray gene expression data, it becomes difficult to separate random correlations from true correlations that would lead to actual edges when modeling the gene interactions as a Bayesian network. Multitask learning takes advantage of the similarity between related tasks to construct a more accurate model of the underlying relationships represented by the Bayesian networks. The proposed method is tested on synthetic data to illustrate its validity. It is then iteratively applied to real gene expression data to learn the genetic regulatory networks of two organisms with homologous genes.

3.
MOTIVATION: Bayesian network methods have shown promise in gene regulatory network reconstruction because of their capability of capturing causal relationships between genes and handling the noisy data produced by biological experiments. The problem of learning network structures, however, is NP-hard. Consequently, heuristic methods such as hill climbing are used for structure learning. For networks of a moderate size, hill climbing methods are not computationally efficient. Furthermore, the learned structures may have relatively low accuracy. The purpose of this article is to present a novel structure learning method for gene network discovery. RESULTS: In this paper, we present a novel structure learning method to reconstruct the underlying gene networks from observational gene expression data. Unlike hill climbing approaches, the proposed method first constructs an undirected network based on the mutual information between pairs of nodes and then splits the structure into substructures. The orientations of the edges connecting pairs of nodes are then obtained by optimizing a scoring function for each substructure. Our method is evaluated using two benchmark network datasets with known structures. The results show that the proposed method can identify networks that are close to the optimal structures. It outperforms hill climbing methods in terms of both computation time and predicted structure accuracy. We also apply the method to gene expression data measured during the yeast cell cycle and show the effectiveness of the proposed method for network reconstruction.
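The first step of the proposed method, building an undirected network from pairwise mutual information, might look roughly as follows. This is a sketch under stated assumptions: expression profiles are discretized into equal-width bins, mutual information uses a plug-in estimate, and an edge is kept when its value exceeds a fixed threshold. The splitting into substructures and the score-based edge orientation are not shown, and the helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, bins=3):
    """Equal-width binning of one gene's expression profile (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mi_skeleton(profiles, threshold):
    """Undirected edges between genes whose pairwise MI exceeds the threshold."""
    disc = {g: discretize(v) for g, v in profiles.items()}
    return {(g, h) for g, h in combinations(sorted(disc), 2)
            if mutual_information(disc[g], disc[h]) > threshold}

# Hypothetical profiles: B tracks A closely, C follows an unrelated periodic pattern.
profiles = {
    "A": [float(v) for v in range(1, 13)],
    "B": [2.0 * v for v in range(1, 13)],
    "C": [1.0, 2.0, 3.0] * 4,
}
print(mi_skeleton(profiles, 0.5))
```

The threshold here is arbitrary; in practice it would need to be tuned or replaced by a significance test.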

4.
MOTIVATION: Inferring the genetic interaction mechanism using Bayesian networks has recently drawn increasing attention due to its well-established theoretical foundation and statistical robustness. However, the small number of experiments relative to the number of genes leads to many false positive inferences. RESULTS: We propose a novel method to infer genetic networks by compensating for the shortage of available mRNA expression data with prior knowledge. We call the proposed method 'modularized network learning' (MONET). First, the proposed method divides the whole gene set into overlapping modules, considering biological annotations and expression data together. Second, it infers a Bayesian network for each module and integrates the learned subnetworks into a global network. An algorithm that measures the similarity between genes based on the hierarchy, specificity and multiplicity of biological annotations is presented. The proposed method draws a global picture of inter-module relationships as well as a detailed look at intra-module interactions. We applied the proposed method to analyze Saccharomyces cerevisiae stress data and generated several hypotheses suggesting putative functions of unclassified genes. We also compared the proposed method with a whole-set-based approach and two expression-based clustering approaches.

5.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown at the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the datasets excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of the missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach was tested favorably on two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.

6.
7.

Background

Dynamic aspects of gene regulatory networks are typically investigated by measuring system variables at multiple time points. Current state-of-the-art computational approaches for reconstructing gene networks directly build on such data, making a strong assumption that the system evolves in a synchronous fashion at fixed points in time. However, nowadays omics data are being generated with increasing time course granularity. Thus, modellers now have the possibility to represent the system as evolving in continuous time and to improve the models’ expressiveness.

Results

Continuous time Bayesian networks are proposed as a new approach for gene network reconstruction from time course expression data. Their performance was compared to two state-of-the-art methods: dynamic Bayesian networks and Granger causality analysis. On simulated data, the methods were compared on networks of increasing size, on measurements taken at different time granularities and on measurements unevenly spaced over time. Continuous time Bayesian networks outperformed the other methods in terms of the accuracy of regulatory interactions learnt from data for all network sizes. Furthermore, their performance degraded smoothly as the size of the network increased. Continuous time Bayesian networks were significantly better than dynamic Bayesian networks for all time granularities tested and better than Granger causality for dense time series. Both continuous time Bayesian networks and Granger causality performed robustly for unevenly spaced time series, with no significant loss of performance compared to the evenly spaced case, while the same did not hold true for dynamic Bayesian networks. The comparison also included the IRMA experimental datasets, which confirmed the effectiveness of the proposed method. Continuous time Bayesian networks were then applied to elucidate the regulatory mechanisms controlling murine T helper 17 (Th17) cell differentiation and were found to be effective in discovering well-known regulatory mechanisms, as well as new plausible biological insights.

Conclusions

Continuous time Bayesian networks were effective on networks of both small and large size and were particularly well suited to measurements that were not evenly distributed over time. Reconstruction of the murine Th17 cell differentiation network using continuous time Bayesian networks revealed several autocrine loops, suggesting that Th17 cells may be autoregulating their own differentiation process.

8.
MOTIVATION: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data. RESULTS: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data. AVAILABILITY: miniTUBA is available at http://www.minituba.org.

9.
The evolutionary relationships among members of the cetacean family Delphinidae, the dolphins, pilot whales and killer whales, are still not well understood. The genus Sotalia (coastal and riverine South American dolphins) is currently considered a member of the Stenoninae subfamily, along with the genera Steno (rough toothed dolphin) and Sousa (humpbacked dolphin). In recent years, a revision of this classification was proposed based on phylogenetic analysis of the mitochondrial gene cytochrome b, wherein Sousa was included in the Delphininae subfamily, keeping only Steno and Sotalia as members of the Stenoninae subfamily. Here we investigate the phylogenetic placement of Sotalia using two mitochondrial genes, six autosomal introns and four Y chromosome introns, providing a total of 5,196 base pairs (bp) for each taxon in the combined dataset. Sequences from these genomic regions were obtained for 17 delphinid species, including at least one species from each of five or six currently recognized subfamilies plus five odontocete outgroup species. Maximum Parsimony, Maximum Likelihood and Bayesian phylogenetic analysis of independent (each fragment) and combined datasets (mtDNA, nuDNA or mtDNA+nuDNA) showed that Sotalia and Sousa fall within a clade containing other members of Delphininae, exclusive of Steno. Sousa was resolved as the sister taxon to Sotalia according to analysis of the nuDNA dataset but not analysis of the mtDNA or combined mtDNA+nuDNA datasets. Based on the results from our multi-locus analysis, we offer several novel changes to the classification of Delphinidae, some of which are supported by previous morphological and molecular studies.

10.
11.
Bayesian networks are knowledge representation tools that model the (in)dependency relationships among variables for probabilistic reasoning. Classification with Bayesian networks aims to compute the class with the highest probability given a case; Bayesian networks used for this task are referred to as Bayesian network classifiers. Since learning the Bayesian network structure from a dataset can be viewed as an optimization problem, heuristic search algorithms may be applied to build high-quality networks in medium- or large-scale problems, as exhaustive search is often feasible only for small problems. In this paper, we present our new algorithm, ABC-Miner, and propose several extensions to it. ABC-Miner uses ant colony optimization for learning the structure of Bayesian network classifiers. We report extended computational results comparing the performance of our algorithm with eight other classification algorithms, namely six variations of well-known Bayesian network classifiers, cAnt-Miner for discovering classification rules and a support vector machine algorithm.
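As a concrete picture of computing "the class with the highest probability given a case", the sketch below uses naive Bayes, the simplest Bayesian network classifier structure (the class node is the only parent of every attribute). It is not ABC-Miner, which searches over richer structures with ant colony optimization; the function names, the toy data and the Laplace smoothing are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    values = defaultdict(set)            # attribute index -> set of observed values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict(model, row):
    """Return the class maximizing log P(class) + sum_i log P(attribute_i | class)."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())

    def log_posterior(y):
        lp = math.log(class_counts[y] / n)
        for i, v in enumerate(row):
            # Laplace smoothing so an unseen value does not zero out the product.
            lp += math.log((value_counts[(i, y)][v] + 1) / (class_counts[y] + len(values[i])))
        return lp

    return max(class_counts, key=log_posterior)

# Hypothetical toy training cases.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = fit_naive_bayes(rows, labels)
print(predict(model, ("rain", "mild")))
```

Working in log space avoids numerical underflow when many attributes are multiplied together.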

12.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among‐site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among‐site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among‐site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among‐site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.

13.
14.
15.
We analyzed sequence variation for the alcohol dehydrogenase (Adh) gene family in Carex section Acrocystis (Cyperaceae) to reconstruct Adh gene trees for Acrocystis species and to characterize the structure of the Adh gene family in Carex. Two Adh loci were included with ITS and ETS sequences in a combined Bayesian inference analysis of Carex section Acrocystis to gain a better understanding of species relationships in the section. In addition, we comment on how the results presented here contribute to our knowledge of the birth-death process of the Adh gene family in angiosperms. It appears that the structure of the Adh gene family in Carex is complex with possibly six loci present in the gene family. Additionally, variation among Acrocystis species within loci is quite low, and there is little phylogenetic resolution in the individual datasets. Bayesian inference analysis of the combined ITS, ETS, Adh1, and Adh2 datasets resulted in a moderately well-supported phylogenetic hypothesis of relationships in the section which is discussed in relation to previous hypotheses of relationships.

16.
We propose methods to integrate data across several genomic platforms using a hierarchical Bayesian analysis framework that incorporates the biological relationships among the platforms to identify genes whose expression is related to clinical outcomes in cancer. This integrated approach combines information across all platforms, leading to increased statistical power in finding these predictive genes, and further provides mechanistic information about the manner in which the gene affects the outcome. We demonstrate the advantages of the shrinkage estimation used by this approach through a simulation, and finally, we apply our method to a Glioblastoma Multiforme dataset and identify several genes potentially associated with the patients’ survival. We find 12 positive prognostic markers associated with nine genes and 13 negative prognostic markers associated with nine genes.

17.
Microarray data are high-dimensional, but available datasets usually contain only a small number of samples, which makes the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant cost reduction while maintaining or improving the classification or prediction accuracy of the learning machines employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods:
  • Support Vector Machine Recursive Feature Elimination (SVMRFE)
  • Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS)
  • Gradient based Leave-one-out Gene Selection (GLGS)
To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that the performance of gene selection is closely tied to the learning classifier it is paired with. Overall, our approach outperforms the other methods compared. The biological functional analysis based on the MAQC-II breast cancer dataset convinced us to apply our method for phenotype prediction. Additionally, learning classifiers also play important roles in the classification of microarray data, and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC) is a good choice due to its prediction reliability and its stability across the three performance measurements: testing accuracy, MCC values, and AUC errors.
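The abstract does not spell out how RFA combines supervised learning with statistical similarity measures, so the following is only a generic greedy forward-selection wrapper in the same feature-addition spirit, not RFA itself. As assumptions, each candidate gene is scored by the leave-one-out accuracy of a plain (unscaled) nearest-mean classifier on the genes selected so far plus that candidate; all names and the toy data are hypothetical.

```python
def nearest_mean_predict(train_x, train_y, x, features):
    """Assign x to the class whose mean over the chosen features is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):  # sorted for deterministic tie-breaking
        members = [r for r, y in zip(train_x, train_y) if y == label]
        dist = 0.0
        for f in features:
            mean = sum(r[f] for r in members) / len(members)
            dist += (x[f] - mean) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(xs, ys, features):
    """Leave-one-out accuracy of the nearest-mean classifier on a feature subset."""
    hits = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += nearest_mean_predict(train_x, train_y, xs[i], features) == ys[i]
    return hits / len(xs)

def forward_selection(xs, ys, n_select):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected, remaining = [], list(range(len(xs[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: loo_accuracy(xs, ys, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical samples: feature 0 separates the classes, feature 1 is noise.
xs = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0), (2.0, 4.0), (2.1, 2.0), (1.9, 0.5)]
ys = ["a", "a", "a", "b", "b", "b"]
print(forward_selection(xs, ys, 1))
```

Wrapper methods like this refit the classifier for every candidate, which is one reason the abstract's observation that gene selection and classifier choice interact is plausible.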

18.
The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists of selecting, among the least redundant variables, the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
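The MRMR principle and its extension to networks can be sketched as follows, assuming already-discretized expression profiles and a plug-in mutual information estimate. For each gene taken as the target, the other genes are ranked greedily by relevance minus mean redundancy, and the weight of an undirected edge is taken as the larger of the two scores obtained with either endpoint as the target. This is a simplified illustration with hypothetical names and toy data, not the reference MRNET implementation.

```python
import math
from collections import Counter

def mi(x, y):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr_scores(data, target):
    """Greedy MRMR ranking of all other genes; returns {gene: score when selected}."""
    candidates = [g for g in data if g != target]
    relevance = {g: mi(data[g], data[target]) for g in candidates}
    selected, scores = [], {}

    def score(g):
        # Relevance to the target minus mean redundancy with already selected genes.
        if not selected:
            return relevance[g]
        return relevance[g] - sum(mi(data[g], data[s]) for s in selected) / len(selected)

    while candidates:
        best = max(candidates, key=score)
        scores[best] = score(best)
        selected.append(best)
        candidates.remove(best)
    return scores

def mrnet(data):
    """Edge weight w(i, j) = max of the MRMR scores with i as target and with j as target."""
    per_target = {t: mrmr_scores(data, t) for t in data}
    genes = sorted(data)
    return {(i, j): max(per_target[i][j], per_target[j][i])
            for a, i in enumerate(genes) for j in genes[a + 1:]}

# Hypothetical discretized profiles: G2 duplicates G1, G3 is only weakly related.
data = {
    "G1": [0, 0, 1, 1, 2, 2, 0, 1],
    "G2": [0, 0, 1, 1, 2, 2, 0, 1],
    "G3": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrnet(data))
```

Thresholding these weights would then give the inferred network; the threshold itself is left open here.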

19.
MOTIVATION: Biological processes in cells are properly performed by gene regulations, signal transductions and interactions between proteins. To understand such molecular networks, we propose a statistical method to estimate gene regulatory networks and protein-protein interaction networks simultaneously from DNA microarray data, protein-protein interaction data and other genome-wide data. RESULTS: We unify Bayesian networks and Markov networks for estimating gene regulatory networks and protein-protein interaction networks according to the reliability of each biological information source. Through the simultaneous construction of gene regulatory networks and protein-protein interaction networks of Saccharomyces cerevisiae cell cycle, we predict the role of several genes whose functions are currently unknown. By using our probabilistic model, we can detect false positives of high-throughput data, such as yeast two-hybrid data. In a genome-wide experiment, we find possible gene regulatory relationships and protein-protein interactions between large protein complexes that underlie complex regulatory mechanisms of biological processes.

20.

Background

Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes.

Results

This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations.

Conclusions

A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.
