首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis–Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.  相似文献   

2.
Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamic. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximization of or sampling from the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.  相似文献   

3.
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. All related algorithms were implemented in a freely available toolbox SigNetTrainer making it an appealing approach for various applications.  相似文献   

4.
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project was initiated in 2006 as a community-wide effort for the development of network inference challenges for rigorous assessment of reverse engineering methods for biological networks. We participated in the in silico network inference challenge of DREAM3 in 2008. Here we report the details of our approach and its performance on the synthetic challenge datasets. In our methodology, we first developed a model called relative change ratio (RCR), which took advantage of the heterozygous knockdown data and null-mutant knockout data provided by the challenge, in order to identify the potential regulators for the genes. With this information, a time-delayed dynamic Bayesian network (TDBN) approach was then used to infer gene regulatory networks from time series trajectory datasets. Our approach considerably reduced the searching space of TDBN; hence, it gained a much higher efficiency and accuracy. The networks predicted using our approach were evaluated comparatively along with 29 other submissions by two metrics (area under the ROC curve and area under the precision-recall curve). The overall performance of our approach ranked the second among all participating teams.  相似文献   

5.
We introduce here the concept of Implicit networks which provide, like Bayesian networks, a graphical modelling framework that encodes the joint probability distribution for a set of random variables within a directed acyclic graph. We show that Implicit networks, when used in conjunction with appropriate statistical techniques, are very attractive for their ability to understand and analyze biological data. Particularly, we consider here the use of Implicit networks for causal inference in biomolecular pathways. In such pathways, an Implicit network encodes dependencies among variables (proteins, genes), can be trained to learn causal relationships (regulation, interaction) between them and then used to predict the biological response given the status of some key proteins or genes in the network. We show that Implicit networks offer efficient methodologies for learning from observations without prior knowledge and thus provide a good alternative to classical inference in Bayesian networks when priors are missing. We illustrate our approach by an application to simulated data for a simplified signal transduction pathway of the epidermal growth factor receptor (EGFR) protein.  相似文献   

6.

Background

Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system’s response after systematic perturbations are available.

Results

We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway.

Conclusions

Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-250) contains supplementary material, which is available to authorized users.  相似文献   

7.
8.
9.
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.  相似文献   

10.
11.
TH Chueh  HH Lu 《PloS one》2012,7(8):e42095
One great challenge of genomic research is to efficiently and accurately identify complex gene regulatory networks. The development of high-throughput technologies provides numerous experimental data such as DNA sequences, protein sequence, and RNA expression profiles makes it possible to study interactions and regulations among genes or other substance in an organism. However, it is crucial to make inference of genetic regulatory networks from gene expression profiles and protein interaction data for systems biology. This study will develop a new approach to reconstruct time delay Boolean networks as a tool for exploring biological pathways. In the inference strategy, we will compare all pairs of input genes in those basic relationships by their corresponding [Formula: see text]-scores for every output gene. Then, we will combine those consistent relationships to reveal the most probable relationship and reconstruct the genetic network. Specifically, we will prove that [Formula: see text] state transition pairs are sufficient and necessary to reconstruct the time delay Boolean network of [Formula: see text] nodes with high accuracy if the number of input genes to each gene is bounded. We also have implemented this method on simulated and empirical yeast gene expression data sets. The test results show that this proposed method is extensible for realistic networks.  相似文献   

12.
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster—to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.  相似文献   

13.
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.  相似文献   

14.
15.
Mathematical models are extensively employed to understand physicochemical processes in biological systems. In the absence of detailed mechanistic knowledge, models are often based on network inference methods, which in turn rely upon perturbations to nodes by biochemical means. We have discovered a potential pitfall of the approach underpinning such methods when applied to signaling networks. We first show experimentally, and then explain mathematically, how even in the simplest signaling systems, perturbation methods may lead to paradoxical conclusions: for any given pair of two components X and Y, and depending upon the specific intervention on Y, either an activation or a repression of X could be inferred. This effect is of a different nature from incomplete network identification due to underdetermined data and is a phenomenon intrinsic to perturbations. Our experiments are performed in an in vitro minimal system, thus isolating the effect and showing that it cannot be explained by feedbacks due to unknown intermediates. Moreover, our in vitro system utilizes proteins from a pathway in mammalian (and other eukaryotic) cells that play a central role in proliferation, gene expression, differentiation, mitosis, cell survival, and apoptosis. This pathway is the perturbation target of contemporary therapies for various types of cancers. The results presented here show that the simplistic view of intracellular signaling networks being made up of activation and repression links is seriously misleading, and call for a fundamental rethinking of signaling network analysis and inference methods.  相似文献   

16.
Mathematical models are extensively employed to understand physicochemical processes in biological systems. In the absence of detailed mechanistic knowledge, models are often based on network inference methods, which in turn rely upon perturbations to nodes by biochemical means. We have discovered a potential pitfall of the approach underpinning such methods when applied to signaling networks. We first show experimentally, and then explain mathematically, how even in the simplest signaling systems, perturbation methods may lead to paradoxical conclusions: for any given pair of two components X and Y, and depending upon the specific intervention on Y, either an activation or a repression of X could be inferred. This effect is of a different nature from incomplete network identification due to underdetermined data and is a phenomenon intrinsic to perturbations. Our experiments are performed in an in vitro minimal system, thus isolating the effect and showing that it cannot be explained by feedbacks due to unknown intermediates. Moreover, our in vitro system utilizes proteins from a pathway in mammalian (and other eukaryotic) cells that play a central role in proliferation, gene expression, differentiation, mitosis, cell survival, and apoptosis. This pathway is the perturbation target of contemporary therapies for various types of cancers. The results presented here show that the simplistic view of intracellular signaling networks being made up of activation and repression links is seriously misleading, and call for a fundamental rethinking of signaling network analysis and inference methods.  相似文献   

17.
18.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.  相似文献   

19.
20.
Study of signaling networks is important for a better understanding of cell behaviors e.g., growth, differentiation, metabolism, proptosis, and gaining deeper insights into the molecular mechanisms of complex diseases. While there have been many successes in developing computational approaches for identifying potential genes and proteins involved in cell signaling, new methods are needed for identifying network structures that depict underlying signal cascading mechanisms. In this paper, we propose a new computational approach for inferring signaling network structures from overlapping gene sets related to the networks. In the proposed approach, a signaling network is represented as a directed graph and is viewed as a union of many active paths representing linear and overlapping chains of signal cascading activities in the network. Gene sets represent the sets of genes participating in active paths without prior knowledge of the order in which genes occur within each path. From a compendium of unordered gene sets, the proposed algorithm reconstructs the underlying network structure through evolution of synergistic active paths. In our context, the extent of edge overlapping among active paths is used to define the synergy present in a network. We evaluated the performance of the proposed algorithm in terms of its convergence and recovering true active paths by utilizing four gene set compendiums derived from the KEGG database. Evaluation of results demonstrate the ability of the algorithm in reconstructing the underlying networks with high accuracy and precision.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号