首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 453 毫秒
1.
Perturbation experiments for example using RNA interference (RNAi) offer an attractive way to elucidate gene function in a high throughput fashion. The placement of hit genes in their functional context and the inference of underlying networks from such data, however, are challenging tasks. One of the problems in network inference is the exponential number of possible network topologies for a given number of genes. Here, we introduce a novel mathematical approach to address this question. We formulate network inference as a linear optimization problem, which can be solved efficiently even for large-scale systems. We use simulated data to evaluate our approach, and show improved performance in particular on larger networks over state-of-the art methods. We achieve increased sensitivity and specificity, as well as a significant reduction in computing time. Furthermore, we show superior performance on noisy data. We then apply our approach to study the intracellular signaling of human primary nave CD4+ T-cells, as well as ErbB signaling in trastuzumab resistant breast cancer cells. In both cases, our approach recovers known interactions and points to additional relevant processes. In ErbB signaling, our results predict an important role of negative and positive feedback in controlling the cell cycle progression.  相似文献   

2.
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available.  相似文献   

3.
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.  相似文献   

4.

Background

Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system’s response after systematic perturbations are available.

Results

We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway.

Conclusions

Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-250) contains supplementary material, which is available to authorized users.  相似文献   

5.
MOTIVATION: Although many network inference algorithms have been presented in the bioinformatics literature, no suitable approach has been formulated for evaluating their effectiveness at recovering models of complex biological systems from limited data. To overcome this limitation, we propose an approach to evaluate network inference algorithms according to their ability to recover a complex functional network from biologically reasonable simulated data. RESULTS: We designed a simulator to generate data representing a complex biological system at multiple levels of organization: behaviour, neural anatomy, brain electrophysiology, and gene expression of songbirds. About 90% of the simulated variables are unregulated by other variables in the system and are included simply as distracters. We sampled the simulated data at intervals as one would sample from a biological system in practice, and then used the sampled data to evaluate the effectiveness of an algorithm we developed for functional network inference. We found that our algorithm is highly effective at recovering the functional network structure of the simulated system-including the irrelevance of unregulated variables-from sampled data alone. To assess the reproducibility of these results, we tested our inference algorithm on 50 separately simulated sets of data and it consistently recovered almost perfectly the complex functional network structure underlying the simulated data. To our knowledge, this is the first approach for evaluating the effectiveness of functional network inference algorithms at recovering models from limited data. Our simulation approach also enables researchers a priori to design experiments and data-collection protocols that are amenable to functional network inference.  相似文献   

6.
7.
8.
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/  相似文献   

9.
10.
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project was initiated in 2006 as a community-wide effort for the development of network inference challenges for rigorous assessment of reverse engineering methods for biological networks. We participated in the in silico network inference challenge of DREAM3 in 2008. Here we report the details of our approach and its performance on the synthetic challenge datasets. In our methodology, we first developed a model called relative change ratio (RCR), which took advantage of the heterozygous knockdown data and null-mutant knockout data provided by the challenge, in order to identify the potential regulators for the genes. With this information, a time-delayed dynamic Bayesian network (TDBN) approach was then used to infer gene regulatory networks from time series trajectory datasets. Our approach considerably reduced the searching space of TDBN; hence, it gained a much higher efficiency and accuracy. The networks predicted using our approach were evaluated comparatively along with 29 other submissions by two metrics (area under the ROC curve and area under the precision-recall curve). The overall performance of our approach ranked the second among all participating teams.  相似文献   

11.
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons.  相似文献   

12.
Mathematical models are extensively employed to understand physicochemical processes in biological systems. In the absence of detailed mechanistic knowledge, models are often based on network inference methods, which in turn rely upon perturbations to nodes by biochemical means. We have discovered a potential pitfall of the approach underpinning such methods when applied to signaling networks. We first show experimentally, and then explain mathematically, how even in the simplest signaling systems, perturbation methods may lead to paradoxical conclusions: for any given pair of two components X and Y, and depending upon the specific intervention on Y, either an activation or a repression of X could be inferred. This effect is of a different nature from incomplete network identification due to underdetermined data and is a phenomenon intrinsic to perturbations. Our experiments are performed in an in vitro minimal system, thus isolating the effect and showing that it cannot be explained by feedbacks due to unknown intermediates. Moreover, our in vitro system utilizes proteins from a pathway in mammalian (and other eukaryotic) cells that play a central role in proliferation, gene expression, differentiation, mitosis, cell survival, and apoptosis. This pathway is the perturbation target of contemporary therapies for various types of cancers. The results presented here show that the simplistic view of intracellular signaling networks being made up of activation and repression links is seriously misleading, and call for a fundamental rethinking of signaling network analysis and inference methods.  相似文献   

13.
Mathematical models are extensively employed to understand physicochemical processes in biological systems. In the absence of detailed mechanistic knowledge, models are often based on network inference methods, which in turn rely upon perturbations to nodes by biochemical means. We have discovered a potential pitfall of the approach underpinning such methods when applied to signaling networks. We first show experimentally, and then explain mathematically, how even in the simplest signaling systems, perturbation methods may lead to paradoxical conclusions: for any given pair of two components X and Y, and depending upon the specific intervention on Y, either an activation or a repression of X could be inferred. This effect is of a different nature from incomplete network identification due to underdetermined data and is a phenomenon intrinsic to perturbations. Our experiments are performed in an in vitro minimal system, thus isolating the effect and showing that it cannot be explained by feedbacks due to unknown intermediates. Moreover, our in vitro system utilizes proteins from a pathway in mammalian (and other eukaryotic) cells that play a central role in proliferation, gene expression, differentiation, mitosis, cell survival, and apoptosis. This pathway is the perturbation target of contemporary therapies for various types of cancers. The results presented here show that the simplistic view of intracellular signaling networks being made up of activation and repression links is seriously misleading, and call for a fundamental rethinking of signaling network analysis and inference methods.  相似文献   

14.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.  相似文献   

15.
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggests that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.  相似文献   

16.
17.
Beal J  Lu T  Weiss R 《PloS one》2011,6(8):e22490

Background

The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry.

Methodology/Principal Findings

To address the gap between DNA synthesis and circuit design capabilities, we present a platform that enables synthetic biologists to express desired behavior using a convenient high-level biologically-oriented programming language, Proto. The high level specification is compiled, using a regulatory motif based mechanism, to a gene network, optimized, and then converted to a computational simulation for numerical verification. Through several example programs we illustrate the automated process of biological system design with our platform, and show that our compiler optimizations can yield significant reductions in the number of genes () and latency of the optimized engineered gene networks.

Conclusions/Significance

Our platform provides a convenient and accessible tool for the automated design of sophisticated synthetic biological systems, bridging an important gap between DNA synthesis and circuit design capabilities. Our platform is user-friendly and features biologically relevant compiler optimizations, providing an important foundation for the development of sophisticated biological systems.  相似文献   

18.
Managing the overwhelming numbers of molecular states and interactions is a fundamental obstacle to building predictive models of biological systems. Here we introduce the Network-Free Stochastic Simulator (NFsim), a general-purpose modeling platform that overcomes the combinatorial nature of molecular interactions. Unlike standard simulators that represent molecular species as variables in equations, NFsim uses a biologically intuitive representation: objects with binding and modification sites acted on by reaction rules. During simulations, rules operate directly on molecular objects to produce exact stochastic results with performance that scales independently of the reaction network size. Reaction rates can be defined as arbitrary functions of molecular states to provide powerful coarse-graining capabilities, for example to merge Boolean and kinetic representations of biological networks. NFsim enables researchers to simulate many biological systems that were previously inaccessible to general-purpose software, as we illustrate with models of immune system signaling, microbial signaling, cytoskeletal assembly and oscillating gene expression.  相似文献   

19.
Plant protein-protein interaction networks have not been identified by large-scale experiments. In order to better understand the protein interactions in rice, the Predicted Rice Interactome Network (PRIN; http://bis.zju.edu.cn/prin/) presented 76,585 predicted interactions involving 5,049 rice proteins. After mapping genomic features of rice (GO annotation, subcellular localization prediction, and gene expression), we found that a well-annotated and biologically significant network is rich enough to capture many significant functional linkages within higher-order biological systems, such as pathways and biological processes. Furthermore, we took MADS-box domain-containing proteins and circadian rhythm signaling pathways as examples to demonstrate that functional protein complexes and biological pathways could be effectively expanded in our predicted network. The expanded molecular network in PRIN has considerably improved the capability of these analyses to integrate existing knowledge and provide novel insights into the function and coordination of genes and gene networks.  相似文献   

20.
Liu B  de la Fuente A  Hoeschele I 《Genetics》2008,178(3):1763-1776
Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号