首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/  相似文献   

2.
Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms.  相似文献   

3.
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.  相似文献   

4.
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.  相似文献   

5.
Biological systems are traditionally studied by focusing on a specific subsystem, building an intuitive model for it, and refining the model using results from carefully designed experiments. Modern experimental techniques provide massive data on the global behavior of biological systems, and systematically using these large datasets for refining existing knowledge is a major challenge. Here we introduce an extended computational framework that combines formalization of existing qualitative models, probabilistic modeling, and integration of high-throughput experimental data. Using our methods, it is possible to interpret genomewide measurements in the context of prior knowledge on the system, to assign statistical meaning to the accuracy of such knowledge, and to learn refined models with improved fit to the experiments. Our model is represented as a probabilistic factor graph, and the framework accommodates partial measurements of diverse biological elements. We study the performance of several probabilistic inference algorithms and show that hidden model variables can be reliably inferred even in the presence of feedback loops and complex logic. We show how to refine prior knowledge on combinatorial regulatory relations using hypothesis testing and derive p-values for learned model features. We test our methodology and algorithms on a simulated model and on two real yeast models. In particular, we use our method to explore uncharacterized relations among regulators in the yeast response to hyper-osmotic shock and in the yeast lysine biosynthesis system. Our integrative approach to the analysis of biological regulation is demonstrated to synergistically combine qualitative and quantitative evidence into concrete biological predictions.  相似文献   

6.
Reverse engineering is the problem of inferring the structure of a network of interactions between biological variables from a set of observations. In this paper, we propose an optimization algorithm, called MORE, for the reverse engineering of biological networks from time series data. The model inferred by MORE is a sparse system of nonlinear differential equations, complex enough to realistically describe the dynamics of a biological system. MORE tackles separately the discrete component of the problem, the determination of the biological network topology, and the continuous component of the problem, the strength of the interactions. This approach allows us both to enforce system sparsity, by globally constraining the number of edges, and to integrate a priori information about the structure of the underlying interaction network. Experimental results on simulated and real-world networks show that the mixed discrete/continuous optimization approach of MORE significantly outperforms standard continuous optimization and that MORE is competitive with the state of the art in terms of accuracy of the inferred networks.  相似文献   

7.
We introduce here the concept of Implicit networks which provide, like Bayesian networks, a graphical modelling framework that encodes the joint probability distribution for a set of random variables within a directed acyclic graph. We show that Implicit networks, when used in conjunction with appropriate statistical techniques, are very attractive for their ability to understand and analyze biological data. Particularly, we consider here the use of Implicit networks for causal inference in biomolecular pathways. In such pathways, an Implicit network encodes dependencies among variables (proteins, genes), can be trained to learn causal relationships (regulation, interaction) between them and then used to predict the biological response given the status of some key proteins or genes in the network. We show that Implicit networks offer efficient methodologies for learning from observations without prior knowledge and thus provide a good alternative to classical inference in Bayesian networks when priors are missing. We illustrate our approach by an application to simulated data for a simplified signal transduction pathway of the epidermal growth factor receptor (EGFR) protein.  相似文献   

8.
One of the most important goals of biological investigation is to uncover gene functional relations. In this study we propose a framework for extraction and integration of gene functional relations from diverse biological data sources, including gene expression data, biological literature and genomic sequence information. We introduce a two-layered Bayesian network approach to integrate relations from multiple sources into a genome-wide functional network. An experimental study was conducted on a test-bed of Arabidopsis thaliana. Evaluation of the integrated network demonstrated that relation integration could improve the reliability of relations by combining evidence from different data sources. Domain expert judgments on the gene functional clusters in the network confirmed the validity of our approach for relation integration and network inference.  相似文献   

9.
To understand the function of protein complexes and their association with biological processes, a lot of studies have been done towards analyzing the protein-protein interaction (PPI) networks. However, the advancement in high-throughput technology has resulted in a humongous amount of data for analysis. Moreover, high level of noise, sparseness, and skewness in degree distribution of PPI networks limits the performance of many clustering algorithms and further analysis of their interactions.In addressing and solving these problems we present a novel random walk based algorithm that converts the incomplete and binary PPI network into a protein-protein topological similarity matrix (PP-TS matrix). We believe that if two proteins share some high-order topological similarities they are likely to be interacting with each other. Using the obtained PP-TS matrix, we constructed and used weighted networks to further study and analyze the interaction among proteins. Specifically, we applied a fully automated community structure finding algorithm (Auto-HQcut) on the obtained weighted network to cluster protein complexes. We then analyzed the protein complexes for significance in biological processes. To help visualize and analyze these protein complexes we also developed an interface that displays the resulting complexes as well as the characteristics associated with each complex.Applying our approach to a yeast protein-protein interaction network, we found that the predicted protein-protein interaction pairs with high topological similarities have more significant biological relevance than the original protein-protein interactions pairs. When we compared our PPI network reconstruction algorithm with other existing algorithms using gene ontology and gene co-expression, our algorithm produced the highest similarity scores. Also, our predicted protein complexes showed higher accuracy measure compared to the other protein complex predictions.  相似文献   

10.
In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver''s middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines.  相似文献   

11.
Reverse-engineering of biological networks is a central problem in systems biology. The use of intervention data, such as gene knockouts or knockdowns, is typically used for teasing apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting most informative interventions have largely been focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. Our method generally leads to faster recovery of underlying network structure and faster convergence to final distribution of confidence scores over candidate graph structures using the full data, in comparison to random selection of intervention experiments.  相似文献   

12.
Quantitative time-series observation of gene expression is becoming possible, for example by cell array technology. However, there are no practical methods with which to infer network structures using only observed time-series data. As most computational models of biological networks for continuous time-series data have a high degree of freedom, it is almost impossible to infer the correct structures. On the other hand, it has been reported that some kinds of biological networks, such as gene networks and metabolic pathways, may have scale-free properties. We hypothesize that the architecture of inferred biological network models can be restricted to scale-free networks. We developed an inference algorithm for biological networks using only time-series data by introducing such a restriction. We adopt the S-system as the network model, and a distributed genetic algorithm to optimize models to fit its simulated results to observed time series data. We have tested our algorithm on a case study (simulated data). We compared optimization under no restriction, which allows for a fully connected network, and under the restriction that the total number of links must equal that expected from a scale free network. The restriction reduced both false positive and false negative estimation of the links and also the differences between model simulation and the given time-series data.  相似文献   

13.
14.
Systems biology aims to study the properties of biological systems in terms of the properties of their molecular constituents. This occurs frequently by a process of mathematical modelling. The first step in this modelling process is to unravel the interaction structure of biological systems from experimental data. Previously, an algorithm for gene network inference from gene expression perturbation data was proposed. Here, the algorithm is extended by using regression with subset selection. The performance of the algorithm is extensively evaluated on a set of data produced with gene network models at different levels of simulated experimental noise. Regression with subset selection outperforms the previously stated matrix inverse approach in the presence of experimental noise. Furthermore, this regression approach enables us to deal with under-determination, that is, when not all genes are perturbed. The results on incomplete data sets show that the new method performs well at higher number of perturbations, even when noise levels are high. At lower number of perturbations, although still being able to recover the majority of the connections, less confidence can be placed in the recovered edges.  相似文献   

15.
16.
The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called "GenoSpectrum") as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.  相似文献   

17.
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggests that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.  相似文献   

18.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.  相似文献   

19.

Background

The availability of large-scale high-throughput data possesses considerable challenges toward their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited.

Results

In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide an user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository.

Conclusions

The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods.

Reviewers

This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.
  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号