首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.  相似文献   

2.
ABSTRACT: BACKGROUND: Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks. RESULTS: To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques. CONCLUSIONS: Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.  相似文献   

3.
In the post-genomic biology era,the reconstruction of gene regulatory networks from microarray gene expression data isvery important to understand the underlying biological system,and it has been a challenging task in bioinformatics.TheBayesian network model has been used in reconstructing the gene regulatory network for its advantages,but how to determinethe network structure and parameters is still important to be explored.This paper proposes a two-stage structure learning algorithmwhich integrates immune evolution algorithm to build a Bayesian network.The new algorithm is evaluated with the use ofboth simulated and yeast cell cycle data.The experimental results indicate that the proposed algorithm can find many of theknown real regulatory relationships from literature and predict the others unknown with high validity and accuracy.  相似文献   

4.
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis–Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.  相似文献   

5.
The primary goal of this article is to infer genetic interactions based on gene expression data. A new method for multiorganism Bayesian gene network estimation is presented based on multitask learning. When the input datasets are sparse, as is the case in microarray gene expression data, it becomes difficult to separate random correlations from true correlations that would lead to actual edges when modeling the gene interactions as a Bayesian network. Multitask learning takes advantage of the similarity between related tasks, in order to construct a more accurate model of the underlying relationships represented by the Bayesian networks. The proposed method is tested on synthetic data to illustrate its validity. Then it is iteratively applied on real gene expression data to learn the genetic regulatory networks of two organisms with homologous genes.  相似文献   

6.
Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data.  相似文献   

7.
8.
9.
Recent advances in high-throughput DNA microarrays and chromatin immunoprecipitation (ChIP) assays have enabled the learning of the structure and functionality of genetic regulatory networks. In light of these heterogeneous data sets, this paper proposes a novel approach for reconstruction of genetic regulatory networks based on the posterior probabilities of gene regulations. Built within the framework of Bayesian statistics and computational Monte Carlo techniques, the proposed approach prevents the dichotomy of classifying gene interactions as either being connected or disconnected, thereby it reduces significantly the inference errors. Simulation results corroborate the superior performance of the proposed approach relative to the existing state-of-the-art algorithms. A genetic regulatory network for Saccharomyces cerevisiae is inferred based on the published real data sets, and biological meaningful results are discussed.  相似文献   

10.
Duarte CW  Zeng ZB 《Genetics》2011,187(3):955-964
Expression QTL (eQTL) studies involve the collection of microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, many difficulties exist with current network reconstruction algorithms such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovery of the key regulatory genes at the top of the network. We introduce the concept of "shielding" in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks from the eQTL to the shielder down using tests of conditional independence. We have proposed a novel test for controlling the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we have demonstrated that we can control the shielder false-positive rate as well as obtain high shielder and edge specificity. In addition, we have shown our method to be robust to violation of the latent variable assumption, an important feature in the practical application of our method. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two species of Saccharomyces cerevisiae (Brem et al. 2002). Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.  相似文献   

11.
Maximum likelihood haplotyping for general pedigrees   总被引:3,自引:0,他引:3  
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The presented optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semiartificial data sets, and use our software to evaluate MCMC approximations for haplotyping.  相似文献   

12.
MOTIVATION: Bayesian network methods have shown promise in gene regulatory network reconstruction because of their capability of capturing causal relationships between genes and handling data with noises found in biological experiments. The problem of learning network structures, however, is NP hard. Consequently, heuristic methods such as hill climbing are used for structure learning. For networks of a moderate size, hill climbing methods are not computationally efficient. Furthermore, relatively low accuracy of the learned structures may be observed. The purpose of this article is to present a novel structure learning method for gene network discovery. RESULTS: In this paper, we present a novel structure learning method to reconstruct the underlying gene networks from the observational gene expression data. Unlike hill climbing approaches, the proposed method first constructs an undirected network based on mutual information between two nodes and then splits the structure into substructures. The directional orientations for the edges that connect two nodes are then obtained by optimizing a scoring function for each substructure. Our method is evaluated using two benchmark network datasets with known structures. The results show that the proposed method can identify networks that are close to the optimal structures. It outperforms hill climbing methods in terms of both computation time and predicted structure accuracy. We also apply the method to gene expression data measured during the yeast cycle and show the effectiveness of the proposed method for network reconstruction.  相似文献   

13.
14.
MOTIVATION: Inferring the genetic interaction mechanism using Bayesian networks has recently drawn increasing attention due to its well-established theoretical foundation and statistical robustness. However, the relative insufficiency of experiments with respect to the number of genes leads to many false positive inferences. RESULTS: We propose a novel method to infer genetic networks by alleviating the shortage of available mRNA expression data with prior knowledge. We call the proposed method 'modularized network learning' (MONET). Firstly, the proposed method divides a whole gene set to overlapped modules considering biological annotations and expression data together. Secondly, it infers a Bayesian network for each module, and integrates the learned subnetworks to a global network. An algorithm that measures a similarity between genes based on hierarchy, specificity and multiplicity of biological annotations is presented. The proposed method draws a global picture of inter-module relationships as well as a detailed look of intra-module interactions. We applied the proposed method to analyze Saccharomyces cerevisiae stress data, and found several hypotheses to suggest putative functions of unclassified genes. We also compared the proposed method with a whole-set-based approach and two expression-based clustering approaches.  相似文献   

15.
16.
Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms.  相似文献   

17.
MOTIVATION: Bayesian methods are widely used in many different areas of research. Recently, it has become a very popular tool for biological network reconstruction, due to its ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable to non-programmers. RESULTS: We present a BNFinder software, which allows for Bayesian network reconstruction from experimental data. It supports dynamic Bayesian networks and, if the variables are partially ordered, also static Bayesian networks. The main advantage of BNFinder is the use exact algorithm, which is at the same time very efficient (polynomial with respect to the number of observations).  相似文献   

18.
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggests that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.  相似文献   

19.
目的:由基因芯片数据精确学习建模具有异步多时延表达调控关系的基因调控网络。方法:提出了一种高阶动态贝叶斯网 络模型,并给出了网络结构学习算法,该模型假定基因的调控过程为多阶马尔科夫过程,从而能够建模基因调控网络中的异步多 时延特性。结果:由酵母基因调控网络一个子网络人工生成了加入10%含噪声的表达数据用于调控网络结构学习。在75%的后验 概率下,本文提出的高阶动态贝叶斯网络模型能够正确建模实际网络中全部的异步多时延调控关系,而经典动态贝叶斯网络仅 能够正确建模实际网络中1/3的调控关系;ROC曲线对比表明在各个后验概率水平上高阶动态贝叶斯网络模型的效果均优于经 典动态贝叶斯网络。结论:本文提出的高阶动态贝叶斯网络模型能够精确学习建模具有异步多时延表达调控关系的基因调控网 络。  相似文献   

20.

Background  

Genes interact with each other as basic building blocks of life, forming a complicated network. The relationship between groups of genes with different functions can be represented as gene networks. With the deposition of huge microarray data sets in public domains, study on gene networking is now possible. In recent years, there has been an increasing interest in the reconstruction of gene networks from gene expression data. Recent work includes linear models, Boolean network models, and Bayesian networks. Among them, Bayesian networks seem to be the most effective in constructing gene networks. A major problem with the Bayesian network approach is the excessive computational time. This problem is due to the interactive feature of the method that requires large search space. Since fitting a model by using the copulas does not require iterations, elicitation of the priors, and complicated calculations of posterior distributions, the need for reference to extensive search spaces can be eliminated leading to manageable computational affords. Bayesian network approach produces a discretely expression of conditional probabilities. Discreteness of the characteristics is not required in the copula approach which involves use of uniform representation of the continuous random variables. Our method is able to overcome the limitation of Bayesian network method for gene-gene interaction, i.e. information loss due to binary transformation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号