Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
MOTIVATION: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure poses a serious challenge in microarray analysis, where the sample size is small compared to the number of genes considered. This renders many standard algorithms for graphical models inapplicable and makes inferring genetic networks an 'ill-posed' inverse problem. METHODS: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. RESULTS: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
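
Step (1) of the abstract above, a regularized estimate of partial correlation, can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a simple shrinkage of the sample correlation matrix toward the identity with a fixed, hand-picked intensity `lam` (a hypothetical parameter), then reads partial correlations off the inverse. The function names are illustrative, and the testing and false discovery rate steps are not covered.

```python
import math

def correlation_matrix(data):
    """Sample correlation matrix; `data` is a list of samples, each a list of p values."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def shrink_to_identity(corr, lam):
    """Assumed regularizer: scale every off-diagonal entry by (1 - lam)."""
    p = len(corr)
    return [[1.0 if i == j else (1.0 - lam) * corr[i][j] for j in range(p)]
            for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def partial_correlations(data, lam=0.2):
    """Partial correlations -w_ij / sqrt(w_ii * w_jj) from the shrunk precision matrix W."""
    precision = invert(shrink_to_identity(correlation_matrix(data), lam))
    p = len(precision)
    return [[1.0 if i == j else
             -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(p)] for i in range(p)]

# Toy data: 3 samples of 4 "genes", so the unregularized matrix is not invertible.
toy = [[1.0, 2.0, 0.5, 3.0], [2.0, 1.5, 1.5, 2.0], [3.0, 3.5, 1.0, 4.0]]
for row in partial_correlations(toy, lam=0.2):
    print([round(v, 3) for v in row])
```

With fewer samples than variables the sample correlation matrix is singular; any `lam` strictly between 0 and 1 makes the shrunk matrix positive definite, which is why the inversion goes through in this small-sample setting.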

2.
The advent of high-throughput metagenomic sequencing has prompted the development of efficient taxonomic profiling methods that measure the presence, abundance and phylogeny of organisms in a wide range of environmental samples. Multivariate sequence-derived abundance data further have the potential to enable inference of ecological associations between microbial populations, but several technical issues need to be accounted for, such as the compositional nature of the data, its extreme sparsity and overdispersion, and the frequent need to operate in under-determined regimes. The ecological network reconstruction problem is frequently cast into the paradigm of Gaussian Graphical Models (GGMs), for which efficient structure inference algorithms are available, such as the graphical lasso and neighborhood selection. Unfortunately, GGMs or variants thereof cannot properly account for the extremely sparse patterns occurring in real-world metagenomic taxonomic profiles. In particular, structural zeros (as opposed to sampling zeros) corresponding to true absences of biological signals fail to be properly handled by most statistical methods. We present here a zero-inflated log-normal graphical model (available at https://github.com/vincentprost/Zi-LN) specifically aimed at handling such “biological” zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for the inference of microbial association networks, with the most notable gains obtained when analyzing taxonomic profiles displaying sparsity levels on par with real-world metagenomic datasets.

3.
4.
Large-scale microarray gene expression data provide the possibility of constructing genetic networks or biological pathways. Gaussian graphical models have been suggested to provide an effective method for constructing such genetic networks. However, most of the available methods for constructing Gaussian graphs do not account for the sparsity of the networks and are computationally more demanding or infeasible, especially in settings of high dimension and low sample size. We introduce a threshold gradient descent (TGD) regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models and demonstrate its application to identifying genetic networks. Such a procedure is computationally feasible and can easily incorporate prior biological knowledge about the network structure. Simulation results indicate that the proposed method yields a better estimate of the precision matrix than procedures that fail to account for the sparsity of the graphs. We also present results on inference of a gene network for isoprenoid biosynthesis in Arabidopsis thaliana. These results demonstrate that the proposed procedure can indeed identify biologically meaningful genetic networks based on microarray gene expression data.

5.
6.
In today's world, it is becoming increasingly important to have the tools to understand, and ultimately to predict, the response of ecosystems to disturbance. However, understanding such dynamics is not simple. Ecosystems are a complex network of species interactions, and therefore any change to a population of one species will have some degree of community-level effect. In recent years, Bayesian networks (BNs) have seen successful applications in molecular biology and ecology, where they were able to recover plausible links in the respective systems they were applied to. The recovered network also comes with a quantifiable metric of interaction strength between variables. While the latter is an invaluable piece of information in ecology, an unexplored application of BNs would be using them as a novel variable selection tool in the training of predictive models. To this end, we evaluate the potential usefulness of BNs in two aspects: (1) we apply BN inference on species abundance data from a rocky shore ecosystem, a system with well-documented links, to test the ecological validity of the revealed network; and (2) we evaluate BNs as a novel variable selection method to guide the training of an artificial neural network (ANN). Here, we demonstrate that not only was this approach able to recover meaningful species interaction networks from ecological data, but it also served as a meaningful tool to inform the training of predictive models, where there was an improvement in predictive performance in models with BN variable selection. Combining these results, we demonstrate the potential of this novel application of BNs in enhancing the interpretability and predictive power of ecological models; this has general applicability beyond the studied system, to ecosystems where existing relationships between species and other functional components are unknown.

7.
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/

8.
During the last decade, the development of high-throughput biotechnologies has resulted in the production of exponentially expanding quantities of biological data, such as genomic and proteomic expression data. One fundamental problem in systems biology is to learn the architecture of biochemical pathways and regulatory networks in an inferential way from such postgenomic data. Along with the increasing amount of available data, many novel statistical methods have been developed and proposed in the literature. This article gives a non-mathematical overview of three widely used reverse engineering methods, namely relevance networks, graphical Gaussian models, and Bayesian networks, with a focus on their relative merits and shortcomings. In addition, the reverse engineering results of these graphical methods on cytometric protein data from the RAF-signalling network are cross-compared via AUROC scatter plots.

9.
Cross‐sectional studies may shed light on the evolution of a disease like cancer through the comparison of patient traits among disease stages. This problem is especially challenging when a gene–gene interaction network needs to be reconstructed from omics data, and, in addition, the patients of each stage need not form a homogeneous group. Here, the problem is operationalized as the estimation of stage‐wise mixtures of Gaussian graphical models (GGMs) from high‐dimensional data. These mixtures are fitted by a (fused) ridge penalized EM algorithm. The fused ridge penalty shrinks the GGMs of contiguous stages toward each other. The (fused) ridge penalty parameters are chosen through cross‐validation. The proposed estimation procedures are shown to be consistent and their performance in other respects is studied in simulation. The downstream exploitation of the fitted GGMs is outlined. In a data illustration, the methodology is employed to identify gene–gene interaction network changes in the transition from normal to cancer prostate tissue.

10.
11.
Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used for teasing apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. Our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures using the full data, in comparison to random selection of intervention experiments.

12.
MOTIVATION: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundred, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem. RESULTS: We propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially contains only S, by repeating the following step R + 1 times and then pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.
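
The growth step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: `parents_and_children` is a hypothetical callback standing in for whatever procedure learns a gene's parents and children from expression data, the toy adjacency table replaces real data, and the pruning step is omitted because the abstract does not specify it.

```python
def grow_around_seed(seed, radius, parents_and_children):
    """Collect genes by repeating the growth step radius + 1 times.

    `parents_and_children(gene)` is a hypothetical callback returning the
    parents and children of `gene`; the pruning step is not implemented.
    """
    genes = {seed}
    for _ in range(radius + 1):
        found = set()
        for gene in genes:
            found |= set(parents_and_children(gene))
        genes |= found
    return genes

# Toy stand-in for learned neighborhoods: a chain A - B - C - D - E.
neighbours = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

print(sorted(grow_around_seed("A", 1, neighbours.__getitem__)))  # ['A', 'B', 'C']
```

Passing the neighborhood lookup as a callback keeps the growth loop independent of how parents and children are actually found.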

13.
14.
15.
Maximum Number of Fixed Points in Regulatory Boolean Networks   (total citations: 1; self-citations: 0; citations by others: 1)
Boolean networks (BNs) have been extensively used as mathematical models of genetic regulatory networks. The number of fixed points of a BN is a key feature of its dynamical behavior. Here, we study the maximum number of fixed points in a particular class of BNs called regulatory Boolean networks, where each interaction between the elements of the network is either an activation or an inhibition. We find relationships between the positive and negative cycles of the interaction graph and the number of fixed points of the network. As our main result, we exhibit an upper bound for the number of fixed points in terms of the minimum cardinality of a set of vertices meeting all positive cycles of the network, which can be applied in the design of genetic regulatory networks.
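
For readers unfamiliar with the notion, a fixed point of a Boolean network is a state that every update function maps back to itself. The sketch below enumerates fixed points by exhaustive search, which is only practical for small networks. The two example networks are illustrative assumptions, not taken from the article: a two-gene mutual-inhibition circuit (a positive cycle, since it contains two inhibitions) and a two-gene circuit with one activation and one inhibition (a negative cycle).

```python
from itertools import product

def fixed_points(update_functions):
    """Return every state x with f_i(x) == x_i for all i, by exhaustive search."""
    n = len(update_functions)
    return [state for state in product((0, 1), repeat=n)
            if all(f(state) == state[i] for i, f in enumerate(update_functions))]

# Positive cycle: each gene inhibits the other.
toggle_switch = [
    lambda x: 1 - x[1],  # gene 0 is on exactly when gene 1 is off
    lambda x: 1 - x[0],  # gene 1 is on exactly when gene 0 is off
]

# Negative cycle: gene 1 activates gene 0, gene 0 inhibits gene 1.
negative_loop = [
    lambda x: x[1],
    lambda x: 1 - x[0],
]

print(fixed_points(toggle_switch))  # [(0, 1), (1, 0)]
print(fixed_points(negative_loop))  # []
```

The search visits all 2^n states, so it serves as a check on small examples rather than as a substitute for the structural bound the abstract describes.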

16.
Dynamic Bayesian networks (DBNs) are considered a promising model for inferring gene networks from time series microarray data. DBNs have overtaken Bayesian networks (BNs) because DBNs can represent cyclic regulations using time delay information. In this paper, a general framework for DBN modelling is outlined. Both discrete and continuous DBN models are constructed systematically, and criteria for learning network structures are introduced from a Bayesian statistical viewpoint. This paper reviews the applications of DBNs over the past years. Real data applications for Saccharomyces cerevisiae time series gene expression data are also shown.

17.
MOTIVATION: The analysis of high-throughput experimental data, for example from microarray experiments, is currently seen as a promising way of finding regulatory relationships between genes. Bayesian networks have been suggested for learning gene regulatory networks from observational data. Not all causal relationships can be inferred from correlation data alone. Often several equivalent but different directed graphs explain the data equally well. Intervention experiments where genes are manipulated can help to narrow down the range of possible networks. RESULTS: We describe an active learning algorithm that suggests an optimized sequence of intervention experiments. Simulation experiments show that our selection scheme is better than an unguided choice of interventions in learning the correct network and compares favorably in running time and results with methods based on value of information calculations.

18.
MOTIVATION: Bayesian networks have been applied to infer genetic regulatory interactions from microarray gene expression data. This inference problem is particularly hard in that interactions between hundreds of genes have to be learned from very small data sets, typically containing only a few dozen time points during a cell cycle. Most previous studies have assessed the inference results on real gene expression data by comparing predicted genetic regulatory interactions with those known from the biological literature. This approach is controversial due to the absence of known gold standards, which renders the estimation of the sensitivity and specificity, that is, the true and (complementary) false detection rate, unreliable and difficult. The objective of the present study is to test the viability of the Bayesian network paradigm in a realistic simulation study. First, gene expression data are simulated from a realistic biological network involving DNAs, mRNAs, inactive protein monomers and active protein dimers. Then, interaction networks are inferred from these data in a reverse engineering approach, using Bayesian networks and Bayesian learning with Markov chain Monte Carlo. RESULTS: The simulation results are presented as receiver operating characteristic (ROC) curves. This allows estimating the proportion of spurious gene interactions incurred for a specified target proportion of recovered true interactions. The findings demonstrate how the network inference performance varies with the training set size, the degree of inadequacy of prior assumptions, the experimental sampling strategy and the inclusion of further, sequence-based information. AVAILABILITY: The programs and data used in the present study are available from http://www.bioss.sari.ac.uk/~dirk/Supplements
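
The evaluation described above, trading recovered true interactions against spurious ones, can be sketched generically. The code below is not the study's software: it assumes each candidate edge has a confidence score (for example, a posterior probability), that a gold-standard labeling is known because the data are simulated, and that all scores are distinct. The edge names and scores are made up for illustration.

```python
def roc_points(scores, truth):
    """(false positive rate, true positive rate) pairs, one per score threshold.

    `scores` maps each candidate edge to a confidence value; `truth` maps the
    same edges to True (real interaction) or False. Assumes distinct scores.
    """
    positives = sum(truth.values())
    negatives = len(truth) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one edge at a time, from most to least confident.
    for edge in sorted(scores, key=scores.get, reverse=True):
        if truth[edge]:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical edge scores and gold standard.
scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.3, ("C", "D"): 0.1}
truth = {("A", "B"): True, ("B", "C"): False, ("A", "C"): True, ("C", "D"): False}

curve = roc_points(scores, truth)
print(curve)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))  # 0.75
```

Reading the curve at a chosen true positive rate gives the corresponding false positive rate, which is the trade-off the abstract refers to.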

19.
Ecological network studies are providing important advances about the organization, stability and dynamics of ecological systems. However, the ecological networks approach is being integrated very slowly in plant community ecology, even though the first studies on plant facilitation networks (FNs) were published more than a decade ago. The study of interaction networks between established plants and plants recruiting beneath them, which we call Recruitment Networks (RNs), can provide new insights on mechanisms driving plant community structure and dynamics. RNs basically describe which plants recruit under which others, so they can be seen as a generalisation of the classic FNs since they do not imply any particular effect (positive, negative or neutral) of the established plants on recruiting ones. RNs summarise information on the structure of sapling banks. More importantly, the information included in RNs can be incorporated into models of replacement dynamics to evaluate how different aspects of network structure, or different mechanisms of network assembly, may affect plant community stability and species coexistence. To allow an efficient development of the study of FNs and RNs, here we unify concepts, synthesise current knowledge, clarify some conceptual issues, and propose basic methodological guidelines to standardise sampling methods that could make future studies of these networks directly comparable.

20.

Background

Prediction of gene regulatory networks (GRNs) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes.

Results

We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRNs. Using CompareSVM, we investigated and evaluated in detail different SVM kernel methods on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of an inference method depends on the nature of the experimental condition and the size of the network.

Conclusions

For networks with fewer than 200 nodes, and on average over all network sizes, the SVM with a Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends on the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0395-x) contains supplementary material, which is available to authorized users.
