Similar literature
Found 20 similar articles (search time: 31 ms)
1.
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli. It does not make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
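The decomposition described in the abstract above (one regression problem per target gene, followed by a global ranking of regulator-to-target links) can be sketched as follows. This is not the GENIE3 implementation: to stay dependency-free, the Random Forest or Extra-Trees importance is replaced by the variance reduction of a single threshold split (a depth-1 regression tree). All function names, the division by the target variance, and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def variance(values: List[float]) -> float:
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def stump_importance(x: List[float], y: List[float]) -> float:
    """Best variance reduction in y from one threshold split on x.

    Stand-in (assumption) for a Random Forest / Extra-Trees importance.
    """
    n = len(y)
    total = variance(y)
    order = sorted(range(n), key=lambda i: x[i])
    best = 0.0
    for cut in range(1, n):
        if x[order[cut - 1]] == x[order[cut]]:
            continue  # no threshold separates equal values
        left = [y[i] for i in order[:cut]]
        right = [y[i] for i in order[cut:]]
        within = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, total - within)
    return best


def rank_links(expression: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """One regression problem per target gene, then a global ranking of links."""
    links = []
    for target, y in expression.items():
        scale = variance(y) or 1.0  # assumption: makes scores comparable across targets
        for regulator, x in expression.items():
            if regulator != target:
                links.append((regulator, target, stump_importance(x, y) / scale))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Toy data (hypothetical): g2 switches on when g1 is high; g3 is unrelated.
g1 = [i / 19 for i in range(20)]
g2 = [1.0 if v > 0.5 else 0.0 for v in g1]
g3 = [((7 * i) % 20) / 19 for i in range(20)]
ranking = rank_links({"g1": g1, "g2": g2, "g3": g3})
```

In this toy example the directed link from g1 to g2 ranks first, because a single split on g1 explains all of the variance of g2. A real analysis would replace `stump_importance` with an ensemble-based importance measure.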

2.
Chen M, Cho J, Zhao H. PLoS Genetics, 2011, 7(4): e1001353
Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulated in the literature on biological pathways and interactions. It is conceivable that appropriate incorporation of such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have been developed recently to prioritize genes using prior biological knowledge, such as pathways, most methods treat genes in a specific pathway as an exchangeable set without considering the topological structure of a pathway. However, how genes are related to each other in a pathway may be very informative for identifying association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF) model to incorporate pathway topology for association analysis. We show that the conditional distribution of our MRF model takes on a simple logistic regression form, and we propose an iterated conditional modes algorithm as well as a decision theoretic approach for statistical inference of each gene's association with disease. Simulation studies show that our proposed framework is more effective at identifying genes associated with disease than a single gene-based method. We also illustrate the usefulness of our approach through its applications to a real data example.
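To illustrate why an MRF conditional can take a logistic regression form, consider a generic auto-logistic model. This is an assumption made for illustration; the exact parameterization used in the article is not given in the abstract. Let $s_i \in \{0, 1\}$ be the association label of gene $i$, $E$ the set of pathway edges, $N_i$ the neighbours of gene $i$, and $h$, $\tau$ hypothetical parameters:

```latex
P(\mathbf{s}) \propto \exp\Big( h \sum_{i} s_i + \tau \sum_{(i,j) \in E} s_i s_j \Big),
\qquad
\operatorname{logit} P(s_i = 1 \mid \mathbf{s}_{-i})
  = \log \frac{P(s_i = 1 \mid \mathbf{s}_{-i})}{P(s_i = 0 \mid \mathbf{s}_{-i})}
  = h + \tau \sum_{j \in N_i} s_j .
```

All terms that do not involve $s_i$ cancel in the ratio, so the conditional depends only on the labels of the neighbouring genes. An iterated conditional modes scheme would then visit each gene in turn and set $s_i$ to the value with the larger conditional probability until no label changes.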

3.
In this paper we give indications about the dynamical impact (as phenotypic changes) of the main sources of perturbation in biological regulatory networks. First, we define the boundary of the interaction graph expressing the regulations between the main elements of the network (genes, proteins, metabolites, ...). Then, we investigate which changes in the state values on the boundary could cause changes of state in the core of the system (robustness to boundary conditions). Next, we analyse the role of the updating mode (sequential, block sequential or parallel) on the asymptotics of the network, essentially on the occurrence of limit cycles (robustness to updating methods). Finally, we show the influence of some topological changes (e.g. suppression or addition of interactions) on the dynamical behaviour of the system (robustness to topology perturbations).
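The effect of the updating mode on limit cycles, mentioned in the abstract above, can be seen on a very small example. The sketch below uses a hypothetical two-gene Boolean network in which each gene copies the state of the other; the network and all names are assumptions for illustration and are not taken from the article.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]

# Hypothetical two-gene network: each gene copies the other's state.
RULES: List[Rule] = [lambda s: s[1], lambda s: s[0]]


def parallel_step(state: State) -> State:
    """Update every node at once from the same previous state."""
    return tuple(rule(state) for rule in RULES)


def sequential_step(state: State) -> State:
    """Update nodes one at a time in index order; later nodes see earlier updates."""
    current = list(state)
    for i, rule in enumerate(RULES):
        current[i] = rule(tuple(current))
    return tuple(current)


def attractor(step: Callable[[State], State], start: State) -> List[State]:
    """Iterate until a state repeats and return the cycle that is reached."""
    seen: List[State] = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


parallel_attractor = attractor(parallel_step, (0, 1))
sequential_attractor = attractor(sequential_step, (0, 1))
```

Starting from (0, 1), the parallel mode oscillates between (0, 1) and (1, 0), a limit cycle of length 2, whereas the sequential mode settles on the fixed point (1, 1). The same rules therefore give different asymptotic behaviour depending only on the updating mode.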

4.
Recent analyses of biological and artificial networks have revealed a common network architecture, called scale-free topology. The origin of the scale-free topology has been explained by using growth and preferential attachment mechanisms. In a cell, proteins are the most important carriers of function, and are composed of domains as elemental units responsible for the physical interaction between protein pairs. Here, we propose a model for protein–protein interaction networks that reveals the emergence of two possible topologies. We show that depending on the number of randomly selected interacting domain pairs, the connectivity distribution follows either a scale-free distribution, even in the absence of preferential attachment, or a normal distribution. This new approach requires an evolutionary model only for the proteins (nodes), not for the interactions (edges). The edges are added by means of random interaction of domain pairs. As a result, this model offers a new mechanistic explanation for understanding complex networks with a direct biological interpretation because only protein structures and their functions evolved through genetic modifications of amino acid sequences. These findings are supported by numerical simulations as well as experimental data.

5.
Jung S, Lee KH, Lee D. Bio Systems, 2007, 90(1): 197-210
The Bayesian network is a popular tool for describing relationships between data entities by representing probabilistic (in)dependencies with a directed acyclic graph (DAG) structure. Relationships have been inferred between biological entities using the Bayesian network model with high-throughput data from biological systems in diverse fields. However, the scalability of those approaches is seriously restricted because of the huge search space for finding an optimal DAG structure in the process of Bayesian network learning. For this reason, most previous approaches limit the number of target entities or use additional knowledge to restrict the search space. In this paper, we use the hierarchical clustering and order restriction (H-CORE) method for the learning of large Bayesian networks by clustering entities and restricting edge directions between those clusters, with the aim of overcoming the scalability problem and thus making it possible to perform genome-scale Bayesian network analysis without additional biological knowledge. We use simulations to show that H-CORE is much faster than the widely used sparse candidate method, whilst being of comparable quality. We have also applied H-CORE to retrieving gene-to-gene relationships in a biological system (the 'Rosetta compendium'). By evaluating learned information through literature mining, we demonstrate that H-CORE enables the genome-scale Bayesian analysis of biological systems without any prior knowledge.

6.
If perturbing two genes together has a stronger or weaker effect than expected, they are said to genetically interact. Genetic interactions are important because they help map gene function, and functionally related genes have similar genetic interaction patterns. Mapping quantitative (positive and negative) genetic interactions on a global scale has recently become possible. These data clearly show groups of genes connected by predominantly positive or negative interactions, termed monochromatic groups. These groups often correspond to functional modules, like biological processes or complexes, or connections between modules. However, it is not yet known how these patterns globally relate to known functional modules. Here we systematically study the monochromatic nature of known biological processes using the largest quantitative genetic interaction data set available, which includes fitness measurements for ~5.4 million gene pairs in the yeast Saccharomyces cerevisiae. We find that only 10% of biological processes, as defined by Gene Ontology annotations, and less than 1% of inter-process connections are monochromatic. Further, we show that protein complexes are responsible for a surprisingly large fraction of these patterns. This suggests that complexes play a central role in shaping the monochromatic landscape of biological processes. Altogether this work shows that both positive and negative monochromatic patterns are found in known biological processes and in their connections and that protein complexes play an important role in these patterns. The monochromatic processes, complexes and connections we find chart a hierarchical and modular map of sensitive and redundant biological systems in the yeast cell that will be useful for gene function prediction and comparison across phenotypes and organisms. Furthermore, the analysis methods we develop are applicable to other species for which genetic interactions will progressively become more available.

7.
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.

8.
9.
Protein–protein interactions (PPIs) play very important roles in many cellular processes, and provide rich information for discovering biological facts and knowledge. Although various experimental approaches have been developed to generate large amounts of PPI data for different organisms, high-throughput experimental data usually suffers from high error rates, and as a consequence, the biological knowledge discovered from this data is distorted or incorrect. Therefore, it is vital to assess the quality of protein interaction data and extract reliable protein interactions from the high-throughput experimental data. In this paper, we propose a new Semantic Reliability (SR) method to assess the reliability of each protein interaction and identify potential false-positive protein interactions in a dataset. For each pair of target interacting proteins, the SR method takes into account the semantic influence between proteins that interact with the target proteins, and the semantic influence between the target proteins themselves when assessing the interaction reliability. Evaluations on real protein interaction datasets demonstrated that our method outperformed other existing methods in terms of extracting more reliable interactions from original protein interaction datasets.

10.
Genetic regulatory network inference is critically important for revealing fundamental cellular processes, investigating gene functions, and understanding their relations. The availability of time series gene expression data makes it possible to investigate the gene activities of whole genomes, rather than those of only a pair of genes or among several genes. However, current computational methods do not sufficiently consider the temporal behavior of this type of data and lack the capability to capture the complex nonlinear system dynamics. We propose a recurrent neural network (RNN) and particle swarm optimization (PSO) approach to infer genetic regulatory networks from time series gene expression data. Under this framework, gene interaction is explained through a connection weight matrix. Based on the fact that the measured time points are limited and the assumption that the genetic networks are usually sparsely connected, we present a PSO-based search algorithm to unveil potential genetic network constructions that fit well with the time series data and explore possible gene interactions. Furthermore, PSO is used to train the RNN and determine the network parameters. Our approach has been applied to both synthetic and real data sets. The results demonstrate that the RNN/PSO can provide meaningful insights in understanding the nonlinear dynamics of the gene expression time series and revealing potential regulatory interactions between genes.

11.
Epistasis, or gene–gene interaction, results from joint effects of genes on a trait; thus, the same alleles of one gene may display different genetic effects in different genetic backgrounds. In this study, we generalized the coding technique of a natural and orthogonal interaction (NOIA) model for association studies along with gene–gene interactions for dichotomous traits and human complex diseases. The NOIA model, which has non-correlated estimators for genetic effects, is important for estimating influence from multiple loci. We conducted simulations and data analyses to evaluate the performance of the NOIA model. Both simulation and real data analyses revealed that the NOIA statistical model had higher power for detecting main genetic effects and usually had higher power for some interaction effects than the usual model. Although genes that predispose people to melanoma have been identified (HERC2 at 15q13.1, MC1R at 16q24.3 and CDKN2A at 9p21.3), gene–gene interactions have not been fully explored for melanoma. By applying the NOIA statistical model to a genome-wide melanoma dataset, we confirmed the previously identified significantly associated genes and found potential regions at chromosomes 5 and 4 that may interact with the HERC2 and MC1R genes, respectively. Our study not only generalized the orthogonal NOIA model but also provided useful insights for understanding the influence of interactions on melanoma risk.

12.
13.
MOTIVATION: Gene association/interaction networks provide vast amounts of information about essential processes inside the cell. A complete picture of gene-gene associations/interactions would open new horizons for biologists, ranging from pure appreciation to successful manipulation of biological pathways for therapeutic purposes. Therefore, identification of important biological complexes whose members (genes and their protein products) interact with each other is of prime importance. Numerous experimental methods exist but, for the most part, they are costly and labor intensive. Computational techniques, such as the one proposed in this work, provide a quick 'budget' solution that can be used as a screening tool before more expensive techniques are attempted. Here, we introduce a novel computational method based on the partial least squares (PLS) regression technique for reconstruction of genetic networks from microarray data. RESULTS: The proposed PLS method is shown to be an effective screening procedure for the detection of gene-gene interactions from microarray data. Both simulated and real microarray experiments show that the PLS-based approach is superior to its competitors both in terms of performance and applicability. AVAILABILITY: R code is available from the supplementary web-site whose URL is given below.

14.
MOTIVATION: The genetic basis of complex traits often involves the function of multiple genetic factors, their interactions and the interaction between the genetic and environmental factors. Gene-environment (G×E) interaction is considered pivotal in determining trait variations and susceptibility to many genetic disorders such as neurodegenerative diseases or mental disorders. Regression-based methods that assume a linear relationship between a disease response and the genetic and environmental factors, as well as their interaction, are the commonly used approach for detecting G×E interaction. The linearity assumption, however, could be easily violated due to non-linear genetic penetrance, which induces non-linear G×E interaction. RESULTS: In this work, we propose to relax the linear G×E assumption and allow for non-linear G×E interaction under a varying coefficient model framework. We propose to estimate the varying coefficients with a regression spline technique. The model allows one to assess the non-linear penetrance of a genetic variant under different environmental stimuli, thereby helping us to gain novel insights into the etiology of a complex disease. Several statistical tests are proposed for a complete dissection of G×E interaction. A wild bootstrap method is adopted to assess the statistical significance. Both simulation and real data analysis demonstrate the power and utility of the proposed method. Our method provides a powerful and testable framework for assessing non-linear G×E interaction.

15.
Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptibility genes. We employ the gradient descent boosting procedure to build an additive tree model and propose a new algorithm to utilize the network structure in fitting small tree weak learners. We illustrate by simulation studies and a real data example that, by making use of the network information, NetBoosting outperforms a few existing methods in terms of accuracy of prediction and variable selection.

16.
Clustering of genes into groups sharing common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms have been proposed in particular to analyze gene expression data, but most of them consider genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that has the advantage of accounting for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph, where the nodes represent the genes, and the edges are weighted according to the available interaction information. As a probabilistic model, this model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point out the gain of using such an approach. Availability: The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.

17.
Recently, a number of advanced screening technologies have allowed for the comprehensive quantification of aggravating and alleviating genetic interactions among gene pairs. In parallel, TAP-MS studies (tandem affinity purification followed by mass spectroscopy) have been successful at identifying physical protein interactions that can indicate proteins participating in the same molecular complex. Here, we propose a method for the joint learning of protein complexes and their functional relationships by integration of quantitative genetic interactions and TAP-MS data. Using 3 independent benchmark datasets, we demonstrate that this method is >50% more accurate at identifying functionally related protein pairs than previous approaches. Application to genes involved in yeast chromosome organization identifies a functional map of 91 multimeric complexes, a number of which are novel or have been substantially expanded by addition of new subunits. Interestingly, we find that complexes that are enriched for aggravating genetic interactions (i.e., synthetic lethality) are more likely to contain essential genes, linking each of these interactions to an underlying mechanism. These results demonstrate the importance of both large-scale genetic and physical interaction data in mapping pathway architecture and function.

18.
Inferring genetic regulatory logic from expression data   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: High-throughput molecular genetics methods allow the collection of data about the expression of genes at different time points and under different conditions. The challenge is to infer gene regulatory interactions from these data and to get an insight into the mechanisms of genetic regulation. RESULTS: We propose a model for genetic regulatory interactions, which has a biologically motivated Boolean logic semantics, but is of a probabilistic nature, and is hence able to confront noisy biological processes and data. We propose a method for learning the model from data based on the Bayesian approach and utilizing Gibbs sampling. We tested our method with previously published data of the Saccharomyces cerevisiae cell cycle and found relations between genes consistent with biological knowledge.

19.
The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, the Bayesian network (BN) is a powerful tool for the ab initio identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea and the relative pros and cons of each technique, and discuss possible ways to combine different algorithms towards making BN learning more accurate and much faster.

20.
The genetic basis of complex diseases is expected to be highly heterogeneous, with complex interactions among multiple disease loci and environmental factors. Due to the multi-dimensional property of interactions among a large number of genetic loci, efficient statistical approaches have not been well developed to handle the high-order epistatic complexity. In this article, we introduce a new approach for testing genetic epistasis in multiple loci using an entropy-based statistic for a case-only design. The entropy-based statistic asymptotically follows a χ² distribution. Computer simulations show that the entropy-based approach has better control of type I error and higher power compared to the standard χ² test. Motivated by a schizophrenia data set, we propose a method for measuring and testing the relative entropy of a clinical phenotype, through which one can test the contribution or interaction of multiple disease loci to a clinical phenotype. A sequential forward selection procedure is proposed to construct a genetic interaction network, which is illustrated through a tree-based diagram. The network information clearly shows the relative importance of a set of genetic loci on a clinical phenotype. To show the utility of the new entropy-based approach, it is applied to analyze two real data sets, a schizophrenia data set and a published malaria data set. Our approach provides a fast and testable framework for genetic epistasis study in a case-only design.
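As a rough illustration of how an entropy-type quantity can yield a χ²-distributed test in a case-only setting, the sketch below computes the mutual information between the genotypes at two loci among cases and scales it into the classical likelihood-ratio (G) statistic, which is asymptotically χ² with (r-1)(c-1) degrees of freedom when the two loci are independent. This is a standard textbook statistic, not the statistic proposed in the article above; the function names and the toy genotypes are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def mutual_information(genotypes: List[Tuple[int, int]]) -> float:
    """Mutual information (in nats) between two loci across case samples."""
    n = len(genotypes)
    joint = Counter(genotypes)
    first = Counter(a for a, _ in genotypes)
    second = Counter(b for _, b in genotypes)
    return sum(
        (count / n) * math.log(count * n / (first[a] * second[b]))
        for (a, b), count in joint.items()
    )


def g_statistic(genotypes: List[Tuple[int, int]]) -> float:
    """Likelihood-ratio statistic G = 2 * N * mutual information."""
    return 2 * len(genotypes) * mutual_information(genotypes)


# Hypothetical case-only genotypes at two biallelic loci (coded 0/1).
independent = [(a, b) for a in (0, 1) for b in (0, 1)] * 5
linked = [(0, 0)] * 10 + [(1, 1)] * 10

g_independent = g_statistic(independent)
g_linked = g_statistic(linked)
```

For the balanced `independent` sample the statistic is zero, while for the perfectly associated `linked` sample it equals 40 ln 2, far in the upper tail of a χ² distribution with one degree of freedom.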
