Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

2.
Omic approaches to the analysis of plant-virus interactions are becoming increasingly popular. These types of data, in combination with models of interaction networks, will aid in revealing not only host components that are important for the virus life cycle, but also general patterns about the way in which different viruses manipulate host regulation of gene expression for their own benefit and possible mechanisms by which viruses evade host defenses. Here, we review studies identifying host genes regulated by viruses and discuss how these genes integrate in host regulatory and interaction networks, with a particular focus on the physical properties of these networks.

3.
The size and nature of data collected on gene and protein interactions have led to a rapid growth of interest in graph theory and modern techniques for describing, characterizing and comparing networks. Simultaneously, this is a field of growth within mathematics and theoretical physics, where the global properties and emergent behavior of networks, as a function of their local properties, have long been studied. In this review, a number of approaches for exploiting modern network theory to help describe and analyze different data sets and problems associated with proteomic data are considered. This review aims to help biologists find their way towards useful ideas and references, yet may also help scientists from a mathematics and physics background to understand where they may apply their expertise.

4.
Over the last decade transgenic mouse models have become a common experimental tool for unraveling gene function. During this time there has been a growing expectation that transgenes resemble the in vivo state as much as possible. To this end, a preference away from heterologous promoters has emerged, and transgene constructs often utilize the endogenous promoter and gene sequences in BAC, PAC and YAC form without the addition of selectable markers, or at least with their subsequent removal. There has been a trend toward controlled integration by homologous recombination, either at a characterized chromosomal localization or in some cases within the allele of interest. Markers such as green fluorescent protein (GFP), beta-galactosidase (LacZ), and alkaline phosphatase (AP) continue to be useful to trace transgenic cells or transgene expression. The development of technologies such as RNA interference (RNAi) is introducing new ways of using transgenic models. Future developments in RNAi technology may revolutionize tissue-specific inactivation of gene function, without the requirement of generating conditionally targeted mice and tissue-specific recombinase mice. Transgenic models are biological tools that aid discovery. Overall, the main consideration in the generation of transgenic models is that they are bona fide biological models that best impart the disease model or biological function of the gene that they represent. The main consideration is to make the best model for the biological question at hand, and this review aims to simplify that task somewhat. Here we take a historical perspective on the development of transgenic models, with many of the important considerations to be made in design and development along the way.

6.
7.
Network graphs have become a popular tool to represent complex systems composed of many interacting subunits; especially in neuroscience, network graphs are increasingly used to represent and analyze functional interactions between multiple neural sources. Interactions are often reconstructed using pairwise bivariate analyses, overlooking the multivariate nature of interactions: investigating the effect of one source on a target requires taking all other sources into account as potential nuisance variables, and combinations of sources may also act jointly on a given target. Bivariate analyses produce networks that may contain spurious interactions, which reduce the interpretability of the network and its graph metrics. A truly multivariate reconstruction, however, is computationally intractable because of the combinatorial explosion in the number of potential interactions. Thus, we have to resort to approximative methods to handle the intractability of multivariate interaction reconstruction, and thereby enable the use of networks in neuroscience. Here, we suggest such an approximative approach in the form of an algorithm that extends fast bivariate interaction reconstruction by identifying potentially spurious interactions post hoc: the algorithm uses interaction delays reconstructed for directed bivariate interactions to tag potentially spurious edges on the basis of their timing signatures in the context of the surrounding network. Such tagged interactions may then be pruned, which produces a statistically conservative network approximation that is guaranteed to contain non-spurious interactions only. We describe the algorithm and present a reference implementation in MATLAB to test the algorithm's performance on simulated networks as well as networks derived from magnetoencephalographic data. We discuss the algorithm in relation to other approximative multivariate methods and highlight suitable application scenarios.
Our approach is a tractable and data-efficient way of reconstructing approximative networks of multivariate interactions. It is preferable if available data are limited or if fully multivariate approaches are computationally infeasible.
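
The timing-signature idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB reference implementation: the function name `tag_spurious_edges`, the `tolerance` parameter and the example delays are hypothetical, and the sketch assumes that all delays are positive and that only one pattern is checked, namely a direct edge whose delay equals the summed delays along an alternative, cycle-free path.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def tag_spurious_edges(delays: Dict[Edge, float], tolerance: float = 0.0) -> List[Edge]:
    """Tag edges whose delay equals the summed delay of an alternative path.

    Hypothetical helper: assumes positive delays and only considers simple
    (cycle-free) alternative paths made of at least two edges.
    """
    successors: Dict[str, List[Tuple[str, float]]] = {}
    for (src, dst), delay in delays.items():
        successors.setdefault(src, []).append((dst, delay))

    def path_matches(node: str, target: str, remaining: float,
                     visited: frozenset, hops: int) -> bool:
        for nxt, delay in successors.get(node, []):
            if nxt in visited:
                continue
            left = remaining - delay
            if nxt == target:
                # A single hop is the direct edge itself, not an alternative path.
                if hops >= 1 and abs(left) <= tolerance:
                    return True
                continue
            # Delays are positive, so a path that already exceeds the
            # direct delay can never match it.
            if left < -tolerance:
                continue
            if path_matches(nxt, target, left, visited | {nxt}, hops + 1):
                return True
        return False

    return [
        (src, dst)
        for (src, dst), delay in delays.items()
        if path_matches(src, dst, delay, frozenset({src}), 0)
    ]


# Made-up delays (in ms): A->C could be explained by the cascade A->B->C.
example_delays = {("A", "B"): 5.0, ("B", "C"): 7.0, ("A", "C"): 12.0, ("A", "D"): 4.0}
tagged = tag_spurious_edges(example_delays)
print(tagged)
```

Tagging and pruning are kept separate here, in line with the abstract's statement that tagged interactions "may then be pruned"; whether to drop a tagged edge is left to the caller.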

8.
Gene regulatory networks for animal development are the underlying mechanisms controlling cell fate specification and differentiation. The architecture of gene regulatory circuits determines their information processing properties and their developmental function. It is a major task to derive realistic network models from exceedingly advanced high-throughput experimental data. Here we use mathematical modeling to study the dynamics of gene regulatory circuits to advance the ability to infer regulatory connections and logic function from experimental data. This study is guided by experimental methodologies that are commonly used to study gene regulatory networks that control cell fate specification. We study the effect of a perturbation of an input on the level of its downstream genes and compare the cis-regulatory execution of OR and AND logics. Circuits that initiate gene activation and circuits that lock on the expression of genes are analyzed. The model improves our ability to analyze experimental data and to construct the network topology from it. The model also illuminates information processing properties of gene regulatory circuits for animal development.
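
To make the OR versus AND comparison concrete, here is a small Python sketch under stated assumptions. It is not the model from the study: the saturating occupancy function, the parameters `k`, `beta` and `gamma`, and the function names are all hypothetical choices made only to show how the same input perturbation can affect a downstream gene differently under the two logics.

```python
def occupancy(level: float, k: float = 1.0) -> float:
    """Fraction of time a binding site is occupied (assumed saturating form)."""
    return level / (k + level)


def steady_state(a: float, b: float, logic: str,
                 beta: float = 1.0, gamma: float = 1.0) -> float:
    """Steady-state target level for two inputs a and b.

    Assumes production rate beta * activity and first-order decay gamma,
    so the steady state is beta * activity / gamma.
    """
    pa, pb = occupancy(a), occupancy(b)
    if logic == "AND":
        activity = pa * pb            # both sites must be occupied
    elif logic == "OR":
        activity = pa + pb - pa * pb  # at least one site occupied
    else:
        raise ValueError(f"unknown logic: {logic}")
    return beta * activity / gamma


# Perturbation: remove input A completely while input B is unchanged.
results = {}
for logic in ("AND", "OR"):
    control = steady_state(1.0, 1.0, logic)
    knockdown = steady_state(0.0, 1.0, logic)
    results[logic] = (control, knockdown)
    print(logic, "control:", control, "input A removed:", knockdown)
```

In this toy setting the target is lost entirely under AND logic but only reduced under OR logic, which is the kind of contrast a perturbation experiment could use to distinguish the two.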

9.
Lin M, Zhou X, Shen X, Mao C, Chen X. The Plant Cell, 2011, 23(3): 911-922
Predicted interactions are a valuable complement to experimentally reported interactions in molecular mechanism studies, particularly for higher organisms, for which reported experimental interactions represent only a small fraction of their total interactomes. With careful engineering consideration of the lessons from previous efforts, the predicted Arabidopsis interactome resource (PAIR) presents 149,900 potential molecular interactions, which are expected to cover approximately 24% of the entire interactome with approximately 40% precision. This study demonstrates that, although PAIR still has limited coverage, it is rich enough to capture many significant functional linkages within and between higher-order biological systems, such as pathways and biological processes. These inferred interactions can power several network topology-based systems biology analyses, such as gene set linkage analysis, protein function prediction, and identification of regulatory genes demonstrating insignificant expression changes. The drastically expanded molecular network in PAIR has considerably improved the capability of these analyses to integrate existing knowledge and suggest novel insights into the function and coordination of genes and gene networks.

10.
Analyzing time series gene expression data   Total citations: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: Time series expression experiments are an increasingly popular method for studying a wide range of biological systems. However, when analyzing these experiments researchers face many new computational challenges. Algorithms that are specifically designed for time series experiments are required so that we can take advantage of their unique features (such as the ability to infer causality from the temporal response pattern) and address the unique problems they raise (e.g. handling the different non-uniform sampling rates). RESULTS: We present a comprehensive review of the current research in time series expression data analysis. We divide the computational challenges into four analysis levels: experimental design, data analysis, pattern recognition and networks. For each of these levels, we discuss computational and biological problems at that level and point out some of the methods that have been proposed to deal with these issues. Many open problems in all these levels are discussed. This review is intended to serve both as a point of reference for experimental biologists looking for practical solutions for analyzing their data, and as a starting point for computer scientists interested in working on the computational problems related to time series expression analysis.

11.
12.
With numerous whole genomes now in hand, and experimental data about genes and biological pathways on the increase, a systems approach to biological research is becoming essential. Ontologies provide a formal representation of knowledge that is amenable to computational as well as human analysis, an obvious underpinning of systems biology. Mapping function to gene products in the genome consists of two somewhat intertwined enterprises: ontology building and ontology annotation. Ontology building is the formal representation of a domain of knowledge; ontology annotation is the association of specific genomic regions (which we refer to simply as 'genes', including genes and their regulatory elements and products such as proteins and functional RNAs) with parts of the ontology. We consider two complementary representations of gene function: the Gene Ontology (GO) and pathway ontologies. GO represents function from the gene's-eye view, in relation to a large and growing context of biological knowledge at all levels. Pathway ontologies represent function from the point of view of biochemical reactions and interactions, which are ordered into networks and causal cascades. The more mature GO provides an example of ontology annotation: how conclusions from the scientific literature and from evolutionary relationships are converted into formal statements about gene function. Annotations are made using a variety of different types of evidence, which can be used to estimate the relative reliability of different annotations.

13.
Application of phylogenetic networks in evolutionary studies   Total citations: 42 (self-citations: 0, citations by others: 42)
The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

14.
State diagrams (stategraphs) are suitable for describing the behavior of dynamic systems. However, when they are used to model large and complex systems, determining the states and transitions among them can be overwhelming, due to their flat, unstratified structure. In this article, we present the use of statecharts as a novel way of modeling complex gene networks. Statecharts extend conventional state diagrams with features such as nested hierarchy, recursion, and concurrency. These features are commonly utilized in engineering for designing complex systems and can enable us to model complex gene networks in an efficient and systematic way. Using statecharts, we modeled five key gene network motifs: simple regulation, autoregulation, feed-forward loop, single-input module, and dense overlapping regulon. Specifically, utilizing nested hierarchy and recursion, we were able to model a complex interlocked feed-forward loop network in a highly structured way, demonstrating the potential of our approach for modeling large and complex gene networks.

15.
Microarray technology is a powerful tool for animal functional genomics studies, with applications ranging from gene identification and mapping to function and control of gene expression. Microarray assays, however, are complex and costly, and hence generally performed with a relatively small number of animals. Nevertheless, they generate data sets of unprecedented complexity and dimensionality. Therefore, such trials require careful planning and experimental design, in addition to tailored statistical and computational tools for their appropriate data mining. In this review, we discuss experimental design and data analysis strategies, which incorporate prior genomic and biological knowledge, such as genotypes and gene function and pathway membership. We focus the discussion on the design of genetical genomics studies, and on significance testing for detection of differential expression. It is shown that the use of prior biological information can improve the efficiency of microarray experiments.

16.
Gene expression data analysis   Total citations: 2 (self-citations: 0, citations by others: 2)
Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices, tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification as well as some possible future directions.

17.
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

18.
After the major achievements of the DNA sequencing projects, an equally important challenge now is to uncover the functional relationships among genes (i.e. gene networks). It has become increasingly clear that computational algorithms are crucial for extracting meaningful information from the massive amount of data generated by high-throughput genome-wide technologies. Here, we summarise how systems identification algorithms, originating from physics and control theory, have been adapted for use in biology. We also explain how experimental perturbations combined with genome-wide measurements are being used to uncover gene networks. Perturbation techniques could pave the way for identifying gene networks in more complex settings such as multifactorial diseases and for improving the efficacy of drug evaluation.

19.
20.
BACKGROUND: Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision-making processes. Advancement in high-throughput experimental techniques has initiated innovative data-driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. RESULTS: We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subject to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. Towards that, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters.
Furthermore, in both the in silico and experimental case studies, the predicted gene expression profiles are in very close agreement with the dynamics of the input data. CONCLUSIONS: Our integer programming algorithm effectively utilizes bootstrapping to identify robust gene regulatory networks from noisy, non-linear time-series gene expression data. With significant noise and non-linearities being inherent to biological systems, the present formalism, with the incorporation of network sparsity, is extremely relevant to gene regulatory networks, and while the formulation has been validated against in silico and E. coli data, it can be applied to any biological system.
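
For readers unfamiliar with the S-system formulation mentioned above, the following Python sketch evaluates the standard S-system rate law, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, and counts the non-zero kinetic orders as a stand-in for the number of network connections. It is only an illustration: the function name `s_system_rhs`, the two-gene example and its parameter values are hypothetical, and the integer-programming and bootstrap steps of the published algorithm are not reproduced here.

```python
from typing import List, Sequence


def s_system_rhs(x: Sequence[float],
                 alpha: Sequence[float], beta: Sequence[float],
                 g: Sequence[Sequence[float]],
                 h: Sequence[Sequence[float]]) -> List[float]:
    """Evaluate dX_i/dt for an S-system at the expression levels x.

    g[i][j] and h[i][j] are the kinetic orders with which gene j affects
    the production and degradation of gene i; a zero means no connection.
    """
    rates = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]
            degradation *= xj ** h[i][j]
        rates.append(production - degradation)
    return rates


# Hypothetical two-gene network: gene 1 is produced constitutively,
# gene 2 is activated by gene 1, and both decay in a first-order manner.
x = [2.0, 1.0]
alpha = [2.0, 1.0]
beta = [1.0, 2.0]
g = [[0.0, 0.0], [1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]

rates = s_system_rhs(x, alpha, beta, g, h)
n_connections = sum(order != 0.0 for row in g + h for order in row)
print(rates, n_connections)
```

A sparsity-seeking method of the kind the abstract describes would prefer parameter sets with fewer non-zero kinetic orders, provided the predicted dynamics still agree with the measured time series.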
