Dehmer M, Sivakumar L. PLoS ONE, 2012, 7(2): e31395
In this article, we tackle a challenging problem in quantitative graph theory. We establish relations between graph entropy measures representing the structural information content of networks. In particular, we prove formal relations between quantitative network measures based on Shannon's entropy to study the relatedness of those measures. In order to establish such information inequalities for graphs, we focus on graph entropy measures based on information functionals. To prove such relations, we use known graph classes whose instances have been proven useful in various scientific areas. Our results extend the foregoing work on information inequalities for graphs.
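
As a rough illustration of an entropy measure built from an information functional, the sketch below assigns each vertex v the probability p(v) = f(v) / Σ f(u) and takes the Shannon entropy of that distribution. The choice of the vertex degree as the functional f, and the names `graph_entropy` and `degree_functional`, are assumptions made for this example; the abstract does not say which functionals the paper uses.

```python
import math

def graph_entropy(adj, f):
    """Shannon entropy (in bits) of p(v) = f(v) / sum_u f(u) over the vertices."""
    weights = [f(adj, v) for v in adj]
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

def degree_functional(adj, v):
    """Hypothetical information functional: the degree of v (an assumption)."""
    return len(adj[v])

# Undirected graphs as adjacency lists.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # every degree equals 2
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}         # one hub, three leaves

print(graph_entropy(cycle4, degree_functional))
print(graph_entropy(star4, degree_functional))
```

With this functional, a regular graph such as the 4-cycle yields the uniform distribution and therefore the maximum entropy log2(n), while the star yields a smaller value.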

Evolutionary processes such as hybridisation, lateral gene transfer, and recombination are all key factors in shaping the structure of genes and genomes. However, since such processes are not always best represented by trees, there is now considerable interest in using more general networks instead. For example, recent studies have shown that networks can be used to provide lower bounds on the number of recombination events and on the number of lateral gene transfers that took place in the evolutionary history of a set of molecular sequences. In this paper we describe the theoretical performance of some related bounds that result when merging pairs of trees into networks.

Gene set analysis aims to identify predefined sets of functionally related genes that are differentially expressed between two conditions. Although gene set analysis has been very successful, by incorporating biological knowledge about the gene sets and enhancing statistical power over gene-by-gene analyses, it does not take into account the correlation (association) structure among the genes. In this work, we present CoGA (Co-expression Graph Analyzer), an R package for the identification of groups of differentially associated genes between two phenotypes. The analysis is based on concepts of Information Theory applied to the spectral distributions of the gene co-expression graphs, such as the spectral entropy to measure the randomness of a graph structure and the Jensen-Shannon divergence to discriminate classes of graphs. The package also includes common measures to compare gene co-expression networks in terms of their structural properties, such as centrality, degree distribution, shortest path length, and clustering coefficient. Besides the structural analyses, CoGA also includes graphical interfaces for visual inspection of the networks, ranking of genes according to their “importance” in the network, and the standard differential expression analysis. We show by both simulation experiments and analyses of real data that the statistical tests performed by CoGA indeed control the rate of false positives and are able to identify differentially co-expressed genes that other methods failed to detect.

The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e.g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a “fingerprint”. Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the “uncertainty” of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them to (1) protein-protein interaction networks of different species and (2) networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed.
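
The two divergences named above can be sketched for discrete distributions as follows. This is only an illustration under stated assumptions: the paper works with spectral densities of adjacency matrices, whereas the inputs here are taken to be already-binned histograms over a common set of bins, and the eigenvalue computation and density estimation are omitted. The function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence to the midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy histograms over the same three bins (made-up numbers, not real spectra).
hist_a = [0.2, 0.5, 0.3]
hist_b = [0.1, 0.4, 0.5]
print(js_divergence(hist_a, hist_b))
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence is symmetric and stays finite even when the two histograms have different supports, which makes it convenient for comparing networks.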

MOTIVATION: A noble and ultimate objective of phyloinformatic research is to assemble, synthesize, and explore the evolutionary history of life on earth. Data mining methods for performing these tasks are not yet well developed, but one avenue of research suggests that network connectivity dynamics will play an important role in future methods. Analysis of disordered networks, such as small-world networks, has applications as diverse as disease propagation, collaborative networks, and power grids. Here we apply similar analyses to networks of phylogenetic trees in order to understand how synthetic information can emerge from a database of phylogenies. RESULTS: Analyses of tree network connectivity in TreeBASE show that a collection of phylogenetic trees behaves as a small-world network: while on the one hand the trees are clustered, like a non-random lattice, on the other hand they have short characteristic path lengths, like a random graph. Tree connectivities follow a dual-scale power-law distribution (first power-law exponent approximately 1.87; second approximately 4.82). This unusual pattern is due, in part, to the presence of alternative tree topologies that enter the database with each published study. As expected, small collections of trees decrease connectivity as new trees are added, while large collections of trees increase connectivity. However, the inflection point is surprisingly low: after about 600 trees the network suddenly jumps to a higher level of coherence. More stringent definitions of 'neighbour' greatly delay the threshold at which a database achieves sufficient maturity for a coherent network to emerge. However, more stringent definitions of 'neighbour' would also likely show improved focus in data mining. AVAILABILITY: http://treebase.org
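
The two quantities that characterise a small-world network, high clustering and short characteristic path length, can be computed with a sketch like the one below. It assumes an undirected graph given as a dictionary of neighbour sets; how two trees in TreeBASE are declared neighbours is not specified in the abstract, so the graph construction itself is left out, and the function names are illustrative.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Mean, over vertices, of the fraction of neighbour pairs that are linked."""
    values = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # convention: vertices with < 2 neighbours count as 0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

def characteristic_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct, connected vertices."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(clustering_coefficient(triangle), characteristic_path_length(triangle))
print(clustering_coefficient(path3), characteristic_path_length(path3))
```

A network is usually called small-world when its clustering coefficient is far above that of a random graph with the same size and density while its characteristic path length stays comparable.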

Biological signaling networks comprise the chemical processes by which cells detect and respond to changes in their environment. Such networks have been implicated in the regulation of important cellular activities, including cellular reproduction, mobility, and death. Though technological and scientific advances have facilitated the rapid accumulation of information about signaling networks, utilizing these massive information resources has become infeasible except through computational methods and computer-based tools. To date, visualization and simulation tools have received significant emphasis. In this paper, we present a graph-theoretic formalization of biological signaling network models that are in wide but informal use, and formulate two problems on the graph: the Constrained Downstream and Minimum Knockout Problems. Solutions to these problems yield qualitative tools for generating hypotheses about the networks, which can then be experimentally tested in a laboratory setting. Using established graph algorithms, we provide a solution to the Constrained Downstream Problem. We also show that the Minimum Knockout Problem is NP-hard, propose a heuristic, and assess its performance. In tests on the Epidermal Growth Factor Receptor (EGFR) network, we find that our heuristic reports the correct solution to the problem in seconds. Source code for the implementations of both solutions is available from the authors upon request.


Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree T, called a base tree, and adding arcs between arcs of T. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network N, we characterise the support trees of N, that is, the tree-based embeddings of N. We use this characterisation to give an explicit formula for the number of support trees of N when N is binary. This formula is written in terms of the components of a bipartite graph.


Sivakumar L, Dehmer M. PLoS ONE, 2012, 7(6): e38159
In this article, we discuss the problem of establishing relations between information measures for network structures. Two types of entropy-based measures, namely the Shannon entropy and its generalization, the Rényi entropy, have been considered for this study. Our main results involve establishing formal relationships, by means of inequalities, between these two kinds of measures. Further, we also state and prove inequalities connecting the classical partition-based graph entropies and partition-independent entropy measures. In addition, several explicit inequalities are derived for special classes of graphs.
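
To make the relationship between the two measures concrete, the sketch below evaluates both on one probability distribution. It checks only the standard textbook fact that the Rényi entropy is non-increasing in its order α and tends to the Shannon entropy as α approaches 1; it is not a reproduction of the paper's inequalities. The distribution `p` is an arbitrary example, and how probabilities are assigned to the vertices of a graph is left open here.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

p = [0.5, 0.25, 0.25]  # arbitrary example distribution
print(renyi_entropy(p, 0.5), shannon_entropy(p), renyi_entropy(p, 2))
```

For any distribution the order-0.5 value is at least the Shannon value, which in turn is at least the order-2 value; the three coincide exactly when the distribution is uniform.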

Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically nested architecture containing closed loops at many different levels. Although a number of approaches have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer-generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
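
The Strahler ordering that underlies the bifurcation ratios mentioned above can be sketched for a binary tree as follows. The tree encoding (a leaf is `None`, an internal node is a pair of subtrees) is an assumption chosen for brevity, and the loop decomposition that produces the tree from a loopy network is not shown.

```python
def strahler(tree):
    """Strahler order of a binary tree: a leaf is None, a node is (left, right)."""
    if tree is None:
        return 1  # leaves have order 1
    left, right = tree
    a, b = strahler(left), strahler(right)
    # The order increases only where two branches of equal order meet.
    return a + 1 if a == b else max(a, b)

balanced = ((None, None), (None, None))     # perfectly symmetric, 4 leaves
caterpillar = (((None, None), None), None)  # maximally asymmetric, 4 leaves
print(strahler(balanced), strahler(caterpillar))
```

The two example trees have the same number of leaves but different orders, which is why Strahler-based statistics are sensitive to the asymmetry of the nesting.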


Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that at the same time are cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
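
The tree-based definition above can be sketched directly: a leaf y of color s is a best match of x if no other leaf of color s has a lower (deeper) last common ancestor with x, and an edge is drawn when the relation holds in both directions. The encoding below (a child-to-parent dictionary plus a leaf-to-color dictionary) and all function names are assumptions for illustration, not the paper's algorithm.

```python
def ancestors(parent, v):
    """Path from v up to the root, inclusive."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca_depth(parent, x, y):
    """Depth (root = 0) of the last common ancestor of x and y."""
    path_x = ancestors(parent, x)
    above_y = set(ancestors(parent, y))
    for steps, node in enumerate(path_x):
        if node in above_y:
            return len(path_x) - 1 - steps

def best_matches(parent, color, x, s):
    """Leaves of color s whose last common ancestor with x is deepest."""
    candidates = [y for y in color if color[y] == s and y != x]
    if not candidates:
        return set()
    deepest = max(lca_depth(parent, x, y) for y in candidates)
    return {y for y in candidates if lca_depth(parent, x, y) == deepest}

def reciprocal_best_matches(parent, color):
    """Edge set of the reciprocal best match graph of a leaf-colored tree."""
    edges = set()
    for x in color:
        for y in color:
            if (color[x] != color[y]
                    and y in best_matches(parent, color, x, color[y])
                    and x in best_matches(parent, color, y, color[x])):
                edges.add(frozenset((x, y)))
    return edges

# Toy gene tree: root -> (u, b2), u -> (a1, b1); species A and B.
parent = {"u": "root", "b2": "root", "a1": "u", "b1": "u"}
color = {"a1": "A", "b1": "B", "b2": "B"}
print(reciprocal_best_matches(parent, color))
```

In the toy tree, a1 is the best match of b2 (it is the only leaf of species A), but the best match of a1 in species B is b1, so only the pair a1, b1 is reciprocal.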


We prove that Nakhleh's metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phylogenetic networks.

The general problem of representing collections of trees as a single graph has led to many tree summary techniques. Many consensus approaches take sets of trees (either inferred as separate gene trees or gleaned from the posterior of a Bayesian analysis) and produce a single “best” tree. In scenarios where horizontal gene transfer or hybridization is suspected, networks may be preferred, which allow for nodes to have two parents, representing the fusion of lineages. One such construct is the cluster union network (CUN), which is constructed using the union of all clusters in the input trees. The CUN has a number of mathematically desirable properties, but can also present edges not observed in the input trees. In this paper we define a new network construction, the edge union network (EUN), which displays edges if and only if they are contained in the input trees. We also demonstrate that this object can be constructed with polynomial time complexity given arbitrary phylogenetic input trees, and so can be used in conjunction with network analysis techniques for further phylogenetic hypothesis testing.

The problem of sorting by transpositions asks for a sequence of adjacent interval exchanges that sorts a permutation and is of the shortest possible length. The distance of the permutation is defined as the length of such a sequence. Despite the apparently intuitive nature of this problem, introduced in 1995 by Bafna and Pevzner, the complexity of both finding an optimal sequence and computing the distance remains open today. In this paper, we establish connections between two different graph representations of permutations, which allows us to compute the distance of a few nontrivial classes of permutations in linear time and space, bypassing the use of any graph structure. By showing that every permutation can be obtained from one of these classes, we prove a new tight upper bound on the transposition distance. Finally, we give improved bounds on some other families of permutations and prove formulas for computing the exact distance of other classes of permutations, again in polynomial time.
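
A transposition in this sense exchanges two adjacent intervals of the permutation. The sketch below applies one and computes the elementary breakpoint lower bound on the distance: a single transposition changes at most three adjacencies, so at least ceil(b / 3) transpositions are needed when there are b breakpoints. This bound is a standard observation used here for illustration; it is not the graph-based method or the upper bound of the paper, and the function names are hypothetical.

```python
def transpose(perm, i, j, k):
    """Exchange the adjacent intervals perm[i:j] and perm[j:k]."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Positions where consecutive elements of 0, perm, n+1 are not consecutive."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b != a + 1)

def breakpoint_lower_bound(perm):
    """ceil(breakpoints / 3): each transposition removes at most 3 breakpoints."""
    return -(-breakpoints(perm) // 3)

perm = [3, 4, 1, 2]  # a permutation of 1..4
print(breakpoints(perm), breakpoint_lower_bound(perm))
print(transpose(perm, 0, 2, 4))
```

For this example the bound is tight: the permutation has three breakpoints, and the single exchange of the intervals [3, 4] and [1, 2] sorts it.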

MOTIVATION: Metabolic networks are organized in a modular, hierarchical manner. Methods for a rational decomposition of the metabolic network into relatively independent functional subsets are essential to better understand the modularity and organization principle of a large-scale, genome-wide network. Network decomposition is also necessary for functional analysis of metabolism by pathway analysis methods that are often hampered by the problem of combinatorial explosion due to the complexity of the metabolic network. Decomposition methods proposed in the literature are mainly based on the connection degree of metabolites. To obtain a more reasonable decomposition, the global connectivity structure of metabolic networks should be taken into account. RESULTS: In this work, we use a reaction graph representation of a metabolic network for the identification of its global connectivity structure and for decomposition. A bow-tie connectivity structure similar to that previously discovered for the metabolite graph is also found to exist in the reaction graph. Based on this bow-tie structure, a new decomposition method is proposed, which uses a distance definition derived from the path length between two reactions. A hierarchical classification tree is first constructed from the distance matrix among the reactions in the giant strong component of the bow-tie structure. These reactions are then grouped into different subsets based on the hierarchical tree. Reactions in the IN and OUT subsets of the bow-tie structure are subsequently placed in the corresponding subsets according to a 'majority rule'. Compared with the decomposition methods proposed in the literature, ours is based on combined properties of the global network structure and local reaction connectivity rather than, primarily, on the connection degree of metabolites. The method is applied to decompose the metabolic network of Escherichia coli. Eleven subsets are obtained. More detailed investigations of the subsets show that reactions in the same subset are indeed functionally related. The rational decomposition of metabolic networks, and subsequent studies of the subsets, make it easier to understand the inherent organization and functionality of metabolic networks at the modular level. SUPPLEMENTARY INFORMATION: http://genome.gbf.de/bioinformatics/
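
The bow-tie structure referred to above can be extracted from a directed graph with a sketch like the following: find the largest strongly connected component (the giant strong component), then collect the nodes that can reach it (IN) and the nodes reachable from it (OUT). The implementation uses a simple quadratic-time intersection of forward and backward reachability rather than a linear-time algorithm, and the reaction names in the example are made up; the distance-based clustering and the 'majority rule' step are not shown.

```python
def reachable(adj, sources):
    """All nodes reachable from any node in sources (sources included)."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def bow_tie(adj):
    """Return (giant strong component, IN set, OUT set) of a directed graph."""
    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    rev = {}
    for u, ws in adj.items():
        for w in ws:
            rev.setdefault(w, []).append(u)
    giant, assigned = set(), set()
    for v in nodes:
        if v in assigned:
            continue
        # The component of v is what v reaches intersected with what reaches v.
        component = reachable(adj, [v]) & reachable(rev, [v])
        assigned |= component
        if len(component) > len(giant):
            giant = component
    in_set = reachable(rev, giant) - giant
    out_set = reachable(adj, giant) - giant
    return giant, in_set, out_set

# Hypothetical reaction graph: r1 feeds the cycle r2 <-> r3, which feeds r4.
graph = {"r1": ["r2"], "r2": ["r3"], "r3": ["r2", "r4"], "r5": ["r6"]}
print(bow_tie(graph))
```

Nodes such as r5 and r6 in the example, which neither reach nor are reached from the giant component, fall outside all three sets.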

Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of networks on 2 and 3 leaves, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known graph isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that to find a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
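
The number of reticulations mentioned above can be read off the in-degrees of the network. The sketch below assumes the network is given as a list of arcs (parent, child), checks that there is exactly one root, and sums (in-degree - 1) over all nodes with more than one parent, which for binary networks is simply the number of nodes with two parents. It does not verify acyclicity, and the function name and example labels are illustrative.

```python
def reticulation_number(arcs):
    """Sum of (in-degree - 1) over nodes with more than one parent."""
    indegree = {}
    for u, v in arcs:
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)
    roots = [v for v, d in indegree.items() if d == 0]
    if len(roots) != 1:
        raise ValueError("a rooted phylogenetic network needs exactly one root")
    return sum(d - 1 for d in indegree.values() if d > 1)

# A binary network on leaves x, y, z with a single reticulation node h.
network = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
           ("b", "h"), ("b", "y"), ("h", "z")]
tree = [("r", "a"), ("r", "z"), ("a", "x"), ("a", "y")]
print(reticulation_number(network), reticulation_number(tree))
```

A phylogenetic tree is the special case with no reticulations, so the function returns 0 for it.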

We describe several analytical techniques for use in developing genetic models of oncogenesis including: methods for the selection of important genetic events, construction of graph models (including distance-based trees, branching trees, contingency trees and directed acyclic graph models) from these events and methods for interpretation of the resulting models. The models can be used to make predictions about: which genetic events tend to occur early, which events tend to occur together and the likely order of events. Unlike simple path models of oncogenesis, our models allow dependencies to exist between specific genetic changes and allow for multiple, divergent paths in tumor progression. A variety of genetic events can be used with the graph models including chromosome breaks, losses or gains of large DNA regions, small mutations and changes in methylation. As an application of the techniques, we use a recently published cytogenetic analysis of 206 melanoma cases [Nelson et al. (2000), Cancer Genet. Cytogenet. 122, 101-109] to derive graph models for chromosome breaks in melanoma. Among our predictions are: (1) breaks in 6q1 and 1q1 are early events, with 6q1 preferentially occurring first and increasing the probability of a break in 1q1 and (2) breaks in the two sets [1p1, 1p2, 9q1] and [1q1, 7p2, 9p2] tend to occur together. This study illustrates that the application of graph models to genetic data from tumor sets provides new information on the interrelationships among genetic changes during tumor progression.

By viewing the ancestral recombination graph as defining a sequence of trees, we show how possible evolutionary histories consistent with given data can be constructed using the minimum number of recombination events. In contrast to previously known methods, which yield only estimated lower bounds, our method of detecting recombination always gives the minimum number of recombination events if the right kind of rooted trees are used in our algorithm. A new lower bound can be defined if rooted trees with fewer constraints are used. As well as studying how often it actually is equal to the minimum, we test how this new lower bound performs in comparison to some other lower bounds. Our study indicates that the new lower bound is an improvement on earlier bounds. Also, using simulated data, we investigate how well our method can recover the actual site-specific evolutionary relationships. In the presence of recombination, using a single tree to describe the evolution of the entire locus clearly leads to lower average recovery percentages than does our method. Our study shows that recovering the actual local tree topologies can be done more accurately than estimating the actual number of recombination events.

The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard model of a phylogenetic tree by also allowing for cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees or their unrooted versions, rooted phylogenetic networks are notoriously difficult to understand. To help alleviate this, recent work on them has also centered on their “uprooted” versions. By focusing on such graphs and the combinatorial concept of a split system which underpins an unrooted phylogenetic network, we show that not only can a so-called (uprooted) 1-nested network N be obtained from the Buneman graph (sometimes also called a median network) associated with the split system \(\Sigma (N)\) induced on the set of leaves of N but also that that graph is, in a well-defined sense, optimal. Along the way, we establish the 1-nested analogue of the fundamental “splits equivalence theorem” for phylogenetic trees and characterize maximal circular split systems.
