1.
Böcker and Dress (Adv Math 138:105–125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of distinct leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
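The ternary map described above can be illustrated with a small sketch. It assumes a tree stored as a plain adjacency dict and a coloring dict for interior vertices; the names `tree_path`, `median` and `ternary_map` are hypothetical helpers, not code from the paper. It only computes the map from a known tree; it does not implement the paper's 4- and 5-point characterization or the reconstruction algorithm.

```python
from collections import deque
from itertools import combinations


def tree_path(adj, start, goal):
    """Return the unique start-goal path in a tree given as an adjacency dict."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    path, v = [], goal
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


def median(adj, a, b, c):
    """In a tree the three pairwise paths share exactly one vertex: the median."""
    common = (set(tree_path(adj, a, b))
              & set(tree_path(adj, b, c))
              & set(tree_path(adj, a, c)))
    (m,) = common
    return m


def ternary_map(adj, leaves, color):
    """Map each 3-subset of leaves to the color of its median vertex."""
    return {frozenset(t): color[median(adj, *t)]
            for t in combinations(sorted(leaves), 3)}


# Hypothetical example: leaves a, b hang off u; c, d hang off v; u-v is an edge.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
color = {"u": "red", "v": "blue"}  # proper: the adjacent interior vertices differ
delta = ternary_map(adj, ["a", "b", "c", "d"], color)
```

Here the triples {a,b,c} and {a,b,d} have median u, while {a,c,d} and {b,c,d} have median v.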
2.
Julien Baste Christophe Paul Ignasi Sau Celine Scornavacca 《Bulletin of mathematical biology》2017,79(4):920-938
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree—a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes—called the “species tree.” One common approach for reconstructing a species tree consists of first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the “concordance” with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping—but not identical—sets of labels, is called a “supertree.” In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of “containing as a minor” and “containing as a topological minor” in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time \(2^{O(k^2)} \cdot n\), where n is the total size of the input.
3.
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It does not make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
4.
The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard model of a phylogenetic tree by also allowing for cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees or their unrooted versions, rooted phylogenetic networks are notoriously difficult to understand. To help alleviate this, recent work on them has also centered on their “uprooted” versions. By focusing on such graphs and the combinatorial concept of a split system which underpins an unrooted phylogenetic network, we show that not only can a so-called (uprooted) 1-nested network N be obtained from the Buneman graph (sometimes also called a median network) associated with the split system \(\Sigma (N)\) induced on the set of leaves of N but also that that graph is, in a well-defined sense, optimal. Along the way, we establish the 1-nested analogue of the fundamental “splits equivalence theorem” for phylogenetic trees and characterize maximal circular split systems.
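For context on the classical "splits equivalence theorem" for trees: a collection of splits of X (including the trivial ones) is the split set of some unrooted phylogenetic tree exactly when the splits are pairwise compatible. The sketch below checks that tree-case condition only; it does not reproduce the paper's 1-nested analogue or the Buneman graph construction. Splits are given by one side as a set, and the function names are illustrative assumptions.

```python
from itertools import combinations


def compatible(side1, side2, taxa):
    """Splits A|X-A and B|X-B are compatible iff at least one of the four
    intersections A&B, A&(X-B), (X-A)&B, (X-A)&(X-B) is empty."""
    taxa = frozenset(taxa)
    a, b = frozenset(side1), frozenset(side2)
    a_rest, b_rest = taxa - a, taxa - b
    return not (a & b) or not (a & b_rest) or not (a_rest & b) or not (a_rest & b_rest)


def pairwise_compatible(sides, taxa):
    """True if every pair of splits in the collection is compatible."""
    return all(compatible(s, t, taxa) for s, t in combinations(sides, 2))


X = {1, 2, 3, 4, 5}
```

For example, {1,2}|{3,4,5} and {1,2,3}|{4,5} are compatible, whereas {1,2}|{3,4,5} and {1,3}|{2,4,5} are not, since all four intersections are non-empty.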
5.
Huson Daniel H. 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(1):103-109
The evolutionary history of a collection of species is usually represented by a phylogenetic tree. Sometimes, phylogenetic networks are used as a means of representing reticulate evolution or of showing uncertainty and incompatibilities in evolutionary datasets. This is often done using unrooted phylogenetic networks such as split networks, due in part to the availability of software (SplitsTree) for their computation and visualization. In this paper we discuss the problem of drawing rooted phylogenetic networks as cladograms or phylograms in a number of different views that are commonly used for rooted trees. Implementations of the algorithms are available in new releases of the Dendroscope and SplitsTree programs.
6.
Stephen J. Willson 《Bulletin of mathematical biology》2010,72(2):340-358
A phylogenetic network is a rooted acyclic digraph with vertices corresponding to taxa. Let X denote a set of vertices containing the root, the leaves, and all vertices of outdegree 1. Regard X as the set of vertices on which measurements such as DNA can be made. A vertex is called normal if it has one parent, and hybrid if it has more than one parent. The network is called normal if it has no redundant arcs and also from every vertex there is a directed path to a member of X such that all vertices after the first are normal. This paper studies properties of normal networks. Under a simple model of inheritance that allows homoplasies only at hybrid vertices, there is essentially unique determination of the genomes at all vertices by the genomes at members of X if and only if the network is normal. This model is a limiting case of more standard models of inheritance when the substitution rate is sufficiently low. Various mathematical properties of normal networks are described. These properties include that the number of vertices grows at most quadratically with the number of leaves and that the number of hybrid vertices grows at most linearly with the number of leaves.
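The two defining conditions of a normal network can be checked directly. The following is a sketch under stated assumptions: the network is a dict mapping each vertex to its list of children, and an arc (u, v) is treated as redundant when v is also reachable from another child of u (that is, there is a second directed path from u to v). The function name `is_normal_network` is hypothetical and this is not the paper's code.

```python
from functools import lru_cache


def is_normal_network(children, root):
    """Check normality of a rooted DAG given as {vertex: [children]}."""
    vertices = set(children) | {w for ws in children.values() for w in ws}
    kids = {v: list(children.get(v, ())) for v in vertices}
    indegree = {v: 0 for v in vertices}
    for v in vertices:
        for w in kids[v]:
            indegree[w] += 1
    # X: the root, all leaves, and all vertices of outdegree 1.
    X = {root} | {v for v in vertices if len(kids[v]) <= 1}

    def reaches(src, dst):
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for w in kids[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    # An arc (u, v) is redundant if v is also reachable from another child of u.
    for u in vertices:
        for v in kids[u]:
            if any(reaches(w, v) for w in kids[u] if w != v):
                return False

    @lru_cache(maxsize=None)
    def good(v):
        # v reaches X along a path whose vertices after v are all normal (one parent).
        return v in X or any(indegree[w] == 1 and good(w) for w in kids[v])

    return all(good(v) for v in vertices)
```

In the hypothetical network `{"r": ["x", "y"], "x": ["h1", "h2"], "y": ["h1", "h2"], "h1": ["a"], "h2": ["b"]}`, vertex x has only hybrid children, so no path from x to X passes through normal vertices only and the check fails.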
7.
Cardona Gabriel Rossello Francesc Valiente Gabriel 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(4):552-569
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endorse a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the only exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks and for aligning a pair of tree-child phylogenetic networks, are provided. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.
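Path multiplicity vectors can be computed with one pass of dynamic programming over the DAG: for each node, entry i counts the directed paths from that node to leaf i. The sketch below assumes the network is a dict of children lists, that the leaves are given in a fixed order, and that the distance is the size of the multiset symmetric difference of the two representations (our reading of the construction; it is not the authors' Perl or Java implementation). The function names are hypothetical.

```python
from collections import Counter
from functools import lru_cache


def mu_vectors(children, leaves):
    """Multiset of path-multiplicity vectors: entry i of mu(v) counts the
    directed paths from v to leaves[i]. Childless vertices must be in leaves."""
    index = {leaf: i for i, leaf in enumerate(leaves)}
    nodes = set(children) | {w for ws in children.values() for w in ws}

    @lru_cache(maxsize=None)
    def mu(v):
        kids = children.get(v, ())
        if not kids:
            vec = [0] * len(leaves)
            vec[index[v]] = 1
            return tuple(vec)
        # Every path from v starts with exactly one arc to a child, so vectors add.
        return tuple(map(sum, zip(*(mu(w) for w in kids))))

    return Counter(mu(v) for v in nodes)


def mu_distance(net1, net2, leaves):
    """Size of the multiset symmetric difference of the two representations."""
    a, b = mu_vectors(net1, leaves), mu_vectors(net2, leaves)
    return sum(((a - b) + (b - a)).values())


leaves = ["a", "b", "c"]
tree_ab = {"r": ["x", "c"], "x": ["a", "b"]}  # ((a,b),c)
tree_ac = {"r": ["y", "b"], "y": ["a", "c"]}  # ((a,c),b)
net = {"r": ["x", "y"], "x": ["h", "a"], "y": ["h", "c"], "h": ["b"]}
```

In the small tree-child network `net`, the reticulation h and the leaf b both have vector (0, 1, 0), and the root has (1, 2, 1) because there are two root-to-b paths. For the two trees, the representations differ only in (1, 1, 0) versus (1, 0, 1), giving a distance of 2.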
8.
9.
In the last decade, the use of phylogenetic networks to analyze the evolution of species whose past is likely to include reticulation events, such as horizontal gene transfer or hybridization, has gained popularity among evolutionary biologists. Nevertheless, the evolution of a particular gene can generally be described without reticulation events and therefore be represented by a phylogenetic tree. The two views are not in conflict, but this underlines the need for algorithms that analyze and summarize the tree-like information contained in a phylogenetic network. We contribute to the toolbox of such algorithms by investigating the question of whether or not a phylogenetic network embeds a tree twice and give a quadratic-time algorithm to solve this problem for a class of networks that is more general than tree-child networks.
10.
Cardona Gabriel Llabres Merce Rossello Francesc Valiente Gabriel 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(4):629-638
We prove that Nakhleh's metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phylogenetic networks.
11.
van Iersel Leo Keijsper Judith Kelk Steven Stougie Leen Hagen Ferry Boekhout Teun 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(4):667-681
Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so, to construct such a network [24]. Here, we extend this work by showing that this problem remains solvable in polynomial time for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily nontree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data.
12.
Leo van Iersel Vincent Moulton Eveline de Swart Taoyang Wu 《Bulletin of mathematical biology》2017,79(5):1135-1154
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of networks on 2 and 3 leaves, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known graph isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that to find a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
13.
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
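The set of displayed trees mentioned above can be enumerated for small networks by keeping exactly one incoming arc at every reticulation (a vertex with more than one parent). In the sketch below each displayed tree is encoded by its set of clusters (leaf sets below each vertex), which identifies a rooted tree on X once unary vertices are suppressed; two networks would then be indistinguishable, in the sense above, when these sets coincide. The encoding and the function names are illustrative assumptions, not the paper's canonical-network construction, and the enumeration is exponential in the number of reticulations.

```python
from itertools import product


def leaves_below(v, sub, leaves):
    """Leaf set reachable from v in the subgraph sub ({vertex: [children]})."""
    if v in leaves:
        return frozenset([v])
    return frozenset().union(*(leaves_below(w, sub, leaves) for w in sub.get(v, ())))


def displayed_trees(children, leaves):
    """Set of trees displayed by a rooted network, each encoded by its clusters."""
    parents = {}
    for u, ws in children.items():
        for w in ws:
            parents.setdefault(w, []).append(u)
    vertices = set(children) | set(parents)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    trees = set()
    for choice in product(*(parents[h] for h in retics)):
        kept = dict(zip(retics, choice))
        # Drop every reticulation arc except the chosen one; tree arcs are kept.
        sub = {u: [w for w in ws if kept.get(w, u) == u] for u, ws in children.items()}
        # Empty leaf sets come from dead ends and are discarded; unary vertices
        # repeat a child's cluster, which the set collapses automatically.
        clusters = (leaves_below(v, sub, leaves) for v in vertices)
        trees.add(frozenset(c for c in clusters if c))
    return frozenset(trees)


net = {"r": ["x", "y"], "x": ["h", "a"], "y": ["h", "c"], "h": ["b"]}
T = displayed_trees(net, {"a", "b", "c"})
```

For this hypothetical network with one reticulation h, the two choices of parent yield the trees ((a,b),c) and (a,(b,c)).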
14.
Juan Wang 《PloS one》2016,11(11)
Rooted phylogenetic networks are primarily used to represent conflicting evolutionary information and to describe reticulate evolutionary events in phylogeny. Many methods have been proposed for constructing rooted phylogenetic networks; among them, the methods based on the decomposition property of networks and on the incompatibility graph (such as CASS, LNETWORK and BIMLR) are more efficient than other available methods. This paper discusses and compares these methods on both real and artificial datasets, in terms of running time and the effectiveness of the constructed phylogenetic networks. The results show that LNETWORK can construct much simpler networks than the others.
15.
Stephen J. Willson 《Bulletin of mathematical biology》2013,75(10):1840-1878
Trees are commonly utilized to describe the evolutionary history of a collection of biological species, in which case the trees are called phylogenetic trees. Often these are reconstructed from data by making use of distances between extant species corresponding to the leaves of the tree. Because of increased recognition of the possibility of hybridization events, more attention is being given to the use of phylogenetic networks that are not necessarily trees. This paper describes the reconstruction of certain such networks from the tree-average distances between the leaves. For a certain class of phylogenetic networks, a polynomial-time method is presented to reconstruct the network from the tree-average distances. The method is proved to work if there is a single reticulation cycle.
16.
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood, which can be intractable with many species. Moreover, estimation at the quartet level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), a group characterized by widespread hybridization.
17.
Willson SJ 《Bulletin of mathematical biology》2006,68(4):919-944
In this paper, a class of rooted acyclic directed graphs (called TOM-networks) is defined that generalizes rooted trees and allows for models including hybridization events. It is argued that the defining properties are biologically plausible. Each TOM-network has a distance defined between each pair of vertices. For a TOM-network N, suppose that the set X consisting of the leaves and the root is known, together with the distances between members of X. It is proved that N is uniquely determined from this information and can be reconstructed in polynomial time. Thus, given exact distance information on the leaves and root, the phylogenetic network can be uniquely recovered, provided that it is a TOM-network. An outgroup can be used instead of a true root.
18.
Cardona Gabriel Llabres Merce Rossello Francesc Valiente Gabriel 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(3):454-469
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the second in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we generalize to phylogenetic networks two metrics that have already been introduced in the literature for phylogenetic trees: the nodal distance and the triplets distance. We prove that they are metrics on any class of tree-child time consistent phylogenetic networks on the same set of taxa, and we establish some of their basic properties. To prove these results, we introduce a reduction/expansion procedure that can be used not only to establish properties of tree-child time consistent phylogenetic networks by induction, but also to generate all tree-child time consistent phylogenetic networks with a given number of leaves.
19.
Willson SJ 《Bulletin of mathematical biology》2007,69(8):2561-2590
Suppose G is a phylogenetic network given as a rooted acyclic directed graph. Let X be a subset of the vertex set containing the root, all leaves, and all vertices of outdegree 1. A vertex is “regular” if it has a unique parent, and “hybrid” if it has two parents. Consider the case where each gene is binary. Assume an idealized system of inheritance in which no homoplasies occur at regular vertices, but homoplasies can occur at hybrid vertices. Under our model, the distances between taxa are shown to be described using a system of numbers called “originating weights” and “homoplasy weights.” Assume that the distances are known between all members of X. Sufficient conditions are given such that the graph G and all the originating and homoplasy weights can be reconstructed from the given distances.
20.
Cardona Gabriel Llabrés Mercè Rosselló Francesc Valiente Gabriel 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(1):46-61
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the \(\mu\)-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their smallest and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.
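For reference, the classical Robinson-Foulds distance between two rooted phylogenetic trees on the same taxa counts the clusters that occur in exactly one of the trees. The sketch below assumes trees stored as dicts of children lists with leaves named by their taxa, and uses the unnormalized convention (some authors halve this count). It covers only the tree case, not the network generalizations studied in the paper, and the function names are hypothetical.

```python
def clusters(children, root):
    """Clusters (leaf sets below each vertex) of a rooted tree {vertex: [children]}."""
    found = set()

    def below(v):
        kids = children.get(v, ())
        c = frozenset([v]) if not kids else frozenset().union(*map(below, kids))
        found.add(c)
        return c

    below(root)
    return found


def robinson_foulds(tree1, root1, tree2, root2):
    """Number of clusters found in exactly one of the two trees."""
    return len(clusters(tree1, root1) ^ clusters(tree2, root2))


t1 = {"r": ["x", "c"], "x": ["a", "b"]}  # ((a,b),c)
t2 = {"r": ["y", "b"], "y": ["a", "c"]}  # ((a,c),b)
```

The trees ((a,b),c) and ((a,c),b) share every cluster except {a,b} and {a,c}, so their distance under this convention is 2.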