首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree—a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes—called the “species tree.” One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the “concordance” with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping—but not identical—sets of labels, is called “supertree.” In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of “containing as a minor” and “containing as a topological minor” in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time \(2^{O(k^2)} \cdot n\), where n is the total size of the input.  相似文献   

2.
Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.  相似文献   

3.
The accuracy of the Fitch method for reconstructing ancestral states on ultrametric phylogenetic trees is studied. Two recurrence relations for computing the accuracy are given here. Using these relations, we analyze the convergence of the accuracy of the Fitch method for reconstructing the root state on a complete binary tree of 2 n leaves as n goes to infinity, present a closed-form formula for the accuracy on ultrametric comb trees, and provide a lower bound on the accuracy on arbitrary ultrametric phylogenetic trees.  相似文献   

4.
We describe a method that will reconstruct an unrooted binary phylogenetic level-1 network on \(n\) taxa from the set of all quartets containing a certain fixed taxon, in \(O(n^3)\) time. We also present a more general method which can handle more diverse quartet data, but which takes \(O(n^6)\) time. Both methods proceed by solving a certain system of linear equations over the two-element field \(\mathrm{GF}(2)\) . For a general dense quartet set, i.e. a set containing at least one quartet on every four taxa, our \(O(n^6)\) algorithm constructs a phylogenetic level-1 network consistent with the quartet set if such a network exists and returns an \(O(n^2)\) -sized certificate of inconsistency otherwise. This answers a question raised by Gambette, Berry and Paul regarding the complexity of reconstructing a level-1 network from a dense quartet set, and more particularly regarding the complexity of constructing a cyclic ordering of taxa consistent with a dense quartet set.  相似文献   

5.
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the second in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we generalize to phylogenetic networks two metrics that have already been introduced in the literature for phylogenetic trees: the nodal distance and the triplets distance. We prove that they are metrics on any class of tree-child time consistent phylogenetic networks on the same set of taxa, as well as some basic properties for them. To prove these results, we introduce a reduction/expansion procedure that can be used not only to establish properties of tree-child time consistent phylogenetic networks by induction, but also to generate all tree-child time consistent phylogenetic networks with a given number of leaves.  相似文献   

6.
V'yugin  V. V.  Gelfand  M. S.  Lyubetsky  V. A. 《Molecular Biology》2003,37(4):571-584
We suggest a new procedure to search for the genes with horizontal transfer events in their evolutionary history. The search is based on analysis of topology difference between the phylogenetic trees of gene (protein) groups and the corresponding phylogenetic species trees. Numeric values are introduced to measure the discrepancy between the trees. This approach was applied to analyze 40 prokaryotic genomes classified into 132 classes of orthologs. This resulted in a list of the candidate genes for which the hypothesis of horizontal transfer in evolution looks true.  相似文献   

7.
We present a new class of metrics for unrooted phylogenetic X-trees inspired by the Gromov–Hausdorff distance for (compact) metric spaces. These metrics can be efficiently computed by linear or quadratic programming. They are robust under NNI operations, too. The local behaviour of the metrics shows that they are different from any previously introduced metrics. The performance of the metrics is briefly analysed on random weighted and unweighted trees as well as random caterpillars.  相似文献   

8.
N. Takezaki  M. Nei 《Genetics》1996,144(1):389-399
Recently many investigators have used microsatellite DNA loci for studying the evolutionary relationships of closely related populations or species, and some authors proposed new genetic distance measures for this purpose. However, the efficiencies of these distance measures in obtaining the correct tree topology remains unclear. We therefore investigated the probability of obtaining the correct topology (P(C)) for these new distances as well as traditional distance measures by using computer simulation. We used both the infinite-allele model (IAM) and the stepwise mutation model (SMM), which seem to be appropriate for classical markers and microsatellite loci, respectively. The results show that in both the IAM and SMM CAVALLI-SFORZA and EDWARDS'' chord distance (D(C)) and NEI et al.''s D(A) distance generally show higher P(C) values than other distance measures, whether the bottleneck effect exists or not. For estimating evolutionary times, however, NEI''s standard distance and GOLDSTEIN et al.''s (δ μ)(2) are more appropriate than other distances. Microsatellite DNA seems to be very useful for clarifying the evolutionary relationships of closely related populations.  相似文献   

9.
Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.  相似文献   

10.
The construction of a dendogram on a set of individuals is a key component of a genomewide association study. However, even with modern sequencing technologies the distances on the individuals required for the construction of such a structure may not always be reliable making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendogram construction that consists of only partial distance information, which raises the following fundamental question. For what (proper) subsets of a dendogram’s leaf set can we uniquely reconstruct the dendogram from the distances that it induces on the elements of such a subset? By formalizing a dendogram in terms of an edge-weighted, rooted, phylogenetic tree on a pre-given finite set X with |X|≥3 whose edge-weighting is equidistant and subsets Y of X for which the distances between every pair of elements in Y is known in terms of sets of 2-subsets of X, we investigate this problem from the perspective of when such a tree is lassoed, that is, uniquely determined by the elements in . For this, we consider four different formalizations of the idea of “uniquely determining” giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that in case the tree in question is binary, then all four types of lasso must coincide.  相似文献   

11.
12.
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the ‘kernel trick’, a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly ‘star-like’ shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.  相似文献   

13.
In a rapidly changing biosphere, approaches to understanding the ecology and evolution of forest species will be critical to predict and mitigate the effects of anthropogenic global change on forest ecosystems. Utilizing 26 forest species in a factorial experiment with two levels each of atmospheric CO2 and soil nitrogen, we examined the hypothesis that phylogeny would influence plant performance in response to elevated CO2 and nitrogen fertilization. We found highly idiosyncratic responses at the species level. However, significant, among-genetic lineage responses were present across a molecularly determined phylogeny, indicating that past evolutionary history may have an important role in the response of whole genetic lineages to future global change. These data imply that some genetic lineages will perform well and that others will not, depending upon the environmental context.  相似文献   

14.
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the $mu$-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.  相似文献   

15.
We present fast new algorithms for evaluating trees with respectto least squares and minimum evolution (ME), the most commonlyused criteria for inferring phylogenetic trees from distancedata. The new algorithms include an optimal O(N2) time algorithmfor calculating the edge (branch or internode) lengths on atree according to ordinary or unweighted least squares (OLS);an O(N3) time algorithm for edge lengths under weighted leastsquares (WLS) including the Fitch-Margoliash method; and anoptimal O(N4) time algorithm for generalized least-squares (GLS)edge lengths (where N is the number of taxa in the tree). TheME criterion is based on the sum of edge lengths. Consequently,the edge lengths algorithms presented here lead directly toO(N2), O(N3), and O(N4) time algorithms for ME under OLS, WLS,and GLS, respectively. All of these algorithms are as fast asor faster than any of those previously published, and the algorithmsfor OLS and GLS are the fastest possible (with respect to orderof computational complexity). A major advantage of our new methodsis that they are as well adapted to multifurcating trees asthey are to binary trees. An optimal algorithm for determiningpath lengths from a tree with given edge lengths is also developed.This leads to an optimal O(N2) algorithm for OLS sums of squaresevaluation and corresponding O(N3) and O(N4) time algorithmsfor WLS and GLS sums of squares, respectively. The GLS algorithmis time-optimal if the covariance matrix is already inverted.The speed of each algorithm is assessed analytically—thespeed increases we calculate are confirmed by the dramatic speedincreases resulting from their implementation in PAUP* 4.0.The new algorithms enable far more extensive tree searches andstatistical evaluations (e.g., bootstrap, parametric bootstrap,or jackknife) in the same amount of time. Hopefully, the fastalgorithms for WLS and GLS will encourage the use of these criteriafor evaluating trees and their edge lengths (e.g., for approximatedivergence time estimates), since they should be more statisticallyefficient than OLS.  相似文献   

16.
We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.  相似文献   

17.
C. W. Birky-Jr. 《Genetics》1996,144(1):427-437
Little attention has been paid to the consequences of long-term asexual reproduction for sequence evolution in diploid or polyploid eukaryotic organisms. Some elementary theory shows that the amount of neutral sequence divergence between two alleles of a protein-coding gene in an asexual individual will be greater than that in a sexual species by a factor of 2tu, where t is the number of generations since sexual reproduction was lost and u is the mutation rate per generation in the asexual lineage. Phylogenetic trees based on only one allele from each of two or more species will show incorrect divergence times and, more often than not, incorrect topologies. This allele sequence divergence can be stopped temporarily by mitotic gene conversion, mitotic crossing-over, or ploidy reduction. If these convergence events are rare, ancient asexual lineages can be recognized by their high allele sequence divergence. At intermediate frequencies of convergence events, it will be impossible to reconstruct the correct phylogeny of an asexual clade from the sequences of protein coding genes. Convergence may be limited by allele sequence divergence and heterozygous chromosomal rearrangements which reduce the homology needed for recombination and result in aneuploidy after crossing-over or ploidy cycles.  相似文献   

18.
V'yugin  V. V.  Gelfand  M. S.  Lyubetsky  V. A. 《Molecular Biology》2002,36(5):650-658
It is well known that phylogenetic trees derived from different protein families are often incongruent. This is explained by mapping errors and by the essential processes of gene duplication, loss, and horizontal transfer. Therefore, the problem is to derive a consensus tree best fitting the given set of gene trees. This work presents a new method of deriving this tree. The method is different from the existing ones, since it considers not only the topology of the initial gene trees, but also the reliability of their branches. Thereby one can explicitly take into account the possible errors in the gene trees caused by the absence of reliable models of sequence evolution, by uneven evolution of different gene families and taxonomic groups, etc.  相似文献   

19.

Background

Phylogenetic trees are complex data forms that need to be graphically displayed to be human-readable. Traditional techniques of plotting phylogenetic trees focus on rendering a single static image, but increases in the production of biological data and large-scale analyses demand scalable, browsable, and interactive trees.

Methodology/Principal Findings

We introduce TreeVector, a Scalable Vector Graphics–and Java-based method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required, and can be modified and linked using standard web technologies. There are now many bioinformatics servers and databases with a range of dynamic processes and updates to cope with the increasing volume of data. TreeVector is designed as a framework to integrate with these processes and produce user-customized phylogenies automatically. We also address the strengths of phylogenetic trees as part of a linked-in browsing process rather than an end graphic for print.

Conclusions/Significance

TreeVector is fast and easy to use and is available to download precompiled, but is also open source. It can also be run from the web server listed below or the user''s own web server. It has already been deployed on two recognized and widely used database Web sites.  相似文献   

20.
Compatibility of phylogenetic trees is the most important concept underlying widely-used methods for assessing the agreement of different phylogenetic trees with overlapping taxa and combining them into common supertrees to reveal the tree of life. The notion of ancestral compatibility of phylogenetic trees with nested taxa was recently introduced. In this paper we analyze in detail the meaning of this compatibility from the points of view of the local structure of the trees, of the existence of embeddings into a common supertree, and of the joint properties of their cluster representations. Our analysis leads to a very simple polynomial-time algorithm for testing this compatibility, which we have implemented and is freely available for download from the BioPerl collection of Perl modules for computational biology.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号