首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 10 毫秒
1.
Böcker and Dress (Adv Math 138:105–125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.  相似文献   

2.
Compatibility of phylogenetic trees is the most important concept underlying widely-used methods for assessing the agreement of different phylogenetic trees with overlapping taxa and combining them into common supertrees to reveal the tree of life. The notion of ancestral compatibility of phylogenetic trees with nested taxa was recently introduced. In this paper we analyze in detail the meaning of this compatibility from the points of view of the local structure of the trees, of the existence of embeddings into a common supertree, and of the joint properties of their cluster representations. Our analysis leads to a very simple polynomial-time algorithm for testing this compatibility, which we have implemented and is freely available for download from the BioPerl collection of Perl modules for computational biology.  相似文献   

3.
4.
5.
We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.  相似文献   

6.

Background

Phylogenetic trees are complex data forms that need to be graphically displayed to be human-readable. Traditional techniques of plotting phylogenetic trees focus on rendering a single static image, but increases in the production of biological data and large-scale analyses demand scalable, browsable, and interactive trees.

Methodology/Principal Findings

We introduce TreeVector, a Scalable Vector Graphics–and Java-based method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required, and can be modified and linked using standard web technologies. There are now many bioinformatics servers and databases with a range of dynamic processes and updates to cope with the increasing volume of data. TreeVector is designed as a framework to integrate with these processes and produce user-customized phylogenies automatically. We also address the strengths of phylogenetic trees as part of a linked-in browsing process rather than an end graphic for print.

Conclusions/Significance

TreeVector is fast and easy to use and is available to download precompiled, but is also open source. It can also be run from the web server listed below or the user''s own web server. It has already been deployed on two recognized and widely used database Web sites.  相似文献   

7.
In this paper, we apply new geometric and combinatorial methods to the study of phylogenetic mixtures. The focus of the geometric approach is to describe the geometry of phylogenetic mixture distributions for the two state random cluster model, which is a generalization of the two state symmetric (CFN) model. In particular, we show that the set of mixture distributions forms a convex polytope and we calculate its dimension; corollaries include a simple criterion for when a mixture of branch lengths on the star tree can mimic the site pattern frequency vector of a resolved quartet tree. Furthermore, by computing volumes of polytopes we can clarify how “common” non-identifiable mixtures are under the CFN model. We also present a new combinatorial result which extends any identifiability result for a specific pair of trees of size six to arbitrary pairs of trees. Next we present a positive result showing identifiability of rates-across-sites models. Finally, we answer a question raised in a previous paper concerning “mixed branch repulsion” on trees larger than quartet trees under the CFN model. F.A. Matsen’s and M. Steel’s research was supported by the Allan Wilson Centre for Molecular Ecology and Evolution. E. Mossel’s research was supported by a Sloan fellowship in Mathematics, NSF awards DMS 0528488 and DMS 0548249 (CAREER) and by ONR grant N0014-07-1-05-06.  相似文献   

8.
To aid in future efforts to accurately reconstruct the vertebrate tree, a quantitative measure of phylogenetic informativeness was applied to nucleotide and amino acid sequences for a set of 11 genes. We identified orthologues and assembled published fossil-calibrated divergence times between taxa that had been sequenced for each gene. Rates of molecular evolution for each site were estimated to characterize the molecular evolutionary pattern of genes and to calculate the phylogenetic informativeness. The fast-evolving gene albumin yielded the highest informativeness over the period from 60 million years ago to 500 million years ago. In contrast, calmodulin yielded the lowest informativeness, presumably because functional constraint minimized substitutions in the amino acid sequence. The gene c-myc showed an intermediate level of informativeness. The nucleotide sequence of cytochrome b showed extremely high utility for recent epochs, but low utility for times before 100 million years ago. We ranked nine other genes for their utility during the epochs of the divergence of the muroid rodents, early placental mammals, early vertebrates, and early metazoa, yielding results consistent with, but more precise than, previous studies. Interestingly, DNA sequence always exceeded amino acid sequence in informativeness over all time scales, yet support values were at best moderately higher. For epochs not subject to strong phylogenetic conflict due to convergence, we advocate gleaning the additional power of the threefold increase in number of characters that is present for DNA sequences over resorting to the less noisy but less informative amino acid sequences.  相似文献   

9.
10.
It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.  相似文献   

11.
We present fast new algorithms for evaluating trees with respectto least squares and minimum evolution (ME), the most commonlyused criteria for inferring phylogenetic trees from distancedata. The new algorithms include an optimal O(N2) time algorithmfor calculating the edge (branch or internode) lengths on atree according to ordinary or unweighted least squares (OLS);an O(N3) time algorithm for edge lengths under weighted leastsquares (WLS) including the Fitch-Margoliash method; and anoptimal O(N4) time algorithm for generalized least-squares (GLS)edge lengths (where N is the number of taxa in the tree). TheME criterion is based on the sum of edge lengths. Consequently,the edge lengths algorithms presented here lead directly toO(N2), O(N3), and O(N4) time algorithms for ME under OLS, WLS,and GLS, respectively. All of these algorithms are as fast asor faster than any of those previously published, and the algorithmsfor OLS and GLS are the fastest possible (with respect to orderof computational complexity). A major advantage of our new methodsis that they are as well adapted to multifurcating trees asthey are to binary trees. An optimal algorithm for determiningpath lengths from a tree with given edge lengths is also developed.This leads to an optimal O(N2) algorithm for OLS sums of squaresevaluation and corresponding O(N3) and O(N4) time algorithmsfor WLS and GLS sums of squares, respectively. The GLS algorithmis time-optimal if the covariance matrix is already inverted.The speed of each algorithm is assessed analytically—thespeed increases we calculate are confirmed by the dramatic speedincreases resulting from their implementation in PAUP* 4.0.The new algorithms enable far more extensive tree searches andstatistical evaluations (e.g., bootstrap, parametric bootstrap,or jackknife) in the same amount of time. Hopefully, the fastalgorithms for WLS and GLS will encourage the use of these criteriafor evaluating trees and their edge lengths (e.g., for approximatedivergence time estimates), since they should be more statisticallyefficient than OLS.  相似文献   

12.
本文提出了一种计算蛋白质绝对进化距离和进化速率的方法,它根据现有同源蛋白质的序列构建分子进化树,并推断进化过程中各结点处的共同祖先序列,根据某成员与某结点处共同祖先序列的氨基酸差异百分率,计算该蛋白质序列的特异进化距离和进化速率。比较我们的算法和Dayhoff等的模拟统计方法表明,我们的算法在一定范围内是正确的。结合计算哺乳动物红细胞生成素的进化速率,讨论了本算法在分子进化研究中的应用。  相似文献   

13.
Because of the correlation expected between the phylogenetic relatedness of two taxa and their net ecological similarity, a measure of the overall phylogenetic relatedness of a community of interacting organisms can be used to investigate the contemporary ecological processes that structure community composition. I describe two indices that use the number of nodes that separate taxa on a phylogeny as a measure of their phylogenetic relatedness. As an example of the use of these indices in community analysis, I compared the mean observed net relatedness of trees (>/=10 cm diameter at breast height) in each of 28 plots (each 0.16 ha) in a Bornean rain forest with the net relatedness expected if species were drawn randomly from the species pool (of the 324 species in the 28 plots), using a supertree that I assembled from published sources. I found that the species in plots were more phylogenetically related than expected by chance, a result that was insensitive to various modifications to the basic methodology. I tentatively infer that variation in habitat among plots causes ecologically more similar species to co-occur within plots. Finally, I suggest a range of applications for phylogenetic relatedness measures in community analysis.  相似文献   

14.
Ori Sargsyan 《Genetics》2010,185(4):1355-1368
The general coalescent tree framework is a family of models for determining ancestries among random samples of DNA sequences at a nonrecombining locus. The ancestral models included in this framework can be derived under various evolutionary scenarios. Here, a computationally tractable full-likelihood-based inference method for neutral polymorphisms is presented, using the general coalescent tree framework and the infinite-sites model for mutations in DNA sequences. First, an exact sampling scheme is developed to determine the topologies of conditional ancestral trees. However, this scheme has some computational limitations and to overcome these limitations a second scheme based on importance sampling is provided. Next, these schemes are combined with Monte Carlo integrations to estimate the likelihood of full polymorphism data, the ages of mutations in the sample, and the time of the most recent common ancestor. In addition, this article shows how to apply this method for estimating the likelihood of neutral polymorphism data in a sample of DNA sequences completely linked to a mutant allele of interest. This method is illustrated using the data in a sample of DNA sequences at the APOE gene locus.THE interest in analyzing polymorphism data in contemporary samples of DNA sequences under various evolutionary scenarios creates a demand to design computationally tractable full-likelihood-based inference methods. For an evolutionary scenario of interest, an ancestral-mutation model can be used to design such a method. The ancestral-mutation model for a sample of DNA sequences at a nonrecombining locus is a combination of two processes: one is an ancestral process that traces the lineages of the sample back in time until the most recent common ancestor, constructing an ancestral tree for the sample. The second is a mutation process that is superimposed on the ancestral tree. The complexities of ancestral-mutation models make the design of such methods challenging. Full data are used instead of summary statistics, which can result in loss of important information in the data (see Felsenstein 1992; Donnelly and Tavaré 1995). In addition, current methods use specific features of the underlying ancestral-mutation models, so they lose flexibility to be applicable to other ancestral-mutation models.More specifically, Griffiths and Tavaré (1994c, 1995) and Kuhner et al. (1995) developed full-likelihood-based inference methods for neutral polymorphisms at a nonrecombining locus. They used the combinations of the standard coalescent (Kingman 1982a,b,c; Hudson 1983; Tajima 1983) with the finite-sites or infinite-sites (Watterson 1975) models as ancestral-mutation models. Stephens and Donnelly (2000) designed an importance sampling method to estimate the full likelihood of the data using the same settings for the ancestral-mutation models. Hobolth et al. (2008) provided another importance sampling scheme restricted to the infinite-sites model. The last two methods are computationally more efficient than the first two methods, but they lose flexibility to be applicable to ancestral models without standard coalescent features with independent coalescence waiting times, such as the coalescent processes with exponential growth (Slatkin and Hudson 1991; Griffiths and Tavaré 1994b).To incorporate the coalescent processes with exponential growth, Kuhner et al. (1998) and Griffiths and Tavaré (1994a, 1999) extended their previous methods. For example, the method of Griffiths and Tavaré (1994a, 1999) allows one to consider ancestral models based on coalescent processes with variable population sizes. Coop and Griffiths (2004) modified this inference method and made it applicable for analyzing full polymorphism data in a sample of DNA sequences from a nonrecombining locus completely linked to a mutant allele of interest, either neutral or under selection. Additionally, ancestral models have been developed for this type of sample, where the mutant allele is either neutral (Griffiths and Tavaré 1998, 2003; Wiuf and Donnelly 1999; Stephens 2000) or under selection (Slatkin and Rannala 1997; Stephens and Donnelly 2003). The ancestral model of Slatkin and Rannala (1997) is part of a family of ancestral models derived by Thompson (1975), Nee et al. (1994), and Rannala (1997), using a linear birth–death process as an evolutionary process in a population. Although all the ancestral models mentioned above differ in their properties and evolutionary scenarios, they are part of the general coalescent tree framework (Griffiths and Tavaré 1998). Therefore, a computationally tractable full-likelihood-based inference method based on this general framework is of great interest.For a sample of n sequences, an ancestral model in the general coalescent tree framework is described as a bifurcating rooted tree with n − 1 internal nodes and n leaves, where the internal nodes are coalescent events that happen one at a time. The tree is a combination of two independent components: the topology and the branch lengths. The topology of the tree is constructed going backward in time by combining two randomly chosen ancestral lineages of the sample at each node; the branch lengths of the tree are defined by the joint distribution function of the coalescence waiting times. Note that any density function for coalescence waiting times can define an ancestral model in the general coalescent tree framework.The n leaves (and the sequences in the sample) are labeled from 1 to n; and the n − 1 internal nodes of the ancestral tree are labeled from 1 to n − 1 (in order of occurrence of the coalescent events backward in time). Thus, the topology of an ancestral tree is a leaf-labeled bifurcating rooted tree with totally ordered interior vertices. These trees are called topological trees.When using the general coalescent tree framework and the infinite-sites model, an evolutionary process that generates polymorphism data in a sample of DNA sequences can be described in the following way. An ancestral tree is constructed, as described above, and mutations are added independently on different branches of the ancestral tree as Poisson processes with equal rates, θ/2, in which θ is the mutation rate at the locus. Then, at the mutation events, the ancestral sequences of the sample are changed according to the infinite-sites model; that is, each mutation occurs at a site of an ancestral sequence at which no previous mutations occurred. Thus, these changes define polymorphism data.Naively, this probabilistic framework can be used to estimate the likelihood of the full observed data in a sample of n sequences. That is, data sets are simulated independently as described above and each simulated data set is compared to the observed data. The proportion of the simulated data sets that match the observed data is an estimate of the likelihood of the observed data. Although this approach provides an estimate for the likelihood of the observed data, this method is computationally infeasible, because the topologies of the ancestral trees of the generated data sets are sampled from the space of all the possible topological trees with n leaves. This space has size n!(n − 1)!/2n−1 (Edwards 1970), which is huge for moderate values of n. The topologies of the ancestral trees of the generated data sets that match the observed data represent a small portion of that space. Thus, designing a method that samples topologies of the ancestral trees from this subspace can make the method computationally tractable.On the basis of this idea, I use the general coalescent tree framework with the infinite-sites model to develop a computationally tractable full-likelihood-based inference method for polymorphisms in DNA sequences at a nonrecombining locus. First, an exact sampling scheme for topologies of the conditional ancestral trees is developed. This method has some computational limitations, so to overcome these limitations a second scheme based on an importance sampling is provided. These sampling schemes are combined with Monte Carlo integrations to estimate the likelihood of the full data, the ages of the mutations in the sample, and the time of the most recent common ancestor of the sample. I describe an application of this method for neutral polymorphism data in a sample of DNA sequences at a nonrecombining locus that is completely linked to a mutant allele of interest, either neutral or under selection. The method is illustrated using the data in a sample of DNA sequences at the APOE gene locus from Fullerton et al. (2000).  相似文献   

15.
Many layouts exist for visualizing phylogenetic trees, allowing to display the same information (evolutionary relationships) in different ways. For large phylogenies, the choice of the layout is a key element, because the printable area is limited, and because interactive on-screen visualizers can lead to unreadable phylogenetic relationships at high zoom levels. A visual inspection of available layouts for rooted trees reveals large empty areas that one may want to fill in order to use less drawing space and eventually gain readability. This can be achieved by using the nonlayered tidy tree layout algorithm that was proposed earlier but was never used in a phylogenetic context so far. Here, we present its implementation, and we demonstrate its advantages on simulated and biological data (the measles virus phylogeny). Our results call for the integration of this new layout in phylogenetic software. We implemented the nonlayered tidy tree layout in R language as a stand-alone function (available at https://github.com/damiendevienne/non-layered-tidy-trees), as an option in the tree plotting function of the R package ape, and in the recent tool for visualizing reconciled phylogenetic trees thirdkind (https://github.com/simonpenel/thirdkind/wiki).  相似文献   

16.
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree—a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes—called the “species tree.” One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the “concordance” with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping—but not identical—sets of labels, is called “supertree.” In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of “containing as a minor” and “containing as a topological minor” in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time \(2^{O(k^2)} \cdot n\), where n is the total size of the input.  相似文献   

17.
Given a set X of taxa, a phylogenetic X-tree T that is only partially resolved, and a collection of characters on X, we consider the problem of finding a resolution (refinement) of T that minimizes the parsimony score of the given characters. Previous work has shown that this problem has a polynomial time solution provided certain strong constraints are imposed on the input. In this paper we provide a new algorithm for this problem, and show that it is fixed parameter tractable under more general conditions.  相似文献   

18.
Using gene genealogies constructed from gene sequence data, we show that both the mucosal and cutaneous papillomaviruses (PV)—supergroups A and B—appear to have been transmitted through susceptible populations faster than exponentially. The data and methods involved (1) examining the PV database for phylogenetic signal in an L1 open reading frame (ORF) fragment and an E1 ORF segment, (2) demonstrating that the same two fragments have evolved in a way consistent with a molecular clock, and (3) applying methods of phylogenetic tree analysis that test different scenarios for the dynamics of viral transmission within populations. The results indicate increases in PV populations of both supergroups A and B in the recent past. This form of the increases, which fit a null model of population growth with an exponent increasing in time, is compatible with the fact that human populations have grown at a faster than exponential rate, thus increasing the numbers of susceptible hosts for HPVs. There are, however, indications that the population of supergroup A has now stopped increasing in size. Received: 4 June 1996 / Accepted: 12 August 1996  相似文献   

19.
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the ‘kernel trick’, a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly ‘star-like’ shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.  相似文献   

20.
The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号