1.
We have developed a phylogenetic tree reconstruction method that detects and reports multiple topologically distant low-cost solutions. Our method is a generalization of the neighbor-joining method of Saitou and Nei and affords a more thorough sampling of the solution space by keeping track of multiple partial solutions during its execution. The scope of the solution space sampling is controlled by a pair of user-specified parameters--the total number of alternate solutions and the number of alternate solutions that are randomly selected--effecting a smooth trade-off between run time and solution quality and diversity. This method can discover topologically distinct low-cost solutions. In tests on biological and synthetic data sets using either the least-squares distance or minimum-evolution criterion, the method consistently performed as well as, or better than, both the neighbor-joining heuristic and the PHYLIP implementation of the Fitch-Margoliash distance measure. In addition, the method identified alternative tree topologies with costs within 1% or 2% of the best, but with topological distances of 9 or more partitions from the best solution (16 taxa); with 32 taxa, topologies were obtained 17 (least-squares) and 22 (minimum-evolution) partitions from the best topology when 200 partial solutions were retained. Thus, the method can find lower-cost tree topologies and near-best tree topologies that are significantly different from the best topology.
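For orientation, the abstract above builds on neighbor-joining, which at each step joins the pair of taxa that minimizes the standard Q criterion. The sketch below shows only that single selection step of classical neighbor-joining. It is not the authors' multi-solution variant, which by their description retains several candidate partial solutions rather than one. The function name `nj_best_pair` and the list-of-lists distance matrix are assumptions made for illustration.

```python
from itertools import combinations

def nj_best_pair(d):
    """Return the index pair (i, j) minimizing the neighbor-joining
    Q criterion: Q(i, j) = (n - 2) * d[i][j] - sum(d[i]) - sum(d[j]).

    `d` is a symmetric distance matrix given as a list of lists
    (an illustrative representation, not taken from the paper).
    """
    n = len(d)
    row_sums = [sum(row) for row in d]

    def q(i, j):
        return (n - 2) * d[i][j] - row_sums[i] - row_sums[j]

    # Classical NJ keeps only the single best pair; a method that tracks
    # multiple partial solutions would keep several low-Q pairs instead.
    return min(combinations(range(n), 2), key=lambda p: q(*p))
```

Sorting the candidate pairs by Q instead of taking the minimum would give a ranked list from which several alternatives could be retained, which is one plausible reading of "keeping track of multiple partial solutions".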
2.
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
3.
Simmons and Freudenstein have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by the method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data.
4.
Background
Inferring species trees from gene trees using coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed.
Results
We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves.
Conclusions
We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.
5.
Sridhar S Dhamdhere K Blelloch G Halperin E Ravi R Schwartz R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):561-571
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
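The phrase "characters that share four gametes" in the abstract above refers to the classical four-gamete condition: two binary characters conflict when all four combinations 00, 01, 10 and 11 appear across the taxa, and a binary matrix admits a perfect phylogeny exactly when no pair of characters conflicts. The sketch below checks that condition only. It does not implement either of the paper's near-perfect phylogeny algorithms, and the function names and the row-per-taxon matrix layout are assumptions.

```python
from itertools import combinations

def share_four_gametes(matrix, i, j):
    """True if binary characters (columns) i and j exhibit all four
    gametes 00, 01, 10, 11 across the taxa (rows of `matrix`)."""
    return len({(row[i], row[j]) for row in matrix}) == 4

def admits_perfect_phylogeny(matrix):
    """Four-gamete test: a 0/1 matrix admits a perfect phylogeny
    if and only if no pair of columns shares all four gametes."""
    n_chars = len(matrix[0]) if matrix else 0
    return not any(
        share_four_gametes(matrix, i, j)
        for i, j in combinations(range(n_chars), 2)
    )

def conflicting_characters(matrix):
    """Indices of characters that share four gametes with at least one
    other character (the quantity the abstract treats as a parameter)."""
    n_chars = len(matrix[0]) if matrix else 0
    bad = set()
    for i, j in combinations(range(n_chars), 2):
        if share_four_gametes(matrix, i, j):
            bad.update((i, j))
    return sorted(bad)
```

Counting the output of `conflicting_characters` gives a quick sense of how far a data set is from the perfect case before any near-perfect reconstruction is attempted.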
6.
Manfred Grasshoff 《Acta biotheoretica》1985,34(2-4):149-156
The conditions are outlined under which the body construction of annelids could have been transformed into that of arthropods. As an adaptation to a vagile life and an uptake of food by filtering particles from the sediment, the body was more and more flattened. Thus lateral protrusions, the subsequent pleurotergites, developed, and the parapodia were shifted to a more ventral position and could differentiate into the branched limbs typical for arthropods. This is the condition under which parts of the body wall were kept immobile, so that they could become sclerotized in the form of rigid plates.
7.
Two different methods of using paralogous genes for phylogenetic inference have been proposed: reconciled trees (or gene tree parsimony) and uninode coding. Gene tree parsimony suffers from 10 serious problems, including differential weighting of nucleotide and gap characters, undersampling which can be misinterpreted as synapomorphy, all of the characters not being allowed to interact, and conflict between gene trees being given equal weight, regardless of branch support. These problems are largely avoided by using uninode coding. The uninode coding method is elaborated to address multiple gene duplications within a single gene tree family and handle problems caused by lack of gene tree resolution. An example of vertebrate phylogeny inferred from nine genes is reanalyzed using uninode coding. We suggest that uninode coding be used instead of gene tree parsimony for phylogenetic inference from paralogous genes.
8.
Hollich V Milchert L Arvestad L Sonnhammer EL 《Molecular biology and evolution》2005,22(11):2257-2264
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed worse than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance corrections produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed as well as more complex methods.
9.
Roch S 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(1):92-94
Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
10.
Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Therefore, errors can be introduced during MSA construction, which in turn affect the subsequent phylogenetic tree construction. In order to circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences and can be estimated without MSAs. It has traditionally been used to build a fast ‘guide tree’ to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, together with neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes–Cantor, Kimura, F84 and Tamura–Nei. These four distance estimators fall into the category of model-based distance estimators, as each of them takes account of a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be at least twice that of the other estimators. Furthermore, as the k-tuple distance avoids the need to construct an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
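The definition of the k-tuple distance given in the abstract above can be sketched in a few lines. The version below counts overlapping k-tuples in each unaligned sequence and sums the absolute differences in counts. The abstract does not say whether counts or normalized frequencies are used, or whether differences are absolute or squared, so those choices, along with the function names, are assumptions made for illustration.

```python
from collections import Counter

def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring (tuple) of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def ktuple_distance(a: str, b: str, k: int = 3) -> int:
    """Sum, over all k-tuples seen in either sequence, of the absolute
    difference in occurrence counts. Tuples absent from both sequences
    contribute zero, so they can be skipped. No alignment is needed.

    Assumed form of the distance; the paper's exact formula may differ.
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    return sum(abs(ca[t] - cb[t]) for t in ca.keys() | cb.keys())
```

A full pairwise matrix built with `ktuple_distance` could then be passed to any distance-based tree builder, such as neighbor-joining, without constructing an MSA first.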
11.
Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly identify the root of the tree. The outgroup and molecular clock criteria were best able to identify the root of the tree, whereas the nonreversible model was able to identify the root only when the substitution process was highly nonreversible. We also examined the performance of the criteria for a tree of four species for which the topology and root position are well supported. Results of the analyses of these data are consistent with the simulation results.
12.
Phylogenetic trees are important in many areas of biological research, ranging from systematic studies to the methods used for genome annotation. Finding the best scoring tree under any optimality criterion is an NP-hard problem, which necessitates the use of heuristics for tree-search. Although tree-search plays a major role in obtaining a tree estimate, there remains a limited understanding of its characteristics and how the elements of the statistical inferential procedure interact with the algorithms used. This study begins to answer some of these questions through a detailed examination of maximum likelihood tree-search on a wide range of real genome-scale data sets. We examine all 10,395 trees for each of the 106 genes of an eight-taxa yeast phylogenomic data set, then apply different tree-search algorithms to investigate their performance. We extend our findings by examining two larger genome-scale data sets and a large disparate data set that has been previously used to benchmark the performance of tree-search programs. We identify several broad trends occurring during tree-search that provide an insight into the performance of heuristics and may, in the future, aid their development. These trends include a tendency for the true maximum likelihood (best) tree to also be the shortest tree in terms of branch lengths, a weak tendency for tree-search to recover the best tree, and a tendency for tree-search to encounter fewer local optima in genes that have a high information content. When examining current heuristics for tree-search, we find that nearest-neighbor-interchange performs poorly, and frequently finds trees that are significantly different from the best tree. In contrast, subtree-pruning-and-regrafting tends to perform well, nearly always finding trees that are not significantly different to the best tree. 
Finally, we demonstrate that the precise implementation of a tree-search strategy, including when and where parameters are optimized, can change the character of tree-search, and that good strategies for tree-search may combine existing tree-search programs.
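The figure of 10,395 trees for eight taxa quoted in the abstract above matches the standard count of unrooted, fully resolved topologies on n labeled leaves, (2n - 5)!! = 3 × 5 × ... × (2n - 5). The sketch below computes that count. That the authors enumerated unrooted bifurcating trees is an assumption inferred from the number, and the function name is hypothetical.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of distinct unrooted, fully resolved tree topologies on
    `n_taxa` labeled leaves: (2n - 5)!! = 3 * 5 * ... * (2n - 5).

    For fewer than four taxa there is only one topology.
    """
    count = 1
    # Odd factors 3, 5, ..., 2n - 5 (empty range when n_taxa <= 3).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count
```

The count grows super-exponentially (eight taxa give 10,395 topologies, ten give 2,027,025), which is why exhaustive evaluation is feasible for the eight-taxon yeast data set but heuristics are needed for larger ones.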
13.
Flavonoids have been used successfully for interpreting evolutionary relationships in many groups of angiosperms. These interpretations often have been presented in narrative fashion without specific indications of the kinds of relationships expressed. In this paper a method of phylogeny reconstruction with flavonoid data showing cladistic, patristic, and phenetic relationships is presented. Such a phylogram contains maximal information about flavonoid evolution. As an example, relationships in the North American species of Coreopsis (Compositae), containing 46 species in 11 sections, are analyzed by this approach. A phylogeny of sections of the genus from previous morphological, chromosomal and hybridization data is compared with that from data on anthochlors (chalcones and aurones). Strong correspondence of these evolutionary interpretations gives support to the hypothesized evolutionary trends within the group.
14.
In many phylogenetic problems, assuming that species have evolved from a common ancestor by a simple branching process is unrealistic. Reticulate phylogenetic models, however, have been largely neglected because the concept of reticulate evolution has not been supported by appropriate analytical tools and software. The reticulate model can adequately describe such complicated mechanisms as hybridization between species or lateral gene transfer in bacteria. In this paper, we describe a new algorithm for inferring reticulate phylogenies from evolutionary distances among species. The algorithm is capable of detecting contradictory signals encompassed in a phylogenetic tree and identifying possible reticulate events that may have occurred during evolution. The algorithm produces a reticulate phylogeny by gradually improving upon the initial solution provided by a phylogenetic tree model. The new algorithm is compared to the popular SplitsGraph method in a reanalysis of the evolution of photosynthetic organisms. A computer program to construct and visualize reticulate phylogenies, called T-Rex (Tree and Reticulogram Reconstruction), is available to researchers at the following URL: www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex.
15.
CONSEL: for assessing the confidence of phylogenetic tree selection.
CONSEL is a program to assess the confidence of the tree selection by giving the p-values for the trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multi-scale bootstrap technique. This p-value is less biased than the other conventional p-values such as the Bootstrap Probability (BP), the Kishino-Hasegawa (KH) test, the Shimodaira-Hasegawa (SH) test, and the Weighted Shimodaira-Hasegawa (WSH) test. CONSEL calculates all these p-values from the output of the phylogeny program packages such as Molphy, PAML, and PAUP*. Furthermore, CONSEL is applicable to a wide class of problems where the BPs are available. AVAILABILITY: The programs are written in C language. The source code for Unix and the executable binary for DOS are found at http://www.ism.ac.jp/~shimo/ CONTACT: shimo@ism.ac.jp
16.
17.
Ronald Sluys 《Acta biotheoretica》1983,32(1):29-41
A method of phylogenetic reconstruction as proposed by a number of scientists of the Senckenberg Research Institute is discussed. The method is based on functional-morphological studies, the evolutionary adaptation principle of Bock and Von Wahlert (1965) and so-called model reconstruction. It is argued in this paper that the direction of the adaptation process cannot be determined because of lack of knowledge about particular selective forces and that theories of model reconstruction are not open to contradiction in the sense of Popperian falsification. Although it has been claimed that the method provides the only valid directional argument for morphoclines in cladistic studies, it remains unclear how to proceed when morphoclines show contradictory polarities. Moreover, it is doubtful whether polarities of morphoclines can be determined independently of phylogenetic hypotheses, and also whether the use of multistate morphoclines is methodologically valid. By relying on a particular evolutionary theory, i.e. the neo-Darwinian theory, and consequently assigning natural selection as the major agent of directional progress, the Senckenberg method of phylogenetic reconstruction restricts itself to microevolutionary change and, therefore, cannot be used when other hypotheses on the evolutionary process appear to explain the speciation process more plausibly, i.e. hypotheses on macroevolution. Furthermore, it is an unproved statement that evolution always proceeds according to the principle of economy.
18.
19.
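Building on existing distance methods for phylogenetic tree construction, and drawing on parts of the theory behind the NJ and FM methods, a new and simple method based on node introduction is proposed. A molecular phylogenetic tree was constructed with this method; the results show that the method is faster and that the tree obtained is fully consistent with the one produced by the FM method.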
20.
Cross-immunity among related strains can account for the selection producing the slender phylogenetic tree of influenza A and B in humans. Using a model of seasonal influenza epidemics with drift (Andreasen, 2003. Dynamics of annual influenza A epidemics with immuno-selection. J. Math. Biol. 46, 504-536), and assuming that two mutants arrive in the host population sequentially, we determine the threshold condition for the establishment of the second mutant in the presence of partial cross-protection caused by the first mutant and their common ancestors. For fixed levels of cross-protection, the chance that the second mutant establishes increases with rho, the basic reproduction ratio, and some temporary immunity may be necessary to explain the slenderness of flu's phylogenetic tree. In the presence of moderate levels of temporary immunity, an asymmetric situation can arise in the season after the two mutants were introduced and established: if the offspring of the new mutant arrives before the offspring of the resident type, then the mutant line may produce a massive epidemic suppressing the original lineage. However, if the original lineage arrives first, then both strains may establish and the phylogenetic tree may bifurcate.