Similar Articles
20 similar articles found (search time: 31 ms)
1.
Accurate phylogenetic reconstruction methods are inherently computationally heavy and are therefore limited to relatively small numbers of taxa. Supertree construction is the task of amalgamating small trees over partial sets into a big tree over the complete taxa set. The need for fast and accurate supertree methods has become crucial due to the enormous number of new genomic sequences generated by modern technology and the desire to use them for classification purposes. In particular, the Assembling the Tree of Life (ATOL) program aims at constructing the evolutionary history of all living organisms on Earth. When dealing with unrooted trees, a quartet, an unrooted tree over four taxa, is the most basic piece of phylogenetic information. Therefore, quartet amalgamation stands at the heart of any supertree problem, as it concerns combining many minimal pieces of information into a single, coherent, and more comprehensive piece of information. We have devised an extremely fast algorithm for quartet amalgamation and implemented it in a very efficient code. The new code can handle over a hundred million quartet trees over several hundred taxa with very high accuracy.
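A quartet is simply the unrooted topology a tree induces on four taxa. As a minimal illustration of that notion (not the paper's amalgamation algorithm), the sketch below derives every quartet a small tree induces, encoding the tree by the splits of its internal edges:

```python
from itertools import combinations

def induced_quartet(splits, quad):
    """Topology a tree induces on four taxa, given the tree's nontrivial
    splits (each split = frozenset of the taxa on one side of an
    internal edge)."""
    for side in splits:
        inside = {t for t in quad if t in side}
        if len(inside) == 2:
            x, y = sorted(inside)
            z, w = sorted(set(quad) - inside)
            return (x, y, z, w)  # topology xy|zw
    return None  # star (unresolved) quartet

# Caterpillar tree ((a,b),c,(d,e)): two internal edges, hence two splits.
splits = [frozenset({"a", "b"}), frozenset({"d", "e"})]
taxa = ["a", "b", "c", "d", "e"]
quartets = {q: induced_quartet(splits, q) for q in combinations(taxa, 4)}
```

Quartet amalgamation runs this logic in reverse: given many (possibly conflicting) four-taxon topologies, find one big tree inducing as many of them as possible.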

2.
The groupings of taxa in a phylogenetic tree cannot represent all the conflicting signals that usually occur among site patterns in aligned homologous genetic sequences. Hence a tree-building program must compromise by reporting a subset of the patterns, using some discriminatory criterion. Thus, in the worst case, out of possibly a large number of equally good trees, only an arbitrarily chosen tree might be reported by the tree-building program as "The Tree." This tree might then be used as a basis for phylogenetic conclusions. One strategy to represent conflicting patterns in the data is to construct a network. The Buneman graph is a theoretically very attractive example of such a network. In particular, a characterization for when this network will be a tree is known. Also the Buneman graph contains each of the most parsimonious trees indicated by the data. In this paper we describe a new method for constructing the Buneman graph that can be used for a generalization of Hadamard conjugation to networks. This new method differs from previous methods by allowing us to focus on local regions of the graph without having to first construct the full graph. The construction is illustrated by an example.

3.
Accurate phylogenetic reconstruction methods are currently limited to a maximum of a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method on top of accurate base methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and both theoretical and experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms, in both accuracy and running time, the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer approach in which the divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on our previous work on piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated: even with a completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triplets, where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.

4.
SUMMARY: We have developed a tool implementing an efficient algorithm for refined Buneman tree reconstruction. The algorithm, which has the same complexity as the neighbour-joining method and the (plain) Buneman tree construction, enables refined Buneman tree reconstruction on large taxa sets. AVAILABILITY: The source code for RBT, written in Java, is available under the GNU Public License (GPL) at http://www.birc.dk/Software/RBT. CONTACT: besen@daimi.au.dk.

5.
Tree reconstruction methods are often judged by their accuracy, measured by how close they get to the true tree. Yet most reconstruction methods, like maximum likelihood (ML), do not explicitly maximize this accuracy. To address this problem, we propose a Bayesian solution. Given tree samples, we propose finding the tree estimate that is closest on average to the samples. This "median" tree is known as the Bayes estimator (BE). The BE literally maximizes posterior expected accuracy, measured in terms of closeness (distance) to the true tree. We discuss a unified framework of BE trees, focusing especially on tree distances that are expressible as squared Euclidean distances. Notable examples include Robinson-Foulds (RF) distance, quartet distance, and squared path difference. Using both simulated and real data, we show that BEs can be estimated in practice by hill-climbing. In our simulations, we find that BEs tend to be closer to the true tree than ML and neighbor joining. In particular, the BE under squared path difference tends to perform well in terms of both path difference and RF distances.
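The Bayes-estimator idea, picking the tree closest on average to the posterior samples, can be sketched with Robinson-Foulds distance on trees encoded as sets of nontrivial bipartitions. This toy selects a medoid from among the samples themselves rather than hill-climbing over the full tree space as the paper does:

```python
def rf_distance(splits1, splits2):
    """Robinson-Foulds distance: size of the symmetric difference
    between the two trees' nontrivial bipartition sets."""
    return len(splits1 ^ splits2)

def medoid_tree(samples):
    """Pick the sample tree minimizing the total RF distance to all
    samples -- a crude stand-in for the Bayes-estimator search."""
    return min(samples, key=lambda t: sum(rf_distance(t, s) for s in samples))

# Three toy 5-taxon trees, each encoded by its nontrivial splits.
t1 = frozenset({frozenset("ab"), frozenset("de")})
t2 = frozenset({frozenset("ab"), frozenset("cd")})
t3 = frozenset({frozenset("ab"), frozenset("de")})
best = medoid_tree([t1, t2, t3])
```

Here two of the three samples agree, so the medoid is their shared topology; the BE framework generalizes this averaging to distances expressible as squared Euclidean distances.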

6.
IQPNNI: moving fast through tree space and stopping in time
An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in O(n²) time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently, a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted into the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives slightly better accuracy than the other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results of others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).

7.
One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdős provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.

8.
MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes.

9.
Supertree methods are used to construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of a few dozen taxa, the use of a supertree method in order to construct the tree of life is inevitable. Supertree methods are broadly divided according to the input trees: when the input trees are unrooted, the basic reconstruction unit is a quartet tree. In this case, the basic decision problem of whether there exists a tree that agrees with all quartets is NP-complete. On the other hand, when the input trees are rooted, the basic reconstruction unit is a rooted triplet and the above decision problem has a polynomial time algorithm. However, when there is no tree which agrees with all triplets, it would be desirable to find the tree that agrees with the maximum number of triplets. This optimization problem, however, was shown to be NP-hard. Current heuristic approaches perform min cut on a graph representing triplet inconsistencies and return a tree that is guaranteed to satisfy some required properties. In this work, we present a different heuristic approach that guarantees the properties provided by the current methods, and we give experimental evidence that it significantly outperforms currently used methods. This method is based on a divide and conquer approach, where the min cut in the divide step is replaced by a max cut in a variant of the same graph. The latter is achieved by a lightweight semidefinite programming-like heuristic that leads to very fast running times.
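The divide step above hinges on a max-cut primitive. The following is a deterministic local-search max cut on a toy weighted graph; it is a lightweight stand-in for the SDP-like heuristic, and the construction of the graph from triplet inconsistencies is omitted:

```python
def greedy_max_cut(nodes, weight):
    """Local-search max cut: start with all vertices on one side and
    move a vertex across whenever doing so increases the cut weight.
    `weight` maps frozenset pairs to nonnegative edge weights."""
    side = {v: 0 for v in nodes}

    def gain(v):
        # Moving v turns its same-side weight into cut weight and loses
        # its current cut weight.
        g = 0
        for u in nodes:
            if u != v:
                w = weight.get(frozenset({u, v}), 0)
                g += w if side[u] == side[v] else -w
        return g

    improved = True
    while improved:
        improved = False
        for v in nodes:
            if gain(v) > 0:
                side[v] ^= 1
                improved = True
    return ({v for v in nodes if side[v] == 0},
            {v for v in nodes if side[v] == 1})

# Toy graph: unit-weight edges between {a,b} and {c,d}; the optimal cut
# separates exactly those two pairs.
w = {frozenset(p): 1 for p in [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]}
left, right = greedy_max_cut(["a", "b", "c", "d"], w)
```

In the supertree setting, the two sides of the cut become the taxon subsets on which the algorithm recurses.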

10.
A forest's productivity can be optimized by the application of rules derived from monopolized circles. A monopolized circle is defined as a circle whose center is a tree and whose radius is half the distance between the tree and its nearest neighbor. Three characteristics of monopolized circles are proved. (1) Monopolized circles do not overlap each other; the closest possible relationship is tangency. (2) A "full uniform pattern" means that the grid of trees covers the whole plot, so that the distance between adjacent trees in a row equals the row spacing. The total monopolized circle area under a full uniform pattern is independent of the number of trees and equals π/4 times the plot area. (3) If a tree is removed, the monopolized circle areas of some trees will increase without decreasing the monopolized circles of the other trees. According to the above three characteristics, the "uniform index" is defined as the total area of monopolized circles in a plot divided by the total area of the monopolized circles arranged in a uniform pattern in the same shaped plot. From the definition of the monopolized circle, the uniform index L follows the distribution χ²(2n)/(2πn) for a random pattern, with E(L) = 1/π and variance D(L) = 1/(nπ²). It is evident that E(L) is independent of the number of trees and the plot area; hence, L is a relative index. L can be used to compare the uniformity among plots with different areas and numbers of trees. In a random pattern, L is equivalent to the tree density of a plot in which the number of trees is 1 and the area is π, so the influence of tree number and plot area on L is eliminated. When n→∞, D(L)→0 and L→1/π ≈ 0.318; this indicates that the greater the number of trees in the plots, the smaller the difference between the uniform indices will be. There are three types of patterns for describing tree distribution (aggregated, random, and uniform). Since the distribution of L in the random pattern is derived exactly, L can be used to test the pattern type. The research on Moarshan showed that the whole plot has an aggregated pattern; the first, third, and sixth parts have an aggregated pattern; and the second, fourth, and fifth parts have a random pattern. None of the uniform indices is more than 0.318 (1/π), which indicates that uniform patterns are rare in natural forests. The rules of the uniform index can be applied to forest thinning. To increase the value of the uniform index, the total area of monopolized circles must be increased, which can be done by removing selected trees. "Increasing-area trees" are removed trees that increase the value of the uniform index. A tree is an increasing-area tree if the distance between the tree and its second nearest neighbor is more than √2 times that between the tree and its first nearest neighbor; this is called the √2 rule. Interestingly, when six plots were randomly separated from the original plot, the proportion of increasing-area trees in each plot was always about 0.5, without exception. In a random pattern, the expected proportion of increasing-area trees is about 0.35-0.44, which differs from the sampled value of 0.5. The reason is very difficult to explain, and further study is needed. Two criteria can be used to identify which trees should be removed to increase the uniform index during forest thinning: (1) trees whose monopolized circle areas are on the small side, and (2) increasing-area trees, found via the √2 rule.
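The uniform index is straightforward to compute from tree coordinates. The sketch below follows the abstract's definitions (monopolized-circle radius = half the nearest-neighbor distance; reference area = π/4 times the plot area) and checks that a perfectly uniform grid yields L = 1:

```python
from math import pi, hypot

def uniform_index(points, plot_area):
    """Uniform index L: total monopolized-circle area divided by the
    total circle area under a full uniform pattern, which is pi/4 times
    the plot area per the abstract's characteristic (2)."""
    total = 0.0
    for i, (x, y) in enumerate(points):
        nearest = min(hypot(x - u, y - v)
                      for j, (u, v) in enumerate(points) if j != i)
        r = nearest / 2.0  # monopolized-circle radius
        total += pi * r * r
    return total / (pi / 4 * plot_area)

# Four trees on a unit-spaced 2x2 grid inside a 2x2 plot: a full
# uniform pattern, so the index should be exactly 1.
pts = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
L = uniform_index(pts, plot_area=4.0)
```

For a random pattern one would instead expect L near 1/π ≈ 0.318, and clustered stands fall below that.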

11.
Recent large-scale nuclear DNA phylogenies have supported unconventional interordinal relationships among modern eutherians as well as divergence dates (100 mya) that substantially predate the first appearance of fossils from modern eutherians near the Cretaceous/Cenozoic (K/T) boundary (65-70 mya). For comparison to the nuclear data, I analyzed 12 complete mitochondrial DNA (mtDNA) protein-coding genes (10,677 bp) from 53 eutherian taxa, using maximum-likelihood methods to estimate model parameters (GTR+I+Γ) and to optimize topology and branch-length estimates. Although closely resembling the nuclear DNA trees, the mtDNA maximum-likelihood tree is just one of seven statistically indistinguishable (ΔlnL ≤ 1.747) trees, each suggesting different evolutionary relationships. This 53-taxon data set and another including 56 taxa provide no statistically significant support for a monophyletic afrotherian clade. In fact, these mitochondrial DNA sequences fail to support the monophyly of any of the three putative eutherian divisions suggested by the nuclear data (Afrotheria, Laurasiatheria, or Euarchontoglires). Compared with the well-supported branches describing relationships among families, those describing interordinal relationships are extremely short and only tenuously supported. Neither these sequences, nor sequences simulated under a known tree, fully resolve any interordinal relationship. Even simulated sequences that are twice as long (22 kb) as the mtDNA protein-coding genes are too short and too saturated to resolve the deepest and shortest interordinal relationships. Further, the mammalian mtDNA sequences appear to depart significantly from molecular-clock and quartet dating assumptions. Unlike recent nuclear DNA studies, I find that mtDNA genes, by themselves, are inadequate to describe relationships or divergence times at the base of the eutherian tree.

12.
13.
Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results toward minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a "common" polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data, or the same method applied to different datasets, result in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data, and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2, 3, 4, ..., kmax clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations; it is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy.
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a "European R clade" containing the haplogroups H, V, H/V, J, T, and U and a "Eurasian N subclade" including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in non-nearest locations, in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1-20%. From a comparison of our haplogroups to two chimpanzee and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 ± 14,000 years before present.
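The PCA step can be sketched in pure Python with power iteration on the covariance matrix of a toy 0/1 polymorphism matrix. This only illustrates how a leading component separates two haplogroup-like clusters; it is not the authors' pipeline, which couples PCA with consensus ensemble clustering:

```python
def leading_component(X, iters=200):
    """Power iteration for the leading principal component of a small
    0/1 polymorphism matrix (rows = sequences, columns = sites).
    Returns each sequence's score on that component."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    C = [[row[j] - means[j] for j in range(m)] for row in X]
    # Covariance matrix (m x m) of the centered data.
    S = [[sum(C[i][a] * C[i][b] for i in range(n)) / n for b in range(m)]
         for a in range(m)]
    v = [1.0] + [0.0] * (m - 1)
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered sequence onto the component.
    return [sum(C[i][j] * v[j] for j in range(m)) for i in range(n)]

# Two toy "haplogroups": sequences 0-1 carry site 0, sequences 2-3
# carry site 1; site 2 is noise shared across groups.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
scores = leading_component(X)
```

The first two sequences land on one side of the component and the last two on the other, which is the separation the clustering step would then exploit.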

14.
Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods), which may produce trees with biologically meaningless negative branch lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution, encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on tree quality for any of the algorithms. FASTME is freely available on the web.
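For contrast with BME, the NJ baseline the article benchmarks against selects, at each step, the pair minimizing Saitou and Nei's Q-criterion. The sketch below performs just that selection step on a small additive distance matrix (illustrative only; it does not implement the full NJ or FASTME algorithms):

```python
def nj_pick_pair(D):
    """One neighbor-joining selection step: compute the Q-criterion
    Q(a,b) = (n-2) * D[a][b] - r(a) - r(b), where r is the row sum,
    and return the pair of taxa minimizing it."""
    taxa = sorted(D)
    n = len(taxa)
    row = {a: sum(D[a][b] for b in taxa if b != a) for a in taxa}
    best, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            a, b = taxa[i], taxa[j]
            q = (n - 2) * D[a][b] - row[a] - row[b]
            if q < best_q:
                best, best_q = (a, b), q
    return best

# Additive distances on the tree ((a,b),(c,d)) with unit-length pendant
# branches and a unit-length internal edge.
D = {"a": {"b": 2, "c": 3, "d": 3},
     "b": {"a": 2, "c": 3, "d": 3},
     "c": {"a": 3, "b": 3, "d": 2},
     "d": {"a": 3, "b": 3, "c": 2}}
pair = nj_pick_pair(D)
```

On this matrix the criterion correctly favors joining one of the cherries; BME replaces this least-squares-flavored criterion with balanced branch-length weights.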

15.
We study the reliability of phylogeny based on four taxa, when the internal, ancestral branch is short. Such a quartet approach has been broadly used for inferring phylogenetic patterns. The question of the branching pattern between the suborders Ruminantia and Suiformes (order Artiodactyla) and the order Cetacea is chosen as an example. All combinations of four taxa were generated by taking one and only one species per group under study (three ingroups and one outgroup). Using real sequences, the analysis of these combinations demonstrates that the quartet approach is seriously misleading. Using both maximum parsimony and distance methods, it is possible to find a quartet of species which provides a high bootstrap proportion for each of the three possible unrooted trees. With the same set of sequences, we used all the available species simultaneously to construct a molecular phylogeny. This approach proved much more reliable than the quartet approach. When the number of informative sites is rather low, the branching patterns are not supported through bootstrap analysis, preventing false inference due to the lack of information. The reliable resolution of the phylogenetic relationships among Ruminantia, Suiformes, and Cetacea will therefore require a large number of nucleotides, such as the complete mitochondrial genomes of at least 30 species.

16.
Although recent studies indicate that estimating phylogenies from alignments of concatenated genes greatly reduces the stochastic error, the potential for systematic error still remains, heightening the need for reliable methods to analyze multigene data sets. Consensus methods provide an alternative, more inclusive, approach for analyzing collections of trees arising from multiple genes. We extend a previously described consensus network method for genome-scale phylogeny (Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461) to incorporate additional information. This additional information could come from bootstrap analysis, Bayesian analysis, or various methods to find confidence sets of trees. The new methods can be extended to include edge weights representing genetic distance. We use three data sets to illustrate the approach: 61 genes from 14 angiosperm taxa and one gymnosperm, 106 genes from eight yeast taxa, and 46 members of a gene family from 15 vertebrate taxa.

17.
Charadrii (shorebirds, gulls, and alcids) have exceptional diversity in ecological, behavioral, and life-history traits. A phylogenetic framework is necessary to fully understand the relationships among these traits. Despite several attempts to resolve the phylogeny of the Charadrii, none have comprehensively utilized molecular sequence data. Complete and partial cytochrome-b gene sequences for 86 Charadrii and five Falconides species (as outgroup taxa) were obtained from GenBank and aligned. We analyzed the resulting matrices using parsimony, Bayesian inference, minimum evolution, and quartet puzzling methods. Posterior probabilities, decay indices, and bootstrapping provide strong support for four major lineages consisting of gulls, alcids, plovers, and sandpipers, respectively. The broad structure of the trees differ significantly from all previous hypotheses of Charadrii phylogeny in placing the plovers at the base of the tree below the sandpipers in a pectinate sequence towards a large clade of gulls and alcids. The parsimony, Bayesian, and minimum evolution models provide strong evidence for this phylogenetic hypothesis. This is further corroborated by non-tree based measures of support and conflict (Lento plots). The quartet puzzling trees are poorly resolved and inconclusive.

18.
We present QNet, a method for constructing split networks from weighted quartet trees. QNet can be viewed as a quartet analogue of the distance-based Neighbor-Net (NNet) method for network construction. Just as NNet, QNet works by agglomeratively computing a collection of circular weighted splits of the taxa set which is subsequently represented by a planar split network. To illustrate the applicability of QNet, we apply it to a previously published Salmonella data set. We conclude that QNet can provide a useful alternative to NNet if distance data are not available or a character-based approach is preferred. Moreover, it can be used as an aid for determining when a quartet-based tree-building method may or may not be appropriate for a given data set. QNet is freely available for download.

19.

Background

When inferring phylogenetic trees, different algorithms may give different trees. To study such effects, a measure for the distance between two trees is useful. Quartet distance is one such measure: it is the number of quartet topologies that differ between two trees.
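The definition translates directly into a brute-force computation over all four-leaf subsets. The sketch below (O(n⁴), for illustration only; the paper's contribution is a sub-cubic algorithm) encodes each tree by its nontrivial splits:

```python
from itertools import combinations

def quartet_topology(splits, quad):
    """Topology a tree induces on four leaves, from its nontrivial
    splits; returned as an unordered pair of leaf pairs."""
    for side in splits:
        inside = frozenset(t for t in quad if t in side)
        if len(inside) == 2:
            return frozenset({inside, frozenset(quad) - inside})
    return None  # star (unresolved) quartet

def quartet_distance(splits1, splits2, taxa):
    """Count four-leaf subsets on which the two trees induce
    different topologies."""
    return sum(quartet_topology(splits1, q) != quartet_topology(splits2, q)
               for q in combinations(sorted(taxa), 4))

# Two 5-taxon trees differing by one nearest-neighbor interchange.
taxa = "abcde"
t1 = [frozenset("ab"), frozenset("de")]  # ((a,b),c,(d,e))
t2 = [frozenset("ac"), frozenset("de")]  # ((a,c),b,(d,e))
d = quartet_distance(t1, t2, taxa)
```

Only the quartets straddling the swapped edge change topology, so the two trees here are distance 2 apart out of the five possible quartets.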

Results

We have derived a new algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is sub-cubic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm so far for computing the quartet distance between general trees independent of the degree of the inner nodes.

Conclusions

We have implemented our algorithm and two of the best competitors. Our new algorithm is significantly faster than the competition and seems to run in close to quadratic time in practice.

20.
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology, from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou and Nei's NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the ℓ∞ (infinity) norm. A byproduct of our work is a novel technique which yields a time-optimal O(n²) implementation of common clustering algorithms such as UPGMA.
