首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 475 毫秒
1.
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n ≤ 8. This requires the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of Monte Carlo methods and polyhedral algorithms. Our results include a demonstration that highly unrelated trees can be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the l 2 radius for neighbor-joining for n = 5 and we conjecture that the ability of the neighbor-joining algorithm to recover the BME tree depends on the diameter of the BME tree.  相似文献   

2.
Balanced minimum evolution (BME) is a statistically consistent distance-based method to reconstruct a phylogenetic tree from an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional over the BME polytope, the convex hull of the BME vectors obtained from Pauplin’s formula applied to all binary trees. The BME method is related to the Neighbor Joining (NJ) Algorithm, now known to be a greedy optimization of the BME principle. Further, the NJ and BME algorithms have been studied previously to understand when the NJ Algorithm returns a BME tree for small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and strengthen knowledge of the connection between the BME method and NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parameterized by disjoint clades. We show that these clade-faces are smaller dimensional BME polytopes themselves. Finally, we show that for any order of joining nodes to form a tree, there exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly, we show that the BME cone and every NJ cone associated to a tree T have an intersection of positive measure.  相似文献   

3.
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n(3)) time, where n is the number of taxa, while the current implementations of the second are in O(p n(3)) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n(2)) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n(2) + p n), i.e., O(n(2)) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n(2) x diam(T)) operations to build the starting tree and O(p n x diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. In the usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in log(n), so our algorithms are in practice faster that NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.  相似文献   

4.
The computationally challenging problem of reconstructing the phylogeny of a set of contemporary data, such as DNA sequences or morphological attributes, was treated by an extended version of the neighbor-joining (NJ) algorithm. The original NJ algorithm provides a single-tree topology, after a cascade of greedy pairing decisions that tries to simultaneously optimize the minimum evolution and the least squares criteria. Given that some sub-trees are more stable than others, and that the minimum evolution tree may not be achieved by the original NJ algorithm, we propose a multi-neighbor-joining (MNJ) algorithm capable of performing multiple pairing decisions at each level of the tree reconstruction, keeping various partial solutions along the recursive execution of the NJ algorithm. The main advantages of the new reconstruction procedure are: 1) as is the case for the original NJ algorithm, the MNJ algorithm is still a low-cost reconstruction method; 2) a further investigation of the alternative topologies may reveal stable and unstable sub-trees; 3) the chance of achieving the minimum evolution tree is greater; 4) tree topologies with very similar performances will be simultaneously presented at the output. When there are multiple unrooted tree topologies to be compared, a visualization tool is also proposed, using a radial layout to uniformly distribute the branches with the help of well-known metaheuristics used in computer science.  相似文献   

5.
Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on the tree quality for any of the algorithms. FASTME is freely available on the web.  相似文献   

6.
A stepwise algorithm for finding minimum evolution trees   总被引:7,自引:6,他引:1  
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen and then its true neighbor searched iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.   相似文献   

7.
It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we computationally construct polyhedral cones that give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation.  相似文献   

8.
A rapid heuristic algorithm for finding minimum evolution trees   总被引:2,自引:0,他引:2  
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.  相似文献   

9.
Founder group size is of prime importance in tree breeding programs. We determined whether sampling 20-plus trees for breeding in Allanblackia floribunda, a tropical forest tree species that has been recently enrolled in tree improvement program for fruit and seed production, would affect neutral genetic diversity and inbreeding level in both breeding and production populations. Using eight informative microsatellite loci, we: (a) assessed the nuclear genetic diversity of ten natural populations, and of the breeding population in the humid forest zone of Cameroon; (b) investigated temporal effective-size fluctuations in A. floribunda natural populations, with a view to identifying the role of past demographic events in the genetic structure of the studied species; and (c) tested the hypothesis that genetic diversity in a founder group of 20 individuals is not different from that existing in the wild. The eight loci were variable. High levels of genetic diversity (A = 4.96; H E = 0.59) and moderate differentiation (R ST = 0.061) were found within and among populations in wild stands. High genetic distances existed between populations ( \textaverage chord distance = 0.\text2769 ±0.00\text554 ) \left( {{\text{average chord distance}} = 0.{\text{2769}} \pm 0.00{\text{554}}} \right) . Eight of the ten surveyed populations showed signs of deviation from mutation-drift equilibrium, suggesting Pleistocene population bottlenecks and fluctuations in effective population size. Mantel tests did not reveal any relationships between genetic and geographic distances. A neighbor-joining dendrogram showed a population structure that could be explained by historical factors. The hypothesis tested has been accepted. However, a slight increase in inbreeding was observed in the breeding population.  相似文献   

10.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) was evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd41 showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.   相似文献   

11.
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou and Nei's NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the iota (infinity) norm. A byproduct of our work is a novel technique which yields a time optimal O (n (2)) implementation of common clustering algorithms such as UPGMA.  相似文献   

12.
Branch length estimates play a central role in maximum-likelihood (ML) and minimum-evolution (ME) methods of phylogenetic inference. For various reasons, branch length estimates are not statistically independent under ML or ME. We studied the response of correlations among branch length estimates to the degree of among-branch length heterogeneity (BLH) in the model (true) tree. The frequency and magnitude of (especially negative) correlations among branch length estimates were both shown to increase as BLH increases under simulation and analytically. For ML, we used the correct model (Jukes–Cantor). For ME, we employed ordinary least-squares (OLS) branch lengths estimated under both simple p-distances and Jukes–Cantor distances, analyzed with and without an among-site rate heterogeneity parameter. The efficiency of ME and ML was also shown to decrease in response to increased BLH. We note that the shape of the true tree will in part determine BLH and represents a critical factor in the probability of recovering the correct topology. An important finding suggests that researchers cannot expect that different branches that were in fact the same length will have the same probability of being accurately reconstructed when BLH exists in the overall tree. We conclude that methods designed to minimize the interdependencies of branch length estimates (BLEs) may (1) reduce both the variance and the covariance associated with the estimates and (2) increase the efficiency of model-based optimality criteria. We speculate on possible ways to reduce the nonindependence of BLEs under OLS and ML. Received: 9 March 1999 / Accepted: 4 May 1999  相似文献   

13.
In order to elucidate the mechanisms of purinergic transmission of calcium (Ca2 + ) waves between microglial cells, we have employed micro-photolithographic methods to form discrete patterns of microglia that allow quantitative measurements of Ca2 +  wave propagation. Microglia were confined to lanes 20–100 wide and Ca2 +  waves propagated from a point of mechanical stimulation, with a diminution in amplitude, for about 120 . The number of cells participating in propagation also decreased over this distance. Ca2 +  waves could propagate across a cell-free lane from one microglia lane to another if this distance of separation was less than about 60 , indicating that propagation involved diffusion of a chemical transmitter. This transmitter was identified as ATP since all Ca2 +  wave propagation was blocked by the purinoceptor antagonist suramin, which blocks P2Y2 and P2Y12 at relatively low concentrations. Antibodies to P2Y12 showed these at very high density compared with P2Y2, indicating a role for P2Y12 receptors. These observations were quantitatively accounted for by a model in which the main determinants are the diffusion of ATP released from a stimulated microglial cell and differences in the dissociation constant of the purinoceptors on the microglial cells.  相似文献   

14.
Radiation-shielding of healthy tissue is mandatory in breast intraoperative electron radiotherapy (IOERT). In this regard, dedicated radioprotection disks have been introduced. The aim of this study was to evaluate and compare the performance of three radioprotection disks widely used for breast IOERT. A Monte Carlo simulation approach was used for this purpose. The considered disks included Al + Pb, PMMA + Copper, and PTFE + Steel. They were stimulated by means of the MCNPX Monte Carlo code at depths around R100 and R90 of different electron energies in a water phantom, and their impact on the dosimetric properties of the therapeutic beam was evaluated in both correct and upside down disk placements. The electron energy spectrum immediately above and below each disk was calculated and analyzed. Furthermore, performance characteristics of the studied disks such as backscatter factors (BSFs) and transmission factors (TFs) at different electron energies were determined and compared. The results show that the Al + Pb disk most effectively attenuates the beam, while at the same time exhibits maximum BSF values. Employing the PMMA + Copper disk can minimize the BSF value but at the expense of an increased TF. The Al + Pb disk showed the best performance from the radiation protection viewpoint, while its highest BSF values could lead to perturbation of dose homogeneity within the target volume. PTFE + Steel disk showed an intermediate performance regarding the electron backscattering and transmission among the studied disks. The reverse placement of each disk can substantially increase the BSF value as compared to the correct situation but had less impact on the TF value.  相似文献   

15.
Araucaria angustifolia is an endangered tropical/subtropical coniferous of great interest for conservation due its economical, ecological, and social value. Only 3% of original Araucaria forests remain, which are generally confined to small forest fragments. Forest fragmentation can have serious consequences on genetic process in tree population, affecting long-term fitness and adaptability. To investigate the effects of forest fragmentation on genetic diversity and the structure of A. angustifolia populations, the genetic diversity of eight microsatellite loci was compared in four small fragmented populations (<22 ha), four tree groups (five to 11 trees) occurring in pastures and in three plots in a large continuous population. The clearest effect of fragmentation was the loss of rare alleles (p ≤ 0.05) in fragmented populations (19.4% to 47.2%) and intermediate frequency (0.05 < p ≤ 0.25) and rare alleles (p ≤ 0.05) in tree groups (19% to 86.1%) in comparison to continuous populations. Fragmented populations have significant higher fixation index ( [^(F)]\textIS = 0.121 \widehat{F}_{\text{IS}} = 0.121 , P < 0.05) than continuous populations ( [^(F)]\textIS = 0.083 \widehat{F}_{\text{IS}} = 0.083 , P < 0.05). High genetic differentiation was detected among tree groups ( [^(G)]\textST = 0.258 \widehat{G}_{{{\text{ST}}}}^{\prime } = 0.258 , P < 0.01) and low among fragments ( [^(G)]\textST = 0.031 \widehat{G}_{{{\text{ST}}}}^{\prime } = 0.031 , P < 0.05) and continuous populations ( [^(G)]\textST = 0.026 \widehat{G}_{{{\text{ST}}}}^{\prime } = 0.026 , P < 0.05), showing a significant bottleneck effect in tree groups. Evidence that forest fragments have experienced a recent bottleneck was confirmed in at least two studied fragments. The implications of the results for conservation of the fragmented A. angustifolia populations are discussed.  相似文献   

16.
Hypokinesia (HK) induces electrolyte losses in electrolyte-deficient tissue, yet the mechanisms of electrolyte losses in electrolyte-deficient tissue remain unknown. Mechanisms of electrolyte deposition could be involved. To determine the effect of prolonged HK on potassium (K+) deposition were measured muscle K+ content and K+ losses. Studies were conducted on 20 physically healthy male volunteers during 30 days pre-experimental period and 364 days experimental period. Subjects were equally divided into two groups: control subjects (CS) and experimental subjects (ES). The CS group was run average distances of 9.8 ± 1.7 km day−1 and the ES group was walked average distances of 2.7 ± 0.6 km day−1. Muscle K+ content decreased (p < 0.05) and plasma K+ concentration, and K+ losses in urine and feces increased (p < 0.05) in the ES group compared to their pre-experimental level and the values in their respective CS group. Muscle K+ content, plasma K+ level, and urine and fecal K+ losses did not show any changes in the CS group compared to their pre-experimental values. The conclusion was that K+ losses in K+-deficient muscle of healthy subjects could have been attributable to the less efficient K+ deposition inherently to prolonged HK.  相似文献   

17.
Clearcut: a fast implementation of relaxed neighbor joining   总被引:1,自引:0,他引:1  
SUMMARY: Clearcut is an open source implementation for the relaxed neighbor joining (RNJ) algorithm. While traditional neighbor joining (NJ) remains a popular method for distance-based phylogenetic tree reconstruction, it suffers from a O(N(3)) time complexity, where N represents the number of taxa in the input. Due to this steep asymptotic time complexity, NJ cannot reasonably handle very large datasets. In contrast, RNJ realizes a typical-case time complexity on the order of N(2)logN without any significant qualitative difference in output. RNJ is particularly useful when inferring a very large tree or a large number of trees. In addition, RNJ retains the desirable property that it will always reconstruct the true tree given a matrix of additive pairwise distances. Clearcut implements RNJ as a C program, which takes either a set of aligned sequences or a pre-computed distance matrix as input and produces a phylogenetic tree. Alternatively, Clearcut can reconstruct phylogenies using an extremely fast standard NJ implementation. AVAILABILITY: Clearcut source code is available for download at: http://bioinformatics.hungry.com/clearcut  相似文献   

18.
A forest’s productivity can be optimized by the application of rules derived from monopolized circles. A monopolized circle is defined as a circle whose center is a tree and whose radius is half of the distance between the tree itself and its nearest neighbor. Three characteristics of monopolized circle are proved. (1) Monopolized circles do not overlay each other, the nearest relationship being tangent. (2) “Full uniform pattern” means that the grid of trees (a×b=N) covers the whole plot, so that the distance between each tree in a row is the same as the row spacing. The total monopolized circle area with a full uniform pattern is independent on the number of trees and times the plot area. (3) If a tree is removed, the area of some trees’ monopolized circle will increase without decreasing the monopolized circles of the other trees. According to the above three characteristics, “uniform index” is defined as the total area of monopolized circles in a plot divided by the total area of the monopolized circles, arranged in a uniform pattern in the same shaped plot. According to the definition of monopolized circle, the distribution of uniform index for a random pattern and the variance of L is . It is evident that E(L) is independent on N and the plot area; hence, L is a relative index. L can be used to compare the uniformity among plots with different areas and the numbers of trees. In a random pattern, where L is equivalent to the tree density of the plot in which the number of trees is 1 and the area is π, the influence of tree number and plot area to L is eliminated. When n→∞, D(L)→0 and it indicates that the greater the number of tree is in the plots, the smaller the difference between the uniform indices will be. There are three types of patterns for describing tree distribution (aggregated, random, and uniform patterns). Since the distribution of L in the random pattern is accurately derived, L can be used to test the pattern types. The research on Moarshan showed that the whole plot has an aggregated pattern; the first, third, and sixth parts have an aggregated pattern; and the second, fourth, and fifth parts have a random pattern. None of the uniform indices is more than 0.318 (1/∏), which indicates that uniform patterns are rare in natural forests. The rules of uniform index can be applied to forest thinning. If you want to increase the value of uniform index, you must increase the total area of monopolized circles, which can be done by removing select trees. “Increasing area trees” are the removed trees and can increase the value of the uniform index. A tree is an increasing area tree if the distance between the tree and its second nearest neighbor is times longer than that between the tree itself and its first nearest neighbor, which is called the rule. It was very interesting to find that when six plots were randomly separated from the original plot, the proportion of increasing area trees in each plot was always about 0.5 without exception. In random pattern, the expected proportion of increasing area trees is about 0.35–0.44, which is different from the sampling value of 0.5. The reason is very difficult to explain, and further study is needed. Two criteria can be used to identify which trees should be removed to increase the uniform index during forest thinning. Those trees should be (1) trees whose monopolized circle areas are on the small side and (2) increasing area trees, which are found via the rule. Translated from Acta Ecologica Sinica, 2005, 25(1) (in Chinese)  相似文献   

19.

Background  

Distance matrix methods constitute a major family of phylogenetic estimation methods, and the minimum evolution (ME) principle (aiming at recovering the phylogeny with shortest length) is one of the most commonly used optimality criteria for estimating phylogenetic trees. The major difficulty for its application is that the number of possible phylogenies grows exponentially with the number of taxa analyzed and the minimum evolution principle is known to belong to the -hard class of problems.  相似文献   

20.
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed poorer than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed equally well as more complex methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号