1.

FastJoin, an improved neighbor-joining algorithm (total citations: 1; self-citations: 0; citations by others: 1)

J Wang, MZ Guo, LL Xing 《Genetics and molecular research : GMR》2012,11(3):1909-1922

Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving it are evaluated on two characteristics: accuracy and efficiency. Neighbor-joining reconstructs phylogenetic trees by iteratively picking a pair of nodes to merge into a new node until only one node remains; because of its good accuracy and speed, it has been embraced by the phylogeny research community. With the advent of large amounts of data, faster and more precise methods for reconstructing evolutionary trees have become necessary. We improved the neighbor-joining algorithm by iteratively picking two pairs of nodes and merging them into two new nodes, until only one node remains. We found that, in each iteration of the clustering procedure for a purely additive tree, a second pair of true neighbors can be chosen to merge into a new node in addition to the pair of true neighbors chosen by the neighbor-joining criterion. Because this second pair would be selected by another iteration of the neighbor-joining method anyway, the improved algorithm constructs the same phylogenetic tree as the neighbor-joining algorithm for the same input data. By combining the improved neighbor-joining algorithm with the upper-bound computation optimization of RapidNJ and the external-storage technique of ERapidNJ, a new method of reconstructing phylogenetic trees, FastJoin, was proposed. Experiments on several data sets showed that the new algorithm yields a significant speed-up compared to classic neighbor-joining and, empirically, that FastJoin is superior to almost all other neighbor-joining implementations.
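For orientation, one iteration of the classic neighbor-joining procedure described in this abstract can be sketched as follows. This is a minimal illustration of the standard Saitou and Nei selection criterion, not the FastJoin implementation; the function name `nj_step` and the five-taxon example matrix are assumptions introduced here.

```python
def nj_step(dist):
    """Perform one classic neighbor-joining step on a symmetric distance matrix.

    Returns the chosen pair (i, j), the two branch lengths to the new node,
    and the reduced matrix in which the new node occupies index 0.
    Requires at least three taxa.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    # Pick the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair

    # Branch lengths from i and j to the newly created internal node.
    limb_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i

    # Distances from the new node to every remaining taxon.
    keep = [k for k in range(n) if k not in (i, j)]
    new_row = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]

    reduced = [[0.0] + new_row]
    for pos, k in enumerate(keep):
        reduced.append([new_row[pos]] + [dist[k][m] for m in keep])
    return best_pair, (limb_i, limb_j), reduced


# Hypothetical additive distance matrix for five taxa (a, b, c, d, e).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, limbs, reduced = nj_step(example)
print(pair, limbs, reduced[0])
```

Calling `nj_step` repeatedly on the reduced matrix until only a few nodes remain gives the usual one-pair-per-iteration procedure. In this example the pair (d, e) is also a pair of true neighbors, which is the kind of second pair the abstract says can be merged in the same iteration.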

2.

Theoretical foundation of the minimum-evolution method of phylogenetic inference (total citations: 26; self-citations: 5; citations by others: 21)

Rzhetsky A; Nei M 《Molecular biology and evolution》1993,10(5):1073-1095

The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.

3.

Accuracy of estimated phylogenetic trees from molecular data (total citations: 2; self-citations: 0; citations by others: 2)

Yoshio Tateno, Masatoshi Nei, Fumio Tajima 《Journal of molecular evolution》1982,18(6):387-404

Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

4.

Property and efficiency of the maximum likelihood method for molecular phylogeny

Naruya Saitou 《Journal of molecular evolution》1988,27(3):261-273

Summary: The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there are some theoretical problems in comparing ML values conditional on each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.

5.

Random local neighbor joining: a new method for reconstructing phylogenetic trees (total citations: 3; self-citations: 0; citations by others: 3)

Zhang W, Sun Z 《Molecular phylogenetics and evolution》2008,47(1):117-128

We have developed a new method for reconstructing phylogenetic trees called random local neighbor-joining (RLNJ). Our method differs from the neighbor-joining (NJ) method of Saitou and Nei and affords a more thorough sampling of the solution space by randomly searching for a local pair of neighbors in each step. Results from applying the RLNJ method to yeast data show that the chance of obtaining a smaller S value (sum of branch lengths) than the NJ method increases as cases with more taxa are analyzed, and that many individual runs of the RLNJ method usually generate more than one topology with small S values. Computer simulation shows that the RLNJ method can significantly improve the chance of recovering the correct topology by providing more than one topology. In addition, computer simulation shows that, with the RLNJ method, the proportion of correct topologies (P(C)) increases as the number of different topologies decreases and as the proportion of the "most frequent topology" increases. Thus, the number of different topologies and the proportion of the "most frequent topology" can be used as auxiliary criteria to evaluate the reliability of a phylogenetic tree.

6.

A rapid heuristic algorithm for finding minimum evolution trees (total citations: 2; self-citations: 0; citations by others: 2)

Rodin A, Li WH 《Molecular phylogenetics and evolution》2000,16(2):173-179

The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.

7.

A stepwise algorithm for finding minimum evolution trees (total citations: 7; self-citations: 6; citations by others: 1)

Kumar S 《Molecular biology and evolution》1996,13(4):584-593

A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen, and its true neighbor is then searched for iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.
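To put the (m-1)!/2 bound in context, the short sketch below compares it with the total number of unrooted bifurcating topologies for m taxa, which is the standard count (2m-5)!!. The function names are assumptions introduced for illustration and do not come from the paper.

```python
import math


def stepwise_bound(m):
    """Upper bound (m-1)!/2 on the trees examined, as quoted in the abstract."""
    return math.factorial(m - 1) // 2


def unrooted_tree_count(m):
    """Number of unrooted bifurcating trees on m >= 3 labeled taxa: (2m-5)!!."""
    count = 1
    for k in range(3, 2 * m - 4, 2):  # odd factors 3, 5, ..., 2m-5
        count *= k
    return count


for m in (4, 6, 8, 10):
    print(m, stepwise_bound(m), unrooted_tree_count(m))
```

For four taxa the two numbers coincide at 3; for ten taxa the bound is 181440 against 2027025 possible topologies. The bound is a worst case, so the number of trees actually examined for a given data set may be smaller.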

8.

Limitations of the evolutionary parsimony method of phylogenetic analysis [published erratum appears in Mol Biol Evol 1990 Mar;7(2):201]

Jin L; Nei M 《Molecular biology and evolution》1990,7(1):82-102

Lake's evolutionary parsimony (EP) method of constructing a phylogenetic tree is primarily applied to four DNA sequences. In this method, three quantities (X, Y, and Z) that correspond to three possible unrooted trees are computed, and an invariance property of these quantities is used for choosing the best tree. However, Lake's method depends on a number of unrealistic assumptions. We therefore examined the theoretical basis of his method and reached the following conclusions: (1) When the rates of two transversional changes from a nucleotide are unequal, his invariance property breaks down. (2) Even if the rates of two transversional changes are equal, the invariance property requires some additional conditions. (3) When Kimura's two-parameter model of nucleotide substitution applies and the rate of nucleotide substitution varies greatly with branch, the EP method is generally better than the standard maximum-parsimony (MP) method in recovering the correct tree but is inferior to the neighbor-joining (NJ) and a few other distance matrix methods. (4) When the rate of nucleotide substitution is the same or nearly the same for all branches, the EP method is inferior to the MP method even if the proportion of transitional changes is high. (5) When Lake's assumptions fail, his chi-square (χ²) test may identify an erroneous tree as the correct tree. This happens because the test is not for comparing different trees. (6) As long as a proper distance measure is used, the NJ method is better than the EP and MP methods whether there is a transition/transversion bias or whether there is variation in substitution rate among different nucleotide sites.

9.

A method for constructing phylogenetic trees based on k-mer composition information

Liu Hongmei, Liu Guoqing 《生物信息学》 (Chinese Journal of Bioinformatics) 2013,11(2):100-104

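As more and more genomes are sequenced, alignment-free phylogenetic analysis based on whole genomes has become a research hotspot. The nucleotide composition of genomes is not identical across species or individuals. The information in DNA sequences, the language of heredity, is reflected to a large extent in their k-mer frequencies. Phylogenetic trees based on the k-mer frequencies of genome sequences therefore offer a new perspective on the relationships among species. In this paper we define an information parameter based on k-mer frequencies, use it to characterize genome sequences, compute the distances between the information parameters of different genomes, and construct a phylogenetic tree for 84 viruses with the neighbor-joining method. The resulting tree agrees to a large extent with existing phylogenetic trees.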
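The general idea of an alignment-free, k-mer based distance can be sketched as follows. The abstract does not specify the paper's information parameter, so this example substitutes plain relative k-mer frequencies and a Euclidean distance; both choices, and the function names, are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k DNA k-mers, in lexicographic order.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k):
    """Pairwise distances between sequences, usable as input to a distance method."""
    profiles = [kmer_frequencies(s, k) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]


# Hypothetical toy sequences, far shorter than real viral genomes.
toy = ["ACGTACGTAC", "ACGTTCGTAC", "GGGGCCCCGG"]
for row in distance_matrix(toy, 2):
    print([round(x, 3) for x in row])
```

A matrix produced this way could then be passed to a neighbor-joining implementation, which is the tree-building step the abstract describes.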

10.

Longitudinal phylogenetic tree of within-host viral evolution from noncontemporaneous samples: a distance-based sequential-linking method

Ren F, Ogishima S, Tanaka H 《Gene》2003,317(1-2):89-95

A new method for reconstructing phylogenetic relationships of within-host (patient) viral evolution from noncontemporaneous samples is presented. This method has two important features: noncontemporaneous viral samples can be dealt with by a simple computing algorithm, and both neutral and adaptive evolution patterns occurring during the process of viral evolution can be estimated. In our previous study, we proposed a preliminary formulation of this algorithm that was based on the maximum likelihood method. However, that preliminary formulation was difficult to use because calculating the likelihood required an extremely large amount of time and the number of possible tree topologies increased exponentially with the number of viral variants. In this paper, we propose a new algorithm, referred to as a distance-based sequential-linking algorithm, in which the neighbor-joining method is employed to reconstruct the longitudinal phylogenetic tree from serial viral samples. This algorithm is applied to a longitudinal data set of the env gene (V3 region) of human immunodeficiency virus type 1 (HIV-1) obtained over 7 years after the infection of a single patient. The results suggest that this method can successfully reconstruct a longitudinal phylogenetic tree from noncontemporaneous viral samples within a reasonable calculation time. This revised method proved to be a useful tool for estimating the dynamic process of within-host viral evolution.