首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Clearcut: a fast implementation of relaxed neighbor joining   总被引:1,自引:0,他引:1  
SUMMARY: Clearcut is an open source implementation for the relaxed neighbor joining (RNJ) algorithm. While traditional neighbor joining (NJ) remains a popular method for distance-based phylogenetic tree reconstruction, it suffers from a O(N(3)) time complexity, where N represents the number of taxa in the input. Due to this steep asymptotic time complexity, NJ cannot reasonably handle very large datasets. In contrast, RNJ realizes a typical-case time complexity on the order of N(2)logN without any significant qualitative difference in output. RNJ is particularly useful when inferring a very large tree or a large number of trees. In addition, RNJ retains the desirable property that it will always reconstruct the true tree given a matrix of additive pairwise distances. Clearcut implements RNJ as a C program, which takes either a set of aligned sequences or a pre-computed distance matrix as input and produces a phylogenetic tree. Alternatively, Clearcut can reconstruct phylogenies using an extremely fast standard NJ implementation. AVAILABILITY: Clearcut source code is available for download at: http://bioinformatics.hungry.com/clearcut  相似文献   

2.
Balanced minimum evolution (BME) is a statistically consistent distance-based method to reconstruct a phylogenetic tree from an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional over the BME polytope, the convex hull of the BME vectors obtained from Pauplin’s formula applied to all binary trees. The BME method is related to the Neighbor Joining (NJ) Algorithm, now known to be a greedy optimization of the BME principle. Further, the NJ and BME algorithms have been studied previously to understand when the NJ Algorithm returns a BME tree for small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and strengthen knowledge of the connection between the BME method and NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parameterized by disjoint clades. We show that these clade-faces are smaller dimensional BME polytopes themselves. Finally, we show that for any order of joining nodes to form a tree, there exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly, we show that the BME cone and every NJ cone associated to a tree T have an intersection of positive measure.  相似文献   

3.
4.
The chloroplast-encoded large subunit of the ribulose-1, 5-bisphosphate carboxylase / oxygenase (rbcL) gene was sequenced from 20 species of the colonial Volvocales (the Volvacaceae, Goniaceae, and Tetrabaenaceae) in order to elucidate phylogenetic relationships within the colonial Volvocales. Eleven hundred twenty-eight base pairs in the coding regions of the (rbcL) gene were analyzed by the neighbor-joining (NJ) method using three kinds of distance estimations, as well as by the maximum parsimony (MP) method. A large group comprising all the anisogamous and oogamous volvocacean species was resolved in the MP tree as well as in the NJ trees based on overall and synonymous substitutions. In all the trees constructed, Basichlamys and Tetrabaena (Tetrabaenaceae) constituted a very robust phylogenetic group. Although not supported by high bootstrap values, the MP tree and the NJ tree based on nonsynonymous substitutions indicated that the Tetrabaenaceae is the sister group to the large group comprising the Volvocaceae and the Goniaceae. In addition, the present analysis strongly suggested that Pandorina and Astrephomene are monophyletic genera whereas Eudorina is nonmonophyletic. These results are essentially consistent with the results of the recent cladistic analyses of morphological data. However, the monophyly of the Volvocaceae previously supported by four morphological synapomorphies is found only in the NJ tree based on nonsynonymous substitutions (with very low bootstrap values). The genus Volvox was clearly resolved as a polyphyletic group with V. rousseletii Pocock separated from other species of Volvox in the rbcL gene comparisons, although this genus represents a monophyletic group in the previous morphological analyses. Furthermore, none of the rbcL gene trees supported the monophyly of the Goniaceae; Astrephomene was placed in various phylogenetic positions .  相似文献   

5.
QuickJoin--fast neighbour-joining tree reconstruction   总被引:1,自引:0,他引:1  
We have built a tool for fast construction of very large phylogenetic trees. The tool uses heuristics for speeding up the neighbour-joining algorithm-while still constructing the same tree as the original neighbour-joining algorithm-making it possible to construct trees for 8000 species in <10 min on a single desktop PC. In comparison, the same task takes more than 30 min using the QuickTree neighbour-joining implementation.  相似文献   

6.
A rapid heuristic algorithm for finding minimum evolution trees   总被引:2,自引:0,他引:2  
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.  相似文献   

7.
Fishes of the genus Herichthys are the only representatives of the family Cichlidae to have colonized the Neartic region. In this study, we used DNA barcode of 64 individuals of the Herichthys bartoni species group to test the monophyly of the species and the efficiency of this tool to discriminate among species. The Bayesian phylogenetic tree and the neighbour joining (NJ) tree obtained from the Kimura two‐parameter model (K2P) give similar topologies. DNA barcoding resolution was very poor (25%). Additionally, the low levels of genetic divergence among taxa preclude the use of threshold values as has been suggested in earlier studies.  相似文献   

8.
Liu L  Yu L 《Systematic biology》2011,60(5):661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.  相似文献   

9.
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n(3)) time, where n is the number of taxa, while the current implementations of the second are in O(p n(3)) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n(2)) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n(2) + p n), i.e., O(n(2)) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n(2) x diam(T)) operations to build the starting tree and O(p n x diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. In the usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in log(n), so our algorithms are in practice faster that NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.  相似文献   

10.
To infer the monophyletic origin and phylogenetic relationships of the order Desmoscolecida, a unique and puzzling group of mainly free-living marine nematodes, we newly determined nearly complete 18S rDNA sequences for six marine desmoscolecid nematodes belonging to four genera (Desmoscolex, Greeffiella, Tricoma and Paratricoma). Based on the present data and those of 72 nematode species previously reported, the first molecular phylogenetic analysis focusing on Desmoscolecida was done by using neighbor joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods. All four resultant trees consistently and strongly supported that the family Desmoscolecidae forms a monophyletic group with very high node confidence values. The monophyletic clade of desmocolecid nematodes was placed as a sister group of the clade including some members of Monhysterida and Araeolaimida, Cyartonema elegans (Cyartonematidae) and Terschellingia longicaudata (Linhomoeidae) in all the analyses. However, the present phylogenetic trees do not show any direct attraction between the families Desmoscolecidae and Cyartonematidae. Within the monophyletic clade of the family Desmoscolecidae in all of the present phylogenetic trees, there were consistently observed two distinct sub-groups which correspond to the subfamilies Desmoscolecinae [Greeffiella sp. + Desmoscolex sp.] and Tricominae [Paratricoma sp. + Tricoma sp].  相似文献   

11.
In the reconstruction of a large phylogenetic tree, the most difficult part is usually the problem of how to explore the topology space to find the optimal topology. We have developed a "divide-and-conquer" heuristic algorithm in which an initial neighbor-joining (NJ) tree is divided into subtrees at internal branches having bootstrap values higher than a threshold. The topology search is then conducted by using the maximum-likelihood method to reevaluate all branches with a bootstrap value lower than the threshold while keeping the other branches intact. Extensive simulation showed that our simple method, the neighbor-joining maximum-likelihood (NJML) method, is highly efficient in improving NJ trees. Furthermore, the performance of the NJML method is nearly equal to or better than existing time-consuming heuristic maximum-likelihood methods. Our method is suitable for reconstructing relatively large molecular phylogenetic trees (number of taxa >/= 16).  相似文献   

12.
拓扑树间的通经拓扑距离   总被引:1,自引:1,他引:0  
给出了一种新的系统树间的拓扑距离,使用NJ,MP,UPGMA等3种方法对13种动物的线粒体中14个基因(含组合的)DNA序列数据进行系统树的构建,利用分割拓扑距离和本文给出的通经拓扑距离对这14种系统树这间及其与真树进行比较。结果显示,NJ法对获得已知树的有效率最高,MP法次之,UPGMA法最低。这14种DNA序列所构建的系统树与已知树的拓扑距离基本上是随其DNA序列长度增加而减小,但两者的相关系数并未达到显著水平,分割拓扑距离在总体上可反映树间的拓扑结构差异,但其测度精确度比通经拓扑距离要低。  相似文献   

13.
Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences that grow only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last few years, and most are polynomial time. In this paper, we compare pre-existing fast converging methods as well as some new polynomial time methods that we have developed. Our study, based upon simulating evolution under a wide range of model conditions, establishes that our new methods outperform both neighbor joining and the previous fast converging methods, returning very accurate large trees, when these other methods do poorly.  相似文献   

14.
The phylogenetic positioning of the non-pathogenic genusSpiromastix in the Onygenales was studied based on large subunit rDNA (LSU rDNA) partial sequences (ca. 570 bp.). FourSpiromastix species and 28 representative taxa of the Onygenales were newly sequenced. Phylogenetic trees were constructed by the neighbor-joining (NJ) method and evaluated by the maximum parsimony (MP) method with the data of 13 taxa retrieved from DNA databases.Spiromastix and dimorphic systemic pathogens,Ajellomyces andParacoccidioides, appear to be a monophyletic group with 74% bootstrap probability (BP) in the NJ tree constructed with the representative taxa of the Onygenales. The tree topology was concordant with the NJ tree based on SSU rDNA sequences of our previous work and corresponded to the classification system of the Onygenales by Currah (1985) and its minor modification by Udagawa (1997) with the exception of the classification of the Onygenaceae. The Onygeneceae sensu Udagawa may still be polyphyletic, since three independent lineages were recognized. The taxa forming helicoid peridial appendages were localized to two clades on the tree. The topology of the NJ tree constructed withSpiromastix and its close relatives suggested that the helicoid peridial appendages were apomorphic and acquired independently in the two clades of the Onygenales.  相似文献   

15.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) was evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd41 showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.   相似文献   

16.
We have developed a new method for reconstructing phylogenetic trees called random local neighbor-joining (RLNJ). Our method is different from the neighbor-joining method (NJ) of Saitou and Nei and affords a more thorough sampling of solution space by randomly searching for local pair of neighbors in each step. Results using the RLNJ method to analyze yeast data show an increasing possibility to get a smaller S value (sum of branch lengths) compared with the NJ method as cases with more taxa are analyzed and many individual runs using the RLNJ method usually generate more than one topology with small S values. Computer simulation shows the fact that the RLNJ method can improve the possibility of recovering correct topology significantly by affording more than one topology. In addition, when using the RLNJ method, computer simulation also shows that the proportion of correct topologies (P(C)) will increase as the number of different topologies decreases and as the proportion of "most frequent topology" increases. Thus, the number of different topologies and the proportion of "most frequent topology" can be used as auxiliary criteria to evaluate reliability of a phylogenetic tree.  相似文献   

17.
The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages. Received: 7 March 2000 / Accepted: 2 August 2000  相似文献   

18.
通过对类人猿亚目中部分种类的孕激素受体基因进行分析,重建类人猿亚目的 系统发育关系.扩增并测定了来源于14个属的类人猿亚目物种的孕激素受体编码区序列,并基于这一序列数据,分别采用邻接法、最大简约法和最大似然法重建了系统发育关系.除了阔鼻下目,3种方法构建的系统发生树的拓扑结构类似且各节点支持率高.重建的人猿超科和猴超科内部亲缘关系支持多数人所认可的分类系统.本研究为黑猩猩和人的姐妹群关系提供了证据,提示黑猩猩比大猩猩或其他猿猴更接近人类.阔鼻下目中蜘蛛猴科、卷尾猴科和僧面猴科三者之间的系统发育关系在本研究中未得到很好辨析.  相似文献   

19.
Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5' end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal); Mycena and Resinomycena; Termitomyces, Podabrella, and Lyophyllum; and Pleurotus with Hohenbuehelia. Several groups revealed by these data to be nonmonophyletic include the families Tricholomataceae, Cortinariaceae, and Hygrophoraceae and the genera Clitocybe, Omphalina, and Marasmius. This study provides a framework for future systematics studies in the Agaricales and suggestions for analyzing large molecular data sets.  相似文献   

20.
田鹏  刘占林 《生物信息学》2009,7(3):232-233
以系统发育树构建的原有距离方法为基础,吸取了NJ法和FM法中的部分理论,提出了以节点引入为手段的新的简易方法,通过该方法构建了分子系统发育树,结果表明这种方法更加快捷,而且所得结果与FM法完全一致。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号