Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
While the maximum-likelihood (ML) method of tree reconstruction is statistically rigorous, it is extremely time-consuming for reconstructing large trees. We previously developed a hybrid method (NJML) that combines the neighbor-joining (NJ) and ML methods and thus is much faster than the ML method and improves the performance of NJ. However, we considered only nucleotide sequence data, so NJML is not suitable for handling amino acid sequence data, which requires even more computer time. NJML+ is an implementation of a further improved method for practical data analyses (including protein sequence data). Our extensive simulations using nucleotide and amino acid sequences showed that NJML+ gave good results in tree reconstruction. Indeed, NJML+ showed substantial improvements over existing methods in terms of both computational times and efficiencies, especially for amino acid sequence data. We also developed a "user-friendly" interface for the NJML+ program, including a simple tree viewer.

2.
Katoh K, Miyata T. FEBS Letters, 1999, 463(1-2): 129-132
Applying the tree bisection and reconnection (TBR) algorithm, we have developed a heuristic method (maximum likelihood (ML)-TBR) for inferring the ML tree based on tree topology search. For initial trees from which iterative processes start in ML-TBR, two cases were considered: one is 100 neighbor-joining (NJ) trees based on the bootstrap resampling and the other is 100 randomly generated trees. The same ML tree was obtained in both cases. All different iterative processes started from 100 independent initial trees ultimately converged on one optimum tree with the largest log-likelihood value, suggesting that a limited number of initial trees will be quite enough in ML-TBR. This also suggests that the optimum tree corresponds to the global optimum in tree topology space and thus probably coincides with the ML tree inferred by intact ML analysis. This method has been applied to the inference of the phylogenetic tree of the SOX family members. The mammalian testis-determining gene SRY is believed to have evolved from SOX-3, a member of the SOX family, based on several lines of evidence, including their sequence similarity, the location of SOX-3 on the X chromosome and some aspects of their expression. This model should be supported directly from the phylogenetic tree of the SOX family, but no evidence has been provided to date. A recently published NJ tree shows an implausibly remote origin of SRY, suggesting that a more sophisticated method is required for understanding this problem. The ML tree inferred by the present method showed that the SRYs of marsupial and placental mammals form a monophyletic cluster which had diverged from the mammalian SOX-3 in the early evolution of mammals.

3.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
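
The growth in the number of candidate trees mentioned in the abstract above can be checked with the standard counting formulas for fully bifurcating labeled trees: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees for n taxa. The sketch below is illustrative only; the function names are made up for this example. Under these formulas, 14 taxa give 316,234,143,225 unrooted trees and 7,905,853,580,625 rooted trees, so the "more than seven trillion" figure appears to match the rooted count for 14 taxa (equivalently, the unrooted count for 15 taxa).

```python
def odd_double_factorial(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k (and 1 when k < 3)."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees on n labeled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating trees on n labeled taxa: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 10, 14, 15):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Either way, the search space is far too large to enumerate, which is the point the abstract makes in motivating heuristic searches.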

4.
The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages. Received: 7 March 2000 / Accepted: 2 August 2000

5.
We have developed a new method for reconstructing phylogenetic trees called random local neighbor-joining (RLNJ). Our method differs from the neighbor-joining (NJ) method of Saitou and Nei and affords a more thorough sampling of solution space by randomly searching for a local pair of neighbors in each step. Results using the RLNJ method to analyze yeast data show an increasing possibility of obtaining a smaller S value (sum of branch lengths) compared with the NJ method as cases with more taxa are analyzed, and many individual runs using the RLNJ method usually generate more than one topology with small S values. Computer simulation shows that the RLNJ method can significantly improve the possibility of recovering the correct topology by affording more than one topology. In addition, when using the RLNJ method, computer simulation also shows that the proportion of correct topologies (P(C)) will increase as the number of different topologies decreases and as the proportion of the "most frequent topology" increases. Thus, the number of different topologies and the proportion of the "most frequent topology" can be used as auxiliary criteria to evaluate the reliability of a phylogenetic tree.

6.
A rapid heuristic algorithm for finding minimum evolution trees   Cited by: 2 (self-citations: 0; citations by others: 2)
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.
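
The "greedy search" attributed to NJ in the abstract above can be illustrated by a single pair-selection step. The sketch below uses the commonly stated NJ criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix. It is not the algorithm proposed in the abstract; the function name `nj_pick_pair` and the example matrix are assumptions made for illustration.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the pair (i, j) minimizing the NJ Q criterion, and its Q value.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    Ties are resolved in favor of the first pair encountered.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least 3 taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:  # strict comparison: keep the first minimum found
            best_pair, best_q = (i, j), q
    return best_pair, best_q


if __name__ == "__main__":
    # Hypothetical distances among five taxa a, b, c, d, e.
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(example))
```

A full NJ run repeats this step, replacing the chosen pair with a new internal node each time. Because only the single best pair is kept at each step, the search is greedy, whereas a beam search would retain several candidate joins.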

7.
The chloroplast-encoded large subunit of the ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) gene was sequenced from 20 species of the colonial Volvocales (the Volvocaceae, Goniaceae, and Tetrabaenaceae) in order to elucidate phylogenetic relationships within the colonial Volvocales. Eleven hundred twenty-eight base pairs in the coding regions of the rbcL gene were analyzed by the neighbor-joining (NJ) method using three kinds of distance estimations, as well as by the maximum parsimony (MP) method. A large group comprising all the anisogamous and oogamous volvocacean species was resolved in the MP tree as well as in the NJ trees based on overall and synonymous substitutions. In all the trees constructed, Basichlamys and Tetrabaena (Tetrabaenaceae) constituted a very robust phylogenetic group. Although not supported by high bootstrap values, the MP tree and the NJ tree based on nonsynonymous substitutions indicated that the Tetrabaenaceae is the sister group to the large group comprising the Volvocaceae and the Goniaceae. In addition, the present analysis strongly suggested that Pandorina and Astrephomene are monophyletic genera whereas Eudorina is nonmonophyletic. These results are essentially consistent with the results of the recent cladistic analyses of morphological data. However, the monophyly of the Volvocaceae previously supported by four morphological synapomorphies is found only in the NJ tree based on nonsynonymous substitutions (with very low bootstrap values). The genus Volvox was clearly resolved as a polyphyletic group with V. rousseletii Pocock separated from other species of Volvox in the rbcL gene comparisons, although this genus represents a monophyletic group in the previous morphological analyses. Furthermore, none of the rbcL gene trees supported the monophyly of the Goniaceae; Astrephomene was placed in various phylogenetic positions.

8.
The phylogenetic positioning of the non-pathogenic genus Spiromastix in the Onygenales was studied based on large subunit rDNA (LSU rDNA) partial sequences (ca. 570 bp). Four Spiromastix species and 28 representative taxa of the Onygenales were newly sequenced. Phylogenetic trees were constructed by the neighbor-joining (NJ) method and evaluated by the maximum parsimony (MP) method with the data of 13 taxa retrieved from DNA databases. Spiromastix and dimorphic systemic pathogens, Ajellomyces and Paracoccidioides, appear to be a monophyletic group with 74% bootstrap probability (BP) in the NJ tree constructed with the representative taxa of the Onygenales. The tree topology was concordant with the NJ tree based on SSU rDNA sequences of our previous work and corresponded to the classification system of the Onygenales by Currah (1985) and its minor modification by Udagawa (1997) with the exception of the classification of the Onygenaceae. The Onygenaceae sensu Udagawa may still be polyphyletic, since three independent lineages were recognized. The taxa forming helicoid peridial appendages were localized to two clades on the tree. The topology of the NJ tree constructed with Spiromastix and its close relatives suggested that the helicoid peridial appendages were apomorphic and acquired independently in the two clades of the Onygenales.

9.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, the simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as the extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model of phylogenetic inference generally shows a better performance than the HKY model even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often shows a better performance than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.
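
For readers unfamiliar with the distance measures compared in the abstract above, the sketch below computes the p distance (the proportion of differing sites) and the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). The function names and the decision to skip sites with gaps or ambiguity codes are assumptions made for this example, not details taken from the study; the HKY distance is omitted because it needs additional parameters.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among positions where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":  # assumption: skip gaps and ambiguity codes
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p distance (requires p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The correction is always at least as large as p and grows rapidly as p approaches 0.75, so its sampling variance rises with divergence. This is one commonly cited reason why an uncorrected distance can perform well on short sequences.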

10.
Tie trees generated by distance methods of phylogenetic reconstruction   Cited by: 2 (self-citations: 0; citations by others: 2)
In examining genetic data in recent publications, Backeljau et al. showed cases in which two or more different trees (tie trees) were constructed from a single data set for the neighbor-joining (NJ) method and the unweighted pair group method with arithmetic mean (UPGMA). However, it is still unclear how often and under what conditions tie trees are generated. Therefore, I examined these problems by computer simulation. Examination of cases in which tie trees occur shows that tie trees can appear when no substitutions occur along some interior branch(es) on a tree. However, even when some substitutions occur along interior branches, tie trees can appear by chance if parallel or backward substitutions occur at some sites. The simulation results showed that tie trees occur relatively frequently for sequences with low divergence levels or with small numbers of sites. For such data, UPGMA sometimes produced tie trees quite frequently, whereas tie trees for the NJ method were generally rare. In the simulation, bootstrap values for clusters (tie clusters) that differed among tie trees were mostly low (< 60%). With a small probability, relatively high bootstrap values (at most 70%-80%) appeared for tie clusters. The bias of the bootstrap values caused by the input order of sequences can be avoided if one of the different paths in the cycles of making an NJ or UPGMA tree is chosen at random in each bootstrap replication.

11.
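The subfamily Coelotinae is distributed mainly in East Asia, and the species recorded from China account for more than half of the world total; research on Chinese coelotine spiders has therefore become one of the focal points of worldwide research on the family Amaurobiidae. Coelotinae is an ecribellate group, established by Cambridge in 1893 with Coelotes as its type genus and placed in the ecribellate family Agelenidae. Subsequently, although it has undergone several revisions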

12.
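Partial sequences of the cytosolic Hsp70 gene were cloned from two peritrich ciliates, Vorticella campanula and Carchesium polypinum; both are 438 bp long and encode 146 amino acids. With bacteria as the outgroup, phylogenetic trees of Hsp70 amino acid sequences from 26 species, including five other ciliates, were constructed by the maximum-likelihood and neighbor-joining methods. The tree topologies show that V. campanula and C. polypinum cluster together and form a sister branch to two other members of the class Oligohymenophorea, Tetrahymena thermophila and Paramecium tetraurelia, suggesting that the peritrich ciliates are monophyletic and belong phylogenetically to the Oligohymenophorea.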

13.
Evolutionary relationships of human populations on a global scale   Cited by: 28 (self-citations: 2; citations by others: 26)
Using gene frequency data for 29 polymorphic loci (121 alleles), we conducted a phylogenetic analysis of 26 representative populations from around the world by using the neighbor-joining (NJ) method. We also conducted a separate analysis of 15 populations by using data for 33 polymorphic loci. These analyses have shown that the first major split of the phylogenetic tree separates Africans from non-Africans and that this split occurs with a 100% bootstrap probability. The second split separates Caucasian populations from all other non-African populations, and this split is also supported by bootstrap tests. The third major split occurs between Native American populations and the Greater Asians that include East Asians (mongoloids), Pacific Islanders, and Australopapuans (native Australians and Papua New Guineans), but Australopapuans are genetically quite different from the rest of the Greater Asians. The second and third levels of population splitting are quite different from those of the phylogenetic tree obtained by Cavalli-Sforza et al. (1988), where Caucasians, Northeast Asians, and Amerindians form the Northeurasian supercluster and the rest of non-Africans form the Southeast Asian supercluster. One of the major factors that caused the difference between the two trees is that Cavalli-Sforza et al. used unweighted pair-group method with arithmetic mean (UPGMA) in phylogenetic inference, whereas we used the NJ method in which evolutionary rate is allowed to vary among different populations. Bootstrap tests have shown that the UPGMA tree receives poor statistical support whereas the NJ tree is well supported. Implications that the phylogenetic tree obtained has on the current controversy over the out-of-Africa and the multiregional theories of human origins are discussed.

14.
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n ≤ 8. This requires the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of Monte Carlo methods and polyhedral algorithms. Our results include a demonstration that highly unrelated trees can be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the ℓ2 radius for neighbor-joining for n = 5 and we conjecture that the ability of the neighbor-joining algorithm to recover the BME tree depends on the diameter of the BME tree.

15.
The phylogenetic relationship among the kingdoms Animalia, Plantae, and Fungi remains uncertain, because of lack of solid fossil evidence. In spite of the extensive molecular phylogenetic analyses since the early report, this problem is a longstanding controversy; the proposed phylogenetic relationships differ for different authors, depending on the molecules and methods that they use. To settle this problem, we have accumulated 23 different protein species from the three kingdoms and have inferred the phylogenetic trees by three different methods (the maximum-likelihood method, the neighbor-joining method, and the maximum-parsimony method) for each data set. Although inferred tree topologies differ for different protein species and methods used, both the maximum-likelihood analysis based on the difference (delta l) between the total log-likelihood of a tree and that of the maximum-likelihood tree and bootstrap probability (P) of 23 proteins consisting of 10,051 amino acid sites in total have shown that a tree ((A,F),P), in which Plantae (P) is an outgroup to an Animalia (A)-Fungi (F) clade, is the maximum-likelihood tree; the delta l (= 0.0) and P (94%) of ((A,F),P) are significantly larger than those of ((A,P),F) (delta l = -54.4 +/- 36.3; and P = 6%) and ((F,P),A) (delta l = -141.1 +/- 30.9; and P = 0%). (ABSTRACT TRUNCATED AT 250 WORDS)

16.
Contemporary phylogenomic studies frequently incorporate two-step coalescent analyses wherein the first step is to infer individual-gene trees, generally using maximum-likelihood implemented in the popular programs PhyML or RAxML. Four concerns with this approach are that these programs only present a single fully resolved gene tree to the user despite potential for ambiguous support, insufficient phylogenetic signal to fully resolve each gene tree, inexact computer arithmetic affecting the reported likelihood of gene trees, and an exclusive focus on the most likely tree while ignoring trees that are only slightly suboptimal or within the error tolerance. Taken together, these four concerns are sufficient for RAxML and PhyML users to be suspicious of the resulting (perhaps over-resolved) gene-tree topologies and (perhaps unjustifiably high) bootstrap support for individual clades. In this study, we sought to determine how frequently these concerns apply in practice to contemporary phylogenomic studies that use RAxML for gene-tree inference. We did so by re-analyzing 100 genes from each of ten studies that, taken together, are representative of many empirical phylogenomic studies. Our seven findings are as follows. First, the few search replicates that are frequently applied in phylogenomic studies are generally insufficient to find the optimal gene-tree topology. Second, there is often more topological variation among slightly suboptimal gene trees relative to the best-reported tree than can be safely ignored. Third, the Shimodaira–Hasegawa-like approximate likelihood ratio test is highly effective at identifying dubiously supported clades and outperforms the alternative approaches of relying on bootstrap support or collapsing minimum-length branches. Fourth, the bootstrap can, but rarely does, indicate high support for clades that are not supported amongst slightly suboptimal trees. Fifth, increasing the accuracy by which RAxML optimizes model-parameter values generally has a nominal effect on selection of optimal trees. Sixth, tree searches using the GTRCAT model were generally less effective at finding optimal known trees than those using the GTRGAMMA model. Seventh, choice of gene-tree sampling strategy can affect inferred coalescent branch lengths, species-tree topology and branch support.

17.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten different phylogenetic tree topologies of 20 operational taxonomic units (OTUs), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.
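
As a reminder of what UPGMA clustering does, the sketch below repeatedly merges the two closest clusters and updates distances by a size-weighted average. It is a minimal illustration under stated assumptions, not the procedure used in the study above: the function name, the nested-tuple tree representation, and the tie-breaking rule (first minimum in insertion order) are all choices made for this example.

```python
def upgma(dist, labels):
    """Build a UPGMA tree from a symmetric distance matrix.

    Leaves are label strings; internal nodes are tuples
    (left_subtree, right_subtree, node_height).
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least 2 taxa")
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair; ties go to the first one stored
        i, j = sorted(pair)
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        height = d.pop(pair) / 2  # node height under a constant-rate assumption
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            # Size-weighted mean equals the average over all leaf pairs.
            d[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j, height), size_i + size_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


if __name__ == "__main__":
    matrix = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
    print(upgma(matrix, ["A", "B", "C"]))
```

Because each node height is simply half the merge distance, the method implicitly assumes equal rates in all lineages, which is consistent with the abstract's remark that UPGMA does well under its constant-rate model.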

18.
Summary. The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences were studied by considering model trees of three taxa with an outgroup. The cases of constant and varying rates of nucleotide substitution were compared. From sequences obtained by simulation, phylogenetic trees were constructed by using the maximum parsimony (MP) and neighbor joining (NJ) methods. The effectiveness and consistency of the MP method were studied in terms of proportions of informative sites. The results of simulation showed that bootstrap estimation of the confidence level for an inferred phylogeny can be used even under unequal rates of evolution if the rate differences are not large so that the MP method is not misleading. The condition under which the MP method becomes misleading (inconsistent) is more stringent for slowly evolving sequences than for rapidly evolving ones, and it also depends on the length of the internal branch. If the rate differences are large so that the MP method becomes consistently misleading, then bootstrap estimation will reinforce an erroneous conclusion on topology. Similar conclusions apply to the NJ method with uncorrected distances. The NJ method with corrected distances performs poorly when the sequence length is short but can avoid the inconsistency problem if the sequence length is long and if the distances can be estimated accurately. Offprint requests to: W.-H. Li
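
The bootstrap estimation discussed in the abstract above rests on resampling alignment columns with replacement and re-inferring a tree from each pseudo-replicate. The sketch below shows only the resampling and the tallying of support. The tree-building step is left to a caller-supplied function `has_clade`, a hypothetical callback that infers a tree from a replicate and reports whether the clade of interest is present. All names here are assumptions made for illustration.

```python
import random
from typing import Callable, List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement, keeping the number of sites."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(
    alignment: List[str],
    has_clade: Callable[[List[str]], bool],  # hypothetical: infers a tree, checks a clade
    n_replicates: int,
    seed: int = 0,
) -> float:
    """Fraction of bootstrap replicates in which has_clade returns True."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_replicates)
        if has_clade(bootstrap_alignment(alignment, rng))
    )
    return hits / n_replicates


if __name__ == "__main__":
    aln = ["ACGTAC", "ACGAAC", "TCGAAT"]
    print(bootstrap_alignment(aln, random.Random(1)))
```

Since every replicate is analyzed with the same tree-building method, a method that is consistently misled by the data will be misled in most replicates as well, which is how the bootstrap can reinforce an erroneous topology, as the abstract notes.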

19.
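Phylogenetic analysis of Paeonia sect. Moutan based on morphological evidence   Cited by: 1 (self-citations: 4; citations by others: 1)
A systematic analysis based on morphological evidence was carried out on 40 populations of Paeonia L. sect. Moutan DC. (all wild species), in an attempt to establish the phylogenetic relationships among the species of the section. Using the computer program PAUP (4.0), distance trees (UPGMA, NJ) and maximum-parsimony (MP) trees were constructed for all the taxa studied on the basis of 25 morphological characters. The topologies of the resulting trees were largely congruent; differences occurred only between the distance trees and the parsimony trees, in the five species that are closely related both morphologically and cytologically (P. suffruticosa, P. jishanensis, P. qiui, P. rockii and P. o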

20.
Partial DNA and amino acid sequences translated from the mitochondrial cytochrome oxidase subunit I gene (408 bp) of 17 mite species have been used for analyzing the phylogenetic relationships within the terrestrial Parasitengona (Trombidia). Due to mutational saturation of the third codon position, only first and second codon positions and amino acid sequences were analyzed, applying neighbor-joining, maximum-parsimony, and maximum-likelihood tree-building methods. The reconstructed trees revealed similar topologies of taxa; however, the phylogenetic relationships could be convincingly resolved only within several trombidioid taxa. The proposed basic relationships within the Parasitengona, in particular those of Calyptostomatoidea, Smarididae, and Erythraeidae, were poorly supported in bootstrap tests. A comparison of the presented gene tree with a phylogenetic tree based upon traditional characters revealed only few contradictions in nodes only weakly supported by morphological data. The most astonishing result is the proposed early derivative position of Microtrombidiidae within the terrestrial Parasitengona.
