Similar literature
 Found 20 similar articles; search took 31 ms
1.
Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over four taxa; hence quartet-based supertree methods combine many four-taxon unrooted trees into a single, coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have received considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
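
To make the notion of a quartet concrete, the following is a minimal sketch, not the method of the paper above. It assumes pairwise distances are available and picks, for every four taxa, the split whose two within-pair distances have the smallest sum (the four-point condition). The function names and the `EXAMPLE` distances are hypothetical.

```python
from itertools import combinations


def quartet_topologies(a, b, c, d):
    """Return the three resolved unrooted topologies on four taxa, as splits xy|zw."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def infer_quartet(dist, a, b, c, d):
    """Choose the split xy|zw with the smallest dist[x][y] + dist[z][w].

    For distances that fit a tree exactly (the four-point condition),
    the smallest of the three sums identifies the true split.
    """
    return min(
        quartet_topologies(a, b, c, d),
        key=lambda split: dist[split[0][0]][split[0][1]] + dist[split[1][0]][split[1][1]],
    )


def all_quartets(dist):
    """Infer one four-taxon tree for every 4-subset of the taxa in dist."""
    return {q: infer_quartet(dist, *q) for q in combinations(sorted(dist), 4)}


def symmetric(pairs):
    """Expand {(x, y): d} into a nested dict with dist[x][y] == dist[y][x]."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


# Hypothetical example: path lengths on the tree ((A,B),C,(D,E)) with unit branches.
EXAMPLE = symmetric({
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
    ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2,
})

quartets = all_quartets(EXAMPLE)
```

Five taxa yield five quartets; a supertree method would then have to merge these four-taxon trees into one tree. With real, noisy distances some of the inferred quartets will conflict.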

2.
We analyze the performance of quartet methods in phylogenetic reconstruction. These methods first compute four-taxon trees (4-trees) and then use a combinatorial algorithm to infer a phylogeny that respects the inferred 4-trees as much as possible. Quartet puzzling (QP) is one of the few methods able to take weighting of the 4-trees, which is inferred by maximum likelihood, into account. QP seems to be widely used. We present weight optimization (WO), a new algorithm which is also based on weighted 4-trees. WO is faster and offers better theoretical guarantees than QP. Moreover, computer simulations indicate that the topological accuracy of WO is less dependent on the shape of the correct tree. However, although the performance of WO is better overall than that of QP, it is still less efficient than traditional phylogenetic reconstruction approaches based on pairwise evolutionary distances or maximum likelihood. This is likely related to long-branch attraction, a phenomenon to which quartet methods are very sensitive, and to inappropriate use of the initial results (weights) obtained by maximum likelihood for every quartet.
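
The abstract above does not say how the 4-tree weights are derived from the likelihoods. One plausible choice, shown here purely as an assumption, is to normalise the three maximised likelihoods of a quartet so that they sum to one. The function name `quartet_weights` and the sample values are hypothetical.

```python
import math


def quartet_weights(log_likelihoods):
    """Turn three log-likelihoods into weights that sum to 1.

    The maximum is subtracted before exponentiating so that very negative
    log-likelihoods do not underflow to zero.
    """
    top = max(log_likelihoods)
    scaled = [math.exp(value - top) for value in log_likelihoods]
    total = sum(scaled)
    return [value / total for value in scaled]


# Hypothetical log-likelihoods for the topologies ab|cd, ac|bd, ad|bc.
weights = quartet_weights([-1200.0, -1203.0, -1210.0])
```

A weight close to 1 marks a well-supported 4-tree; three weights near 1/3 mark a quartet that carries almost no signal.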

3.
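To investigate the effect of evolutionary models on DNA barcode classification, this study used specimens of 44 species of Noctuidae from Wuling Mountain (雾灵山) as material and obtained their COI gene sequences. Phylogenetic trees were constructed with neighbor joining, maximum parsimony, maximum likelihood and Bayesian inference, and success rates were evaluated for 12 models under neighbor joining, 7 models under maximum likelihood and 2 models under Bayesian inference. The success rates of the 12 neighbor-joining models differed little and were relatively stable; the success rates of the different models under maximum likelihood and Bayesian inference differed markedly and were unstable; maximum parsimony is not model-based, and its success rate was relatively stable. Neighbor joining and maximum likelihood shared six models, and the success rates of these six models differed between the two methods. In addition, cases in the molecular data where a species was represented by a single sequence significantly reduced model success rates, indicating that DNA barcoding studies need multiple samples per species.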

4.
Complete chloroplast 23S rRNA and psbA genes from five peridinin-containing dinoflagellates (Heterocapsa pygmaea, Heterocapsa niei, Heterocapsa rotundata, Amphidinium carterae, and Protoceratium reticulatum) were amplified by PCR and sequenced; partial sequences were obtained from Thoracosphaera heimii and Scrippsiella trochoidea. Comparison with chloroplast 23S rRNA and psbA genes of other organisms shows that dinoflagellate chloroplast genes are the most divergent and rapidly evolving of all. Quartet puzzling, maximum likelihood, maximum parsimony, neighbor joining, and LogDet trees were constructed. Intersite rate variation and invariant sites were allowed for with quartet puzzling and neighbor joining. All psbA and 23S rRNA trees showed peridinin-containing dinoflagellate chloroplasts as monophyletic. In psbA trees they are related to those of chromists and red algae. In 23S rRNA trees, dinoflagellates are always the sisters of Sporozoa (apicomplexans); maximum likelihood analysis of Heterocapsa triquetra 16S rRNA also groups the dinoflagellate and sporozoan sequences, but the other methods were inconsistent. Thus, dinoflagellate chloroplasts may actually be related to sporozoan plastids, but the possibility of reproducible long-branch artifacts cannot be strongly ruled out. The results for all three genes fit the idea that dinoflagellate chloroplasts originated from red algae by a secondary endosymbiosis, possibly the same one as for chromists and Sporozoa. The marked disagreement between 16S rRNA trees using different phylogenetic algorithms indicates that this is a rather poor molecule for elucidating overall chloroplast phylogeny. We discuss possible reasons why both plastid and mitochondrial genomes of alveolates (Dinozoa, Sporozoa and Ciliophora) have ultra-rapid substitution rates and a proneness to unique genomic rearrangements. Received: 27 December 1999 / Accepted: 24 March 2000

5.

Background  

In recent years, quartet-based phylogeny reconstruction methods have received considerable attention in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability.

6.
We introduce a distance-based phylogeny reconstruction method called "weighted neighbor joining," or "Weighbor" for short. As in neighbor joining, two taxa are joined in each iteration; however, the Weighbor criterion for choosing a pair of taxa to join takes into account that errors in distance estimates are exponentially larger for longer distances. The criterion embodies a likelihood function on the distances, which are modeled as correlated Gaussian random variables with different means and variances, computed under a probabilistic model for sequence evolution. The Weighbor criterion consists of two terms, an additivity term and a positivity term, that quantify the implications of joining the pair. The first term evaluates deviations from additivity of the implied external branches, while the second term evaluates confidence that the implied internal branch has a positive branch length. Compared with maximum-likelihood phylogeny reconstruction, Weighbor is much faster, while building trees that are qualitatively and quantitatively similar. Weighbor appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony.
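
For comparison, the pair-selection step of ordinary neighbor joining, which Weighbor replaces with its likelihood-based criterion, can be sketched as follows. This is the standard Q-criterion and not the Weighbor criterion itself; the name `nj_pair` and the `EXAMPLE` matrix are hypothetical.

```python
from itertools import combinations


def nj_pair(dist):
    """Return the pair (i, j) minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.

    r_i is the sum of distances from taxon i to all other taxa.
    """
    taxa = sorted(dist)
    n = len(taxa)
    row_sum = {t: sum(dist[t][u] for u in taxa if u != t) for t in taxa}
    return min(
        combinations(taxa, 2),
        key=lambda p: (n - 2) * dist[p[0]][p[1]] - row_sum[p[0]] - row_sum[p[1]],
    )


# Hypothetical additive distances on the tree ((A,B),(C,D)) with leaf branch
# lengths 1, 2, 3, 4 and an internal branch of length 1.
EXAMPLE = {
    "A": {"B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 7},
    "D": {"A": 6, "B": 7, "C": 7},
}

pair = nj_pair(EXAMPLE)
```

In this four-taxon example the two true sibling pairs, (A, B) and (C, D), tie on Q, so either is a correct join. Every pair is scored without regard to how long, and therefore how noisy, its distance is, which is the weakness the Weighbor criterion is designed to address.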

7.
Intraspecific variation is abundant in all types of systematic characters but is rarely addressed in simulation studies of phylogenetic method performance. We compared the accuracy of 15 phylogenetic methods using simulations to (1) determine the most accurate method(s) for analyzing polymorphic data (under simplified conditions) and (2) test if generalizations about the performance of phylogenetic methods based on previous simulations of fixed (nonpolymorphic) characters are robust to a very different evolutionary model that explicitly includes intraspecific variation. Simulated data sets consisted of allele frequencies that evolved by genetic drift. The phylogenetic methods included eight parsimony coding methods, continuous maximum likelihood, and three distance methods (UPGMA, neighbor joining, and Fitch-Margoliash) applied to two genetic distance measures (Nei's and the modified Cavalli-Sforza and Edwards chord distance). Two sets of simulations were performed. The first examined the effects of different branch lengths, sample sizes (individuals sampled per species), numbers of characters, and numbers of alleles per locus in the eight-taxon case. The second examined more extensively the effects of branch length in the four-taxon, two-allele case. Overall, the most accurate methods were likelihood, the additive distance methods (neighbor joining and Fitch-Margoliash), and the frequency parsimony method. Despite the use of a very different evolutionary model in the present article, many of the results are similar to those from simulations of fixed characters. Similarities include the presence of the "Felsenstein zone," where methods often fail, which suggests that long-branch attraction may occur among closely related species through genetic drift. 
Differences between the results of fixed and polymorphic data simulations include the following: (1) UPGMA is as accurate as or more accurate than nonfrequency parsimony methods across nearly all combinations of branch lengths, and (2) likelihood and the additive distance methods are not positively misled under any combination of branch lengths tested (even when the assumptions of the methods are violated and few characters are sampled). We found that sample size is an important determinant of accuracy and affects the relative success of methods (i.e., distance and likelihood methods outperform parsimony at small sample sizes). Attempts to generalize about the behavior of phylogenetic methods should consider the extreme examples offered by fixed-mutation models of DNA sequence data and genetic-drift models of allele frequencies.

8.
Accurate phylogenetic reconstruction methods are currently limited to at most a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method over the product of accurate methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms and outraces the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated in that even with a completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triples where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.

9.
The Testaceafilosia includes amoebae with filopodia and with a proteinaceous, agglutinated or siliceous test. To explore the deeper phylogeny of this group, we sequenced the small subunit ribosomal RNA coding region of 13 species, including the first sequence of an amoeba with an agglutinated test, Pseudodifflugia sp. Phylogenetic analyses using maximum parsimony and maximum likelihood methods as well as the neighbor joining method yielded the following results: the order Euglyphida forms a monophyletic lineage with the sarcomonads as sister group. The next related taxa are the Chlorarachnea and the unidentified filose strain N-Por. In agreement with previous studies, the Phytomyxea branch off at the base of this lineage. The Monadofilosa (Testaceafilosia and Sarcomonadea) appear monophyletic. The Testaceafilosia are polyphyletic, because Pseudodifflugia sp. is positioned as the sister taxon to the sarcomonads. Within the order Euglyphida, Paulinella branches off first, together with Cyphoderia, followed by Tracheleuglypha. In maximum likelihood and neighbor joining analyses, the genus Euglypha is monophyletic. The branching pattern within the order Euglyphida reflects the evolution of shell morphology from simple to complex built tests.

10.
We give an explicit construction to solve a conjecture of Mike Steel and David Penny that any phylogeny involving N taxa can be recovered unambiguously using on the order of log N binary characters and the method of maximum parsimony. Biologically, this means that homoplasy need not be a deterrent to parsimony methods. Some patterns of homoplasy are phylogenetically informative and can exponentially reduce the amount of data needed to resolve a phylogeny.

11.
Using simulated data, we compared five methods of phylogenetic tree estimation: parsimony, compatibility, maximum likelihood, Fitch-Margoliash, and neighbor joining. For each combination of substitution rates and sequence length, 100 data sets were generated for each of 50 trees, for a total of 5,000 replications per condition. Accuracy was measured by two measures of the distance between the true tree and the estimate of the tree, one measure sensitive to accuracy of branch lengths and the other not. The distance-matrix methods (Fitch-Margoliash and neighbor joining) performed best when they were constrained from estimating negative branch lengths; all comparisons with other methods used this constraint. Parsimony and compatibility had similar results, with compatibility generally inferior; Fitch-Margoliash and neighbor joining had similar results, with neighbor joining generally slightly inferior. Maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better. Bias of the estimates was inferred by measuring whether the independent estimates of a tree for different data sets were closer to the true tree than to each other. Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches. When rates of evolution varied among different sites, all methods showed signs of inaccuracy and bias.
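
The abstract above does not name its two tree-distance measures. As stand-ins, the sketch below shows one measure that ignores branch lengths (the number of splits found in only one of the two trees) and one that is sensitive to them (the sum of squared branch-length differences over splits). The split-dictionary representation, the function names and the two small trees are all hypothetical.

```python
def canonical(side, taxa):
    """Name a split by the side that excludes the alphabetically first taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side


def symmetric_difference(tree1, tree2):
    """Count splits present in exactly one tree (ignores branch lengths)."""
    return len(set(tree1) ^ set(tree2))


def branch_score(tree1, tree2):
    """Sum squared branch-length differences; a missing split counts as length 0."""
    return sum(
        (tree1.get(s, 0.0) - tree2.get(s, 0.0)) ** 2
        for s in set(tree1) | set(tree2)
    )


TAXA = "ABCDE"
# Hypothetical trees, each a dict {split: branch length}; internal branches only, for brevity.
true_tree = {canonical("AB", TAXA): 1.0, canonical("DE", TAXA): 1.0}
estimate = {canonical("AC", TAXA): 0.5, canonical("DE", TAXA): 2.0}

topology_error = symmetric_difference(true_tree, estimate)
length_error = branch_score(true_tree, estimate)
```

Here the estimate shares the split DE|ABC with the true tree but gets its length wrong, so only the branch-length-sensitive measure penalises that branch.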

12.
MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes.

13.
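Phylogenetic relationships of Ciconiiformes based on the 12S rRNA gene   Total citations: 2 (self-citations: 0; citations by others: 2)
Molecular systematic methods were used to investigate the phylogenetic relationships among five families of Ciconiiformes. Complete mtDNA 12S rRNA gene sequences were determined for 7 species of Ciconiiformes and combined with homologous sequences from GenBank for 7 further Ciconiiformes species and the red junglefowl (原鸡). After alignment with Clustal W, the data set comprised 1,009 sites, including 405 variable sites, of which 381 were polymorphic and 260 were parsimony-informative. Based on these sequence data, with the red junglefowl as outgroup, phylogenetic trees of the 14 species in 5 families of Ciconiiformes were reconstructed using neighbor joining, maximum parsimony, maximum likelihood and Bayesian inference. In the reconstructed trees the 14 ingroup species cluster into four clades: Threskiornithidae forms the first clade, at the base of the tree; Scopidae and Balaenicipitidae cluster together; Ardeidae and Ciconiidae each form a clade of their own. After comparing the results of the different tree-building methods and analysing the consensus tree, we conclude that, within Ciconiiformes, Threskiornithidae may be the earliest-diverging lineage; Scopidae and Balaenicipitidae are each other's closest relatives, and the divergence of their ancestor from Ardeidae and Ciconiidae may have been very close in time. The relationships among the five families can be expressed as: (Threskiornithidae, (Ardeidae, Ciconiidae, (Scopidae, Balaenicipitidae))).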

14.
We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.

15.
In order to elucidate the phylogenetic relationships among groups of the order Entomobryomorpha (Collembola), sequences of the ITS 1 to ITS 2 fragment of the rRNA gene were analyzed in 11 species of three families. In order to avoid the potential risks and inconsistencies of a single method or data set, the phylogenetic reconstructions were based on three different approaches: maximum parsimony, maximum likelihood and neighbor joining. The inferred phylogenies supported monophyly of the order Entomobryomorpha. The relationships between families differed among methods, but the order of branching within each family was the same. Entomobryidae and Isotomidae were paraphyletic, whereas Tomoceridae was monophyletic. Tomoceridae was subdivided into two branches; the molecular analysis provided results distinctive enough to separate the two genera with a high bootstrap value. On the other hand, two different populations of putative Homidia koreana appeared to be different species, although their chaetotaxy is identical. A wide coverage of characters, including not only morphological characters but also genetic data such as allozymes and DNA sequences, will give a more accurate picture of the classification and phylogeny of the studied group.

16.

Background  

Maximum parsimony is one of the most commonly used and extensively studied phylogeny reconstruction methods. While current evaluation methodologies such as computer simulations provide insight into how well maximum parsimony reconstructs phylogenies, they tell us little about how well maximum parsimony performs on taxa drawn from populations of organisms that evolved subject to natural selection in addition to the random factors of drift and mutation. It is clear that natural selection has a significant impact on Among Site Rate Variation (ASRV) and the rate of accepted substitutions; that is, accepted mutations do not occur with uniform probability along the genome and some substitutions are more likely to occur than other substitutions. However, little is known about how ASRV and non-uniform character substitutions impact the performance of reconstruction methods such as maximum parsimony. To gain insight into these issues, we study how well maximum parsimony performs with data generated by Avida, a digital life platform where populations of digital organisms evolve subject to natural selective pressures.

17.
Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the tree of life; the combination of gene order data with sequence data also has the potential to provide more robust phylogenetic reconstructions, since each can elucidate evolution at different time scales. Distance corrections greatly improve the accuracy of phylogeny reconstructions from DNA sequences, enabling distance-based methods to approach the accuracy of the more elaborate methods based on parsimony or likelihood at a fraction of the computational cost. This paper focuses on developing distance correction methods for phylogeny reconstruction from whole genomes. The main question we investigate is how to estimate evolutionary histories from whole genomes with equal gene content, and we present a technique, the empirically derived estimator (EDE), that we have developed for this purpose. We study the use of EDE on whole genomes with identical gene content, and we explore the accuracy of phylogenies inferred using EDE with the neighbor joining and minimum evolution methods under a wide range of model conditions. Our study shows that tree reconstruction under these two methods is much more accurate when based on EDE distances than when based on other distances previously suggested for whole genomes. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. Martin Kreitman]

18.
Z. Yang, S. Kumar, M. Nei. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.

19.
Phylogenetic analyses of 110 serpin protein sequences revealed clades consistent with independent phylogenetic analyses based on exon-intron structure and diagnostic amino acid sites. Trees were estimated by maximum likelihood, neighbor joining, and partial split decomposition using both the BLOSUM 62 and Jones-Taylor-Thornton substitution matrices. Neighbor-joining trees gave results closest to those based on independent analyses using genomic and chromosomal data. The maximum-likelihood trees derived using the quartet puzzling algorithm were very conservative, producing many small clades that separated groups of proteins that other results suggest were related. Independent analyses based on exon-intron structure suggested that a neighbor-joining tree was more accurate than maximum-likelihood trees obtained using the quartet puzzling algorithm.

20.
We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical.
