Similar literature
 Found 20 similar documents; search took 31 ms
1.
Phylogenetic inference under the pure drift model   Total citations: 1, self-citations: 1, citations by others: 0
When pairwise genetic distances are used for phylogenetic reconstruction, it is usually assumed that the genetic distance between two taxa contains information about the time after the two taxa diverged. As a result, upon an appropriate transformation if necessary, the distance usually can be fitted to a linear model such that it is expressed as the sum of lengths of all branches that connect the two taxa in a given phylogeny. This kind of distance is referred to as "additive distance." For a phylogenetic tree exclusively driven by random genetic drift, genetic distances related to coancestry coefficients (θ_XY) between any two taxa are more suitable. However, these distances are fundamentally different from the additive distance in that coancestry does not contain any information about the time after two taxa split from a common ancestral population; instead, it reflects the time before the two taxa diverged. In other words, the magnitude of θ_XY provides information about how long the two taxa share the same evolutionary pathways. The fundamental difference between the two kinds of distances has led to a different algorithm of evaluating phylogenetic trees when θ_XY and related distance measures are used. Here we present the new algorithm using the ordinary least-squares approach but fitting to a different linear model. This treatment allows genetic variation within a taxon to be included in the model. Monte Carlo simulation for a rooted phylogeny of four taxa has verified the efficacy and consistency of the new method. Application of the method to human populations was demonstrated.
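
The contrast drawn in this abstract, between a distance that sums the branches connecting two taxa and a coancestry-like quantity that reflects the path they share before diverging, can be sketched on a small rooted tree. This is only an illustration, not the paper's least-squares algorithm; the tree, its branch lengths, and the function names are hypothetical.

```python
# Hypothetical rooted tree: child -> (parent, length of the branch to the parent).
TREE = {
    "A": ("AB", 1.0),
    "B": ("AB", 2.0),
    "AB": ("root", 3.0),
    "C": ("root", 4.0),
}


def branches_to_root(node):
    """Map each node on the path from `node` up to the root to its branch length."""
    branches = {}
    while node in TREE:
        parent, length = TREE[node]
        branches[node] = length
        node = parent
    return branches


def additive_distance(x, y):
    """Sum of the branch lengths on the path connecting x and y."""
    bx, by = branches_to_root(x), branches_to_root(y)
    # A branch lies on the x-y path if it is on exactly one of the two root paths.
    only_x = sum(length for n, length in bx.items() if n not in by)
    only_y = sum(length for n, length in by.items() if n not in bx)
    return only_x + only_y


def shared_path_length(x, y):
    """Sum of the branch lengths shared by x and y between the root and their split."""
    bx, by = branches_to_root(x), branches_to_root(y)
    return sum(length for n, length in bx.items() if n in by)
```

On this toy tree, A and B are close in additive distance and share a long path from the root, whereas A and C are far apart and share none.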

2.
The Testaceafilosia includes amoebae with filopodia and with a proteinaceous, agglutinated or siliceous test. To explore the deeper phylogeny of this group, we sequenced the small subunit ribosomal RNA coding region of 13 species, including the first sequence of an amoeba with an agglutinated test, Pseudodifflugia sp. Phylogenetic analyses using maximum parsimony and maximum likelihood methods as well as the neighbor joining method yielded the following results: the order Euglyphida forms a monophyletic lineage with the sarcomonads as sister group. The next related taxa are the Chlorarachnea and the unidentified filose strain N-Por. In agreement with the previous studies, the Phytomyxea branch off at the base of this lineage. The Monadofilosa (Testaceafilosia and Sarcomonadea) appear monophyletic. The Testaceafilosia are polyphyletic, because Pseudodifflugia sp. is positioned as the sister taxon to the sarcomonads. Within the order Euglyphida, Paulinella branches off first, together with Cyphoderia, followed by Tracheleuglypha. In maximum likelihood and neighbor joining analyses, the genus Euglypha is monophyletic. The branching pattern within the order Euglyphida reflects the evolution of shell morphology from simple to complexly built tests.

3.
We introduce a new approach to estimate the evolutionary distance between two sequences. This approach uses a tree with three leaves: two of them correspond to the studied sequences, whereas the third is chosen to handle long-distance estimation. The branch lengths of this tree are obtained by likelihood maximization and are then used to deduce the desired distance. This approach, called TripleML, improves the precision of evolutionary distance estimates, and thus the topological accuracy of distance-based methods. TripleML can be used with neighbor-joining-like (NJ-like) methods not only to compute the initial distance matrix but also to estimate new distances encountered during the agglomeration process. Computer simulations indicate that using TripleML significantly improves the topological accuracy of NJ, BioNJ, and Weighbor, while conserving a reasonable computation time. With randomly generated 24-taxon trees and realistic parameter values, combining NJ with TripleML reduces the number of wrongly inferred branches by about 11% (against 2.6% and 5.5% for BioNJ and Weighbor, respectively). Moreover, this combination requires only about 1.5 min to infer a phylogeny of 96 sequences composed of 1,200 nucleotides, as compared with 6.5 h for FastDNAml on the same machine (PC 466 MHz).

4.
Summary: The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences were studied by considering model trees of three taxa with an outgroup. The cases of constant and varying rates of nucleotide substitution were compared. From sequences obtained by simulation, phylogenetic trees were constructed by using the maximum parsimony (MP) and neighbor joining (NJ) methods. The effectiveness and consistency of the MP method were studied in terms of proportions of informative sites. The results of simulation showed that bootstrap estimation of the confidence level for an inferred phylogeny can be used even under unequal rates of evolution if the rate differences are not large so that the MP method is not misleading. The condition under which the MP method becomes misleading (inconsistent) is more stringent for slowly evolving sequences than for rapidly evolving ones, and it also depends on the length of the internal branch. If the rate differences are large so that the MP method becomes consistently misleading, then bootstrap estimation will reinforce an erroneous conclusion on topology. Similar conclusions apply to the NJ method with uncorrected distances. The NJ method with corrected distances performs poorly when the sequence length is short but can avoid the inconsistency problem if the sequence length is long and if the distances can be estimated accurately. Offprint requests to: W.-H. Li
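
Bootstrap estimation of confidence in an inferred phylogeny is commonly done by resampling alignment columns with replacement and rebuilding the tree on each replicate. The sketch below shows only the resampling and the counting; the tree-building step is left to a caller-supplied function. The names `bootstrap_alignment`, `bootstrap_support`, and `supports_clade` are hypothetical and are not taken from the paper.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns (sites) of an alignment with replacement.

    `alignment` is a list of equal-length strings, one per taxon.
    """
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(alignment, supports_clade, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates for which `supports_clade` returns True.

    `supports_clade` is an assumed callable that builds a tree from a
    replicate (for example by MP or NJ) and reports whether the clade of
    interest is present.
    """
    rng = random.Random(seed)
    hits = sum(
        bool(supports_clade(bootstrap_alignment(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates
```

As the abstract cautions, a high support value from such a procedure only reflects how consistently the method returns a grouping; if the method itself is misleading, the bootstrap reinforces the error.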

5.
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis.
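
The abstract does not name its two-parameter model or give a correction formula. Assuming it is Kimura's two-parameter model (an assumption), a correction for superimposed substitutions might look like the sketch below, where P and Q are the observed proportions of transitions and transversions and the corrected distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a character other than A, C, G, T are skipped.
    Raises ValueError if the sequences are too divergent for the logarithms
    to be defined (saturation) or if no comparable sites exist.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

The corrected value always exceeds the raw proportion of differing sites once any difference is observed, which is the sense in which uncorrected distances underestimate divergence on fast-evolving branches.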

6.
We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical.

7.
Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the tree of life; the combination of gene order data with sequence data also has the potential to provide more robust phylogenetic reconstructions, since each can elucidate evolution at different time scales. Distance corrections greatly improve the accuracy of phylogeny reconstructions from DNA sequences, enabling distance-based methods to approach the accuracy of the more elaborate methods based on parsimony or likelihood at a fraction of the computational cost. This paper focuses on developing distance correction methods for phylogeny reconstruction from whole genomes. The main question we investigate is how to estimate evolutionary histories from whole genomes with equal gene content, and we present a technique, the empirically derived estimator (EDE), that we have developed for this purpose. We study the use of EDE on whole genomes with identical gene content, and we explore the accuracy of phylogenies inferred using EDE with the neighbor joining and minimum evolution methods under a wide range of model conditions. Our study shows that tree reconstruction under these two methods is much more accurate when based on EDE distances than when based on other distances previously suggested for whole genomes. Electronic Supplementary Material: Electronic supplementary material is available for this article and accessible for authorised users. [Reviewing Editor: Dr. Martin Kreitman]

8.
Quartet-based phylogeny reconstruction methods, such as Quartet Puzzling, were introduced in the hope that they might be competitive with maximum likelihood methods, without being as computationally intensive. However, despite the numerous quartet-based methods that have been developed, their performance in simulation has been disappointing. In particular, Ranwez and Gascuel, the developers of one of the best quartet methods, conjecture that quartet-based methods have inherent limitations that make them unable to produce trees as accurate as neighbor joining or maximum parsimony. In this paper, we present Short Quartet Puzzling, a new quartet-based phylogeny reconstruction algorithm, and we demonstrate the improved topological accuracy of the new method over maximum parsimony and neighbor joining, disproving the conjecture of Ranwez and Gascuel. We also show a dramatic improvement over Quartet Puzzling. Thus, while our new method is not compared to any ML method (as it is not expected to be as accurate as the best of these), this study shows that quartet methods are not as limited in performance as was previously conjectured, and opens the possibility to further improvements through new algorithmic designs.

9.
Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

10.
A human papillomavirus (HPV) type is defined as an HPV isolate whose L1 gene sequence is at least 10% different from that of any other type, while a subtype is 2 to 10% different from any HPV type. In order to analyze the phylogeny behind the subtype definition, we compared 49 isolates of HPV type 44 (HPV-44) and its subtype HPV-55, previously misclassified as a separate type, and 41 isolates of the subtype pair HPV-68a and -b, sampled from cohorts in four continents. The subtypes of each pair are separated by deep dichotomic branching, and three of the four subtypes have evolved large phylogenetic clusters of genomic variants forming a "star" phylogeny, with some branches specific for ethnically defined cohorts. We conclude that subtypes of HPV types are natural and old taxa, equivalent to types, which either diverged more recently than types or evolved more slowly.

11.
Long branches in a true phylogeny tend to disrupt hierarchical character covariation (phylogenetic signal) in the distribution of traits among organisms. The distortion of hierarchical structure in character-state matrices can lead to errors in the estimation of phylogenetic relationships and inconsistency of methods of phylogenetic inference. Examination of trees distorted by long-branch attraction will not reveal the identities of problematic taxa, in part because the distortion can mask long branches by reducing inferred branch lengths and through errors in branching order. Here we present a simple method for the detection of taxa whose placement in evolutionary trees is made difficult by the effects of long-branch attraction. The method is an extension of a tree-independent conceptual framework of phylogenetic data exploration (RASA). Taxa that are likely to attract are revealed because long branches leave distinct footprints in the distribution of character states among taxa, and these traces can be directly observed in the error structure of the RASA regression. Problematic taxa are identified using a new diagnostic plot called the taxon variance plot, in which the apparent cladistic and phenetic variances contributed by individual taxa are compared. The procedure for identifying long edges employs algorithms solved in polynomial time and can be applied to morphological, molecular, and mixed characters. The efficacy of the method is demonstrated using simulated evolution and empirical evidence of long branches in a set of recently published sequences. We show that the accuracy of evolutionary trees can be improved by detecting and combating the potentially misleading influences of long-branch taxa.

12.
Using simulated data, we compared five methods of phylogenetic tree estimation: parsimony, compatibility, maximum likelihood, Fitch-Margoliash, and neighbor joining. For each combination of substitution rates and sequence length, 100 data sets were generated for each of 50 trees, for a total of 5,000 replications per condition. Accuracy was measured by two measures of the distance between the true tree and the estimate of the tree, one measure sensitive to accuracy of branch lengths and the other not. The distance-matrix methods (Fitch-Margoliash and neighbor joining) performed best when they were constrained from estimating negative branch lengths; all comparisons with other methods used this constraint. Parsimony and compatibility had similar results, with compatibility generally inferior; Fitch-Margoliash and neighbor joining had similar results, with neighbor joining generally slightly inferior. Maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better. Bias of the estimates was inferred by measuring whether the independent estimates of a tree for different data sets were closer to the true tree than to each other. Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches. When rates of evolution varied among different sites, all methods showed signs of inaccuracy and bias.
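
The abstract does not say which tree-to-tree distance measures were used. As one example of a measure that ignores branch lengths, the sketch below counts the clades found in one rooted tree but not the other (a rooted, clade-based variant of the Robinson-Foulds distance). The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def collect_clades(tree, found):
    """Return the leaf set of `tree` and add every internal node's leaf set to `found`.

    A tree is either a leaf label (any non-tuple value) or a tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(leaves)
    return leaves


def rooted_rf_distance(tree1, tree2):
    """Number of clades present in exactly one of two rooted trees on the same leaves."""
    clades1, clades2 = set(), set()
    if collect_clades(tree1, clades1) != collect_clades(tree2, clades2):
        raise ValueError("trees must have the same set of leaves")
    # The symmetric difference ignores clades shared by both trees,
    # including the root clade that contains every leaf.
    return len(clades1 ^ clades2)
```

A distance of zero means the two topologies are identical; each conflicting grouping adds to the count regardless of how long the branches are.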

13.
Clearcut: a fast implementation of relaxed neighbor joining   Total citations: 1, self-citations: 0, citations by others: 1
SUMMARY: Clearcut is an open source implementation for the relaxed neighbor joining (RNJ) algorithm. While traditional neighbor joining (NJ) remains a popular method for distance-based phylogenetic tree reconstruction, it suffers from an O(N^3) time complexity, where N represents the number of taxa in the input. Due to this steep asymptotic time complexity, NJ cannot reasonably handle very large datasets. In contrast, RNJ realizes a typical-case time complexity on the order of N^2 log N without any significant qualitative difference in output. RNJ is particularly useful when inferring a very large tree or a large number of trees. In addition, RNJ retains the desirable property that it will always reconstruct the true tree given a matrix of additive pairwise distances. Clearcut implements RNJ as a C program, which takes either a set of aligned sequences or a pre-computed distance matrix as input and produces a phylogenetic tree. Alternatively, Clearcut can reconstruct phylogenies using an extremely fast standard NJ implementation. AVAILABILITY: Clearcut source code is available for download at: http://bioinformatics.hungry.com/clearcut
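
To see where the cubic cost of traditional NJ comes from, here is a minimal topology-only sketch of the standard algorithm in Python. It is not Clearcut's code and it does not implement RNJ; the function name and the dict-of-dicts distance format are assumptions. Each round scans all pairs for the smallest Q value, which takes O(N^2) time, and about N rounds are needed.

```python
def neighbor_joining(names, dist):
    """Return an unrooted NJ topology as three nested tuples joined at a center.

    `names` is a list of leaf labels; `dist[a][b]` is the symmetric distance
    between labels a and b. Branch lengths are not computed.
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        # O(n^2) scan for the pair that minimizes Q(a, b).
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        d[new] = {}
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c == a or c == b:
                continue
            d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)
```

According to the abstract, RNJ reaches its typical-case N^2 log N behaviour without a significant qualitative change in output; how it avoids the exhaustive scan is not described there, so it is not sketched here.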

14.
An investigation of mushroom phylogeny using the largest subunit of RNA polymerase II gene sequences (RPB1) was conducted in comparison with nuclear ribosomal large subunit RNA gene sequences (nLSU) for the same set of taxa in the genus Inocybe (Agaricales, Basidiomycota). The two data sets, though not significantly incongruent, exhibit conflict among the placement of two taxa that exhibit long branches in the nLSU data set. In contrast, RPB1 terminal branch lengths are rather uniform. Bootstrap support is increased for clades in RPB1. Combined data sets increase the degree of confidence for several relationships. Overall, nLSU data do not yield a robust phylogeny when independently assessed by RPB1 sequences. This multigene study indicates that Inocybe is a monophyletic group composed of at least four distinct lineages: subgenus Mallocybe, section Cervicolores, section Rimosae, and subgenus Inocybe sensu Kühner, Kuyper, non Singer. Within subgenus Inocybe, two additional lineages, one composed of species with smooth basidiospores (clade I) and a second characterized by nodulose-spored species (clade II), are recovered by RPB1 and combined data. The nLSU data recover only clade I. The genera Astrosporina and Inocybella cannot be recognized phylogenetically. "Supersections" Cortinatae and Marginatae are not monophyletic groups.

15.
A phylogenetic analysis of the surgeonfish family Acanthuridae was conducted to investigate: (a) the pattern of divergences among outgroup and basal ingroup taxa, (b) the pattern of species divergences within acanthurid genera, (c) monophyly in the genus Acanthurus, and (d) the evolution of thick-walled stomach morphology in the genera Acanthurus and Ctenochaetus. Fragments of the 12S, 16S, t-Pro, and control region mitochondrial genes were sequenced for 21 acanthurid taxa (representing all extant genera) and four outgroup taxa. Unweighted parsimony analysis produced two optimal trees. Both of these were highly incongruent with a previous morphological phylogeny, especially with regard to the placement of the monotypic outgroups Zanclus and Luvarus. The maximum likelihood tree and the morphological phylogeny were not significantly different and the conflicting branches were very short. Split decomposition analysis identified conflict in the placement of long basal branches separated by short internodes, providing further evidence that long branch attraction is an important cause of disagreement between molecular and morphological trees. Parametric bootstrapping rejected hypotheses of monophyly of: (a) the genus Acanthurus and (b) a group containing representatives of Acanthurus/Ctenochaetus with thick-walled stomachs. The branching pattern of the likelihood and split decomposition trees indicates that evolution in the acanthurid clade has involved at least three periods of intense speciation.

16.
Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study we performed analytical and theoretical studies of limiting distances under the covarion model and four-taxon tree simulations to investigate the extent to which the covarion process impacts on phylogenetic estimation. In particular, we assessed the limits of an RAS model-based maximum likelihood method to recover the phylogenies when the sequence data were simulated under the covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long branch repel bias occurs.

17.
18.
Problematica are taxa that defy robust phylogenetic placement. Traditionally the term was restricted to fossil forms, but it is clear that extant taxa may be just as difficult to place, whether using morphological or molecular (nucleotide, gene or genomic) markers for phylogeny reconstruction. We discuss the kinds and causes of Problematica within the Metazoa, as well as criteria for their recognition and possible solutions. The inclusive set of Problematica changes depending upon the nature and quality of (homologous) data available, the methods of phylogeny reconstruction and the sister taxa inferred by their placement or displacement. We address Problematica in the context of pre-cladistic phylogenetics, numerical morphological cladistics and molecular phylogenetics, and focus on general biological and methodological implications of Problematica, rather than presenting a review of individual taxa. Rather than excluding Problematica from phylogeny reconstruction, as has often been preferred, we conclude that the study of Problematica is crucial for both the resolution of metazoan phylogeny and the proper inference of body plan evolution.

19.
Aim: At broad geographical scales, species richness is a product of three basic processes: speciation, extinction and migration. However, determining which of these processes predominates is a major challenge. Whilst palaeontological studies can provide information on speciation and extinction rates, data are frequently lacking. Here we use a recent dated phylogenetic tree of mammals to explore the relative importance of these three processes in structuring present‐day richness gradients. Location: The global terrestrial biosphere. Methods: We combine macroecological data with phylogenetic methods more typically used in community ecology to describe the phylogenetic history of regional faunas. Using simulations, we explore two simple phylogenetic metrics, the mean and variance in the pairwise distances between taxa, and describe their relationship to phylogenetic tree topology. We then use these two metrics to characterize the evolutionary relationships among mammal species assemblages across the terrestrial biome. Results: We show that the mean and variance in the pairwise distances describe phylogenetic tree topology well, but are less sensitive to phylogenetic uncertainty than more direct measures of tree shape. We find the phylogeny for South American mammals is imbalanced and ‘stemmy’ (long branches towards the root), consistent with recent diversification within evolutionarily disparate lineages. In contrast, the phylogeny for African mammals is balanced and ‘tippy’ (long branches towards the tips), more consistent with the slow accumulation of diversity over long times, reflecting the Old World origin of many mammal clades. Main conclusions: We show that phylogeny can accurately capture biogeographical processes operating at broad spatial scales and over long time periods. Our results support inferences from the fossil record – that the New World tropics are a diversity cradle whereas the Old World tropics are a museum of old diversity.
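
The two metrics named in the Methods, the mean and the variance of the pairwise distances between taxa, are straightforward to compute once a distance table for an assemblage is available. The sketch below assumes a dict-of-dicts distance table and uses the population variance; the abstract does not say whether the population or the sample variance was used, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def pairwise_distance_summary(taxa, dist):
    """Mean and (population) variance of the distances over all unordered taxon pairs.

    `dist[a][b]` is the phylogenetic distance between taxa a and b.
    """
    values = [dist[a][b] for a, b in combinations(taxa, 2)]
    return mean(values), pvariance(values)
```

Applied per regional assemblage, these two numbers give a compact summary of tree shape that, as the Results note, is less sensitive to phylogenetic uncertainty than more direct measures.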

20.
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed more poorly than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed as well as more complex methods.
