Similar articles
 Found 20 similar articles (search time: 750 ms)
1.
Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study we performed analytical and theoretical studies of limiting distances under the covarion model and four-taxon tree simulations to investigate the extent to which the covarion process affects phylogenetic estimation. In particular, we assessed the limits of an RAS model-based maximum likelihood method to recover the phylogenies when the sequence data were simulated under the covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long branch repel bias occurs.

2.
Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distribution and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories, allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.
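
To make the "equal-probability discrete gamma" concrete, the following is a minimal sketch that approximates the category rates by Monte Carlo sampling: draws from a mean-one gamma distribution are sorted, cut into k equally probable bins, and each bin is represented by its mean. This is an illustrative approximation only, not the estimation procedure or the alternative implementation described in the abstract; the function name, sample size, and seed are assumptions.

```python
import random


def discrete_gamma_rates(alpha, k, n_samples=100_000, seed=0):
    """Approximate mean rates of k equal-probability gamma categories.

    Hypothetical helper: samples a gamma distribution with shape `alpha`
    and scale 1/alpha (so the mean rate is 1), then averages the draws
    that fall in each of k equally sized quantile bins.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples))
    size = n_samples // k  # leftover draws (if any) at the top end are ignored
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(k)]
    # Rescale so the category rates average exactly 1.
    mean_rate = sum(rates) / k
    return [r / mean_rate for r in rates]


if __name__ == "__main__":
    for rate in discrete_gamma_rates(alpha=0.5, k=4):
        print(round(rate, 3))
```

With few categories, the top category's rate is an average over the whole upper quantile bin, which is one way to see why the largest site rates tend to be underestimated.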

3.
We introduce a distance-based phylogeny reconstruction method called "weighted neighbor joining," or "Weighbor" for short. As in neighbor joining, two taxa are joined in each iteration; however, the Weighbor criterion for choosing a pair of taxa to join takes into account that errors in distance estimates are exponentially larger for longer distances. The criterion embodies a likelihood function on the distances, which are modeled as correlated Gaussian random variables with different means and variances, computed under a probabilistic model for sequence evolution. The Weighbor criterion consists of two terms, an additivity term and a positivity term, that quantify the implications of joining the pair. The first term evaluates deviations from additivity of the implied external branches, while the second term evaluates confidence that the implied internal branch has a positive branch length. Compared with maximum-likelihood phylogeny reconstruction, Weighbor is much faster, while building trees that are qualitatively and quantitatively similar. Weighbor appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony.
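
For reference, the pair-selection step that Weighbor replaces can be sketched with the standard neighbor-joining Q criterion. The code below is plain neighbor joining, not the Weighbor likelihood criterion, whose additivity and positivity terms are not specified in the abstract; the function name and the example matrix are illustrative assumptions.

```python
def nj_pick_pair(dist):
    """Return the index pair (i, j) that minimizes the neighbor-joining Q value.

    `dist` is a symmetric n x n distance matrix given as a list of lists.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_q, best_pair = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_q, best_pair = q, (i, j)
    return best_pair


if __name__ == "__main__":
    # Hypothetical five-taxon distance matrix.
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(example))
```

Note that Q treats every distance as equally reliable, whereas the abstract's point is that long distances carry much larger errors and should be down-weighted.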

4.
Although long-branch attraction, the incorrect grouping of long lineages in a phylogeny because of systematic error, has been identified as a potential source of error in phylogenetic analysis for almost two decades, no empirical examples of the phenomenon exist. Here, I outline several criteria for identifying long-branch attraction and apply these criteria to 18S ribosomal DNA (rDNA) sequence data for 13 insects. Parsimony and minimum evolution with p distances group the two longest branches together (those leading to Strepsiptera and Diptera). Simulation studies show that the long branches are long enough to attract. When a tree is assumed in which Strepsiptera and Diptera are separated and many data sets are simulated for that tree (using the parameter estimates for that tree for the original data), parsimony analysis of the simulated data consistently groups Strepsiptera and Diptera. Analyses of the 18S rDNA sequences using methods that are less sensitive to the problem of long-branch attraction estimate trees in which the long branches are separate.

5.
Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well-established, but highly unresolved, phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.

6.
We used new 18S and 28S rRNA sequences analysed with parsimony, maximum likelihood and Bayesian methods of phylogenetic reconstruction to show that Nemertodermatida, generally classified as the sister group of Acoela within the recently proposed Phylum Acoelomorpha, are a separate basal bilaterian lineage. We used several analytical approaches to control for possible long branch attraction (LBA) artefacts in our results. Parsimony and the model-based phylogenetic reconstruction methods that incorporate 'corrections' for substitution rate heterogeneities yielded concordant results. When putative long branch taxa were experimentally removed, the resulting topologies were consistent with our total evidence analysis. Deletion of fast-evolving nucleotide sites decreased resolution and clade support, but did not support a topology conflicting with the total evidence analysis. Establishment of Acoela and Nemertodermatida as two early lineages facilitates reconstruction of ancestral bilaterian features. The ancestor of extant Bilateria was a small, benthic direct developer without coelom or a planktonic larval stage. The previously proposed Phylum Acoelomorpha is dismissed as paraphyletic.

7.
Heterotachy is a general term to describe positions that evolve at different rates in different lineages. Heterotachy also can generally be viewed as multivariate rates-across-sites variation, which can be described as randomly drawing rates (or branch lengths) from a multivariate distribution for each branch at each site (Wu J, Susko E. 2009. General heterotachy and distance method adjustments. Mol Biol Evol. 26:2689-2697). Motivated by this result, we propose three new distance-based tests: a heterogeneity test, a heterotachy test, and a within-gene heterotachy test, and demonstrate with simulations that they perform well under a wide range of conditions. We also applied the first two tests to two real data sets and found that although both data sets showed significant evidence of heterotachy, there were subtrees for which the data were consistent with an equal rates or rates-across-sites model. Keywords: heterogeneity, heterotachy, within-gene heterotachy, covarion model, distance method, hypothesis test.

8.
Despite the proliferation of increasingly sophisticated models of DNA sequence evolution, choosing among models remains a major problem in phylogenetic reconstruction. The choice of appropriate models is thought to be especially important when there is large variation among branch lengths. We evaluated the ability of nested models to reconstruct experimentally generated, known phylogenies of bacteriophage T7 as we varied the terminal branch lengths. Then, for each phylogeny we determined the best-fit model by progressively adding parameters to simpler models. We found that in several cases the choice of best-fit model was affected by the parameter addition sequence. In terms of phylogenetic performance, there was little difference between models when the ratio of short:long terminal branches was 1:3 or less. However, under conditions of extreme terminal branch-length variation, there were not only dramatic differences among models, but best-fit models were always among the best at overcoming long-branch attraction. The performance of minimum-evolution-distance methods was generally lower than that of discrete maximum-likelihood methods, even if maximum-likelihood methods were used to generate distance matrices. Correcting for among-site rate variation was especially important for overcoming long-branch attraction. The generality of our conclusions is supported by earlier simulation studies and by a preliminary analysis of mitochondrial and nuclear sequences from a well-supported four-taxon amniote phylogeny.

9.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
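
The two quantities this abstract relies on, the uncorrected distance and additivity on four taxa, can be sketched as follows. The uncorrected distance is the proportion of differing characters, and additivity is checked with the four-point condition (of the three pairwise sums, the two largest must be equal). Function names and the tolerance are illustrative assumptions, not taken from the paper.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned positions that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def four_point_split(d, tol=1e-9):
    """Apply the four-point condition to a 4 x 4 distance matrix.

    Returns (split, additive): `split` is the pairing of taxa with the
    smallest pairwise sum, and `additive` is True when the two larger
    sums agree within `tol`, i.e. the distances are perfectly treelike.
    If all three sums are equal (a star tree), the first pairing is returned.
    """
    sums = [
        (d[0][1] + d[2][3], ((0, 1), (2, 3))),
        (d[0][2] + d[1][3], ((0, 2), (1, 3))),
        (d[0][3] + d[1][2], ((0, 3), (1, 2))),
    ]
    sums.sort(key=lambda item: item[0])
    smallest, middle, largest = sums
    additive = abs(largest[0] - middle[0]) <= tol
    return smallest[1], additive


if __name__ == "__main__":
    print(p_distance("ACGT", "ACGA"))
    distances = [
        [0, 2, 3, 3],
        [2, 0, 3, 3],
        [3, 3, 0, 2],
        [3, 3, 2, 0],
    ]
    print(four_point_split(distances))
```

A True additivity flag only says the distances fit some tree exactly; as the abstract stresses, that tree need not be the true one.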

10.
Background

Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method.

Results

This paper continues the line of research that attempts to fit, to each given set of input sequences, a distance function which maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but consider also the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees.

Conclusions

We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
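
The two distance functions named in this entry have standard closed forms, sketched below for orientation. The Jukes-Cantor distance depends only on the proportion p of differing sites, while Kimura's two-parameter distance separates the proportion of transitions (P) from the proportion of transversions (Q). The function and argument names are illustrative assumptions; the code is not the framework developed in the paper.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the distance to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(transitions, transversions):
    """Kimura two-parameter distance.

    `transitions` (P) and `transversions` (Q) are the proportions of sites
    showing each kind of difference.
    """
    a = 1.0 - 2.0 * transitions - transversions
    b = 1.0 - 2.0 * transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences are too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


if __name__ == "__main__":
    # Same total difference (p = 0.3), with and without a transition bias.
    print(jc69_distance(0.3))
    print(k2p_distance(0.1, 0.2))  # no bias: transitions are one third of changes
    print(k2p_distance(0.2, 0.1))  # transition bias
```

When transitions make up exactly one third of the observed differences, the two formulas coincide; with a transition bias, the Jukes-Cantor value falls below the two-parameter value, which is the model misspecification the abstract studies.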

11.

Background

Long branch attraction (LBA) is a problem that afflicts both the parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure to detect a data set suffering from this problem so that Maximum Likelihood could be used instead of Maximum Parsimony.

Results

The long branch extraction method has been well cited and used by many authors in their analysis but no strong validation has been performed as to its accuracy. We performed such an analysis by an extensive search of the branch length search space under two topologies of six taxa, a Felsenstein-like topology and Farris-like topology. We also examine a long branch shortening method.

Conclusions

The long branch extraction method seems to mask the majority of the search space rendering it ineffective as a detection method of LBA. A proposed alternative, the long branch shortening method, is also ineffective in predicting long branch attraction for all tree topologies.

12.
13.

Background  

Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees.

14.
The problem of inferring confidence sets of gene trees is discussed without assuming that the substitution model or the branching pattern of any of the investigated trees is correct. In this case, widely used methods to compare genealogies can give highly contradictory results. Here, three methods to infer confidence sets that are robust against model misspecification are compared, including a new approach based on estimating the confidence in a specific tree using expected-likelihood weights. The power of the investigated methods is studied by analysing HIV-1 and mtDNA sequence data as well as simulated sequences. Finally, guidelines for choosing an appropriate method to compare multiple gene trees are provided.

15.
Cophylogeny is the congruence of phylogenetic relationships between two different groups of organisms due to their long‐term interaction. We investigated the use of tree shape distance measures to quantify the degree of cophylogeny. We implemented a reverse‐time simulation model of pathogen phylogenies within a fixed host tree, given cospeciation probability, host switching, and pathogen speciation rates. We used this model to evaluate 18 distance measures between host and pathogen trees, including two kernel distances that we developed for labeled and unlabeled trees, which use branch lengths and accommodate trees of different sizes. Finally, we used these measures to revisit published cophylogenetic studies, where authors described the observed associations as representing a high or low degree of cophylogeny. Our simulations demonstrated that some measures are more informative than others with respect to specific coevolution parameters, especially when these did not assume extreme values. For real datasets, a projection of the trees' associations revealed clustering of high-concordance studies, suggesting that investigators are describing it in a consistent way. Our results support the hypothesis that measures can be useful for quantifying cophylogeny. This motivates their usage in the field of coevolution and supports the development of simulation‐based methods, i.e., approximate Bayesian computation, to estimate the underlying coevolutionary parameters.

16.
Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of two phylogenetic methods for incorporating heterotachy: the mixed branch length model, which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree, and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.
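
The phrase "summing likelihoods over multiple sets of branch lengths" can be read as a per-site mixture, sketched below. The sketch assumes the per-site log-likelihoods under each branch-length set have already been computed by some other program, and that the mixture weights are given; both inputs, and the function name, are illustrative assumptions rather than the authors' implementation.

```python
import math


def mixture_log_likelihood(site_log_liks, weights):
    """Total log-likelihood of an alignment under a branch-length mixture.

    site_log_liks[k][s] is the log-likelihood of site s under branch-length
    set k (assumed precomputed elsewhere); weights[k] is the mixture weight
    of set k. Each site's likelihood is the weighted sum over the sets.
    """
    if len(site_log_liks) != len(weights):
        raise ValueError("need one weight per branch-length set")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for s in range(len(site_log_liks[0])):
        terms = [
            math.log(w) + per_set[s]
            for w, per_set in zip(weights, site_log_liks)
            if w > 0.0
        ]
        # Log-sum-exp keeps the sum stable when likelihoods are tiny.
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


if __name__ == "__main__":
    # Two hypothetical branch-length sets, two sites.
    log_liks = [
        [math.log(0.2), math.log(0.1)],
        [math.log(0.4), math.log(0.3)],
    ]
    print(mixture_log_likelihood(log_liks, [0.5, 0.5]))
```

A homotachous model corresponds to the special case of a single branch-length set with weight 1.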

17.
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1alpha (EF-1alpha) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial (outgroup) sequences ("long-branch attraction," or "LBA"). However, no previous studies have explicitly determined the reason(s) why the artifactual allegiance of microsporidia and archaebacteria ("M + A") is recovered by all phylogenetic methods, including maximum likelihood, a method that is supposed to be resistant to classical LBA. Here we show that the M + A affinity can be attributed to those alignment sites associated with large differences in evolutionary site rates between the eukaryotic and archaebacterial subtrees. Therefore, failure to model the significant evolutionary rate distribution differences (covarion shifts) between the ingroup and outgroup sequences is apparently responsible for the artifactual basal position of microsporidia in phylogenetic analyses of EF-1alpha sequences. Currently, no evolutionary model that accounts for discrete changes in the site rate distribution on particular branches is available for either protein or nucleotide level phylogenetic analysis, so the same artifacts may affect many other "deep" phylogenies. Furthermore, given the relative similarity of the site rate patterns of microsporidian and archaebacterial EF-1alpha proteins ("parallel site rate variation"), we suggest that the microsporidian orthologs may have lost some eukaryotic EF-1alpha-specific nontranslational functions, exemplifying the extreme degree of reduction in this parasitic lineage.

18.
Trees are usually grown in containers in the nursery until they reach a certain size, whereupon they are transplanted to a permanent location. Infrastructure development has often led to the removal of large trees. To maintain lush foliage and trees of a size that benefit urban ecology, trees can be grown in containers. Containerized trees can be moved from one location to another, and this relocation does not require root pruning or crown-size reduction. The drawback to having trees in containers is the small and confined volume of the container, which limits tree root development and thus affects containerized tree stability. The objective of this study was to understand the failure mechanisms for and the effect of the root dimensions on the stability of containerized trees. Therefore, small-scale stability model tests were conducted which were verified using numerical and analytical models. The results identified two failure modes that were likely to occur: tree overturning and container overturning. The mode of failure was dependent on the root dimensions. When the trees had extended their roots deep into the container, the whole container would overturn in the event of failure due to increased root confinement and shear resistance of the soil. On the other hand, the main failure mechanism when there was shallow root development was the uplifting of the tree from the container while the container remained upright. The results from numerical and analytical models were consistent with those obtained during the small-scale model stability tests.

19.
Long branches in a true phylogeny tend to disrupt hierarchical character covariation (phylogenetic signal) in the distribution of traits among organisms. The distortion of hierarchical structure in character-state matrices can lead to errors in the estimation of phylogenetic relationships and inconsistency of methods of phylogenetic inference. Examination of trees distorted by long-branch attraction will not reveal the identities of problematic taxa, in part because the distortion can mask long branches by reducing inferred branch lengths and through errors in branching order. Here we present a simple method for the detection of taxa whose placement in evolutionary trees is made difficult by the effects of long-branch attraction. The method is an extension of a tree-independent conceptual framework of phylogenetic data exploration (RASA). Taxa that are likely to attract are revealed because long branches leave distinct footprints in the distribution of character states among taxa, and these traces can be directly observed in the error structure of the RASA regression. Problematic taxa are identified using a new diagnostic plot called the taxon variance plot, in which the apparent cladistic and phenetic variances contributed by individual taxa are compared. The procedure for identifying long edges employs algorithms solved in polynomial time and can be applied to morphological, molecular, and mixed characters. The efficacy of the method is demonstrated using simulated evolution and empirical evidence of long branches in a set of recently published sequences. We show that the accuracy of evolutionary trees can be improved by detecting and combating the potentially misleading influences of long-branch taxa.

20.
The Rooting of the Universal Tree of Life Is Not Reliable   (total citations: 19; self-citations: 0; citations by others: 19)
Several composite universal trees connected by an ancestral gene duplication have been used to root the universal tree of life. In all cases, this root turned out to be in the eubacterial branch. However, the validity of results obtained from comparative sequence analysis has recently been questioned, in particular, in the case of ancient phylogenies. For example, it has been shown that several eukaryotic groups are misplaced in ribosomal RNA or elongation factor trees because of unequal rates of evolution and mutational saturation. Furthermore, the addition of new sequences to data sets has often turned apparently reasonable phylogenies into confused ones. We have thus revisited all composite protein trees that have been used to root the universal tree of life up to now (elongation factors, ATPases, tRNA synthetases, carbamoyl phosphate synthetases, signal recognition particle proteins) with updated data sets. In general, the two prokaryotic domains were not monophyletic, with several aberrant groupings at different levels of the tree. Furthermore, the respective phylogenies contradicted each other, so that various ad hoc scenarios (paralogy or lateral gene transfer) must be proposed in order to obtain the traditional Archaebacteria–Eukaryota sisterhood. More importantly, all of the markers are heavily saturated with respect to amino acid substitutions. As phylogenies inferred from saturated data sets are extremely sensitive to differences in evolutionary rates, present phylogenies used to root the universal tree of life could be biased by the phenomenon of long branch attraction. Since the eubacterial branch was always the longest one, the eubacterial rooting could be explained by an attraction between this branch and the long branch of the outgroup. Finally, we suggested that a eukaryotic rooting could be a more fruitful working hypothesis, as it provides, for example, a simple explanation for the high genetic similarity of Archaebacteria and Eubacteria inferred from complete genome analysis.

