Similar literature
20 similar articles found (search time: 906 ms)
1.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
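The contrast drawn in this abstract between a "best guess" reconstruction and one that sometimes picks less-probable residues can be illustrated with a minimal Python sketch. The per-site posterior values, the function names, and the sequence length are all hypothetical and chosen only for illustration; they are not taken from the study, and the sketch says nothing about thermostability itself.

```python
import random

# Hypothetical per-site posterior distributions over amino acids
# (illustrative numbers only, not data from the study).
posteriors = [
    {"A": 0.70, "S": 0.30},
    {"L": 0.55, "I": 0.45},
    {"K": 0.90, "R": 0.10},
]


def best_guess(posteriors):
    """Pick the single most probable residue at every site."""
    return "".join(max(site, key=site.get) for site in posteriors)


def sample_sequence(posteriors, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    residues = []
    for site in posteriors:
        states = list(site)
        weights = [site[s] for s in states]
        residues.append(rng.choices(states, weights=weights, k=1)[0])
    return "".join(residues)


def minor_fraction(seqs, posteriors):
    """Fraction of positions carrying a residue other than the most probable one."""
    top = best_guess(posteriors)
    total = sum(len(s) for s in seqs)
    differing = sum(a != b for s in seqs for a, b in zip(s, top))
    return differing / total


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled = [sample_sequence(posteriors, rng) for _ in range(1000)]

print(best_guess(posteriors))
print(minor_fraction([best_guess(posteriors)], posteriors))
print(minor_fraction(sampled, posteriors))
```

The best-guess sequence never contains a minority residue, whereas the sampled sequences retain them at roughly their posterior frequency. That is the mechanism the abstract proposes for why best-guess methods drop slightly destabilizing variants.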

2.
We test hypotheses for the evolution of a life history trait among a group of parasitoid wasps (Hymenoptera: Ichneumonoidea), namely, the transition among koinobiont parasitoids (parasitoids whose hosts continue development after oviposition) between attacking exposed hosts and attacking hosts that are concealed within plant tissue. Using a range of phylogeny estimates based on 28S rDNA sequences, we use maximum parsimony (MP) and maximum likelihood (ML) methods to estimate the ancestral life history traits for the main clades in which both traits occur (using the programs MacClade and Discrete, respectively). We also assess the robustness of these estimates; for MP, we use step matrices in PAUP* to find the minimum weight necessary to reverse estimates or make them ambiguous, and for ML, we measure the differences in likelihood after fixing the ancestral nodes at the alternative states. We also measure the robustness of the MP ancestral state estimate against uncertainties in the phylogeny estimate, manipulating the most-parsimonious tree in MacClade to find the shortest suboptimal tree in which the ancestral state estimate is reversed or made ambiguous. Using these methods, we find strong evidence supporting two transitions among koinobiont Ichneumonoidea: (1) to attacking exposed hosts in a clade consisting of the Helconinae and related subfamilies, and (2) the reverse transition in a clade consisting of the Euphorinae and related subfamilies. In exploring different methods of analyzing variable-length DNA sequences, we found that direct optimization with POY gave some clearly erroneous results that had a profound effect on the overall phylogeny estimate. We also discuss relationships within the superfamily and expand the Mesostoinae to include all the gall-associated braconids that form the sister group of the Aphidiinae.

3.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.

4.
Z. Yang, S. Kumar, M. Nei. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
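The idea of comparing posterior probabilities of states at an interior node can be sketched for a single nucleotide site. The Python example below makes several assumptions that are not in the abstract: it uses the Jukes-Cantor model with equal base frequencies, fixed branch lengths rather than maximum likelihood estimates, a made-up four-tip tree, and it reports only the marginal posterior at the root. It is a toy illustration of the general approach, not the authors' implementation.

```python
import math

STATES = "ACGT"


def jc69_prob(i, j, t):
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def conditional_likelihoods(node):
    """Felsenstein pruning: P(tip data below node | node is in state x) for each x.

    A tip is a one-letter string; an internal node is a list of
    (child, branch_length) pairs. This tree encoding is hypothetical.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == node else 0.0 for x in STATES}
    likes = {x: 1.0 for x in STATES}
    for child, t in node:
        child_likes = conditional_likelihoods(child)
        for x in STATES:
            likes[x] *= sum(jc69_prob(x, y, t) * child_likes[y] for y in STATES)
    return likes


def root_posterior(tree):
    """Posterior probability of each state at the root under a uniform prior."""
    likes = conditional_likelihoods(tree)
    total = sum(0.25 * likes[x] for x in STATES)
    return {x: 0.25 * likes[x] / total for x in STATES}


# Made-up site pattern: three tips carry A, one carries G.
tree = [("A", 0.1), ("A", 0.1), ([("A", 0.1), ("G", 0.1)], 0.1)]
post = root_posterior(tree)
best = max(post, key=post.get)  # the state with the highest posterior
print(best, post)
```

Choosing the state with the highest posterior at each node mirrors the rule described in the abstract. Keeping the whole `post` dictionary also exposes how uncertain each assignment is.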

5.
We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates presented in the literature often ignore the ancestral coalescent prior to speciation and therefore should be biased upward. The simulations show that both methods yield accurate estimates given sample sizes of five or more species pairs and that better likelihood estimates are obtained if there is no significant difference among ancestral population sizes. The model presented here indicates that the larger than expected variation found in multitaxa datasets can be explained by variation in the ancestral coalescence and the Poisson mutation process. In this context, observed variation can often be accounted for by variation in ancestral population sizes rather than invoking variation in other parameters, such as divergence time or mutation rate. The methods are applied to data from two groups of species pairs (sea urchins and Alpheus snapping shrimp) that are thought to have separated by the rise of Panama three million years ago.

6.
Allozyme data are widely used to infer the phylogenies of populations and closely-related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods, and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.

7.
8.
To understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model provide phylogenies of higher accuracy compared to parsimony methods. This has proved controversial, particularly studies simulating morphology‐data under Markov models that assume shared branch lengths for characters, as it is claimed this leads to bias favouring the Bayesian or maximum likelihood Mk model over parsimony models which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the consistency index. Datasets were analysed with equal weights and implied weights parsimony, and the maximum likelihood and Bayesian Mk model. We find that consistent (low homoplasy) datasets render method choice largely irrelevant, as all methods perform well with high consistency (low homoplasy) datasets, but the largest discrepancies in accuracy occur with low consistency datasets (high homoplasy). In such cases, the Bayesian Mk model is significantly more accurate than alternative models and implied weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution compared to other methods. As it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable approach for categorical morphological analyses.
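The abstract measures homoplasy with the consistency index but does not give the formula. The Python sketch below assumes the usual ensemble definition (the minimum number of steps the characters could require, divided by the number of steps observed on the tree). The function name and the character data are hypothetical.

```python
def consistency_index(characters):
    """Ensemble consistency index for a set of characters on one tree.

    characters: list of (n_states, steps_on_tree) pairs, where n_states is the
    number of distinct states a character shows and steps_on_tree is the number
    of changes that character needs on the tree being evaluated.
    A character with k states needs at least k - 1 changes on any tree.
    """
    min_steps = sum(n_states - 1 for n_states, _ in characters)
    observed_steps = sum(steps for _, steps in characters)
    if observed_steps == 0:
        raise ValueError("no variable characters: the index is undefined")
    return min_steps / observed_steps


# Hypothetical dataset: two binary characters and one three-state character.
example = [(2, 1), (2, 3), (3, 2)]
print(consistency_index(example))  # minimum 4 steps over 6 observed steps
```

Values near 1 indicate little homoplasy (high consistency) and lower values indicate more, which matches the way the abstract uses the terms.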

9.
10.
Tuffley and Steel (Bull. Math. Biol. 59:581–607, 1997) proved that maximum likelihood and maximum parsimony methods in phylogenetics are equivalent for sequences of characters under a simple symmetric model of substitution with no common mechanism. This result has been widely cited ever since. We show that small changes to the model assumptions suffice to make the two methods inequivalent. In particular, we analyze the case of bounded substitution probabilities as well as the molecular clock assumption. We show that in these cases, even under no common mechanism, maximum parsimony and maximum likelihood might make conflicting choices. We also show that if there is an upper bound on the substitution probabilities which is ‘sufficiently small’, every maximum likelihood tree is also a maximum parsimony tree (but not vice versa).

11.
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to the true tree more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. AVAILABILITY AND SUPPLEMENTARY INFORMATION: RAxML-III including all alignments and final trees mentioned in this paper is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu.

12.
Island systems have long been useful models for understanding lineage diversification in a geographic context, especially pertaining to the importance of dispersal in the origin of new clades. Here we use a well-resolved phylogeny of the flowering plant genus Cyrtandra (Gesneriaceae) from the Pacific Islands to compare four methods of inferring ancestral geographic ranges in islands: two developed for character-state reconstruction that allow only single-island ranges and do not explicitly associate speciation with range evolution (Fitch parsimony [FP; parsimony-based] and stochastic mapping [SM; likelihood-based]) and two methods developed specifically for ancestral range reconstruction, in which widespread ranges (spanning islands) are integral to inferences about speciation scenarios (dispersal-vicariance analysis [DIVA; parsimony-based] and dispersal-extinction-cladogenesis [DEC; likelihood-based]). The methods yield conflicting results, which we interpret in light of their respective assumptions. FP exhibits the least power to unequivocally reconstruct ranges, likely due to a combination of having flat (uninformative) transition costs and not using branch length information. SM reconstructions generally agree with a prior hypothesis about dispersal-driven speciation across the Pacific, despite the conceptual mismatch between its character-based model and this mode of range evolution. In contrast with narrow extant ranges for species of Cyrtandra, DIVA reconstructs broad ancestral ranges at many nodes. DIVA results also conflict with geological information on island ages; we attribute these conflicts to the parsimony criterion not considering branch lengths or time, as well as vicariance being the sole means of divergence for widespread ancestors. 
DEC analyses incorporated geological information on island ages and allowed prior hypotheses about range size and dispersal rates to be evaluated in a likelihood framework and gave more nuanced inferences about range evolution and the geography of speciation than other methods tested. However, ancestral ranges at several nodes could not be conclusively resolved, due possibly to uncertainty in the phylogeny or the relative complexity of the underlying model. Of the methods tested, SM and DEC both converge on plausible hypotheses for area range histories in Cyrtandra, due in part to the consideration of branch lengths and/or timing of events. We suggest that DEC model-based methods for ancestral range inference could be improved by adopting a Bayesian SM approach, in which stochastic sampling of complete geographic histories could be integrated over alternative phylogenetic topologies. Likelihood-based estimates of ancestral ranges for Cyrtandra suggest a major dispersal route into the Pacific through the islands of Fiji and Samoa, motivating future biogeographic investigation of this poorly known region.
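To make the remark about Fitch parsimony's flat transition costs and disregard for branch lengths concrete, here is a minimal Python sketch of the bottom-up pass of Fitch's algorithm on a rooted binary tree. The tree shape and the island assigned to each tip are invented for illustration and are not the Cyrtandra data; only single-island states are allowed, as the abstract describes for this method.

```python
def fitch(node):
    """Bottom-up Fitch pass; returns (candidate_states, number_of_changes).

    A tip is a string naming its single island; an internal node is a
    (left, right) tuple. Every change costs 1, and branch lengths are not
    represented at all, which is the limitation noted in the abstract.
    """
    if isinstance(node, str):
        return {node}, 0
    left_states, left_cost = fitch(node[0])
    right_states, right_cost = fitch(node[1])
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_cost + right_cost
    # The children disagree: keep every candidate and count one change.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical tip ranges on a hypothetical five-tip tree.
tree = ((("Fiji", "Samoa"), "Samoa"), ("Hawaii", "Hawaii"))
root_states, changes = fitch(tree)
print(root_states, changes)
```

On this toy tree the root retains two equally parsimonious candidate islands, a small instance of the equivocal reconstructions the abstract attributes to FP.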

13.
Considerable diversity abounds among sponges with respect to reproductive and developmental biology. Their ancestral sexual mode (gonochorism vs. hermaphroditism) and reproductive condition (oviparity vs. viviparity) however remain unclear, and these traits appear to have undergone correlated evolution in the phylum. To infer ancestral traits and investigate this putative correlation, we used DNA sequence data from two loci (18S ribosomal RNA and cytochrome c oxidase subunit I) to explore the phylogenetic relationships of 62 sponges whose reproductive traits have been previously documented. Although the inferred tree topologies, using the limited data available, favoured paraphyly of sponges, we also investigated ancestral character‐state reconstruction on a phylogeny with constrained sponge monophyly. Both parsimony‐ and likelihood‐based ancestral state reconstructions indicate that viviparity (brooding) was the likely reproductive mode of the ancestral sponge. Hermaphroditism is favoured over gonochorism as the sexual condition of the sponge ancestor under parsimony, but the reconstruction is ambiguous under likelihood, rendering the ancestry of sexuality unresolved in our study. These results are insensitive to the constraint of sponge monophyly when tracing the reproductive characters using parsimony methods. However, the maximum likelihood analysis of the monophyletic hypothetical tree rendered gonochorism as ancestral for the phylum. A test of trait correlation unambiguously favours the concerted evolution of sexuality and reproductive mode in sponges (hermaphroditism/viviparity, gonochorism/oviparity). Although testing ecological hypotheses for the pattern of sponge reproduction is beyond the scope of our analyses, we postulate that certain physiological constraints might be key causes for the correlation of reproductive characters.

14.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.

15.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. 
In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no support for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than 3.4 × 10⁻⁷.

16.
Goodman et al.'s (1974) populous path algorithm for estimating hidden mutational change in protein evolution is designed to be used as an adjunct to the maximum parsimony method. When the algorithm is so used, the augmented maximum parsimony distances, far from being overestimates, are underestimates of the actual number of nucleotide substitutions which occur in Tateno and Nei's (1978) computer simulation by the Poisson process model, even when the simulation is carried out at two and a half times the sequence density. Although underestimates, our evidence shows that they are nevertheless more accurate than estimates obtained by a Poisson correction. In the maximum parsimony reconstruction, there is a bias towards overrepresenting the number of shared nucleotide identities between adjacent ancestral and descendant nodal sequences with the bias being stronger in those portions of the evolutionary tree sparser in sequence data. Because of this particular property of maximum parsimony reconstructed sequences, the conclusions of Tateno and Nei concerning the statistical properties of the populous path algorithm are invalid. We conclude that estimates of protein evolutionary rates by the maximum parsimony - populous path approach will become more accurate rather than less as larger numbers of closely related species are included in the analysis.
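The abstract compares its estimates with a Poisson correction but does not say which form was used. As a hedged illustration only, the Python sketch below uses the common textbook form d = -ln(1 - p), where p is the observed proportion of differing sites. Both that form and the function name are assumptions, not details from the paper.

```python
import math


def poisson_corrected_distance(p):
    """Estimated substitutions per site from the observed proportion of differences p.

    Assumes the simple Poisson correction d = -ln(1 - p), which allows for
    multiple hits at a site but ignores back mutation to the original state.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# The correction grows faster than p as sequences diverge.
for p in (0.05, 0.30, 0.60):
    print(p, round(poisson_corrected_distance(p), 4))
```

The corrected distance is always at least as large as the raw proportion, because some observed differences conceal more than one substitution.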

17.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

18.
Liu K, Warnow T. PLoS ONE, 2012, 7(3): e33104
The standard approach to phylogeny estimation uses two phases, in which the first phase produces an alignment on a set of homologous sequences, and the second phase estimates a tree on the multiple sequence alignment. POY, a method which seeks a tree/alignment pair minimizing the total treelength, is the most widely used alternative to this two-phase approach. The topological accuracy of trees computed under treelength optimization is, however, controversial. In particular, one study showed that treelength optimization using simple gap penalties produced poor trees and alignments, and suggested the possibility that if POY were used with an affine gap penalty, it might be able to be competitive with the best two-phase methods. In this paper we report on a study addressing this possibility. We present a new heuristic for treelength, called BeeTLe (Better Treelength), that is guaranteed to produce trees at least as short as POY. We then use this heuristic to analyze a large number of simulated and biological datasets, and compare the resultant trees and alignments to those produced using POY and also maximum likelihood (ML) and maximum parsimony (MP) trees computed on a number of alignments. In general, we find that trees produced by BeeTLe are shorter and more topologically accurate than POY trees, but that neither POY nor BeeTLe produces trees as topologically accurate as ML trees produced on standard alignments. These findings, taken as a whole, suggest that treelength optimization is not as good an approach to phylogenetic tree estimation as maximum likelihood based upon good alignment methods.

19.
Theories of ecological diversification make predictions about the timing and ordering of character state changes through history. These theories are testable by “reconstructing” ancestor states using phylogenetic trees and measurements of contemporary species. Here we use maximum likelihood to estimate and evaluate the accuracy of ancestor reconstructions. We present likelihoods of discrete ancestor states and derive probability distributions for continuous ancestral traits. The methods are applied to several examples: diets of ancestral Darwin's finches; origin of inquilinism in gall wasps; microhabitat partitioning and body size evolution in scrubwrens; digestive enzyme evolution in artiodactyl mammals; origin of a sexually selected male trait, the sword, in platies and swordtails; and evolution of specialization in Anolis lizards. When changes between discrete character states are rare, the maximum-likelihood results are similar to parsimony estimates. In this case the accuracy of estimates is often high, with the exception of some nodes deep in the tree. If change is frequent then reconstructions are highly uncertain, especially of distant ancestors. Ancestor states for continuous traits are typically highly uncertain. We conclude that measures of uncertainty are useful and should always be provided, despite simplistic assumptions about the probabilistic models that underlie them. If uncertainty is too high, reconstruction should be abandoned in favor of approaches that fit different models of trait evolution to species data and phylogenetic trees, taking into account the range of ancestor states permitted by the data.

20.
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate the ancestral information in admixed populations and used to adjust the population stratification in genetic association tests. We evaluate the performances of three different methods: maximum likelihood estimation, ADMIXMAP and Structure through various simulated data sets and real data from Latino subjects participating in a genetic study of asthma. All three methods provide similar information on the accuracy of ancestral estimates and control type I error rate at an approximately similar rate. The most important factor in determining accuracy of the ancestry estimate and in minimizing type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain estimates of ancestry that correlate with correlation coefficients more than 0.9 with the true individual ancestral proportions. In addition, after accounting for the ancestry information in association tests, the excess of type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.
