Similar literature
1.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among‐site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among‐site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among‐site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among‐site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.  相似文献   

2.
This article generalizes previous models for codon substitution and rate variation in molecular phylogeny. Particular attention is paid to (1) reversibility, (2) acceptance and rejection of proposed codon changes, (3) varying rates of evolution among codon sites, and (4) the interaction of these sites in determining evolutionary rates. To accommodate spatial variation in rates, Markov random fields rather than Markov chains are introduced. Because these innovations complicate maximum likelihood estimation in phylogeny reconstruction, it is necessary to formulate new algorithms for the evaluation of the likelihood and its derivatives with respect to the underlying kinetic, acceptance, and spatial parameters. To derive the most from maximum likelihood analysis of sequence data, it is useful to compute posterior probabilities assigning residues to internal nodes and evolutionary rate classes to codon sites. It is also helpful to search through tree space in a way that respects accepted phylogenetic relationships. Our phylogeny program LINNAEUS implements algorithms realizing these goals. Readers may consult our companion article in this issue for several examples.  相似文献   

3.
Distance-based methods for phylogeny reconstruction are the fastest and easiest to use, and their popularity is accordingly high. They are also the only known methods that can cope with huge datasets of thousands of sequences. These methods rely on evolutionary distance estimation and are sensitive to errors in such estimations. In this study, a novel Bayesian method for estimation of evolutionary distances is developed. The proposed method enables the use of a sophisticated evolutionary model that better accounts for among-site rate variation (ASRV), thereby improving the accuracy of distance estimation. Rate variations are estimated within a Bayesian framework by extracting information from the entire dataset of sequences, unlike standard methods that can only use one pair of sequences at a time. We compare the accuracy of a cascade of distance estimation methods, starting from commonly used methods and moving towards the more sophisticated novel method. Simulation studies show significant improvements in the accuracy of distance estimation by the novel method over the commonly used ones. We demonstrate the effect of the improved accuracy on tree reconstruction using both real and simulated protein sequence alignments. An implementation of this method is available as part of the SEMPHY package.  相似文献   

4.
The study of which life history traits primarily affect molecular evolutionary rates is often confounded by the covariance of these traits. Scombroid fishes (billfishes, tunas, barracudas, and their relatives) are unusual in that their mass-specific metabolic rate is positively associated with body size. This study exploits this atypical pattern of trait variation, which allows for direct tests of whether mass-specific metabolic rate or body size is the more important factor of molecular evolutionary rates. We inferred a phylogeny for scombroids from a supermatrix of molecular and morphological characters and used new phylogenetic comparative approaches to assess the associations of body size and mass-specific metabolic rate with substitution rate. As predicted by the body size hypothesis, there is a negative correlation between body size and substitution rate. However, unexpectedly, we also find a negative association between mass-specific metabolic and substitution rates. These relationships are supported by analyses of the total molecular data, separate mitochondrial and nuclear genes, and individual loci, and they are robust to phylogenetic uncertainty. The molecular evolutionary rates of scombroids are primarily tied to body size. This study demonstrates that groups with novel patterns of trait variation can be particularly informative for identifying which life history traits are the primary factors of molecular evolutionary rates.  相似文献   

5.
Many evolutionary processes can lead to a change in the correlation between continuous characters over time or on different branches of a phylogenetic tree. Shifts in genetic or functional constraint, in the selective regime, or in some combination thereof can influence both the evolution of continuous traits and their relation to each other. These changes can often be mapped on a phylogenetic tree to examine their influence on multivariate phenotypic diversification. We propose a new likelihood method to fit multiple evolutionary rate matrices (also called evolutionary variance–covariance matrices) to species data for two or more continuous characters and a phylogeny. The evolutionary rate matrix is a matrix containing the evolutionary rates for individual characters on its diagonal, and the covariances between characters (of which the evolutionary correlations are a function) elsewhere. To illustrate our approach, we apply the method to an empirical dataset consisting of two features of feeding morphology sampled from 28 centrarchid fish species, as well as to data generated via phylogenetic numerical simulations. We find that the method has appropriate type I error, power, and parameter estimation. The approach presented herein is the first to allow for the explicit testing of how and when the evolutionary covariances between characters have changed in the history of a group.  相似文献   
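The relationship described above, in which evolutionary correlations are a function of the entries of the evolutionary rate matrix, can be illustrated with a minimal sketch. The function name and the numeric matrix below are hypothetical and are not taken from the study; the sketch only assumes the standard conversion of a covariance to a correlation.

```python
import math

def evolutionary_correlation(rate_matrix, i, j):
    """Correlation between characters i and j implied by a rate matrix.

    The diagonal holds the evolutionary rates (variances) of each
    character; off-diagonal entries hold the evolutionary covariances.
    """
    rate_i = rate_matrix[i][i]
    rate_j = rate_matrix[j][j]
    if rate_i <= 0 or rate_j <= 0:
        raise ValueError("evolutionary rates on the diagonal must be positive")
    return rate_matrix[i][j] / math.sqrt(rate_i * rate_j)

# Hypothetical 2 x 2 rate matrix for two continuous characters.
R = [[4.0, 3.0],
     [3.0, 9.0]]
r = evolutionary_correlation(R, 0, 1)  # 3 / sqrt(4 * 9) = 0.5
```

Fitting a separate matrix to different parts of the tree, as the abstract proposes, would yield one such correlation per regime; the sketch covers only the conversion step, not the likelihood fit.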

6.
7.
Using real sequence data, we evaluate the adequacy of assumptions made in evolutionary models of nucleotide substitution and the effects that these assumptions have on estimation of evolutionary trees. Two aspects of the assumptions are evaluated. The first concerns the pattern of nucleotide substitution, including equilibrium base frequencies and the transition/transversion-rate ratio. The second concerns the variation of substitution rates over sites. The maximum-likelihood estimate of tree topology appears quite robust to both these aspects of the assumptions of the models, but evaluation of the reliability of the estimated tree by using simpler, less realistic models can be misleading. Branch lengths are underestimated when simpler models of substitution are used, but the underestimation caused by ignoring rate variation over nucleotide sites is much more serious. The goodness of fit of a model is reduced by ignoring spatial rate variation, but unrealistic assumptions about the pattern of nucleotide substitution can lead to an extraordinary reduction in the likelihood. It seems that evolutionary biologists can obtain accurate estimates of certain evolutionary parameters even with an incorrect phylogeny, while systematists cannot get the right tree with confidence even when a realistic, and more complex, model of evolution is assumed.   相似文献   

8.
We developed a simulation model of phylogenesis with which we generated a large number of phylogenies and associated data matrices. We examined the characteristics of these and evaluated the success of three taxonomic methods (Wagner parsimony, character compatibility, and UPGMA clustering) as estimators of phylogeny, paying particular attention to the consequences of changes in certain evolutionary assumptions: relative rate of evolution in three different evolutionary contexts (phyletic, parent lineage, and daughter lineage); relative rate of evolution in different directions (novel forward, convergent forward, or reverse); variation of evolutionary rates; and topology of the phylogenetic tree. Except for variation of evolutionary rates, all the evolutionary parameters that we controlled had significant effects on accuracy of phylogenetic reconstructions. Unexpectedly, the topology of the phylogeny was the most important single factor affecting accuracy; some phylogenies are more readily estimated than others for simply historical reasons. We conclude that none of the three estimation methods is very accurate, that the differences in accuracy among them are rather small, and that historical effects (the branching pattern of a phylogeny) may outweigh biological effects in determining the accuracy with which a phylogeny can be reconstructed.  相似文献   

9.
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation.  相似文献   
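The variance-to-mean ratio mentioned above can be computed directly from substitution counts. The following is a minimal sketch, assuming one count per lineage for a single gene and using the sample variance; the function name and the example counts are hypothetical and are not data from the study.

```python
def dispersion_index(counts):
    """Variance-to-mean ratio of substitution counts.

    Under a Poisson process the expected ratio is 1; a ratio above 1
    indicates overdispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean substitution count is zero")
    # Sample variance (n - 1 in the denominator).
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

# Hypothetical substitution counts for one gene across four lineages.
clustered = dispersion_index([0, 0, 10, 10])  # well above 1: overdispersed
uniform = dispersion_index([5, 5, 5, 5])      # 0.0: no variation at all
```

This sketch ignores differences in branch length among lineages, which a real analysis would need to account for before comparing the ratio with the Poisson expectation.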

10.
Comparative analysis is a potentially powerful approach to study the effects of ecological traits on genetic variation and rate of evolution across species. However, the lack of suitable datasets means that comparative studies of correlates of genetic traits across an entire clade have been rare. Here, we use a large DNA-barcode dataset (5062 sequences) of water beetles to test the effects of species ecology and geographical distribution on genetic variation within species and rates of molecular evolution across species. We investigated species traits predicted to influence their genetic characteristics, such as surrogate measures of species population size, latitudinal distribution and habitat types, taking phylogeny into account. Genetic variation of cytochrome oxidase I in water beetles was positively correlated with occupancy (numbers of sites of species presence) and negatively with latitude, whereas substitution rates across species depended mainly on habitat types, and running water specialists had the highest rate. These results are consistent with theoretical predictions from nearly-neutral theories of evolution, and suggest that the comparative analysis using large databases can give insights into correlates of genetic variation and molecular evolution.  相似文献   

11.
Elevated substitution rates estimated from ancient DNA sequences   Total citations: 1, self-citations: 0, citations by others: 1
Ancient DNA sequences are able to offer valuable insights into molecular evolutionary processes, which are not directly accessible via modern DNA. They are particularly suitable for the estimation of substitution rates because their ages provide calibrating information in phylogenetic analyses, circumventing the difficult task of choosing independent calibration points. The substitution rates obtained from such datasets have typically been high, falling between the rates estimated from pedigrees and species phylogenies. Many of these estimates have been made using a Bayesian phylogenetic method that explicitly accommodates heterochronous data. Stimulated by recent criticism of this method, we present a comprehensive simulation study that validates its performance. For datasets of moderate size, it produces accurate estimates of rates, while appearing robust to assumptions about demographic history. We then analyse a large collection of 749 ancient and 727 modern DNA sequences from 19 species of animals, plants and bacteria. Our new estimates confirm that the substitution rates estimated from ancient DNA sequences are elevated above long-term phylogenetic levels.  相似文献   

12.
Much of the recent progress in understanding angiosperm phylogeny has been achieved using multi-gene or plastid genome datasets. However, it is largely unclear what size of dataset is required to achieve sufficient resolution. The ycf2 gene is the largest plastid gene in angiosperms and it was used as part of multigene datasets in several earlier investigations into angiosperm relationships. In this study, we show that the ycf2 gene alone can provide a generally well-supported phylogeny that is consistent with those inferred from the most comprehensive multigene or plastid genome datasets. The phylogenetic signal of the ycf2 gene is likely derived from the combination of its long sequence length and low rate of nucleotide substitution. The ycf2 gene may provide a low-cost alternative to comprehensive multigene or genome datasets for investigating angiosperm relationships.

13.
Miyazawa S 《PloS one》2011,6(12):e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. 
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.

14.
Evolutionary biologists since Darwin have been fascinated by differences in the rate of trait-evolutionary change across lineages. Despite this continued interest, we still lack methods for identifying shifts in evolutionary rates on the growing tree of life while accommodating uncertainty in the evolutionary process. Here we introduce a Bayesian approach for identifying complex patterns in the evolution of continuous traits. The method (auteur) uses reversible-jump Markov chain Monte Carlo sampling to more fully characterize the complexity of trait evolution, considering models that range in complexity from those with a single global rate to potentially ones in which each branch in the tree has its own independent rate. This newly introduced approach performs well in recovering simulated rate shifts and simulated rates for datasets nearing the size typical for comparative phylogenetic study (i.e., ≥64 tips). Analysis of two large empirical datasets of vertebrate body size reveals overwhelming support for multiple-rate models of evolution, and we observe exceptionally high rates of body-size evolution in a group of emydid turtles relative to their evolutionary background. auteur will facilitate identification of exceptional evolutionary dynamics, essential to the study of both adaptive radiation and stasis.

15.
Fossil taxa are critical to inferences of historical diversity and the origins of modern biodiversity, but realizing their evolutionary significance is contingent on restoring fossil species to their correct position within the tree of life. For most fossil species, morphology is the only source of data for phylogenetic inference; this has traditionally been analysed using parsimony, the predominance of which is currently challenged by the development of probabilistic models that achieve greater phylogenetic accuracy. Here, based on simulated and empirical datasets, we explore the relative efficacy of competing phylogenetic methods in terms of clade support. We characterize clade support using bootstrapping for parsimony and Maximum Likelihood, and intrinsic Bayesian posterior probabilities, collapsing branches that exhibit less than 50% support. Ignoring node support, Bayesian inference is the most accurate method in estimating the tree used to simulate the data. After assessing clade support, Bayesian and Maximum Likelihood exhibit comparable levels of accuracy, and parsimony remains the least accurate method. However, Maximum Likelihood is less precise than Bayesian phylogeny estimation, and Bayesian inference recaptures more correct nodes with higher support compared to all other methods, including Maximum Likelihood. We assess the effects of these findings on empirical phylogenies. Our results indicate probabilistic methods should be favoured over parsimony.  相似文献   

16.
Rates of recombination vary considerably between species. Despite the significance of this observation for evolutionary biology and genetics, the evolutionary mechanisms that contribute to these interspecific differences are unclear. On fine physical scales, recombination rates appear to evolve rapidly between closely related species, but the mode and tempo of recombination rate evolution on the broader scale is poorly understood. Here, we use phylogenetic comparative methods to begin to characterize the evolutionary processes underlying average genomic recombination rates in mammals. We document a strong phylogenetic effect in recombination rates, indicating that more closely related species tend to have more similar average rates of recombination. We demonstrate that this phylogenetic signal is not an artifact of errors in recombination rate estimation and show that it is robust to uncertainty in the mammalian phylogeny. Neutral evolutionary models present good fits to the data and we find no evidence for heterogeneity in the rate of evolution in recombination across the mammalian tree. These results suggest that observed interspecific variation in average genomic rates of recombination is largely attributable to the steady accumulation of neutral mutations over evolutionary time. Although single recombination hotspots may live and die on short evolutionary time scales, the strong phylogenetic signal in genomic recombination rates indicates that the pace of evolution on this scale may be considerably slower.  相似文献   

17.
Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer confidence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the efficiency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wide-ranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 protein-coding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required approximately 80% less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and the paraphyly of "S. undulatus" with respect to S. belli, S. cautus, and S. woodi. Parallel evolution of ecomorphs within "S. undulatus" has masked the actual number of species within this group. This evidence, along with convincing patterns of phylogeographic differentiation suggests "S. undulatus" represents at least four lineages that should be recognized as evolutionary species.  相似文献   

18.

Background  

The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species.  相似文献   

19.
Likelihood methods for detecting temporal shifts in diversification rates   Total citations: 8, self-citations: 0, citations by others: 8
Maximum likelihood is a potentially powerful approach for investigating the tempo of diversification using molecular phylogenetic data. Likelihood methods distinguish between rate-constant and rate-variable models of diversification by fitting birth-death models to phylogenetic data. Because model selection in this context is a test of the null hypothesis that diversification rates have been constant over time, strategies for selecting best-fit models must minimize Type I error rates while retaining power to detect rate variation when it is present. Here I examine model selection, parameter estimation, and power to reject the null hypothesis using likelihood models based on the birth-death process. The Akaike information criterion (AIC) has often been used to select among diversification models; however, I find that selecting models based on the lowest AIC score leads to a dramatic inflation of the Type I error rate. When appropriately corrected to reduce Type I error rates, the birth-death likelihood approach performs as well or better than the widely used gamma statistic, at least when diversification rates have shifted abruptly over time. Analyses of datasets simulated under a range of rate-variable diversification scenarios indicate that the birth-death likelihood method has much greater power to detect variation in diversification rates when extinction is present. Furthermore, this method appears to be the only approach available that can distinguish between a temporal increase in diversification rates and a rate-constant model with nonzero extinction. I illustrate use of the method by analyzing a published phylogeny for Australian agamid lizards.  相似文献   
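The model-selection step described above can be sketched with the standard definition AIC = 2k - 2 ln L. The abstract does not say how the correction for Type I error is applied, so the critical value below is an assumption: it stands in for a threshold that would have to be calibrated, for example by simulating phylogenies under the rate-constant null. All function names and numbers are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def delta_aic(ll_constant, k_constant, ll_variable, k_variable):
    """AIC of the rate-constant model minus AIC of the rate-variable model.

    Positive values favour the rate-variable model.
    """
    return aic(ll_constant, k_constant) - aic(ll_variable, k_variable)

def reject_rate_constancy(delta, critical_value):
    """Reject the null of constant rates only if delta exceeds the threshold.

    critical_value = 0 reproduces naive "lowest AIC wins" selection; a
    larger, calibrated value (an assumption here) lowers the Type I error.
    """
    return delta > critical_value

# Hypothetical log-likelihoods: rate-constant model with 1 parameter,
# rate-variable model with 2 parameters.
d = delta_aic(-100.0, 1, -98.5, 2)        # 202.0 - 201.0 = 1.0
naive = reject_rate_constancy(d, 0.0)     # True under lowest-AIC selection
strict = reject_rate_constancy(d, 4.0)    # False with a stricter threshold
```

The example shows how a small AIC advantage is enough to reject rate constancy under naive selection but not under a stricter threshold, which is the kind of inflation the abstract warns about.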

20.
《Genomics》2020,112(5):2970-2977
Here we determined mitogenomes of three Bostrichiformia species. These data were combined with 51 previously sequenced Polyphaga mitogenomes to explore the higher-level relationships within Polyphaga by using four different mitogenomic datasets and three tree inference approaches. Among Polyphaga mitogenomes we observed heterogeneity in nucleotide composition and evolutionary rates, which may have affected phylogenetic inferences across the different mitogenomic datasets. Elateriformia, Cucujiformia, and Scarabaeiformia were each inferred to be monophyletic by all analyses, as was Bostrichiformia by most analyses based on two datasets with low heterogeneity. The large series Staphyliniformia was never recovered as monophyletic in our analyses. The Bayesian tree using a degenerated nucleotide dataset (P123_Degen) and a site-heterogeneous mixture model in PhyloBayes was supported as the best Polyphaga phylogeny: (Scirtiformia, (Elateriformia, ((Bostrichiformia, Cucujiformia), (Scarabaeiformia + Staphyliniformia)))). For Cucujiformia, the largest series, we inferred a superfamily-level phylogeny: ((Cleroidea, Coccinelloidea), (Tenebrionoidea, (Cucujoidea + Curculionoidea + Chrysomeloidea))).  相似文献   
