Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
We develop a reversible jump Markov chain Monte Carlo approach to estimating the posterior distribution of phylogenies based on aligned DNA/RNA sequences under several hierarchical evolutionary models. Using a proper, yet nontruncated and uninformative prior, we demonstrate the advantages of the Bayesian approach to hypothesis testing and estimation in phylogenetics by comparing different models for the infinitesimal rates of change among nucleotides, for the number of rate classes, and for the relationships among branch lengths. We compare the relative probabilities of these models and the appropriateness of a molecular clock using Bayes factors. Our most general model, first proposed by Tamura and Nei, parameterizes the infinitesimal change probabilities among nucleotides (A, G, C, T/U) into six parameters, consisting of three parameters for the nucleotide stationary distribution, two rate parameters for nucleotide transitions, and another parameter for nucleotide transversions. Nested models include the Hasegawa, Kishino, and Yano model with equal transition rates and the Kimura model with a uniform stationary distribution and equal transition rates. To illustrate our methods, we examine simulated data, 16S rRNA sequences from 15 contemporary eubacteria, halobacteria, eocytes, and eukaryotes, 9 primates, and the entire HIV genome of 11 isolates. We find that the Kimura model is too restrictive, that the Hasegawa, Kishino, and Yano model can be rejected for some data sets, that there is evidence for more than one rate class and a molecular clock among similar taxa, and that a molecular clock can be rejected for more distantly related taxa.
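The six-parameter structure described in this abstract can be illustrated with a small sketch. It is not the authors' code: the nucleotide ordering, the function name `tn93_rate_matrix`, and the argument names are assumptions made here for illustration, and the matrix is left unscaled. Setting the two transition rates equal gives the Hasegawa, Kishino, and Yano form; additionally using uniform frequencies gives the Kimura form.

```python
# Nucleotide order used throughout this sketch (an assumption, not from the abstract).
BASES = "AGCT"
PURINES = {"A", "G"}


def tn93_rate_matrix(pi, alpha_purine, alpha_pyrimidine, beta):
    """Build a 4x4 Tamura-Nei style rate matrix as nested lists.

    pi               -- stationary frequencies in BASES order (sum to 1)
    alpha_purine     -- rate for A<->G transitions
    alpha_pyrimidine -- rate for C<->T transitions
    beta             -- rate for all transversions
    """
    if len(pi) != 4 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must hold four frequencies summing to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i == j:
                continue
            x_pur, y_pur = x in PURINES, y in PURINES
            if x_pur and y_pur:
                rate = alpha_purine        # purine transition
            elif not x_pur and not y_pur:
                rate = alpha_pyrimidine    # pyrimidine transition
            else:
                rate = beta                # transversion
            q[i][j] = rate * pi[j]
        # Diagonal makes each row sum to zero (q[i][i] is still 0.0 here).
        q[i][i] = -sum(q[i])
    return q


def hky_rate_matrix(pi, alpha, beta):
    """Nested case: both transition rates equal."""
    return tn93_rate_matrix(pi, alpha, alpha, beta)


def kimura_rate_matrix(alpha, beta):
    """Nested case: equal transition rates and uniform frequencies."""
    return tn93_rate_matrix([0.25] * 4, alpha, alpha, beta)
```

Because each off-diagonal entry is a symmetric rate multiplied by the target frequency, the matrix satisfies detailed balance with respect to `pi`, which is what makes these three models time reversible.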

2.
We investigated the tempo and mode of evolution of the primate T-lymphotropic viruses (PTLVs). Several different models of nucleotide substitution were tested on a general phylogenetic tree obtained using the 20 full-genome HTLV/STLV sequences available. The likelihood ratio test showed that the Tamura and Nei model with discrete gamma-distributed rates among sites is the best-fitting substitution model. The heterogeneity of nucleotide substitution rates along the PTLV genome was further investigated for different genes and at different codon positions (cdp's). Tests of rate constancy showed that different PTLV lineages evolve at different rates when first and second cdp's are considered, but the molecular-clock hypothesis holds for some PTLV lineages when the third cdp is used. Negative selection was evident throughout the genome. However, in the gp46 region, a small fragment subjected to positive selection was identified using a Monte Carlo simulation based on a likelihood method. Employing correlations of the virus divergence times with anthropologically documented migrations of their host, a possible timescale was estimated for each important node of the PTLV tree. The obtained results on these slow-evolving viruses could be used to fill gaps in the historical records of some of the host species. In particular, the HTLV-I/STLV-I history might suggest a simian migration from Asia to Africa not much earlier than 19,500-60,000 years ago.

3.
The general Markov model (GMM) of nucleotide substitution does not assume the evolutionary process to be stationary, reversible, or homogeneous. The GMM can be simplified by assuming the evolutionary process to be stationary. A stationary GMM is appropriate for analyses of phylogenetic data sets that are compositionally homogeneous; a data set is considered to be compositionally homogeneous if a statistical test does not detect significant differences in the marginal distributions of the sequences. Though the general time-reversible (GTR) model assumes stationarity, it also assumes reversibility and homogeneity. We propose two new stationary and nonhomogeneous models: one constrains the GMM to be reversible, whereas the other does not. The two models, coupled with the GTR model, comprise a set of nested models that can be used to test the assumptions of reversibility and homogeneity for stationary processes. The two models are extended to incorporate invariable sites and used to analyze a seven-taxon hominoid data set that displays compositional homogeneity. We show that within the class of stationary models, a nonhomogeneous model fits the hominoid data better than the GTR model. We note that if one considers a wider set of models that are not constrained to be stationary, then an even better fit can be obtained for the hominoid data. However, the methods for reducing model complexity from an extremely large set of nonstationary models are yet to be developed.

4.

Background  

Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial for simulating realistic data sets. Such simulation procedures are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed substantially among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters.

5.
The blind use of models of nucleotide substitution in evolutionary analyses is a common practice in the viral community. Typically, a simple model of evolution like the Kimura two-parameter model is used for estimating genetic distances and phylogenies, either because other authors have used it or because it is the default in various phylogenetic packages. Using two statistical approaches to model fitting, hierarchical likelihood ratio tests and the Akaike information criterion, we show that different viral data sets are better explained by different models of evolution. We demonstrate our results with the analysis of HIV-1 sequences from a hierarchy of samples: sequences within individuals, individuals within subtypes, and subtypes within groups. We also examine results for three different gene regions: gag, pol, and env. The Kimura two-parameter model was not selected as the best-fit model for any of these data sets, despite its widespread use in phylogenetic analyses of HIV-1 sequences. Furthermore, the model complexity increased with increasing sequence divergence. Finally, the molecular-clock hypothesis was rejected in most of the data sets analyzed, throwing into question clock-based estimates of divergence times for HIV-1. The importance of models in evolutionary analyses and their repercussions on the derived conclusions are discussed.
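The two model-fitting approaches named in this abstract can be sketched in a few lines. This is an illustration only, not the authors' procedure: the log-likelihoods and parameter counts in `fits` are invented placeholder numbers, the function names are hypothetical, and the p-value helper covers only the one-degree-of-freedom case (for example, one versus two transition rates).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic for a null model nested in an alternative."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def best_by_aic(fits):
    """Return the model name with the smallest AIC.

    fits maps a model name to (log_likelihood, n_free_parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical values for illustration only; they do not come from the abstract.
# Parameter counts cover the substitution model and omit branch lengths,
# which are shared by all models fitted on the same tree.
fits = {
    "K80": (-5230.4, 1),
    "HKY85": (-5180.2, 4),
    "TN93": (-5179.9, 5),
}

if __name__ == "__main__":
    for name, (lnl, k) in fits.items():
        print(name, round(aic(lnl, k), 1))
    print("best by AIC:", best_by_aic(fits))
    stat = lrt_statistic(fits["HKY85"][0], fits["TN93"][0])
    print("LRT HKY85 vs TN93:", round(stat, 3), "p =", round(chi2_sf_df1(stat), 3))
```

With these made-up numbers the extra transition-rate parameter does not pay for itself, so both criteria favor the simpler of the two nested models; real data sets can of course point the other way.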

6.
The Notothenioidei dominates the fish fauna of the Antarctic in both biomass and diversity. This clade exhibits adaptations related to metabolic function and freezing avoidance in the subzero Antarctic waters, and is characterized by a high degree of morphological and ecological diversity. Investigating the macroevolutionary processes that may have contributed to the radiation of notothenioid fishes requires a well-resolved phylogenetic hypothesis. To date, published molecular and morphological hypotheses of notothenioids are largely congruent; however, there are some areas of significant disagreement regarding higher-level relationships. Also, there are critical areas of the notothenioid phylogeny that are unresolved in both molecular and morphological phylogenetic analyses. Previous molecular phylogenetic analyses of notothenioids using partial mtDNA 12S and 16S rRNA sequence data have resulted in limited phylogenetic resolution and relatively low node support. One particularly controversial result from these analyses is the paraphyly of the Nototheniidae, the most diverse family in the Notothenioidei. It is unclear if the phylogenetic results from the 12S and 16S partial gene sequence dataset are due to limited character sampling, or if they reflect patterns of evolutionary diversification in notothenioids. We sequenced the complete mtDNA 16S rRNA gene for 43 notothenioid species, the largest sampling to date from all eight taxonomically recognized families. Phylogenetic analyses using both maximum parsimony and maximum likelihood resulted in well-resolved trees with most nodes supported with high bootstrap pseudoreplicate scores and significant Bayesian posterior probabilities. In all analyses the Nototheniidae was monophyletic. Shimodaira–Hasegawa tests were able to reject two hypotheses that resulted from prior morphological analyses.
However, despite substantial resolution and node support in the 16S rRNA trees, several phylogenetic hypotheses among closely related species and clades were not rejected. The inability to reject particular hypotheses among species in apical clades is likely due to the lower rate of nucleotide substitution in mtDNA rRNA genes relative to protein coding regions. Nevertheless, with the most extensive notothenioid taxon sampling to date, and the much greater phylogenetic resolution offered by the complete 16S rRNA sequences over the commonly used partial 12S and 16S gene dataset, it would be advantageous for future molecular investigations of notothenioid phylogenetics to utilize at a minimum the complete 16S rRNA gene dataset.

7.
Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.

8.
Here we present a model of nucleotide substitution in protein-coding regions that also encode the formation of conserved RNA structures. In such regions, apparent evolutionary context dependencies exist, both between nucleotides occupying the same codon and between nucleotides forming a base pair in the RNA structure. The overlap of these fundamental dependencies is sufficient to cause "contagious" context dependencies which cascade across many nucleotide sites. Such large-scale dependencies challenge the use of traditional phylogenetic models in evolutionary inference because they explicitly assume evolutionary independence between short nucleotide tuples. In our model we address this by replacing context dependencies within codons by annotation-specific heterogeneity in the substitution process. Through a general procedure, we fragment the alignment into sets of short nucleotide tuples based on both the protein coding and the structural annotation. These individual tuples are assumed to evolve independently, and the different tuple sets are assigned different annotation-specific substitution models shared between their members. This allows us to build a composite model of the substitution process from components of traditional phylogenetic models. We applied this to a data set of full-genome sequences from the hepatitis C virus where five RNA structures are mapped within the coding region. This allowed us to partition the effects of selection on different structural elements and to test various hypotheses concerning the relation of these effects. Of particular interest, we found evidence of a functional role of loop and bulge regions, as these were shown to evolve according to a different and more constrained selective regime than the nonpairing regions outside the RNA structures. Other potential applications of the model include comparative RNA structure prediction in coding regions and RNA virus phylogenetics.

9.
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.

10.
Tempo and mode of synonymous substitutions in mitochondrial DNA of primates   Cited 3 times in total (self-citations: 1; citations by others: 2)
Nucleotide substitutions of the four-fold degenerate sites and the total third codon positions of mitochondrial DNA from human, common chimpanzee, bonobo, gorilla, and orangutan were examined in detail by three alternative Markov models: (1) Hasegawa, Kishino, and Yano's (1985) model, (2) Tamura and Nei's (1993) model, and (3) the general reversible Markov model. These sites are expected to be relatively free from constraint, and therefore their tempo and mode in evolution should reflect those of mutation. It turned out that, among the alternative models, the general reversible Markov model best approximates the nucleotide substitutions of the four-fold degenerate sites and the total third codon positions, while the maximum likelihood estimates of the numbers of nucleotide substitutions along each branch do not differ significantly among the three models. It was further shown that the transition rate of these sites during evolution, and therefore the transitional mutation rate of mtDNA, is higher in humans than in chimpanzees and gorillas, probably by about a factor of two. However, the transversional mutation rate and the amino acid substitution rate do not differ significantly between humans and the African apes. These and additional observations suggest heterogeneity of the mutation rate as well as of the constraint operating on the mtDNA-encoded proteins among different lineages of Hominoidea.

11.
Markov models describing the evolution of the nucleotide substitution process, widely used in phylogeny reconstruction, usually assume the hypotheses of stationarity and time reversibility. Although these models give meaningful results when applied to biological data, it is not clear if the 2 assumptions mentioned above hold and, if not, how much sequence evolution processes deviate from them. To this end, we introduce 2 sets of indices that can be calculated from the nucleotide distribution and the substitution rates. The stationarity indices (STIs) can be used to test the validity of the equilibrium assumption. The irreversibility indices (IRIs) are derived from the Kolmogorov cycle conditions for time reversibility and quantify the degree of non-time-reversibility of a process. We have computed STIs and IRIs for the evolutionary process of 2 lineages, Drosophila simulans and Homo sapiens. In the latter case, we use a modified form of the indices that takes into account the CpG decay process. In both cases, we find statistically significant deviations from the ideal case of a process that has reached stationarity and is time reversible.
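The Kolmogorov cycle conditions mentioned in this abstract can be checked directly on a rate matrix. The sketch below is not the IRI defined by the authors, whose exact form the abstract does not give; `max_cycle_imbalance` is a hypothetical name for a crude measure that compares the product of rates around every three-state cycle with the product in the opposite direction. It assumes all off-diagonal rates are positive, in which case checking three-state cycles is sufficient.

```python
from itertools import permutations


def max_cycle_imbalance(q):
    """Largest violation of the three-state Kolmogorov cycle condition.

    q is a square rate matrix given as nested lists; only off-diagonal
    entries are used. A value of 0 (up to rounding) is consistent with
    time reversibility; a positive value indicates net circulation.
    """
    n = len(q)
    worst = 0.0
    for i, j, k in permutations(range(n), 3):
        forward = q[i][j] * q[j][k] * q[k][i]    # i -> j -> k -> i
        backward = q[i][k] * q[k][j] * q[j][i]   # i -> k -> j -> i
        worst = max(worst, abs(forward - backward))
    return worst


def reversible_rates(pi, s):
    """Off-diagonal rates q_ij = s_ij * pi_j with symmetric s (reversible by construction)."""
    n = len(pi)
    return [[0.0 if i == j else s[i][j] * pi[j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    pi = [0.1, 0.2, 0.3, 0.4]
    s = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
    print(max_cycle_imbalance(reversible_rates(pi, s)))
    # A matrix with a preferred direction around the cycle 0 -> 1 -> 2 -> 0.
    biased = [[0, 2, 1, 1], [1, 0, 2, 1], [1, 1, 0, 2], [1, 1, 1, 0]]
    print(max_cycle_imbalance(biased))
```

The helper `reversible_rates` is included only to generate a test case: any matrix built that way satisfies detailed balance and should score zero up to floating-point rounding.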

12.
The evolutionary patterns of hepatitis C virus (HCV), including the best-fitting nucleotide substitution model and the molecular clock hypothesis, were investigated by analyzing full-genome sequences available in the HCV database. The likelihood ratio test allowed us to discriminate among different evolutionary hypotheses. The phylogeny of the six major HCV types was accurately inferred, and the final tree was rooted by reconstructing the hypothetical HCV common ancestor with the maximum likelihood method. The presence of phylogenetic noise and the relative nucleotide substitution rates in the different HCV genes were also examined. These results offer a general guideline for the future of HCV phylogenetic analysis and also provide important insights on HCV origin and evolution. Received: 13 January 2001 / Accepted: 21 June 2001

13.
When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models. Correspondence to: M. Nei

14.
Recent molecular studies have incorporated the parametric bootstrap method to test a priori hypotheses when the results of molecular-based phylogenies are in conflict with these hypotheses. The parametric bootstrap requires the specification of a particular substitutional model, the parameters of which will be used to generate simulated, replicate DNA sequence data sets. It has been suggested both that (a) the method appears robust to changes in the model of evolution and, alternatively, that (b) as realistic a model of DNA substitution as possible should be used to avoid false rejection of a null hypothesis. Here we empirically evaluate the effect of suboptimal substitution models when testing hypotheses of monophyly with the parametric bootstrap using data sets of mtDNA cytochrome oxidase I and II (COI and COII) sequences for Macaronesian Calathus beetles, and mitochondrial 16S rDNA and nuclear ITS2 sequences for European Timarcha beetles. Whether a particular hypothesis of monophyly is rejected or accepted appears to be highly dependent on whether the nucleotide substitution model being used is optimal. It appears that a parameter-rich model is either equally or less likely to reject a hypothesis of monophyly where the optimal model is unknown. A comparison of the performance of the Kishino–Hasegawa (KH) test shows it is not as severely affected by the use of suboptimal models, and overall it appears to be a less conservative method with a higher rate of failure to reject null hypotheses.

15.
We develop here an analytical evolution model based on a 16 x 16 dinucleotide mutation matrix with six substitution parameters associated with the three types of substitutions in the two dinucleotide sites. It generalizes the previous models based on 4 x 4 nucleotide mutation matrices. It determines at some time t the exact occurrence probabilities of dinucleotides mutating randomly according to these six substitution parameters. Furthermore, several properties and two applications of this model allow the derivation of 16 evolutionary analytical solutions of dinucleotides and also a dinucleotide phylogenetic distance. Finally, based on this mathematical model, the SED (Stochastic Evolution of Dinucleotides) web server has been developed for deriving evolutionary analytical solutions of dinucleotides.

16.
Simplifying assumptions made in various tree reconstruction methods (notably rate constancy among nucleotide sites, homogeneity, and stationarity of the substitutional processes) are clearly violated when nucleotide sequences are used to infer distant relationships. Use of tree reconstruction methods based on such oversimplified assumptions can lead to misleading results, as pointed out by previous authors. In this paper, we made use of a (discretized) gamma distribution to account for variable rates of substitution among sites and built models that allowed for unequal base frequencies in different sequences. The models were nonhomogeneous Markov-process models, assuming different patterns of substitution in different parts of the tree. Data of the small-subunit rRNAs from four species were analyzed, where base frequencies were quite different among sequences and rates of substitution were highly variable at sites. Parameters in the models were estimated by maximum likelihood, and models were compared by the likelihood-ratio test. The nonhomogeneous models provided significantly better fit to the data than homogeneous models despite their involvement of many parameters. They also appeared to produce reasonable estimation of the phylogenetic tree; in particular, they seemed able to identify the root of the tree.

17.
18.
A mathematical theory for the evolutionary change of restriction endonuclease cleavage sites is developed, and the probabilities of various types of restriction-site changes are evaluated. A computer simulation is also conducted to study properties of the evolutionary change of restriction sites. These studies indicate that parsimony methods of constructing phylogenetic trees often make erroneous inferences about evolutionary changes of restriction sites unless the number of nucleotide substitutions per site is less than 0.01 for all branches of the tree. This introduces a systematic error in estimating the number of mutational changes for each branch and, consequently, in constructing phylogenetic trees. Therefore, parsimony methods should be used only in cases where nucleotide sequences are closely related. Reexamination of Ferris et al.'s data on restriction-site differences of mitochondrial DNAs does not support Templeton's conclusions regarding the phylogenetic tree for man and apes and the molecular clock hypothesis. Templeton's claim that Nei and Li's method of estimating the number of nucleotide substitutions per site is seriously affected by parallel losses and loss-gains of restriction sites is also unsupported.

19.
The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made.

20.
Miyazawa S. PLoS ONE, 2011, 6(12): e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals.
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
