Found 20 similar articles (search time: 15 ms)
1.
Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable mixture model, which assumes that a proportion of sites in the sequence are completely resistant to change, while substitution rates at the remaining sites are gamma-distributed. For data sampled at the intraspecific level, however, biological assumptions involved in the invariable-sites model are commonly violated. We examined the use of these models in analyses of five intraspecific data sets. We show that using 6–10 rate categories for the discrete gamma distribution of rates among sites is sufficient to provide a good approximation of the marginal likelihood. Increasing the number of gamma rate categories did not have a substantial effect on estimates of the substitution rate or coalescence time, unless rates varied strongly among sites in a non-gamma-distributed manner. The assumption of a proportion of invariable sites provided a better approximation of the asymptotic marginal likelihood when the number of gamma categories was small, but had minimal impact on estimates of rates and coalescence times. However, the estimated proportion of invariable sites was highly susceptible to changes in the number of gamma rate categories. The concurrent use of gamma and invariable-site models for intraspecific data is not biologically meaningful and has been challenged on statistical grounds; here we have found that the assumption of a proportion of invariable sites has no obvious impact on Bayesian estimates of rates and timescales from intraspecific data.
2.
The hepatitis B virus (HBV) has a circular DNA genome of about 3,200 base pairs. Economical use of the genome with overlapping reading frames may have led to severe constraints on nucleotide substitutions along the genome and to highly variable rates of substitution among nucleotide sites. Nucleotide sequences from 13 complete HBV genomes were compared to examine such variability of substitution rates among sites and to examine the phylogenetic relationships among the HBV variants. The maximum likelihood method was employed to fit models of DNA sequence evolution that can account for the complexity of the pattern of nucleotide substitution. Comparison of the models suggests that the rates of substitution are different in different genes and codon positions; for example, the third codon position changes at a rate over ten times higher than the second position. Furthermore, substantial variation of substitution rates was detected even after the effects of genes and codon positions were corrected; that is, rates are different at different sites of the same gene or at the same codon position. Such rates after the correction were also found to be positively correlated at adjacent sites, which indicated the existence of conserved and variable domains in the proteins encoded by the viral genome. A multiparameter model validates the earlier finding that the variation in nucleotide conservation is not random around the HBV genome. The test for the existence of a molecular clock suggests that substitution rates are more or less constant among lineages. The phylogenetic relationships among the viral variants were examined. Although the data do not seem to contain sufficient information to resolve the details of the phylogeny, it appears quite certain that the serotypes of the viral variants do not reflect their genetic relatedness.
Correspondence to: Z. Yang
3.
Rogers JS 《Systematic biology》2001,50(5):713-722
Maximum likelihood estimation of phylogenetic trees from nucleotide sequences is completely consistent when nucleotide substitution is governed by the general time reversible (GTR) model with rates that vary over sites according to the invariable sites plus gamma (I + gamma) distribution.
4.
Models of amino acid substitution and applications to mitochondrial protein evolution (total citations: 28; self-citations: 20; citations by others: 8)
Models of amino acid substitution were developed and compared using maximum
likelihood. Two kinds of models are considered. "Empirical" models do not
explicitly consider factors that shape protein evolution, but attempt to
summarize the substitution pattern from large quantities of real data.
"Mechanistic" models are formulated at the codon level and separate
mutational biases at the nucleotide level from selective constraints at the
amino acid level. They account for features of sequence evolution, such as
transition-transversion bias and base or codon frequency biases, and make
use of physicochemical distances between amino acids to specify
nonsynonymous substitution rates. A general approach is presented that
transforms a Markov model of codon substitution into a model of amino acid
replacement. Protein sequences from the entire mitochondrial genomes of 20
mammalian species were analyzed using different models. The mechanistic
models were found to fit the data better than empirical models derived from
large databases. Both the mutational distance between amino acids
(determined by the genetic code and mutational biases such as the
transition-transversion bias) and the physicochemical distance are found to
have strong effects on amino acid substitution rates. A significant
proportion of amino acid substitutions appeared to have involved more than
one codon position, indicating that nucleotide substitutions at neighboring
sites may be correlated. Rates of amino acid substitution were found to be
highly variable among sites.
5.
Hyracoids have been allied with either perissodactyls or tethytheres (i.e., Proboscidea + Sirenia) based on morphological data. The latter hypothesis, termed Paenungulata, is corroborated by numerous molecular studies. However, molecular studies have failed to support Tethytheria, a group that is supported by morphological data. We examined relationships among living paenungulate orders using a multigene data set that included sequences from four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes (aquaporin, A2AB, IRBP, vWF). Nineteen maximum-likelihood models were employed, including models with process partitions for base composition and substitution parameterizations. With the inclusion of partitions with a heterogeneous base composition, 18 of 19 models favored Hyracoidea + Sirenia. All 19 models favored Hyracoidea + Sirenia after excluding heterogeneous base composition partitions. Most of the support for Hyracoidea + Sirenia derived from the mitochondrial genes (bootstrap support ranged from 51 to 99%); Tethytheria, in turn, received 0 to 19% support in different analyses. Bootstrap support deriving from the nuclear genes was more evenly split among the competing hypotheses (3 to 45% for Tethytheria; 17.5 to 62% for Hyracoidea + Sirenia). Lineage-specific rate variation among both mitochondrial and nuclear genes may contribute to the different results that were obtained with mitochondrial versus nuclear data. Whether Tethytheria or a competing hypothesis is correct, short internodes on the molecular phylogenies suggest that paenungulate orders diverged from each other over a 5- to 8-million-year time window extending from the late Paleocene into the early Eocene. We also used likelihood-ratio tests to compare different models of sequence evolution. A gamma distribution of rates results in a greater improvement in likelihood scores than does an allowance for invariant sites. 
Twenty-one rate partitions corresponding to stems, loops, and codon positions of different genes result in higher likelihood scores than a gamma distribution of rates and/or an allowance for invariant sites. Process partitions of the data that incorporate base composition and substitution parameterizations result in significant improvements in likelihood scores in comparison to models that allow only for relative rate differences among partitions.
6.
Z. Yang 《Genetics》1995,139(2):993-1005
We describe a model for the evolution of DNA sequences by nucleotide substitution, whereby nucleotide sites in the sequence evolve over time, whereas the rates of substitution are variable and correlated over sites. The temporal process used to describe substitutions between nucleotides is a continuous-time Markov process, with the four nucleotides as the states. The spatial process used to describe variation and dependence of substitution rates over sites is based on a serially correlated gamma distribution, i.e., an auto-gamma model assuming Markov-dependence of rates at adjacent sites. To achieve computational efficiency, we use several equal-probability categories to approximate the gamma distribution, and the result is an auto-discrete-gamma model for rates over sites. Correlation of rates at sites then is modeled by the Markov chain transition of rates at adjacent sites from one rate category to another, the states of the chain being the rate categories. Two versions of nonparametric models, which place no restrictions on the distributional forms of rates for sites, also are considered, assuming either independence or Markov dependence. The models are applied to data of a segment of mitochondrial genome from nine primate species. Model parameters are estimated by the maximum likelihood method, and models are compared by the likelihood ratio test. Tremendous variation of rates among sites in the sequence is revealed by the analyses, and when rate differences for different codon positions are appropriately accounted for in the models, substitution rates at adjacent sites are found to be strongly (positively) correlated. Robustness of the results to uncertainty of the phylogenetic tree linking the species is examined.
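The equal-probability discretization mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a gamma distribution with mean 1 (shape and rate both equal to alpha), represents each category by its mean rate (the median is another possible choice), and uses only the Python standard library. The function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)) * total


def gamma_quantile(p, shape, rate):
    """Quantile of a gamma(shape, rate) distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi * rate) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid * rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate in each of k equal-probability categories (overall mean 1)."""
    # Category boundaries are the i/k quantiles of gamma(alpha, alpha).
    cuts = [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # E[r; r < c] for gamma(alpha, alpha) equals P(alpha + 1, alpha * c).
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    # Each category has probability 1/k, so its mean is k times the partial expectation.
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


rates = discrete_gamma_rates(alpha=0.5, k=4)
print([round(r, 4) for r in rates])
```

Because every category carries probability 1/k, the rates average to 1 by construction, so the discretization leaves the overall substitution rate unchanged.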
7.
The relative rates of nucleotide substitution at synonymous and nonsynonymous sites within protein-coding regions have been widely used to infer the action of natural selection from comparative sequence data. It is known, however, that mutational and repair biases can affect rates of evolution at both synonymous and nonsynonymous sites. More importantly, it is also known that synonymous sites are particularly prone to the effects of nucleotide bias. This means that nucleotide biases may affect the calculated ratio of substitution rates at synonymous and nonsynonymous sites. Using a large data set of animal mitochondrial sequences, we demonstrate that this is, in fact, the case. Highly biased nucleotide sequences are characterized by significantly elevated dN/dS ratios, but only when the nucleotide frequencies are not taken into account. When the analysis is repeated taking the nucleotide frequencies at each codon position into account, such elevated ratios disappear. These results suggest that the recently reported differences in dN/dS ratios between vertebrate and invertebrate mitochondrial sequences could be explained by variations in mitochondrial nucleotide frequencies rather than the effects of positive Darwinian selection.
8.
Approximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rates among sites (total citations: 13; self-citations: 6; citations by others: 7)
We propose two approximate methods (one based on parsimony and one on
pairwise sequence comparison) for estimating the pattern of nucleotide
substitution and a parsimony-based method for estimating the gamma
parameter for variable substitution rates among sites. The matrix of
substitution rates that represents the substitution pattern can be
recovered through its relationship with the observable matrix of site
pattern frequencies in pairwise sequence comparisons. In the parsimony
approach, the ancestral sequences reconstructed by the parsimony algorithm
were used, and the two sequences compared are those at the ends of a branch
in the phylogenetic tree. The method for estimating the gamma parameter was
based on a reinterpretation of the numbers of changes at sites inferred by
parsimony. Three data sets were analyzed to examine the utility of the
approximate methods compared with the more reliable likelihood methods. The
new methods for estimating the substitution pattern were found to produce
estimates quite similar to those obtained from the likelihood analyses. The
new method for estimating the gamma parameter was effective in reducing the
bias in conventional parsimony estimates, although it also overestimated
the parameter. The approximate methods are computationally very fast and
appear useful for analyzing large data sets, for which use of the
likelihood method requires excessive computation.
9.
Ziheng Yang 《Journal of molecular evolution》1996,42(5):587-596
Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of
multiple genes) from the same set of species. The models account for different aspects of the heterogeneity in the evolutionary
process of different genes, such as differences in nucleotide frequencies, in substitution rate bias (for example, the transition/transversion
rate bias), and in the extent of rate variation across sites. Model parameters were estimated by maximum likelihood and the
likelihood ratio test was used to test hypotheses concerning sequence evolution, such as rate constancy among lineages (the
assumption of a molecular clock) and proportionality of branch lengths for different genes. The example data from a segment
of the mitochondrial genome of six hominoid species (human, common and pygmy chimpanzees, gorilla, orangutan, and siamang)
were analyzed. Nucleotides at the three codon positions in the protein-coding regions and from the tRNA-coding regions were
considered heterogeneous data sets. Statistical tests showed that the amount of evolution in the sequence data reflected in
the estimated branch lengths can be explained by the codon-position effect and lineage effect of substitution rates. The assumption
of a molecular clock could not be rejected when the data were analyzed separately or when the rate variation among sites was
ignored. However, significant differences in substitution rate among lineages were found when the data sets were combined
and when the rate variation among sites was accounted for in the models. Under the assumption that the orangutan and African
apes diverged 13 million years ago, the combined analysis of the sequence data estimated the times for the human-chimpanzee
separation and for the separation of the gorilla as 4.3 and 6.8 million years ago, respectively.
10.
Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites (total citations: 23; self-citations: 9; citations by others: 14)
This paper presents a maximum likelihood approach to estimating the
variation of substitution rate among nucleotide sites. We assume that the
rate varies among sites according to an invariant+gamma distribution, which
has two parameters: the gamma parameter alpha and the proportion of
invariable sites theta. Theoretical treatments on three, four, and five
sequences have been conducted, and computer programs have been developed. It
is shown that rho = (1 + theta alpha)/(1 + alpha) is a good measure for the
rate heterogeneity among sites. Extensive simulations show that (1) if the
proportion of invariable sites is negligible, i.e., theta = 0, the gamma
parameter alpha can be satisfactorily estimated, even with three sequences;
(2) if the proportion of invariable sites is not negligible, the
heterogeneity rho can still be suitably estimated with four or more
sequences; and (3) the distances estimated by the proposed method are
almost unbiased and are robust against violation of the assumption of the
invariant + gamma distribution.
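
The heterogeneity measure quoted in this abstract is easy to evaluate directly. The snippet below is a small sketch that only transcribes the stated formula rho = (1 + theta alpha)/(1 + alpha); the function name and the input checks are illustrative assumptions, not part of the paper.

```python
def rate_heterogeneity(alpha, theta):
    """rho = (1 + theta * alpha) / (1 + alpha), as stated in the abstract.

    alpha: gamma shape parameter (must be positive)
    theta: proportion of invariable sites (between 0 and 1)
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return (1.0 + theta * alpha) / (1.0 + alpha)


# With no invariable sites the measure reduces to 1 / (1 + alpha).
print(rate_heterogeneity(alpha=1.0, theta=0.0))
# A nonzero proportion of invariable sites raises rho for the same alpha.
print(rate_heterogeneity(alpha=1.0, theta=0.5))
```

With theta = 0 the value falls as alpha grows, which matches the usual reading that a small gamma shape parameter means strong rate variation among sites.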
11.
Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA (total citations: 19; self-citations: 0; citations by others: 19)
John Wakeley 《Journal of molecular evolution》1993,37(6):613-623
More than an order of magnitude difference in substitution rate exists among sites within hypervariable region 1 of the control region of human mitochondrial DNA. A two-rate Poisson mixture and a negative binomial distribution are used to describe the distribution of the inferred number of changes per nucleotide site in this region. When three data sets are pooled, however, the two-rate model cannot explain the data. The negative binomial distribution always fits, suggesting that substitution rates are approximately gamma distributed among sites. Simulations presented here provide support for the use of a biased, yet commonly employed, method of examining rate variation. The use of parsimony in the method to infer the number of changes at each site introduces systematic errors into the analysis. These errors preclude an unbiased quantification of variation in substitution rate but make the method conservative overall. The method can be used to distinguish sites with highly elevated rates, and 29 such sites are identified in hypervariable region 1. Variation does not appear to be clustered within this region. Simulations show that biases in rates of substitution among nucleotides and non-uniform base composition can mimic the effects of variation in rate among sites. However, these factors contribute little to the levels of rate variation observed in hypervariable region 1.
12.
A. Rzhetsky 《Genetics》1995,141(2):771-783
A model is introduced describing nucleotide substitution in ribosomal RNA (rRNA) genes. In this model, substitution in the stem and loop regions of rRNA is modeled with 16- and four-state continuous time Markov chains, respectively. The mean substitution rates at nucleotide sites are assumed to follow gamma distributions that are different for the two types of regions. The simplest formulation of the model allows for explicit expressions for transition probabilities of the Markov processes to be found. These expressions were used to analyze several 16S-like rRNA genes from higher eukaryotes with the maximum likelihood method. Although the observed proportion of invariable sites was only slightly higher in the stem regions, the estimated average substitution rates in the stem regions were almost two times as high as in the loop regions. Therefore, the degree of site heterogeneity of substitution rates in the stem regions seems to be higher than in the loop regions of animal 16S-like rRNAs due to the presence of a few rapidly evolving sites. The model appears to be helpful in understanding the regularities of nucleotide substitution in rRNAs and probably in minimizing errors in recovering phylogeny for distantly related taxa from these genes.
13.
Zhang W Bouffard GG Wallace SS Bond JP;NISC Comparative Sequencing Program 《Journal of molecular evolution》2007,65(3):207-214
It is understood that DNA and amino acid substitution rates are highly sequence context-dependent; for example, C → T substitutions in vertebrates may occur much more frequently at CpG sites, and cysteine substitution rates may depend on whether the context supports participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates are reported for the 96 classes of mutations of the form 5′ αβγ 3′ → 5′ αδγ 3′, where α, β, γ, and δ are nucleotides and β ≠ δ, based on maximum likelihood calculations. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. We find that despite qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
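The count of 96 context classes can be reproduced by enumeration. There are 4 × 4 × 4 × 3 = 192 ordered combinations of flanking bases, original base, and replacement base; one way to arrive at 96 is to treat a mutation and its reverse complement on the opposite strand as the same class. That pairing is an assumption made for this sketch, since the abstract does not say how the classes were defined, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_classes():
    """Enumerate trinucleotide mutation classes, merging strand-equivalent pairs."""
    classes = set()
    for left, old, right, new in product(BASES, repeat=4):
        if old == new:
            continue  # not a substitution
        before = left + old + right
        after = left + new + right
        # The same event read from the opposite strand.
        partner = (reverse_complement(before), reverse_complement(after))
        # Keep one canonical representative per strand-equivalent pair.
        classes.add(min((before, after), partner))
    return sorted(classes)


classes = context_classes()
print(len(classes))
print(classes[:3])
```

No nucleotide is its own complement, so no mutation is its own strand partner and the 192 combinations split cleanly into 96 pairs.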
14.
Simplifying assumptions made in various tree reconstruction methods--
notably rate constancy among nucleotide sites, homogeneity, and
stationarity of the substitutional processes--are clearly violated when
nucleotide sequences are used to infer distant relationships. Use of tree
reconstruction methods based on such oversimplified assumptions can lead to
misleading results, as pointed out by previous authors. In this paper, we
made use of a (discretized) gamma distribution to account for variable
rates of substitution among sites and built models that allowed for unequal
base frequencies in different sequences. The models were nonhomogeneous
Markov-process models, assuming different patterns of substitution in
different parts of the tree. Data of the small-subunit rRNAs from four
species were analyzed, where base frequencies were quite different among
sequences and rates of substitution were highly variable at sites.
Parameters in the models were estimated by maximum likelihood, and models
were compared by the likelihood-ratio test. The nonhomogeneous models
provided significantly better fit to the data than homogeneous models
despite their involvement of many parameters. They also appeared to produce
reasonable estimation of the phylogenetic tree; in particular, they seemed
able to identify the root of the tree.
15.
Atsushi Katafuchi Akira Sassa Naoko Niimi Petr Grúz Hirofumi Fujimoto Chikahide Masutani Fumio Hanaoka Toshihiro Ohta Takehiko Nohmi 《Nucleic acids research》2010,38(3):859-867
Oxidized DNA precursors can cause mutagenesis and carcinogenesis when they are incorporated into the genome. Some human Y-family DNA polymerases (Pols) can effectively incorporate 8-oxo-dGTP, an oxidized form of dGTP, into a position opposite a template dA. This inappropriate G:A pairing may lead to transversions of A to C. To gain insight into the mechanisms underlying erroneous nucleotide incorporation, we changed amino acids in human Polη and Polκ proteins that might modulate their specificity for incorporating 8-oxo-dGTP into DNA. We found that Arg61 in Polη was crucial for erroneous nucleotide incorporation. When Arg61 was substituted with lysine (R61K), the ratio of pairing of dA to 8-oxo-dGTP compared to pairing of dC was reduced from 660:1 (wild-type Polη) to 7:1 (R61K). Similarly, Tyr112 in Polκ was crucial for erroneous nucleotide incorporation. When Tyr112 was substituted with alanine (Y112A), the ratio of pairing was reduced from 11:1 (wild-type Polκ) to almost 1:1 (Y112A). Interestingly, substitution at the corresponding position in Polη, i.e. Phe18 to alanine, did not alter the specificity. These results suggested that amino acids at distinct positions in the active sites of Polη and Polκ might enhance 8-oxo-dGTP to favor the syn conformation, and thus direct its misincorporation into DNA.
16.
The Amount of DNA Polymorphism Maintained in a Finite Population When the Neutral Mutation Rate Varies among Sites (total citations: 18; self-citations: 4; citations by others: 14)
F. Tajima 《Genetics》1996,143(3):1457-1465
The expectations of the average number of nucleotide differences per site (π), the proportion of segregating sites (s), the minimum number of mutations per site (s*) and some other quantities were derived under the finite site models with and without rate variation among sites, where the finite site models include Jukes and Cantor's model, the equal-input model and Kimura's model. As a model of rate variation, the gamma distribution was used. The results indicate that if the distribution parameter α is small, the effect of rate variation on these quantities is substantial, so that estimates of θ based on the infinite site model are substantial underestimates, where θ = 4Nv, N is the effective population size and v is the mutation rate per site per generation. New methods for estimating θ are also presented, which are based on the finite site models with and without rate variation. Using these methods, underestimation can be corrected.
17.
Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species (total citations: 5; self-citations: 0; citations by others: 5)
Spady TC Seehausen O Loew ER Jordan RC Kocher TD Carleton KL 《Molecular biology and evolution》2005,22(6):1412-1422
Cichlid fish inhabit a diverse range of environments that vary in the spectral content of light available for vision. These differences should result in adaptive selective pressure on the genes involved in visual sensitivity, the opsin genes. This study examines the evidence for differential adaptive molecular evolution in East African cichlid opsin genes due to gross differences in environmental light conditions. First, we characterize the selective regime experienced by cichlid opsin genes using a likelihood ratio test format, comparing likelihood models with different constraints on the relative rates of amino acid substitution across sites. Second, we compare turbid and clear lineages to determine if there is evidence of differences in relative rates of substitution. Third, we present evidence of functional diversification and its relationship to the photic environment among cichlid opsin genes. We report statistical evidence of positive selection in all cichlid opsin genes, except short wavelength-sensitive 1 and short wavelength-sensitive 2b. In all genes predicted to be under positive selection, except short wavelength-sensitive 2a, we find differences in selective pressure between turbid and clear lineages. Potential spectral tuning sites are variable among all cichlid opsin genes; however, patterns of substitution consistent with photic environment-driven evolution of opsin genes are observed only for short wavelength-sensitive 1 opsin genes. This study identifies a number of promising candidate-tuning sites for future study by site-directed mutagenesis. This work also begins to demonstrate the molecular evolutionary dynamics of cichlid visual sensitivity and its relationship to the photic environment.
18.
19.
Molecular divergence and phylogeny: rates and patterns of cytochrome b evolution in cranes (total citations: 6; self-citations: 0; citations by others: 6)
Analyses of complete cytochrome b sequences from all species of cranes
(Aves: Gruidae) reveal aspects of sequence evolution in the early stages of
divergence. These DNA sequences are ≥ 89% identical, but expected
departures from random substitution are evident. Silent, third- position
pyrimidine transitions are the dominant substitution type, with
transversions comprising only a small fraction of sequence differences.
Substitution patterns are not clearly manifested until divergence has
reached a moderate level (> 3%), as expected for a stochastic process.
Variation in the frequency of mismatch types among lineages decreases at
larger divergences, but the level of bias does not decay. Divergence varies
up to fivefold among gene regions but is not correlated with structural
domain. All protein structural domains except extramembrane 4 display <
20% variable residues. Regions corresponding to putative functional domains
show the expected conservation of amino acids, although the C-terminal
portion of the Q0 reaction center displays several nonconservative
replacements. Phylogenetic analyses incorporating substitution asymmetries
produced mixed results. Distances estimated with multiple parameters
(transition, codon-position, composition, and pyrimidine-transition biases)
yielded identical additive tree topologies with comparable bootstrap
values, all consistent with uncontroversial species relationships. Maximum
likelihood analysis incorporating these biases, as well as equally weighted
parsimony analysis, produced similar results. Static, differential
weighting for parsimony did not improve the phylogenetic signal but
produced unusual trees with low bootstraps. The overall rate of nucleotide
substitution varies slightly but significantly among cranes, and
calibration of distances against fossil dates suggests divergence rates of
0.7%-1.7% per million years.
20.
A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes (total citations: 78; self-citations: 7; citations by others: 71)
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implications of our results for the mechanisms of DNA evolution, and those of the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction, are discussed.
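The site classification described in this abstract can be illustrated with the standard genetic code. The sketch below counts, for each codon position, how many of the three possible single-base changes leave the amino acid unchanged: none means nondegenerate, all three means fourfold degenerate, and anything in between is labelled twofold. Lumping the isoleucine third position (two synonymous changes) into the twofold class is a simplifying assumption here, and the function name is hypothetical; the paper's substitution-counting steps are not reproduced.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" marks stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon):
    """Label each of the three positions of a codon by its degeneracy class."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    labels = []
    for pos in range(3):
        # Count single-base changes at this position that keep the amino acid.
        synonymous = sum(
            CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == original
            for base in BASES
            if base != codon[pos]
        )
        if synonymous == 0:
            labels.append("nondegenerate")
        elif synonymous == 3:
            labels.append("fourfold")
        else:
            labels.append("twofold")  # includes the threefold isoleucine case
    return labels


for example in ("GGT", "TTT", "CTA", "ATG"):
    print(example, site_degeneracy(example))
```

Changes that turn a sense codon into a stop codon are counted as nonsynonymous in this sketch, which is one reasonable convention among several.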