Similar documents
1.
Unbiased estimation of the rates of synonymous and nonsynonymous substitution (total citations: 39; self-citations: 0; citations by others: 39)
The current convention in estimating the number of substitutions per synonymous site (KS) and per nonsynonymous site (KA) between two protein-coding genes is to count each twofold degenerate site as one-third synonymous and two-thirds nonsynonymous, because one of the three possible changes at such a site is synonymous and the other two are nonsynonymous. This counting rule can considerably overestimate the KS value, because transitional mutations tend to occur more often than transversional mutations and most transitional mutations at twofold degenerate sites are synonymous. A new method that gives unbiased estimates is proposed. Applying the new and the old methods to 14 pairs of mouse and rat genes shows that the new method gives a KS value very close to the number of substitutions per fourfold degenerate site, whereas the old method gives a value 30% higher. Both methods give a KA value close to the number of substitutions per nondegenerate site.
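To make the criticized convention concrete, here is a minimal sketch of uniform site counting: each codon position is credited with the fraction of its three possible point changes that are synonymous, so a twofold degenerate third position counts as one-third synonymous whether or not the synonymous change is a transition. This is the conventional rule, not Li's unbiased method; counting changes to stop codons as nonsynonymous is our assumption (some implementations exclude them instead).

```python
# Standard genetic code built from the usual TCAG ordering
# (first codon position varies slowest). "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon under the uniform (1/3 per change) rule.

    Changes that produce a stop codon are treated as nonsynonymous here.
    The nonsynonymous site count is 3 minus this value.
    """
    aa = CODE[codon]
    total = 0.0
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                same += 1
        total += same / 3
    return total


# TTT (Phe): only the third-position change to TTC is synonymous, so S = 1/3.
print(synonymous_sites("TTT"))
```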

2.
A model of sequence evolution is presented on the basis of which one can analyze combinations of noncoding, singly coding, and multiply coding regions of aligned homologous DNA sequences. It generalizes the transition-transversion models of Kimura (J. Mol. Evol. 16:111–120, 1980) and Li et al. (J. Mol. Evol. 36:96–99, 1985) by adding selection on replacement substitutions. Based on a hierarchy of hypotheses, selection factors and transition and transversion distances can be estimated for different combinations of regions, ranging from many regions, each with its own set of parameters, to a single set of parameters for all regions. The method is demonstrated on two aligned HIV I retroviruses. Correspondence to: J. Hein

3.
The kinetics of synonymous codon change and species divergence is described in a matrix formalism that applies equally to all levels of codon degeneracy and all levels of codon or nucleotide bias. Using this formalism, the sum of all the synonymous substitution rate constants can be calculated from the observed sequence differences between two species. This sum, the relaxation rate, is equivalent to the LogDet transformation recently proposed as a new measure of evolutionary distance (Lockhart et al., Mol. Biol. Evol. 11(4): 605–612, 1994). The relationship between this measure and the average number of base changes per site (K) is discussed. The formalism is tested on several sets of simulated sequence-divergence data.
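For reference, a minimal sketch of one commonly used form of the LogDet (paralinear) distance, d = -1/4 [ln det F - 1/2 (ln det Π_x + ln det Π_y)], where F is the 4 × 4 matrix of paired-base frequencies and Π_x, Π_y are diagonal matrices of the base frequencies in each sequence. The normalization constants are our assumption; the abstract's relaxation-rate formalism itself is not reproduced here, and the sequences below are hypothetical.

```python
import math

BASES = "ACGT"


def determinant(matrix: list[list[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(x: str, y: str) -> float:
    """LogDet distance between two aligned, gap-free DNA sequences."""
    n = len(x)
    f = [[0.0] * 4 for _ in range(4)]  # f[i][j]: fraction of sites with x=i, y=j
    for a, b in zip(x, y):
        f[BASES.index(a)][BASES.index(b)] += 1 / n
    pi_x = [sum(row) for row in f]        # base frequencies in x
    pi_y = [sum(col) for col in zip(*f)]  # base frequencies in y
    det_f = determinant(f)
    if det_f <= 0 or min(pi_x + pi_y) == 0:
        raise ValueError("LogDet is undefined for these sequences")
    log_det_pi = sum(map(math.log, pi_x)) + sum(map(math.log, pi_y))
    return -0.25 * (math.log(det_f) - 0.5 * log_det_pi)


x = "ACGT" * 10
y = "G" + x[1:5] + "T" + x[6:]  # hypothetical: one A->G and one C->T change
print(logdet_distance(x, y))
```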

4.
Molecular evolutionary analyses were carried out to elucidate the phylogenetic relationships, the evolutionary rate, and the divergence times of hepatitis C viruses. Using the nucleotide sequences of viruses isolated from various locations in the world, we constructed phylogenetic trees. The trees showed that strains isolated from a single location did not necessarily cluster as a group, suggesting that the viruses may be transferred with blood on a worldwide scale. We estimated the evolutionary rates at synonymous and nonsynonymous sites for all genes in the viral genome and found that the synonymous rate for the C gene (1.35 × 10⁻³ per site per year) was much smaller than those for the other genes (e.g., 6.29 × 10⁻³ per site per year for the E gene). This indicates that a special type of functional constraint on synonymous substitutions may exist in the C gene. Because we found an open reading frame (ORF) within the C gene region, synonymous substitutions in the C gene may be constrained by this overlapping ORF, whose reading frame differs from that of the C gene. Applying the evolutionary rates to the trees, we also suggest that the major groups of hepatitis C viruses diverged from their common ancestor several hundred years ago. Correspondence to: T. Gojobori

5.
Estimation of evolutionary distances between nucleotide sequences (total citations: 11; self-citations: 0; citations by others: 11)
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was carried out in terms of a Markov process. Using matrix algebra, a theoretical foundation was provided for the methods of Barry and Hartigan (Stat. Sci. 2:191–210, 1987) and Lanave et al. (J. Mol. Evol. 20:86–93, 1984). Extensive computer simulations were used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. The multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) were shown to be preferable to others for phylogenetic analysis when the sequences are long. However, when the sequences are short and the evolutionary distance is large, the method of Tajima and Nei (Mol. Biol. Evol. 1:269–285, 1984) is superior to the others.
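A minimal sketch of the Tajima and Nei (1984) distance singled out above for short, divergent sequences, written from the formula as it is commonly stated: d = -b ln(1 - p/b), with b = 1/2 [1 - Σ g_i² + p²/h] and h = Σ_{i<j} x_ij² / (2 g_i g_j). Here p is the proportion of differing sites, g_i the frequency of base i pooled over both sequences, and x_ij the proportion of sites showing the base pair i/j. Gap handling and variance estimation are omitted, and the sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def tajima_nei_distance(x: str, y: str) -> float:
    """Tajima-Nei distance between two aligned, gap-free DNA sequences."""
    n = len(x)
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    p = len(diffs) / n
    if p == 0:
        return 0.0
    # Pooled base frequencies g_i over both sequences.
    counts = Counter(x) + Counter(y)
    g = {base: counts[base] / (2 * n) for base in BASES}
    # Proportion x_ij of sites showing each unordered pair of different bases.
    pair_freq = Counter(frozenset(d) for d in diffs)
    h = sum(
        (pair_freq[frozenset((i, j))] / n) ** 2 / (2 * g[i] * g[j])
        for i, j in combinations(BASES, 2)
        if pair_freq[frozenset((i, j))]
    )
    b = 0.5 * (1 - sum(v * v for v in g.values()) + p * p / h)
    return -b * math.log(1 - p / b)


x = "ACGT" * 10
y = "G" + x[1:5] + "T" + x[6:]  # hypothetical: two differences out of 40 sites
print(tajima_nei_distance(x, y))
```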

6.
A new method is proposed for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions at that site result in amino acid replacement; nucleotide changes are classified as either transitional or transversional; and changes between codons are assumed to occur with different probabilities, determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes, ranging from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution also varies among genes: the highest rate is three to four times the lowest, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (average 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implications of these results for the mechanisms of DNA evolution, and for the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction, are discussed.
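A minimal sketch of the site classification described above, using the standard genetic code. Assumptions: a position with one or two synonymous alternatives (e.g. the third position of Ile codons) is lumped into the twofold class, and changes to stop codons count as amino acid replacements. The weighting of codon changes by empirical frequencies is not reproduced.

```python
# Standard genetic code (TCAG ordering); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(codon: str) -> list[int]:
    """Return 0, 2, or 4 for each codon position (non-, two-, fourfold degenerate)."""
    classes = []
    for pos in range(3):
        synonymous = sum(
            CODE[codon[:pos] + base + codon[pos + 1:]] == CODE[codon]
            for base in BASES
            if base != codon[pos]
        )
        if synonymous == 0:
            classes.append(0)
        elif synonymous == 3:
            classes.append(4)
        else:  # one or two synonymous alternatives: treated as twofold here
            classes.append(2)
    return classes


print(degeneracy("GGG"), degeneracy("TTA"))
```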

7.
Focusing on the synonymous substitution rate, we carried out detailed sequence analyses of hominoid mitochondrial (mt) DNAs approximately 5 kb in length. Owing to the preponderance of transitions and strong biases in base composition, synonymous substitutions in mtDNA rapidly reach a rather low saturation level. The extent of the compositional bias differs from gene to gene. Such differences in base composition, even if small, can cause considerable variation in observed synonymous differences and may make the estimated synonymous substitution rate depend on the region analyzed. We demonstrate that this region dependency arises from a failure to account properly for compositional biases that vary from gene to gene, and that the actual synonymous substitution rate is rather uniform. The synonymous substitution rate thus estimated is 2.37 ± 0.11 × 10⁻⁸ per site per year, comparable to the overall rate for the noncoding region. On the other hand, the rate of nonsynonymous substitution differs considerably from gene to gene, as expected under the neutral theory of molecular evolution. The lowest rate is 0.8 × 10⁻⁹ per site per year for COI and the highest is 4.5 × 10⁻⁹ for ATPase 8, with degrees of functional constraint (measured by the ratio of the nonsynonymous to the synonymous substitution rate) of 0.03 and 0.19, respectively. Transfer RNA (tRNA) genes also vary in base content and thus in nucleotide differences. The average rate for the 11 tRNAs contained in the 5-kb region is 3.9 × 10⁻⁹ per site per year. The nucleotide substitutions in the genome suggest that the transition rate is about 17 times the transversion rate.
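The constraint values quoted above follow directly from dividing each nonsynonymous rate by the common synonymous rate; a quick arithmetic check using only the numbers in the abstract:

```python
# Rates quoted in the abstract, in substitutions per site per year.
synonymous_rate = 2.37e-8
nonsynonymous_rates = {"COI": 0.8e-9, "ATPase 8": 4.5e-9}

for gene, rate in nonsynonymous_rates.items():
    # Functional-constraint measure used in the abstract: nonsynonymous / synonymous.
    print(f"{gene}: {rate / synonymous_rate:.2f}")  # COI: 0.03, ATPase 8: 0.19
```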

8.
Using mammalian gene sequences, the variances in the numbers of synonymous and nonsynonymous substitutions among genes were estimated, together with the correlation coefficient between the two. The correlation coefficient expected under the neutral theory can be obtained from these estimated variances. The expected coefficient is often found to be one-half to two-thirds of the observed value. Possible causes of the disagreement are discussed, such as correlated selective constraints on the two types of substitutions and excess doublet mutations. The variance of the mutation rate and that of the selective constraint were also estimated: the coefficient of variation of the former is 0.2–0.3, whereas that of the latter is 0.7–0.9. Correspondence to: T. Ohta
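The observed quantities in this kind of analysis are per-gene dispersion and the correlation between synonymous and nonsynonymous counts. A minimal sketch of those raw statistics on hypothetical counts; the neutral-theory expectation and the partition into mutation-rate and constraint variance depend on the authors' model and are not reproduced.

```python
import math


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def coefficient_of_variation(xs: list[float]) -> float:
    """Sample standard deviation divided by the mean."""
    m = mean(xs)
    variance = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(variance) / m


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-gene substitution counts for five genes.
synonymous = [40.0, 55.0, 38.0, 61.0, 47.0]
nonsynonymous = [8.0, 15.0, 5.0, 19.0, 11.0]
print(coefficient_of_variation(synonymous), coefficient_of_variation(nonsynonymous))
print(pearson_correlation(synonymous, nonsynonymous))
```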

9.

Background  

Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. Several methods have been proposed over the past two decades, but they have not considered unequal transitional substitutions (between the two purines, A and G, or between the two pyrimidines, T and C), which become apparent when the sequence data to be compared are vast and substantially diverged.
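Methods that treat the two transition types separately need purine (A↔G) and pyrimidine (C↔T) transitions tallied apart. A minimal sketch of that tally, assuming aligned, gap-free, uppercase DNA sequences (the example sequences are hypothetical):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_differences(x: str, y: str) -> dict[str, int]:
    """Count purine transitions, pyrimidine transitions, and transversions."""
    counts = {"purine_transitions": 0, "pyrimidine_transitions": 0, "transversions": 0}
    for a, b in zip(x, y):
        if a == b:
            continue
        if {a, b} <= PURINES:
            counts["purine_transitions"] += 1
        elif {a, b} <= PYRIMIDINES:
            counts["pyrimidine_transitions"] += 1
        else:
            counts["transversions"] += 1
    return counts


print(classify_differences("AACCGGTT", "GCTCAGCT"))
```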

10.
Using basic probability theory, we show that there is a substantial likelihood that, even in the presence of strong purifying selection, there will be a number of codons in which the number of synonymous nucleotide substitutions per site (dS) exceeds the number of non-synonymous nucleotide substitutions per site (dN). In an empirical study, we examined the numbers of synonymous (bS) and non-synonymous (bN) substitutions along branches of the phylogenies of 69 single-copy orthologous genes from seven mammalian species. The pattern bN > bS was most common on the shortest branches of the tree and was associated with a high coefficient of variation in both bN and bS, suggesting that high stochastic error in bN and bS on short branches, rather than positive Darwinian selection, explains most cases in which bN exceeds bS on a given branch. The branch-site method of Zhang et al. (Zhang, Nielsen, Yang, Mol Biol Evol, 22:2472-2479, 2005) identified 117 codons on 35 branches as "positively selected," but most of these codons lacked synonymous substitutions, while in the others, synonymous and non-synonymous differences per site occurred at approximately equal frequencies. Thus, the hypothesis that chance variation in the pattern of mutation across sites, rather than positive selection, accounted for the observed pattern could not be ruled out. Our results showed that bN/bS was consistently elevated in immune system genes, but neither the search for branches with bN > bS nor the branch-site method revealed this trend.
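The chance argument can be sketched by treating the synonymous and nonsynonymous substitution counts at a single codon as independent Poisson variables. The site numbers (one synonymous, two nonsynonymous) and the rates are hypothetical, and this is our illustration rather than the authors' calculation; it shows that even when the per-site nonsynonymous rate is one-tenth of the synonymous rate, a codon can show dN > dS with non-negligible probability.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_rate_exceeds(lam_a: float, sites_a: float,
                      lam_b: float, sites_b: float, k_max: int = 60) -> float:
    """P(count_a / sites_a > count_b / sites_b) for independent Poisson counts
    with means lam_a and lam_b (sums truncated at k_max substitutions)."""
    total = 0.0
    for ka in range(k_max + 1):
        pa = poisson_pmf(ka, lam_a)
        for kb in range(k_max + 1):
            if ka / sites_a > kb / sites_b:
                total += pa * poisson_pmf(kb, lam_b)
    return total


# Hypothetical codon: 1 synonymous and 2 nonsynonymous sites; per-site
# rates 0.5 (synonymous) and 0.05 (nonsynonymous).
lam_syn, lam_nonsyn = 0.5, 0.1
print("P(dN > dS):", prob_rate_exceeds(lam_nonsyn, 2.0, lam_syn, 1.0))
print("P(dS > dN):", prob_rate_exceeds(lam_syn, 1.0, lam_nonsyn, 2.0))
```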

11.
A general model for estimating the number of amino acid substitutions per site ($d$) from the fraction of identical residues between two sequences ($q$) is proposed. The well-known Poisson-correction formula $q = e^{-d}$ corresponds to a substitution rate that is independent of both site and amino acid. The equation $q = (1 - e^{-2d})/(2d)$, derived for substitution rates that are site-independent but vary among amino acids, closely approximates the empirical method suggested by Dayhoff et al. (1978). The equation $q = 1/(1 + d)$ describes substitution rates that are amino-acid-independent but vary among sites. Lastly, the equation $q = \ln(1 + 2d)/(2d)$ accounts for the general case in which substitution rates can differ among both amino acids and sites.
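Because each relation gives q as a decreasing function of d, an estimate of d for an observed q can be recovered by inverting it. The Poisson and site-variable cases have closed forms, d = -ln q and d = (1 - q)/q; the sketch below simply inverts all four numerically by bisection, which is our choice of method, not a procedure taken from the abstract.

```python
import math
from typing import Callable

# The four q(d) relations (q = fraction of identical residues,
# d = amino acid substitutions per site). Each tends to 1 as d -> 0.


def q_poisson(d: float) -> float:
    return math.exp(-d)


def q_amino_acid_variable(d: float) -> float:
    return (1 - math.exp(-2 * d)) / (2 * d) if d > 0 else 1.0


def q_site_variable(d: float) -> float:
    return 1 / (1 + d)


def q_general(d: float) -> float:
    return math.log1p(2 * d) / (2 * d) if d > 0 else 1.0


def estimate_d(q_obs: float, q_of_d: Callable[[float], float]) -> float:
    """Invert a decreasing q(d) by bisection to recover d from an observed q."""
    if not 0 < q_obs <= 1:
        raise ValueError("q must lie in (0, 1]")
    if q_obs == 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while q_of_d(hi) > q_obs:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(200):  # fixed iteration count avoids stalling at float resolution
        mid = (lo + hi) / 2
        if q_of_d(mid) > q_obs:
            lo = mid  # too little divergence at mid: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


for name, f in [("Poisson", q_poisson), ("amino-acid-variable", q_amino_acid_variable),
                ("site-variable", q_site_variable), ("general", q_general)]:
    print(name, round(estimate_d(0.6, f), 4))
```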

12.
New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracy of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. The following results were obtained: (1) The NG, MY, and LWL methods overestimate the number of synonymous substitutions and underestimate the number of nonsynonymous substitutions. The major cause of this bias is that these three methods underestimate the number of synonymous sites and overestimate the number of nonsynonymous sites. (2) The PBL method gives better estimates of the numbers of synonymous and nonsynonymous substitutions than the NG, MY, and LWL methods. (3) The new methods also give better estimates of the numbers of synonymous and nonsynonymous substitutions than the NG, MY, and LWL methods; in addition, their estimates of the numbers of synonymous and nonsynonymous sites are reasonably accurate. (4) In some cases, the new methods and the PBL method give biased estimates of substitution numbers. However, the number of nucleotide substitutions at the third codon position can be used to check whether estimates obtained by the new methods are reliable, whereas no such check is possible for the PBL method. (5) When there are strong transition/transversion and nucleotide-frequency biases, as in mitochondrial genes, all of the above methods give biased estimates of substitution numbers. In such cases, Kondo et al.'s method is recommended for estimating the number of synonymous substitutions, although it cannot estimate the number of nonsynonymous substitutions and is time-consuming.
These results, particularly result (1), call for a reexamination of some genes, because the evolutionary histories of genes have often been discussed on the basis of results obtained by the NG, MY, and LWL methods, results that favor the neutral theory of molecular evolution.
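The new methods build on Kimura's two-parameter model. For orientation, a minimal sketch of the whole-sequence K2P distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences; the codon-level synonymous/nonsynonymous pathway counting of the abstract is not reproduced, and the sequences are hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(x: str, y: str) -> float:
    """Kimura two-parameter distance between two aligned, gap-free DNA sequences."""
    n = len(x)
    transitions = transversions = 0
    for a, b in zip(x, y):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


x = "ACGT" * 10
y = "G" + x[1:5] + "T" + x[6:]  # two transitions, no transversions
print(k2p_distance(x, y))
```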

13.
Twelve of 30 species examined in the ant genus Polyrhachis carry single-nucleotide insertions at one or two positions within the mitochondrial cytochrome b (cytb) gene. Two of the sites occur in more than one species. Nucleotide substitutions in taxa carrying insertions show the strong codon-position bias expected of functional protein-coding genes, with substitutions concentrated at third positions of the original reading frame. This pattern strongly suggests that the sequences are functional cytb genes. This is not the first report of +1 frameshift insertions in animal mitochondrial genes: a similar site was discovered in vertebrates, where Mindell et al. (Mol Biol Evol 15:1568, 1998) reported single-nucleotide frameshift insertions in many birds and a turtle and hypothesized that the genes are correctly decoded by a programmed frameshift during translation. The discovery of four additional sites gives us the opportunity to look for common features that may explain how programmed frameshifts can arise. The common feature appears to be the presence of two consecutive rare codons at the insertion site. We hypothesize that the second of these codons is not efficiently translated, causing a pause in translation. During the stall, the weak wobble pairing of the tRNA bound in the peptidyl site of the ribosome, together with an exact Watson–Crick codon–anticodon pairing in the +1 position, allows translation to continue in the +1 reading frame. The result is an adequate level of translation of a full-length, fully functional protein. A model is presented for the decoding of these mitochondrial genes, consistent with known features of programmed translational frameshifting in the yeast Ty1 and Ty3 retrotransposons. Reviewing Editor: Dr. W. Ford Doolittle
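The net effect of the proposed +1 frameshift can be illustrated on a toy sequence: reading straight through an inserted nucleotide garbles every downstream codon, while skipping it at the insertion site restores the original frame. The sequence is hypothetical, the ribosomal stalling mechanism is not modelled, and the standard genetic code stands in for the mitochondrial code.

```python
from typing import Optional

# Standard genetic code (TCAG ordering). Real mitochondrial codes differ,
# e.g. TGA is read as Trp; the standard table is used only for illustration.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq: str, skip: Optional[int] = None) -> str:
    """Translate codon by codon. If `skip` is given, the nucleotide at that
    index is passed over, which is the net effect of a +1 frameshift there."""
    if skip is not None:
        seq = seq[:skip] + seq[skip + 1:]
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


original = "ATGGCTGAATAA"                      # hypothetical gene: Met-Ala-Glu-stop
inserted = original[:6] + "C" + original[6:]   # one extra nucleotide after codon 2
print(translate(original))          # MAE*
print(translate(inserted))          # downstream codons are misread
print(translate(inserted, skip=6))  # MAE*: the +1 shift restores the frame
```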

14.
Nucleotide sequences of the genome RNA region encoding capsid protein VP1 (918 nucleotides) were determined for 18 enterovirus 70 (EV70) isolates collected from various parts of the world between 1971 and 1981, and the nucleotide substitutions among them were studied. Genetic distances between isolates were calculated by pairwise comparison of nucleotide differences. Regression analysis of the genetic distances against time of isolation showed that the synonymous substitution rate was very high, 21.53 × 10⁻³ substitutions per nucleotide per year, while the nonsynonymous rate was extremely low, 0.32 × 10⁻³ substitutions per nucleotide per year. The rate estimated from the average of synonymous and nonsynonymous substitutions (W.-H. Li, C.-C. Wu, and C.-C. Luo, Mol. Biol. Evol. 2:150-174, 1985) was 5.00 × 10⁻³ substitutions per nucleotide per year. Taking the average of synonymous and nonsynonymous substitutions as the genetic distance between isolates, a phylogenetic tree was inferred by the unweighted pair group method with arithmetic mean (UPGMA) and by the neighbor-joining method. The tree indicated that the virus had evolved from a single focus, and its time of emergence was estimated to be August 1967 ± 15 months, 2 years before the pandemic of acute hemorrhagic conjunctivitis was first recognized. By superimposing every nucleotide substitution on the branches of the phylogenetic tree, we analyzed the nucleotide substitution patterns of EV70 genome RNA. Among synonymous substitutions, the proportion of transitions (C↔U and G↔A) was extremely high compared with that reported for other viruses or pseudogenes. In addition, parallel substitutions (independent substitutions at the same nucleotide position on different branches of the tree, i.e., in different isolates) were frequently found among both synonymous and nonsynonymous substitutions.
These frequent parallel substitutions, and the low nonsynonymous substitution rate despite the very high synonymous rate, imply a strong restriction on nonsynonymous substitutions in VP1, probably due to the requirement to maintain the rigid icosahedral conformation of the virus.
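The dating logic (rate from the slope of distance against isolation time, emergence from the time at which the fitted distance falls to zero) can be sketched as an ordinary least-squares fit. The data are hypothetical, and regressing each isolate's distance from an assumed root is a simplification of the pairwise regression described in the abstract.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: isolation year and distance of each isolate from an assumed root.
years = [1971.0, 1975.0, 1978.0, 1981.0]
distances = [0.005 * (year - 1967.0) for year in years]

rate, intercept = fit_line(years, distances)
emergence_year = -intercept / rate  # year at which the fitted distance is zero
print(f"rate = {rate:.4f} per site per year, emergence ~ {emergence_year:.1f}")
```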

15.
Examining the pattern of nucleotide substitution for the control region of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions. In this method, excess transitions, unequal nucleotide frequencies, and variation of substitution rate among different sites are all taken into account. Application of this method to human and chimpanzee data suggested that the transition/transversion ratio for the entire control region was approximately 15 and nearly the same for the two species. The 95% confidence interval of the age of the common ancestral mtDNA was estimated to be 80,000-480,000 years in humans and 0.57-2.72 Myr in common chimpanzees.

16.
Tempo and mode of synonymous substitutions in mitochondrial DNA of primates (total citations: 3; self-citations: 1; citations by others: 2)
Nucleotide substitutions at the fourfold degenerate sites and at all third codon positions of mitochondrial DNA from human, common chimpanzee, bonobo, gorilla, and orangutan were examined in detail using three alternative Markov models: (1) the model of Hasegawa, Kishino, and Yano (1985), (2) the model of Tamura and Nei (1993), and (3) the general reversible Markov model. These sites are expected to be relatively free from constraint, so their tempo and mode of evolution should reflect those of mutation. Among the alternative models, the general reversible Markov model best approximates the nucleotide substitutions at the fourfold degenerate sites and at all third codon positions, while the maximum likelihood estimates of the numbers of nucleotide substitutions along each branch do not differ significantly among the three models. It was further shown that the transition rate at these sites during evolution, and therefore the transitional mutation rate of mtDNA, is higher in humans than in chimpanzees and gorillas, probably by a factor of about two. However, the transversional mutation rate and the amino acid substitution rate do not differ significantly between humans and the African apes. These and additional observations suggest heterogeneity among lineages of Hominoidea in both the mutation rate and the constraint operating on the mtDNA-encoded proteins.

17.
The Artemia hemoglobin is a dimer composed of two nine-domain covalent polymers in quaternary association. Each polymer is encoded by a gene representing nine successive globin domains, which have different sequences and are presumed to have been copied originally from a single-domain gene. Two different polymers exist as the result of a complete duplication of the nine-domain gene, allowing the formation of either homodimers or the heterodimer. The total population of 18 domains, comprising nine corresponding pairs, together with the likelihood that they reflect several hundred million years of evolution in the same lineage, provides a unique model in which the process of gene multiplication can be analyzed. The outcome has important implications for the reliability of local molecular clocks. The two polymers differ from each other at 11.7% of amino acid sites; however, when corresponding individual domains are compared between polymers, amino acid substitution varies 2.7-fold from lowest to highest. This variation is not obvious at the DNA level, where domain-pair identity values vary only 1.3-fold. Identity values are, however, uncorrected for multiple substitutions, and they pool silent and nonsilent changes. Therefore, to determine the variability in relative substitution rates at the DNA level, we used the method of Li (1993, J Mol Evol 36:96–99) to estimate nonsynonymous (KA) and synonymous (KS) substitutions per site for the nine pairs of domains. As expected, the overall level of silent substitution (KS of 56.9%) far exceeded that of nonsilent substitution (KA of 6.7%); however, among corresponding domain pairs, KA varies 2.3-fold and KS 1.7-fold.
The large discrepancies reflected in the expressed protein have accrued within a single lineage, implying that divergence dates of different genera based on amino acid sequences, even for well-studied proteins of reasonable size, can be wrong by a factor well in excess of 2. Received: 4 June 1997 / Accepted: 17 December 1997

18.
We examined the sequence of a 2214-base-pair HindIII fragment from the mitochondrial genome of six rainbow trout. The fragment encodes four proteins and two tRNAs. Sequences from two fish from a single locality were identical; those from separate localities differed by 1 to 7 nucleotide substitutions. Of 13 variable sites, 12 were synonymous and 1 led to a conservative amino acid substitution. Transitions accounted for 12 of the 13 variants. In contrast to interspecific comparisons (Thomas and Beckenbach, 1989, J. Mol. Evol. 29: 233-245), the intraspecific divergence estimates based on sequence are lower than those estimated from restriction fragment analysis, suggesting a complex, dynamic process for the accumulation of variation in the mitochondrial genome.

19.
Statistical methods for estimating genetic parentage are increasingly applied to accommodate limited marker polymorphism and the incomplete sampling of individuals. Neff et al. (2000a, Mol. Ecol. 9, 515–528; 2000b, Mol. Ecol. 9, 529–539) published a method (Pat) that estimates the proportion of next-generation individuals sired by a focal male, taking into account that the male may be genetically compatible, by random chance, with offspring that are not his own. Here we employ this method to reestimate paternity of 68 nest-guarding males from several fish species. The difference between the conventional exclusion-based estimate and Pat was >0.05 in only four of the 68 (5.9%) fish nests analyzed. An analytical formula shows that the difference between the two estimates is expected to be negligible if the focal male is consistent with a large proportion of the genotyped offspring, or if marker polymorphism is high. In addition, computer simulations illustrate how numbers of marker loci and their levels of genetic polymorphism, as well as the mating system of the organism under study, can influence estimates of paternity derived from exclusion-based estimates and Pat. Finally, we discuss various applications of these estimators including cases where additional biological information is present in the form of behavioral observations on parental care.
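For contrast with Pat, a minimal sketch of the conventional exclusion-based estimate: the fraction of offspring that share at least one allele with the focal male at every locus. Genotypes are hypothetical, loci are assumed codominant and error-free, and the correction for chance compatibility that Pat provides is deliberately not included.

```python
Genotype = list[tuple[int, int]]  # one pair of allele labels per locus


def is_compatible(male: Genotype, offspring: Genotype) -> bool:
    """True if the offspring shares at least one allele with the male at every locus."""
    return all(set(m) & set(o) for m, o in zip(male, offspring))


def exclusion_paternity(male: Genotype, brood: list[Genotype]) -> float:
    """Fraction of genotyped offspring the male cannot be excluded from siring."""
    return sum(is_compatible(male, child) for child in brood) / len(brood)


# Hypothetical two-locus genotypes for a nest-guarding male and three offspring.
male = [(1, 2), (3, 4)]
brood = [[(1, 5), (3, 6)], [(2, 2), (4, 4)], [(5, 6), (3, 3)]]
print(exclusion_paternity(male, brood))
```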

20.
Immunochemical cross-reactivity of protein variants has very frequently been used to map protein antigenic sites. The approach is based on the assumption that amino acid substitutions affecting the binding of a protein to its antibody, particularly when monoclonal antibodies (mAbs) are used, must lie within the antigenic site or close to it. This assumption was investigated here by determining the effects of amino acid substitutions outside the antigenic site on the reactivity of six myoglobin (Mb) variants with three mAbs of predetermined specificity, prepared by immunization with a free synthetic peptide representing region 113–120 (antigenic site 4) of Mb. Two of the Mb variants used had no substitutions within residues 113–120 (the region to which the specificity of the mAbs is directed) and yet exhibited markedly decreased cross-reactions and binding affinities relative to the reference antigen, sperm-whale Mb. The other three Mb variants had substitutions both within and outside region 113–120 and showed very little cross-reactivity. The results, particularly with the Mbs that have no substitutions within the indicated antigenic site, clearly show that substitutions outside the site, which by design are not part of it, can very markedly influence the reactivity of a protein variant with anti-site mAbs. The approach can, therefore, lead to serious errors if used to identify the residues of protein antigenic sites.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417