Similar literature
20 similar articles found (search time: 15 ms)
1.
Unbiased estimation of the rates of synonymous and nonsynonymous substitution (total citations: 39; self-citations: 0; citations by others: 39)
Summary: The current convention in estimating the number of substitutions per synonymous site (K(S)) and per nonsynonymous site (K(A)) between two protein-coding genes is to count each twofold degenerate site as one-third synonymous and two-thirds nonsynonymous because one of the three possible changes at such a site is synonymous and the other two are nonsynonymous. This counting rule can considerably overestimate the K(S) value because transitional mutations tend to occur more often than transversional mutations and because most transitional mutations at twofold degenerate sites are synonymous. A new method that gives unbiased estimates is proposed. An application of the new and the old method to 14 pairs of mouse and rat genes shows that the new method gives a K(S) value very close to the number of substitutions per fourfold degenerate site whereas the old method gives a value 30% higher. Both methods give a K(A) value close to the number of substitutions per nondegenerate site.
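
The counting convention described in this abstract can be illustrated with a short sketch. It assumes the standard genetic code and treats changes to or from stop codons as nonsynonymous; the function names `synonymous_changes` and `old_rule_sites` are hypothetical and are not taken from the paper. It shows only the older "one-third synonymous" rule, not the unbiased method the abstract proposes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def synonymous_changes(codon, pos):
    """Count how many of the 3 possible base changes at `pos` keep the amino acid."""
    count = 0
    for base in BASES:
        if base != codon[pos]:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                count += 1
    return count

def old_rule_sites(codon):
    """Synonymous and nonsynonymous site counts for one codon under the old rule.

    Each position contributes (synonymous changes / 3) synonymous sites, so a
    twofold degenerate position counts as 1/3 synonymous and 2/3 nonsynonymous.
    """
    syn = sum(synonymous_changes(codon, p) / 3 for p in range(3))
    return syn, 3 - syn

# TTT (Phe): the third position is twofold degenerate, giving 1/3 synonymous site.
print(old_rule_sites("TTT"))
# GGT (Gly): the third position is fourfold degenerate, giving 1 synonymous site.
print(old_rule_sites("GGT"))
```

For TTT, the single synonymous change at the third position (T to C) is a transition, which is the situation the abstract identifies as the source of bias when transitions outnumber transversions.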

2.
R Nielsen  D M Weinreich 《Genetics》1999,153(1):497-506
McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.
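
As a rough illustration of the McDonald/Kreitman test mentioned above (not the age-of-mutation test the authors develop), the sketch below applies a two-sided Fisher exact test to a 2x2 table of polymorphic versus fixed differences at nonsynonymous and synonymous sites. The table layout, the function name `fisher_exact_two_sided`, and the example counts are assumptions chosen for illustration; they are not data from the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout for a McDonald/Kreitman table:
        row 1 = nonsynonymous (polymorphic a, fixed b)
        row 2 = synonymous    (polymorphic c, fixed d)
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: an excess of polymorphic nonsynonymous sites.
p_value = fisher_exact_two_sided(12, 3, 20, 25)
print(p_value)
```

A small p-value for a table with relatively more nonsynonymous polymorphism than nonsynonymous divergence is the pattern the abstract describes as consistent with purifying selection.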

3.
Using mammalian gene sequences, the variances in the numbers of synonymous and nonsynonymous substitutions among genes were estimated together with the correlation coefficient between the two. The expected correlation coefficient can be obtained under the neutral theory using these estimated values of the variances. The expected coefficient is found to often be one-half to two-thirds of the observed value. Possible causes for the disagreement were discussed, such as correlated selective constraints on the two types of substitutions and excess doublet mutations. The variance of mutation rate and that of selective constraint were also estimated. The results show that the coefficient of variation of the former is 0.2–0.3, whereas that of the latter is 0.7–0.9. Correspondence to: T. Ohta

4.
BACKGROUND: In human pedigree data age at disease occurrence frequently is missing and is imputed using various methods. However, little is known about the performance of these methods when applied to families. In particular, there is little information about the level of agreement between imputed and actual values of temporal data and their effects on inferences. METHODS: We performed two evaluations of five imputation methods used to generate complete data for repositories to be shared by many investigators. Two of the methods are mean substitution methods, two are regression methods and one is a multiple imputation method based on one of the regression methods. To evaluate the methods, we randomly deleted the years of disease diagnosis of some men in a sample of pedigrees ascertained as part of a prostate cancer study. In the first evaluation, we used the five methods to impute the missing diagnosis years and evaluated agreement between imputed and actual values. In the second evaluation, we compared agreement between regression coefficients estimated using imputed diagnosis years with those estimated using the actual years. RESULTS/CONCLUSIONS: For both evaluations, we found optimal or near-optimal performance from a regression method that imputes a man's diagnosis year based on the year of birth and year of last observation of all affected men with complete data. The multiple imputation analogue of this method also performed well.

5.
Bartolomé C  Maside X  Yi S  Grant AL  Charlesworth B 《Genetics》2005,169(3):1495-1507
We have investigated patterns of within-species polymorphism and between-species divergence for synonymous and nonsynonymous variants at a set of autosomal and X-linked loci of Drosophila miranda. D. pseudoobscura and D. affinis were used for the between-species comparisons. The results suggest the action of purifying selection on nonsynonymous, polymorphic variants. Among synonymous polymorphisms, there is a significant excess of synonymous mutations from preferred to unpreferred codons and of GC to AT mutations. There was no excess of GC to AT mutations among polymorphisms at noncoding sites. This suggests that selection is acting to maintain the use of preferred codons. Indirect evidence suggests that biased gene conversion in favor of GC base pairs may also be operating. The joint intensity of selection and biased gene conversion, in terms of the product of effective population size and the sum of the selection and conversion coefficients, was estimated to be approximately 0.65.

6.
Keightley PD  Halligan DL 《Genetics》2011,188(4):931-940
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.

7.
New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. The following results were obtained: (1) The NG, MY, and LWL methods give overestimates of the number of synonymous substitutions and underestimates of the number of nonsynonymous substitutions. The major cause for the biased estimation is that these three methods underestimate the number of synonymous sites and overestimate the number of nonsynonymous sites. (2) The PBL method gives better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. (3) The new methods also give better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. In addition, estimates of the numbers of synonymous and nonsynonymous sites obtained by the new methods are reasonably accurate. (4) In some cases, the new methods and the PBL method give biased estimates of substitution numbers. However, from the number of nucleotide substitutions at the third position of codons, we can examine whether estimates obtained by the new methods are good or not, whereas we cannot make an examination of estimates obtained by the PBL method. (5) When there are strong transition/transversion and nucleotide-frequency biases like mitochondrial genes, all of the above methods give biased estimates of substitution numbers. In such cases, Kondo et al.'s method is recommended to be used for estimating the number of synonymous substitutions, although their method cannot estimate the number of nonsynonymous substitutions and is time-consuming. 
These results, particularly result (1), call for reexaminations of some genes. This is because evolutionary pictures of genes have often been discussed on the basis of results obtained by the NG, MY, and LWL methods, which are favorable for the neutral theory of molecular evolution.

8.
Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's and by Li et al.'s methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large.

9.
Nei and Gojobori (1986) developed a simple method to estimate the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site. In the present paper, we have developed a method for computing variances and covariances of dS's and dN's and of the proportions of synonymous (pS) and nonsynonymous (pN) differences. We also have developed a method for computing the variances of mean dS, dN, pS, and pN without constructing a phylogenetic tree of the genes. We have conducted computer simulations based on simple evolutionary models and have shown that the new method gives good estimates of variances and covariances.
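
The relationship between a proportion of differences (pS or pN) and the corresponding number of substitutions per site (dS or dN) in the Nei and Gojobori method is the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sketch below shows only that conversion; the function name `jukes_cantor` is hypothetical, and the variance and covariance formulas developed in the paper are not reproduced here.

```python
import math

def jukes_cantor(p):
    """Convert a proportion of differences p into substitutions per site.

    Applies d = -(3/4) * ln(1 - (4/3) * p). The correction is undefined for
    p >= 0.75, where the sequences are saturated with changes.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical proportions of synonymous and nonsynonymous differences.
pS, pN = 0.30, 0.05
dS, dN = jukes_cantor(pS), jukes_cantor(pN)
print(dS, dN)
```

The correction grows faster than p itself, so the gap between dS and pS widens as sequences diverge.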

10.
We estimated the intensity of selection on preferred codons in Drosophila pseudoobscura and D. miranda at X-linked and autosomal loci, using a published data set on sequence variability at 67 loci, by means of an improved method that takes account of demographic effects. We found evidence for stronger selection at X-linked loci, consistent with their higher levels of codon usage bias. The estimates of the strength of selection and mutational bias in favor of unpreferred codons were similar to those found in other species, after taking into account the fact that D. pseudoobscura showed evidence for a recent expansion in population size. We examined correlates of synonymous and nonsynonymous diversity in these species and found no evidence for effects of recurrent selective sweeps on nonsynonymous mutations, which is probably because this set of genes have much higher than average levels of selective constraints. There was evidence for correlated effects of levels of selective constraints on protein sequences and on codon usage, as expected under models of selection for translational accuracy. Our analysis of a published data set on D. melanogaster provided evidence for the effects of selective sweeps of nonsynonymous mutations on linked synonymous diversity, but only in the subset of loci that experienced the highest rates of nonsynonymous substitutions (about one-quarter of the total) and not at more slowly evolving loci. Our correlational analysis of this data set suggested that both selective constraints on protein sequences and recurrent selective sweeps affect the overall level of codon usage.

11.
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates since the amount of sequence variation within targeted datasets is always unpredictable. To our knowledge, there is little information available in the literature about evaluation of these different methods. In this study, we compared six widely used methods and provide evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be readily drawn only according to results from one method.

12.
Comparison of numbers of synonymous and nonsynonymous substitutions is useful for understanding mechanisms of molecular evolution. In this paper, I examine the statistical properties of six methods of estimating numbers of synonymous and nonsynonymous substitutions. The six methods are Miyata and Yasunaga’s (MY) method; Nei and Gojobori’s (NG) method; Li, Wu and Luo’s (LWL) method; Pamilo, Bianchi and Li’s (PBL) method; and Ina’s (Ina) two methods. When the transition/transversion bias at the mutation level is strong, the numbers of synonymous and nonsynonymous substitutions are estimated more accurately by the PBL and Ina methods than by the NG, MY and LWL methods. When the nucleotide-frequency bias is strong and distantly related sequences are compared, all the six methods give underestimates of the number of synonymous substitutions. The concept of synonymous and nonsynonymous categories is also useful for analysis of DNA polymorphism data.

13.
A method for estimating the numbers of synonymous (Ks) and nonsynonymous (Ka) substitutions per site is proposed. The method is based on Li's (J. Mol. Evol. 36:96–99, 1993) and Pamilo and Bianchi's (Mol. Biol. Evol. 10:271–281, 1993) method, but a putative source of bias is solved. It is proposed that the number of synonymous substitutions that are actually transitions or transversions should be computed by separating the twofold degenerate sites into two types of sites, 2S-fold and 2V-fold, where only transitional and transversional substitutions are synonymous, respectively. Kimura's (J. Mol. Evol. 16:111–120, 1980) two-parameter correcting method for multiple substitutions at a site is then applied using the overall observed synonymous transversion frequency to estimate both the numbers of synonymous transversional (Bs) and transitional (As) substitutions per site. This approach, therefore, also minimizes stochastic errors. Computer simulations indicate that the method presented gives more accurate Ks and Ka estimates than the aforementioned methods. Furthermore, obtaining confidence intervals for divergence estimates by computer simulation is proposed.
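
Kimura's two-parameter correction referred to above splits the corrected distance into a transitional part A and a transversional part B, with A = -(1/2) ln(1 - 2P - Q) + (1/4) ln(1 - 2Q) and B = -(1/2) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The sketch below is a minimal illustration of that general correction for a single class of sites; the function name `kimura_two_parameter` and the input values are hypothetical, and the 2S-fold/2V-fold site partitioning proposed in the abstract is not implemented.

```python
import math

def kimura_two_parameter(P, Q):
    """Return (A, B): corrected transitional and transversional substitutions per site.

    P = observed proportion of transitional differences
    Q = observed proportion of transversional differences
    The total distance is K = A + B.
    """
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("P and Q are too large for the correction to be defined")
    A = -0.5 * math.log(a) + 0.25 * math.log(b)
    B = -0.5 * math.log(b)
    return A, B

# Hypothetical observed proportions at one class of sites.
A, B = kimura_two_parameter(0.10, 0.05)
print(A, B, A + B)
```

Estimating B from a pooled transversion frequency, as the abstract proposes, would amount to passing the same Q to this correction for each class of synonymous sites.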

14.
The proportion of synonymous nucleotide differences per synonymous site (p(S)) and the proportion of nonsynonymous differences per nonsynonymous site (p(N)) were computed at 1,993,217 individual codons in 4,133 protein-coding genes between the two yeast species Saccharomyces cerevisiae and Saccharomyces paradoxus. When the modified Nei-Gojobori method was used, significantly more codons with p(N) > p(S) were observed than expected, based on random pairing of observed p(S) and p(N) values. However, this finding was most likely explained by the presence of a strong negative correlation between the number of synonymous differences and the number of nonsynonymous differences at codons with at least one difference. As a result of this correlation, codons with p(N) > p(S) were characterized not only by unusually high p(N) but also by unusually low p(S). On the other hand, the number of codons with p(N) exceeding the mean p(S) for all codons was very similar to the random expectation, and the observed number of 30-codon windows with p(N) > p(S) was significantly lower than the random expectation. These results imply that the occurrence of a certain number of codons or codon windows with p(N) > p(S) is expected given the nature of nucleotide substitution and need not imply the action of positive Darwinian selection.

15.
N G Smith  L D Hurst 《Genetics》1999,153(3):1395-1402
Nonsynonymous substitutions in DNA cause amino acid substitutions while synonymous substitutions in DNA leave amino acids unchanged. The cause of the correlation between the substitution rates at nonsynonymous (K(A)) and synonymous (K(S)) sites in mammals is a contentious issue, and one that impacts on many aspects of molecular evolution. Here we use a large set of orthologous mammalian genes to investigate the causes of the K(A)-K(S) correlation in rodents. The strength of the K(A)-K(S) correlation exceeds the neutral theory expectation when substitution rates are estimated using algorithmic methods, but not when substitution rates are estimated by maximum likelihood. Irrespective of this methodological uncertainty the strength of the K(A)-K(S) correlation appears mostly due to tandem substitutions, an excess of which is generated by substitutional nonindependence. Doublet mutations cannot explain the excess of tandem synonymous-nonsynonymous substitutions, and substitution patterns indicate that selection on silent sites is the likely cause. We find no evidence for selection on codon usage. The nature of the relationship between synonymous divergence and base composition is unclear because we find a significant correlation if we use maximum-likelihood methods but not if we use algorithmic methods. Finally, we find that K(S) is reduced at the start of genes, which suggests that selection for RNA structure may affect silent sites in mammalian protein-coding genes.

16.
Three frequently used methods for estimating the synonymous and nonsynonymous substitution rates (Ks and Ka) were evaluated and compared for their accuracies; these methods are denoted by LWL85, LPB93, and GY94, respectively. For this purpose, we used a codon-evolution model to obtain the expected Ka and Ks values for the above three methods and compared the values with those obtained by the three methods. We also proposed some modifications of LWL85 and LPB93 to increase their accuracies. Our computer simulations under the codon-evolution model showed that for sequences ≤ 300 codons, the performance of GY94 may not be reliable. For longer sequences, GY94 is more accurate for estimating the Ka/Ks ratio than the modified LPB93 and LWL85 in the majority of the cases studied. This is particularly so when k ≥ 3, where k is the transition/transversion (mutation) rate ratio. However, when k is approximately 2 and when the sequence divergence is relatively large, the modified LWL85 performed better than GY94 and the modified LPB93. The inferiority of LPB93 to LWL85 is surprising because LPB93 was intended to improve LWL85. Also, it has been thought that the codon-based method of GY94 is better than the heuristic method of LWL85, but our simulation results showed that in many cases, the opposite was true, even though our simulation was based on the codon-evolution model.

17.
18.
Montane species endemic to the “sky islands” of the North American southwest were significantly impacted by changing climates during the Pleistocene. We combined mitochondrial and genomic data with species distribution modelling to determine whether Aphonopelma marxi, a large tarantula from the nearby Colorado Plateau, was similarly impacted by glacial climates. Genetic analyses revealed that the species comprises three main clades that diverged in the Pleistocene. A clade distributed along the Mogollon Rim appears to have persisted in place during glacial conditions, whereas the other two clades probably colonized central and northeastern portions of the species' range from refugia in canyons. Climate models support this hypothesis for the Mogollon Rim, but late glacial climate data appear too coarse to detect suitable areas in canyons. Locations of canyon refugia could not be inferred from genomic analyses due to missing data, encouraging us to explore the effect of missing loci in phylogeographical inferences using RADseq. Results from analyses with varying amounts of missing data suggest that samples with large amounts of missing data can still improve inferences, and the specific loci that are missing matters more than the number of missing loci. This study highlights the profound impact of Pleistocene climates on tarantulas endemic to the Colorado Plateau, as well as the mixed nature of the region's fauna. Some animals recently colonized from nearby deserts as glacial climates receded, whereas others, like tarantulas, appear to have persisted on the Mogollon Rim and in refugia associated with the region's famous river‐cut canyons.

19.
Bierne N  Eyre-Walker A 《Genetics》2003,165(3):1587-1597
Most methods for estimating the rate of synonymous and nonsynonymous substitution per site define a site as a mutational opportunity: the proportion of sites that are synonymous is equal to the proportion of mutations that would be synonymous under the model of evolution being considered. Here we demonstrate that this definition of a site can give misleading results and that a physical definition of site should be used in some circumstances. We illustrate our point by reexamining the relationship between codon usage bias and the synonymous substitution rate. It has recently been shown that the rate of synonymous substitution, calculated using the Goldman-Yang method, which encapsulates the mutational-opportunity definition of a site at a high level of sophistication, is either positively correlated or uncorrelated to synonymous codon bias in Drosophila. Using other methods, which account for synonymous codon bias but define a site physically, we show that there is a negative correlation between the synonymous substitution rate and codon bias and that the lack of a negative correlation using the Goldman-Yang method is due to the way in which the number of synonymous sites is counted. We also show that there is a positive correlation between the synonymous substitution rate and third position GC content in mammals, but that the relationship is considerably weaker than that obtained using the Goldman-Yang method. We argue that the Goldman-Yang method is misleading in this context and conclude that methods that rely on a mutational-opportunity definition of a site should be used with caution.

20.
As we identify more and more genetic changes, either through mutation studies or population screens, we need powerful tools to study their potential molecular effects. With these tools, we can begin to understand the contributions of genetic variations to the wide range of human phenotypes. We used our catalogue of molecular changes in patients with carbamyl phosphate synthetase I (CPSI) deficiency to develop such a system for use in eukaryotic cells. We developed the tools and methods for rapidly modifying bacterial artificial chromosomes (BACs) for eukaryotic episomal replication, marker expression, and selection and then applied this protocol to a BAC containing the entire CPSI gene. Although this CPSI BAC construct was suitable for studying nonsynonymous mutations, potential splicing defects, and promoter variations, our focus was on studying potential splicing and RNA-processing defects to validate this system. In this article, we describe the construction of this system and subsequently examine the mechanism of four putative splicing mutations in patients deficient in CPSI. Using this model, we also demonstrate the reversible role of nonsense-mediated decay in all four mutations, using small interfering RNA knockdown of hUPF2. Furthermore, we were able to locate cryptic splicing sites for the two intronic mutations. This BAC-based system permits expression studies in the absence of patient RNA or tissues with relevant gene expression and provides experimental flexibility not available in genomic DNA or plasmid constructs. Our splicing and RNA degradation data demonstrate the advantages of using whole-gene constructs to study the effects of sequence variation on gene expression and function.
