Similar literature
Found 20 similar articles (search time: 0 ms)
1.
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest, requiring approximately 100 selected sites, but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice. Electronic supplementary material is available for this article and accessible to authorised users. [Reviewing Editor: Dr. Lauren Meyers]
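
As a rough illustration of the quantity this abstract refers to, the sketch below computes a per-codon volatility under a simplified reading of the definition: the fraction of a codon's non-stop single-nucleotide neighbours that encode a different amino acid. The function name `volatility` and the table-building code are assumptions made for this example, not the authors' implementation, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def volatility(codon):
    """Fraction of non-stop one-mutation neighbours coding a different amino acid.

    Hypothetical helper: a simplified per-codon volatility, not the full
    gene-level statistic or significance test described in the literature.
    """
    codon = codon.upper().replace("U", "T")
    # All nine codons that differ from `codon` at exactly one position.
    neighbours = [
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    ]
    non_stop = [n for n in neighbours if CODE[n] != "*"]
    changed = [n for n in non_stop if CODE[n] != CODE[codon]]
    return len(changed) / len(non_stop)


if __name__ == "__main__":
    # Two synonymous arginine codons with different volatility.
    for c in ("CGA", "AGA"):
        print(c, CODE[c], volatility(c))
```

Under this simplified definition the two arginine codons CGA and AGA differ in volatility even though they are synonymous, which is the kind of signal the method exploits.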

2.
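To analyze the codon usage pattern of squalene synthase (SS) genes and the factors that influence it, 47 SS genes from different species were subjected to multivariate statistical analysis and correspondence analysis using CodonW and SPSS 16.0. The GC contents at codon positions 1 to 3 (GC1, GC2 and GC3) were 51.33%, 34.65% and 54.37%, respectively, and the GC contents of the three positions were all highly significantly correlated (p<0.01). Correspondence analysis showed that the first axis accounted for 30.71% of the variation, and that the correlations between the effective number of codons and GC3, and between the mean of GC1 and GC2 and GC3, were both highly significant (p<0.01). All 26 optimal codons identified have G or C at the third position. A phylogenetic tree built with MEGA 5.0 from SS protein sequences agreed better with the traditional phylogenetic view than clustering based on RSCU. SS genes prefer codons ending in G/C; the usage pattern is shaped by both selection and mutation, with mutation having the larger effect on codon bias.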
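
The position-specific GC contents reported above (GC1, GC2, GC3) can be computed directly from a coding sequence. The following is a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function name `gc_by_codon_position` is hypothetical and this is not how CodonW itself is implemented.

```python
def gc_by_codon_position(cds):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; any trailing
    incomplete codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    result = {}
    for pos in range(3):
        # Every third base starting at this codon position.
        bases = cds[pos:usable:3]
        result[f"GC{pos + 1}"] = sum(b in "GC" for b in bases) / len(bases)
    return result


if __name__ == "__main__":
    # Toy sequence with codons ATG, GCC, TAA.
    print(gc_by_codon_position("ATGGCCTAA"))
```

Averaging these per-gene values over a gene set gives figures comparable in kind to those quoted in the abstract.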

3.
It is well known that an amino acid can be encoded by more than one codon; such codons are called synonymous codons. The preferential use of one particular codon for coding an amino acid is referred to as codon usage bias (CUB). A quantitative analytical method, CUB, and a related tool, the Codon Adaptation Index (CAI), have been applied to comparatively study the whole genomes of a few pathogenic Trypanosomatid species. This quantitative attempt is of direct help in the comparison of qualitative features like mutational and translational selection. Pathogens of the genera Leishmania and Trypanosoma cause debilitating disease and suffering in human beings and animals. Of these, whole genome sequences are available for only five species. The complete coding sequences (CDS), highly expressed, essential and low expressed genes have all been studied for their CUB signature. The codon usage bias of essential genes and highly expressed genes shows a distribution similar to the codon usage bias of all CDSs in Trypanosomatids. Translational selection is the dominant force selecting the preferred codon, and selection due to mutation is negligible. In contrast to an earlier study done on these pathogens, it is found in this work that CUB and CAI may be used to distinguish the Trypanosomatid genomes at the sub-genus level. Further, CUB may effectively be used as a signature of species differentiation by using Principal Component Analysis (PCA).

Abbreviations

CUB - Codon Usage Bias, CAI - Codon Adaptation Index, CDS - Coding sequences, tRNA - Transfer RNA, PCA - Principal Component Analysis.
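
To make the CAI mentioned in this abstract concrete, here is a minimal sketch that assumes the commonly used definition: each codon gets a relative adaptiveness w (its count in a reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The abstract does not say which reference set or implementation was used, so the function names and the handling of unseen codons (they are simply skipped) are assumptions of this example.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds):
    """Codon weights w = count / max count within each synonymous family.

    Met, Trp and stop codons are excluded because they carry no
    synonymous choice (or are not coding).
    """
    counts = Counter(c for seq in reference_cds for c in codons(seq))
    weights = {}
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODE.items() if a == aa]
        top = max(counts[c] for c in family)
        if top:
            for c in family:
                weights[c] = counts[c] / top
    return weights


def cai(cds, weights):
    """Geometric mean of codon weights; codons with no positive weight are skipped."""
    logs = [math.log(weights[c]) for c in codons(cds) if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Toy reference set: alanine codon GCT used three times, GCC once.
    w = relative_adaptiveness(["GCTGCTGCTGCC"])
    print(cai("GCTGCT", w), cai("GCTGCC", w))
```

Skipping codons that never occur in the reference set keeps the sketch short; published implementations often substitute a small pseudo-count instead, which changes the values for such codons.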

4.
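Studying the patterns and causes of codon usage bias at the genomic level, and analyzing the selection pressures acting during evolution, is of great importance in genomics research. This article reviews the methods proposed so far for quantifying codon usage bias and the principles behind them. Current research shows that some quantification methods are limited by incomplete annotation of the reference set of highly expressed genes, that different codon positions respond differently to mutation and selection, and that GC content and purine content contribute differently at different codon positions. Accordingly, future work on quantifying codon usage bias should aim for methods that require no prior knowledge of a reference gene set, methods that take into account the background nucleotide composition at different positions, and methods that simultaneously take gene expression level into account. Finally, the currently available tools and databases for quantifying codon usage bias are summarized.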

5.
Summary: An analysis of 4680 codons expressed by pathogenic Entamoeba histolytica showed the A+U content of coding sequences to be 67%. The preference for A+U resulted in an unusual codon usage with an A+U content of 84% in the third codon position. The data show a remarkable similarity to those obtained for Plasmodium falciparum.

6.
In this study we reconstruct the evolution of codon usage bias in the chloroplast gene rbcL using a phylogeny of 92 green-plant taxa. We employ a measure of codon usage bias that accounts for chloroplast genomic nucleotide content, as an attempt to limit plausible explanations for patterns of codon bias evolution to selection- or drift-based processes. This measure uses maximum likelihood-ratio tests to compare the performance of two models, one in which a single codon is overrepresented and one in which two codons are overrepresented. The measure allowed us to analyze both the extent of bias in each lineage and the evolution of codon choice across the phylogeny. Despite predictions based primarily on the low G+C content of the chloroplast and the high functional importance of rbcL, we found large differences in the extent of bias, suggesting differential molecular selection that is clade specific. The seed plants and simple leafy liverworts each independently derived a low level of bias in rbcL, perhaps indicating relaxed selectional constraint on molecular changes in the gene. Overrepresentation of a single codon was typically plesiomorphic, and transitions to overrepresentation of two codons occurred commonly across the phylogeny, possibly indicating biochemical selection. The total codon bias in each taxon, when regressed against the total bias of each amino acid, suggested that twofold amino acids play a strong role in inflating the level of codon usage bias in rbcL, despite the fact that twofolds compose a minority of residues in this gene. Those amino acids that contributed most to the total codon usage bias of each taxon are known through amino acid knockout and replacement to be of high functional importance. This suggests that codon usage bias may be constrained by particular amino acids and, thus, may serve as a good predictor of what residues are most important for protein fitness. Present address (Joshua T. Herbeck): JBP Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA

7.
Abstract: The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 × 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency. Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.

8.
A correlation was shown between the length of introns and the codon usage of the coding sequences of the corresponding genes, which in some cases can be related to the level of gene expression. The link is positive in unicellular organisms, i.e., genes with longer introns show a higher codon usage bias. It is most pronounced in baker's yeast, where it is definitely related to the level of gene expression: genes with a higher level of expression have longer introns. The correlation is inverted in multicellular organisms as compared to unicellular ones. Some organisms, however, do not show the link. The presence or absence of the link does not seem to be related to the GC percent of the coding sequences. Received: 7 December 1999 / Accepted: 10 May 2000

9.
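Taking plant outward-rectifying potassium channel (K+ channel outward rectifier, KCO) genes as the subject, the codon usage patterns of 75 plant KCO genes were analyzed with CodonW to explore these patterns and the various factors that may influence codon usage. The results showed that differences in base composition (r=0.961, P<0.01) and natural selection (r=0.568, P<0.01) were the main factors influencing codon usage, and that highly expressed genes strongly preferred codons ending in G or C. Twenty-six codons, including UUC and CUC, all ending in G/C, were identified as the optimal codons for high expression of plant KCO genes.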

10.
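余劲聪 (Yu Jincong), 方柏山 (Fang Baishan). 《生物信息学》 (Bioinformatics), 2011, 9(3): 242-246, 249
The Codon Usage Database (CUD) is an important online service in the field of codon usage and codon optimization research. To identify two kinds of potentially unusable records in the database, namely stale records that have expired and records whose genetic code type cannot actually be supported (unsupported records for short), this paper combines two previously self-developed programs, BestCodon and CUDassist, into CUDer, an automatic error-finding platform for CUD, and applies it to this problem. The results show that CUD contains 317 stale records and 4 unsupported records. For the stale records, the taxonomy IDs of the species have changed in NCBI, and researchers should avoid using the associated data. For the unsupported records, the correct codon usage tables (CUTs) were computed with CUDer, making up for this shortcoming of the current CUD. In addition, the platform provides a new software framework for automated data processing targeting CUD and may serve as a useful reference.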

11.
A novel subtype of influenza A virus 09H1N1 has rapidly spread across the world. Evolutionary analyses of this virus have revealed that 09H1N1 is a triple reassortant of segments from swine, avian and human influenza viruses. In this study, we investigated factors shaping the codon usage bias of 09H1N1 and carried out cluster analysis of 60 strains of influenza A virus from different subtypes based on their codon usage bias. We discovered that more preferentially used codons of 09H1N1 are A-ended or U-ended...

12.
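江澎 (Jiang Peng), 孙啸 (Sun Xiao), 陆祖宏 (Lu Zuhong). 《遗传学报》 (Acta Genetica Sinica), 2007, 34(3): 275-284
Synonymous codon usage bias was compared between the hyperthermophilic crenarchaeon Aeropyrum pernix K1 and two phylogenetically related Crenarchaeota, Pyrobaculum aerophilum str. IM2 and Sulfolobus acidocaldarius DSM 639. The results show that codon usage bias in A. pernix K1 is weak and highly correlated with GC3S. The codon usage patterns of the three Crenarchaeota are evolutionarily conserved. Compared with the effect of gene function on codon usage, the codon usage bias of these Crenarchaeota is determined more by the species. A. pernix K1, P. aerophilum str. IM2 and S. acidocaldarius DSM 639 live in different extreme environments, and it is conjectured that these extreme environments determine their codon usage bias patterns. In addition, no difference in codon usage bias between the sense and antisense strands was found in the genomes of these Crenarchaeota. The degree of codon usage bias in A. pernix K1 and S. acidocaldarius DSM 639 is highly correlated with gene expression level, whereas no such pattern was found in the genome of P. aerophilum str. IM2.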

13.
The usage of synonymous codons and the frequencies of amino acids were investigated in the complete genome of the bacterium Thermotoga maritima using a multivariate statistical approach. The GC3 content of each gene was the most prominent source of variation of codon usage. Surprisingly the usage of UGU and UGC (synonymous triplets coding for Cys, the least frequent amino acid in this species) was detected as the second most prominent source of variation. However, this result is probably an artifact due to the very low frequency of Cys together with the nonbiased composition of this genome. The third trend was related to the preferential usage of a subset of codons among highly expressed genes, and these triplets are presumed to be translationally optimal. Concerning the amino acid usage, the hydropathy level of each protein (and therefore the frequency of charged residues) was the main trend, while the second factor was related to the frequency of usage of the smaller residues, suggesting that the cell economy strongly influences the architecture of the proteins. The third axis of the analysis discriminated the usage of Phe, Tyr, Trp (aromatic residues) plus Cys, Met, and His. These six residues have in common the property of being the preferential targets of reactive oxygen species, and therefore the anaerobic condition of T. maritima is an important factor for the amino acid frequencies. Finally, the Cys content of each protein was the fourth trend. Received: 22 June 2001 / Accepted: 1 October 2001

14.
Codon usage and genome composition   Cited by: 17 (self-citations: 0; citations by others: 17)
Summary: The GC levels of codon third positions from 49 genomes covering a wide phylogenetic range are linearly correlated with the GC levels of the corresponding genomes. Three different relationships have been found: one for prokaryotes and viruses, one for lower eukaryotes, and one for vertebrates. All points not fitting the first relationship can be brought into quasi coincidence with it when plotted against GC levels of coding sequences.

15.
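The relationship between codon usage frequency and gene expression level was studied in nucleic acid sequences of Escherichia coli (115 genes) and the yeast Saccharomyces (97 genes). Based on usage frequency statistics, synonymous codons were divided into three classes: optimal codons (H), non-optimal codons (L) and rare codons (R). For the coding region of each gene, the probabilities of occurrence P(H), P(L) and P(R) were calculated. Using P(H) and P(R) as indices and clustering by a graph-theoretic method, the highly and lowly expressed genes of each organism were clearly separated, and gene expression levels were divided into four grades: very highly expressed genes (VH), highly expressed genes (H), relatively lowly expressed genes (LM) and lowly expressed genes (LL). The expression level of each class correlated well with experimental results and agreed well with existing data for E. coli and yeast.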

16.
The genetic code is not random but instead is organized in such a way that single nucleotide substitutions are more likely to result in changes between similar amino acids. This fidelity, or error minimization, has been proposed to be an adaptation within the genetic code. Many models have been proposed to measure this adaptation within the genetic code. However, we find that none of these consider codon usage differences between species. Furthermore, use of different indices of amino acid physicochemical characteristics leads to different estimations of this adaptation within the code. In this study, we try to establish a more accurate model to address this problem. In our model, a weighting scheme is established for mistranslation biases of the three different codon positions, transition/transversion biases, and codon usage. Different indices of amino acid physicochemical characteristics are also considered. In contrast to previous work, our results show that the natural genetic code is not fully optimized for error minimization. The genetic code, therefore, is not the most optimized one for error minimization, but one that balances between flexibility and fidelity for different species.

17.
18.
We compared the codon usage of sequences of transposable elements (TEs) with that of host genes from the species Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Saccharomyces cerevisiae, and Homo sapiens. Factorial correspondence analysis showed that, regardless of the base composition of the genome, the TEs differed from the genes of their host species by their AT-richness. In all species, the percentage of A + T on the third codon position of the TEs was higher than that on the first codon position and lower than that in the noncoding DNA of the genomes. This indicates that the codon choice is not simply the outcome of mutational bias but is also subject to selection constraints. A tendency toward higher A + T on the third position than on the first position was also found in the host genes of A. thaliana, C. elegans, and S. cerevisiae but not in those of D. melanogaster and H. sapiens. This strongly suggests that the AT choice is a host-independent characteristic common to all TEs. The codon usage of TEs generally appeared to be different from the mean of the host genes. In the AT-rich genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Saccharomyces cerevisiae, the codon usage bias of TEs was similar to that of weakly expressed genes. In the GC-rich genome of D. melanogaster, however, the bias in codon usage of the TEs clearly differed from that of weakly expressed genes. These findings suggest that selection acts on TEs and that TEs may display specific behavior within the host genomes. Received: 2 May 2001 / Accepted: 29 October 2001

19.
Studies on the origin of the genetic code compare measures of the degree of error minimization of the standard code with measures produced by random variant codes but do not take into account codon usage, which was probably highly biased during the origin of the code. Codon usage bias could play an important role in the minimization of the chemical distances between amino acids because the importance of errors depends also on the frequency of the different codons. Here I show that when codon usage is taken into account, the degree of error minimization of the standard code may be dramatically reduced, and shifting to alternative codes often increases the degree of error minimization. This is especially true with a high CG content, which was probably the case during the origin of the code. I also show that the frequency of codes that perform better than the standard code, in terms of relative efficiency, is much higher in the neighborhood of the standard code itself, even when not considering codon usage bias; therefore alternative codes that differ only slightly from the standard code are more likely to evolve than some previous analyses suggested. My conclusions are that the standard genetic code is far from being an optimum with respect to error minimization and must have arisen for reasons other than error minimization. [Reviewing Editor: Martin Kreitman]

20.
In recent years, the amount of molecular sequencing data from Tetrahymena thermophila has dramatically increased. We analyzed G + C content, codon usage, initiator codon context and stop codon sites in the extremely A + T rich genome of this ciliate. Average G + C content was 38% for protein coding regions, 21% for 5' non-coding sequences, 19% for 3' non-coding sequences, 15% for introns, 19% for micronuclear limited sequences and 17% for macronuclear retained sequences flanking micronuclear specific regions. The 75 available T. thermophila protein coding sequences favored codons ending in T and, where possible, avoided those with G in the third position. Highly expressed genes were relatively G + C-rich and exhibited an extremely biased pattern of codon usage while developmentally regulated genes were more A + T-rich and showed less codon usage bias. Regions immediately preceding Tetrahymena translation initiator codons were generally A-rich. For the 60 stop codons examined, the frequency of G in the end + 1 site was much higher than expected whereas C never occupied this position.
