Similar literature
 Found 20 similar articles; search took 31 ms
1.
Codon usage bias refers to the phenomenon where specific codons are used more often than other synonymous codons during translation of genes, the extent of which varies within and among species. Molecular evolutionary investigations suggest that codon bias is manifested as a result of balance between mutational and translational selection of such genes and that this phenomenon is widespread across species and may contribute to genome evolution in a significant manner. With the advent of whole‐genome sequencing of numerous species, both prokaryotes and eukaryotes, genome‐wide patterns of codon bias are emerging in different organisms. Various factors such as expression level, GC content, recombination rates, RNA stability, codon position, gene length and others (including environmental stress and population size) can influence codon usage bias within and among species. Moreover, there has been a continuous quest towards developing new concepts and tools to measure the extent of codon usage bias of genes. In this review, we outline the fundamental concepts of evolution of the genetic code, discuss various factors that may influence biased usage of synonymous codons and then outline different principles and methods of measurement of codon usage bias. Finally, we discuss selected studies performed using whole‐genome sequences of different insect species to show how codon bias patterns vary within and among genomes. We conclude with generalized remarks on specific emerging aspects of codon bias studies and highlight the recent explosion of genome‐sequencing efforts on arthropods (such as twelve Drosophila species, species of ants, honeybee, Nasonia and Anopheles mosquitoes as well as the recent launch of a genome‐sequencing project involving 5000 insects and other arthropods) that may help us to understand better the evolution of codon bias and its biological significance.

2.
High-quality data about protein structures and their gene sequences are essential to understanding the relationship between protein folding and protein coding sequences. We first constructed EcoPDB, a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we present a novel information-theoretic approach to investigate three correlations in the E. coli genome: between cysteine synonymous codon usage and the local amino acids flanking cysteines, between cysteine synonymous codon usage and the synonymous codon usage of those flanking amino acids, and between cysteine synonymous codon usage and the disulfide bonding states of cysteines. The results indicate that the nearest neighboring residues on the C-terminal side, and their synonymous codons, have the greatest influence on cysteine synonymous codon usage, and that synonymous codon usage has a specific correlation with disulfide bond formation of cysteines in proteins. The correlations may result from the regulation of protein structure at the gene sequence level and reflect the functional restriction that cysteines pair to form disulfide bonds. The results may also help identify residues that are important for synonymous codon selection of cysteines when introducing disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also serve as a complementary computational method for analysing synonymous codon usage in other model organisms.
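
The abstract above says only that the method is "based on information theory". As a rough illustration, and assuming mutual information as the measure (an assumption, not the authors' stated formula), the dependence between the cysteine codon and a neighboring residue could be sketched as follows. The function name and the toy observations are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between the two members of each pair.

    `pairs` is a list of (x, y) observations, for example
    (cysteine codon, residue at the next position). Probabilities are
    plain relative frequencies, with no small-sample correction.
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * log2(count * n / (left[x] * right[y]))
    return mi


# Made-up observations, for illustration only (not data from EcoPDB).
dependent = [("TGC", "G"), ("TGC", "G"), ("TGT", "A"), ("TGT", "A")]
independent = [("TGC", "G"), ("TGC", "A"), ("TGT", "G"), ("TGT", "A")]

print(mutual_information(dependent))
print(mutual_information(independent))
```

A value near zero means the codon choice carries no information about the neighbor; larger values indicate stronger association. Real data would need a correction for finite sample size, which this sketch omits.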

3.
4.
The honeybee (Apis mellifera) has a genome with a wide variation in GC content showing 2 clear modal GC values, in some ways reminiscent of an isochore-like structure. To gain insight into causes and consequences of this pattern, we used a comparative approach to study the genome-wide alignment of primarily coding sequence of A. mellifera with Drosophila melanogaster and Anopheles gambiae. The latter 2 species show a higher average GC content than A. mellifera and no indications of bimodality, suggesting that the GC-poor mode is a derived condition in honeybee. In A. mellifera, synonymous sites of genes generally adopt the GC content of the region in which they reside. A large proportion of genes in GC-poor regions have not been assigned to the honeybee assembly because of the low sequence complexity of their genome neighborhood. The synonymous substitution rate between A. mellifera and the other species is very close to saturation, but analyses of nonsynonymous substitutions as well as amino acid substitutions indicate that the GC-poor regions are not evolving faster than the GC-rich regions. We describe the codon usage and amino acid usage and show that they are remarkably heterogeneous within the honeybee genome between the 2 different GC regions. Specifically, the genes located in GC-poor regions show a much larger deviation in both codon usage bias and amino acid usage from the Dipterans than the genes located in the GC-rich regions.

5.
Wang ML  Song JN  Xu WB  Li WJ 《FEBS letters》2004,576(3):336-338
Proline is a special imino acid in proteins, and isomerization of the prolyl peptide bond has notable biological significance and strongly influences the final protein structure. The correlation between proline synonymous codon usage and local amino acids, and the correlation between proline synonymous codon usage and isomerization of the prolyl peptide bond, were therefore both investigated in the Escherichia coli genome using a novel method based on information theory. The results show that, in the peptide chain, the residue at the first C-terminal position strongly influences proline synonymous codon usage, and that proline synonymous codons contain factors influencing isomerization of the prolyl peptide bond.

6.
7.
X Tian  J E Strassmann  D C Queller 《Heredity》2014,112(2):215-218
Eukaryotic protein sequences often contain amino-acid homopolymers that consist of a single amino acid repeated from several to dozens of times. Some of these are functional but others may persist largely because of high expansion rates due to DNA slippage. However, very long homopolymers with over a hundred repeats are very rare. We report an extraordinarily long homopolymer consisting of 306 tandem serine repeats from the single-celled eukaryote Dictyostelium discoideum, which also has a multicellular stage. The gene has a paralog with 132 repeats and orthologs, also with high serine repeat numbers, in various other Dictyostelid species. The conserved gene structure and protein sequences suggest that the homopolymer is functional. The high codon diversity and very poor alignment of serine codons in this gene between species similarly indicate functionality. This is because the serine homopolymer is conserved despite much DNA sequence change. A survey of other very long amino-acid homopolymers in eukaryotes shows that high codon diversity is the rule, suggesting that these too may be functional.

8.
Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of the codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content through AT/GC substitutions at the codon third position. The G+C content of the ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating that the codon choices among synonymous codons reflect the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which differs from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern from that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Multiple primitive ubiquitin sequences were dispersed in the genome of ancestral eukaryotes. Some of them, situated in a particular environment, fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded are reflected in the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.

9.
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variation that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.

10.
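Analysis of codon usage frequency among Populus species from different sections   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhou Meng  Tong Chunfa  Shi Jisen 《遗传学报》2007,34(6):555-561
The degeneracy of the genetic code leads different species to favor different codons. Knowing the codon usage characteristics of a species provides a basis for modifying foreign genes before they are introduced, so that they can be expressed efficiently. Poplar is one of the most widely cultivated afforestation trees in the world and has become a model plant for forest tree genetic engineering. In this study, high-frequency codon analysis was applied to the protein-coding sequences (CDS) of four poplar species: quaking aspen (P. tremuloides), Chinese white poplar (P. tomentosa), eastern cottonwood (P. deltoides) and black cottonwood (P. trichocarpa). The relative frequency of synonymous codons (RFSC) was calculated and the high-frequency codons of the four species were identified. Although codon usage differs somewhat among the poplar species, their preferred codons differ very little and the great majority are shared; only a few amino acids, such as Pro, Thr and Cys, have different preferred codons. This commonality suggests that a foreign gene designed with the preferred codons of any one of these poplar species can also be used in the others.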
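
The RFSC calculation mentioned above can be sketched roughly as follows, assuming RFSC is the count of a codon divided by the total count of all codons for the same amino acid (the abstract does not spell out the formula, so this definition is an assumption). The function names and the toy sequence are hypothetical; the standard genetic code is used.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rfsc(cds):
    """Relative frequency of each observed codon within its synonymous family."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    family_totals = defaultdict(int)
    for codon, n in counts.items():
        family_totals[CODON_TABLE[codon]] += n
    return {c: n / family_totals[CODON_TABLE[c]] for c, n in counts.items()}


def high_frequency_codons(freqs):
    """For each amino acid, the observed codon with the highest RFSC."""
    best = {}
    for codon, f in freqs.items():
        aa = CODON_TABLE[codon]
        if aa not in best or f > freqs[best[aa]]:
            best[aa] = codon
    return best


# Made-up proline-only sequence, for illustration only.
toy = "CCTCCTCCACCG"
freqs = rfsc(toy)
print(freqs)
print(high_frequency_codons(freqs))
```

In practice the counts would be pooled over all CDS of a species before dividing, and ties between equally frequent codons would need an explicit rule; this sketch keeps whichever tied codon is seen first.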

11.
We surveyed the substitution patterns in the ent-kaurenoic acid oxidase (KAO) gene in 11 species of Oryzeae with an outgroup in the Ehrhartoideae. The synonymous and non-synonymous substitution rates showed a high positive correlation with each other, but were negatively correlated with codon usage bias and GC content at third codon positions. The substitution rate was heterogeneous among lineages. Likelihood-ratio tests showed that the non-synonymous/synonymous rate ratio changed significantly among lineages. Site-specific models provided no evidence for positive selection of particular amino acid sites in any codon of the KAO gene. This finding suggested that the significant rate heterogeneity among some lineages may have been caused by variability in the relaxation of the selective constraint among lineages or by neutral processes.

12.
Wei JP  Pan XF  Li HQ  Duan F 《遗传》2011,33(1):67-74
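Simple repeat sequences are widely distributed in genomes from prokaryotes to eukaryotes, but the molecular mechanism by which they form remains unclear. Sequences of the mitochondrial DNA (mtDNA) D-loop region of 256 mammalian species available in the NCBI database were aligned and analyzed, and the species were divided into three groups according to the type of simple repeat they contain: 53 species with hexanucleotide repeats, 104 species with non-hexanucleotide repeats (>6 bp), and 99 species with no repeats. Sequence comparison showed that hexanucleotide repeats are concentrated in the CSB1-CSB2 spacer, whereas non-hexanucleotide repeats can occur in the termination region (TAS), the central conserved region (Central domain) and the conserved sequence block (CSB) region. Comparing the functionally conserved regions of sequences with and without repeats showed that the presence of simple repeats has no clear effect on sequence conservation in the central conserved region or in the three functionally conserved blocks CSB1, CSB2 and CSB3 within the D-loop. On this basis, a phylogenetic tree of the 256 mammalian species was constructed with the neighbor-joining (N-J) method and the possible pattern of change of D-loop repeats during evolution was analyzed; simple repeats tend to disappear as the evolutionary position of a species rises.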

13.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular, in detecting protein-coding ORFs. Traditional methods based on stop codon frequency are based on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds of potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
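
The reasoning in this abstract can be illustrated with a deliberately simple model, which is an assumption and not necessarily the authors' method: bases are drawn independently, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. Under that model the chance that a random trinucleotide is a stop codon (TAA, TAG or TGA) falls as GC content rises, so random stop-free stretches get longer and a longer ORF threshold is needed for the same false positive rate. The function names and the parameter `alpha` are hypothetical.

```python
from math import ceil, log


def stop_codon_probability(gc):
    """P(random trinucleotide is TAA, TAG or TGA) under an i.i.d. base model."""
    at_base = (1.0 - gc) / 2.0  # P(A) = P(T)
    gc_base = gc / 2.0          # P(G) = P(C)
    # TAA contributes at_base**3; TAG and TGA each contribute at_base**2 * gc_base.
    return at_base ** 3 + 2.0 * at_base ** 2 * gc_base


def min_orf_codons(gc, alpha):
    """Smallest n such that a random frame stays stop-free for n codons
    with probability at most alpha, i.e. (1 - p_stop) ** n <= alpha."""
    p_stop = stop_codon_probability(gc)
    if p_stop <= 0.0:
        raise ValueError("no stop codons are possible at this GC content")
    return ceil(log(alpha) / log(1.0 - p_stop))


for gc in (0.3, 0.5, 0.65, 0.8):
    p = stop_codon_probability(gc)
    print(f"GC={gc:.2f}  p_stop={p:.4f}  mean spacing={1 / p:.1f} codons  "
          f"threshold(alpha=0.01)={min_orf_codons(gc, 0.01)} codons")
```

At 50% GC the model reproduces the 3/64 figure quoted above (about one stop per 21 trinucleotides). It ignores start codons, strand and frame effects, and real dinucleotide biases, all of which a practical gene finder would have to consider.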

14.
The ethanol tolerance of adult transgenic flies of Drosophila containing between zero and ten unpreferred synonymous mutations that reduced codon bias in the alcohol dehydrogenase (Adh) gene was assayed. As the amino acid sequences of the ADH protein were identical in the four genotypes assayed, differences in ethanol tolerance were due to differences in the abundance of ADH protein, presumably driven by the effects of codon bias on translational efficiency. The ethanol tolerance of genotypes decreased with the number of unpreferred synonymous mutations, and a positive correlation between ADH protein abundance and ethanol tolerance was observed. This work confirms that the fitness effects of unpreferred synonymous mutations that reduce codon bias in a highly expressed gene are experimentally measurable in Drosophila melanogaster.

15.
That natural selection affects molecular evolution at synonymous sites in protein-coding sequences is well established and is thought to predominantly reflect selection for translational efficiency/accuracy mediated through codon bias. However, a recently developed maximum likelihood framework, when applied to 18 coding sequences in 3 species of Drosophila, confirmed an earlier report that the Notch gene in Drosophila melanogaster was evolving under selection in favor of those codons defined as unpreferred in this species. This finding opened the possibility that synonymous sites may be subject to a variety of selective pressures beyond weak selection for increased frequencies of the codons currently defined as "preferred" in D. melanogaster. To further explore patterns of synonymous site evolution in Drosophila in a lineage-specific manner, we expanded the application of the maximum likelihood framework to 8,452 protein coding sequences with well-defined orthology in D. melanogaster, Drosophila sechellia, and Drosophila yakuba. Our analyses reveal intragenomic and interspecific variation in mutational patterns as well as in patterns and intensity of selection on synonymous sites. In D. melanogaster, our results provide little statistical evidence for recent selection on synonymous sites, and Notch remains an outlier. In contrast, in D. sechellia our findings provide evidence in support of selection predominantly in favor of preferred codons. However, there is a small subset of genes in this species that appear to be evolving under selection in favor of unpreferred codons, which indicates that selection on synonymous sites is not limited to the preferential fixation of mutations that enhance the speed or accuracy of translation in this species.

16.
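In this study, young leaves from greenhouse hydroponic cultures of tender shoots of two wild oleaster (Elaeagnus angustifolia Linn.) plants were used as material. Total DNA was extracted from each with the CTAB method and sequenced de novo with second-generation sequencing; assembly yielded the complete chloroplast genome sequences of the two plants. Codon usage bias in the protein-coding genes and its causes were analyzed in detail, laying a foundation for chloroplast genetic engineering and molecular phylogenetic studies of E. angustifolia. The results showed: (1) The assembled chloroplast genome is 150 546 bp long, consisting of a large single-copy (LSC) region of 81 113 bp, a small single-copy (SSC) region of 25 494 bp, and a pair of inverted repeats (IRs) of 18 445 bp that separate them; annotation identified 132 genes, including 86 protein-coding genes, 38 tRNA genes and 8 rRNA genes. (2) The GC content at the third codon position (GC_3) of the protein-coding genes is 28.47%, clearly lower than the GC content of the whole chloroplast genome (37%) and also lower than the GC content at the first (GC_1) and second (GC_2) positions, indicating a preference for codons ending in A or T; UCU, CCU, UGU, GCU, CUU, GAU, UCA and UAA are the optimal codons. (3) Relative synonymous codon usage (RSCU) analysis showed that codon usage patterns are not shaped by a single factor: codon bias is jointly affected by mutation, selection and other factors, and sequence differences arising from natural selection affect codon bias more markedly than mutation does; neutrality plot analysis, effective number of codons (ENC-plot) analysis and parity rule 2 (PR2-plot) analysis indicated that codon usage bias in the E. angustifolia chloroplast genome is more strongly influenced by selection. (4) Phylogenetic trees built from the chloroplast gene sequences of six Elaeagnaceae species and one jujube species with maximum likelihood, maximum parsimony and Bayesian methods agreed with the clustering based on their codon usage bias, indicating that chloroplast codon usage bias is related to the phylogenetic relationships among species.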
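
The position-specific GC values (GC_1, GC_2, GC_3) compared in result (2) can be computed along the following lines. This is a minimal sketch under simple assumptions: the input is a single in-frame coding sequence, every codon is counted (including the stop codon), and the function name and toy sequence are hypothetical rather than taken from the study.

```python
def gc_by_codon_position(cds):
    """Return (GC_1, GC_2, GC_3): the G+C fraction at each codon position."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        gc_count = sum(base in "GC" for base in bases)
        fractions.append(gc_count / len(bases) if bases else 0.0)
    return tuple(fractions)


# Made-up four-codon sequence (ATG GCA GCT TAA), for illustration only.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCAGCTTAA")
print(f"GC_1={gc1:.2f}  GC_2={gc2:.2f}  GC_3={gc3:.2f}")
```

A GC_3 well below GC_1 and GC_2, as reported above, is the usual signature of a preference for A- or T-ending codons. Genome-wide figures would be obtained by pooling all protein-coding genes, and studies often exclude Met, Trp and stop codons from GC_3, which this sketch does not do.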

17.
Huntley MA  Golding GB 《Proteins》2002,48(1):134-140
Simple sequence is abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequence, are not present at the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive, simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequence are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequence both in the PDB database and in the structural information suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.

18.
We describe the first complete mitochondrial genome sequence from a representative of the insect order Coleoptera, the flour beetle Tribolium castaneum. The 15,881 bp long Tribolium mitochondrial genome encodes 13 putative proteins, two ribosomal RNAs and 22 tRNAs canonical for animal mitochondrial genomes. Their arrangement is identical to that in Drosophila melanogaster, which is considered ancestral for insects and crustaceans (Boore et al., 1998; Hwang et al., 2001a). Nucleotide composition, amino acid composition, and codon usage fall within the range of values observed in other insect mitochondrial genomes. The most notable features are the use of TCT as tRNA(Ser(AGN)) anticodon instead of GCT, which is used in most other arthropod species, and the relative scarcity of special sequence motifs in the 1431 bp long control region. Phylogenetic analysis confirmed resolving power in the conserved regions of the mitochondrial proteome regarding diversification events, which predate the emergence of pterygote insects, while little resolution was obtained at the level of basal pterygote diversification. The partition of faster evolving amino acid sites harbored strong support for joining Lepidoptera with Diptera, which is consistent with a monophyletic Mecopterida.

19.
A corollary of the nearly neutral theory of molecular evolution is that the efficiency of natural selection depends on effective population size. In this study, we evaluated the differences in levels of synonymous polymorphism among Drosophila species and showed that these differences can be explained by differences in effective population size. The differences can have implications for the molecular evolution of the Drosophila species, as is suggested by our results showing that the levels of codon bias and the proportion of adaptive substitutions are both higher in species with higher levels of synonymous polymorphism. Moreover, species with lower synonymous polymorphism have higher levels of nonsynonymous polymorphism and larger content of repetitive sequences in their genomes, suggesting a diminished efficiency of selection in species with smaller effective population size.

20.
Mitochondrial genomes typically show genome-wide patterns of synonymous codon usage bias. In animals and land plants, mutation appears more dominant than selection in shaping this bias, while in green algae the relative importance of these factors is not well studied. Based on our analysis of mitochondrial DNA sequence from the green algae Mesostigma viride (NIES-296) and Chlamydomonas reinhardtii (CC-277) and a closely related relative of each, we conclude that both mutation and selection are important in shaping synonymous codon usage bias in their mitochondrial genomes, with selection being more dominant. The possible confounding influence of mutational context dependence on our analyses is discussed.
