Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
The "expression measure" of a gene, E(g), is a statistic devised to predict the level of gene expression from codon usage bias. E(g) has been used extensively to analyze prokaryotic genome sequences. We discuss two problems with this approach. First, the formulation of E(g) is such that genes with the strongest selected codon usage bias are not likely to have the highest predicted expression levels; indeed, the correlation between E(g) and expression level is weak among moderately to highly expressed genes. Second, in some species, highly expressed genes do not have unusual codon usage, and so codon usage cannot be used to predict expression levels. We outline a simple approach, first to check whether a genome shows evidence of selected codon usage bias and then to assess the strength of bias in genes as a guide to their likely expression level; we illustrate this with an analysis of Shewanella oneidensis.

2.
The 'effective number of codons' used in a gene   Cited by: 64 (self-citations: 0; cited by others: 64)
F Wright. Gene, 1990, 87(1): 23-29
A simple measure is presented that quantifies how far the codon usage of a gene departs from equal usage of synonymous codons. This measure of synonymous codon usage bias, the 'effective number of codons used in a gene', Nc, can be easily calculated from codon usage data alone, and is independent of gene length and amino acid (aa) composition. Nc can take values from 20, in the case of extreme bias where one codon is exclusively used for each aa, to 61 when the use of alternative synonymous codons is equally likely. Nc thus provides an intuitively meaningful measure of the extent of codon preference in a gene. Codon usage patterns across genes can be investigated by the Nc-plot: a plot of Nc vs. G + C content at synonymous sites. Nc-plots are produced for Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Bacillus subtilis, Dictyostelium discoideum, and Drosophila melanogaster. A FORTRAN77 program written to calculate Nc is available on request.
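
The abstract above gives the range of Nc (20 to 61) but not the formula. The following is a minimal Python sketch, assuming the standard genetic code and the commonly cited form of the estimator: a per-amino-acid homozygosity F = (n Σ p² − 1)/(n − 1), averaged within each degeneracy class, with Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The function name `effective_number_of_codons` is hypothetical, and the handling of classes with no usable data (treated as unbiased) is a simplifying assumption rather than the published procedure.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def effective_number_of_codons(cds):
    """Return Nc for an in-frame DNA coding sequence (hypothetical helper)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    # Group observed codon counts by amino acid, ignoring stop codons.
    family_counts = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            family_counts[aa].append(counts.get(codon, 0))

    # Homozygosity F per amino acid, collected by degeneracy class (2, 3, 4, 6).
    f_by_class = defaultdict(list)
    for codon_counts in family_counts.values():
        k, n = len(codon_counts), sum(codon_counts)
        if k == 1 or n < 2:  # Met, Trp, or too few observations
            continue
        f = (n * sum((x / n) ** 2 for x in codon_counts) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class.get(k)
        # Assumption: a class with no usable data is treated as unbiased (F = 1/k).
        mean_f = sum(fs) / len(fs) if fs else 1.0 / k
        nc += n_amino_acids / mean_f
    return min(nc, 61.0)  # estimates above 61 are capped at 61
```

With this sketch, a sequence that uses a single codon for every amino acid gives 20, and a long sequence that uses all 61 sense codons equally gives the capped value of 61, matching the two extremes described in the abstract.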

3.
Gu W, Zhou T, Ma J, Sun X, Lu Z. Bio Systems, 2004, 73(2): 89-97
The role of the silent codon position in protein structure is an interesting and as yet unclear problem. In this paper, 563 Homo sapiens genes and 417 Escherichia coli genes coding for proteins with four different folding types have been analyzed using variance analysis, a multivariate analysis method newly applied to codon usage analysis, to find the correlation between amino acid composition, synonymous codon usage, and protein structure in different organisms. It has been found that in E. coli, both amino acid composition in differently folded proteins and synonymous codon usage in the gene classes coding for differently folded proteins are significantly different. It was also found that only amino acid composition differs among protein classes in H. sapiens. There is no universal correlation between synonymous codon usage and protein structure in these two organisms. Further analysis has shown that GC content at the second codon position can distinguish genes coding for differently folded proteins in both organisms.

4.
It is important to understand the codon usage pattern of maize and the factors that shape it. In this study, trends in synonymous codon usage in maize were first examined through multivariate statistical analysis of 7402 cDNA sequences. The results showed that gene positions on the primary axis were strongly negatively correlated with GC3s, the GC content of individual genes, and gene expression level as assessed by codon adaptation index (CAI) values. This indicates that nucleotide composition and gene expression level are the main factors shaping codon usage in maize, and that the variation in codon usage among genes may be due to mutational bias at the DNA level and natural selection acting at the level of mRNA translation. At the same time, CDS length and the hydrophobicity of each protein were each significantly correlated with gene positions on the primary axis, GC3s, and CAI values. We infer that gene length and the hydrophobicity of the encoded protein may play a minor role in shaping codon usage bias. In addition, 28 codons ending with a G or C base have been defined as "optimal codons", which may provide useful information for maize gene transformation and gene prediction.

5.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
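
The graph formulation above can be illustrated for case (1) only, dinucleotide usage. The sketch below treats each nucleotide as a vertex and each adjacent pair as a directed edge, then draws a random Eulerian walk that starts and ends where the original sequence does. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the last edge leaving each vertex is chosen by simple rejection sampling, and nothing is claimed about how the output is distributed over the possible walks.

```python
import random
from collections import Counter


def dinucleotide_counts(seq):
    """Count adjacent nucleotide pairs, i.e. the edges of the multigraph."""
    return Counter(zip(seq, seq[1:]))


def _reaches_end(vertex, end, last_edge):
    """Follow the chosen last edges from vertex; True if they lead to end."""
    seen = set()
    while vertex != end:
        if vertex in seen:
            return False  # the chosen last edges form a cycle
        seen.add(vertex)
        vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a random permutation of seq with identical dinucleotide counts."""
    if len(seq) < 3:
        return seq
    end = seq[-1]
    edges = {}
    for a, b in zip(seq, seq[1:]):
        edges.setdefault(a, []).append(b)

    # Pick one "last" outgoing edge per vertex (except the end vertex) and
    # retry until those edges all lead to the end vertex.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != end}
        if all(_reaches_end(v, end, last_edge) for v in last_edge):
            break

    # Shuffle the remaining edges of each vertex, keeping the chosen edge last.
    order = {}
    for v, out in edges.items():
        out = list(out)
        if v != end:
            out.remove(last_edge[v])
        rng.shuffle(out)
        if v != end:
            out.append(last_edge[v])
        order[v] = out

    # Walk the graph from the first nucleotide, consuming edges in order.
    used = {v: 0 for v in order}
    current, walk = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        walk.append(nxt)
        current = nxt
    return "".join(walk)
```

Comparing `dinucleotide_counts(original)` with `dinucleotide_counts(dinucleotide_shuffle(original))` confirms that only the order of the pairs changes, not how often each pair occurs. Preserving trinucleotide or codon usage as well, cases (2) and (3), needs a richer graph and is not attempted here.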

6.
Codon usage data has been compiled for 110 yeast genes. Cluster analysis on relative synonymous codon usage revealed two distinct groups of genes. One group corresponds to highly expressed genes, and has much more extreme synonymous codon preference. The pattern of codon usage observed is consistent with that expected if a need to match abundant tRNAs, and intermediacy of tRNA-mRNA interaction energies are important selective constraints. Thus codon usage in the highly expressed group shows a higher correlation with tRNA abundance, a greater degree of third base pyrimidine bias, and a lesser tendency to the A+T richness which is characteristic of the yeast genome. The cluster analysis can be used to predict the likely level of gene expression of any gene, and identifies the pattern of codon usage likely to yield optimal gene expression in yeast.
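
The cluster analysis above operates on relative synonymous codon usage (RSCU). As a hedged illustration, the sketch below computes RSCU in its usual form: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code; the function name `rscu` is hypothetical, and amino acids absent from the sequence are simply omitted from the result.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families keyed by amino acid, stop codons excluded.
FAMILIES = defaultdict(list)
for _codon, _aa in GENETIC_CODE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts.get(codon, 0) for codon in family)
        if total == 0:
            continue  # amino acid not present in this sequence
        for codon in family:
            # Observed count over the count expected under equal usage.
            values[codon] = counts.get(codon, 0) * len(family) / total
    return values
```

For example, a sequence with three CTG and one CTT codons gives RSCU values of 4.5 and 1.5 for those two leucine codons and 0 for the other four. Vectors of such values, one per gene, are the kind of input a cluster analysis would take.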

7.
Zhao S, Zhang Q, Liu X, Wang X, Zhang H, Wu Y, Jiang F. Bio Systems, 2008, 92(3): 207-214
Human Bocavirus (HBoV) is a novel virus that can cause respiratory tract disease in infants and children. In this study, codon usage bias and base composition variation in the 11 available complete HBoV genome sequences were investigated. Although there is significant variation in codon usage bias among different HBoV genes, codon usage bias in HBoV is slight and is mainly determined by the base composition at the third codon position and the effective number of codons (ENC) value. The results of correspondence analysis (COA) and Spearman's rank correlation analysis reveal that the G + C compositional constraint is the main factor determining codon usage bias in HBoV, and that gene function also contributes to codon usage in this virus. Moreover, the hydrophobicity of each protein and gene length were also found to affect codon usage in these viruses, although they were less important than mutational bias and gene function. Finally, the relative synonymous codon usage (RSCU) of 44 genes from these 11 HBoV isolates was analyzed using a hierarchical cluster method. The result suggests that genes with the same function from different isolates are classified into the same lineage, independent of geographical location. These conclusions not only offer insight into the codon usage patterns and gene classification of HBoV, but may also help in increasing the efficiency of gene delivery/expression systems.

8.
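Codon usage bias patterns in the grape genome   Cited by: 2 (self-citations: 0; cited by others: 2)
Based on the complete genome sequence, multivariate statistical analysis and correspondence analysis were used to examine codon usage patterns across the whole grape genome and the factors that may influence codon usage. The results show that codon bias in grape is shaped jointly by base composition differences (r=0.925) and natural selection (r=0.193); mutational pressure is the dominant factor, and natural selection plays a smaller role. Gene length and protein hydrophobicity also influence codon bias. Twenty optimal codons were identified for grape.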

9.
Analysis of the synonymous codon usage pattern in the genome of a thermophilic cyanobacterium, Thermosynechococcus elongatus BP-1, using multivariate statistical analysis revealed a single major explanatory axis accounting for codon usage variation in the organism. In correspondence analysis of T. elongatus genes, this axis is correlated with the GC content at the third base of synonymous codons (GC3s). A negative correlation was observed between the effective number of codons (Nc) and GC3s. The results suggest mutational bias as the major factor shaping codon usage in this cyanobacterium. Compared with lowly expressed genes, the highly expressed genes of this organism possess a significantly higher proportion of pyrimidine-ending codons, suggesting that, besides mutational bias, translational selection also influenced codon usage variation in T. elongatus. Correspondence analysis of relative synonymous codon usage (RSCU) with A, T, G, C at third positions (A3s, T3s, G3s, C3s, respectively) also supported this conclusion; gene expression level and gene length also influenced codon usage. A role of translational accuracy was identified in dictating the codon usage variation of this genome. The results indicate that although mutational bias is the major factor shaping codon usage in T. elongatus, factors such as translational selection, translational accuracy and gene expression level also influenced codon usage variation.

10.
Carboxydothermus hydrogenoformans is an extremely thermophilic, Gram-positive bacterium that grows on carbon monoxide (CO) as its single carbon and energy source and produces only H2 and CO2. Carbon monoxide dehydrogenase is a key enzyme for CO metabolism. The carbon monoxide dehydrogenase genes cooF and cooS from C. hydrogenoformans were cloned and sequenced. These genes showed the highest similarity to the cooF genes from the archaeon Archaeoglobus fulgidus and the cooS gene from the bacterium Rhodospirillum rubrum, respectively. The cooS gene was identified immediately downstream of cooF; however, the cooF and cooS genes from C. hydrogenoformans have substantially different codon usage, and the cooF gene Arg codon usage pattern, dominated by AGA and AGG, resembles the archaeal pattern. The data therefore suggest lateral transfer of these genes, possibly from different donor species.

11.
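A preliminary study of the relationship between gene expression level and synonymous codon usage   Cited by: 3 (self-citations: 0; cited by others: 3)
A self-consistent information clustering method is proposed for predicting gene expression level and synonymous codon usage. Synonymous codons are divided into optimal, non-optimal, and rare codons, and the usage frequencies of these three classes are taken to be the main factor regulating gene expression level. On this basis, the self-consistent information clustering method was used to predict gene expression levels and codon usage in two organisms, E. coli and yeast. Highly and lowly expressed genes were clearly separated, and gene expression level was divided into four classes: very highly expressed genes (VH), highly expressed genes (H), relatively lowly expressed genes (LM), and lowly expressed genes (LL).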

12.
Glycosyl hydrolase (GH) genes from Escherichia coli and Bacillus subtilis were used to search for cases of horizontal gene transfer. Such an event was inferred by G + C content, codon usage analysis, and a phylogenetic congruency test. The codon usage analysis used is a procedure based on a distance derived from a Pearson linear correlation coefficient determined from a pairwise codon usage comparison. The distances are then used to generate a distance-based tree with which we can define clusters and rapidly compare codon usage. Three genes (yagH from E. coli and xynA and xynB from B. subtilis) were determined to have arrived by horizontal gene transfer and were located in E. coli CP4-6 prophage, and B. subtilis prophages 6 and 5, respectively. In this study, we demonstrate that with codon usage analysis, the proposed horizontally transferred genes can be distinguished from highly expressed genes.
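
The abstract above does not state how the distance is derived from the Pearson coefficient. The sketch below therefore assumes the simplest common choice, d = 1 − r, computed between the 64-element codon frequency vectors of two genes. Both that formula and the function names are assumptions made for illustration; the paper's actual transformation may differ.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order so that vectors from different genes align.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_frequencies(cds):
    """Return the codon frequency vector of an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("sequence contains no recognisable codons")
    return [counts[c] / total for c in CODONS]


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def codon_usage_distance(cds_a, cds_b):
    """Assumed distance: 1 - r, so identical usage gives 0 and opposite usage gives 2."""
    return 1.0 - pearson(codon_frequencies(cds_a), codon_frequencies(cds_b))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method; which method the authors used is not stated in the abstract.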

13.
In this study, codon usage bias of all experimentally known genes of Lactococcus lactis has been analyzed. Since Lactococcus lactis is an AT-rich organism, A and/or T is expected to occur at the third position of codons, and detailed analysis of overall codon usage data indicates that A- and/or T-ending codons are predominant in this organism. However, multivariate statistical analyses based both on codon counts and on relative synonymous codon usage (RSCU) show that a large number of genes presumed to be highly expressed cluster at one end of the first major axis, while the majority of the putatively lowly expressed genes cluster at the other end. C- and T-ending codons were significantly more frequent in the highly expressed genes than in the lowly expressed genes; C-ending codons are predominant in the duets of highly expressed genes, whereas T-ending codons are abundant in the quartets. The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional bias, translational selection is also operating in shaping codon usage variation among the genes of this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts differentiates the genes into two distinct groups according to their hydrophobicity values, but the same analysis computed with relative synonymous codon usage values could not discriminate the genes according to their hydropathy values. This suggests that amino acid composition exerts constraints on codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values differentiates the genes into two groups according to synonymous codon usage for cysteine residues (the rarest amino acid in this organism), which is nothing but an artifactual effect induced by the RSCU values. Other factors, such as gene length and the position of genes on the leading or lagging strand of replication, have practically no influence on codon usage variation among the genes in this organism.

14.
Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.

15.
Codon bias is generally thought to be determined by a balance between mutation, genetic drift, and natural selection on translational efficiency. However, natural selection on codon usage is considered to be a weak evolutionary force, and selection on codon usage is expected to be strongest in species with large effective population sizes. In this paper, I study associations between codon usage, gene expression, and molecular evolution at synonymous and nonsynonymous sites in the long-lived, woody perennial plant Populus tremula (Salicaceae). Using expression data for 558 genes derived from expressed sequence tag (EST) libraries from 19 different tissues and developmental stages, I study how gene expression levels within single tissues as well as across tissues affect codon usage and rates of sequence evolution at synonymous and nonsynonymous sites. I show that gene expression has direct effects on both codon usage and the level of selective constraint of proteins in P. tremula, although in different ways. Codon usage of genes is primarily determined by how highly expressed a gene is, whereas rates of sequence evolution are primarily determined by how widely expressed genes are. In addition to the effects of gene expression, protein length appears to be an important factor influencing virtually all aspects of molecular evolution in P. tremula.

16.
It is generally believed that the effect of translational selection on codon usage bias is related to the number of transfer RNA genes in bacteria, and that this effect is stronger in highly expressed genes than across the whole genome. With this in mind, we analyzed codon usage bias for the amino acids asparagine, isoleucine, phenylalanine, and tyrosine. The analysis was done in seventeen bacteria for which gene expression data and information about tRNA gene number were available. In most of these bacteria, codon usage bias and tRNA gene number were not in agreement, which was unexpected. We extended the study to 199 bacteria, limiting it to codon usage bias in the two highly expressed genes rpoB and rpoC, which encode the RNA polymerase subunits β and β′, respectively. In concordance with the result for the highly expressed genes, codon usage bias in the rpoB and rpoC genes was also found not to agree with tRNA gene number in many of these bacteria. Our study indicates that tRNA gene number may not be the sole determining factor for translational selection of codon usage bias in bacterial genomes.

17.
We present an expression measure of a gene, devised to predict the level of gene expression from relative codon bias (RCB). There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition are strongly correlated, RCB has been defined to provide an intuitively meaningful measure of the extent of codon preference in a gene. We outline a simple approach to assess the strength of RCB (RCBS) in genes as a guide to their likely expression levels and illustrate this with an analysis of the Escherichia coli (E. coli) genome. Our efforts to quantitatively predict gene expression levels in E. coli met with a high level of success. Surprisingly, we observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at higher levels. The agreement of our results with high protein abundances, microarray data and radioactive data demonstrates that the genomic expression profile available in our method can be applied in a meaningful way to the study of cell physiology and also for more detailed studies of particular genes of interest.

18.
Studies on codon usage in Entamoeba histolytica   Cited by: 13 (self-citations: 0; cited by others: 13)
Codon usage bias of Entamoeba histolytica, a protozoan parasite, was investigated using the available DNA sequence data. Entamoeba histolytica, having an AT-rich genome, is expected to have A and/or T at the third position of codons. Overall codon usage data analysis indicates that A- and/or T-ending codons are strongly favoured in the coding regions of this organism. However, multivariate statistical analysis suggests that there is a single major trend in codon usage variation among the genes. The genes presumed to be highly expressed are clustered at one end, while the majority of the putatively lowly expressed genes are clustered at the other end. The codon usage pattern is distinctly different in these two sets of genes. C-ending codons are significantly more frequent in the putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. In the putatively lowly expressed genes, A- and/or T-ending codons are predominant, which suggests that compositional constraints play the major role in shaping codon usage variation among the lowly expressed genes. These results suggest that both mutational bias and translational selection contribute to codon usage variation in this organism.

19.
Adaptive codon usage provides evidence of natural selection in one of its most subtle forms: a fitness benefit of one synonymous codon relative to another. Codon usage bias is evident in the coding sequences of a broad array of taxa, reflecting selection for translational efficiency and/or accuracy as well as mutational biases. Here, we quantify the magnitude of selection acting on alternative codons in genes of the nematode Caenorhabditis remanei, an outcrossing relative of the model organism C. elegans, by fitting the expected mutation-selection-drift equilibrium frequency distribution of preferred and unpreferred codon variants to the empirical distribution. This method estimates the intensity of selection on synonymous codons in genes with high codon bias as N(e)s = 0.17, a value significantly greater than zero. In addition, we demonstrate for the first time that estimates of ongoing selection on codon usage among genes, inferred from nucleotide polymorphism data, correlate strongly with long-term patterns of codon usage bias, as measured by the frequency of optimal codons in a gene. From the pattern of polymorphisms in introns, we also infer that these findings do not result from the operation of biased gene conversion toward G or C nucleotides. We therefore conclude that coincident patterns of current and ancient selection are responsible for shaping biased codon usage in the C. remanei genome.

20.
A simple, effective measure of synonymous codon usage bias, the Codon Adaptation Index, is detailed. The index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon, and a score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
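
The two steps described above, scoring each codon against a reference set and then scoring a gene from its codons, can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and the usual formulation in which a codon's relative adaptiveness w is its reference count divided by the count of the most used synonymous codon, and the index is the geometric mean of w over the gene's codons. The function names are hypothetical, and replacing a zero reference count with 0.5 is an assumption labelled in the code.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

FAMILIES = defaultdict(list)
for _codon, _aa in GENETIC_CODE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)


def split_codons(cds):
    """Split an in-frame coding sequence into consecutive codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """Return {codon: w} estimated from a reference set of highly expressed genes."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    weights = {}
    for family in FAMILIES.values():
        if len(family) < 2:
            continue  # Met and Trp carry no synonymous choice
        # Assumption: a codon never seen in the reference set counts as 0.5.
        observed = {codon: counts.get(codon, 0) or 0.5 for codon in family}
        best = max(observed.values())
        for codon in family:
            weights[codon] = observed[codon] / best
    return weights


def codon_adaptation_index(cds, weights):
    """Geometric mean of w over the codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene built only from the most used codon of each family scores 1.0, and scores fall toward 0 as rarer codons appear. The quality of the result depends on the choice of reference set, which the sketch leaves to the caller.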
