Similar articles
20 similar articles found.
1.
CUTG (Codon Usage Tabulated from GenBank) is a comprehensive database of codon usage. The codon usage of each full-length protein gene has been calculated from nucleotide sequences obtained from the GenBank sequence database, and the sum of codon usage for each organism has also been calculated. The data files can be obtained from the anonymous FTP sites of DDBJ, DISC and EBI. The list of codon usage of genes in each organism has been made searchable by organism name through the website http://www.dna.affrc.go.jp/~nakamura/CUTG.html. The compilation is synchronized with each major release of GenBank.
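The two tabulation steps described above (codon counts for each gene, then their sum per organism) can be sketched as follows. This is a minimal sketch, assuming each coding sequence is an in-frame nucleotide string; the function names are hypothetical, and CUTG's own handling of partial or non-standard genes is not reproduced.

```python
from collections import Counter


def gene_codon_counts(cds: str) -> Counter:
    """Count in-frame codons of one CDS; partial or ambiguous codons are skipped."""
    seq = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def organism_codon_sum(cdss) -> Counter:
    """Sum per-gene codon tables into one organism-level table."""
    total = Counter()
    for cds in cdss:
        total.update(gene_codon_counts(cds))
    return total


genes = ["ATGAAAAAGTAA", "ATGAAATGA"]  # toy sequences
print(gene_codon_counts(genes[0]))
print(organism_codon_sum(genes))
```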

2.
Codon usage frequencies for each of the 206 526 complete protein-coding sequences (CDSs) have been compiled from the taxonomic divisions of the GenBank DNA sequence database. The sum of codon usage for 7434 organisms has also been calculated. These data files can be obtained from the anonymous FTP sites of DDBJ, DISC and EBI. The codon usage of individual genes in an organism, as well as the summed codon usage of that organism, has been made searchable by organism name through the website http://www.dna.affrc.go.jp//CUTG.html

3.
The codon usage of individual protein genes has been calculated from nucleotide sequences obtained from the GenBank Genetic Sequence Database. The sum of codon usage for each organism has also been calculated. The data files can be obtained from the anonymous FTP sites of DDBJ, DISC and EBI. The list of codon usage of genes in each organism has been made searchable by organism name through a website. The compilation is synchronized with each major release of GenBank.

4.
The genetic code is degenerate, but alternative synonymous codons are generally not used with equal frequency. Since the pioneering work of Grantham's group, it has been apparent that genes from one species often share similarities in codon frequency; under the "genome hypothesis" there is a species-specific pattern of codon usage. However, it has become clear that in most species there are also considerable differences among genes. Multivariate analyses have revealed that in each species examined so far there is a single major trend in codon usage among genes, usually running from highly biased to more nearly even usage of synonymous codons. Thus, to represent the codon usage pattern of an organism, it is not sufficient to sum over all genes, as this conceals the underlying heterogeneity; rather, it is necessary to describe the trend among genes seen in that species. We illustrate these trends for six species in which codon usage has been examined in detail by presenting the pooled codon usage for the 10% of genes at either end of the major trend. Closely related organisms have similar patterns of codon usage, so the six species in Table 1 are representative of wider groups. For example, with respect to codon usage, Salmonella typhimurium closely resembles E. coli, while all mammalian species examined so far (principally mouse, rat and cow) largely resemble humans.
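The pooling step can be sketched as below, assuming each gene already has a score along the major trend (for example its coordinate on the first axis of a multivariate analysis, which is not computed here). The function name and inputs are hypothetical; the sketch only sorts genes by that score and sums codon counts for the lowest and highest 10%.

```python
from collections import Counter


def pooled_extremes(gene_counts, axis_scores, fraction=0.10):
    """Pool codon counts of the genes at the low and high ends of a major trend.

    gene_counts: list of Counter (codon -> count), one per gene.
    axis_scores: one float per gene, its position along the trend (assumed given).
    """
    if len(gene_counts) != len(axis_scores):
        raise ValueError("need exactly one score per gene")
    order = sorted(range(len(axis_scores)), key=lambda i: axis_scores[i])
    k = max(1, int(len(order) * fraction))  # at least one gene at each end
    low, high = Counter(), Counter()
    for i in order[:k]:
        low.update(gene_counts[i])
    for i in order[-k:]:
        high.update(gene_counts[i])
    return low, high
```

Reporting `low` and `high` side by side, rather than one genome-wide sum, keeps the heterogeneity that a single total would hide.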

5.
6.
As shown in the accompanying paper (5), the oligonucleotide composition of the E. coli genome is highly asymmetric for sequences up to 6 bp long when ranked from highest to lowest abundance. We show here that this largely reflects codon usage: heavily used codons were found among the highly abundant oligomers, whereas rarely used codons, with some exceptions, occurred in low-abundance sequences. Furthermore, linear regression analysis revealed a strong correlation between the frequency of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across codon positions, and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) closely resembled that of genes encoding polypeptides. However, Markov chain analysis identified 45 tetra-, 8 penta- and 6 hexanucleotides as significantly over- or underabundant, and these could not be accounted for by codon usage. Many of the underrepresented sequences were palindromes, including the Dam methylation site.
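The abstract does not say which Markov model was used. One common formulation, assumed here for illustration, predicts the count of each k-letter word from its overlapping (k-1)- and (k-2)-letter subwords, so a ratio far from 1 flags a word that shorter-word composition does not explain. A minimal single-strand sketch (function names hypothetical; significance testing is not shown):

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-letter word counts on a single strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_obs_exp(seq: str, k: int) -> dict:
    """Observed/expected ratio for every k-letter word over ACGT.

    Expected count under an order-(k-2) Markov model:
        E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])
    Words whose expected count is zero are left out.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    seq = seq.upper()
    obs = kmer_counts(seq, k)
    sub = kmer_counts(seq, k - 1)
    core = kmer_counts(seq, k - 2)
    ratios = {}
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        if core[w[1:-1]] == 0:
            continue
        expected = sub[w[:-1]] * sub[w[1:]] / core[w[1:-1]]
        if expected > 0:
            ratios[w] = obs[w] / expected
    return ratios
```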

7.
The TransTerm database of termination codon contexts has been extended to include sense codon usage and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. It contains: (a) the sequence around the termination codon (-10 to +10); (b) the sequence around the initiation codon (-20 to +10); (c) the length, the G+C content at the third codon position (GC3), the codon adaptation index (CAI) and the effective number of codons (Nc); and (d) summary tables for each organism, including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.
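Three of the listed per-sequence quantities can be sketched directly: GC3, a fixed-width window around the stop codon, and a per-position base-frequency matrix over such windows. This is a minimal sketch; the window convention (`up` bases before the stop codon, the stop codon, then `down` bases after it, with `cds_end` pointing just past the stop) is a hypothetical choice and may not match TransTerm's coordinate system.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def gc3(cds: str) -> float:
    """G+C fraction at third codon positions, excluding a terminal stop codon."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if codons and codons[-1] in STOPS:
        codons = codons[:-1]
    if not codons:
        raise ValueError("no sense codons")
    return sum(c[2] in "GC" for c in codons) / len(codons)


def stop_context(transcript: str, cds_end: int, up: int = 10, down: int = 10) -> str:
    """`up` nt before the stop codon, the stop codon itself, and `down` nt after it.

    cds_end is the 0-based index just past the stop codon (hypothetical convention).
    """
    start = cds_end - 3 - up
    if start < 0 or cds_end + down > len(transcript):
        raise ValueError("window runs off the transcript")
    return transcript[start:cds_end + down]


def position_base_counts(windows):
    """Tally base frequencies at each position of equal-length context windows."""
    matrix = [Counter() for _ in range(len(windows[0]))]
    for window in windows:
        if len(window) != len(matrix):
            raise ValueError("windows must have equal length")
        for pos, base in enumerate(window):
            matrix[pos][base] += 1
    return matrix
```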

8.
The 'effective number of codons' used in a gene   Total citations: 64 (self-citations: 0; by others: 64)
F Wright. Gene, 1990, 87(1): 23-29
A simple measure is presented that quantifies how far the codon usage of a gene departs from equal usage of synonymous codons. This measure of synonymous codon usage bias, the 'effective number of codons used in a gene', Nc, can be easily calculated from codon usage data alone, and is independent of gene length and amino acid (aa) composition. Nc can take values from 20, in the case of extreme bias where one codon is exclusively used for each aa, to 61 when the use of alternative synonymous codons is equally likely. Nc thus provides an intuitively meaningful measure of the extent of codon preference in a gene. Codon usage patterns across genes can be investigated by the Nc-plot: a plot of Nc vs. G + C content at synonymous sites. Nc-plots are produced for Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Bacillus subtilis, Dictyostelium discoideum, and Drosophila melanogaster. A FORTRAN77 program written to calculate Nc is available on request.
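The statistic can be computed as below. This sketch follows Wright's definition as it is commonly implemented: for each amino acid seen n times, the homozygosity F = (n Σp_i² - 1)/(n - 1); F is averaged within the 2-, 3-, 4- and 6-fold degeneracy classes of the standard genetic code; then Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. Skipping amino acids seen fewer than twice and estimating a missing isoleucine class as the mean of F2 and F4 are implementation choices assumed here; the function names are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

SYNONYMS = defaultdict(list)  # amino acid -> synonymous codons (standard code)
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def homozygosity(counts, codons):
    """Wright's F for one amino acid: (n * sum(p_i^2) - 1) / (n - 1)."""
    n = sum(counts[c] for c in codons)
    if n < 2:
        return None  # too few observations to estimate F
    return (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)


def effective_number_of_codons(cds: str) -> float:
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    by_class = defaultdict(list)  # degeneracy class (2, 3, 4, 6) -> F values
    for codons in SYNONYMS.values():
        if len(codons) > 1:
            f = homozygosity(counts, codons)
            if f is not None:
                by_class[len(codons)].append(f)
    f_mean = {k: sum(v) / len(v) for k, v in by_class.items()}
    if 3 not in f_mean and 2 in f_mean and 4 in f_mean:
        f_mean[3] = (f_mean[2] + f_mean[4]) / 2  # isoleucine missing
    if any(k not in f_mean for k in (2, 3, 4, 6)):
        raise ValueError("too few codons to estimate every degeneracy class")
    if min(f_mean.values()) <= 0:
        return 61.0  # a zero class mean makes Nc unbounded; cap at 61
    nc = 2 + 9 / f_mean[2] + 1 / f_mean[3] + 5 / f_mean[4] + 3 / f_mean[6]
    return min(nc, 61.0)
```

A gene that uses exactly one codon for every amino acid gives F = 1 in each class and hence Nc = 2 + 9 + 1 + 5 + 3 = 20, the lower bound quoted above.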

9.
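A preliminary study of the relationship between gene expression level and synonymous codon usage   Total citations: 3 (self-citations: 0; by others: 3)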
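A self-consistent information clustering method is proposed for predicting gene expression levels and synonymous codon usage. Synonymous codons are divided into optimal, non-optimal and rare codons, and the usage frequencies of these three classes are taken to be the main factors regulating gene expression level. On this basis, gene expression levels and codon usage in two organisms, E. coli and yeast, were predicted with the self-consistent information clustering method. Highly and lowly expressed genes were clearly separated, and gene expression levels fell into four grades: very highly expressed genes (VH), highly expressed genes (H), lower-expressed genes (LM) and lowly expressed genes (LL).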
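The three-class view of synonymous codons can be made concrete by computing, for each gene, the fractions of optimal, non-optimal and rare codons, the kind of per-gene feature a clustering step could work on. The self-consistent information clustering itself is not reproduced. The function name and the toy codon classification below are hypothetical, not the classes derived in the study.

```python
from collections import Counter


def codon_class_fractions(cds: str, codon_class: dict) -> dict:
    """Fraction of classified codons falling in each class.

    codon_class maps codon -> 'optimal' | 'non-optimal' | 'rare'; codons absent
    from the map (e.g. stops, Met, Trp) are ignored.
    """
    seq = cds.upper().replace("U", "T")
    tally = Counter()
    for i in range(0, len(seq) - 2, 3):
        label = codon_class.get(seq[i:i + 3])
        if label is not None:
            tally[label] += 1
    total = sum(tally.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in tally.items()}


toy_classes = {"AAA": "optimal", "AAG": "non-optimal", "GGA": "rare"}  # hypothetical
print(codon_class_fractions("AAAAAAAAGGGATAA", toy_classes))
```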

10.
The occurrence of nucleotides on the 3' side of codons has been determined in highly and weakly expressed genes of Escherichia coli. In highly expressed genes, the usage of some amino acid codons was found to be site-specific, depending on the base 3' to the codon. The role of the 3' nucleotide as a modulator of the translational effectiveness of codons is discussed. Rules of synonymous codon usage in relation to the 3' flanking nucleotide have been established for highly expressed genes. For example, if the triplet following a lysine codon starts with guanosine, lysine is preferably encoded by AAA rather than AAG (P < 10^-8), whereas if cytidine is 3' to the lysine codon, AAG is preferred over AAA (P < 0.001). These rules hold in highly expressed mRNAs, are absent in weakly expressed ones, and can be applied in the chemical synthesis of genes designed for expression in E. coli.
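The counting behind these rules amounts to a contingency table of synonymous codon choice against the first base of the next in-frame codon, shown here for the lysine pair AAA/AAG. This is a minimal sketch with a hypothetical function name; the significance tests used in the study are not reproduced.

```python
from collections import Counter


def codon_choice_by_next_base(cds: str, codons=("AAA", "AAG")) -> dict:
    """For each synonymous codon in `codons`, count occurrences by the first
    base of the following in-frame codon (the 3' flanking nucleotide)."""
    seq = cds.upper().replace("U", "T")
    table = {base: Counter() for base in "ACGT"}
    for i in range(0, len(seq) - 3, 3):
        codon, next_base = seq[i:i + 3], seq[i + 3]
        if codon in codons and next_base in table:
            table[next_base][codon] += 1
    return table
```

Summing these tables over a set of highly expressed genes and comparing `table["G"]` with `table["C"]` gives the kind of AAA-versus-AAG contrast described above.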

11.
A novel bias in codon third-letter usage was found in Escherichia coli genes with low fractions of "optimal codons" by comparing intact sequences with control random sequences. Third-letter usage is known to be biased by codon usage preference and by doublet preference with the first letter of the following codon. The present study examines third-letter usage in the context of the nucleotide sequence once these preferences are taken into account. To exclude any influence of these factors, the random sequences were generated so that the amino acid sequence, codon usage and doublet frequency of each gene were all preserved. Comparison of intact sequences with these randomly generated sequences reveals that third letters of codons show a strong preference linked to the purine/pyrimidine pattern of the next codon: a purine (R) is preferred over a pyrimidine (Y) at the third position when followed by an R-Y-R codon, and a pyrimidine is preferred when followed by an R-R-Y, R-Y-Y or Y-R-Y codon. This bias is probably related to interactions between tRNA molecules in the ribosome.
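The study's randomization preserves amino acid sequence, codon usage and doublet frequency together, and that constrained generator is not reproduced here. As a simpler and clearly weaker null, swapped in for illustration, the sketch below only permutes synonymous codons within a gene (which keeps amino acid sequence and codon usage but not doublets), alongside the tally of third-letter purine/pyrimidine by the R/Y pattern of the next codon. Function names are hypothetical, and codons are assumed to be uppercase ACGT triplets.

```python
import random
from collections import Counter, defaultdict

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))


def ry(base: str) -> str:
    """Purine (R) for A/G, pyrimidine (Y) for C/T."""
    return "R" if base in "AG" else "Y"


def third_letter_by_next_pattern(codons) -> Counter:
    """Count (R/Y pattern of next codon, R/Y of current third letter) pairs."""
    table = Counter()
    for cur, nxt in zip(codons, codons[1:]):
        table[("".join(ry(b) for b in nxt), ry(cur[2]))] += 1
    return table


def synonymous_shuffle(codons, rng: random.Random):
    """Permute codons among positions encoding the same amino acid.

    Keeps the amino acid sequence and the gene's codon usage, but NOT its
    doublet frequencies (a weaker null than the one used in the study).
    """
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[CODE[codon]].append(i)
    shuffled = list(codons)
    for idx in positions.values():
        pool = [codons[i] for i in idx]
        rng.shuffle(pool)
        for i, codon in zip(idx, pool):
            shuffled[i] = codon
    return shuffled
```

Comparing `third_letter_by_next_pattern` on the real codon list with its average over many shuffled copies gives an observed-versus-expected contrast for each pattern.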

12.
It has often been suggested that differential usage of codons recognized by rare tRNA species, i.e. "rare codons", represents an evolutionary strategy to modulate gene expression. In particular, regulatory genes are reported to have an extraordinarily high frequency of rare codons. From E. coli we have compiled codon usage data for highly expressed genes, moderately/lowly expressed genes, and regulatory genes. We have identified a clear and general trend in codon usage bias, from the very high bias seen in very highly expressed genes and attributed to selection, to a rather low bias in other genes which seems to be more influenced by mutation than by selection. There is no clear tendency for an increased frequency of rare codons in the regulatory genes, compared to a large group of other moderately/lowly expressed genes with low codon bias. From this, as well as a consideration of evolutionary rates of regulatory genes, and of experimental data on translation rates, we conclude that the pattern of synonymous codon usage in regulatory genes reflects primarily the relaxation of natural selection.

13.
In this study, codon usage bias in all experimentally known genes of Lactococcus lactis has been analyzed. Since Lactococcus lactis is an AT-rich organism, A and/or T are expected at the third codon position, and detailed analysis of overall codon usage indicates that A- and/or T-ending codons predominate in this organism. However, multivariate statistical analyses based both on codon counts and on relative synonymous codon usage (RSCU) show that a large number of putatively highly expressed genes cluster at one end of the first major axis, while most of the putatively lowly expressed genes cluster at the other end. C- and T-ending codons were significantly more frequent in the highly expressed genes than in the lowly expressed genes; moreover, C-ending codons predominate in the duets (two-fold degenerate amino acids) of highly expressed genes, whereas T-ending codons are abundant in the quartets (four-fold degenerate amino acids). The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional bias, translational selection also operates in shaping codon usage variation among genes in this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts separates the genes into two distinct groups according to their hydrophobicity, but the same analysis computed on RSCU values could not discriminate the genes by hydropathy. This suggests that amino acid composition constrains codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values separates the genes into two groups according to synonymous codon usage for cysteine (the rarest amino acid in this organism), which is merely an artifact induced by the RSCU values.
Other factors, such as gene length and the location of genes on the leading or lagging strand of replication, have practically no influence on codon usage variation among genes in this organism.
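RSCU is the observed count of a codon divided by the count expected if all synonyms of its amino acid were used equally: RSCU_ij = x_ij · n_i / Σ_j x_ij, where n_i is the number of synonymous codons for amino acid i. A minimal sketch under the standard genetic code (function name hypothetical); the correspondence analysis itself is not shown.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

SYNONYMS = defaultdict(list)  # amino acid -> synonymous codons
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds: str) -> dict:
    """RSCU = observed count * (number of synonyms) / total count for that amino acid."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

Because RSCU normalizes within each amino acid, a gene containing a single cysteine codon gets RSCU values of 2.0 and 0.0 for TGT and TGC, which shows how a rare amino acid can yield extreme values that dominate an axis.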

14.
Preferential usage of some minor codons in bacteria   Total citations: 2 (self-citations: 0; by others: 2)
Ohno H, Sakai H, Washio T, Tomita M. Gene, 2001, 276(1-2): 107-115
In many bacterial species, such as Deinococcus radiodurans, Haemophilus influenzae and Methanobacterium thermoautotrophicum, some minor codons are preferentially used near the initiation codon. Some of these minor codons show a stronger preference for the initiation site in the high codon adaptation index (CAI) group, comprising highly expressed genes, than in the low-CAI group, comprising lowly expressed genes. In the present study, codon usage at the initiation site and in the rest of the gene was systematically compared in 27 complete bacterial genomes and the Saccharomyces cerevisiae genome. We also classified genes into two groups according to their CAI values and conducted the same analysis for each group. Our results suggest that some minor codons at the initiation site play a role in regulating translation in many bacteria. We summarize the codons that are preferentially used at the initiation site and probably play a role in regulating gene expression in these organisms.
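CAI is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes divided by the count of its most-used synonym. The `first`/`last` arguments let the same function score only the codons near the initiation codon, mirroring the initiation-site versus rest-of-gene comparison. A minimal sketch: the reference set, the window size, the function names and the 0.5 pseudo-count for codons absent from the reference set are assumptions, not the study's settings.

```python
import math
from collections import Counter, defaultdict

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

SYNONYMS = defaultdict(list)  # amino acid -> synonymous codons
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def codons_of(cds: str) -> list:
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_cdss) -> dict:
    """w = count / count of the most-used synonym, from a reference gene set."""
    counts = Counter()
    for cds in reference_cdss:
        counts.update(codons_of(cds))
    w = {}
    for codons in SYNONYMS.values():
        if len(codons) == 1:
            continue  # Met and Trp carry no information about codon choice
        best = max(counts[c] for c in codons)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Unseen codons get a 0.5 pseudo-count so log(w) stays finite (assumption).
            w[c] = (counts[c] or 0.5) / best
    return w


def cai(cds: str, w: dict, first: int = 0, last=None) -> float:
    """Geometric mean of w over codons[first:last] that have a w value."""
    logs = [math.log(w[c]) for c in codons_of(cds)[first:last] if c in w]
    if not logs:
        raise ValueError("no scorable codons in the window")
    return math.exp(sum(logs) / len(logs))
```

For example, `cai(gene, w, last=30)` and `cai(gene, w, first=30)` would compare a hypothetical 30-codon initiation window with the remainder of the gene.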

15.
Codon usage in 87 602 genes has been calculated from nucleotide sequence data obtained from the GenBank Genetic Sequence Data Bank (Release 90.0; September 1995). The database is called the CUTG Database; the complete database can be obtained by anonymous FTP from DDBJ, and the part of the database that lists the frequency of codon use in each organism has been made searchable through our World Wide Web server.

16.
Many organisms, including the fungal model organism Neurospora crassa, exhibit biased codon usage in their genomes. The preferential use of a subset of synonymous codons (optimal codons) at the macroevolutionary level is believed to result from a history of selection for translational efficiency. At present, few data are available on selection on optimal codons at the microevolutionary scale, that is, at the population level. Herein, we conducted a large-scale assessment of codon mutations at biallelic sites, spanning more than 5,100 genes, in two distinct populations of N. crassa: the Caribbean and Louisiana populations. Based on analysis of the frequency spectra of synonymous codon mutations at biallelic sites, we found that derived (nonancestral) optimal codon mutations segregate at a higher frequency than derived nonoptimal codon mutations in each population, consistent with natural selection favoring optimal codons. We also report that optimal codon variants were less frequent in longer genes and that fixation of optimal codons was reduced in rapidly evolving long genes/proteins, trends suggesting that genetic hitchhiking (Hill-Robertson interference) alters codon usage variation. Notably, nonsynonymous codon mutations segregated at a lower frequency than synonymous nonoptimal codon mutations (which impair translational efficiency) in each N. crassa population, suggesting that changes in protein composition are more detrimental to fitness than mutations altering translation. Overall, the present data demonstrate that selection, and in part genetic interference, shapes codon variation across the genome in N. crassa populations.
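The core comparison, whether derived optimal-codon mutations segregate at higher frequency than derived non-optimal ones, can be sketched from per-site derived-allele counts. This assumes the ancestral state has already been inferred and that every site is sampled in the same n strains; the function names and toy counts are hypothetical, and the study's statistical tests are not reproduced.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded site frequency spectrum: entry i-1 is the number of sites
    carrying i derived alleles among n sampled strains (i = 1 .. n-1)."""
    tally = Counter(derived_counts)
    if any(not 0 < k < n for k in tally):
        raise ValueError("every site must be segregating (0 < count < n)")
    return [tally[i] for i in range(1, n)]


def mean_derived_frequency(derived_counts, n):
    """Average derived-allele frequency across sites."""
    if not derived_counts:
        raise ValueError("no sites")
    return sum(derived_counts) / (len(derived_counts) * n)


# Toy derived-allele counts (hypothetical), n = 10 strains per site.
optimal = [3, 5, 6]
nonoptimal = [1, 1, 2]
print(unfolded_sfs(nonoptimal, 10))
print(mean_derived_frequency(optimal, 10), mean_derived_frequency(nonoptimal, 10))
```

A spectrum shifted toward higher derived counts for optimal-codon mutations is the pattern the abstract interprets as selection favoring optimal codons.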

17.
The Horizontal Gene Transfer DataBase (HGT-DB) is a genomic database that includes statistical parameters such as G+C content and codon and amino-acid usage, together with information about which genes of complete prokaryotic genomes deviate in these parameters. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviating genes may have been acquired by horizontal gene transfer. The current version of the database contains 88 complete bacterial and archaeal genomes, including multiple chromosomes and strains. For each genome, the database provides statistical parameters for all genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides information from correspondence analyses of codon usage, plus lists of extraneous groups of genes in terms of G+C content and lists of putatively acquired genes. With this information, researchers can explore the G+C content and codon usage of a gene when they find incongruities in sequence-based phylogenetic trees. A search engine that allows searches by gene name or keyword for a specific organism is also available. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
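The database's central idea, flagging genes whose composition deviates from the genome average, can be sketched for G+C content alone. The 1.5-standard-deviation threshold is an assumption chosen for illustration, not necessarily HGT-DB's criterion, and its codon- and amino-acid-usage tests are not reproduced. Function names are hypothetical.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    s = seq.upper()
    return sum(base in "GC" for base in s) / len(s)


def gc_outliers(genes: dict, threshold: float = 1.5) -> list:
    """Names of genes whose G+C content lies more than `threshold` standard
    deviations from the mean over all genes (threshold is an assumption)."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu, sd = mean(values), stdev(values)
    return sorted(name for name, value in gc.items() if abs(value - mu) > threshold * sd)
```

Such a flag is only a candidate list: a compositional outlier may also reflect strong selection on codon usage rather than horizontal transfer.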

18.
Codon usage in higher plants, green algae, and cyanobacteria   Total citations: 3 (self-citations: 1; by others: 2)
Codon usage is the selective and nonrandom use of synonymous codons by an organism to encode the amino acids of its proteins. In the last few years, a large number of plant genes have been cloned and sequenced, which now permits a meaningful comparison of codon usage in higher plants, algae and cyanobacteria. In the nuclear and organellar genes of these organisms, a small set of preferred codons is used to encode proteins. Codon usage differs between genome types, with the variation occurring mainly in the choice between codons ending in cytidine (C) or guanosine (G) and those ending in adenosine (A) or uridine (U). In organellar genomes, chloroplast and mitochondrial proteins are encoded mainly with codons ending in A or U. In most cyanobacteria and in the nuclei of green algae, proteins are encoded preferentially with codons ending in C or G. Although only a few nuclear genes of higher plants have been sequenced, a clear distinction between Magnoliopsida (dicot) and Liliopsida (monocot) codon usage is evident. Dicot genes use a set of 44 preferred codons with a slight preference for codons ending in A or U. Monocot codon usage is more restricted, with an average of 38 preferred codons, predominantly those ending in C or G. However, two classes of genes can be recognized in monocots: one set uses codons similar to those of dicots, while the other is highly biased toward codons ending in C or G, with a pattern similar to that of green algal nuclear genes. Codon usage is discussed in relation to plant evolution and the prospects for intergenic transfer of particular genes.

19.
Studies on codon usage in Entamoeba histolytica   Total citations: 13 (self-citations: 0; by others: 13)
Codon usage bias in Entamoeba histolytica, a protozoan parasite, was investigated using the available DNA sequence data. Because E. histolytica has an AT-rich genome, A and/or T are expected at the third codon position, and analysis of overall codon usage indicates that A- and/or T-ending codons are strongly preferred in the coding regions of this organism. However, multivariate statistical analysis suggests a single major trend in codon usage variation among genes: putatively highly expressed genes cluster at one end, while most putatively lowly expressed genes cluster at the other. The codon usage pattern differs distinctly between these two sets of genes. C-ending codons are significantly more frequent in the putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. In the putatively lowly expressed genes, A- and/or T-ending codons predominate, suggesting that compositional constraints play the major role in shaping codon usage among these genes. These results suggest that both mutational bias and translational selection contribute to codon usage variation in this organism.

20.
To study possible variation in codon usage and base composition in bacteriophages, fourteen mycobacteriophages were used as a model system; both parameters were determined for all these phages and for their plating host, M. smegmatis, and then compared. As all the organisms are GC-rich, GC content at the third codon position was found to be higher than at the second position and at the first and second positions combined in every organism, indicating that directional mutational pressure acts strongly at synonymous third codon positions. The Nc plot indicates that codon usage variation in all these organisms is governed by forces other than compositional constraints. Correspondence analysis suggests that: (i) codon usage varies among the genes and genomes of the fourteen mycobacteriophages and M. smegmatis, i.e., codon usage patterns in the mycobacteriophages are phage-specific rather than M. smegmatis-specific; (ii) the synonymous codon usage patterns of Barnyard, Che8, Che9d and Omega are more similar to one another than to those of the remaining mycobacteriophages and M. smegmatis; (iii) codon usage bias in the mycobacteriophages is determined mainly by mutational pressure; and (iv) genes of comparatively GC-rich genomes are more biased than those of GC-poor genomes. A role for translational selection in shaping codon usage in highly expressed genes can be inferred from the predominance of C-ending codons in those genes. Cluster analysis based on codon usage data also shows two distinct branches for the fourteen mycobacteriophages, with codon usage variation even among the phages within each branch.
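The positional GC comparison and the Nc plot rest on two small calculations: G+C content at each codon position, and Wright's expected Nc when codon usage reflects only GC3s composition, Nc = 2 + s + 29/(s² + (1 - s)²). Genes lying well below this curve are those whose bias is attributed to forces other than compositional constraint. A minimal sketch; for simplicity it uses all third positions rather than strictly synonymous ones (GC3s), which is an approximation, and the function names are hypothetical.

```python
def gc_by_position(cds: str) -> tuple:
    """(GC1, GC2, GC3): G+C fraction at codon positions 1, 2 and 3."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    return tuple(sum(c[p] in "GC" for c in codons) / len(codons) for p in range(3))


def expected_nc(gc3s: float) -> float:
    """Wright's expected Nc when codon usage reflects only GC3s composition."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must lie in [0, 1]")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)
```

GC12, the first-and-second-position value used in such comparisons, is simply the mean of the first two entries returned by `gc_by_position`.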

