Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
In this study, the codon usage bias of all experimentally known genes of Lactococcus lactis was analyzed. Since Lactococcus lactis is an AT-rich organism, A and/or T is expected at the third position of codons, and detailed analysis of the overall codon usage data indicates that A- and/or T-ending codons are indeed predominant in this organism. However, multivariate statistical analyses based on both codon counts and relative synonymous codon usage (RSCU) show that a large number of genes presumed to be highly expressed cluster at one end of the first major axis, while the majority of the putatively lowly expressed genes cluster at the other end. C- and T-ending codons were significantly more frequent in the highly expressed genes than in the lowly expressed genes; C-ending codons were predominant in the duets of highly expressed genes, whereas T-ending codons were abundant in the quartets. The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional bias, translational selection also shapes the codon usage variation among the genes in this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts differentiates the genes into two distinct groups according to their hydrophobicity values, but the same analysis computed with RSCU values could not discriminate the genes according to their hydropathy values. This suggests that amino acid composition exerts constraints on codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values differentiates the genes into two groups according to the synonymous codon usage for cysteine residues (the rarest amino acid in this organism), which is merely an artifact induced by the RSCU values. Other factors, such as the length of the genes and their positions on the leading or lagging strand of replication, have practically no influence on the codon usage variation among the genes in this organism.
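
The RSCU values that this abstract relies on can be illustrated with a short Python sketch. It assumes the standard genetic code and the usual definition of RSCU (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally); the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the sequence."""
    counts = codon_counts(cds)
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

A value above 1 means the codon is used more often than expected under equal synonymous usage. For an amino acid that occurs only a handful of times, such as cysteine in the abstract above, the values swing widely on very few observations, which is one way an RSCU-based axis can become an artifact.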

2.
Comparative analysis of metabolic pathways among widely diverse species provides an excellent opportunity to extract information about the functional relationships of organisms, and the pentose phosphate pathway is one such pathway. We carried out a comparative codon usage analysis of the pentose phosphate pathway genes of a diverse group of organisms representing different niches, together with the related factors affecting codon usage, with special reference to the major forces influencing codon usage patterns. Organism-specific codon usage bias percolates into vital metabolic pathway genes irrespective of their near universality. An important observation of this study was a clear distinction between the codon usage patterns of gram-positive and gram-negative bacteria, a major classification criterion for bacteria, in the pentose phosphate pathway. The codon utilization scheme in all the organisms indicates that translational selection is a major force in shaping codon usage. Another key observation was the segregation of the H. sapiens genes as a separate cluster by correspondence analysis, which is primarily attributed to the different codon usage pattern in this genus along with its longer gene lengths. We also compared the amino acid distributions of transketolase primary structures among all the organisms and found a certain degree of predictability in the composition profile, except in A. fumigatus and H. sapiens, where a few exceptions are prominent. In A. fumigatus, a human pathogen responsible for invasive aspergillosis, we observed a significantly different codon usage pattern, which ultimately translated into an amino acid composition portraying a unique profile in transketolase, a key pentose phosphate pathway enzyme.

3.

Introduction

Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates.

Results

We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB.

Conclusion

Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.
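
As a rough illustration of the relative entropy concept mentioned in this abstract, the sketch below computes the Kullback-Leibler divergence between two frequency distributions. The abstract does not say which distributions the authors compared, so the choice here (observed amino acid frequencies against a uniform reference) is only an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(symbols, alphabet=AMINO_ACIDS):
    """Relative frequency of each alphabet symbol in a sequence."""
    counts = Counter(symbols)
    total = sum(counts[s] for s in alphabet)
    return {s: counts[s] / total for s in alphabet}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p[s] == 0 contribute nothing; q[s] must be positive
    wherever p[s] is positive.
    """
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)


# Assumed reference: all 20 amino acids equally frequent.
UNIFORM = {s: 1 / len(AMINO_ACIDS) for s in AMINO_ACIDS}
```

With this reference, `relative_entropy(frequencies(protein), UNIFORM)` is zero when all amino acids are used equally and grows as usage becomes more skewed.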

4.
Suzuki H, Saito R, Tomita M. FEBS Letters, 2005, 579(28): 6499-6504
Multivariate analyses are often used to identify major trends of variation in synonymous codon usage among genes. These analyses need to be performed on properly normalized codon usage data to avoid biases masking this synonymous variation, i.e., gene length, amino acid usage, and codon degeneracy; however, previous studies have failed to do so. In this paper, we demonstrate that the use of alternative normalized data (called 'relative adaptiveness' in the literature) can avoid all these biases and furthermore, can identify more trends of variation among genes, including GC-ending codon usage, GT-ending codon usage, and gene expression level.
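
A minimal sketch of the 'relative adaptiveness' normalization follows, assuming the usual definition from the codon adaptation index literature: each codon count is divided by the count of the most frequent synonymous codon for the same amino acid. The exact normalization used in the paper may differ in detail, and the helper name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(cds):
    """Return {codon: w}, where w = count / max count among its synonyms."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out
            families.setdefault(aa, []).append(codon)
    w = {}
    for codons in families.values():
        best = max(counts[c] for c in codons)
        if best == 0:
            continue  # amino acid absent from this gene
        for c in codons:
            w[c] = counts[c] / best
    return w
```

Because every value is scaled by the maximum within its own synonymous family, the result lies between 0 and 1 regardless of gene length, amino acid usage or the number of synonymous codons, which are the three biases the abstract names.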

5.
Most prokaryotic genomes display strand compositional asymmetries, but the reasons for these biases remain unclear. When the distribution of gene orientation is biased, as it often is, this may induce a bias in composition, as codon frequencies are not identical. We show here that this effect can be estimated and removed, and that the residual base skews are the highest at third base codon positions and lower at first and second positions. This strongly suggests that compositional asymmetries result from 1) a replication-related mutational bias that is filtered through selective pressure and/or from 2) an uneven distribution of gene orientation. In most cases, the mutational bias alters the codon usage and amino acid frequencies of the leading and the lagging strand. However, these features are not ubiquitous amongst prokaryotes, and the biological reasons for them remain to be found.
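
To make the idea of base skew at individual codon positions concrete, here is a small Python sketch that computes GC skew, (G - C) / (G + C), separately for the first, second and third codon positions of one coding sequence. It does not reproduce the paper's procedure for estimating and removing the gene-orientation effect; the function name and the choice of GC skew as the statistic are assumptions for illustration.

```python
def gc_skew_by_codon_position(cds):
    """Return [skew1, skew2, skew3], the GC skew at each codon position.

    A position with no G or C at all yields None, since the skew is undefined.
    """
    seq = cds.upper()
    seq = seq[:len(seq) - len(seq) % 3]  # drop a trailing partial codon
    skews = []
    for pos in range(3):
        bases = seq[pos::3]  # every third base, starting at this position
        g, c = bases.count("G"), bases.count("C")
        skews.append((g - c) / (g + c) if g + c else None)
    return skews
```

In practice such a statistic would be computed over all genes on the leading strand and all genes on the lagging strand, and the two sets of values compared.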

6.
Despite the degeneracy of the genetic code, whereby different codons encode the same amino acid, alternative codons and amino acids are utilized nonrandomly within and between genomes. Such biases in codon and amino acid usage have been demonstrated extensively in prokaryote genomes and likely reflect a balance between the action of mutation, selection, and genetic drift. Here, we quantify the effects of selection and mutation drift as causes of codon and amino acid-usage bias in a large collection of nematode partial genomes from 37 species spanning approximately 700 Myr of evolution, as inferred from expressed sequence tag (EST) measures of gene expression and from base composition variation. Average G + C content at silent sites among these taxa ranges from 10% to 63%, and EST counts range more than 100-fold, underlying marked differences between the identities of major codons and optimal codons for a given species as well as influencing patterns of amino acid abundance among taxa. Few species in our sample demonstrate a dominant role of selection in shaping intragenomic codon-usage biases, and these are principally free living rather than parasitic nematodes. This suggests that deviations in effective population size among species, with small effective sizes among parasites, are partly responsible for species differences in the extent to which selection shapes patterns of codon usage. Nevertheless, a consensus set of optimal codons emerges that is common to most taxa, indicating that, with some notable exceptions, selection for translational efficiency and accuracy favors similar sets of codons regardless of the major codon-usage trends defined by base compositional properties of individual nematode genomes.

7.
Highly expressed genes in any species differ in the usage frequency of synonymous codons. The relative recurrence of the favored codon pair (amino acid pair) varies between genes and genomes because of varying gene expression and different base composition. Here we propose a new measure for predicting the gene expression level, the codon plus amino bias index (CABI). Our approach is based on the relative bias of the favored codon pair inclination among the genes, illustrated by analyzing the CABI scores of the Medicago truncatula genes. CABI showed strong correlation with all other widely used measures (CAI, RCBS, SCUO) for gene expression analysis. Surprisingly, CABI outperforms all other measures by showing better correlation with the wet-lab data. This emphasizes the importance of the neighboring codons of the favored codon in a synonymous group when estimating the expression level of a gene.

8.
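G protein-coupled receptors (GPCRs) are very important receptors for signaling molecules, and their dysfunction leads to many diseases. Building on earlier work, the authors combined sequence feature analysis with support vector machines (SVMs) to identify GPCR molecules and their types by analyzing differences in sequence features. For the first time, the absolute codon usage frequencies of the mRNA sequences corresponding to GPCRs were extracted as features, mainly because they contain both information on the gene's codon usage bias and information on the amino acid composition of the encoded protein. The results show that, for predicting GPCR sequences and their types, the SVM classifier is best designed with a combined feature set comprising the absolute codon usage frequencies of the gene sequence and the dipeptide usage frequencies of the protein sequence, with a radial basis function kernel as the kernel function.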
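
The combined feature set described in this abstract could be assembled roughly as follows. This is only a sketch under stated assumptions: "absolute codon usage frequency" is taken to mean each codon's count divided by the total number of codons in the mRNA, the dipeptide frequency is computed over overlapping residue pairs, and all names are hypothetical. The classification step itself (an SVM with a radial basis function kernel) is not shown, since it would require a third-party machine learning library.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]          # 64 codons
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def codon_frequencies(mrna):
    """64 values: count of each codon divided by the number of codons."""
    seq = mrna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons) or 1  # avoid division by zero on empty input
    return [codons.count(c) / total for c in CODONS]


def dipeptide_frequencies(protein):
    """400 values: frequency of each overlapping amino acid pair."""
    seq = protein.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs) or 1
    return [pairs.count(d) / total for d in DIPEPTIDES]


def combined_features(mrna, protein):
    """Concatenate both feature groups into one 464-dimensional vector."""
    return codon_frequencies(mrna) + dipeptide_frequencies(protein)
```

Each sequence then becomes one fixed-length row, which is the input shape a kernel classifier expects regardless of the original sequence length.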

9.
10.
Codon usage bias (CUB) is an omnipresent phenomenon that occurs in nearly all organisms. Previous studies of codon bias in Plasmodium species were based on limited datasets. This study is the first to use whole-genome datasets for a comparative genome analysis of six Plasmodium species using CUB and related methods. Codon usage bias, compositional variation in translated amino acid frequency, the effective number of codons and optimal codons are analyzed for P. falciparum, P. vivax, P. knowlesi, P. berghei, P. chabaudi and P. yoelii. A plot of the effective number of codons versus GC3 shows that their differential codon usage patterns arise from a combination of mutational and translational selection pressure. The increased relative usage of adenine- and thymine-ending optimal codons in highly expressed genes of P. falciparum is the result of higher composition-biased pressure, while the usage of guanine and cytosine bases at the third codon position can be explained by translational selection pressure acting on them. In contrast, the higher usage of adenine and thymine bases at the third codon position in optimal codons of P. vivax highlights the role of translational selection pressure, in addition to composition-biased mutation pressure, in shaping its codon usage pattern. The frequency of amino acids encoded by AT-ending codons is significantly higher in P. falciparum than in other Plasmodium species, owing to strong composition-biased mutational pressure. The CUB variation in the three rodent parasites, P. berghei, P. chabaudi and P. yoelii, is strikingly similar to that of P. falciparum. The simian and human malarial parasite P. knowlesi shows a variation in codon usage bias similar to that of P. vivax, but closer study reveals differences, confirmed by principal component analysis (PCA).

Abbreviations

CDS - Coding sequences, GC1 - GC composition at first site of codon, GC2 - GC composition at second site of codon, GC3 - GC composition at third site of codon, Ala - Alanine, Arg - Arginine, Asn - Asparagine, Asp - Aspartic acid, Cys - Cysteine, Gln - Glutamine, Glu - Glutamic acid, Gly - Glycine, His - Histidine, Ile - Isoleucine, Leu - Leucine, Lys - Lysine, Met - Methionine, Phe - Phenylalanine, Pro - Proline, Ser - Serine, Thr - Threonine, Trp - Tryptophan, Tyr - Tyrosine, Val - Valine.
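
The plot of effective number of codons against GC3 mentioned in this abstract is usually read against a reference curve: the Nc expected if codon usage were determined by third-position GC content alone. The sketch below computes GC3 for a coding sequence and that expected value using Wright's formula, Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC fraction at third positions. The abstract does not state which reference curve was used, so this is an assumption; the function names are hypothetical, and GC3 is computed here over all codons rather than over synonymous positions only.

```python
def gc3(cds):
    """Fraction of codons whose third base is G or C."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third = seq[2:usable:3]
    if not third:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third) / len(third)


def expected_nc(s):
    """Expected effective number of codons for third-position GC fraction s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

Genes that fall well below the curve have stronger codon bias than base composition alone would predict, which is commonly interpreted as a sign of selection acting in addition to mutational pressure.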

11.
Few quantitative measures of genome architecture or organization exist to support assumptions of differences between microorganisms that are broadly defined as being free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are size and gene content independent. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, utilizing linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content and phylogenetic ancestry. This imprint of repetitive word usage indicates free-living microorganisms have a bias for repetitive sequence usage compared to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function.

12.
Fungal xylanases have important applications in the food, baking, and pulp and paper industries, among others. Xylanases are produced extensively by both bacterial and fungal sources and have tremendous potential for being active at extremes of temperature and pH. In the present study, an effort has been made to explore the codon bias of this enzyme using bioinformatics tools, with multivariate analysis as the main tool. The codon usage of xylanase genes from different fungal sources was found not to be similar, and to examine this phenomenon the relative synonymous codon usage (RSCU) and base composition variation in fungal xylanase genes were also studied. Codon bias data such as GC content at the third position (GC3S), effective number of codons (NC) and codon adaptation index (CAI) were further analyzed with statistical software such as SigmaPlot 9.0 and Systat 11.0. Furthermore, translational selection was studied to verify its influence on codon usage variation among the 94 xylanase genes. Xylanase genes from 12 organisms were analyzed, and the codon usage of all xylanases from each organism was compared separately. The analysis indicates biased codon usage in all 12 fungi studied, with Aspergillus nidulans, Chaetomium globosum, Aspergillus terreus and Aspergillus clavatus showing the strongest bias. The NC plot and correspondence analysis on relative synonymous codon usage indicate that mutation bias and translational selection influence codon usage variation in fungal xylanase genes. To reveal the relative synonymous codon usage and base composition variation in xylanase, 94 genes from 12 fungi were used as a model system.

13.
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.

14.
15.
Mitochondria often use genetic codes different from the standard genetic code. Now that many mitochondrial genomes have been sequenced, these variant codes provide the first opportunity to examine empirically the processes that produce new genetic codes. The key question is: Are codon reassignments the sole result of mutation and genetic drift? Or are they the result of natural selection? Here we present an analysis of 24 phylogenetically independent codon reassignments in mitochondria. Although the mutation-drift hypothesis can explain reassignments from stop to an amino acid, we found that it cannot explain reassignments from one amino acid to another. In particular, and contrary to the predictions of the mutation-drift hypothesis, the codon involved in such a reassignment was not rare in the ancestral genome. Instead, such reassignments appear to take place while the codon is in use at an appreciable frequency. Moreover, the comparison of inferred amino acid usage in the ancestral genome with the neutral expectation shows that the amino acid gaining the codon was selectively favored over the amino acid losing the codon. These results are consistent with a simple model of weak selection on the amino acid composition of proteins in which codon reassignments are selected because they compensate for multiple slightly deleterious mutations throughout the mitochondrial genome. We propose that the selection pressure is for reduced protein synthesis cost: most reassignments give amino acids that are less expensive to synthesize. Taken together, our results strongly suggest that mitochondrial genetic codes evolve to match the amino acid requirements of proteins.

16.
Synonymous codon usage variation among Giardia lamblia genes and isolates.
The pattern of codon usage in the amitochondriate diplomonad Giardia lamblia has been investigated. Very extensive heterogeneity was evident among a sample of 65 genes. A discrete group of genes featured unusual codon usage due to the amino acid composition of their products: these variant surface proteins (VSPs) are unusually rich in Cys and, to a lesser extent, Gly and Thr. Among the remaining 50 genes, correspondence analysis revealed a single major source of variation in synonymous codon usage. This trend was related to the extent of use of a particular subset of 21 codons which are inferred to be those which are optimal for translation; at one end of this trend were genes expected to be expressed at low levels with near random codon usage, while at the other extreme were genes expressed at high levels in which these optimal codons are used almost exclusively. These optimal codons all end in C or G so G + C content at silent sites varies enormously among genes, from values around 40%, expected to reflect the background level of the genome, up to nearly 100%. Although VSP genes are occasionally extremely highly expressed, they do not, in general, have high frequencies of optimal codons, presumably because their high expression is only intermittent. These results indicate that natural selection has been very effective in shaping codon usage in G. lamblia. These analyses focused on sequences from strains placed within G. lamblia "assemblage A"; a few sequences from other strains revealed extensive divergence at silent sites, including some divergence in the pattern of codon usage.

17.
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate and fast growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate and fast growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.

18.
The present study aimed at a comparative analysis of high-GC Corynebacterium genomes and their evolution by exploring codon and amino acid usage patterns. Phylogenetic study by the MLSA approach, indel analysis and BLAST matrix differentiated Corynebacterium species into pathogenic and non-pathogenic clusters. Correspondence analysis on synonymous codon usage reveals that gene length, optimal codon frequencies and tRNA abundance affect gene expression in Corynebacterium. Most of the optimal codons, as well as the translationally optimal codons, are C-ending, i.e. RNY (R, purine; N, any nucleotide base; Y, pyrimidine), and reveal translational selection pressure on the codon bias of Corynebacterium. Amino acid usage is affected by hydrophobicity, aromaticity, protein energy cost, etc. Highly expressed genes followed the cost minimization hypothesis and are less diverged at their synonymous codon positions. Functional analysis of core genes shows significant differences between pathogenic and non-pathogenic Corynebacterium. The study reveals a close relationship between non-pathogenic and opportunistic pathogenic Corynebacterium, as well as between molecular evolution and the survival niches of the organism.

19.
The extent to which base composition and codon usage vary among RNA viruses, and the possible causes of this bias, is undetermined in most cases. A maximum-likelihood statistical method was used to test whether base composition and codon usage bias covary with arthropod association in the genus Flavivirus, a major source of disease in humans and animals. Flaviviruses are transmitted by mosquitoes, by ticks, or directly between vertebrate hosts. Those viruses associated with ticks were found to have a significantly lower G+C content than non-vector-borne flaviviruses and this difference was present throughout the genome at all amino acids and codon positions. In contrast, mosquito-borne viruses had an intermediate G+C content which was not significantly different from those of the other two groups. In addition, biases in dinucleotide and codon usage that were independent of base composition were detected in all flaviviruses, but these did not covary with arthropod association. However, the overall effect of these biases was slight, suggesting only weak selection at synonymous sites. A preliminary analysis of base composition, codon usage, and vector specificity in other RNA virus families also revealed a possible association between base composition and vector specificity, although with biases different from those seen in the Flavivirus genus. Received: 29 August 2000 / Accepted: 19 December 2000

20.
Singer GA, Hickey DA. Gene, 2003, 317(1-2): 39-47
A number of recent studies have shown that thermophilic prokaryotes have distinguishable patterns of both synonymous codon usage and amino acid composition, indicating the action of natural selection related to thermophily. On the other hand, several other studies of whole genomes have illustrated that nucleotide bias can have dramatic effects on synonymous codon usage and also on the amino acid composition of the encoded proteins. This raises the possibility that the thermophile-specific patterns observed at both the codon and protein levels are merely reflections of a single underlying effect at the level of nucleotide composition. Moreover, such an effect at the nucleotide level might be due entirely to mutational bias. In this study, we have compared the genomes of thermophiles and mesophiles at three levels: nucleotide content, codon usage and amino acid composition. Our results indicate that the genomes of thermophiles are distinguishable from mesophiles at all three levels and that the codon and amino acid frequency differences cannot be explained simply by the patterns of nucleotide composition. At the nucleotide level, we see a consistent tendency for the frequency of adenine to increase at all codon positions within the thermophiles. Thermophiles are also distinguished by their pattern of synonymous codon usage for several amino acids, particularly arginine and isoleucine. At the protein level, the most dramatic effect is a two-fold decrease in the frequency of glutamine residues among thermophiles. These results indicate that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting (i) mRNA thermostability, (ii) stability of codon-anticodon interactions and (iii) increased thermostability of the protein products. We conclude that elevated growth temperature imposes selective constraints at all three molecular levels: nucleotide content, codon usage and amino acid composition. In addition to these multiple selective effects, however, the genomes of both thermophiles and mesophiles are often subject to superimposed large changes in composition due to mutational bias.
