Similar literature
 20 similar documents found (search time: 218 ms)
1.
The genomes of homeothermic (warm-blooded) vertebrates are mosaic interspersions of homogeneously GC-rich and GC-poor regions (isochores). Evolution of genome compartmentalization and GC-rich isochores is hypothesized to reflect either selective advantages of an elevated GC content or chromosome location and mutational pressure associated with the timing of DNA replication in germ cells. To address the present controversy regarding the origins and maintenance of isochores in homeothermic vertebrates, newly obtained as well as published nucleotide sequences of the insulin and insulin-like growth factor (IGF) genes, members of a well-characterized gene family believed to have evolved by repeated duplication and divergence, were utilized to examine the evolution of base composition in nonconstrained (flanking) and weakly constrained (introns and fourfold degenerate sites) regions. A phylogeny derived from amino acid sequences supports a common evolutionary history for the insulin/IGF family genes. In cold-blooded vertebrates, insulin and the IGFs were similar in base composition. In contrast, insulin and IGF-II demonstrate dramatic increases in GC richness in mammals, but no such trend occurred in IGF-I. Base composition of the coding portions of the insulin and IGF genes across vertebrates correlated (r = 0.90) with that of the introns and flanking regions. The GC content of homologous introns differed dramatically between insulin/IGF-II and IGF-I genes in mammals but was similar to the GC level of noncoding regions in neighboring genes. Our findings suggest that the base composition of introns and flanking regions is determined by chromosomal location and the mutational pressure of the isochore in which the sequences are embedded. An elevated GC content at codon third positions in the insulin and the IGF genes may reflect selective constraints on the usage of synonymous codons.

2.
Synonymous codons are widely selected for various biological mechanisms in both prokaryotes and eukaryotes. Recent evidence suggests that microRNA (miRNA) function may affect synonymous codon choices near miRNA target sites. To better understand this, we performed a genome-wide analysis of synonymous codon usage around miRNA target sites in four plant genomes. We observed a general trend of increased site accessibility around miRNA target sites in plants. Guanine-cytosine (GC)-poor codons are preferred in the flanking regions of miRNA target sites. Within-genome analyses show significant variation among miRNA targets within each species. GC content of the target gene can partly explain the variation of site accessibility among miRNA targets. miRNA targets in GC-rich genes show stronger selection signals than those in GC-poor genes. A gene's codon usage bias and the conservation level of the miRNA and its target also have some effect on site accessibility, but the expression level of the miRNA or its target and the mechanism of miRNA activity do not contribute to site accessibility differences among miRNA targets. We suggest that synonymous codons near miRNA targets are selected for efficient miRNA binding and proper miRNA function. Our results present a new dimension of natural selection on synonymous codons near miRNA target sites in plants, which will have important implications for coding sequence evolution.

3.
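Base composition and codon usage bias in the coding regions of pseudorabies virus genes   Cited by: 6 (self-citations: 0, citations by others: 6)
Because the G+C content of pseudorabies virus (PRV) is as high as 74%, no strain has yet had its whole genome sequenced. The base composition and codon usage of the coding sequences of 68 known PRV genes were analyzed statistically, revealing very strong codon usage bias in PRV genes. The overall G+C content at the third codon position across all 68 PRV coding regions was 96.24%, and reached 99.52% in the UL48 gene. PRV genes prefer GC-rich codons, especially codons ending in C or G. In addition, the genes UL48, UL40, UL14 and IE180, around which the G+C content varies considerably, coincide with the known replication origin regions of the PRV genome. When PRV genes were divided into six classes by function, genes with identical or similar functions showed similar codon usage patterns; the relative synonymous codon usage (RSCU) of regulatory genes differed significantly from that of other genes, and in regulatory genes the RSCU values of C-ending codons were far larger than those of the other synonymous codons. Finally, multivariate analysis of the differences in amino acid composition among PRV genes showed that genes with different functions were distributed differently on the correspondence analysis plot, indicating that the codon usage pattern of PRV genes may be related to gene function.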
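The RSCU statistic used in the abstract above can be illustrated with a minimal sketch. It assumes the standard genetic code and the usual definition (observed count of a codon divided by the mean count of all synonymous codons for the same amino acid); the function name `rscu` and the toy sequence are illustrative and are not taken from the paper.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs in cds."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":                          # stop codons are excluded
            continue
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total:                              # skip amino acids that never occur
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy alanine-only sequence: GCC x2, GCG x1, GCT x1, GCA x0.
    for codon, value in sorted(rscu("GCCGCCGCGGCT").items()):
        print(codon, round(value, 2))
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1 indicate preference, which is how a bias toward C-ending codons would show up.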

4.
Chen LL  Gao F 《The FEBS journal》2005,272(13):3328-3336
Eukaryotic genomes are composed of isochores, i.e. long sequences relatively homogeneous in GC content. In this paper, the isochore structure of the Arabidopsis thaliana genome has been studied using a windowless technique based on the Z curve method, and intuitive curves are drawn for all five chromosomes. Using these curves, we can calculate the GC content at any resolution, even at the base level. It is observed that all five chromosomes are composed of several alternating GC-rich and AT-rich regions. Usually, these regions, named 'isochore-like regions', have large fluctuations in GC content. Five isochores with little fluctuation are also observed. Detailed analyses have been performed for these isochores. A GC-rich 'isochore-like region' and a GC-isochore in chromosomes II and IV, respectively, are the nucleolar organizer regions (NORs), and genes located in the two regions prefer to use GC-ending codons. Another GC-isochore located in chromosome II is a mitochondrial DNA insertion region; the position and size of this region are precisely predicted by the current method. The amino acid usage and codon preference of genes in this organellar-to-nuclear transfer region show significant differences from other regions. Moreover, the centromeres are located in GC-rich 'isochore-like regions' in all five chromosomes. The current method can provide a useful tool for analyzing whole genomic sequences of eukaryotes.
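The idea of reading GC content "at any resolution" from a cumulative curve can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses only the z component of the Z curve, z_n = (A_n + T_n) - (G_n + C_n), and the function names are hypothetical. Any detrending the paper applies to the curve is not described in the abstract and is omitted here.

```python
from itertools import accumulate


def z_component(seq):
    """Cumulative z_n = (A_n + T_n) - (G_n + C_n), with z_0 = 0."""
    # Ambiguous bases (e.g. N) contribute 0, so they count as "half GC" below.
    steps = [1 if b in "AT" else -1 if b in "GC" else 0 for b in seq.upper()]
    return [0] + list(accumulate(steps))


def gc_between(z, start, end):
    """GC fraction of seq[start:end], recovered from the z values alone."""
    n = end - start
    # Over n bases, z changes by (AT count - GC count) and AT + GC = n.
    return 0.5 * (1 - (z[end] - z[start]) / n)


if __name__ == "__main__":
    z = z_component("ATGCGC")
    print(z)                      # the cumulative curve
    print(gc_between(z, 0, 6))    # whole sequence
    print(gc_between(z, 2, 6))    # any sub-interval, with no fixed window
```

Because the curve is computed once, the GC content of any interval follows from two look-ups, which is what makes the approach windowless.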

5.
To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped in order to carry out its growth in the GC-rich host Pseudomonas aeruginosa, synonymous codon and amino acid usage bias of PhiKZ was investigated and the data were compared with that of P. aeruginosa. It was found that synonymous codon and amino acid usage of PhiKZ was distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that codons of the phage-like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into their proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.

6.
Summary Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating that the codon choices among synonymous codons reflect the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which is different from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern compared with that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Plural primitive ubiquitin sequences were dispersed on the genome in ancestral eukaryotes. Some of them situated in a particular environment fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded reflect the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.

7.
Tan HW  Liu GH  Dong X  Lin RQ  Song HQ  Huang SY  Yuan ZG  Zhao GH  Zhu XQ 《PloS one》2011,6(8):e23008
In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Apis cerana, the Asiatic cavity-nesting honeybee. We present here an analysis of features of its gene content and genome organization in comparison with Apis mellifera to assess the variation within the genus Apis and among main groups of Hymenoptera. The size of the entire mt genome of A. cerana is 15,895 bp, containing 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA (tRNA) genes and one control region. These genes are transcribed from both strands and have a nucleotide composition high in A and T. The A+T content of the complete genome of A. cerana is 83.96%. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. There are a total of 3672 codons in all 13 protein-coding genes, excluding termination codons. The most frequently used amino acid is Leu (15.52%), followed by Ile (12.85%), Phe (10.10%), Ser (9.15%) and Met (8.96%). Intergenic regions in the mt genome of A. cerana are 705 bp in total. The order and orientation of the gene arrangement pattern is identical to that of A. mellifera, except for the position of the tRNA-Ser(AGN) gene. Phylogenetic analyses using concatenated amino acid sequences of 13 protein-coding genes, with three different computational algorithms (NJ, MP and ML), all revealed two distinct groups with high statistical support, indicating that A. cerana and A. mellifera are two separate species, consistent with results of previous morphological and molecular studies. The complete mtDNA sequence of A. cerana provides additional genetic markers for studying population genetics, systematics and phylogeographics of honeybees.

8.
Lightfield J  Fram NR  Ely B 《PloS one》2011,6(3):e17677
The GC content of bacterial genomes ranges from 16% to 75% and wide ranges of genomic GC content are observed within many bacterial phyla, including both gram negative and gram positive phyla. Thus, divergent genomic GC content has evolved repeatedly in widely separated bacterial taxa. Since genomic GC content influences codon usage, we examined codon usage patterns and predicted protein amino acid content as a function of genomic GC content within eight different phyla or classes of bacteria. We found that similar patterns of codon usage and protein amino acid content have evolved independently in all eight groups of bacteria. For example, in each group, use of amino acids encoded by GC-rich codons increased by approximately 1% for each 10% increase in genomic GC content, while the use of amino acids encoded by AT-rich codons decreased by a similar amount. This consistency within every phylum and class studied led us to conclude that GC content appears to be the primary determinant of the codon and amino acid usage patterns observed in bacterial genomes. These results also indicate that selection for translational efficiency of highly expressed genes is constrained by the genomic parameters associated with the GC content of the host genome.

9.
Liu Q  Feng Y  Xue Q 《Mitochondrion》2004,4(4):313-320
In this paper, the main factors shaping codon usage in the mitochondrial genome of rice are reported. Correspondence analysis, a commonly used multivariate statistical approach, was carried out to analyze synonymous codon usage bias. The results showed that the main trend was strongly correlated with the gene expression level assessed by the 'Codon Adaptation Index' value, a result that was confirmed by the distribution of genes along the first axis. From the significant correlations between axis 1 coordinates and the GC and GC3s content at silent sites of each sequence, and the clearly significant correlations between the 'Effective Number of Codons' values and GC and GC3s content, we inferred that codon usage bias was also affected by gene nucleotide composition. In addition, the hydrophobicity of each protein also played some role in shaping codon usage in this organelle, which could be confirmed by the significant correlation between the positions of genes placed on the first axis and the hydrophobicity value of each protein. In summary, natural selection played a crucial role, and nucleotide mutational bias and amino acid composition only a minor one, in shaping codon usage in the mitochondrial genome of rice. Notably, 21 codons defined here for the first time as 'optimal codons' might provide some more useful information for genetic engineering and/or evolutionary studies.

10.
Pesole G  Bernardi G  Saccone C 《FEBS letters》1999,464(1-2):60-62
The efficiency of AUG start codon recognition in translation initiation is modulated by its sequence context. Here we investigated a non-redundant set of 5914 human genes and show that this context is different in genes located in different isochores. In particular, of the two main consensus start sequences, RCCaugR is five-fold more represented than AARaugR in genes from the GC-rich H3 isochores compared to genes from the GC-poor L isochores. Furthermore, genes located in GC-rich isochores have shorter 5' UTRs and stronger avoidance of upstream AUG than genes located in GC-poor isochores. This suggests that genes requiring highly efficient translation are located in GC-rich isochores and genes requiring fine modulation of expression are located in GC-poor isochores. This is in agreement with independent data from the literature concerning the location of housekeeping and tissue-specific genes, respectively.

11.
A functional significance for codon third bases   Cited by: 9 (self-citations: 0, citations by others: 9)
Epstein RJ  Lin K  Tan TW 《Gene》2000,245(2):291-298

12.
Sau K  Gupta SK  Sau S  Mandal SC  Ghosh TC 《Bio Systems》2006,85(2):107-113
Synonymous codon and amino acid usage biases have been investigated in 903 Mimivirus protein-coding genes in order to understand the architecture and evolution of the Mimivirus genome. As expected for an AT-rich genome, third codon positions of the synonymous codons of Mimivirus carry mostly A or T bases. It was found that codon usage bias in Mimivirus genes is dictated both by mutational pressure and translational selection. Evidence shows that four factors, namely mean molecular weight (MMW), hydropathy, aromaticity and cysteine content, are mostly responsible for the variation of amino acid usage in Mimivirus proteins. Based on our observations, we suggest that genes involved in translation, DNA repair, protein folding, etc., were laterally transferred to Mimivirus long ago from living organisms and that with time these genes acquired the codon usage pattern of other Mimivirus genes under selection pressure.

13.
Pavlícek A  Jabbari K  Paces J  Paces V  Hejnar JV  Bernardi G 《Gene》2001,276(1-2):39-45
Alus and LINEs (LINE1) are widespread classes of repeats that are very unevenly distributed in the human genome. The majority of GC-poor LINEs reside in the GC-poor isochores whereas GC-rich Alus are mostly present in GC-rich isochores. The discovery that LINEs and Alus share similar target site duplication and a common AT-rich insertion site specificity raised the question as to why these two families of repeats show such a different distribution in the genome. This problem was investigated here by studying the isochore distributions of subfamilies of LINEs and Alus characterized by different degrees of divergence from the consensus sequences, and of Alus, LINEs and pseudogenes located on chromosomes 21 and 22. Young Alus are more frequent in the GC-poor part of the genome than old Alus. This suggests that the gradual accumulation of Alus in GC-rich isochores has occurred because of their higher stability in compositionally matching chromosomal regions. Densities of Alus and LINEs increase and decrease, respectively, with increasing GC levels, except for the telomeric regions of the analyzed chromosomes. In addition to LINEs, processed pseudogenes are also more frequent in GC-poor isochores. Finally, the present results on Alu and LINE stability/exclusion predict significant losses of Alu DNA from the GC-poor isochores during evolution, a phenomenon apparently due to negative selection against sequences that differ from the isochore composition.

14.
Gu W  Zhou T  Ma J  Sun X  Lu Z 《Bio Systems》2004,73(2):89-97
The role of the silent codon position in protein structure is an interesting and still unclear problem. In this paper, 563 Homo sapiens genes and 417 Escherichia coli genes coding for proteins with four different folding types have been analyzed using variance analysis, a multivariate analysis method newly applied to codon usage analysis, to find the correlation between amino acid composition, synonymous codon usage, and protein structure in different organisms. It has been found that in E. coli, both amino acid compositions in differently folded proteins and synonymous codon usage in different gene classes coding for differently folded proteins are significantly different. It was also found that only amino acid composition is different in different protein classes in H. sapiens. There is no universal correlation between synonymous codon usage and protein structure in these two different organisms. Further analysis has shown that GC content at the second codon position can distinguish genes coding for differently folded proteins in both organisms.

15.
The number of completely sequenced archaeal genomes has become sufficient for a large-scale bioinformatic study. We have conducted analyses for each coding region from 36 archaeal genomes using the original CGS algorithm by calculating the total GC content (G+C), GC content in first, second and third codon positions as well as in fourfold and twofold degenerate sites from third codon positions, levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX), levels of amino acid usage and the entropy of amino acid content distribution. In archaeal genomes with strong GC pressure, arginine is coded preferably by GC-rich Arg4 codons, whereas in most archaeal genomes with G+C < 0.6, arginine is coded preferably by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while the Arg4 > Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cysteine are relatively decreased. Numerous genomic and proteomic biases observed can be explained by the hypothesis of a previously existing strong mutational AT pressure in the common predecessor of all archaea.
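The per-position GC measures listed in the abstract above can be sketched in a few lines. This is an illustrative calculation only; it is not the CGS algorithm named in the abstract, whose details are not given here, and the function name is hypothetical. It assumes an in-frame coding sequence containing only A, C, G and T.

```python
def gc_by_codon_position(cds):
    """Return total GC and GC at codon positions 1, 2 and 3 for an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3           # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    result = {}
    gc_total = 0
    for pos in range(3):
        bases = cds[pos:usable:3]              # every third base, starting at pos
        gc = sum(b in "GC" for b in bases)
        gc_total += gc
        result[f"GC{pos + 1}"] = gc / len(bases)
    result["GC"] = gc_total / usable
    return result


if __name__ == "__main__":
    # Toy sequence with codons ATG GCC GCG TAA.
    print(gc_by_codon_position("ATGGCCGCGTAA"))
```

GC3 computed this way includes every third position; measures such as GC3s or GC at fourfold degenerate sites would additionally filter codons by their degeneracy.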

16.
Romero H  Zavala A  Musto H 《Gene》2000,242(1-2):307-311
It is widely accepted that compositional pressure is the only factor shaping codon usage in unicellular species displaying extremely biased genomic compositions. This seems to be the case in the prokaryotes Mycoplasma capricolum, Rickettsia prowazekii and Borrelia burgdorferi (GC-poor), and in Micrococcus luteus (GC-rich). However, in the GC-poor unicellular eukaryotes Dictyostelium discoideum and Plasmodium falciparum, there is evidence that selection, acting at the level of translation, influences codon choices. This is a twofold intriguing finding, since (1) the genomic GC levels of the above mentioned eukaryotes are lower than the GC% of any studied bacteria, and (2) bacteria usually have larger effective population sizes than eukaryotes, and hence natural selection is expected to overcome more efficiently the randomizing effects of genetic drift among prokaryotes than among eukaryotes. In order to gain new insight into this problem, we analysed the patterns of codon preferences of the nuclear genes of Entamoeba histolytica, a unicellular eukaryote characterised by an extremely AT-rich genome (GC = 25%). The overall codon usage is strongly biased towards A and T in the third codon positions, and among the presumed highly expressed sequences, there is an increased relative usage of a subset of codons, many of which are C-ending. Since an increase in C in third codon positions is 'against' the compositional bias, we conclude that codon usage in E. histolytica, as happens in D. discoideum and P. falciparum, is the result of an equilibrium between compositional pressure and selection. These findings raise the question of why strongly compositionally biased eukaryotic cells may be more sensitive to the (presumed) slight differences among synonymous codons than compositionally biased bacteria.

17.
Analysis of the synonymous codon usage pattern in the genome of a thermophilic cyanobacterium, Thermosynechococcus elongatus BP-1, using multivariate statistical analysis revealed a single major explanatory axis accounting for codon usage variation in the organism. This axis is correlated with the GC content at the third base of synonymous codons (GC3s) in a correspondence analysis of T. elongatus genes. A negative correlation was observed between the effective number of codons (Nc) and GC3s. Results suggested mutational bias as the major factor in shaping codon usage in this cyanobacterium. In comparison to the lowly expressed genes, highly expressed genes of this organism possess a significantly higher proportion of pyrimidine-ending codons, suggesting that, besides mutational bias, translational selection also influenced codon usage variation in T. elongatus. Correspondence analysis of relative synonymous codon usage (RSCU) with A, T, G, C at third positions (A3s, T3s, G3s, C3s, respectively) also supported this, and expression levels of genes and gene length also influenced codon usage. A role of translational accuracy was identified in dictating the codon usage variation of this genome. Results indicated that although mutational bias is the major factor in shaping codon usage in T. elongatus, factors like translational selection, translational accuracy and gene expression level also influenced codon usage variation.

18.
Vanishing GC-rich isochores in mammalian genomes   Cited by: 25 (self-citations: 0, citations by others: 25)
Duret L  Semon M  Piganeau G  Mouchiroud D  Galtier N 《Genetics》2002,162(4):1837-1847
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC --> AT substitutions over AT --> GC substitutions: GC-rich isochores are slowly disappearing from the genome of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not strong enough to overcome the large excess of GC --> AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.

19.
Palidwor GA  Perkins TJ  Xia X 《PloS one》2010,5(10):e13431

Background

In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous substitutions in the third codon position that change GC content, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns.

Principal Findings

In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively.

Conclusions

The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
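To make the notion of a continuous-time Markov chain under GC mutational bias concrete, here is a deliberately small toy model. It is not the model proposed in the abstract above, which covers all codons: it tracks a single site that is either G/C or A/T, and the rate names `u` (GC to AT) and `v` (AT to GC) are assumptions introduced for illustration.

```python
import math


def gc_probability(t, u, v, p0):
    """P(site is G/C at time t) for a two-state continuous-time Markov chain.

    u  -- rate of GC -> AT substitution (assumed parameter)
    v  -- rate of AT -> GC substitution (assumed parameter)
    p0 -- probability that the site is G/C at time 0
    """
    stationary = v / (u + v)                   # long-run GC frequency
    return stationary + (p0 - stationary) * math.exp(-(u + v) * t)


if __name__ == "__main__":
    # With GC -> AT twice as fast as AT -> GC, a GC-rich site class decays
    # toward a stationary GC frequency of v / (u + v) = 1/3.
    for t in (0.0, 0.5, 1.0, 5.0):
        print(t, round(gc_probability(t, u=2.0, v=1.0, p0=0.9), 4))
```

The stationary value depends only on the ratio of the two rates, which is one way to see how mutational bias alone can set an expected composition against which other influences are measured.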

20.
Since base composition of translational stop codons (TAG, TAA, and TGA) is biased toward a low G+C content, a differential density for these termination signals is expected in random DNA sequences of different base compositions. The expected length of reading frames (DNA segments of sense codons flanked by in-phase stop codons) in random sequences is thus a function of GC content. The analysis of DNA sequences from several genome databases stratified according to GC content reveals that the longest coding sequences (exons in vertebrates and genes in prokaryotes) are GC-rich, while the shortest ones are GC-poor. Exon lengthening in GC-rich vertebrate regions does not result, however, in longer vertebrate proteins, perhaps because of the lower number of exons in the genes located in these regions. The effects on coding-sequence lengths constitute a new evolutionary meaning for compositional variations in DNA GC content. Correspondence to: J. L. Oliver
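The dependence of reading-frame length on GC content described above can be worked through under a simple assumption that the abstract does not spell out: bases are independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The function names below are illustrative.

```python
def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG or TGA at GC fraction gc."""
    at_base = (1 - gc) / 2                     # P(A) = P(T)
    gc_base = gc / 2                           # P(G) = P(C)
    taa = at_base ** 3
    tag = at_base ** 2 * gc_base
    tga = at_base ** 2 * gc_base
    return taa + tag + tga


def expected_sense_codons(gc):
    """Mean number of sense codons before the next in-frame stop codon."""
    p = stop_codon_probability(gc)
    return (1 - p) / p                         # mean of a geometric distribution


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, round(stop_codon_probability(gc), 5),
              round(expected_sense_codons(gc), 1))
```

At gc = 0.5 the stop probability is 3/64, as expected for uniform base frequencies; as GC content rises the probability falls and random reading frames lengthen, in line with the trend the abstract reports.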
