Similar literature (20 results)
1.
Abstract

The third codon positions are generally thought to be largely neutral, allowing for synonymous mutations. To see how heavily the third positions are loaded with information in general, we analyzed the sequences of the nucleotides in the third positions. Simple word count analysis revealed excessive clustering of pyrimidines in the third-position sequences of prokaryotic mRNA. The clusters have a clear tendency to follow one after another at a characteristic distance of 25–30 triplets. Thus, the third codon positions do carry a rather strong message. A possible connection with the loop-fold structure of proteins and (cotranslational) protein folding is discussed.
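The abstract does not spell out how its word count analysis was done. The following is only a minimal sketch of one way to set it up, assuming an in-frame coding sequence and a reduced purine/pyrimidine (R/Y) alphabet. The function names `third_position_ry` and `word_counts` are hypothetical, and the sequence is a toy example, not data from the study.

```python
from collections import Counter


def third_position_ry(cds: str) -> str:
    """Extract third codon positions and recode them as R (purine) or Y (pyrimidine).

    Assumes the sequence starts at the first base of a codon.
    Anything that is not A, C, G, T or U is written as N.
    """
    thirds = cds.upper().replace("U", "T")[2::3]
    return "".join(
        "Y" if base in "CT" else "R" if base in "AG" else "N" for base in thirds
    )


def word_counts(text: str, k: int) -> Counter:
    """Count all overlapping words of length k in the text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


# Toy sequence for illustration only.
thirds = third_position_ry("ATGGCCAAATTT")
print(thirds)
print(word_counts(thirds, 2))
```

Comparing such counts for pyrimidine-only words (for example `YYY`) with the counts expected from the base composition alone would be one way to look for clustering.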

2.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distributions of GC levels of the third codon positions of genes from cold-blooded vertebrates are distinctly different from those of warm-blooded vertebrates in that they do not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

3.
G D'Onofrio  G Bernardi 《Gene》1992,110(1):81-88
We have investigated the compositional distributions of third codon positions of genes from the 16 prokaryotes and seven eukaryotes for which the largest numbers of coding sequences are available in data banks. In prokaryotes, both narrow and broad distributions were found. In eukaryotes, distributions were very broad (except for Saccharomyces cerevisiae) and remarkably different for different genomes. In low-GC genomes, third codon positions were lower in GC than first + second codon positions and trailed towards high GC; the opposite situation was found for high-GC genomes. In all genomes, first codon positions were higher in GC than second codon positions. We then investigated the compositional correlations between third and first + second codon positions in prokaryotic genomes (the 16 mentioned above plus 87 additional ones) and in genome compartments of eukaryotes. A general, common relationship was found, which also holds within the same (heterogeneous) genomes. This universal correlation is due to the fact that the relative effects of compositional constraints on different codon positions are the same, on average, whatever the genome under consideration.

4.
Summary: We have analyzed the correlation that exists between the GC levels of third and first or second codon positions for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unity slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.

5.
Compositional distributions in the three codon positions of the coding sequences of 12 fully sequenced, publicly available prokaryotic genomes were investigated. A universal compositional correlation was observed in most of the genomes under investigation, irrespective of their overall genomic GC contents. In all the genomes, the GC contents at the first codon positions are always greater than the overall GC contents of the genomes, whereas the reverse is true for second codon positions. GC contents at the third codon positions are higher than the overall genomic GC contents in high-GC genomes, and the opposite situation was found in low-GC genomes, except for Helicobacter pylori. In high-GC genomes, the GC contents at the first + second codon positions are lower than the GC contents at the third codon positions, and the reverse holds in low-GC genomes, except for Helicobacter pylori. The distributions of the four bases at the three different positions were also investigated for all 12 organisms. It was observed that, in the first codon positions, G is the most dominant base in high-GC genomes and A is the most dominant base in low-GC genomes. In either case, purine bases, i.e., (A + G), predominate in the first codon position. In the second codon position, A is the most dominant base in most of the organisms and G is the least dominant base in all the organisms. There is no unique regular pattern of individual bases at the third codon positions; however, there are significant differences in the (G + C) contents of the third codon positions among the different organisms. Calculations of dinucleotide frequencies in the 12 organisms indicate that in GC-rich genomes GG, GC, CC, and CG dinucleotides are the most dominant, whereas the reverse is true in low-GC genomes. Biological implications of these results are discussed in this paper.

6.
Compositional distributions in the three codon positions, as well as codon usage biases, of all available DNA sequences of the Buchnera aphidicola genome have been analyzed. It was observed that the GC levels of the three codon positions follow the order I > II > III, as observed in other extremely AT-rich organisms. B. aphidicola, being an AT-rich organism, is expected to have A and/or T at the third positions of codons. Overall codon usage analyses indicate that A- and/or T-ending codons are predominant in this organism and that some particular amino acids are abundant in the coding regions of genes. However, multivariate statistical analysis indicates two major trends in the codon usage variation among the genes: one is strongly correlated with the GC contents at the third synonymous positions of codons, and the other is associated with the expression level of genes. Moreover, codon usage biases of the highly expressed genes are almost identical to the overall codon usage biases of all the genes of this organism. These observations suggest that mutational bias is the main factor in determining the codon usage variation among the genes in B. aphidicola.

7.
The frequencies of occurrence of the four bases in the first, second and third codon positions and in the total coding sequences have been calculated from the codon usage table published in 1990 by Ikemura et al. The distributions of frequencies are further analysed in detail by a graphic technique presented recently by us. Formulas expressing the frequencies of the four bases in the first and second codon positions in terms of the frequencies of amino acids are given. The graphic analysis shows that, for 90 species, purine bases are dominant in the first codon position, and in most cases G is the most dominant base. In the second codon position A is the most dominant base, while G is the least dominant base. In the third codon position the G + C content varies from 0.1 to 0.9, while the A + C content remains approximately equal to 1/2 and the G content approximately equal to that of C. If the frequencies of bases A, C, G and U in the total coding sequences are denoted by a, c, g and u, respectively, it is found that the inequality $a^2 + c^2 + g^2 + u^2 < 1/3$ is valid for each of the 90 species, including human and E. coli.
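To make the inequality on the squared base frequencies concrete, here is a small sketch that computes a, c, g and u for a sequence and checks their sum of squares. Because a + c + g + u = 1, this sum cannot fall below 1/4 (reached when all four bases are equally frequent), so the bound of 1/3 limits how uneven the composition can be. The function names are hypothetical and the input is a toy sequence, not data from the codon usage table cited above.

```python
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Return the relative frequencies of A, C, G and U in a sequence.

    T is treated as U; any other symbol is ignored.
    """
    rna = seq.upper().replace("T", "U")
    counts = Counter(base for base in rna if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return {base: counts[base] / total for base in "ACGU"}


def sum_of_squares(freqs: dict) -> float:
    """Return a^2 + c^2 + g^2 + u^2 for the given base frequencies."""
    return sum(f * f for f in freqs.values())


# Toy coding sequence for illustration only.
freqs = base_frequencies("ATGGCCAAATTTGGGTAA")
value = sum_of_squares(freqs)
print(freqs)
print(value, value < 1 / 3)
```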

8.
It is well known that, owing to the degeneracy of the genetic code, most silent substitutions appear in the third codon position, so the mutation frequency of the third codon position is much higher than that of the first two positions. However, it remains unknown whether the directionality of point mutation is similar in the three codon positions. In this paper, analysis of 15 sets of orthologous genes reveals that most of the substitution types differ significantly between any two codon positions, especially between the 2nd and the 3rd phases. Furthermore, the average frequencies of each type of substitution calculated from the fifteen sets of orthologous genes are similar to those identified in single nucleotide polymorphisms (SNPs) of the human and mouse genomes. The present analyses suggest that nucleotide substitution in protein-coding sequences is not only context-dependent (the so-called neighboring-nucleotide effects) but also phase-dependent, which is of significance for improving the prevalent nucleotide-evolution models.

9.
Translational selection on codon usage in Xenopus laevis   (Total citations: 2; self-citations: 0; citations by others: 2)
A correspondence analysis of codon usage in Xenopus laevis revealed that the first axis is strongly correlated with the base composition at third codon positions. The second axis discriminates between putatively highly expressed genes and the other coding sequences, with expression levels being confirmed by the analysis of expressed sequence tag frequencies. The comparison of codon usage of the sequences displaying the extreme values on the second axis indicates that several codons are statistically more frequent among the highly expressed (mainly housekeeping) genes. Translational selection appears, therefore, to influence synonymous codon usage in Xenopus.

10.
The complete sequence of honeybee (Apis mellifera) mitochondrial DNA is reported; it is 16,343 bp long in the strain sequenced. Relative to their positions in the Drosophila map, 11 of the tRNA genes are in altered positions, but the other genes and regions are in the same relative positions. Comparisons of the predicted protein sequences indicate that the honeybee mitochondrial genetic code is the same as that for Drosophila, but the anticodons of two tRNAs differ between these two insects. The base composition shows extreme bias, being 84.9% AT (cf. 78.6% in Drosophila yakuba). In protein-encoding genes, the AT bias is strongest at the third codon positions (which in some cases lack guanines altogether) and weakest at the second codon positions. Multiple stepwise regression analysis of the predicted products of the protein-encoding genes shows a significant association between the numbers of occurrences of amino acids and %T in the codon family, but not with the number of codons per codon family or other parameters associated with codon family base composition. Differences in amino acid abundances are apparent between the predicted Apis and Drosophila proteins, with a relative abundance of lysine and a relative deficiency of alanine in the Apis proteins. Drosophila alanine residues are as often replaced by serine as conserved in Apis. The differences in abundances between Drosophila and Apis are associated with %AT in the codon families, and the degree of divergence in amino acid composition between proteins correlates with the divergence in %AT at the second codon positions. Overall, transversions are about twice as abundant as transitions when comparing Drosophila and Apis protein-encoding genes, but this ratio varies between codon positions. Marked excesses of transitions over chance expectation are seen for the third positions of protein-coding genes and for the gene for the small subunit of ribosomal RNA. For the third codon positions, the excess of transitions is adequately explained by the restriction of observable substitutions to transitions for conserved amino acids with two-codon families; the excess of transitions over expectation for the small ribosomal subunit suggests that the conservation of nucleotide size is favored by selection.
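The comparison of transitions and transversions by codon position can be illustrated with a short sketch. It assumes two aligned, in-frame DNA sequences of equal length and simply tallies observed differences, without any correction for multiple hits; the function name is hypothetical and the sequences are toy examples, not the Apis and Drosophila data. As a reference point, of the 12 possible changes between bases, 4 are transitions (A↔G, C↔T) and 8 are transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv_by_position(seq_a: str, seq_b: str) -> dict:
    """Tally transitions and transversions at codon positions 1, 2 and 3.

    Both sequences must be aligned, of equal length, and start in frame.
    Sites with gaps or ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {pos: {"transitions": 0, "transversions": 0} for pos in (1, 2, 3)}
    for index, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        # A change within purines or within pyrimidines is a transition.
        same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
        kind = "transitions" if same_class else "transversions"
        counts[index % 3 + 1][kind] += 1
    return counts


# Toy aligned sequences for illustration only.
print(count_ts_tv_by_position("ATGGCA", "CTGGCG"))
```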

11.
The isochore structure of the nuclear genome of angiosperms described by Salinas et al. (1) was confirmed by using a different experimental approach, namely by showing that the GC levels of coding sequences from both dicots and Gramineae are linearly correlated with the GC levels of the corresponding flanking sequences. The compositional distributions of homologous coding sequences from several orders of dicots and from Gramineae were also studied and shown to mimic the compositional distributions previously seen (1) for coding sequences in general, most coding sequences from Gramineae being much higher in GC than those of the dicots explored. These differences were even stronger for third codon positions and led to striking codon usages for many coding sequences, especially in the case of Gramineae.

12.
Adenine nucleotides have been found to appear preferentially in the regions after the initiation codons or before the termination codons of bacterial genes. Our previous experiments showed that AAA and AAT, the two most frequent second codons in Escherichia coli, significantly enhance translation efficiency. To determine whether such a characteristic feature of base frequencies exists in eukaryotic genes, we performed a comparative analysis of the base biases at the gene terminal portions using the proteomes of seven eukaryotes. Here we show that the base appearance at the third codon positions of gene terminal regions is highly biased in eukaryotic genomes, although the third codon positions are almost free from amino acid preference. The bias changes depending on its position in a gene and is characteristic of each species. We also found that the bias is most pronounced at the second codon, the codon after the initiation codon. NCN is preferred in every genome; in particular, GCG is strongly favored in human and plant genes. The presence of the bias implies that the base sequences at the second codon affect translation efficiency in eukaryotes as well as in bacteria.

13.
Codon usage in Clonorchis sinensis was analyzed using 12,515 codons from 38 coding sequences. Total GC content was 49.83%, and GC1, GC2 and GC3 contents were 56.32%, 43.15% and 50.00%, respectively. The effective number of codons converged at 51-53 codons. When plotted against total GC content or GC3, codon usage was distributed in relation to GC3 biases. Relative synonymous codon usage for each codon revealed a single major trend, which was highly correlated with GC content at the third position when codons began with A or U at the first two positions. In codons beginning with G or C at the first two positions, G or C rarely occurred at the third position. These results suggest that codon usage is shaped by a bias towards G or C at the third base, and that this is affected by the first and second bases.
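GC1, GC2 and GC3 denote the GC content at the first, second and third codon positions. A minimal sketch of how these quantities can be computed from a single in-frame coding sequence follows; the function name is hypothetical, the sequence is a toy example, and the study's own pipeline is not described in the abstract.

```python
def gc_by_codon_position(cds: str) -> dict:
    """Return the GC fraction at codon positions 1, 2 and 3, plus the total.

    Assumes the sequence starts at the first base of a codon.
    A trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    result = {}
    for pos in range(3):
        bases = seq[pos::3]
        gc = sum(base in "GC" for base in bases)
        result[f"GC{pos + 1}"] = gc / len(bases) if bases else 0.0
    total_gc = sum(base in "GC" for base in seq)
    result["GC"] = total_gc / len(seq) if seq else 0.0
    return result


# Toy sequence for illustration only (codons: ATG GCC AAA TTT GGG TAA).
print(gc_by_codon_position("ATGGCCAAATTTGGGTAA"))
```

For a gene set, the same counts would be summed over all coding sequences before dividing, rather than averaging per-gene fractions.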

14.
Fishes of the order Cypriniformes are almost completely restricted to freshwater bodies and number > 3400 species placed in 5 families, each with poorly defined subfamilies and/or tribes. The present study represents the first attempt toward resolution of the higher-level relationships of the world’s largest freshwater-fish clade based on whole mitochondrial (mt) genome sequences from 53 cypriniforms (including 46 newly determined sequences) plus 6 outgroups. Unambiguously aligned, concatenated mt genome sequences (14,563 bp) were divided into 5 partitions (first, second, and third codon positions of the protein-coding genes, rRNA genes, and tRNA genes), and partitioned Bayesian analyses were conducted, with protein-coding genes being treated in 3 different manners (all positions included; third codon positions converted into purine [R] and pyrimidine [Y] [RY-coding]; third codon positions excluded). The resultant phylogenies strongly supported monophyly of the Cypriniformes as well as that of the families Cyprinidae, Catostomidae, and a clade comprising Balitoridae + Cobitidae, with the 2 latter loach families being reciprocally paraphyletic. Although all of the data sets yielded nearly identical tree topologies with regard to the shallower relationships, deeper relationships among the 4 major clades (the above 3 major clades plus Gyrinocheilidae, represented by a single species, Gyrinocheilus aymonieri, in this study) were incongruent depending on the data sets. Treatment of the rapidly saturated third-codon-position transitions appeared to be a source of such incongruities, and we advocate that RY-coding, which takes only transversions into account, effectively removes this likely “noise” from the data set and avoids the apparent lack of signal by retaining all available positions in the data set. [Reviewing Editor: Rafael Zardoya]
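RY-coding of third codon positions, as described above, replaces purines with R and pyrimidines with Y so that only transversions remain visible at those sites. A minimal sketch, assuming an in-frame protein-coding alignment row; the function name is hypothetical and the sequence is a toy example.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def ry_code_third_positions(cds: str) -> str:
    """Recode every third codon position as R (purine) or Y (pyrimidine).

    First and second positions are left unchanged. Gaps and ambiguity
    codes at third positions are kept as they are.
    Assumes the sequence starts at the first base of a codon.
    """
    recoded = []
    for index, base in enumerate(cds.upper()):
        if index % 3 == 2:
            if base in PURINES:
                base = "R"
            elif base in PYRIMIDINES:
                base = "Y"
        recoded.append(base)
    return "".join(recoded)


# Toy sequence for illustration only.
print(ry_code_third_positions("ATGGCCAAATTT"))
```

After recoding, an A/G or C/T difference at a third position no longer counts as a change, while any purine/pyrimidine difference still does.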

15.
Summary: The compositional distributions of coding sequences and DNA molecules (in the 50-100-kb range) are remarkably narrower in murids (rat and mouse) than in humans (and in all other mammals explored so far). In murids, both distributions begin at higher and end at lower GC values. A comparison of homologous coding sequences from murids and humans revealed that their different compositional distributions are due to differences in GC levels in all three codon positions, particularly of genes located at both ends of the distribution. In turn, these differences are responsible for differences in both codon usage and amino acids. When GC levels at first + second codon positions and third codon positions, respectively, of murid genes are plotted against corresponding GC levels of homologous human genes, linear relationships (with very high correlation coefficients and slopes of about 0.78 and 0.60, respectively) are found. This indicates a conservation of the order of GC levels in homologous genes from humans and murids. (The same comparison for mouse and rat genes indicates a conservation of GC levels of homologous genes.) A similar linear relationship was observed when plotting GC levels of corresponding DNA fractions (as obtained by density gradient centrifugation in the presence of a sequence-specific ligand) from mouse and human. These findings indicate that orderly compositional changes affecting not only coding sequences but also noncoding sequences took place since the divergence of murids. Such directional fixations of mutations point to the existence of selective pressures affecting the genome as a whole.

16.
Homoplasy Increases Phylogenetic Structure   (Total citations: 11; self-citations: 1; citations by others: 10)
According to currently accepted theories, rapidly evolving nucleotide sites are phylogenetically less informative than more slowly evolving ones, especially for recognizing more ancient groupings. For this reason third codon positions are often regarded as less reliable than first and second positions as indicators of phylogeny. Analysis of the largest nucleotide matrix treated to date (2538 rbcL sequences covering all major lineages of green plants) shows the opposite: although rapidly evolving and highly homoplastic, third positions contain most of the phylogenetic structure in the data. Frequency of change should thus be used with caution as a criterion for weighting or selecting characters.

17.
Codon contexts in enterobacterial and coliphage genes   (Total citations: 6; self-citations: 0; citations by others: 6)
This investigation of the codon context of enterobacterial, plasmid, and phage protein genes was based on a search for correlations between the presence of one base type at codon position III and the presence of another base type at some other position in adjacent codons. Enterobacterial genes were compared with eukaryotic sequences for codon context effects. In enterobacterial genes, base usage at codon position III is correlated with the third position of the upstream adjacent codon and with all three positions of the downstream codon. Plasmid genes are free of context biases. Phage genes are heterogeneous: MS2 codons have no biased context, whereas lambda genes partly follow the trends of the host bacterium, and T7 genes have biased codon contexts that differ from those of the host. It has been reported that two successive third-codon positions tend to be occupied by two purines or two pyrimidines in Escherichia coli genes of low expression level. Here, the extent to which highly expressed protein genes can modulate base usage at two successive codon positions III, given the constraints on codon usage and protein sequence that act on them, was quantified. This demonstrates that the above-mentioned favored patterns are not a characteristic of weakly expressed genes but occur in all genes in which codon context can vary appreciably. The correlation between successive third-codon positions is a distinct feature of enterobacteria and of some phages, one that may result from adaptation of gene structure to translational efficiency. Conversely, codon context in yeast and human genes is biased, but for reasons unrelated to translation.

18.
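A study of the non-independence of nucleotide substitutions during DNA sequence evolution   (Total citations: 4; self-citations: 2; citations by others: 2)
杨子恒 (Yang Ziheng) 《遗传学报》 (Acta Genetica Sinica) 1990,17(5):354-359
This paper reviews methods for estimating the number of nucleotide substitutions between DNA sequences and examines models of DNA evolution by comparing histone genes from seven species. The base composition at the third codon position of the H2A gene was found to vary greatly among species and to be strongly positively correlated with the base composition at the first codon position of the H2A gene, at the first and third codon positions of the H4 gene, and in the sequences upstream and downstream of H2A, suggesting that species-specific regional constraints operate during DNA sequence evolution. Possible causes are an increase in GC content in higher eukaryotes, or chromosomal recombination that has placed these homologous sequences in different isochores, where they are subject to different selection and mutation pressures. Correlation analysis of nucleotide substitutions at the positions within a codon shows that substitutions at different positions are not independent, possibly because a single substitution event changes several positions. The implications of these results for the inference of evolutionary trees are discussed.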

19.
We have compared the partial nucleotide and derived amino acid sequences of a phaseolin seed storage protein gene of Phaseolus vulgaris (1) and a conglycinin storage protein gene of Glycine max (2). Although these proteins are not antigenically related to one another, the architecture of the genes is similar throughout the sequences compared here. Intervening sequences interrupt the same amino acid positions in both genes. Within the 28% of the G. max gene and the 38% of the P. vulgaris gene represented in this comparison, 73% of the nucleotides in the coding and intervening sequences are identical, excluding the insertions and deletions. The nucleotide mismatches found in the coding sequences are distributed throughout the three codon positions with little bias towards the third codon position. In addition to the single nucleotide differences, six insertions or deletions, ranging from three to twenty-seven nucleotides in length, occur in this portion of the coding region and these are partially responsible for the molecular weight differences of the conglycinin α′-subunit and the phaseolin subunit.

20.
Phylogenetic analyses of first and second codon positions (DNA1 + 2 analysis) and amino acid sequences (protein analysis) are often thought to provide similar estimates of deep-level phylogeny. However, here we report a novel artifact, introduced by codon usage heterogeneity, that influences DNA-level phylogenetic inference of protein-coding genes and causes significant incongruities between DNA1 + 2 and protein analyses. DNA1 + 2 analyses of plastid-encoded psbA genes (encoding photosystem II D1 proteins) strongly suggest a relationship between haptophyte plastids and typical (peridinin-containing) dinoflagellate plastids. The psbA genes from haptophytes and a subset of the peridinin-type plastids display similar codon usage patterns for Leu, Ser, and Arg, which are each encoded by two separated codon sets that differ at first or first plus second codon positions. Our detailed analyses clearly indicate that these unusual preferences shared by haptophyte and some peridinin-type plastid genes are largely responsible for their strong affinity in DNA analyses. In particular, almost all of the support from DNA-level analyses for the monophyly of haptophyte and peridinin-type plastids is lost when the codons corresponding to constant Leu, Ser, and Arg amino acids are excluded, suggesting that this signal comes from rapidly evolving synonymous substitutions rather than from substitutions that result in amino acid changes. Indeed, protein maximum-likelihood analyses of concatenated PsaA and PsbA amino acid sequences indicate that, although 19' hexanoyloxyfucoxanthin-type (19' HNOF-type) plastids in dinoflagellates group with haptophyte plastids, peridinin-type plastids group weakly with those of stramenopiles. Consequently, our results cast doubt on the single origin of peridinin-type and 19' HNOF-type plastids in dinoflagellates previously suggested on the basis of psaA and psbA concatenated gene phylogenetic analyses. We suggest that codon usage heterogeneity could be a more general problem for DNA-level analyses of protein-coding genes, even when third codon positions are excluded.
