Similar articles
Found 20 similar articles (search time: 21 ms)
1.
De novo origin of coding sequence remains an obscure issue in molecular evolution. One of the possible paths for addition (subtraction) of DNA segments to (from) a gene is stop codon shift. Single nucleotide substitutions can destroy the existing stop codon, leading to uninterrupted translation up to the next stop codon in the gene's reading frame, or create a premature stop codon via a nonsense mutation. Furthermore, frameshifts caused by short indels near a gene's end may lead to premature stop codons or to translation past the existing stop codon. Here, we describe the evolution of the length of coding sequence of prokaryotic genes by change of positions of stop codons. We observed cases of addition of regions of 3′UTR to genes due to mutations at the existing stop codon, and cases of subtraction of C-terminal coding segments due to nonsense mutations upstream of the stop codon. Many of the observed stop codon shifts cannot be attributed to sequencing errors or rare deleterious variants segregating within bacterial populations. The additions of regions of 3′UTR tend to occur in those genes in which they are facilitated by nearby downstream in-frame triplets which may serve as new stop codons. Conversely, subtractions of coding sequence often give rise to in-frame stop codons located nearby. The amino acid composition of the added region is significantly biased, compared to the overall amino acid composition of the genes. Our results show that in prokaryotes, shift of stop codon is an underappreciated contributor to functional evolution of gene length.
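The stop-loss case described in this abstract can be illustrated with a minimal sketch. It assumes the standard stop codons (TAA, TAG, TGA) and a plain DNA string; the function names `first_in_frame_stop` and `extension_after_stop_loss` are hypothetical and are not taken from the paper.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumption: standard genetic code)


def first_in_frame_stop(seq, start=0):
    """Return the index of the first in-frame stop codon at or after `start`, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None


def extension_after_stop_loss(downstream):
    """Codons added to a gene if its stop codon mutates into a sense codon.

    `downstream` is the 3' flanking sequence that starts right after the
    original stop codon, in the gene's reading frame. Returns None when the
    flank contains no in-frame stop.
    """
    nxt = first_in_frame_stop(downstream)
    if nxt is None:
        return None
    # The former stop codon now encodes an amino acid, followed by
    # nxt // 3 sense codons before the new stop.
    return 1 + nxt // 3
```

A nearby downstream in-frame stop gives a short extension, which is the situation the abstract reports as favoring 3′UTR additions.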

2.
Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating the codon choices among synonymous codons reflect the average codon usage pattern of corresponding species. On the other hand, the monoubiquitin gene, which is different from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern compared with that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Plural primitive ubiquitin sequences were dispersed across the genome in ancestral eukaryotes. Some of them situated in a particular environment fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded reflect the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.

3.
Compositional distributions in three different codon positions as well as codon usage biases of all available DNA sequences of the Buchnera aphidicola genome have been analyzed. It was observed that the GC levels at the three codon positions follow the order I > II > III, as observed in other extremely AT-rich organisms. B. aphidicola, being an AT-rich organism, is expected to have A and/or T at the third positions of codons. Overall codon usage analyses indicate that A- and/or T-ending codons are predominant in this organism and some particular amino acids are abundant in the coding region of genes. However, multivariate statistical analysis indicates two major trends in the codon usage variation among the genes: one being strongly correlated with the GC contents at the third synonymous positions of codons, and the other being associated with the expression level of genes. Moreover, codon usage biases of the highly expressed genes are almost identical with the overall codon usage biases of all the genes of this organism. These observations suggest that mutational bias is the main factor in determining the codon usage variation among the genes in B. aphidicola.
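The GC level at each codon position (often written GC1, GC2, GC3) can be computed with a few lines of code. The following is a minimal sketch assuming an in-frame DNA coding sequence given as a string; the function name `gc_by_codon_position` is hypothetical, and the abstract does not state how the authors computed these values.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position.

    `cds` is assumed to be an in-frame coding sequence. Trailing bases that
    do not form a full codon are ignored, as are ambiguous bases such as N.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":
            total[i % 3] += 1
            if base in "GC":
                gc[i % 3] += 1
    return tuple(g / t if t else 0.0 for g, t in zip(gc, total))
```

For an AT-rich genome such as the one described here, the third value would be expected to be the lowest of the three.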

4.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular, in detecting protein-coding ORFs. Traditional methods based on stop codon frequency are based on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds of potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
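The dependence of ORF length thresholds on GC content can be sketched under a simple background model. The sketch assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, which is a simplification chosen for illustration and is not necessarily the model used in the paper; the names `stop_probability` and `min_orf_length` are hypothetical.

```python
import math


def stop_probability(gc):
    """Probability that a random trinucleotide is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative background model).
    """
    at = (1.0 - gc) / 2.0
    g = gc / 2.0
    # TAA contributes at**3; TAG and TGA each contribute at * at * g.
    return at ** 3 + 2.0 * at * at * g


def min_orf_length(gc, alpha=0.01):
    """Smallest codon count n such that a random stop-free run of n codons
    has probability at most `alpha`, i.e. (1 - p) ** n <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    p = stop_probability(gc)
    if p <= 0.0:
        raise ValueError("no stop codons are expected at this GC content")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))
```

At a GC content of 50% the model gives a stop probability of 3/64, matching the "every 21st trinucleotide" figure in the abstract. As GC content rises, stop-like trinucleotides become rarer and the length threshold grows, which is why ORF length loses discriminating power in GC-rich genomes.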

5.
Properties of mRNA leading regions that modulate protein synthesis are poorly understood (apart from the effects of their secondary structure). Here I explore how coding properties of leading regions may account for their disparate efficiencies. Trinucleotides that form off-frame stop codons decrease costs of ribosomal slippages during protein synthesis: protein activity (as a proxy of gene expression, and as measured in experiments using artificial variants of 5' leading sequences of beta galactosidase in Escherichia coli) increases proportionally to the number of stop motifs in any frame in the 5' leading region. This suggests that stop codons in the 5' leading region, upstream of the recognized coding sequence, terminate eventual translations that sometimes start before ribosomes reach the mRNA's recognized start codon, increasing efficiency. This hypothesis is confirmed by further analyses: mRNAs with 5' leading regions containing in the same frame a start preceding a stop codon (in any frame) produce less enzymatic activity than those with the stop preceding the start. Hence coding properties, in addition to other properties, such as the secondary structure of the 5' leading region, regulate translation. This experimentally (a) confirms that within coding regions, off-frame stops increase protein synthesis efficiency by early stopping of frameshifted translation; (b) suggests that this occurs for all frames also in 5' leading regions and that (c) several alternative start codons that function at different probabilities should routinely be considered for all genes in the region of the recognized initiation codon. An unknown number of short peptides might be translated from coding and non-coding regions of RNAs.
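Counting stop motifs in each reading frame of a leader sequence, the quantity this abstract relates to protein activity, can be sketched as follows. The frame numbering (frame 0 begins at the first base of the given sequence) and the function name `stop_motifs_by_frame` are assumptions made for illustration.

```python
STOP_MOTIFS = {"TAA", "TAG", "TGA"}


def stop_motifs_by_frame(seq):
    """Count stop trinucleotides in each of the three forward frames.

    Frame k (0, 1 or 2) contains the trinucleotides starting at positions
    k, k + 3, k + 6, ... of `seq`. RNA input is accepted (U is read as T).
    """
    seq = seq.upper().replace("U", "T")
    counts = [0, 0, 0]
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_MOTIFS:
            counts[i % 3] += 1
    return counts
```

Summing the three counts gives the "number of stop motifs in any frame" for a 5' leading region.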

6.
It is shown that synonymous codon usage is less biased in favor of those codons preferred by highly expressed genes at the end of Escherichia coli genes than in the middle. This appears to be due to the close proximity of many E. coli genes. It is shown that a substantial number of genes overlap either the Shine-Dalgarno sequence or the coding sequence of the next gene on the chromosome and that the codons that overlap have lower synonymous codon bias than those which do not. It is also shown that there is an increase in the frequency of A-ending codons, and a decrease in the frequency of G-ending codons at the end of E. coli genes that lie close to another gene. It is suggested that these trends in composition could be associated with selection against the formation of mRNA secondary structure near the start of the next gene on the chromosome. Stop codon use is also affected by the close proximity of genes; many genes are forced to use TGA and TAG stop codons because they terminate either within the Shine-Dalgarno or coding sequence of the next gene on the chromosome. The implications these results have for the evolution of synonymous codon use are discussed.

7.
This work assesses relationships for 30 complete prokaryotic genomes between the presence of the Shine-Dalgarno (SD) sequence and other gene features, including expression levels, type of start codon, and distance between successive genes. A significant positive correlation of the presence of an SD sequence and the predicted expression level of a gene based on codon usage biases was ascertained, such that predicted highly expressed genes are more likely to possess a strong SD sequence than average genes. Genes with AUG start codons are more likely than genes with other start codons, GUG or UUG, to possess an SD sequence. Genes in close proximity to upstream genes on the same coding strand in most genomes are significantly higher in SD presence. In light of these results, we discuss the role of the SD sequence in translation initiation and its relationship with predicted gene expression levels and with operon structure in both bacterial and archaeal genomes.

8.
Adaptive codon usage provides evidence of natural selection in one of its most subtle forms: a fitness benefit of one synonymous codon relative to another. Codon usage bias is evident in the coding sequences of a broad array of taxa, reflecting selection for translational efficiency and/or accuracy as well as mutational biases. Here, we quantify the magnitude of selection acting on alternative codons in genes of the nematode Caenorhabditis remanei, an outcrossing relative of the model organism C. elegans, by fitting the expected mutation-selection-drift equilibrium frequency distribution of preferred and unpreferred codon variants to the empirical distribution. This method estimates the intensity of selection on synonymous codons in genes with high codon bias as N(e)s = 0.17, a value significantly greater than zero. In addition, we demonstrate for the first time that estimates of ongoing selection on codon usage among genes, inferred from nucleotide polymorphism data, correlate strongly with long-term patterns of codon usage bias, as measured by the frequency of optimal codons in a gene. From the pattern of polymorphisms in introns, we also infer that these findings do not result from the operation of biased gene conversion toward G or C nucleotides. We therefore conclude that coincident patterns of current and ancient selection are responsible for shaping biased codon usage in the C. remanei genome.

9.
lacZ translation initiation mutations   Total citations: 32, self-citations: 0, citations by others: 32

10.
11.
A very powerful method for detecting functional constraints operative in biological macromolecules is presented. This method entails performing a base permanence analysis of protein coding genes at each codon position simultaneously in different species. It calculates the degree of permanence of subregions of the gene by dividing it into segments, c codons long, counting how many sites remain unchanged in each segment among all species compared. By comparing the base permanence among several sequences with the expectations based on a stochastic evolutionary process, gene regions showing different degrees of conservation can be selected. This means that wherever the permanence deviates significantly from the expected value generated by the simulation, the corresponding regions are considered "constrained" or "hypervariable". The constrained regions are of two types: alpha and beta. The alpha regions result from constraints at the amino acid level, whereas the beta regions are those probably involved in "control" processing. The method has been applied to mitochondrial genes coding for subunit 6 of the ATPase and subunit 1 of the cytochrome oxidase in four mammalian species: human, rat, mouse, and cow. In the two mitochondrial genes a few regions that are highly conserved in all codon positions have been identified. Among these regions a sequence, common to both genes, that is complementary to a strongly conserved region of 12S rRNA has been found. This method can also be of great help in studying molecular evolution mechanisms.
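The counting step of the base permanence analysis can be sketched as below. The sketch assumes gap-free, in-frame aligned coding sequences of equal length and non-overlapping segments; the comparison against a simulated stochastic expectation, which the abstract describes as the second step, is not included. The function name `base_permanence` is hypothetical.

```python
def base_permanence(aligned, c):
    """Count unchanged sites per segment and codon position.

    `aligned` is a list of equal-length, gap-free, in-frame coding sequences
    (one per species); `c` is the segment length in codons. Returns one
    [n1, n2, n3] list per non-overlapping segment, where n_k is the number
    of sites at codon position k that are identical in all sequences.
    Trailing codons that do not fill a whole segment are ignored.
    """
    if not aligned or c < 1:
        raise ValueError("need at least one sequence and c >= 1")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    seqs = [s.upper() for s in aligned]
    seg_len = 3 * c
    result = []
    for start in range(0, length - seg_len + 1, seg_len):
        counts = [0, 0, 0]
        for i in range(start, start + seg_len):
            # Segments start on codon boundaries, so i % 3 is the codon position.
            if len({s[i] for s in seqs}) == 1:
                counts[i % 3] += 1
        result.append(counts)
    return result
```

Segments whose counts are high at all three codon positions would be candidates for the regions the abstract describes as conserved beyond the amino acid level, subject to the significance test against simulation.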

12.
Sense codons are found in specific contexts   Total citations: 27, self-citations: 0, citations by others: 27
The sequence environment of codons in structural genes has been investigated statistically, using computer methods. A set of Escherichia coli genes with abundant products was compared with a set having low gene product levels, in order to detect potential differences associated with expression. The results show striking non-randomness in the nucleotides occurring near codons. These effects are, unexpectedly, very much larger and more homogeneous among the genes with rare products. The intensity of effects in weakly expressed genes suggests that such non-random sequence environments decrease expression. In the weakly expressed set of genes, the 5' neighbor of a codon, and all positions of the 3' neighbor codon are biased. In the highly expressed genes, the first nucleotide of the next codon is a uniquely affected site. The distribution of non-randomness in weakly expressed genes suggests that sequence bias is primarily due to a constraint acting directly on the secondary or tertiary structure of the codon/anticodon. In highly expressed genes, the observed bias suggests an interaction between the codon/anticodon and a site outside the codon/anticodon. Much of the tendency to non-random near-neighbor sequences in weakly expressed genes can be ascribed to a correlation between nearby nucleotides and the wobble nucleotide of the codon, despite the fact that selection of such correlations will alter the amino acid sequence. The favored pattern, in genes expressed at low level, is R YYR or Y RRY. R indicates purine, Y indicates pyrimidine; the space is the boundary between codons. It seems likely that this preference for nearby sequences is the physical basis of the genetic context effect. Under this assumption such sequence biases will affect expression. On this basis, we predict new sites for contextual mutations which decrease expression, and suggest a strategy for the design of messages having optimal translational activity.

13.
Adenine nucleotides have been found to appear preferentially in the regions after the initiation codons or before the termination codons of bacterial genes. Our previous experiments showed that AAA and AAT, the two most frequent second codons in Escherichia coli, significantly enhance translation efficiency. To determine whether such a characteristic feature of base frequencies exists in eukaryote genes, we performed a comparative analysis of the base biases at the gene terminal portions using the proteomes of seven eukaryotes. Here we show that the base appearance at the codon third positions of gene terminal regions is highly biased in eukaryote genomes, although the codon third positions are almost free from amino acid preference. The bias changes depending on its position in a gene, and is characteristic of each species. We also found that bias is most outstanding at the second codon, the codon after the initiation codon. NCN is preferred in every genome; in particular, GCG is strongly favored in human and plant genes. The presence of the bias implies that the base sequences at the second codon affect translation efficiency in eukaryotes as well as bacteria.

14.
The nucleotide sequences of the entire six-gene family that encodes the Rubisco small subunit (rbcS) in Mesembryanthemum crystallinum (common ice plant) were determined. Five of the genes are arranged in a tandem array spanning 20 kb, while the sixth gene is not closely linked to this array. The mature small subunit coding regions are highly conserved and encode four distinct polypeptides of equal lengths with up to five amino acid differences distinguishing individual genes. The transit peptide coding regions are more divergent in both amino acid sequence and length, encoding five distinct peptide sequences that range from 55 to 61 amino acids in length. Each of the genes has two introns located at conserved sites within the mature peptide-coding regions. The first introns are diverse in sequence and length, ranging from 122 bp to 1092 bp. Five of the six second introns are highly conserved in sequence and length. Two genes, rbcS-4 and rbcS-5, are identical at the nucleotide level starting from 121 bp upstream of the ATG initiation codon to 9 bp downstream of the stop codon including the sequences of both introns, indicating recent gene duplication and/or gene conversion. Functionally important regulatory elements identified in rbcS promoters of other species are absent from the upstream regions of all but one of the ice plant rbcS genes. Relative expression levels were determined for the rbcS genes and indicate that they are differentially expressed in leaves.

15.
The DNA sequence organization of the protein encoding region of the gene for silk fibroin has been analyzed. The accompanying paper (Manning, R. F., and Gage, L. P. (1980) J. Biol. Chem. 255, 9451-9457) shows that the total length of the gene, and its protein, as well as the pattern of restriction sites in the gene is highly polymorphic among inbred stocks of Bombyx mori. In this paper, those features of fibroin gene structure which are invariant among these alleles are presented. Fibroin is composed primarily of relatively short "crystalline" and "amorphous" peptides of known sequence whose arrangement in the protein is unknown. Knowledge of the codons most commonly used in fibroin mRNA allowed utilization of particular restriction enzymes as a means for determining the nature and organization of crystalline and amorphous coding sequences in the fibroin gene. Three restriction endonucleases were identified that cleave sequences coding for amorphous region peptides. Their cleavage pattern revealed that the repetitive coding sequence of the gene core (approximately 15 kilobases) is divided into at least 10 large crystalline coding domains interrupted by smaller amorphous coding domains. Many restriction endonucleases do not cleave the fibroin core at all, three of them with four-base recognition sequences. Specific deductions as to codon usage and repetitive sequence homogeneity in the gene follow from these results. One novel finding is the rigorous exclusion of the glycine codon GGA prior to serine codons even though this glycine codon is used frequently prior to alanine codons. The sequence homogeneity and the regularly alternating arrangement of crystalline and amorphous coding sequences of the gene are discussed in terms of the function of fibroin protein and the evolution of highly repetitive DNA.

16.
Highly expressed plastid genes display codon adaptation, which is defined as a bias toward a set of codons which are complementary to abundant tRNAs. This type of adaptation is similar to what is observed in highly expressed Escherichia coli genes and is probably the result of selection to increase translation efficiency. In the current work, the codon adaptation of plastid genes is studied with regard to three specific features that have been observed in E. coli and which may influence translation efficiency. These features are (1) a relatively low codon adaptation at the 5′ end of highly expressed genes, (2) an influence of neighboring codons on codon usage at a particular site (codon context), and (3) a correlation between the level of codon adaptation of a gene and its amino acid content. All three features are found in plastid genes. First, highly expressed plastid genes have a noticeable decrease in codon adaptation over the first 10–20 codons. Second, for the twofold degenerate NNY codon groups, highly expressed genes have an overall bias toward the NNC codon, but this is not observed when the 3′ neighboring base is a G. At these sites highly expressed genes are biased toward NNT instead of NNC. Third, plastid genes that have higher codon adaptations also tend to have an increased usage of amino acids with a high G + C content at the first two codon positions and GNN codons in particular. The correlation between codon adaptation and amino acid content exists separately for both cytosolic and membrane proteins and is not related to any obvious functional property. It is suggested that at certain sites selection discriminates between nonsynonymous codons based on translational, not functional, differences, with the result that the amino acid sequence of highly expressed proteins is partially influenced by selection for increased translation efficiency. Received: 21 July 1999 / Accepted: 5 November 1999

17.
A very powerful method for detecting functional constraints operative in biological macromolecules is presented. This method entails performing a base permanence analysis of protein coding genes at each codon position simultaneously in different species. It calculates the degree of permanence of subregions of the gene by dividing it into segments, c codons long, counting how many sites remain unchanged in each segment among all species compared. By comparing the base permanence among several sequences with the expectations based on a stochastic evolutionary process, gene regions showing different degrees of conservation can be selected. This means that wherever the permanence deviates significantly from the expected value generated by the simulation, the corresponding regions are considered “constrained” or “hypervariable”. The constrained regions are of two types: α and β. The α regions result from constraints at the amino acid level, whereas the β regions are those probably involved in “control” processing. The method has been applied to mitochondrial genes coding for subunit 6 of the ATPase and subunit 1 of the cytochrome oxidase in four mammalian species: human, rat, mouse, and cow. In the two mitochondrial genes a few regions that are highly conserved in all codon positions have been identified. Among these regions a sequence, common to both genes, that is complementary to a strongly conserved region of 12S rRNA has been found. This method can also be of great help in studying molecular evolution mechanisms.

18.
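Position dependence of synonymous codon usage   Total citations: 4, self-citations: 0, citations by others: 4
Synonymous codon usage at different positions within the coding regions of Escherichia coli was studied. For many amino acids, codon usage changes markedly in the translation initiation region, whereas only a few amino acids show weaker changes in the translated region. Because codon usage is closely related to gene expression, these results are consistent with the experimentally observed importance of codon usage at the 5′ end of the coding region for expression. Further results also suggest which codons, when used at particular positions, may affect gene expression.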

19.
R. M. Kliman, J. Hey. Genetics, 1994, 137(4): 1049-1056
Codon bias varies widely among the loci of Drosophila melanogaster, and some of this diversity has been explained by variation in the strength of natural selection. A study of correlations between intron and coding region base composition shows that variation in mutation pattern also contributes to codon bias variation. This finding is corroborated by an analysis of variance (ANOVA), which shows a tendency for introns from the same gene to be similar in base composition. The strength of base composition correlations between introns and codon third positions is greater for genes with low codon bias than for genes with high codon bias. This pattern can be explained by an overwhelming effect of natural selection, relative to mutation, in highly biased loci. In particular, this correlation is absent when examining fourfold degenerate sites of highly biased genes. In general, it appears that selection acts more strongly in choosing among fourfold degenerate codons than among twofold degenerate codons. Although the results indicate regional variation in mutational bias, no evidence is found for large scale regions of compositional homogeneity.

20.