Similar literature (20 results)
1.
Recombinant plasmids that direct synthesis of rat preproinsulin under the direction of the SV40 early promoter have been used to probe the mechanism of initiation of translation. Insertion of an upstream AUG triplet that was out-of-frame with respect to the coding sequence for preproinsulin reduced the yield of proinsulin, in keeping with the predictions of the scanning model. The extent to which an upstream AUG codon interfered depended on sequences surrounding the AUG triplet; with two constructs (p255/20 and C2) the 5'-proximal AUG codon constituted an absolute barrier: there was no initiation at the downstream start site for preproinsulin. With two other constructs (p255/9 and p255/21), however, proinsulin was made despite the presence of an upstream, out-of-frame AUG codon in a favorable context for initiation. In those cases the reading frame set by the first AUG triplet was short, terminating before the start of the preproinsulin coding sequence. The interpretation that ribosomes initiate at the first AUG, terminate, and then reinitiate at the AUG that directly precedes the preproinsulin coding sequence was tested by introducing a point mutation that eliminated the terminator codon: the resulting mutant made no proinsulin.
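
The terminate-and-reinitiate reading described in this abstract can be illustrated with a toy scan. This is only a sketch: the two mRNA strings are invented for illustration and are not the p255 constructs, and the function `scan_orf` is a hypothetical helper that ignores sequence context around the AUG.

```python
STOPS = {"UAA", "UAG", "UGA"}

def scan_orf(mrna, start_search=0):
    """Return (start, end) for the ORF opened by the first AUG at or after start_search.

    `end` is the index just past the in-frame stop codon, or None when the
    frame stays open to the end of the sequence. Returns None if no AUG exists.
    """
    start = mrna.find("AUG", start_search)
    if start == -1:
        return None
    for i in range(start + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            return start, i + 3
    return start, None

# Hypothetical toy sequences (assumptions, not taken from the paper).
# Upstream AUG at index 2 opens a short frame ending at UAA; the main AUG is at index 12.
wild_type = "GGAUGCCCUAAGAUGGCUGCUUGA"
# Same sequence with the upstream terminator UAA changed to UCA.
mutant = "GGAUGCCCUCAGAUGGCUGCUUGA"

upstream = scan_orf(wild_type)                  # short upstream frame that terminates
downstream = scan_orf(wild_type, upstream[1])   # a rescan after termination finds the main AUG
read_through = scan_orf(mutant)                 # upstream frame now runs past the main AUG
```

In the toy mutant the upstream frame never terminates, so a scan resumed after termination has nowhere to start, which mirrors the loss of proinsulin reported for the terminator mutant.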

2.
Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
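
The second pitfall named in this abstract (one extra nucleotide corrupting a fixed-frame translation) is easy to reproduce. The sketch below is not MACSE and does not implement its alignment algorithm; it only translates two invented sequences in frame 0 with the standard genetic code. The names `translate`, `reference`, and `with_insertion` are assumptions made for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate dna in frame 0; '*' marks a stop codon, trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Hypothetical sequences: the second has one extra C inserted after position 5.
reference = "ATGGCTAAAGGT"
with_insertion = "ATGGCCTAAAGGT"

print(translate(reference))       # MAKG
print(translate(with_insertion))  # MA*R
```

A single inserted base turns the third codon into a stop and changes everything downstream, so aligning the two protein translations would misplace gaps even though the nucleotide sequences differ by one character.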

3.
The bacterial DNA sequences in the GenBank database were divided into coding and noncoding regions and examined for the base-trimer distribution in every triplet frame on the sense and antisense strands. The results revealed that for the noncoding region, both strands have very similar base-trimer distributions and have no frame specificity; that is, DNA is symmetric in the noncoding region. For the coding region, on the other hand, the symmetry is broken only in the triplet framework, and we found a special triplet-frame-specific symmetry which appears when the two complementary strands of the coding region are read from their 5′ ends. In addition, the following frame specificity was also observed in the distribution of stop codons on the antisense strand of the coding region. When the antisense sequences of the open reading frames (ORFs) in the database are read in the three reading frames, the same reading frame as the corresponding ORF contains a significantly larger number of long open frames without stop codons (i.e., nonstop frames [NSFs]) than expected, while the number of NSFs in the other two reading frames is similar to the expected number. That is, NSFs as well as ORFs are maintained in a frame-specific manner, and in this sense, DNA becomes symmetrical even in the coding region. These two kinds of frame-specific symmetries indicate that only an ORF and its complementary triplets are specifically recognized and maintained in DNA. We suppose that the antisense strands as well as the sense strands in the coding region may be transcribed, thereby producing various kinds of proteins corresponding to NSFs, though their amount may not be large. The presence of these proteins should have some benefits for living organisms, and therefore we propose that these proteins are upcoming enzymes having novel functions. Correspondence to: I. Urabe
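
As a rough illustration of how stop-free stretches on the antisense strand could be measured, the sketch below reverse-complements a sequence and reports the longest run of non-stop codons in each of its three frames. The frame numbering (offset from the 5′ end of the antisense strand), the helper names, and the toy ORF are assumptions; the paper's own frame convention and statistics are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the antisense strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_stop_free_run(seq, frame):
    """Longest run of consecutive non-stop codons when seq is read at offset `frame`."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best

# Hypothetical toy ORF (not from GenBank).
orf = "ATGGCAGCATCATAA"
antisense = reverse_complement(orf)
runs_by_frame = {f: longest_stop_free_run(antisense, f) for f in range(3)}
```

For an ORF whose length is a multiple of three, offset 0 on the antisense strand reads the reverse complements of the sense codons, so a sense TCA codon appears there as a TGA stop.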

4.
Evolutionary studies commonly model single nucleotide substitutions and assume that they occur as independent draws from a unique probability distribution across the sequence studied. This assumption is violated for protein-coding sequences, and we consider modeling approaches where codon positions (CPs) are treated as separate categories of sites because within each category the assumption is more reasonable. Such "codon-position" models have been shown to explain the evolution of codon data better than homogeneous models in previous studies. This paper examines the ways in which codon-position models outperform homogeneous models and characterizes the differences in estimates of model parameters across CPs. Using the PANDIT database of multiple species DNA sequence alignments, we quantify the differences in the evolutionary processes at the 3 CPs in a systematic and comprehensive manner, characterizing previously undescribed features of protein evolution. We relate our findings to the functional constraints imposed by the genetic code, protein function, and the types of mutation that cause synonymous and nonsynonymous codon changes. The results increase our understanding of selective constraints and could be incorporated into phylogenetic analyses or gene-finding techniques in the future. The methods used are extended to an overlapping reading frame data set, and we discover that overlapping reading frames do not necessarily cause more stringent evolutionary constraints.

5.
Behura SK, Severson DW. Gene 2012, 504(2):226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p<0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeats coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.
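
A minimal sketch of how in-frame codon repeats of the kind discussed in this abstract might be located. It covers only runs of a single codon (or of codons grouped under one label, such as CAG and CAA for glutamine), not the codon pair repeats the study also analyzes. The function `codon_runs`, its parameters, and the toy coding sequence are assumptions for illustration.

```python
from itertools import groupby

def codon_runs(cds, min_repeats=3, key=None):
    """Return (label, first_codon_index, run_length) for each in-frame run.

    `key` optionally maps a codon to a label so that synonymous codons can be
    counted as one run; by default each codon is its own label.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    runs = []
    position = 0
    for label, group in groupby(codons, key=key):
        length = len(list(group))
        if length >= min_repeats:
            runs.append((label, position, length))
        position += length
    return runs

def glutamine_label(codon):
    """Label both glutamine codons identically so mixed CAG/CAA tracts merge."""
    return "Gln" if codon in ("CAG", "CAA") else codon

# Hypothetical coding sequence with a mixed CAG/CAA tract.
cds = "ATG" + "CAGCAGCAACAG" + "GCT" + "TAA"
exact_runs = codon_runs(cds, min_repeats=2)
glutamine_runs = codon_runs(cds, min_repeats=3, key=glutamine_label)
```

Counting by exact codon finds only the two leading CAG codons, while grouping the synonymous codons recovers the full four-codon glutamine tract.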

6.
An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes, such as codon composition and sequence length of all reading frames, was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accuracy. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.

7.
The coding sequence for mammalian ornithine decarboxylase antizyme is in two different partially overlapping reading frames with no independent ribosome entry to the second ORF. Immediately before the stop codon of the first ORF, a proportion of ribosomes undergo a quadruplet translocation event to shift to the +1 reading frame of the second and main ORF. The proportion that frameshifts is dependent on the polyamine level and, because the product antizyme is a negative regulator of intracellular polyamine levels, the frameshifting acts to complete an autoregulatory circuit by sensing polyamine levels. An mRNA element just 5' of the shift site and a 3' pseudoknot are important for efficient frameshifting. Previous work has shown that a cassette with the mammalian shift site and associated signals directs efficient shifting in the budding yeast Saccharomyces cerevisiae at the same codon to the correct frame, but that the shift is -2 instead of +1. The product contains an extra amino acid corresponding to the shift site. The present work shows efficient frameshifting also occurs in the fission yeast, Schizosaccharomyces pombe. This frameshifting is 80% +1 and 20% -2. The response of S. pombe translation apparatus to the mammalian antizyme recoding signals is more similar to that of the mammalian system than to that of S. cerevisiae. S. pombe provides a good model system for genetic studies on the mechanism of at least this type of programmed mammalian frameshifting.

8.
9.
10.
Overlapping genes are two protein-coding sequences sharing a significant part of the same DNA locus in different reading frames. Although in recent times an increasing number of examples have been found in bacteria, the underlying mechanisms of their evolution are unknown. In this work we explore how selective pressure in a protein-coding sequence influences its overlapping genes in alternative reading frames. We model evolution using a time-continuous Markov process and derive the corresponding model for the remaining frames to quantify selection pressure and genetic noise. Our findings lead to the presumption that, once information is embedded in the reverse reading frame −2 (relative to the mother gene in +1), purifying selection in the protein-coding reading frame automatically protects the sequences in both frames. We also found that this coincides with the fact that the genetic noise measured using the conditional entropy is minimal in frame −2 under selection in the coding frame.

11.
Properties of mRNA leading regions that modulate protein synthesis are little known (besides effects of their secondary structure). Here I explore how coding properties of leading regions may account for their disparate efficiencies. Trinucleotides that form off-frame stop codons decrease costs of ribosomal slippages during protein synthesis: protein activity (as a proxy of gene expression, and as measured in experiments using artificial variants of 5' leading sequences of beta galactosidase in Escherichia coli) increases proportionally to the number of stop motifs in any frame in the 5' leading region. This suggests that stop codons in the 5' leading region, upstream of the recognized coding sequence, terminate eventual translations that sometimes start before ribosomes reach the mRNA's recognized start codon, increasing efficiency. This hypothesis is confirmed by further analyses: mRNAs with 5' leading regions containing in the same frame a start preceding a stop codon (in any frame) produce less enzymatic activity than those with the stop preceding the start. Hence coding properties, in addition to other properties, such as the secondary structure of the 5' leading region, regulate translation. This experimentally (a) confirms that within coding regions, off-frame stops increase protein synthesis efficiency by early stopping of frameshifted translation; (b) suggests that this occurs for all frames also in 5' leading regions; and (c) suggests that several alternative start codons that function at different probabilities should routinely be considered for all genes in the region of the recognized initiation codon. An unknown number of short peptides might be translated from coding and non-coding regions of RNAs.

12.
An unusual nucleotide sequence, called H10, was previously isolated by biopanning with a random peptide library on filamentous phage. The sequence encoded a peptide that bound to the growth hormone binding protein. Despite the fact that the H10 sequence can be expressed in Escherichia coli as a fusion to the gene III minor coat protein of the M13 phage, the sequence contained two TGA stop codons in the zero frame. Several mutant derivatives of the H10 sequence carried not only a stop codon, but also showed frameshifts, either +1 or -1 in individual isolates, between the H10 start and the gene III sequences. In this work, we have subcloned the H10 sequence and three of its derivatives (one requiring a +1 reading frameshift for expression, one requiring a -1 reading frameshift, and one open reading frame) in gene fusions to a reporter beta-galactosidase gene. These sequences have been cloned in all three reading frames relative to the reporter. The non-open reading frame constructs gave (surprisingly) high expression of the reporter (10-40% of control vector expression levels) in two out of the three frames. A site-directed mutant of the TGA stop codon (to TTA) in the +1 shifter greatly reduced the frameshift and gave expression primarily in the zero frame. By contrast, a site-directed mutant of the TGA in the -1 shifter had little effect on the pattern of expression, and alteration of the first TGA (of two) in H10 itself paradoxically reduced expression by half. We believe these phenomena to reflect a translational recoding mechanism in which ribosomes switch reading frames or read past stop codons upon encountering a signal encoded in the nucleotide sequence of the mRNA, because both the open reading frame derivative (which has six nucleotide changes from parental H10) and the site-directed mutant of the +1 shifter, primarily expressed the reporter only in the zero frame.

13.
An intact gene for the ribosomal protein S19 (rps19) is absent from Oenothera mitochondria. The conserved rps19 reading frame found in the mitochondrial genome is interrupted by a termination codon. This rps19 pseudogene is cotranscribed with the downstream rps3 gene and is edited on both sides of the translational stop. Editing, however, changes the amino acid sequence at positions that were well conserved before editing. Other strange editings create translational stops in open reading frames coding for functional proteins. In coxI and rps3 mRNAs CGA codons are edited to UGA stop codons only five and three codons, respectively, downstream of the initiation codon. These aberrant editings in essential open reading frames and in the rps19 pseudogene appear to have been shifted to these positions from other editing sites. These observations suggest a requirement for a continuous evolutionary constraint on the editing specificities in plant mitochondria.

14.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular, in detecting protein-coding ORFs. Traditional methods based on stop codon frequency are based on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds of potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
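
The 3-in-64 figure and its dependence on GC content can be worked through numerically. The sketch below assumes independent bases with P(G) = P(C) and P(A) = P(T), which is a simplification chosen for this example and not necessarily the model used in the paper; the function names are likewise hypothetical.

```python
def stop_codon_probability(gc):
    """Probability that a random trinucleotide is TAA, TAG, or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an assumption of this sketch).
    """
    at = (1.0 - gc) / 2.0
    g = gc / 2.0
    # TAA contributes at**3; TAG and TGA each contribute at**2 * g.
    return at ** 3 + 2.0 * at ** 2 * g

def expected_codons_between_stops(gc):
    """Mean spacing of stop trinucleotides in one frame (geometric distribution)."""
    return 1.0 / stop_codon_probability(gc)

def prob_open_at_least(n_codons, gc):
    """Chance that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons

for gc in (0.3, 0.5, 0.8):
    print(gc, round(expected_codons_between_stops(gc), 1))
```

At 50% GC this reproduces the probability 3/64, or roughly one stop every 21 codons. Under the same assumptions the spacing grows to about 111 codons at 80% GC, which illustrates why a fixed ORF length threshold loses discriminating power in GC-rich genomes.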

15.
The nucleotide sequence of part of the late region of the polyoma virus genome was determined. It contains coding information for the major capsid protein VP1 and the C-terminal region of the minor proteins VP2 and VP3. In the sequence with the same polarity as late mRNAs, all coding frames are blocked by termination codons in a region around 48 units on the physical map. This is the region where the N-terminus of VP1 and the C-termini of VP2 and VP3 have been located (T. Hunter and W. Gibson, J. Virol. 28:240-253, 1978; S. G. Siddell and A. E. Smith, J. Virol. 27:427-431, 1978; Smith et al., Cell 9:481-487, 1976). There are two long uninterrupted coding frames in the late region of polyoma virus DNA. One lies at the 5' end of the sequence and contains potential coding sequences for VP2 and VP3. The other contains 383 consecutive sense codons starting with the ATG at nucleotide position 1,218, extends from 47.5 to 25.8 units counterclockwise on the physical map, and is located where the VP1 gene has been mapped. The VP1 gene overlaps the genes for proteins VP2/VP3 by 32 nucleotides and uses a different coding frame. From the DNA sequence, the amino acid sequence of VP1 was predicted. The proposed VP1 sequence is in good agreement with other data, namely, with the partial N-terminal amino acid sequence and the total amino acid composition. The VP1 coding frame terminates with a TAA codon at 25.8 map units. This is followed by an AATAAA sequence, which may act as a processing signal for the viral late mRNAs. When both nucleotide and amino acid sequences are compared with their counterparts in the related simian virus 40, extensive homologies are found over the entire region of the two viral genomes. Maximum homology appears to occur in those regions which code for the C-termini of the VP1 proteins. The overlap region of VP1 with VP2/VP3 of polyoma virus is shorter by 90 nucleotides than is that of simian virus 40 and shows very limited homology with the simian virus 40 sequence. This leads to the suggestion that the overlap segments of both viruses have been freed from stringency imposed on drifting during evolution and that proteins VP2 and VP3 of polyoma virus may have been truncated by the appearance of a termination codon within the sequence.

16.
H Grosjean, W Fiers. Gene 1982, 18(3):199-209
By considering the nucleotide sequence of several highly expressed coding regions in bacteriophage MS2 and mRNAs from Escherichia coli, it is possible to deduce some rules which govern the selection of the most appropriate synonymous codons NNU or NNC read by tRNAs having GNN, QNN or INN as anticodon. The rules fit with the general hypothesis that an efficient in-phase translation is facilitated by proper choice of degenerate codewords promoting a codon-anticodon interaction with intermediate strength (optimal energy) over those with very strong or very weak interaction energy. Moreover, codons corresponding to minor tRNAs are clearly avoided in these efficiently expressed genes. These correlations are clearcut in the normal reading frame but not in the corresponding frameshift sequences +1 and +2. We hypothesize that both the optimization of codon-anticodon interaction energy and the adaptation of the tRNA population to codon frequency or vice versa in highly expressed mRNAs of E. coli are part of a strategy that optimizes the efficiency of translation. Conversely, codon usage in weakly expressed genes such as repressor genes follows exactly the opposite rules. It may be concluded that, in addition to the need for coding an amino acid sequence, the energetic consideration for codon-anticodon pairing, as well as the adaptation of codons to the tRNA population, may have been important evolutionary constraints on the selection of the optimal nucleotide sequence.

17.
On the mechanism of ribosomal frameshifting at hungry codons
In a few, rather rare cases, frameshift mutant alleles are phenotypically suppressed during limitation for particular aminoacyl-tRNA species. The simplest interpretation is compensatory ribosome frameshifting at a "hungry" codon in the vicinity of the suppressed frameshift mutation. We have now tested this interpretation directly by obtaining amino acid sequence data on such a phenotypically suppressed protein. We used a plasmid-borne lacZ gene, engineered to be in the (+) reading frame. Its background leakiness is increased by two orders of magnitude during lysyl-tRNA limitation. The enzyme made under this condition has the amino acid sequence expected from the DNA sequence up to the first lysine codon, then shifts in the (-) direction to recreate the correct lacZ reading frame. The lysine is replaced by serine, presumably due to cognate reading of an overlapping AGC codon displaced by one base to the 3' side of the AAG codon. When the 3' overlapping codon is AGA or AGG, there is no ribosome frameshifting; when it is AGU (read by the same serine tRNA) there is frameshifting, although less efficiently than in the case of AGC. The mechanism of cognate overlapping reading contradicts more elaborate models that two of the authors have suggested previously. However, the possibility remains that there is more than one mechanism of ribosome frameshifting at hungry codons.

18.
We performed 3′ RNA sequence analyses of [32P]pCp-end-labeled La Crosse (LAC) virus, alternate LAC virus isolate L74, and snowshoe hare bunyavirus large (L), medium (M), and small (S) negative-stranded viral RNA species to determine the coding capabilities of these species. These analyses were confirmed by dideoxy primer extension studies in which we used a synthetic oligodeoxynucleotide primer complementary to the conserved 3′-terminal decanucleotide of the three viral RNA species (Clerx-van Haaster and Bishop, Virology 105:564-574, 1980). The deduced sequences predicted translation of two S-RNA gene products that were read in overlapping reading frames. So far, only single contiguous open reading frames have been identified for the viral M- and L-RNA species. For the negative-stranded M-RNA species of all three viruses, the single reading frame developed from the first 3′-proximal UAC triplet. Likewise, for the L-RNA of the alternate LAC isolate, a single open reading frame developed from the first 3′-proximal UAC triplet. The corresponding L-RNA sequences of prototype LAC and snowshoe hare viruses initiated open reading frames; however, for both viral L-RNA species there was a preceding 3′-proximal UAC triplet in another reading frame that was followed shortly afterward by a termination codon. A comparison of the sequence data obtained for snowshoe hare virus, LAC virus, and the alternate LAC virus isolate showed that the identified nucleotide substitutions were sufficient to account for some of the fingerprint differences in the L-, M-, and S-RNA species of the three viruses. Unlike the distribution of the L- and M-RNA substitutions, significantly fewer nucleotide substitutions occurred after the initial UAC triplet of the S-RNA species than before this triplet, implying that the overlapping genes of the S RNA provided a constraint against evolution by point mutation. The comparative sequence analyses predicted amino acid differences among the corresponding L-, M-, and S-RNA gene products of snowshoe hare virus and the two LAC virus isolates.

19.
MOTIVATION: Overlapping gene coding sequences (CDSs) are particularly common in viruses but also occur in more complex genomes. Detecting such genes with conventional gene-finding algorithms can be difficult for several reasons. If an overlapping CDS is on the same read-strand as a known CDS, then there may not be a distinct promoter or mRNA. Furthermore, the constraints imposed by double-coding can result in atypical codon biases. However, these same constraints lead to particular mutation patterns that may be detectable in sequence alignments. RESULTS: In this paper, we investigate several statistics for detecting double-coding sequences with pairwise alignments, including a new maximum-likelihood method. We also develop a model for double-coding sequence evolution. Using simulated sequences generated with the model, we characterize the distribution of each statistic as a function of sequence composition, length, divergence time and double-coding frame. Using these results, we develop several algorithms for detecting overlapping CDSs. The algorithms were tested on known overlapping CDSs and other overlapping open reading frames (ORFs) in the hepatitis B virus (HBV), Escherichia coli and Salmonella typhimurium genomes. The algorithms should prove useful for detecting novel overlapping genes, especially short coding ORFs in viruses. AVAILABILITY: Programs may be obtained from the authors. SUPPLEMENTARY INFORMATION: http://biochem.otago.ac.nz/double.html.

20.