Similar documents
1.
The subset X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 20 trinucleotides occurs preferentially in frame 0 (the reading frame established by the ATG start trinucleotide) of protein-coding genes of both prokaryotes and eukaryotes. X0 is a complementary maximal circular code with two permuted maximal circular codes, X1 and X2, in frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides, respectively, in the 5'-3' direction). X0 is called a C3 code (Arquès and Michel, 1997, J. Biosyst 44, 107-134). A quantitative study of these three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of X0, X1 and X2 in frame 0 of eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5' and 3' regions of eukaryotes, where X0, X1 and X2 occur with variable frequencies around the random value (1/3). Several unexpected frequency asymmetries, e.g. the frequency difference between X1 and X2 in frame 0, are related to a new property of the C3 code X0 involving substitutions. An evolutionary analytical model with three parameters (p, q, t) retrieves the frequencies of X0, X1 and X2 observed in the three frames of protein genes and explains these asymmetries. The model is based on an independent mixing of the 20 codons (trinucleotides in frame 0) of X0 with equal probability (1/20), followed by t ≈ 4 substitutions per codon distributed in the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 − p − q ≈ 0.8 over the three codon sites respectively. The complex behaviour of these analytical curves is totally unexpected and difficult to anticipate a priori.
Finally, the evolutionary analytical method developed here could be applied to phylogenetic tree reconstruction and DNA sequence alignment.
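The set relations stated in this abstract can be checked mechanically. The sketch below is ours, not the authors' code. It makes two assumptions. First, X1 and X2 are obtained by circularly permuting each trinucleotide of X0 once and twice (l1l2l3 → l2l3l1 → l3l1l2). Second, "complementary" means closed under reverse complementation. The helper names `shift`, `revcomp` and `frame_frequencies` are illustrative.

```python
# Sketch: the C3 code X0 and its permuted codes X1, X2 (helper names are illustrative).
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def shift(t: str, k: int) -> str:
    """Circular permutation: shift('l1l2l3', 1) = 'l2l3l1', shift(..., 2) = 'l3l1l2'."""
    return t[k:] + t[:k]


def revcomp(t: str) -> str:
    """Reverse complement, i.e. the trinucleotide read 5'-3' on the other strand."""
    return t.translate(COMPLEMENT)[::-1]


X1 = {shift(t, 1) for t in X0}  # permuted code associated with frame 1
X2 = {shift(t, 2) for t in X0}  # permuted code associated with frame 2


def frame_frequencies(seq: str) -> tuple:
    """Fractions of the frame-0 trinucleotides of `seq` that belong to X0, X1, X2."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c in code for c in codons) / len(codons) for code in (X0, X1, X2))


if __name__ == "__main__":
    print({revcomp(t) for t in X0} == X0)  # X0 is self-complementary
    print({revcomp(t) for t in X1} == X2)  # X1 and X2 are complementary to each other
    print(len(X0 | X1 | X2))               # 60 distinct trinucleotides
```

The three sets are pairwise disjoint. Together with AAA, CCC, GGG and TTT they cover all 64 trinucleotides. On a toy string, `frame_frequencies("GAAGTCAAG")` returns (2/3, 1/3, 0.0), because GAA and GTC are in X0 and AAG is in X1. The 48.5%/29%/22.5% figures quoted above come from real gene populations, not from this sketch.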

2.
A circular code is a set of trinucleotides that allows the reading frames in genes to be retrieved locally, i.e. anywhere in genes and in particular without start codons, and automatically with a window of a few nucleotides. In 1996, a common circular code, called X, was identified in large populations of eukaryotic and prokaryotic genes; it is therefore believed to be an ancestral structural property of genes. A new computational approach based on comparative genomics is developed here to identify essential molecular functions associated with circular codes. It combines three components:
- a quantitative and sensitive statistical method (FPTF) that identifies three permuted trinucleotide sets in the three frames of genes;
- a flower automaton algorithm that determines whether a trinucleotide set is a circular code;
- an integrated Gene Ontology and Taxonomy (iGOT) database.

The approach runs automatic circular code analyses on a large number of gene populations, each associated with a particular molecular function. It identifies 266 gene populations whose circular codes are close to X. Surprisingly, their molecular functions include 98% of those covered by the essential genes of the DEG database (Database of Essential Genes). Furthermore, three trinucleotides, GTG, AAG and GCG, occur significantly in these 266 gene populations. They replace three trinucleotides of the code X and are called "evolutionary" trinucleotides. Finally, a new method developed to analyse and quantify the stability of a set of trinucleotides demonstrates that these evolutionary trinucleotides are associated with a significant increase in the stability of the common circular code X. Among 9920 possible trinucleotide replacement sets, the stability of X rises from the 1502nd rank to the 16th rank after the three evolutionary trinucleotides are substituted.
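Local frame retrieval, the defining property of a circular code, can be illustrated with a brute-force window check. This is a minimal sketch. It is not the FPTF method or the flower automaton algorithm mentioned in the abstract. It assumes that X is the 20-trinucleotide set listed in the first abstract (X0), and `candidate_frames` is a hypothetical helper.

```python
# Sketch: which frames of a short window are consistent with the circular code X?
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}  # assumed to be the same set as X0 in the first abstract


def candidate_frames(window: str, code: set) -> list:
    """Offsets (0, 1, 2) at which every complete trinucleotide of `window` is in `code`."""
    frames = []
    for offset in range(3):
        words = [window[i:i + 3] for i in range(offset, len(window) - 2, 3)]
        if words and all(w in code for w in words):
            frames.append(offset)
    return frames


# A window holding a single trinucleotide per frame is ambiguous; a longer one is not.
print(candidate_frames("GTAC", X))          # [0, 1]
print(candidate_frames("GACGTACAGCTC", X))  # [0]
```

A window containing trinucleotides outside X in every frame yields an empty list. The circularity of X is what guarantees that a long enough window of X trinucleotides leaves exactly one candidate.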

3.
Self-complementary circular codes are involved in pairing genetic processes. A maximal \(C^3\) self-complementary circular code X of trinucleotides was identified in genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel in Life 7(20):1–16, 2017 and J Theor Biol 380:156–177, 2015; Arquès and Michel in J Theor Biol 182:45–58, 1996). In this paper, self-complementary circular codes are investigated using the graph theory approach recently formulated in Fimmel et al. (Philos Trans R Soc A 374:20150058, 2016). A directed graph \(\mathcal {G}(X)\) associated with any code X mirrors the properties of the code. We demonstrate a necessary condition, expressed in graph-theoretic terms, for the self-complementarity of an arbitrary code X. The same condition is proven to be sufficient for circular codes of large size \(\mid X \mid \ge 18\) trinucleotides, in particular for maximal circular codes (\(\mid X \mid = 20\) trinucleotides). For small codes (\(\mid X \mid \le 16\) trinucleotides), some very rare counterexamples have been constructed. Furthermore, the length and structure of the longest paths in the graphs associated with self-complementary circular codes are investigated. It is proven that the longest paths in such graphs determine the reading frame for self-complementary circular codes. Applying this result to the circular code X identified in genes, the reading frame in any arbitrary sequence of trinucleotides is retrieved after at most 15 nucleotides, i.e. 5 consecutive trinucleotides. Thus, an X motif at least 15 nucleotides long in an arbitrary sequence of trinucleotides (not necessarily all of them belonging to X) uniquely defines the reading (correct) frame. This is an important criterion for future analyses of X motifs in genes.
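The graph \(\mathcal {G}(X)\) of Fimmel et al. (2016) cited above can be built in a few lines. In that framework, each trinucleotide N1N2N3 of X contributes the edges N1 → N2N3 and N1N2 → N3, and X is circular if and only if the graph has no cycle. The sketch below uses only that criterion. The function names are our own. It does not implement the self-complementarity condition or the longest-path analysis of the abstract.

```python
# Sketch: circularity test via acyclicity of G(X), per the cited graph criterion.
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def code_graph(code):
    """Adjacency sets of G(X): N1 -> N2N3 and N1N2 -> N3 for each N1N2N3 in X."""
    edges = {}
    for t in code:
        edges.setdefault(t[0], set()).add(t[1:])
        edges.setdefault(t[:2], set()).add(t[2])
    return edges


def has_cycle(edges):
    """Iterative depth-first search; reaching a GREY vertex means a back edge, i.e. a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    for start in nodes:
        if colour.get(start, WHITE) != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(edges.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:  # all successors explored
                colour[node] = BLACK
                stack.pop()
            elif colour.get(nxt, WHITE) == GREY:
                return True
            elif colour.get(nxt, WHITE) == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(edges.get(nxt, ()))))
    return False


def is_circular(code):
    """X is circular iff G(X) is acyclic (criterion from Fimmel et al., 2016)."""
    return not has_cycle(code_graph(code))


print(is_circular(X))               # True
print(is_circular({"ACG", "GAC"}))  # False: G -> AC -> G is a cycle
```

For the code X, the only nucleotide-to-nucleotide steps are G → A, G → C, G → T, A → C, A → T and T → C, which is why the graph is acyclic.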

4.
We develop here a new class of stochastic models of gene evolution in which a random subset of the 64 possible trinucleotides mutates at each evolutionary time t according to time-dependent substitution probabilities. Therefore, at each time t, the number and the types of mutable trinucleotides are unknown, and the mutation matrix changes at each time t. The pseudochaotic model developed here generalizes the standard model, in which all trinucleotides mutate at each time t. It determines the occurrence probabilities at time t of trinucleotides that mutate pseudochaotically according to 3 time-dependent substitution parameters associated with the 3 trinucleotide sites. The main result proves that, under suitable assumptions, this pseudochaotic model converges to a uniform probability vector identical to that of the standard model. Furthermore, an application of this pseudochaotic model allows an evolutionary study of the 3 circular codes identified in both eukaryotic and prokaryotic genes. A circular code is a particular set of trinucleotides whose main property is the retrieval of the frames in genes locally, i.e. anywhere in genes and particularly without start codons, and automatically with a window of a few nucleotides. This requires a certain evolutionary time and particular time-dependent functions for the 3 substitution parameters: an exponential decrease at the 1st and 2nd trinucleotide sites and an exponential increase at the 3rd. Under these conditions, the pseudochaotic model retrieves the main statistical properties of the 3 circular codes observed in genes. Furthermore, it leads to a stronger circular code asymmetry than the standard (non-pseudochaotic) model and, therefore, to a better correlation with genes.

5.
We develop here a new class of stochastic models of gene evolution in which the mutations are chaotic, i.e. a random subset of the 64 possible trinucleotides mutates at each evolutionary time t according to some substitution probabilities. Therefore, at each time t, the number and the types of mutable trinucleotides are unknown, and the mutation matrix changes at each time t. The chaotic model developed here generalizes the standard model, in which all trinucleotides mutate at each time t. It determines the occurrence probabilities at time t of trinucleotides that mutate chaotically according to three substitution parameters associated with the three trinucleotide sites. Two theorems prove that this chaotic model has a probability vector at each time t and that it converges to a uniform probability vector identical to that of the standard model. Furthermore, four applications of this chaotic model (with a uniform random strategy for the 64 trinucleotides and with a particular strategy for the three stop codons) allow an evolutionary study of the three circular codes identified in both eukaryotic and prokaryotic genes. A circular code is a particular set of trinucleotides whose main property is the retrieval of the frames in genes locally, i.e. anywhere in genes and particularly without start codons, and automatically with a window of a few nucleotides. After a certain evolutionary time and with particular values of the three substitution parameters, the chaotic models retrieve the main statistical properties of the three circular codes observed in genes. These applications also allow an evolutionary comparison between the standard and chaotic models.
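The difference between the standard and chaotic models can be sketched with a probability vector over the 64 trinucleotides. This is our reading of the abstracts, not the authors' equations. We assume that one time step applies exactly one substitution per mutating trinucleotide. The substitution hits site 1, 2 or 3 with probability p, q or r and replaces the nucleotide by one of the three others with equal probability. In the chaotic variant, only a randomly drawn subset of trinucleotides mutates at each step. The parameter values and the `chaotic_fraction` argument are illustrative.

```python
import itertools
import random

NUCS = "ACGT"
TRINUCS = ["".join(t) for t in itertools.product(NUCS, repeat=3)]
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def step(prob, site_probs, mutating=None):
    """One time step. `site_probs` = (p, q, r) is assumed to sum to 1.

    Each mutating trinucleotide undergoes one substitution at site 1, 2 or 3
    (chosen with probability p, q, r) to one of the 3 other nucleotides.
    `mutating=None` means all 64 trinucleotides mutate (standard model).
    """
    new = dict.fromkeys(TRINUCS, 0.0)
    for t, mass in prob.items():
        if mutating is not None and t not in mutating:
            new[t] += mass  # not selected at this step: its probability mass stays put
            continue
        for site, ps in enumerate(site_probs):
            for n in NUCS:
                if n != t[site]:
                    new[t[:site] + n + t[site + 1:]] += mass * ps / 3
    return new


def evolve(site_probs, steps, chaotic_fraction=None, seed=0):
    """Start from the X0 codons mixed with probability 1/20 each and iterate `step`."""
    rng = random.Random(seed)
    prob = {t: (1 / 20 if t in X0 else 0.0) for t in TRINUCS}
    for _ in range(steps):
        mutating = None
        if chaotic_fraction is not None:
            k = round(chaotic_fraction * len(TRINUCS))
            mutating = set(rng.sample(TRINUCS, k))  # random subset mutating at this time
        prob = step(prob, site_probs, mutating)
    return prob


standard = evolve((0.1, 0.1, 0.8), steps=200)
chaotic = evolve((0.1, 0.1, 0.8), steps=200, chaotic_fraction=0.5)
print(max(abs(v - 1 / 64) for v in standard.values()))  # tiny: uniform limit
print(round(sum(chaotic.values()), 12))                 # still a probability vector
```

In this sketch the standard model converges to the uniform vector (1/64 per trinucleotide), in line with the abstract's statement. Whether the random-subset variant also converges depends on how the subsets are drawn. The authors' convergence theorem rests on assumptions this sketch does not reproduce.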

6.
Many retroviruses express gag-pol or gag-pro-pol polypeptides by coupling their translation from overlapping reading frames with -1 ribosomal frameshifts. Here, we show that the well-known ribosomal frameshift signals found in retroviral mRNA provoke Escherichia coli ribosomes to shift frame in the same manner as their eukaryotic counterparts. E. coli ribosomes respond in vivo both to the tandem slippery codons present at the retroviral frameshift site and to the 3' flanking sequence. A slight alteration of the mouse mammary tumor virus gag-pro frameshift site from A-AAA-AAC to A-AAA-AAG boosts the level of frameshifting in E. coli to over 50%. This suggests that A-AAA-AAG and its slippery relatives may be used by E. coli genes as sites of high-level ribosomal frameshifting. This conserved response to retroviral frameshift signals opens new avenues to dissect the mechanism of ribosomal frameshifting evoked by these mRNA sequences.
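Tandem slippery sites such as A-AAA-AAC fit the commonly cited X-XXY-YYZ heptamer pattern: three identical nucleotides, then three identical nucleotides, then any nucleotide. The scan below is a simplified sketch. Real slippery-site rules restrict which nucleotides X, Y and Z may be. Frameshift efficiency also depends on the 3' flanking sequence mentioned in the abstract. This sketch models neither. `find_slippery` is a hypothetical helper.

```python
import re

# X-XXY-YYZ: X and Y are each repeated three times; Z is any nucleotide.
# The lookahead lets finditer report overlapping candidates.
HEPTAMER = re.compile(r"(?=(([ACGTU])\2\2([ACGTU])\3\3[ACGTU]))")


def find_slippery(seq: str) -> list:
    """Return (0-based position, heptamer) for every X-XXY-YYZ match in `seq`."""
    return [(m.start(), m.group(1)) for m in HEPTAMER.finditer(seq.upper())]


print(find_slippery("CCAAAAAACCC"))  # [(2, 'AAAAAAC')], an A-AAA-AAC site
print(find_slippery("CCAAAAAAGCC"))  # [(2, 'AAAAAAG')], the A-AAA-AAG variant
```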

7.
Previous experiments have shown that limitation for certain aminoacyl-tRNA species results in phenotypic suppression of a subset of frameshift mutant alleles, including members in both the (+) and (-) incorrect reading frames. Here, we demonstrate that such phenotypic suppression can occur through a ribosome reading frame shift at a hungry AAG codon calling for lysyl-tRNA in short supply. Direct amino acid sequence analysis of the product and DNA sequence manipulation of the gene demonstrate that the ribosome frameshift occurs through a movement of one base to the left, so as to decode the triplet overlapping the hungry codon from the left or 5' side, followed by continued normal translation in the new, shifted reading frame.
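The frame arithmetic of the shift described above can be made explicit: one base to the left at the hungry codon, then normal translation in the new frame. This sketch only lists the triplets a ribosome would read under that assumption. It does not model tRNA availability. The function name and the example sequences are hypothetical, written in the DNA alphabet.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def decode_with_minus1_shift(seq: str, hungry_pos: int) -> list:
    """Triplets read in frame 0 up to the hungry codon, then in the -1 frame.

    `hungry_pos` is the 0-based start of the hungry codon in frame 0. After the
    shift, the ribosome reads the triplet starting one base to the left
    (overlapping the hungry codon from the 5' side) and continues in that
    frame until the first stop codon or the end of `seq`.
    """
    if hungry_pos <= 0 or hungry_pos % 3 != 0:
        raise ValueError("hungry_pos must be a positive frame-0 codon boundary")
    codons = [seq[i:i + 3] for i in range(0, hungry_pos, 3)]
    for i in range(hungry_pos - 1, len(seq) - 2, 3):
        codons.append(seq[i:i + 3])
        if seq[i:i + 3] in STOP_CODONS:
            break
    return codons


# Hypothetical message with a hungry AAG codon at position 6; the base at
# position 5 is read twice, once in GCA and once in AAA.
print(decode_with_minus1_shift("ATGGCAAAGCTTGA", 6))
# ['ATG', 'GCA', 'AAA', 'GCT', 'TGA']
```

The shift can bypass a frame-0 stop codon. In `"ATGAAGTAAGGG"`, frame 0 reads ATG, AAG, TAA. A shift at position 3 instead yields ATG, GAA, GTA, AGG. This is the kind of phenotypic suppression the abstract describes.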

8.
The Ty element of yeast is a member of a class of eukaryotic transposons that bear a striking resemblance to retroviral proviruses in their structure and expression strategies (1,2). A direct comparison can be drawn between two fusion proteins (3,4):
- the protein encoded by Ty, produced by a frameshift event that fuses the two out-of-phase open reading frames TYA and TYB;
- Pr180gag-pol in a retrovirus such as Rous Sarcoma Virus (RSV).

We present data showing definitively that RNA splicing is not responsible for the frameshift in Ty. By in vitro mutation of a class I element, Ty1-15, we demonstrate that 31 nucleotides within the region where the TYA and TYB open reading frames overlap direct the frameshift. Within this short sequence there is a region of homology with a class II element, which we show is also able to frameshift.

9.
We develop here a new class of gene evolution models in which nucleotide mutations are time dependent. These models allow the study of nonlinear gene evolution by accelerating or decelerating the mutation rates at different evolutionary times. They generalize previous models based on constant mutation rates. The stochastic model developed in this class determines, at time t, the occurrence probabilities of trinucleotides mutating according to 3 time-dependent substitution parameters associated with the 3 trinucleotide sites. It therefore allows simulation of the evolution of the circular code recently observed in genes. Varying the class of function for the substitution parameters gives 12 models. One of them retrieves, after mutation, the statistical properties of the observed circular code in the 3 frames of extant genes. In this model, the mutation rate at the 3rd trinucleotide site increases during gene evolution while the mutation rates at the 1st and 2nd sites decrease, in agreement with the degeneracy of the genetic code. This approach can easily be generalized to study the evolution of motifs of various lengths, e.g. dicodons, with time-dependent mutations.
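Time-dependent substitution parameters amount to applying a different mutation matrix at each step. For illustration only, the sketch below assumes that each site mutates independently at each step, to one of the three other nucleotides. It uses a hypothetical schedule: exponential decay at sites 1 and 2 and capped exponential growth at site 3. The constants are not taken from the paper. The sketch tracks only the frame-0 probability mass on X0, starting from the 20 codons of X0 at 1/20 each.

```python
import itertools
import math

NUCS = "ACGT"
TRINUCS = ["".join(t) for t in itertools.product(NUCS, repeat=3)]
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def mutate_site(prob, site, p):
    """Site `site` keeps its nucleotide with probability 1 - p, otherwise switches
    to one of the 3 other nucleotides with equal probability (p = 0.75 fully randomizes it)."""
    new = dict.fromkeys(TRINUCS, 0.0)
    for t, mass in prob.items():
        new[t] += mass * (1 - p)
        for n in NUCS:
            if n != t[site]:
                new[t[:site] + n + t[site + 1:]] += mass * p / 3
    return new


def schedule(time):
    """Hypothetical parameters: exponential decay at sites 1 and 2,
    exponential growth at site 3 (capped at 0.75)."""
    decay = 0.05 * math.exp(-0.1 * time)
    growth = min(0.75, 0.02 * math.exp(0.1 * time))
    return (decay, decay, growth)


def x0_fraction_over_time(steps):
    """Probability mass on X0 (frame 0) after each step."""
    prob = {t: (1 / 20 if t in X0 else 0.0) for t in TRINUCS}
    history = [sum(prob[t] for t in X0)]
    for time in range(steps):
        for site, p in enumerate(schedule(time)):
            prob = mutate_site(prob, site, p)
        history.append(sum(prob[t] for t in X0))
    return history


print([round(v, 3) for v in x0_fraction_over_time(30)[::10]])
```

Swapping other functional forms into `schedule` mimics the comparison among model classes described in the abstract. Which form matches real genes is the paper's result, not something this sketch establishes.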

10.
The Escherichia coli dnaX gene encodes both the tau and gamma subunits of DNA polymerase III holoenzyme in one reading frame. The 71.1 kDa tau and the shorter gamma share N-terminal sequences. A potential ribosomal frameshift signal is located at codons 428-430. Mutagenizing it without changing the amino acid sequence of the tau product eliminated detectable synthesis of the gamma subunit. This suggests that the reading frame is shifted at that sequence and that gamma is terminated by a nonsense codon in the -1 frame, 3 nucleotides downstream of the signal. This appears to be the first known case of a frameshift used, together with a termination codon in the -1 frame, to terminate a peptide within a reading frame. A tau'-'LacZ fusion protein was cleaved in vitro at a dibasic peptide (Lys-Lys) at codons 498-499 (1). Mutagenesis of this site had no effect on gamma formation in vivo, suggesting that the cleavage observed in vitro is not the mechanism of gamma formation in vivo.

11.
Ribosomes can be programmed to shift from one reading frame to another during translation. Hepatitis C virus (HCV) uses such a mechanism to produce F protein from the -2/+1 reading frame. We now report that the HCV frameshift signal can mediate the synthesis of the core protein of the zero frame, the F protein of the -2/+1 frame, and a 1.5-kDa protein of the -1/+2 frame. This triple decoding function does not require sequences flanking the frameshift signal and is apparently independent of membranes and the synthesis of the HCV polyprotein. Two consensus -1 frameshift sequences in the HCV type 1 frameshift signal facilitate ribosomal frameshifts into both overlapping reading frames. A sequence which is located immediately downstream of the frameshift signal and has the potential to form a double stem-loop structure can significantly enhance translational frameshifting in the presence of the peptidyl-transferase inhibitor puromycin. Based on these results, a model is proposed to explain the triple decoding activities of the HCV ribosomal frameshift signal.

12.
In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has attracted great interest and undergone rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes, with particular emphasis on the problem of retrieving and maintaining the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach to address several questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C3 maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability, although there is great variability among sequences. Second, we focus on this code and explore the role played by base proportions by means of a hierarchy of permutation tests. The results show the existence of a kind of optimization mechanism whereby coding sequences are tailored so as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes to reading frame synchronization.

13.
A mutational analysis of the eukaryotic elongation factor EF-1 alpha indicates that this protein functions to limit the frequency of errors during genetic code translation. We found that both amino acid misincorporation and reading frame errors are controlled by EF-1 alpha. In order to examine the function of this protein, the TEF2 gene, which encodes EF-1 alpha in Saccharomyces cerevisiae, was mutagenized in vitro with hydroxylamine. Sixteen independent TEF2 alleles were isolated by their ability to suppress frameshift mutations. DNA sequence analysis identified eight different sites in the EF-1 alpha protein that elevate the frequency of mistranslation when mutated. These sites are located in two different regions of the protein. Amino acid substitutions located in or near the GTP-binding and hydrolysis domain of the protein cause suppression of frameshift and nonsense mutations. These mutations may effect mistranslation by altering the binding or hydrolysis of GTP. Amino acid substitutions located adjacent to a putative aminoacyl-tRNA binding region also suppress frameshift and nonsense mutations. These mutations may alter the binding of aminoacyl-tRNA by EF-1 alpha. The identification of frameshift and nonsense suppressor mutations in EF-1 alpha indicates a role for this protein in limiting amino acid misincorporation and reading frame errors. We suggest that these types of errors are controlled by a common mechanism or closely related mechanisms.

14.
Recently, a new genetic process termed RNA editing has been identified, involving insertions and deletions of nucleotides in particular RNA molecules. On the other hand, genes show a few non-random statistical properties, among them:
- the periodicity modulo 3 (P3) associated with an open reading frame;
- the periodicity modulo 2 (P2) associated with alternating purine/pyrimidine stretches;
- the preferential occurrence of YRY(N)6YRY, which represents a "code" of the DNA helix pitch (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine, N = R or Y).

The problem investigated here is whether a process of the RNA editing type can lead to the non-random statistical properties commonly observed in genes. This paper shows in particular that:
- insertions and deletions of mononucleotides in the initial sequence [YRY(N)3]* [series of YRY(N)3] can lead to the periodicity modulo 2 (P2);
- insertions and deletions of trinucleotides in the initial sequence [YRY(N)6]* [series of YRY(N)6] can lead to the periodicity modulo 3 (P3) and to the preferential occurrence of YRY(N)6YRY.

Furthermore, these two processes correlate strongly with reality: the mononucleotide insertion/deletion process with the 5' regions of eukaryotic genes, and the trinucleotide insertion/deletion process with eukaryotic protein-coding genes.
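Two of the statistics named in this abstract are easy to measure once a sequence is mapped to the purine/pyrimidine alphabet. The sketch counts overlapping YRY(N)6YRY occurrences. It also reports the purine fraction at each position modulo 3, used here as a simple indicator of P3. It does not simulate the insertion/deletion process itself, and the function names are hypothetical.

```python
RY = str.maketrans("AGCTU", "RRYYY")  # purines -> R, pyrimidines -> Y


def to_ry(seq: str) -> str:
    return seq.upper().translate(RY)


def count_yry_n6_yry(seq: str) -> int:
    """Overlapping occurrences of YRY followed, after 6 arbitrary nucleotides, by another YRY."""
    s = to_ry(seq)
    return sum(
        1 for i in range(len(s) - 11)
        if s[i:i + 3] == "YRY" and s[i + 9:i + 12] == "YRY"
    )


def purine_fraction_by_phase(seq: str) -> tuple:
    """Fraction of R at positions 0, 1, 2 modulo 3 (uneven values hint at P3)."""
    s = to_ry(seq)
    return tuple(
        (s[k::3].count("R") / len(s[k::3])) if s[k::3] else 0.0 for k in range(3)
    )


print(count_yry_n6_yry("CATGGGAAACAT"))     # 1
print(purine_fraction_by_phase("CATCATCAT"))  # (0.0, 1.0, 0.0)
```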

15.
The short-chain oxidoreductase (SCOR) family of enzymes includes over 6000 members, extending from bacteria and archaea to humans. Nucleic acid sequence analysis reveals that significant numbers of these genes are remarkably free of stop codons in reading frames other than the coding frame, including those on the antisense strand. The genes in this subset also use almost entirely the GC-rich half of the 64 codons. An analysis of a million hypothetical genes with random nucleotide composition was used as a baseline. Against it, the percentage of SCOR genes having multiple open reading frames exceeds random expectation by a factor of up to 1 × 10^6. Nevertheless, screening the content of the SWISS-PROT TrEMBL database reveals that 15% of all genes contain multiple open reading frames. The SCOR genes having multiple open reading frames and a GC-rich coding bias exhibit a similar GC bias in the nucleotide triplet composition of their DNA. This bias is not correlated with the GC content of the species in which the SCOR genes are found. One possible explanation for the conservation of multiple open reading frames and the extreme compositional bias in this family of Rossmann folds is an early GC-rich origin. On this view, the primordial member of the family was encoded early using only very stable GC-rich DNA. Evolution then proceeded with extremely limited introduction of codons having two or more adenine or thymine nucleotides. These and other data suggest that the SCOR family of enzymes may even have diverged from a common ancestor before most of the AT-rich half of the genetic code was fully defined.
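Whether a gene is free of stop codons outside its coding frame can be checked by scanning all six frames. This is a minimal sketch. Frames are labelled by strand and offset, with `+0` being the coding frame when the sequence starts at the start codon. That labelling and the function name are our own, and the sketch does not reproduce the paper's statistics.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stop_counts_six_frames(seq: str) -> dict:
    """Count stop codons in the 3 frames of each strand.

    Keys are '+0', '+1', '+2' (given strand, offset 0-2) and '-0', '-1', '-2'
    (reverse complement, i.e. the antisense strand read 5'-3').
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    counts = {}
    for sign, s in strands:
        for offset in range(3):
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            counts[f"{sign}{offset}"] = sum(c in STOPS for c in codons)
    return counts


print(stop_counts_six_frames("ATGGCCGCCGCCTAA"))
# {'+0': 1, '+1': 0, '+2': 0, '-0': 0, '-1': 1, '-2': 0}
```

In this toy sequence the only stop in the coding frame is the terminal TAA. The antisense strand carries a TAG in frame `-1`.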

16.
Antisense-induced ribosomal frameshifting
Programmed ribosomal frameshifting provides a mechanism to decode information located in two overlapping reading frames by diverting a proportion of translating ribosomes into a second open reading frame (ORF). The result is the production of two proteins: the product of standard translation from ORF1 and an ORF1–ORF2 fusion protein. Such programmed frameshifting is commonly utilized as a gene expression mechanism in viruses that infect eukaryotic cells and in a subset of cellular genes. RNA secondary structures, consisting of pseudoknots or stem–loops, located downstream of the shift site often act as cis-stimulators of frameshifting. Here, we demonstrate for the first time that antisense oligonucleotides can functionally mimic these RNA structures to induce +1 ribosomal frameshifting when annealed downstream of the frameshift site, UCC UGA. Antisense-induced shifting of the ribosome into the +1 reading frame is highly efficient in both rabbit reticulocyte lysate translation reactions and in cultured mammalian cells. The efficiency of antisense-induced frameshifting at this site is responsive to the sequence context 5′ of the shift site and to polyamine levels.

17.
FEBS Letters, 1985, 193(2): 164-168
An open reading frame (ORF) preceding the cytochrome oxidase subunit II (CO II) gene in Oenothera mitochondria has four nucleotides in common with this gene. The last two nucleotides of the CO II initiation codon ATG are the first two nucleotides of the TGA termination codon of the upstream ORF. Both reading frames are cotranscribed in a bicistronic mRNA species 1250 nucleotides long in Oenothera. The open reading frame encodes a protein of 58 amino acids with structural homology to the ATPase subunit 8 genes of fungal and mammalian mitochondria. Given the large size of higher plant mitochondrial genomes, there appears to be no economic reason for using coding space so efficiently through overlapping genes.

18.
Human T-cell leukemia virus type II (HTLV-II), isolated from a T-cell variant of hairy cell leukemia, contains gag, pol and env genes as well as a fourth gene termed X, which can encode three major open reading frames, Xa, Xb and Xc. Proteins of 26 kDa (p26Xb) and 24 kDa (p24Xb) encoded by the Xb open reading frame were identified with antisera. These antisera were raised against synthetic peptides matching the N- and C-terminal amino acid sequences deduced from the Xb open reading frame. More than half of the Xb products were found in the nuclear fraction of HTLV-II-infected cells.

19.
D. Arquès and C. Michel have discovered, by statistical methods, a set of 20 trinucleotides with remarkable properties. It allows the reading frame 0 to be retrieved in protein-coding gene sequences and may play a role in molecular evolution theory. We comment on this result and show, with an example, that sets with similar properties can be found using theoretical arguments.

20.
Coding capacity of complementary DNA strands.
A Fortran computer algorithm has been used to analyze the nucleotide sequence of several structural genes. The analysis performed on both coding and complementary DNA strands shows that whereas open reading frames shorter than 100 codons are randomly distributed on both DNA strands, open reading frames longer than 100 codons ("virtual genes") are significantly more frequent on the complementary DNA strand than on the coding one. These "virtual genes" were further investigated by looking at intron sequences, splicing points, signal sequences and by analyzing gene mutations. On the basis of this analysis coding and complementary DNA strands of several eukaryotic structural genes cannot be distinguished. In particular we suggest that the complementary DNA strand of the human epsilon-globin gene might indeed code for a protein.
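The strand comparison can be sketched by counting long stop-free stretches in the three frames of each strand. We assume an "open reading frame" here means a stop-to-stop stretch, which may differ from the definition used in the original Fortran program. The 100-codon threshold comes from the abstract, while the function names are hypothetical. A stretch still open at the end of the sequence is counted as well.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def open_stretches(s: str, offset: int) -> list:
    """Lengths, in codons, of the maximal stop-free runs in frame `offset` of `s`."""
    lengths, run = [], 0
    for i in range(offset, len(s) - 2, 3):
        if s[i:i + 3] in STOPS:
            lengths.append(run)  # a stop codon closes the current run
            run = 0
        else:
            run += 1
    lengths.append(run)  # run still open at the end of the sequence
    return lengths


def count_long_orfs(seq: str, min_codons: int = 100) -> dict:
    """Number of stop-free runs of at least `min_codons` codons on each strand."""
    seq = seq.upper()
    strands = {"coding": seq, "complementary": seq.translate(COMPLEMENT)[::-1]}
    return {
        name: sum(n >= min_codons for off in range(3) for n in open_stretches(s, off))
        for name, s in strands.items()
    }


print(open_stretches("GCCGCCTAAGCC", 0))                  # [2, 1]
print(count_long_orfs("GCC" * 5 + "TAA", min_codons=5))  # {'coding': 3, 'complementary': 2}
```

On real genes, calling `count_long_orfs(gene)` with the default threshold of 100 codons gives the per-strand counts that the abstract compares.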

