Similar articles
20 similar articles found (search time: 15 ms)
1.
We classified 472,288 regions of triplet periodicity found in 578,868 genes from release 29 of the KEGG databank. A new concept of a triplet periodicity class and a measure of similarity between such classes are introduced. In total, 2520 classes were created, containing 94% of the triplet periodicity regions found. For 92% of the triplet periodicity regions contained in classes, an identical linkage of triplet periodicity to the reading frame is observed. For the remaining triplet periodicity cases, a shift was observed between the reading frame of the gene and the reading frame common to the majority of genes in the same triplet periodicity class. These periodicity regions were translated into hypothetical amino acid sequences according to the reading frame defined by the triplet periodicity class. BLAST searches showed that 2660 of these hypothetical amino acid sequences have statistically significant similarity to proteins in the UniProt databank. We suppose that the 8% of triplet periodicity regions in classes that show such a shift arose through mutations that shifted the reading frame. The classes of triplet periodicity created here can be used to identify the coding regions of genes and to search for mutations arising from reading frame shifts.

2.
Frenkel FE, Korotkov EV. Gene. 2008;421(1-2):52-60
We introduce a new concept of a triplet periodicity class (TPC) and a measure of similarity between such classes. We classified 472,288 triplet periodicity (TP) regions found in 578,868 genes from the 29th release of the KEGG databank. In total, 2520 classes were obtained; they contain 94% of the 472,288 TP cases found. For 92% of the TP regions contained in classes, the same linkage of TP to the open reading frame (ORF) is observed. For the remaining 8% of TP cases, we found a shift between the ORF of the gene and the ORF common to the majority of genes in the TPC. For these 8% of periodic regions, we constructed the hypothetical amino acid sequences corresponding to the ORF defined by the TPC. BLAST searches showed that 2679 of these hypothetical amino acid sequences have statistically significant similarity to proteins in the UniProt databank. We suppose that these 8% of TP regions in classes carry a mutation originating from an ORF shift. The TPCs obtained can be used to identify the coding regions of genes and to search for mutations arising from ORF shifts.
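To make the idea of a triplet periodicity profile and its linkage to a reading frame concrete, here is a minimal Python sketch. It assumes that a TP region can be summarized as a 4×3 matrix of nucleotide counts by codon position (i mod 3) and uses Pearson correlation as a stand-in similarity measure; neither is claimed to be the measure introduced by Frenkel and Korotkov, and all function names are hypothetical. Comparing the three cyclic rotations of a region's matrix with a class profile indicates which frame offset best matches the class.

```python
BASES = "ACGT"


def tp_matrix(seq):
    """4x3 count matrix: m[b][k] = occurrences of base b at positions i with i % 3 == k."""
    m = [[0, 0, 0] for _ in BASES]
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            m[BASES.index(base)][i % 3] += 1
    return m


def rotate(m, s):
    """Cyclically rotate the three codon-position columns by s positions."""
    return [[row[(k + s) % 3] for k in range(3)] for row in m]


def similarity(a, b):
    """Pearson correlation of two flattened 4x3 matrices (an assumed stand-in measure)."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def best_frame_offset(region, class_profile):
    """Rotation (0, 1 or 2) under which the region's matrix best matches the class profile."""
    m = tp_matrix(region)
    return max(range(3), key=lambda s: similarity(rotate(m, s), class_profile))


# Toy class profile built from a perfectly periodic "GCT" repeat.
profile = tp_matrix("GCT" * 40)
print(best_frame_offset("GCT" * 40, profile))        # 0: in frame with the class
print(best_frame_offset("A" + "GCT" * 40, profile))  # 1: one inserted nucleotide shifts the frame
```

A non-zero offset plays the role of the shift between a gene's ORF and the ORF common to its class; real TP regions are far noisier than this toy repeat.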

3.
We introduce a novel approach for detecting possible mutations that lead to a reading frame (RF) shift in a gene. Deletions and insertions in DNA coding regions are significant events for genes, because an RF shift modifies an extensive region of the amino acid sequence encoded by the gene. The suggested method is based on the phenomenon of triplet periodicity (TP) in the coding regions of genes and its relative resistance to substitutions in the DNA sequence. We attempted to extend 326,933 regions of continuous TP found in genes from the KEGG databank by allowing for possible insertions and deletions. In total, we found 824 genes in which such an extension was possible and statistically significant. We then generated amino acid sequences according to the active (KEGG's) and the hypothetical ancient RFs in order to find confirmation of a shift at the protein level. As a result, 64 sequences have protein similarities only for the ancient RF, 176 only for the active RF, 3 for both, and 581 have no protein similarity at all. Our aim was to establish a lower bound for the number of genes in which a shift between the RF and TP is possible. Further ways to increase the number of revealed RF shifts are discussed.
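The comparison between an active reading frame and a hypothetical ancient one can be illustrated with a short Python sketch. It assumes the standard genetic code (NCBI table 1) and that the position and length of a candidate insertion are already known, which is the hard part the method itself addresses; `translate` and `ancient_frame_protein` are hypothetical helpers, not the authors' software.

```python
# Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna, frame=0, to_stop=True):
    """Translate dna from offset `frame`; optionally stop at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for p in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[p:p + 3], "X")  # X for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def ancient_frame_protein(dna, ins_pos, ins_len):
    """Translate after removing a hypothetical insertion of ins_len nt at ins_pos."""
    return translate(dna[:ins_pos] + dna[ins_pos + ins_len:])


ancestral = "ATGAAATGGCATGAATTTGATTGTTATGGT"   # toy gene encoding MKWHEFDCYG
active = ancestral[:15] + "C" + ancestral[15:]  # same gene with a 1-nt insertion
print(translate(active))                      # MKWHEL: active frame hits a premature stop
print(ancient_frame_protein(active, 15, 1))   # MKWHEFDCYG: ancient frame restored
```

In the toy example the insertion makes the active frame diverge after five codons and soon reach a stop codon, while the ancient frame recovers the full ancestral protein; a real analysis would then search both translations against protein databases.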

4.
We have isolated and sequenced two full-length cDNA clones encoding actin from carrot. The two carrot clones are almost identical at the nucleotide level, and are quite homologous to each other and to other plant actins at the amino acid level. In those regions where amino acid variation exists between the two genes from carrot, the differences have arisen from very simple changes at the nucleotide level. The most common changes are nucleotide insertions coupled to deletions of different nucleotides nearby in the DNA sequence, resulting in the restoration of the proper reading frame for the protein; thus, these changes can be viewed as multiple or coupled frameshift mutations. There are almost no base substitutions between the two carrot genes. In contrast, when the carrot actin nucleotide sequences are compared to those of a soybean actin gene or a maize actin gene, many base substitutions are observed (ca. 21.8% and 23.5%), more than half of which are third base changes that do not alter the protein sequence. At the amino acid level, both carrot genes show greater similarity to maize actin than they do to soybean actin, thus reinforcing the idea that plant actin genes diverged from a single common ancestral actin gene prior to the divergence of monocots and dicots.

5.
6.
The cDNA sequence coding for the coat protein of cucumber mosaic virus (Japanese Y strain) was cloned, and its nucleotide sequence was determined. The sequence contains an open reading frame that encodes the coat protein of 218 amino acids. The nucleotide and deduced amino acid sequences of the coat protein of this strain were compared with those of the Q strain; the homologies of the sequences were 78% and 81%, respectively. Further study of the sequences gave insight into the genome organization and the molecular features of the coat protein. The coding region can be divided into three characteristic regions. The N-terminal region has conserved features in its positively charged structure, hydropathy pattern and predicted secondary structure, although the amino acid sequence varies, mainly because of frameshift mutations. It is noteworthy that the positions of arginine residues in this region are highly conserved. Both the nucleotide and amino acid sequences of the central region are well conserved. The amino acid sequence of the C-terminal region is not conserved because of frameshift mutations; however, the total number of amino acids is conserved. The nucleotide sequence of the 3'-noncoding region is divergent, but it could form a tRNA-like structure similar to those reported for other viruses. Detailed investigation suggests that the Y and Q strains are evolutionarily distant.

7.
Pel HJ, Rep M, Grivell LA. Nucleic Acids Research. 1992;20(17):4423-4428
We have recently reported the cloning and sequencing of the gene for the mitochondrial release factor mRF-1. mRF-1 displays high sequence similarity to the bacterial release factors RF-1 and RF-2. A database search for proteins resembling these three factors revealed high similarities to two amino acid sequences deduced from unassigned genomic reading frames in Escherichia coli and Bacillus subtilis. The amino acid sequence derived from the Bacillus reading frame is 47% identical to E. coli and Salmonella typhimurium RF-2, strongly suggesting that it represents B. subtilis RF-2. Our comparison suggests that the expression of the B. subtilis gene is, like that of the E. coli and S. typhimurium RF-2 genes, autoregulated by a stop-codon-dependent +1 frameshift. A comparison of prokaryotic and mitochondrial release factor sequences, including the putative B. subtilis RF-2, leads us to propose a five-domain model for release factor structure. Possible functions of the various domains are discussed.

8.
The DNA sequences of the entire coding regions of the A- and C-type variable surface protein genes from Paramecium tetraurelia, stock 51, have been determined. The 8151-nucleotide open reading frame of the A gene contains several tandem repeats of 210 nucleotides within the central portion of the molecule as well as a periodic structure defined by cysteine residues. The 6699-nucleotide open reading frame of the C gene does not contain any identifiable tandem repeats or internal similarity but maintains a periodicity based on the cysteine residue spacing. The deduced amino acid sequences encoded by the two genes are most similar within the 600 amino-terminal and 600 carboxyl-terminal amino acid residues; the central portions show only limited sequence similarity. We conclude that internal repeats are not a conserved feature of variable surface proteins in Paramecium and discuss the possible importance of the regular pattern of cysteine residues.
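A quick arithmetic check, added here for illustration, shows that both reported open reading frames are whole numbers of codons and that each 210-nucleotide tandem repeat in the A gene spans exactly 70 codons, so gaining or losing whole repeats would not by itself shift the reading frame:

```latex
\begin{aligned}
8151~\text{nt} / 3 &= 2717~\text{codons} \\
6699~\text{nt} / 3 &= 2233~\text{codons} \\
210~\text{nt} / 3 &= 70~\text{codons per repeat}
\end{aligned}
```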

9.
The major structural components of the P2 contractile tail are encoded in the FETUD tail gene operon. The sequences of genes F(I) and F(II), encoding the major tail sheath and tail tube proteins, have been reported previously (L. M. Temple, S. L. Forsburg, R. Calendar, and G. E. Christie, Virology 181:353-358, 1991). Sequence analysis of the remainder of this operon and the locations of amber mutations Eam30, Tam5, Tam64, Tam215, Uam25, Uam77, Uam92, and Dam6 and missense mutation Ets55 identified the coding regions for genes E, T, U, and D, completing the sequence determination of the P2 genome. Inspection of the DNA sequence revealed a new open reading frame overlapping the end of the essential tail gene E. Lack of an apparent translation initiation site and identification of a putative sequence for a programmed translational frameshift within the E gene suggested that this new reading frame (E') might be translated as an extension of gene E, following a -1 translational frameshift. Complementation analysis demonstrated that E' was essential for P2 lytic growth. Analysis of fusion polypeptides verified that this reading frame was translated as a -1 frameshift extension of gpE, with a frequency of approximately 10%. The arrangement of these two genes within the tail gene cluster of phage P2 and their coupling via a translational frameshift appears to be conserved among P2-related phages. This arrangement shows a striking parallel to the organization in the tail gene cluster of phage lambda, despite a lack of amino acid sequence similarity between the tail gene products of these phage families.

10.
A mutational analysis of the eukaryotic elongation factor EF-1 alpha indicates that this protein functions to limit the frequency of errors during genetic code translation. We found that both amino acid misincorporation and reading frame errors are controlled by EF-1 alpha. In order to examine the function of this protein, the TEF2 gene, which encodes EF-1 alpha in Saccharomyces cerevisiae, was mutagenized in vitro with hydroxylamine. Sixteen independent TEF2 alleles were isolated by their ability to suppress frameshift mutations. DNA sequence analysis identified eight different sites in the EF-1 alpha protein that elevate the frequency of mistranslation when mutated. These sites are located in two different regions of the protein. Amino acid substitutions located in or near the GTP-binding and hydrolysis domain of the protein cause suppression of frameshift and nonsense mutations. These mutations may effect mistranslation by altering the binding or hydrolysis of GTP. Amino acid substitutions located adjacent to a putative aminoacyl-tRNA binding region also suppress frameshift and nonsense mutations. These mutations may alter the binding of aminoacyl-tRNA by EF-1 alpha. The identification of frameshift and nonsense suppressor mutations in EF-1 alpha indicates a role for this protein in limiting amino acid misincorporation and reading frame errors. We suggest that these types of errors are controlled by a common mechanism or closely related mechanisms.

11.
Authentic cDNAs encoding the activator protein for acid beta-glucosidase (EC 3.2.1.45), co-beta-glucosidase, were cloned from the pCD and lambda gt11 human cDNA libraries. Initial screening with oligonucleotide mixtures encoding amino acid sequences of co-beta-glucosidase identified partial cDNAs which were used to obtain a potentially full-length cDNA from the lambda gt11 library. This clone (2767 bp), EGTISI, contained 5' (38 bp) and 3' (1157 bp) noncoding sequences, a translation initiation site, and an open reading frame encoding 524 amino acids which included a typical hydrophobic signal sequence (16 amino acids). Computer analyses identified three regions of high similarity to co-beta-glucosidase encoded by tandem sequences in EGTISI. Searches revealed that two of these regions encoded peptides of known function; SAP1 (sphingolipid activator protein 1) and protein C (a new sphingolipid activator protein) were encoded by EGTISI sequences 5' and 3', respectively, to those for co-beta-glucosidase. The third region of similarity, encoding a theoretical peptide (undefined function), was located most 5' in the cDNA. EGTISI and its encoded polypeptide had high similarity (77% nucleotide identity and about 80% amino acid similarity) to a rat Sertoli cell cDNA and its encoded sulfated glycoprotein-1. These results indicate that a single highly conserved gene encodes the precursor for four potential sphingolipid activator proteins in rat and man.
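The reported lengths of the EGTISI clone are mutually consistent, as a short check shows. The check assumes, since the abstract does not say so, that the stop codon is counted within the 3' noncoding sequence rather than within the open reading frame:

```latex
\begin{aligned}
2767 - 38 - 1157 &= 1572~\text{bp} \\
1572~\text{bp} / 3 &= 524~\text{codons} = 524~\text{amino acids}
\end{aligned}
```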

12.
Missense and nonsense suppressors can correct frameshift mutations
Missense and nonsense suppressor tRNAs, selected for their ability to read a new triplet codon, were observed to suppress one or more frameshift mutations in trpA of Escherichia coli. Two of the suppressible frameshift mutants, trpA8 and trpA46AspPR3, were cloned, sequenced, and found to be of the +1 type, resulting from the insertion of four nucleotides and one nucleotide, respectively. Twenty-two suppressor tRNAs were examined, 20 derived from one of the 3 glycine isoacceptor species, one from lysT, and one from trpT. The sequences of all but four of the mutant tRNAs are known, and two of those four were converted to suppressor tRNAs that were subsequently sequenced. Consideration of the coding specificities and anticodon sequences of the suppressor tRNAs does not suggest a unitary mechanism of frameshift suppression. Rather, the results indicate that different suppressors may shift frame according to different mechanisms. Examination of the suppression windows of the suppressible frameshift mutations indicates that some of the suppressors may work at cognate codons, either in the 0 frame or in the +1 frame, and others may act at noncognate codons (in either frame) by some as-yet-unspecified mechanism. Whatever the mechanisms, it is clear that some +1 frameshifting can occur at non-monotonous sequences. A striking example of a frameshifting missense suppressor is a mutant lysine tRNA that differs from wild-type lysine tRNA by only a single base in the amino acid acceptor stem, a C to U70 transition that results in a G.U base pair. It is suggested that when this mutant lysine tRNA reads its cognate codon, AAA, the presence of the G.U base pair sometimes leads either to a conformational change in the tRNA or to an altered interaction with some component of the translation machinery involved in translocation, resulting in a shift of reading frame. 
In general, the results indicate that translocation is not simply a function of anticodon loop size, that different frameshifting mechanisms may operate with different tRNAs, and that conformational features, some far removed from the anticodon region, are involved in maintaining fidelity in translocation.

13.
14.
Previous experiments have shown that limitation for certain aminoacyl-tRNA species results in phenotypic suppression of a subset of frameshift mutant alleles, including members in both the (+) and (-) incorrect reading frames. Here, we demonstrate that such phenotypic suppression can occur through a ribosome reading frame shift at a hungry AAG codon calling for lysyl-tRNA in short supply. Direct amino acid sequence analysis of the product and DNA sequence manipulation of the gene demonstrate that the ribosome frameshift occurs through a movement of one base to the left, so as to decode the triplet overlapping the hungry codon from the left or 5' side, followed by continued normal translation in the new, shifted reading frame.

15.
External suppressors, sufS, of a -1 frameshift mutant cause ribosomes to shift into the -1 frame when reading the sequence CAG GGA GUG. The resulting product is not Gln-Gly-Val but Gln-Gly-Ser, with Ser encoded by AGU, the triplet formed by the last base of GGA and the first two bases of GUG. The alleles investigated are approximately 2% efficient in causing frameshifting. Two other suppressors of the same -1 frameshift mutant, hopR and hopE, cause some ribosomes reading the sequence GUG UG to decode a single amino acid, Val, from the five nucleotides. The possibility is considered that peptidyl-tRNA(Val) dissociates from the mRNA but re-pairs in a triplet manner after the mRNA slips forward by two bases.
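The decoding outcome at CAG GGA GUG can be reproduced with a small Python sketch that translates codon by codon and, after a chosen codon, moves the reading position by -1 or +1 nucleotides. The function `translate_with_shift` and its parameters are hypothetical illustrations using the standard genetic code (NCBI table 1); the sketch models only the product of the shift, not its mechanism or its roughly 2% efficiency.

```python
# Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate_with_shift(mrna, shift_after, shift):
    """Translate from the first nucleotide; after codon number `shift_after` (1-based),
    move the reading position by `shift` nucleotides and continue in the new frame."""
    seq = mrna.upper().replace("U", "T")
    protein, pos, codons_read = [], 0, 0
    while pos + 3 <= len(seq):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
        pos += 3
        codons_read += 1
        if codons_read == shift_after:
            pos += shift
    return "".join(protein)


print(translate_with_shift("CAGGGAGUG", shift_after=2, shift=0))   # QGV: no frameshift
print(translate_with_shift("CAGGGAGUG", shift_after=2, shift=-1))  # QGS: -1 shift after GGA
```

With `shift=-1` the third triplet read is AGU (Ser), matching the Gln-Gly-Ser product; passing `shift=1` models a +1 frameshift in the same way.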

16.
Insertions and deletions of lengths not divisible by 3 in protein-coding sequences cause frameshifts that usually induce premature stop codons and may carry a high fitness cost. However, this cost can be partially offset by a second compensatory indel restoring the reading frame. The role of such pairs of compensatory frameshifting mutations (pCFMs) in evolution has not been studied systematically. Here, we use whole-genome alignments of protein-coding genes of 100 vertebrate species, and of 122 insect species, studying the prevalence of pCFMs in their divergence. We detect a total of 624 candidate pCFM genes; six of them pass stringent quality filtering, including three human genes: RAB36, ARHGAP6, and NCR3LG1. In some instances, amino acid substitutions closely predating or following pCFMs restored the biochemical similarity of the frameshifted segment to the ancestral amino acid sequence, possibly reducing or negating the fitness cost of the pCFM. Typically, however, the biochemical similarity of the frameshifted sequence to the ancestral one was not higher than the similarity of a random sequence of a protein-coding gene to its frameshifted version, indicating that pCFMs can uncover radically novel regions of protein space. In total, pCFMs represent an appreciable and previously overlooked source of novel variation in amino acid sequences.
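The frame bookkeeping behind a pair of compensatory frameshifting mutations is modular arithmetic: an indel of length L shifts the downstream frame by L mod 3, and a second indel compensates when the two lengths sum to a multiple of 3. The Python sketch below encodes insertions as positive and deletions as negative lengths; the function names are hypothetical, and the sketch ignores everything the study's alignment and quality-filtering steps handle.

```python
def frame_offset(indel_lengths):
    """Net downstream reading-frame offset (0, 1 or 2) after a series of indels.

    Insertions are positive lengths, deletions negative. Python's % returns a
    value in 0..2 for a positive modulus, so a 1-nt deletion (-1) maps to 2.
    """
    return sum(indel_lengths) % 3


def is_compensatory_pair(first, second):
    """True if each indel alone shifts the frame but together they restore it."""
    return first % 3 != 0 and second % 3 != 0 and (first + second) % 3 == 0


print(frame_offset([1]), frame_offset([4]), frame_offset([-1]))  # 1 1 2
print(is_compensatory_pair(1, -1), is_compensatory_pair(2, -5))   # True True
print(is_compensatory_pair(3, -1))                                # False
```

The `[4]` case corresponds to an insertion of four nucleotides, which, like a single-nucleotide insertion, produces a +1 frameshift.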

17.
The complete pullulanase gene (amyB) from Thermoanaerobacterium thermosulfurigenes EM1 was cloned in Escherichia coli, and the nucleotide sequence was determined. The reading frame of amyB consisted of 5,586 bp encoding an exceptionally large enzyme of 205,991 Da. Sequence analysis revealed a composite structure of the pullulanase consisting of catalytic and noncatalytic domains. The N-terminal half of the protein contained a leader peptide of 35 amino acid residues and the catalytic domain, which included the four consensus regions of amylases. Comparison of the consensus regions of several pullulanases suggested that enzymes like pullulanase type II from T. thermosulfurigenes EM1 which hydrolyze alpha-1,4- and alpha-1,6-glycosidic linkages have specific amino acid sequences in the consensus regions. These are different from those of pullulanases type I which only cleave alpha-1,6 linkages. The C-terminal half, which is not necessary for enzymatic function, consisted of at least two different segments. One segment of about 70 kDa contained two copies of a fibronectin type III-like domain and was followed by a linker region rich in glycine, serine, and threonine residues. At the C terminus, we found three repeats of about 50 amino acids which are also present at the N-termini of surface layer (S-layer) proteins of, e.g., Thermus thermophilus and Acetogenium kivui. Since the pullulanase of T. thermosulfurigenes EM1 is known to be cell bound, our results suggest that this segment serves as an S-layer anchor to keep the pullulanase attached to the cell surface. Thus, a general model for the attachment of extracellular enzymes to the cell surface is proposed which assigns the S-layer a new function and might be widespread among bacteria with S-layers. The triplicated S-layer-like segment is present in several enzymes of different bacteria. Upstream of amyB, another open reading frame, coding for a hypothetical protein of 35.6 kDa, was identified. 
No significant similarity to other sequences available in DNA and protein databases was found.
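As a consistency check on the figures above, assuming that the 5,586 bp reading frame includes the stop codon (the abstract does not say), the implied protein length and mean residue mass are:

```latex
\begin{aligned}
5586~\text{bp} / 3 &= 1862~\text{codons} \;\Rightarrow\; 1861~\text{amino acids} + 1~\text{stop codon} \\
205\,991~\text{Da} / 1861 &\approx 110.7~\text{Da per residue}
\end{aligned}
```

This is close to the commonly used average of about 110 Da per amino acid residue.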

18.
A method is proposed for refining gene start positions and searching for reading frame shifts. The method is based on a comparison of the nucleotide and amino acid sequences of homologous genes from related organisms. The algorithm relies on the fact that the rate of change in the protein-coding regions of the genome is substantially lower than in noncoding regions. A modification of the Smith-Waterman algorithm is proposed that aligns the amino acid sequences obtained by formal translation of the original nucleotide sequences while taking possible reading frame shifts into account. The algorithm has been implemented in the ORTOLOGATOR-GeneCorrector program package. Testing showed that the approach detects incorrectly annotated gene starts in 1% of genes (even in well-studied organisms such as Escherichia coli) and identifies several (approximately 10) shifts of the open reading frame. Thus, the algorithm can be used at both the initial and final stages of genome analysis.
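The general idea of a Smith-Waterman local alignment in which a protein is aligned to the codons of a nucleotide sequence and the frame may change at a cost can be sketched in Python as follows. This is not the ORTOLOGATOR-GeneCorrector implementation: the identity-based scoring, the gap and frameshift penalties, and the function name `frameshift_sw` are assumptions chosen for illustration, and only ±1 frameshifts are modeled.

```python
# Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def frameshift_sw(dna, protein, match=2, mismatch=-1, gap=-3, fs=-5):
    """Best local alignment score of `protein` against codons of `dna`,
    allowing +1/-1 frameshifts at a fixed penalty `fs` (illustrative scoring)."""
    dna = dna.upper().replace("U", "T")
    n, m = len(dna), len(protein)
    # H[i][j]: best score of an alignment ending at nucleotide i and residue j.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        aa = CODON_TABLE.get(dna[i - 3:i], "X") if i >= 3 else None
        for j in range(1, m + 1):
            cands = [0, H[i][j - 1] + gap]                # residue aligned to nothing
            if aa is not None:
                s = match if aa == protein[j - 1] else mismatch
                cands.append(H[i - 3][j - 1] + s)         # codon dna[i-3:i], same frame
                cands.append(H[i - 3][j] + gap)           # codon aligned to nothing
                cands.append(H[i - 2][j - 1] + s + fs)    # -1 shift: reuses one nucleotide
                if i >= 4:
                    cands.append(H[i - 4][j - 1] + s + fs)  # +1 shift: skips one nucleotide
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


protein = "MKWHEFDCYG"
intact = "ATGAAATGGCATGAATTTGATTGTTATGGT"   # encodes the protein exactly
shifted = intact[:15] + "C" + intact[15:]   # 1-nt insertion after the fifth codon
print(frameshift_sw(intact, protein))       # 20: ten matches at +2 each
print(frameshift_sw(shifted, protein))
```

For the shifted sequence, a path that pays one frameshift penalty still aligns all ten residues, so its score exceeds what either in-frame half alone can reach; tuning `fs` controls how readily such shifts are reported.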

19.
Three mutanase (alpha-1,3-glucanase)-producing microorganisms isolated from soil samples were identified as relatives of Paenibacillus. A mutanase was purified to homogeneity from cultures of each, and the molecular masses of the purified enzymes were approximately 132, 141, and 141 kDa, respectively. The corresponding three mutanase genes were cloned by PCR using primers designed from each N-terminal amino acid sequence. Another mutanase-like gene from one strain was also cloned by PCR using primers designed from amino acid sequences conserved among known mutanases. In total, four mutanase-like genes were sequenced. The genes contained long open reading frames of 3411 to 3915 bp encoding 1136 to 1304 amino acids. The deduced amino acid sequences of the mutanases showed relatively high similarity to those of a mutanase (E16590) from Bacillus sp. RM1, with 46.9% to 73.2% identity, and an alpha-1,3-glucanase (AB248056) from Bacillus circulans KA-304, with 46.7% to 70.4% identity. Phylogenetic analysis based on the amino acid sequences of the enzymes showed that bacterial mutanases form a new family between fungal mutanases (GH family 71) and Streptomycetes mycodextranases (GH family 87).
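The reported open reading frame lengths and protein lengths agree once the stop codon is taken into account, assuming (as is usual) that the stated ORF lengths include it:

```latex
\begin{aligned}
3411~\text{bp} / 3 &= 1137~\text{codons} = 1136~\text{amino acids} + 1~\text{stop codon} \\
3915~\text{bp} / 3 &= 1305~\text{codons} = 1304~\text{amino acids} + 1~\text{stop codon}
\end{aligned}
```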

20.
Shu P, Dai H, Mandecki W, Goldman E. Gene. 2004;343(1):127-132
Previously published experiments had indicated unexpected expression of a control vector in which a beta-galactosidase reporter was in the +1 reading frame relative to the translation start. This control vector contained the codon pair CCC CGA in the zero reading frame, raising the possibility that ribosomes rephased on this sequence, with peptidyl-tRNA(Pro) pairing with CCC in the +1 frame. This putative rephasing might also be exacerbated by the rare CGA Arg codon in the second position due to increased vacancy of the ribosomal A-site. To test this hypothesis, a series of site-directed mutants was constructed, including mutations in both the first and second codons of this codon pair. The results show that interrupting the continuous run of C residues with synonymous codon changes essentially abolishes the frameshift. Further, changing the rare Arg codon to a common Arg codon also reduces the frequency of the frameshift. These results provide strong support for the hypothesis that CCC CGA in the zero frame is indeed a weak translational frameshift site in Escherichia coli, with a 1-2% efficiency. Because the vector sequence also contains another CCC triplet in the +1 reading frame starting within the next codon after the CGA, our data also support a possible contribution to expression of a +7 nucleotide ribosome hop into the same +1 reading frame. We also confirm here a previous report that CCC UGA is a translational frameshift site, with about 5% efficiency in these experiments.
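The rephasing hypothesis rests on the observation that, at CCC CGA, the +1-frame triplet starting one nucleotide into the Pro codon is again CCC, so peptidyl-tRNA(Pro) could re-pair after slipping one base. A minimal Python sketch that scans the zero frame for codons whose +1-shifted triplet is identical (a hypothetical screening criterion, not the authors' procedure; the function name is illustrative) shows why a synonymous change that breaks the run of C residues removes the site:

```python
def plus1_repairing_sites(dna):
    """Zero-frame codon positions p where the +1-frame triplet dna[p+1:p+4] equals the
    P-site codon dna[p:p+3], so the same peptidyl-tRNA could re-pair one base downstream.

    Returns (p, next zero-frame codon, next +1-frame codon) for each site.
    """
    dna = dna.upper().replace("U", "T")
    sites = []
    for p in range(0, len(dna) - 6, 3):
        if dna[p:p + 3] == dna[p + 1:p + 4]:
            sites.append((p, dna[p + 3:p + 6], dna[p + 4:p + 7]))
    return sites


print(plus1_repairing_sites("ATGCCCCGAAAA"))  # [(3, 'CGA', 'GAA')]
print(plus1_repairing_sites("ATGCCGCGAAAA"))  # [] after the synonymous CCC -> CCG change
```

The criterion amounts to a run of four identical bases beginning at a codon boundary; it says nothing about A-site occupancy, which the abstract identifies as a second factor through the rare CGA codon.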


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417