Similar Documents
1.
In total, 472,288 regions of triplet periodicity were found in 578,868 genes from KEGG databank release 29 and classified. A new concept of a triplet periodicity class and a measure of similarity between periodicity classes were introduced. Overall, 2,520 classes were created; they contained 94% of the triplet periodicity cases found. For 92% of the triplet periodicity regions contained in the classes, the same correlation between triplet periodicity and reading frame was observed. The remaining triplet periodicity regions displayed a shift of the reading frame relative to the one common to the majority of genes belonging to the same triplet periodicity class. Hypothetical amino acid sequences were deduced from the periodicity regions according to the reading frame characteristic of the given triplet periodicity class. BLAST analysis demonstrated that 2,660 hypothetical amino acid sequences display statistically significant similarity to proteins from the UniProt databank. It was suggested that 8% of the triplet periodicity regions contained in the classes carry frameshift mutations. Triplet periodicity classes can be used to identify coding regions in genes and to search for frameshift mutations.
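Triplet periodicity is often summarized as a 4×3 matrix that counts each nucleotide at each codon position (sequence index mod 3). A minimal sketch, assuming that representation and using a chi-squared deviation from row/column independence as a stand-in measure of periodicity strength; the abstract does not state which statistic the authors used, so `tp_matrix` and `tp_chi2` are illustrative names only.

```python
from typing import List

BASES = "ACGT"


def tp_matrix(seq: str) -> List[List[int]]:
    """4x3 count matrix: rows A, C, G, T; column k counts positions i with i % 3 == k."""
    m = [[0, 0, 0] for _ in BASES]
    for i, nt in enumerate(seq.upper()):
        if nt in BASES:  # skip ambiguous symbols such as N
            m[BASES.index(nt)][i % 3] += 1
    return m


def tp_chi2(m: List[List[int]]) -> float:
    """Chi-squared deviation of the matrix from 'no periodicity' (row/column independence)."""
    total = sum(map(sum, m))
    if total == 0:
        return 0.0
    row = [sum(r) for r in m]
    col = [sum(m[r][k] for r in range(4)) for k in range(3)]
    chi2 = 0.0
    for r in range(4):
        for k in range(3):
            expected = row[r] * col[k] / total
            if expected > 0:  # rows/columns with no counts contribute nothing
                chi2 += (m[r][k] - expected) ** 2 / expected
    return chi2
```

A strictly periodic toy sequence such as `"ATG" * 10` scores about 60, while `"ACGT" * 3`, whose bases are spread evenly over the three codon positions, scores 0. A real analysis would also need a significance threshold and a way to delimit the periodic region, neither of which is sketched here.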

2.
Frenkel FE  Korotkov EV 《Gene》2008,421(1-2):52-60
We introduce a new concept of triplet periodicity class (TPC) and a measure of similarity between such classes. We classified 472,288 triplet periodicity (TP) regions found in 578,868 genes from the 29th release of the KEGG databank. In total, 2,520 classes were obtained, containing 94% of the 472,288 TP cases found. For 92% of the TP regions contained in classes, the same linkage of TP to the open reading frame (ORF) is observed. For 8% of TP cases, we revealed a shift between the ORF of a gene and the ORF common to the majority of genes contained in a TPC. For these 8% of periodic regions, we constructed the hypothetical amino acid sequences corresponding to the ORF built by the TPC. The BLAST program showed that 2,679 hypothetical amino acid sequences have statistically significant similarity to proteins from the UniProt databank. We suggest that 8% of the TP regions contained in classes carry a mutation arising from an ORF shift. The obtained TPCs can be used to identify the coding regions of genes as well as to search for mutations arising from ORF shifts.
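Shifting a region by one nucleotide changes which bases fall at positions with i % 3 == 0, so a reading-frame shift appears as a cyclic rotation of the columns of a 4×3 nucleotide-by-codon-position count matrix. A minimal sketch, assuming that matrix representation and using Pearson correlation as a hypothetical similarity measure (the measure actually introduced in the paper is not given in the abstract); `best_frame_shift` reports which rotation best matches a class profile.

```python
from typing import List, Tuple

BASES = "ACGT"


def tp_matrix(seq: str) -> List[List[int]]:
    """4x3 counts: rows A, C, G, T; column = position % 3."""
    m = [[0, 0, 0] for _ in BASES]
    for i, nt in enumerate(seq.upper()):
        if nt in BASES:
            m[BASES.index(nt)][i % 3] += 1
    return m


def rotate(m: List[List[int]], k: int) -> List[List[int]]:
    """Column c of the result is column (c + k) % 3 of m."""
    return [[row[(c + k) % 3] for c in range(3)] for row in m]


def flatten(m: List[List[int]]) -> List[int]:
    return [v for row in m for v in row]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treat as no similarity
    return sxy / (sxx * syy) ** 0.5


def best_frame_shift(region: str, class_profile: List[List[int]]) -> Tuple[int, float]:
    """Return (k, similarity) for the column rotation k in {0, 1, 2} that best fits the profile."""
    m = tp_matrix(region)
    sim, k = max((pearson(flatten(rotate(m, k)), flatten(class_profile)), k) for k in range(3))
    return k, sim
```

For example, with a toy class profile built from `"ATG" * 10`, the region `"C" + "ATG" * 10` returns k = 1. In this sketch k = 0 means the region shares the class's usual frame linkage, and a nonzero k is the kind of mismatch the abstract attributes to ORF shifts.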

3.
We introduce a novel approach for detecting possible mutations that lead to a reading frame (RF) shift in a gene. Deletions and insertions in DNA coding regions are significant events for genes, because an RF shift modifies an extensive region of the amino acid sequence encoded by the gene. The suggested method is based on the phenomenon of triplet periodicity (TP) in the coding regions of genes and its relative resistance to substitutions in the DNA sequence. We attempted to extend 326,933 regions of continuous TP found in genes from the KEGG databank by considering possible insertions and deletions. In total, we found 824 genes in which such an extension was possible and statistically significant. We then generated amino acid sequences according to the active (KEGG's) and the hypothetically ancient RFs in order to confirm the shift at the protein level. As a result, 64 sequences have protein similarities only for the ancient RF, 176 only for the active RF, 3 for both, and 581 have no protein similarity at all. Our aim was to establish a lower bound for the number of genes in which a shift between RF and TP is possible. Further ways to increase the number of detected RF shifts are discussed.
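The protein-level check compares translations in the active frame and in a hypothetical ancestral frame. A minimal sketch, assuming the standard genetic code and that the ancestral frame is recovered by deleting `n_removed` nucleotides at a suspected insertion point `pos`; `translate` and `ancient_frame` are illustrative helpers, and the abstract does not describe the authors' exact procedure.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 0; non-ACGT codons become 'X' and a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def ancient_frame(dna: str, pos: int, n_removed: int) -> str:
    """Hypothetical ancestral sequence: delete n_removed (1 or 2) nucleotides at pos."""
    return dna[:pos] + dna[pos + n_removed:]


# Toy gene with one extra 'C' inserted after four ATG codons.
gene = "ATGATGATGATG" + "C" + "TGGTGGTGGTGGTGG"
active = translate(gene)                      # "MMMMLVVVV"
ancient = translate(ancient_frame(gene, 12, 1))  # "MMMMWWWWW"
```

Both translations could then be searched against a protein database (the abstract uses protein similarity for this); removing two nucleotides instead of one models the equivalent case of a single-nucleotide deletion.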

4.
The concept of the phase shift of triplet periodicity (TP) was used to search for potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detecting these insertions has been developed. This approach can detect potential insertions and deletions whose lengths are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity of detecting the TP phase shift. Sequences of 17,220 bacterial genes, each longer than 1,200 bases, were analyzed, and a TP phase shift was found in ~16% of the analyzed genes (2,809 genes), about four times more than detected in our previous work. We propose that shifts of the TP phase may indicate shifts of the reading frame in genes after insertion of DNA fragments whose lengths are not multiples of three bases. The relationship between TP phase shifts and frameshifts in genes is discussed.
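A TP phase shift can be looked for by comparing the triplet matrices of two adjacent windows and asking which cyclic column rotation makes them agree. A minimal sketch, assuming 4×3 count matrices indexed by absolute gene position mod 3 and Pearson correlation as a hypothetical stand-in for the paper's similarity measure; `phase_shift`, `scan_phase_shifts`, the window size `w` and the step are illustrative choices, not the published algorithm.

```python
from typing import List, Tuple

BASES = "ACGT"


def tp_matrix(seq: str, offset: int = 0) -> List[List[int]]:
    """4x3 counts (rows A, C, G, T); column = (offset + i) % 3, i.e. absolute position mod 3."""
    m = [[0, 0, 0] for _ in BASES]
    for i, nt in enumerate(seq.upper()):
        if nt in BASES:
            m[BASES.index(nt)][(offset + i) % 3] += 1
    return m


def rotate(m: List[List[int]], k: int) -> List[List[int]]:
    """Column c of the result is column (c + k) % 3 of m."""
    return [[row[(c + k) % 3] for c in range(3)] for row in m]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def phase_shift(dna: str, p: int, w: int) -> int:
    """Rotation k (0, 1 or 2) that best maps the matrix of dna[p:p+w] onto that of dna[p-w:p]."""
    left = [v for row in tp_matrix(dna[p - w:p], p - w) for v in row]
    right = tp_matrix(dna[p:p + w], p)
    scores = [(pearson(left, [v for row in rotate(right, k) for v in row]), k) for k in range(3)]
    return max(scores)[1]


def scan_phase_shifts(dna: str, w: int, step: int = 3) -> List[Tuple[int, int]]:
    """Split points p where the downstream window appears phase-shifted (k != 0)."""
    hits = []
    for p in range(w, len(dna) - w + 1, step):
        k = phase_shift(dna, p, w)
        if k != 0:
            hits.append((p, k))
    return hits
```

In this convention, a toy gene `"ATG" * 40 + "C" + "ATG" * 40` gives k = 1 at the split point 121 and k = 0 for windows lying entirely on one side of the insertion; more generally k equals the inserted length mod 3, so a one-nucleotide deletion would show up as k = 2.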

5.
6.
A method is proposed for refining the start positions of genes and for searching for reading-frame shifts. The method is based on comparing the nucleotide and amino acid sequences of homologous genes from related organisms. The algorithm relies on the fact that protein-coding regions of the genome change substantially more slowly than noncoding regions. A modification of the Smith-Waterman algorithm is proposed that aligns the amino acid sequences obtained by formal translation of the original nucleotide sequences while taking possible reading-frame shifts into account. The algorithm has been implemented in the ORTOLOGATOR-GeneCorrector program package. Testing showed that the approach detects incorrectly annotated gene starts in 1% of genes (even in well-studied organisms such as Escherichia coli) and identifies several (approximately 10) shifts of the open reading frame. Thus, the algorithm can be used at both the initial and final stages of genome analysis.
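The idea of aligning while allowing frame shifts can be illustrated with a generic frameshift-aware, Smith-Waterman-style local alignment of a nucleotide sequence against a protein: each ordinary step consumes one codon, and skipping one or two nucleotides is charged a frameshift penalty. A minimal sketch, assuming the standard genetic code and arbitrary scores; `frameshift_local_score` and its parameters are hypothetical, and this simplified recurrence is not a reconstruction of the ORTOLOGATOR-GeneCorrector scoring scheme.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def frameshift_local_score(dna: str, protein: str, match: int = 2, mismatch: int = -1,
                           gap: int = -3, frameshift: int = -4) -> int:
    """Best local alignment score of dna against protein, allowing 1- or 2-nt frame shifts.

    H[i][j] = best score of an alignment ending after nucleotide i and residue j.
    """
    dna, protein = dna.upper(), protein.upper()
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            score = 0  # local alignment: a new alignment may start anywhere
            if i >= 3 and j >= 1:  # one codon aligned to one residue
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                score = max(score, H[i - 3][j - 1] + (match if aa == protein[j - 1] else mismatch))
            if i >= 3:  # codon aligned to a gap in the protein
                score = max(score, H[i - 3][j] + gap)
            if j >= 1:  # residue aligned to a gap in the DNA
                score = max(score, H[i][j - 1] + gap)
            if i >= 1:  # skip one nucleotide: frame shift
                score = max(score, H[i - 1][j] + frameshift)
            if i >= 2:  # skip two nucleotides: frame shift
                score = max(score, H[i - 2][j] + frameshift)
            H[i][j] = score
            best = max(best, score)
    return best
```

With a toy protein `"MMMMWWWWW"` and its coding sequence `"ATG" * 4 + "TGG" * 5`, the score is 18 (nine matches). Inserting one `C` after the fourth codon still yields 14 (nine matches minus one frameshift penalty), whereas an in-frame-only alignment could recover just the five-residue W run. A production tool would also keep a traceback to report where the shift occurs.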

7.
The DNA sequences of the entire coding regions of the A and C type variable surface protein genes from Paramecium tetraurelia, stock 51, have been determined. The 8151-nucleotide open reading frame of the A gene contains several tandem repeats of 210 nucleotides within the central portion of the molecule, as well as a periodic structure defined by cysteine residues. The 6699-nucleotide open reading frame of the C gene does not contain any identifiable tandem repeats or internal similarity but maintains a periodicity based on cysteine residue spacing. The deduced amino acid sequences encoded by the two genes are most similar within the 600 amino-terminal and 600 carboxyl-terminal residues; the central portions show only limited sequence similarity. We conclude that internal repeats are not a conserved feature of variable surface proteins in Paramecium, and we discuss the possible importance of the regular pattern of cysteine residues.
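A periodicity "based on the cysteine residue spacing" can be examined by listing the distances between consecutive cysteines in the deduced protein and looking for a dominant spacing. A minimal sketch; `cysteine_spacings` is an illustrative helper, and the toy sequence below is hypothetical, not taken from the Paramecium proteins.

```python
from collections import Counter
from typing import List


def cysteine_spacings(protein: str) -> List[int]:
    """Distances between consecutive cysteine (C) residues in a protein sequence."""
    positions = [i for i, aa in enumerate(protein.upper()) if aa == "C"]
    return [b - a for a, b in zip(positions, positions[1:])]


toy = "MCAAACAAACAAAC"  # hypothetical sequence with a cysteine every fourth residue
spacings = cysteine_spacings(toy)              # [4, 4, 4]
dominant = Counter(spacings).most_common(1)    # [(4, 3)]
```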

8.
Authentic cDNAs encoding the activator protein for acid beta-glucosidase (EC 3.2.1.45), co-beta-glucosidase, were cloned from the pCD and lambda gt11 human cDNA libraries. Initial screening with oligonucleotide mixtures encoding amino acid sequences of co-beta-glucosidase identified partial cDNAs, which were used to obtain a potentially full-length cDNA from the lambda gt11 library. This clone (2767 bp), EGTISI, contained 5' (38 bp) and 3' (1157 bp) noncoding sequences, a translation initiation site, and an open reading frame encoding 524 amino acids, including a typical hydrophobic signal sequence (16 amino acids). Computer analyses identified three regions with high similarity to co-beta-glucosidase, encoded by tandem sequences in EGTISI. Searches revealed that two of these regions encoded peptides of known function: SAP1 (sphingolipid activator protein 1) and protein C (a new sphingolipid activator protein) were encoded by EGTISI sequences 5' and 3', respectively, to those for co-beta-glucosidase. The third region of similarity, encoding a theoretical peptide of undefined function, was located furthest 5' in the cDNA. EGTISI and its encoded polypeptide had high similarity (77% nucleotide identity and about 80% amino acid similarity) to a rat Sertoli cell cDNA and its encoded sulfated glycoprotein-1. These results indicate that a single highly conserved gene encodes the precursor for four potential sphingolipid activator proteins in rat and man.

9.
The complete pullulanase gene (amyB) from Thermoanaerobacterium thermosulfurigenes EM1 was cloned in Escherichia coli, and the nucleotide sequence was determined. The reading frame of amyB consisted of 5,586 bp encoding an exceptionally large enzyme of 205,991 Da. Sequence analysis revealed a composite structure of the pullulanase consisting of catalytic and noncatalytic domains. The N-terminal half of the protein contained a leader peptide of 35 amino acid residues and the catalytic domain, which included the four consensus regions of amylases. Comparison of the consensus regions of several pullulanases suggested that enzymes such as pullulanase type II from T. thermosulfurigenes EM1, which hydrolyze both alpha-1,4- and alpha-1,6-glycosidic linkages, have specific amino acid sequences in the consensus regions. These differ from those of type I pullulanases, which cleave only alpha-1,6 linkages. The C-terminal half, which is not necessary for enzymatic function, consisted of at least two different segments. One segment of about 70 kDa contained two copies of a fibronectin type III-like domain and was followed by a linker region rich in glycine, serine, and threonine residues. At the C terminus, we found three repeats of about 50 amino acids that are also present at the N-termini of surface layer (S-layer) proteins of, e.g., Thermus thermophilus and Acetogenium kivui. Since the pullulanase of T. thermosulfurigenes EM1 is known to be cell bound, our results suggest that this segment serves as an S-layer anchor that keeps the pullulanase attached to the cell surface. Thus, a general model for the attachment of extracellular enzymes to the cell surface is proposed, which assigns the S-layer a new function and might be widespread among bacteria with S-layers. The triplicated S-layer-like segment is present in several enzymes of different bacteria. Upstream of amyB, another open reading frame, coding for a hypothetical protein of 35.6 kDa, was identified. No significant similarity to other sequences available in DNA and protein databases was found.

10.
11.
The nucleotide sequence of a thermophilic, liquefying alpha-amylase gene cloned from B. stearothermophilus was determined. The NH2-terminal amino acid sequence analysis of the B. stearothermophilus alpha-amylase confirmed that the reading frame of the gene consisted of 1,644 base pairs (548 amino acids). The B. stearothermophilus alpha-amylase had a signal sequence of 34 amino acids, which was cleaved at exactly the same site in E. coli. The mature enzyme contained two cysteine residues, which might play an important role in maintenance of a stable protein conformation. Comparison of the amino acid sequence inferred from the B. stearothermophilus alpha-amylase gene with those inferred from other bacterial liquefying alpha-amylase genes and with the amino acid sequences of eukaryotic alpha-amylases showed three homologous sequences in the enzymatically functional regions.

12.
13.
Determination of the amino acid sequence of beef pancreas tryptophanyl-tRNA synthetase was undertaken through both cDNA and direct peptide sequencing. A full-length cDNA clone containing a 475 amino acid open reading frame was obtained. The molecular mass of the corresponding peptide chain, 53,728 Da, was in agreement with that of beef tryptophanyl-tRNA synthetase, as determined by physicochemical methods (54 kDa). Expression of this clone in Escherichia coli led to tryptophanyl-tRNA synthetase activity in cell extracts. The open reading frame included two sequences analogous to the consensus sequences, HIGH and KMSKS, found in class I aminoacyl-tRNA synthetases. The homology with prokaryotic and yeast mitochondrial tryptophanyl-tRNA synthetases was low and was limited to the regions of the consensus sequences. However, a 90% homology was observed with the recently described rabbit peptide chain release factor (eRF) [Lee et al. (1990) Proc. Natl. Acad. Sci. 87, 3508-3512]. Such a strong homology may reveal a new group of genes deriving from a common ancestor, the products of which could be involved in tRNA aminoacylation (tryptophanyl-tRNA synthetase) or translation termination (eRF).
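Because the open reading frame contains sequences only *analogous* to the HIGH and KMSKS signatures of class I aminoacyl-tRNA synthetases, an exact string search would miss them. A minimal sketch that scans a protein for windows within a chosen number of mismatches (Hamming distance) of a motif; `find_motif`, the one-mismatch threshold and the toy sequence are hypothetical, and real signature searches usually rely on curated patterns or profiles instead.

```python
from typing import List


def find_motif(protein: str, motif: str, max_mismatches: int = 1) -> List[int]:
    """0-based start positions of windows differing from motif at <= max_mismatches sites."""
    protein, motif = protein.upper(), motif.upper()
    L = len(motif)
    hits = []
    for i in range(len(protein) - L + 1):
        mismatches = sum(1 for a, b in zip(protein[i:i + L], motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


toy = "AAHIGNAAKMSKSAA"  # hypothetical sequence: 'HIGN' is one substitution away from HIGH
high_hits = find_motif(toy, "HIGH")    # [2]
kmsks_hits = find_motif(toy, "KMSKS")  # [8]
```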

14.
We have isolated and sequenced two full-length cDNA clones encoding actin from carrot. The two carrot clones are almost identical at the nucleotide level, and are quite homologous to each other and to other plant actins at the amino acid level. In those regions where amino acid variation exists between the two genes from carrot, the differences have arisen from very simple changes at the nucleotide level. The most common changes are nucleotide insertion(s) coupled to the deletion of a different nucleotide(s) nearby in the DNA sequence, resulting in the restoration of the proper reading frame for the protein; thus, these changes can be viewed as multiple or coupled frameshift mutations. There are almost no base substitutions between the two carrot genes. In contrast to this, when the carrot actin nucleotide sequences are compared to those of a soybean actin gene or a maize actin gene, many base substitutions are observed (ca. 21.8% and 23.5%), more than half of which are third base changes which do not alter the protein sequence. At the amino acid level, both carrot genes show greater similarity to maize actin than they do to soybean actin, thus reinforcing the idea that plant actin genes diverged from a single common ancestral actin gene prior to the divergence of monocots and dicots.
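The observation that most substitutions are third-base changes that leave the protein unchanged can be quantified by comparing two aligned coding sequences codon by codon. A minimal sketch, assuming gap-free, in-frame aligned sequences and the standard genetic code; `classify_codon_differences` is an illustrative helper, and it counts differing codons rather than estimating per-site synonymous and nonsynonymous rates.

```python
from typing import Dict, List, Union

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(a: str, b: str) -> Dict[str, Union[List[int], int]]:
    """Count differing sites by codon position and classify differing codons as (non)synonymous."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    by_position = [0, 0, 0]  # differences at 1st, 2nd, 3rd codon positions
    synonymous = nonsynonymous = 0
    for i in range(0, len(a) - 2, 3):
        ca, cb = a[i:i + 3].upper(), b[i:i + 3].upper()
        if ca == cb:
            continue
        for p in range(3):
            if ca[p] != cb[p]:
                by_position[p] += 1
        ta, tb = CODON_TABLE.get(ca), CODON_TABLE.get(cb)
        # Codons with non-ACGT symbols translate to None and are counted as nonsynonymous.
        if ta is not None and ta == tb:
            synonymous += 1
        else:
            nonsynonymous += 1
    return {"by_position": by_position,
            "synonymous_codons": synonymous,
            "nonsynonymous_codons": nonsynonymous}
```

For the toy pair `"CTGGCTAAA"` and `"CTAGCAAGA"`, two third-position changes are synonymous (Leu to Leu, Ala to Ala) and one second-position change alters the residue (Lys to Arg).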

15.
Two genes encoding haloacetate dehalogenases, H-1 and H-2, are closely linked on a plasmid from Moraxella sp. strain B. H-1 predominantly acts on fluoroacetate, but H-2 does not. To elucidate the molecular relationship between the two enzymes, we compared their structural genes. Two restriction fragments of the plasmid DNA were subcloned into M13 phages and their nucleotide sequences were determined. The sequence of each fragment contained an open reading frame that was identified as the structural gene for one of the two dehalogenases on the basis of the following criteria: N-terminal amino acid sequence, amino acid composition, and molecular mass. The genes for H-1 and H-2, designated dehH1 and dehH2, respectively, had different sizes (885 bp and 675 bp) and G+C contents (58.3% and 53.4%). Sequence analysis revealed no homology between the two genes. We concluded that the dehalogenases H-1 and H-2 have no enzyme-evolutionary relationship. The deduced amino acid sequence of the dehH1 gene showed significant similarity to those of three hydrolases of Pseudomonas putida and a haloalkane dehalogenase of Xanthobacter autotrophicus. The dehH2 coding region was flanked by two repeated sequences about 1.8 kb long, which might play a part in the frequent spontaneous deletion of dehH2 from the plasmid.
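G+C content, as reported for dehH1 and dehH2, is the percentage of G and C among the bases of a sequence. A minimal sketch, assuming ambiguous symbols (such as N) are excluded from the denominator; `gc_content` is an illustrative helper, and the toy inputs below are not the dehalogenase sequences, which the abstract does not include.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


half = gc_content("ATGC")          # 50.0
two_thirds = gc_content("GGCCAT")  # about 66.7
```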

16.
A 2 kb fragment was isolated from an Anacystis nidulans genomic DNA library by hybridization with synthetic oligonucleotide probes derived from the N-terminal amino acid sequence of Anacystis photolyase. This fragment contains a 1452 bp-long open reading frame encoding a polypeptide of 484 amino acids (Mr 54475). Antibodies raised against purified Anacystis photolyase reacted with extracts of cells harboring fused genes between lacZ of Escherichia coli and this gene. A 40.7% similarity was found between the deduced amino acid sequences of Anacystis and E. coli photolyases, notwithstanding the difference in chromophore structure.

17.
This work reports the isolation and characterization of a gene encoding a superoxide dismutase (SOD, EC 1.15.1.1) from rat-derived Pneumocystis carinii. Sense and antisense oligonucleotides, deduced from SOD amino acid sequences of a wide variety of organisms, allowed amplification of a 669 bp genomic DNA fragment specific to this P. carinii. RACE-PCR was used to obtain the major part of the complementary DNA; the 5'- and 3'-genomic regions were obtained, respectively, from an MboI subgenomic library and from a fragment amplified with oligonucleotides designed from the cDNA sequence. Comparison of the genomic and cDNA sequences showed an open reading frame of 660 bp interrupted by seven small introns. The deduced amino acid sequence contained 220 residues. Protein sequence alignment demonstrated the highest homology (50.5% identity, 70.3% similarity) with Saccharomyces cerevisiae manganese SOD (MnSOD), suggesting that P. carinii SOD belongs to the mitochondrial MnSOD group. A putative targeting peptide found at the 5'-end of the P. carinii SOD sequence also suggested its mitochondrial localization.

18.
19.
The DNA sequences of the argG genes of Methanosarcina barkeri MS and Methanococcus vannielii were determined. The polypeptide products of these methanogen genes have amino acid sequences which are 50% identical to each other and 38% identical to the amino acid sequence encoded by the exons of the human argininosuccinate synthetase gene. Introns in the human chromosomal gene separate regions which encode amino acids conserved in both the archaebacterial and human gene products. An open reading frame immediately upstream of argG in Methanosarcina barkeri MS codes for an amino acid sequence which is 45 and 31% identical to the sequences of the large subunits of carbamyl phosphate synthetase in Escherichia coli and Saccharomyces cerevisiae, respectively. If this gene encodes carbamyl phosphate synthetase in Methanosarcina barkeri, this is the first example, in an archaebacterium, of physical linkage of genes that encode enzymes which catalyze reactions in the same amino acid biosynthetic pathway.
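Percent-identity figures such as "50% identical" depend on the denominator chosen (aligned columns, full alignment length, or the shorter sequence), and the abstract does not say which was used. A minimal sketch of one common convention, identical residues over aligned columns in which neither sequence has a gap; `percent_identity` and the toy alignment are illustrative only.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != gap and y != gap]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)


pid = percent_identity("MKV-LA", "MRVQLA")  # 4 identical of 5 gap-free columns -> 80.0
```

Dividing by the full alignment length (6 columns) instead would give about 66.7% for the same pair, which is why identities from different studies are not always directly comparable.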

20.
Three mutanase (alpha-1,3-glucanase)-producing microorganisms isolated from soil samples were identified as relatives of Paenibacillus. A mutanase was purified to homogeneity from cultures of each, and the molecular masses of the purified enzymes were approximately 132, 141, and 141 kDa, respectively. The three corresponding mutanase genes were cloned by PCR using primers designed from each N-terminal amino acid sequence. Another mutanase-like gene from one strain was also cloned by PCR using primers designed from amino acid sequences conserved among known mutanases. In total, four mutanase-like genes were sequenced. The genes contained long open reading frames of 3,411 to 3,915 bp encoding 1,136 to 1,304 amino acids. The deduced amino acid sequences of the mutanases showed relatively high similarity to a mutanase (E16590) from Bacillus sp. RM1 (46.9% to 73.2% identity) and to an alpha-1,3-glucanase (AB248056) from Bacillus circulans KA-304 (46.7% to 70.4% identity). Phylogenetic analysis based on the amino acid sequences of the enzymes showed that bacterial mutanases form a new family between fungal mutanases (GH family 71) and Streptomycetes mycodextranases (GH family 87).
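The reported ORF and protein lengths are consistent with counting the terminal stop codon in the ORF length: 3,411 bp is 1,137 codons, or 1,136 residues plus a stop, and 3,915 bp is 1,305 codons, or 1,304 residues plus a stop. A minimal sketch of a forward-strand ORF search under that convention (ATG to the first in-frame stop); `longest_orf` and `protein_length` are illustrative helpers, ORF definitions vary between tools, and the reverse strand is ignored here.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> Optional[Tuple[int, int]]:
    """Longest ATG...stop ORF on the forward strand as (start, end), end exclusive and including the stop."""
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the terminal stop codon."""
    return orf_bp // 3 - 1


orf = longest_orf("CCATGAAATTTTAGCC")  # (2, 14): ATG AAA TTT TAG, encoding M K F
```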


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417