首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The information decomposition (ID) method has been used for searching dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the GenBank database, 14766 sequences with a periodicity of two nucleotides have been found. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80 396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.  相似文献   

2.
The information decomposition (ID) method has been used for searching dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the Gen-Bank database, 14 766 sequences with a periodicity of two nucleotides have been found at a high level of statistical significance. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80 396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.  相似文献   

3.
We conducted classification for 472,288 regions of triplet periodicity found in 578,868 genes from release 29 of KEGG databank. A new concept of triplet periodicity class and a measure of similarity between them are introduced. Totally 2520 classes were created that contain 94% of found triplet periodicity. For 92% of triplet periodicity regions contained in classes an identical linkage of triplet periodicity to reading frame is observed. For the rest triplet periodicity cases a shift between reading frame of a gene and reading frame common for majority of genes contained in a class of triplet periodicity was observed. These periodicity regions were encoded into hypothetical amino acid sequences in accordance with reading frame built by triplet periodicity class. By BLAST program it was shown that 2660 hypothetical amino acid sequences have statistically significant similarity with proteins from UniProt databank. We suppose that 8% of triplet periodicity regions that joined classes mutated by means of reading frame shift. Created classes of triplet periodicity can be used for identification of coding regions of genes as well as for searching for mutations arisen from reading frame shift.  相似文献   

4.
Totally, 472 288 regions of triplet periodicity were found in 578 868 genes from KEGG databank version 29 and classified. A new concept of triplet periodicity class and a measure of similarity between periodicity classes were introduced. Overall, 2520 classes were created and contained 94% of the triplet periodicity cases found. A similar correlation between the triplet periodicity and reading frame was observed for 92% of triplet periodicity regions contained in different classes. The remaining triplet periodicity regions displayed a shift of the reading frame relative to that common for the majority of genes belonging to the same triplet periodicity class. The hypothetical amino acid sequences were deduced from the periodicity regions according to the reading frame characteristic of the given triplet periodicity class. BLAST analysis demonstrated that 2660 hypothetical amino acid sequences display a statistically significant similarity to proteins from the Uni-Prot databank. It was supposed that 8% of the triplet periodicity regions contained in the classes have frameshift mutations. The triplet periodicity classes can be used to identify the coding regions in genes and to searching for frameshift mutations.  相似文献   

5.
Frenkel FE  Korotkov EV 《Gene》2008,421(1-2):52-60
We introduce a new concept of triplet periodicity class (TPC) and a measure of similarity between such classes. We performed classification of 472288 triplet periodicity (TP) regions found in 578868 genes from 29th release of KEGG databank. Totally 2520 classes were obtained. They contain 94% of 472288 found cases of TP. For 92% of TP regions contained in classes the same linkage of TP to open reading frame (ORF) is observed. For 8% of TP cases we revealed a shift between ORF of a gene and ORF common for majority of genes contained in a TPC. For these 8% of periodic regions the hypothetical amino acid sequences corresponding to ORF built by TPC were made. BLAST program has shown that 2679 hypothetical amino acid sequences have statistically significant similarity with proteins from UniProt databank. We suppose that 8% of TP regions contained in classes possess a mutation originating from ORF shift. Obtained TPCs can be used for identification of genes' coding regions as well as for searching for mutations arisen arising from ORF shift.  相似文献   

6.
A web server for searching latent periodicity based on the method of modified profile analysis has been developed. This method allows searching latent periodicity in presence of insertions and deletions. During searching process, the periodicity classes are used which were found by us earlier for various groups of organisms. Period length belongs to the range 2-20 nt, not including the triplet periodicity. The results obtained are subjected to various filtration steps to ensure their statistical significance. Availability: The use of web server is free for non-commercial users. No registration is required. URL of the server is http://victoria.biengi.ac.ru/lepscan. Current software version is 1.06.  相似文献   

7.
This article is in the area of protein sequence investigation. It studies protein sequence periodicity. The notion of latent periodicity is introduced. A mathematical method for searching for latent periodicity in protein sequences is developed. Implementation of the method developed for known cases of perfect and imperfect periodicity is demonstrated. Latent periodicity of many protein sequences from the SWISS-PROT data bank is revealed by the method and examples of latent periodicity of amino acid sequences are demonstrated for: the translation initiation factor EIF-2B (epsilon subunit) of Saccharomyces cerevisiae from the E2BE_YEAST sequence; the E.coli ferrienterochelin receptor from the FEPA_ECOLI sequence; the lysozyme of Bacteriophage SF6 from the LY_BPSF6 sequence; lipoamide dehydrogenase of Azotobacter vinelandii from the DLDH_AZOVI sequence. These protein sequences have latent periods equal to six, two, seven and 19 amino acids, respectively. We propose that a possible purpose of the amino acid sequence latent periodicity is to determine certain protein structures.  相似文献   

8.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Despite many progresses being made in the identification of protein coding regions by computational methods during the last two decades, the performances and efficiencies of the prediction methods still need to be improved. In addition, it is indispensable to develop different prediction methods since combining different methods may greatly improve the prediction accuracy. A new method to predict protein coding regions is developed in this paper based on the fact that most of exon sequences have a 3-base periodicity, while intron sequences do not have this unique feature. The method computes the 3-base periodicity and the background noise of the stepwise DNA segments of the target DNA sequences using nucleotide distributions in the three codon positions of the DNA sequences. Exon and intron sequences can be identified from trends of the ratio of the 3-base periodicity to the background noise in the DNA sequences. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.  相似文献   

9.
An earlier reported method for revealing latent periodicity of the nucleotide sequences has been considerably modified in a case of small samples, by applying a Monte Carlo method. This improved method has been used to search for the latent periodicity of some nucleotide sequences of the EMBL data bank. The existence of the nucleotide sequences' latent periodicity has been shown for some genes. The results obtained have implied that periodicity of gene structure is projected onto the periodicity of primary amino acid sequences and, further, onto spatial protein conformation. Even though the periodic structure of gene sequences has been eroded, it is still retained in primary and/or spatial structures of corresponding proteins. Furthermore, in a few cases the study of genes' periodicity has suggested their possible evolutionary origin by multifold duplications of some gene's fragments.  相似文献   

10.
A method of noise decomposition has been developed. This method allows for the identification of a latent periodicity with symbol insertions and deletions that is specific for all or most amino acid sequences belonging to the same protein family or protein domain. The latent periodicity has been identified in catalytic domains of 85% of serine/threonine and tyrosine protein kinases. Similar results have been obtained for 22 other protein families. The possible role of latent periodicity in protein families is discussed.__________Translated from Molekulyarnaya Biologiya, Vol. 39, No. 3, 2005, pp. 420–436.Original Russian Text Copyright © 2005 by Laskin, Kudryashov, Skryabin, Korotkov.  相似文献   

11.
A mathematical method has been developed in order to search for latent periodicity in protein amino-acid and other symbolical sequences using dynamic programming and random matrices. The method allows the detection of the latent periodicity with insertions and deletions at positions that are unknown beforehand. The developed method has been applied to search for the periodicity in the amino-acid sequences of several proteins and in the euro/dollar exchange rate since 2001. The presence of a long period with insertions and deletions in amino-acid sequences is shown. The period length of seven amino acids is observed in the proteins that contain supercoiled regions (a coiled-coil structure) as well as of six, five, or more amino acids. The existence of the period length of 6 and 7 days, as well as 24 and 25 h in the analyzed financial time series is observed; note that this periodicity is detectable only for insertions and deletions. The causes that underlie the occurrence of the latent periodicity with insertions and deletions in amino-acid sequences and financial time series are discussed.  相似文献   

12.
For detection of the latent periodicity of the protein families responsible for various biological functions, methods of information decomposition, cyclic profile alignment, and the method of noise decomposition have been used. The latent periodicity, being specific to a particular family, is recognized in 94 of 110 analyzed protein families. Family specific periodicity was found for more than 70% of amino acid sequences in each of these families. Based on such sequences the characteristic profile of the latent periodicity has been deduced for each family. Possible relationship between the recognized latent periodicity, evolution of proteins, and their structural organization is discussed.  相似文献   

13.
Computer-assisted sequence analysis was applied to detect the most apparent nonrandom sequence motifs in eukaryotic introns. We describe in detail a method, which we call distance analysis, that we applied to the extensive study of 405 eukaryotic intron sequences. We observed very strong two-base periodicities for almost all tetranucleotides that are tandem repeats of nonhomopolymeric dinucleotides (the exception was GCGC and CGCG). We also observed, by using a fixed-point alignment method, that these periodic sequence motifs belong to large clusters of dinucleotides repeated tandemly as many as 15–35 times, which corresponds to the cluster lengths of 30–70 bases. We did not observe two-base periodicity of tetranucleotides in the collections of either 262 spliced eukaryotic exons or 107 bacterial genes. Instead, these sequences displayed strong three-base periodicity of some other tetranucleotides. These findings suggest that introns and exons display distinct sequence properties that can be used for mapping purposes.  相似文献   

14.
Prediction of protein structural class from the amino acid sequence   总被引:9,自引:0,他引:9  
P Klein  C Delisi 《Biopolymers》1986,25(9):1659-1672
The multidimensional statistical technique of discriminant analysis is used to allocate amino acid sequences to one of four secondary structural classes: high α content, high β content, mixed α and β, low content of ordered structure. Discrimination is based on four attributes: estimates of percentages of α and β structures, and regular variations in the hydrophobic values of residues along the sequence, occurring with periods of 2 and 3.6 residues. The reliability of the method, estimated by classifying 138 sequences from the Brookhaven Protein Data Bank, is 80%, with no misallocations between α-rich and β-rich classes. The reliability can be increased to 84% by making no allocation for proteins classified with odds close to 1. Classification using previously developed secondary structural prediction methods is considerably less reliable, the best result being 64% obtained using predictions based on the Delphi method.  相似文献   

15.
According to their main EC (Enzyme Commission) numbers, enzymes are classified into the following 6 main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by introducing the functional domain composition to formulate a given protein sequence. The advantage by doing so is that both the sequence-order-related features and the function-related features are naturally incorporated in the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset that consists of proteins with only less than 20% sequence identity to each other in order to get rid of any homologous bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of nonenzyme protein sequences as well). The success rate is significantly higher than those obtained by the other methods on such a stringent dataset. This indicates that using the functional domain composition to represent protein samples for statistical prediction is indeed very promising, and will become a powerful tool in bioinformatics and proteomics.  相似文献   

16.
Method of informational decomposition has been developed, allowing one to reveal hidden periodicity in any symbol sequences. The informational decomposition is calculated without conversion of a symbol sequence into the numerical one, which facilitates finding periodicities in a symbol sequence. The method permits introducing an analog of the autocorrelation function of a symbol sequence. The method developed by us has been applied to reveal hidden periodicities in nucleotide and amino acid sequences, as well as in different poetical texts. Hidden periodicity has been detected in various genes, testifying to their quantum structure. The functional and structural role of hidden periodicity is discussed.  相似文献   

17.
A simple method for estimating the transition/transversion ratio was developed. This method can be applied to not only two sequences but also more than two sequences. The statistical properties of the method and some other methods were examined by numerical computation and computer simulation. The results obtained showed that, in terms of bias and variance, the new method gives a better estimate of the transition/transversion ratio than do the other examined methods. The new method was applied to human and chimpanzee mitochondrial control region sequences. Received: 22 September 1997 / Accepted: 1 November 1997  相似文献   

18.
Suvorova YM  Rudenko VM  Korotkov EV 《Gene》2012,491(1):58-64
The triplet periodicity (TP) is a distinguished property of protein coding sequences. There are complex genes with more than one TP type along their sequence. We say that these genes contain a triplet periodicity change point. The aim of the work is to find all genes that contain TP change point and attempt to compare the positions of change point in genes with known biological data. We have developed a mathematical method to identify triplet periodicity changes along a sequence. We have found 311,221 genes with the TP change point in the KEGG/Genes database (version 48). It is about 8% from the total database volume (4013150). We showed that the repetitive sequences are not the only cause of such events. We suppose that the TP change point may indicate a fusion of genes or domains. We performed BLAST analysis to find potential ancestral genes for the parts of genes with TP change point. As a result we found that in 131323 cases sequences with TP change point have proper similarities for one or both parts. The relationship between TP change point and the fusion events in genes is discussed.The program realization of the method is available by request to authors.  相似文献   

19.
Latent sequence periodicity of some oncogenes and DNA-binding protein genes   总被引:2,自引:0,他引:2  
A method of latent periodicity search is developed. We use mutualinformation to reveal the latent periodicity of mRNA sequences.The latent periodicity of an mRNA sequence is a periodicitywith a low level of similarity between any two periods insidethe mRNA sequence. The mutual information between an artificialnumerical sequence and an mRNA sequence is calculated. The lengthof the artificial sequence period is varied from 2 to 150. Thehigh level of the mutual information between artificial andmRNA sequences allows us to find any type of latent periodicityof mRNA sequence. The latent periodicity of many mRNA codingregions has been found. For example, the retinoblastoma geneof HSRBS clone contains a region with a latent period equalto 45 bases. The A-RAF oncogene of HSARAFIR clone contains aregion with a latent period equal to 84 bases. Integrated sequencesfor the regions with latent periodicity are determined. Thepotential significance of latent periodicity is discussed.  相似文献   

20.
Fukushima A  Ikemura T  Kinouchi M  Oshima T  Kudo Y  Mori H  Kanaya S 《Gene》2002,300(1-2):203-211
We used a power spectrum method to identify periodic patterns in nucleotide sequence, and characterized nucleotide sequences that confer periodicities to prokaryotic and eukaryotic genomes and genomes. A 10-bp periodicity was prevalent in hyperthermophilic bacteria and archaebacteria, and an 11-bp periodicity was prevalent in eubacteria. The 10-bp periodicity was also prevalent in the eukaryotes such as the worm Caenorhabditis elegans. Additionally, in the worm genome, a 68-bp periodicity in chromosome I, a 59-bp periodicity in chromosome II, and a 94-bp periodicity in chromosome III were found. In human chromosomes 21 and 22, approximately 167- or 84-bp periodicity was detected along the entire length of these chromosomes. Because the 167-bp is identical to the length of DNA that forms two complete helical turns in nucleosome organization, we speculated that the respective sequences may correspond to arrays of a special compact form of nucleosomes clustered in specific regions of the human chromosomes. This periodic element contained a high frequency of TGG. TGG-rich sequences are known to form a specific subset of folded DNA structures, and therefore, the sequences might have potential to form specific higher order structures related to the clustered occurrence of a specific form of the speculated nucleosomes.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号