Similar articles
 20 similar articles found (search time: 46 ms)
1.
A method of informational decomposition has been developed, allowing one to reveal hidden periodicity in any symbol sequence. The informational decomposition is calculated without conversion of a symbol sequence into a numerical one, which facilitates finding periodicities in a symbol sequence. The method permits introducing an analog of the autocorrelation function of a symbol sequence. The method developed by us has been applied to reveal hidden periodicities in nucleotide and amino acid sequences, as well as in different poetical texts. Hidden periodicity has been detected in various genes, testifying to their quantum structure. The functional and structural role of hidden periodicity is discussed.

2.
The information decomposition (ID) method has been used for searching dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the GenBank database, 14 766 sequences with a periodicity of two nucleotides have been found. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80 396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.

3.
The information decomposition (ID) method has been used for searching dinucleotide periodicities, including latent ones, in plant genomes. In nucleotide sequences of genomes of various plants from the GenBank database, 14 766 sequences with a periodicity of two nucleotides have been found at a high level of statistical significance. Classification of the periodicity matrices of the detected DNA sequences has yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities with nucleotide deletions or insertions, modified profile analysis (MPA) has been applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing nucleotide deletions and insertions. Combined use of ID and MPA has permitted the detection of 80 396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.

4.
A program package has been developed to search for hidden tandem repeats of any specified type in protein sequence databases. The applied algorithm of locally optimal cyclic alignment is able to find subsequences possessing a certain profile-based periodicity type when no appreciable homology between periods is observed, as well as in the presence of arbitrary insertions/deletions. The profile can be adjusted to search for periodicity types that are structurally and functionally important. The Swiss-Prot database has been analyzed to reveal previously undetectable periodicities that are caused by the secondary and super-secondary structure regularities of the NAD-binding sites. In particular, a significant periodicity of 24 aa was found to be characteristic of the absolute majority of domains possessing the Rossmann (or Rossmann-like) fold and displaying an apparent regularity in their secondary structures that is not obvious at the primary structure level.

5.
Laskin  A. A.  Korotkov  E. V.  Chaley  M. B.  Kudryashov  N. A. 《Molecular Biology》2003,37(4):561-570
A program package has been developed to search for hidden tandem repeats of any specified type in protein sequence databases. The applied algorithm of locally optimal cyclic alignment is able to find subsequences possessing a certain profile-based periodicity type when no appreciable homology between periods is observed, as well as in the presence of arbitrary insertions/deletions. The profile can be adjusted to search for periodicity types that are structurally and functionally important. The Swiss-Prot database has been analyzed to reveal previously undetectable periodicities that are caused by the secondary and super-secondary structure regularities of the NAD-binding sites. In particular, a significant periodicity of 24 aa was found to be characteristic of the absolute majority of domains possessing the Rossmann (or Rossmann-like) fold and displaying an apparent regularity in their secondary structures that is not obvious at the primary structure level.

6.
A method of noise decomposition has been developed. This method allows for the identification of a latent periodicity with symbol insertions and deletions that is specific for all or most amino acid sequences belonging to the same protein family or protein domain. The latent periodicity has been identified in catalytic domains of 85% of serine/threonine and tyrosine protein kinases. Similar results have been obtained for 22 other protein families. The possible role of latent periodicity in protein families is discussed. Translated from Molekulyarnaya Biologiya, Vol. 39, No. 3, 2005, pp. 420–436. Original Russian Text Copyright © 2005 by Laskin, Kudryashov, Skryabin, Korotkov.

7.
Latent sequence periodicity of some oncogenes and DNA-binding protein genes   Cited by: 2 (self-citations: 0, citations by others: 2)
A method of latent periodicity search is developed. We use mutual information to reveal the latent periodicity of mRNA sequences. The latent periodicity of an mRNA sequence is a periodicity with a low level of similarity between any two periods inside the mRNA sequence. The mutual information between an artificial numerical sequence and an mRNA sequence is calculated. The length of the artificial sequence period is varied from 2 to 150. The high level of the mutual information between artificial and mRNA sequences allows us to find any type of latent periodicity of mRNA sequence. The latent periodicity of many mRNA coding regions has been found. For example, the retinoblastoma gene of HSRBS clone contains a region with a latent period equal to 45 bases. The A-RAF oncogene of HSARAFIR clone contains a region with a latent period equal to 84 bases. Integrated sequences for the regions with latent periodicity are determined. The potential significance of latent periodicity is discussed.
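
The mutual-information idea in this abstract can be illustrated with a minimal Python sketch. It assumes that the artificial numerical sequence is simply the position-in-period label 0, 1, ..., n-1 repeated along the sequence; the function name `periodic_mutual_information` is hypothetical and not taken from the paper, and the statistical significance assessment a real search would need is not reproduced here.

```python
import math
from collections import Counter

def periodic_mutual_information(seq, period):
    """Mutual information (in nats) between the symbols of `seq` and an
    artificial periodic sequence 0, 1, ..., period-1, 0, 1, ... (assumed form)."""
    n = len(seq)
    labels = [i % period for i in range(n)]   # artificial periodic sequence
    joint = Counter(zip(labels, seq))         # counts of (label, symbol) pairs
    label_counts = Counter(labels)
    symbol_counts = Counter(seq)
    mi = 0.0
    for (label, symbol), count in joint.items():
        # p(l, s) * log( p(l, s) / (p(l) * p(s)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (label_counts[label] * symbol_counts[symbol]))
    return mi

# A perfectly 3-periodic sequence: period 3 gives log(3), period 2 gives 0.
sequence = "ACG" * 20
print(periodic_mutual_information(sequence, 3))
print(periodic_mutual_information(sequence, 2))
```

On finite sequences the raw value tends to grow with the tested period length, so comparing periods from 2 to 150 would in practice require some normalization against shuffled sequences; that step is an assumption about usage, not something the abstract spells out.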

8.
A method of computer analysis of DNA sequences has been proposed. It is based on the information similarity of the compared sequences, and it significantly increases the usefulness of the computer analysis. This approach has been applied to the search for interconnected areas of Alu-repeats and replication origins of the p15A and R6K plasmids. An Alu-like region located in the first stem of the secondary structure of RNA-1 and the E. coli RNA-polymerase binding site has been found in p15A. In the R6K replication origin, Alu-like repeats have been found in the area of tandem 22 bp repeats. This comparison also revealed a hidden periodicity in the sequence of the human Alu-repeat. A hypothesis explaining the data obtained has been proposed. The proposed approach may be used as a method for revealing DNA sequences that have similar genetic functions.

9.
10.
The linear algebraic concept of a subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several cross-disciplinary techniques of DNA feature extraction have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions while completely eliminating background noise. The proposed method has been compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method offers a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method.

11.
Periodicity in DNA coding sequences: implications in gene evolution   Cited by: 2 (self-citations: 0, citations by others: 2)
In this paper we have employed Fourier analysis of DNA coding and non-coding sequences in an attempt to identify possible patterns in gene sequences. It was found that while intronic sequences show a rather random pattern, coding sequences show periodicities and in particular a periodicity of 3. We were able to reconstruct such patterns by assuming a gene having one codon occurring in about 40% of the sequence. This could indicate that the predominant presence of codons all starting from the same base could confer the observed periodicities. Indeed, it was found that proteins do obey this rule. Implications of this finding in gene evolution are discussed.

12.
Computer-assisted sequence analysis was applied to detect the most apparent nonrandom sequence motifs in eukaryotic introns. We describe in detail a method, which we call distance analysis, that we applied to the extensive study of 405 eukaryotic intron sequences. We observed very strong two-base periodicities for almost all tetranucleotides that are tandem repeats of nonhomopolymeric dinucleotides (the exceptions were GCGC and CGCG). We also observed, by using a fixed-point alignment method, that these periodic sequence motifs belong to large clusters of dinucleotides repeated tandemly as many as 15–35 times, which corresponds to cluster lengths of 30–70 bases. We did not observe two-base periodicity of tetranucleotides in the collections of either 262 spliced eukaryotic exons or 107 bacterial genes. Instead, these sequences displayed strong three-base periodicity of some other tetranucleotides. These findings suggest that introns and exons display distinct sequence properties that can be used for mapping purposes.
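
The abstract does not define distance analysis in detail. One plausible reading, sketched below in Python purely as an assumption, is a histogram of the distances between occurrences of a chosen oligonucleotide: a two-base periodicity would then show up as counts concentrated on even distances. The function name `distance_histogram` and the `max_distance` cutoff are illustrative and not taken from the paper.

```python
from collections import Counter

def distance_histogram(seq, motif, max_distance):
    """Count distances between start positions of all pairs of (possibly
    overlapping) occurrences of `motif`, up to `max_distance`."""
    positions = [i for i in range(len(seq) - len(motif) + 1)
                 if seq.startswith(motif, i)]
    hist = Counter()
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            distance = second - first
            if distance > max_distance:
                break                      # positions are sorted, so stop early
            hist[distance] += 1
    return hist

# A short CA tandem repeat: every distance between CACA occurrences is even.
print(distance_histogram("CACACACACA", "CACA", 6))   # Counter({2: 3, 4: 2, 6: 1})
```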

13.
MOTIVATION: Repetitive DNA sequences, besides having a variety of regulatory functions, are one of the principal causes of genomic instability. Understanding their origin and evolution is of fundamental importance for genome studies. The identification of repeats and their units helps in deducing the intra-genomic dynamics as an important feature of comparative genomics. A major difficulty in identification of repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length. RESULTS: The Spectral Repeat Finder program circumvents these problems by using a discrete Fourier transformation to identify significant periodicities present in a sequence. The specific regions of the sequence that contribute to a given periodicity are located through a sliding window analysis, and an exact search method is then used to find the repetitive units. Efficient and complete detection of repeats is provided together with interactive and detailed visualization of the spectral analysis of input sequence. We demonstrate the utility of our method with various examples that contain previously unannotated repeats. A Web server has been developed for convenient access to the automated program. AVAILABILITY: The Web server is available at http://www.imtech.res.in/raghava/srf and http://www2.imtech.res.in/raghava/srf
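
The sliding window step described above can be sketched in Python as follows. This is not the Spectral Repeat Finder implementation: it assumes one 0/1 indicator sequence per symbol, evaluates the discrete Fourier transform only at the single frequency 1/period, and uses a hypothetical function name `windowed_period_power` with illustrative `window` and `step` parameters.

```python
import cmath

def windowed_period_power(seq, period, window, step=1):
    """Return (start, power) pairs, where power is the normalized DFT power
    at frequency 1/period, summed over per-symbol indicator sequences."""
    alphabet = sorted(set(seq))
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        power = 0.0
        for symbol in alphabet:
            # DFT coefficient of the 0/1 indicator sequence for this symbol
            coeff = sum(cmath.exp(-2j * cmath.pi * i / period)
                        for i, s in enumerate(chunk) if s == symbol)
            power += abs(coeff) ** 2
        results.append((start, power / window ** 2))
    return results

# A period-3 repeat embedded in a homopolymer background stands out
# only in the middle window (power 1/3 there, about 0 elsewhere).
sequence = "A" * 30 + "ACG" * 10 + "A" * 30
for start, power in windowed_period_power(sequence, period=3, window=30, step=30):
    print(start, round(power, 6))
```

Windows whose power exceeds a chosen threshold would be the candidate regions handed to an exact repeat search; the threshold itself is left open here because the abstract does not state one.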

14.
Many signal processing based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are however obtained using a very specific symbolic to numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic to numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented that provides a natural framework for studying the role of the symbolic to numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariant under all these mappings, and generates a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic to numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme regardless of the choice of the symbolic to numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.

15.
The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms of different origin. The typical genomic periodicity for Bacteria is 11 bp whilst it is 10 bp for Archaea. Eukaryotes have more complex spectra but the dominant period in the yeast Saccharomyces cerevisiae is 10.2 bp. These periodicities are most likely reflective of differences in chromatin structure.

16.
Fukushima A  Ikemura T  Kinouchi M  Oshima T  Kudo Y  Mori H  Kanaya S 《Gene》2002,300(1-2):203-211
We used a power spectrum method to identify periodic patterns in nucleotide sequence, and characterized nucleotide sequences that confer periodicities to prokaryotic and eukaryotic genomes. A 10-bp periodicity was prevalent in hyperthermophilic bacteria and archaebacteria, and an 11-bp periodicity was prevalent in eubacteria. The 10-bp periodicity was also prevalent in the eukaryotes such as the worm Caenorhabditis elegans. Additionally, in the worm genome, a 68-bp periodicity in chromosome I, a 59-bp periodicity in chromosome II, and a 94-bp periodicity in chromosome III were found. In human chromosomes 21 and 22, approximately 167- or 84-bp periodicity was detected along the entire length of these chromosomes. Because the 167-bp is identical to the length of DNA that forms two complete helical turns in nucleosome organization, we speculated that the respective sequences may correspond to arrays of a special compact form of nucleosomes clustered in specific regions of the human chromosomes. This periodic element contained a high frequency of TGG. TGG-rich sequences are known to form a specific subset of folded DNA structures, and therefore, the sequences might have potential to form specific higher order structures related to the clustered occurrence of a specific form of the speculated nucleosomes.

17.
For detection of the latent periodicity of the protein families responsible for various biological functions, methods of information decomposition, cyclic profile alignment, and the method of noise decomposition have been used. The latent periodicity, being specific to a particular family, is recognized in 94 of 110 analyzed protein families. Family-specific periodicity was found for more than 70% of amino acid sequences in each of these families. Based on such sequences, the characteristic profile of the latent periodicity has been deduced for each family. The possible relationship between the recognized latent periodicity, evolution of proteins, and their structural organization is discussed.

18.
This article is in the area of protein sequence investigation. It studies protein sequence periodicity. The notion of latent periodicity is introduced. A mathematical method for searching for latent periodicity in protein sequences is developed. Implementation of the method developed for known cases of perfect and imperfect periodicity is demonstrated. Latent periodicity of many protein sequences from the SWISS-PROT data bank is revealed by the method and examples of latent periodicity of amino acid sequences are demonstrated for: the translation initiation factor EIF-2B (epsilon subunit) of Saccharomyces cerevisiae from the E2BE_YEAST sequence; the E. coli ferrienterochelin receptor from the FEPA_ECOLI sequence; the lysozyme of Bacteriophage SF6 from the LY_BPSF6 sequence; lipoamide dehydrogenase of Azotobacter vinelandii from the DLDH_AZOVI sequence. These protein sequences have latent periods equal to six, two, seven and 19 amino acids, respectively. We propose that a possible purpose of the amino acid sequence latent periodicity is to determine certain protein structures.

19.
S. OHNO 《Animal genetics》1988,19(4):305-316
Inasmuch as all events in this universe are governed by multitudes of periodicities, it is a mistake to regard any coding sequence as unique implying the descent from random assemblages of four bases. Instead, each coding sequence is comprised of primordial and derived repeating units. In the case of families of proteins with transmembrane alpha-helices, the primordial repeating units of their coding sequences were base heptamers, thus, giving the heptapeptidic periodicity very conducive to alpha-helix formation to the original polypeptide chains. Even in modern coding sequences for these families of proteins, intact and base-substituted copies of these primordial heptamers are found in more or less even distribution along the entire coding sequence. In addition, there are now locally prominent tandemly recurring units that are only remotely related to primordial heptamers. In the case of Ca++ channel, local prominence of one such nonameric unit gave a unique tripeptidic periodicity to the fourth helix of each unit giving to it a girdle of positively charged residues. All these complex interplays between primordial and derived recurring units that characterize each coding sequence can best be appreciated by their musical transformation. The transformed musical score of a pertinent part of rabbit skeletal muscle Ca++ channel coding sequence is given.

20.
Standard tests for the detection of hidden periodicities in time series are largely ignored by applied workers. Various simple but inappropriate methods are used instead. Therefore a method is suggested which is both simple and appropriate but which requires the prior knowledge of certain characteristics of the suspected periodicity. For illustration, this method is applied to a set of data from a chronobiological study.
