Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
Using the diffusion entropy approach, we detect scale-invariance characteristics embedded in 4737 human promoter sequences. The scale-invariance exponent spans a wide range, [0.3, 0.9], centered at delta(c) = 0.66. The distribution of the exponent can be separated into left and right branches with respect to its maximum. The two branches are asymmetric, and each can be fitted exactly by a Gaussian with a different width.

2.
3.
Statistical analysis of nucleotide sequences.   Total citations: 1, self-citations: 4, citations by others: 1
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting appropriate parameters for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and a Markov order k not equal to 0, as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers that correspond to known restriction enzyme recognition sites.
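The abstract above does not give the model's formulas, so the following is only a minimal sketch of a standard Markov-chain estimate of how often a word is expected to occur, which could be compared with the observed count to look for a bias such as the one reported against palindromic hexamers. The function names (`count_overlapping`, `expected_count`, `is_reverse_palindrome`) are hypothetical, the sketch handles a single sequence rather than several short ones, and it assumes an order k between 1 and the word length minus 2.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))


def expected_count(seq, word, k):
    """Expected count of word under an order-k Markov model estimated from seq.

    Uses the product of the counts of the (k+1)-mers in the word divided by
    the product of the counts of its interior k-mers.
    Assumption: 1 <= k <= len(word) - 2.
    """
    if not 1 <= k <= len(word) - 2:
        raise ValueError("k must satisfy 1 <= k <= len(word) - 2")
    numerator = 1.0
    for i in range(len(word) - k):
        numerator *= count_overlapping(seq, word[i:i + k + 1])
    denominator = 1.0
    for i in range(1, len(word) - k):
        denominator *= count_overlapping(seq, word[i:i + k])
    return numerator / denominator if denominator else 0.0


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_reverse_palindrome(word):
    """True if word equals its own reverse complement (e.g. GAATTC)."""
    return word == word.translate(_COMPLEMENT)[::-1]
```

A ratio of observed to expected count well below 1 for a word would suggest under-representation; how to assess significance is left open here because the source does not specify it.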

4.
5.
6.
7.
8.
Discriminant analysis of promoter regions in Escherichia coli sequences   Total citations: 2, self-citations: 0, citations by others: 2
We have previously developed a general method based on the statistical technique of discriminant analysis to predict splice junctions in eukaryotic mRNA sequences [Nakata, K., Kanehisa, M. and DeLisi, C. (1985) Nucleic Acids Res., 13, 5327–5340]. To evaluate the further applicability of this method, we now analyze the promoter region of Escherichia coli sequences. The attributes used for discrimination include the accuracy of consensus sequence patterns measured by the perceptron algorithm, the thermal stability map, the base composition and the Calladine-Dickerson rules for helical twist angle, roll angle, torsion angle and propeller twist angle. When applied to selected E. coli sequences in the GenBank database, the method correctly identifies 75% of the true promoter regions. Received on May 15, 1987; accepted on April 17, 1988

9.
10.
Recognition of coding regions within eukaryotic genomes is one of the oldest yet still unsolved problems of bioinformatics. New high-accuracy methods of splice site recognition are needed to solve it. A question of current interest is how to identify specific features of the nucleotide sequences near splice sites and to recognize the sites in their sequence context. We performed a statistical analysis of a database of human gene fragments and revealed some characteristics of the nucleotide sequences in the neighborhood of splice sites. The frequencies of all nucleotides and dinucleotides around splice sites were computed, and the nucleotides and dinucleotides with extremely high or low occurrence were identified. The statistical information obtained in this work can be used in the further development of methods for splice site annotation and exon-intron structure recognition.

11.
12.
13.
Prediction of gene sequences and their exon-intron structure in large eukaryotic genomic sequences is one of the central problems of mathematical biology. Solving this problem involves, in particular, high-accuracy splice site recognition. Using statistical analysis of a database of human gene fragments containing splice sites, some characteristic features were described for nucleotide sequences in the splice site neighborhood, the frequencies of all nucleotides and dinucleotides were determined, and those with frequencies increased or decreased in comparison to a random sequence were identified. The results can be used in sequence annotation, splice site prediction, and the recognition of the gene exon-intron structure.
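One common way to express "increased or decreased in comparison to a random sequence" is the dinucleotide odds ratio, the observed dinucleotide frequency divided by the product of its two single-nucleotide frequencies. The sketch below assumes that measure; the abstract does not say which statistic the authors used, and `dinucleotide_odds` is a hypothetical name.

```python
from collections import Counter


def dinucleotide_odds(fragments):
    """Return {dinucleotide: observed frequency / expected frequency}.

    The expected frequency assumes independent neighboring bases, i.e. the
    product of the two single-nucleotide frequencies pooled over all
    fragments. Ratios above 1 indicate over-representation, below 1
    under-representation. Only dinucleotides that occur are reported.
    """
    mono = Counter()
    di = Counter()
    for frag in fragments:
        mono.update(frag)
        di.update(frag[i:i + 2] for i in range(len(frag) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    return {
        pair: (count / n_di)
        / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }
```

Applied to fragments aligned around a splice site, the same function could be run on a fixed window on either side of the site to compare the two contexts.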

14.
A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. The run distributions differ considerably among the DNA sequences of procaryotes, invertebrates and vertebrates. Short runs (1-2 nucleotides long) are abundant in coding sequences and deficient in noncoding regions. However, some interesting exceptions to this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
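A run-length distribution of the kind described above can be tabulated in a few lines. This is an illustrative sketch with hypothetical function names, not the authors' procedure; the purine-pyrimidine variant simply recodes A and G as R and C and T as Y before counting.

```python
from collections import Counter
from itertools import groupby

# Recode bases as purine (R) or pyrimidine (Y).
_PURINE_PYRIMIDINE = str.maketrans("ACGT", "RYRY")


def run_length_distribution(seq):
    """Return {symbol: Counter({run_length: number_of_runs})} for seq."""
    dist = {}
    for symbol, group in groupby(seq):
        length = sum(1 for _ in group)
        dist.setdefault(symbol, Counter())[length] += 1
    return dist


def purine_pyrimidine_runs(seq):
    """Run-length distribution after recoding bases as R or Y."""
    return run_length_distribution(seq.translate(_PURINE_PYRIMIDINE))
```

Comparing the resulting counters for coding and noncoding regions would show, for example, whether runs of length 1-2 are relatively more frequent in one than in the other.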

15.
16.
Sequence-dependent flexibility in promoter sequences   Total citations: 7, self-citations: 0, citations by others: 7
Non-neighbor interactions between base-pairs were taken into account to calculate the angular parameters (Omega, rho and tau) describing the orientation of successive base-pair planes and the translation parameter (D(y)) along the long axis of base-pair steps for 36 independent tetramers. A statistical mechanical model was proposed to predict DNA flexibility, which is mainly related to thermal fluctuations at individual base-pair steps. DNA flexibility can be described by the root-mean-square deviation of the end-to-end distance of the DNA helical structure. The model was then used to investigate patterns of extreme flexibility in prokaryotic and eukaryotic promoter sequences. The results showed that several extremely flexible regions related to functionally important elements exist in both prokaryotic and eukaryotic promoters, and that DNA flexibility and AT content are highly correlated. The probabilities of finding flexibility patterns in promoter sequences were also estimated statistically. The biological implications are discussed briefly.

17.
18.
Compilation and analysis of eukaryotic POL II promoter sequences.   Total citations: 52, self-citations: 20, citations by others: 32

19.
Compilation and analysis of Escherichia coli promoter DNA sequences.   Total citations: 472, self-citations: 130, citations by others: 472
The DNA sequences of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter mutations. Nearly all of the altered base pairs in the mutants conform to the following general rule: down-mutations decrease homology and up-mutations increase homology to the consensus sequence.
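The rule stated above can be illustrated with a toy calculation: take the most frequent base in each aligned column as the consensus, then count how many positions of a sequence match it. This is a simplified sketch under that assumption, with hypothetical function names, and it is not the procedure used in the compilation; ties between equally frequent bases are not treated specially.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent base in each column of equal-length aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned))


def homology(seq, consensus_seq):
    """Number of positions at which seq matches the consensus."""
    return sum(a == b for a, b in zip(seq, consensus_seq))


# Made-up hexamers for illustration only, not taken from the compilation.
toy_alignment = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
toy_consensus = consensus(toy_alignment)
```

Under this scoring, a mutation that changes a matching base to a non-matching one lowers the homology count by one, which corresponds to a down-mutation in the rule quoted above.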

20.

Background  

Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. The shortest words absent from a genome have been addressed in two recent studies.
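Both notions mentioned above, words that occur exactly once and the shortest words that never occur, can be found by brute force on a small sequence. The sketch below is illustrative only: the function names are hypothetical, it considers one strand and ignores reverse complements, and enumerating all words of a given length does not scale to whole genomes.

```python
from collections import Counter
from itertools import product


def unique_words(seq, k):
    """Words of length k that occur exactly once in seq, sorted."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(word for word, n in counts.items() if n == 1)


def shortest_absent_words(seq, alphabet="ACGT", max_len=8):
    """Return (k, words): the smallest k with absent words, and those words.

    Returns (None, []) if every word up to max_len occurs in seq.
    """
    for k in range(1, max_len + 1):
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        absent = ["".join(letters)
                  for letters in product(alphabet, repeat=k)
                  if "".join(letters) not in present]
        if absent:
            return k, absent
    return None, []
```

For real genomes, an index structure such as a suffix array would typically replace the explicit enumeration; that choice is outside what the abstract specifies.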
