Found 20 similar articles; search time: 15 ms
1.
R H Stanley, N V Dokholyan, S V Buldyrev, S Havlin, H E Stanley 《Journal of biomolecular structure & dynamics》1999,17(1):79-87
We develop a quantitative method for analyzing repetitions of identical short oligomers in coding and noncoding DNA sequences. We analyze sequences presently available in GenBank separately for primate, mammal, vertebrate, rodent, invertebrate and plant taxonomic partitions. We find that some oligomers "cluster" more than they would if randomly distributed, while other oligomers "repel" each other. To quantify this degree of clustering, we define clustering measures. We find that (i) clustering significantly differs in coding and noncoding DNA; (ii) in most cases, monomers, dimers and tetramers cluster in noncoding DNA but appear to repel each other in coding DNA; (iii) the degree of clustering for different sources (primates, invertebrates, and plants) is more conserved among these sources in the case of coding DNA than in the case of noncoding DNA; (iv) in contrast to other oligomers, trimers always prefer to cluster; (v) clustering of each particular oligomer is conserved within the same organism.
2.
The entropies of protein coding genes from Escherichia coli were calculated according to Boltzmann's formula. Entropies of the coding regions were compared to the entropies of noncoding or miscoding ones. With nucleotides as code units, the entropies of the coding regions, when compared to the entropies of complete sequences (leader and coding region as well as trailer), were seen to be lower but with a marginal statistical significance. With triplets of nucleotides as code units, the entropies of correct reading frames were significantly lower than the entropies of frameshifts +1 and -1. With amino acids as code units, the results were opposite: Biologically functional proteins had significantly higher entropies than proteins translated from the frameshifted sequences. We attempt to explain this paradox with the hypothesis that the genetic code may have the ability of lowering information content (increasing entropy) of proteins while translating them from DNA. This ability might be beneficial to bacteria because it would make the functional proteins more probable (having a higher entropy) than nonfunctional proteins translated from frameshifted sequences.
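The triplet comparison described in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the abstract cites Boltzmann's formula without giving details, so the code below substitutes the Shannon entropy of the empirical triplet distribution as a stand-in. The function names (`shannon_entropy`, `triplets`, `frame_entropies`) are hypothetical, and offsets 1 and 2 are assumed to play the role of the +1 and -1 frameshifts.

```python
import math
from collections import Counter


def shannon_entropy(units):
    """Shannon entropy (in bits) of the empirical distribution of `units`.

    Assumption: used here as a stand-in for the entropy measure in the abstract.
    """
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def triplets(seq, frame=0):
    """Split `seq` into complete, non-overlapping triplets starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def frame_entropies(seq):
    """Triplet entropy for each of the three reading-frame offsets (0, 1, 2)."""
    return {frame: shannon_entropy(triplets(seq, frame)) for frame in (0, 1, 2)}
```

Calling `frame_entropies(coding_sequence)` on a real gene would give one entropy value per offset; whether the annotated frame comes out lowest, as the abstract reports for E. coli, depends on the data.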
3.
Relationship between pyrimidine distribution patterns and radiosensitivity (Z) of DNA molecules of different species was derived by computer analysis of recurrence frequency of pyrimidine clusters. Blocking factors (beta) and Z for coding and non-coding DNA sequences of species from different taxonomic classes have been calculated within a new model. The radiosensitivity of coding DNA sequences practically does not vary whereas Z values were increased during evolution from simplest to higher organisms. The beta and Z values calculated for several groups of individual genes were shown to vary considerably.
4.
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites.
5.
Detecting selection in noncoding regions of nucleotide sequences (total citations: 2; self-citations: 0; citations by others: 2)
We present a maximum-likelihood method for examining the selection pressure and detecting positive selection in noncoding regions using multiple aligned DNA sequences. The rate of substitution in noncoding regions relative to the rate of synonymous substitution in coding regions is modeled by a parameter zeta. When a site in a noncoding region is evolving neutrally zeta = 1, while zeta > 1 indicates the action of positive selection, and zeta < 1 suggests negative selection. Using a combined model for the evolution of noncoding and coding regions, we develop two likelihood-ratio tests for the detection of selection in noncoding regions. Data analysis of both simulated and real viral data is presented. Using the new method we show that positive selection in viruses is acting primarily in protein-coding regions and is rare or absent in noncoding regions.
6.
Characteristics of nucleotide blocks in coding and non-coding DNA sequences from different organisms
Iu A Sprizhitskiĭ, Iu D Nechipurenko, A A Aleksandrov, M V Vol'kenshteĭn 《Molekuliarnaia biologiia》1988,22(2):338-356
A statistical analysis of the occurrence of particular nucleotide runs (1 to 10 nucleotides long) in DNA sequences of different species has been carried out. There are considerable differences in run distributions in DNA sequences of prokaryotes, invertebrates and vertebrates. The distribution of various types of runs has been found to differ between coding and non-coding sequences. There is an abundance of short runs 1 to 2 nucleotides long in coding sequences, and a deficiency of such runs in the non-coding regions. However, some interesting exceptions to this rule exist: the run distribution of adenine in prokaryotes and the distribution of purine-pyrimidine runs in eukaryotes. This may be explained by the fact that the distribution of runs is predetermined by structural peculiarities of the entire DNA molecule. Runs of guanine or cytosine three to six nucleotides long occur predominantly in the non-coding DNA regions in eukaryotes, especially in vertebrates.
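As a rough illustration of the kind of run statistics this abstract describes, the sketch below tallies how often each nucleotide occurs in uninterrupted runs of a given length. It is a minimal example under stated assumptions, not the authors' method: the function name `run_length_distribution` and the `max_len` cutoff of 10 (chosen to mirror the run lengths mentioned in the abstract) are hypothetical.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq, max_len=10):
    """Count runs of identical nucleotides, keyed by base and then by run length.

    Runs longer than `max_len` are ignored (an illustrative choice).
    """
    dist = {}
    for base, group in groupby(seq):
        length = sum(1 for _ in group)  # length of this uninterrupted run
        if length <= max_len:
            dist.setdefault(base, Counter())[length] += 1
    return dist
```

For example, `run_length_distribution("AAGCCCA")` records one run of length 2 and one of length 1 for A, one run of length 1 for G, and one run of length 3 for C. Comparing such tables for coding and non-coding segments would be the next step in an analysis of this kind.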
7.
8.
9.
Background
The study of large-scale genome structure has revealed patterns suggesting the influence of evolutionary constraints on genome evolution. However, the results of these studies can be difficult to interpret due to the conceptual complexity of the analyses. This makes it difficult to understand how observed statistical patterns relate to the physical distribution of genomic elements. We use a simpler and more intuitive approach to evaluate patterns of genome structure.
Methodology/Principal Findings
We used randomization tests based on Morisita's Index of aggregation to examine average differences in the distribution of purines and pyrimidines among coding and noncoding regions of 261 chromosomes from 223 microbial genomes representing 21 phylum level groups. Purines and pyrimidines were aggregated in the noncoding DNA of 86% of genomes, but were only aggregated in the coding regions of 52% of genomes. Coding and noncoding DNA differed in aggregation in 94% of genomes. Noncoding regions were more aggregated than coding regions in 91% of these genomes. Genome length appears to limit aggregation, but chromosome length does not. Chromosomes from the same species are similarly aggregated despite substantial differences in length. Aggregation differed among taxonomic groups, revealing support for a previously reported pattern relating genome structure to environmental conditions.
Conclusions/Significance
Our approach revealed several patterns of genome structure among different types of DNA, different chromosomes of the same genome, and among different taxonomic groups. Similarity in aggregation among chromosomes of varying length from the same genome suggests that individual chromosome structure has not evolved independently of the general constraints on genome structure as a whole. These patterns were detected using simple and readily interpretable methods commonly used in other areas of biology.
10.
Recognition of coding regions within eukaryotic genomes is one of the oldest unsolved problems of bioinformatics. New high-accuracy methods of splice site recognition are needed to solve this problem. A question of current interest is how to identify specific features of nucleotide sequences near splice sites and recognize sites in their sequence context. We performed a statistical analysis of a database of human gene fragments and revealed some characteristics of nucleotide sequences in the neighborhood of splice sites. Frequencies of all nucleotides and dinucleotides in the vicinity of splice sites were computed, and nucleotides and dinucleotides with extremely high or low occurrences were identified. The statistical information obtained in this work can be used in further development of methods for splice site annotation and exon-intron structure recognition.
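The frequency tabulation mentioned in this abstract could look roughly like the following sketch. It assumes the input is a list of equal-length sequence windows already aligned on the splice site, which the abstract does not specify; the function name `positional_frequencies` and its parameter `k` (1 for nucleotides, 2 for dinucleotides) are hypothetical.

```python
from collections import Counter


def positional_frequencies(windows, k=1):
    """Relative frequency of each k-mer at each position of aligned windows.

    Assumption: all windows have the same length and are aligned on the site.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    freqs = []
    for pos in range(length - k + 1):
        counts = Counter(w[pos:pos + k] for w in windows)
        freqs.append({kmer: c / len(windows) for kmer, c in counts.items()})
    return freqs
```

With the toy input `["AGGT", "CGGT", "AGGT", "TGGC"]`, position 1 is G in every window, and the dinucleotide GT appears at position 2 in three of the four windows. Positions where one nucleotide or dinucleotide is strongly over- or under-represented relative to a background frequency are the ones such an analysis would flag.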
11.
Coding nucleotide sequences contain myriad functions independent of their encoded protein sequences. We present the COMIT
algorithm to detect functional noncoding motifs in coding regions using sequence conservation, explicitly separating nucleotide
from amino acid effects. COMIT concurs with diverse experimental datasets, including splicing enhancers, silencers, replication
motifs, and microRNA targets, and predicts many novel functional motifs. Intriguingly, COMIT scores are well-correlated to
scores uncalibrated for amino acids, suggesting that nucleotide motifs often override peptide-level constraints.
12.
Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of primates (total citations: 11; self-citations: 5; citations by others: 11)
Insertions and deletions are responsible for gaps in aligned nucleotide
sequences, but they have been usually ignored when the number of nucleotide
substitutions was estimated. We compared six sets of nuclear and
mitochondrial noncoding DNA sequences of primates and obtained the
estimates of the evolutionary rate of insertion and deletion. The
maximum-parsimony principle was applied to locate insertions and deletions
on a given phylogenetic tree. Deletions were about twice as frequent as
insertions for nuclear DNA, and single-nucleotide insertions and deletions
were the most frequent in all events. The rate of insertion and deletion
was found to be rather constant among branches of the phylogenetic tree,
and the rate (approximately 2.0/kb/Myr) for mitochondrial DNA was found to
be much higher than that (approximately 0.2/kb/Myr) for nuclear DNA. The
rates of nucleotide substitution were about 10 times higher than the rate
of insertion and deletion for both nuclear and mitochondrial DNA.
13.
14.
The coding function of nucleotide sequences can be discerned by statistical analysis (total citations: 6; self-citations: 0; citations by others: 6)
The nucleotide sequences of the RNA phage MS2 and the DNA phage φX were subjected to statistical analysis. This analysis alone indicates (a) that the genetic code is a non-overlapping triplet code and (b) what the correct reading frame is. The application of these methods to identify structure in sequences of unknown function is discussed.
15.
Determination of nucleotide sequences in DNA (total citations: 6; self-citations: 0; citations by others: 6)
Frederick Sanger 《Bioscience reports》1981,1(1):3-18
16.
17.
18.
19.
Prediction of gene sequences and their exon-intron structure in large eukaryotic genomic sequences is one of the central problems of mathematical biology. Solving this problem involves, in particular, high-accuracy splice site recognition. Using statistical analysis of a splice site-containing human gene fragment database, some characteristic features were described for nucleotide sequences in the splicing site neighborhood, the frequencies of all nucleotides and dinucleotides were determined, and those with frequencies increased or decreased in comparison to a random sequence were identified. The results can be used in sequence annotation, splicing site prediction, and the recognition of the gene exon-intron structure.
20.
Nucleotide sequence organization in the genome of maize has been studied using renaturation kinetics of DNA and S-1 nuclease digestion of the renatured products. Approximately 40% of the genome consists of single copy sequences, and 15% of these sequences are interspersed between repeated sequences and are approximately 1100 nucleotide pairs long. About 54% of the genome consists of repeated sequences. Six per cent of the genome consists of foldback sequences. These sequences are distributed through at least 44% of the genome. It was found using renaturation kinetics that the sum of foldback and highly repeated DNA fractions of Dobrudzhanko maize and inbred lines differ in the amount of DNA composing the fractions. Comparison of the DNA of the Dobrudzhanko maize and inbred lines by the method of DNA-DNA hybridization indicates strong differences in the amount of polynucleotide homologies between the Dobrudzhanko maize and the D1 inbred line on one hand and the A619 inbred line on the other hand.