Similar articles
20 similar articles found (search time: 15 ms)
1.
The distributions of the junction sequences of homooligomer tracts of various lengths have been examined in prokaryotic DNA sequences and compared with those of eukaryotes. The general trends in the nearest and next to nearest neighbors to the tracts are similar for both groups. In both prokaryotes and eukaryotes A/T runs are preferentially flanked on either the 5' or the 3' ends by A and/or T. G/C runs are preferentially flanked by G and/or C. There is discrimination against A/T runs flanked by G or C and G/C runs flanked by A or T. However, whereas the distribution of prokaryotic homooligomer tract junction sequences was quite homogeneous, large variations were observed in the 5-fold larger eukaryotic database, increasing in magnitude from tracts 2 to 3 to 4 base pairs long. Possible DNA conformational implications and in particular DNA curvature and packaging aspects of prokaryotes and eukaryotes are discussed.
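
The flank counting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the function name `tract_flanks`, the choice to count only runs of exactly the requested length, and the decision to skip runs touching the sequence ends are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def tract_flanks(seq, base, length):
    """Count (5' neighbor, 3' neighbor) pairs around runs of `base`.

    Hypothetical helper: only runs of exactly `length` bases are counted,
    and runs touching either end of the sequence are skipped because they
    lack one of the two flanks.
    """
    seq = seq.upper()
    counts = Counter()
    pos = 0
    for nucleotide, group in groupby(seq):
        run_len = sum(1 for _ in group)
        start, end = pos, pos + run_len
        pos = end
        if nucleotide != base or run_len != length:
            continue
        if start == 0 or end == len(seq):
            continue
        counts[(seq[start - 1], seq[end])] += 1
    return counts
```

Calling `tract_flanks(sequence, "A", 3)` on a real sequence gives raw flank counts only; judging preference or discrimination would still require comparing them with an expectation, for example from base composition, which this sketch does not attempt.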

2.
Here, we study the frequencies of occurrence of homooligomers flanked by one base, XnU or UXn, where X = A, C, G, T and U ≠ X. Specifically, we search for preferences (or discriminations) in their nearest neighbor doublet, VV. Extensive analysis of the database reveals striking patterns in such VVUXn or UXnVV oligomers (V = A, C, G, T). With very few exceptions, if the VV and Xn are composed of complementary nucleotides, those oligomers having a pyrimidine (Y)-purine (R) junction are preferred over those with an RY one. If the VV and Xn nucleotides are not complementary, the RY junction oligomers are preferred over their YR counterparts. These trends are observed consistently in eukaryotic and prokaryotic sequences. They are particularly striking in the YR > RY oligomers containing complementary nucleotides. The general preferences and discriminations described here are in the same direction as our previous results for homooligomer tracts. These recurrences, along with some additional universal "rules", aid in our understanding of the ordering of nucleotides in the DNA.

3.
Summary: The eukaryotic and prokaryotic databases are scanned for potential nearest-neighbor doublet preferences at the 5′ and 3′ flanks of some oligomers. Here we focus on oligomers containing alternating nucleotides, i.e., UV, UVUV, and UUVV where U ≠ V. Strong, consistent trends are observed in eukaryotic sequences. A/T alternation oligomers are preferentially flanked by A/T. G/C flanks are disfavored. G/C alternation oligomers are preferentially flanked by G/C. A/T flanks are disfavored. These trends are consistent with those observed previously for homooligomer tracts (Nussinov et al. 1989a,b). G/C tracts are preferentially flanked by G/C. A/T nearest neighbors are disfavored. The reverse holds for A/T tracts. Additional patterns are described here as well. The possible origin of these DNA composition and sequence trends is discussed. These trends are suggested to stem from protein-DNA interaction constraints.

4.
The melting of the coding and non-coding classes of natural DNA sequences was investigated using a program, MELTSIM, which simulates DNA melting based upon an empirically parameterized nearest neighbor thermodynamic model. We calculated T(m) results of 8144 natural sequences from 28 eukaryotic organisms of varying F(GC) (mole fraction of G and C) and of 3775 coding and 3297 non-coding sequences derived from those natural sequences. These data demonstrated that the T(m) vs. F(GC) relationships in coding and non-coding DNAs are both linear but have a statistically significant difference (6.6%) in their slopes. These relationships are significantly different from the T(m) vs. F(GC) relationship embodied in the classical Marmur-Schildkraut-Doty (MSD) equation for the intact long natural sequences. By analyzing the simulation results from various base shufflings of the original DNAs and the average nearest neighbor frequencies of those natural sequences across the F(GC) range, we showed that these differences in the T(m) vs. F(GC) relationships are largely a direct result of systematic F(GC)-dependent biases in nearest neighbor frequencies for those two different DNA classes. Those differences in the T(m) vs. F(GC) relationships and biases in nearest neighbor frequencies also appear between the sequences from multicellular and unicellular organisms in the same coding or non-coding classes, albeit of smaller but significant magnitudes.

5.
Previous studies of the dinucleotides flanking both the 5' and 3' ends of homooligomer tracts have shown that some flanks are consistently preferred over others (1,2). In the first preferred group, the homooligomer tracts are flanked by the same nucleotide and/or the complementary nucleotides, e.g., ATAn, TTAn, CCGn, where n = 2-5. Runs flanked by nucleotides with which they cannot base pair are distinctly disfavored. (In this group An/Tn are flanked by C and/or G; Gn/Cn are flanked by A/T, e.g., CGAn, TnGG, GnAT). The frequencies of runs flanked by A or T, and G or C ("mixed" group) are as expected. Here we seek the origin of this effect and its relevance to protein-DNA interactions. Surprisingly, within the first group, runs flanked by their complements with a pyrimidine-purine junction (e.g., TTAn, CnGG) are greatly preferred. The frequencies of their purine-pyrimidine junction mirror-images are just as expected. This effect, as well as additional ones enumerated below, is seen universally in eukaryotes and in prokaryotes, although it is stronger in the former. Detailed analysis of regulatory regions shows these strong trends, particularly in GC sequences. The potential relationship to DNA conformation and DNA-protein interaction is discussed.

6.
A statistical analysis of the occurrence of particular nucleotide runs (1–10 nucleotides long) in DNA sequences of different species has been carried out. There are considerable differences in run distributions in DNA sequences of prokaryotes, invertebrates and vertebrates. The distribution of various types of runs has been found to be different in coding and non-coding sequences. There is an abundance of short runs 1–2 nucleotides long in coding sequences, and there is a deficiency of such runs in the non-coding regions. However, some interesting exceptions to this rule exist: for the run distribution of adenine in prokaryotes and for the distribution of purine-pyrimidine runs in eukaryotes. This may be because the distribution of runs is predetermined by structural peculiarities of the entire DNA molecule. Runs of guanine or cytosine three to six nucleotides long occur predominantly in the non-coding DNA regions in eukaryotes, especially in vertebrates.
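
A run-length tally of the kind summarized above might be sketched as follows. This is an illustrative assumption, not the method of the paper: the names `run_length_distribution` and `PURINE_PYRIMIDINE` are hypothetical, and recoding bases into R/Y classes is only one possible reading of "purine-pyrimidine runs".

```python
from collections import Counter, defaultdict
from itertools import groupby

# Optional recoding of bases into purine (R) / pyrimidine (Y) classes.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def run_length_distribution(seq, max_len=10, recode=None):
    """Return {symbol: Counter({run_length: count})} for maximal runs.

    `max_len=10` mirrors the 1-10 nucleotide range in the abstract;
    longer runs are ignored. `recode` is an optional mapping applied to
    each base first (unknown bases become "N").
    """
    symbols = seq.upper()
    if recode is not None:
        symbols = "".join(recode.get(b, "N") for b in symbols)
    distribution = defaultdict(Counter)
    for symbol, group in groupby(symbols):
        length = sum(1 for _ in group)
        if length <= max_len:
            distribution[symbol][length] += 1
    return distribution
```

Running the function separately on coding and non-coding sequences, and then comparing the two tallies, would correspond to the comparison the abstract describes.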

7.
8.
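Characteristics of the 3-tuple distribution in coding and non-coding sequences   Total citations: 2, self-citations: 0, citations by others: 2
Fu Qiang  Qian Minping  Chen Liangbiao  Zhu Yuxian 《遗传学报 (Acta Genetica Sinica)》2005,32(10):1018-1026
The origin of non-coding sequences, especially introns, is an important unresolved question. We first computed the frequency distributions of 3-tuples in the different reading frames of coding and non-coding sequences from model organisms. In coding regions the different reading frames have very different 3-tuple distributions, whereas in non-coding regions the 3-tuple distributions of the different reading frames are nearly equal, and this property is not species-dependent. To describe the degree of difference between the distributions, we introduce a measure, the symmetric relative entropy (SRE). Comparing prokaryotes with eukaryotes, we found that prokaryotes have higher SRE values than eukaryotes in both coding and non-coding regions. Further analysis showed that the SRE value of an organism correlates with the percentage of its whole genome occupied by coding regions (correlation coefficient 0.86). Computer-simulated evolution experiments showed that 2% mutation is enough to turn the high SRE value typical of prokaryotic coding regions into the low SRE value characteristic of eukaryotic intron regions. Alignment of annotated intron and coding sequences in the databases confirmed that some intron sequences are indeed highly homologous to coding regions. These results indicate that at least some eukaryotic introns may have originated from coding sequences, and that SRE may be useful for studying the evolution of genomic sequences.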
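
To make the reading-frame comparison concrete, here is a hedged Python sketch of a symmetric relative entropy between 3-tuple distributions. The abstract does not give the formula, so several choices are assumptions: SRE is taken as half the sum of the two Kullback-Leibler divergences, logarithms are base 2, a pseudocount avoids zero frequencies, and the three frames are summarized by averaging over the three frame pairs. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def frame_distribution(seq, frame, pseudocount=1.0):
    """3-tuple frequencies for non-overlapping triplets starting at `frame`."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Triplets with ambiguous bases are ignored by summing over TRIPLETS only.
    total = sum(counts[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
    return {t: (counts[t] + pseudocount) / total for t in TRIPLETS}


def symmetric_relative_entropy(p, q):
    """Assumed definition: 0.5 * (KL(p||q) + KL(q||p)), in bits."""
    forward = sum(p[t] * math.log2(p[t] / q[t]) for t in TRIPLETS)
    backward = sum(q[t] * math.log2(q[t] / p[t]) for t in TRIPLETS)
    return 0.5 * (forward + backward)


def mean_frame_sre(seq):
    """Average SRE over the three pairs of reading frames (an assumption)."""
    dists = [frame_distribution(seq, frame) for frame in range(3)]
    pairs = list(combinations(dists, 2))
    return sum(symmetric_relative_entropy(p, q) for p, q in pairs) / len(pairs)
```

Under these assumptions, a sequence with strong three-base periodicity yields a large `mean_frame_sre`, while a sequence whose frames look alike yields a value near zero, which matches the qualitative contrast between coding and non-coding regions reported in the abstract.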

9.
Abstract

Previous studies of the dinucleotides flanking both the 5′ and 3′ ends of homooligomer tracts have shown that some flanks are consistently preferred over others (1,2). In the first preferred group, the homooligomer tracts are flanked by the same nucleotide and/or the complementary nucleotides, e.g., ATAn, TTAn, CCGn, where n=2–5. Runs flanked by nucleotides with which they cannot base pair are distinctly disfavored. (In this group An/Tn are flanked by C and/or G; Gn/Cn are flanked by A/T, e.g., CGAn, TnGG, GnAT). The frequencies of runs flanked by A or T, and G or C (“mixed” group) are as expected. Here we seek the origin of this effect and its relevance to protein-DNA interactions. Surprisingly, within the first group, runs flanked by their complements with a pyrimidine-purine junction (e.g., TTAn, CnGG) are greatly preferred. The frequencies of their purine-pyrimidine junction mirror-images are just as expected. This effect, as well as additional ones enumerated below, is seen universally in eukaryotes and in prokaryotes, although it is stronger in the former. Detailed analysis of regulatory regions shows these strong trends, particularly in GC sequences. The potential relationship to DNA conformation and DNA-protein interaction is discussed.

10.
The recent electron microscopic and biochemical mapping of Z-DNA sites in phi X174, SV40, pBR322 and PM2 DNAs has been used to determine two sets of criteria for identification of potential Z-DNA sequences in natural DNA genomes. The prediction of potential Z-DNA tracts and corresponding statistical analysis of their occurrence have been made on a sample of 14 DNA genomes. Alternating purine and pyrimidine tracts longer than 5 base pairs and their clusters (quasi alternating fragments) in the 14 genomes studied are under-represented compared to the expectation from corresponding random sequences. The fragments [d(G X C)]n and [d(C X G)]n (n ≥ 3) in general do not occur in circular DNA genomes and are under-represented in the linear DNAs of phages lambda and T7, whereas in linear genomes of adenoviruses they are strongly over-represented. With minor exceptions, potential Z-DNA sites are also under-represented compared to random sequences. In the 14 genomes studied, predicted Z-DNA tracts occur in non-coding as well as in protein coding regions. The predicted Z-DNA sites in phi X174, SV40, pBR322 and PM2 correspond well with those mapped experimentally. A complete listing together with a compact graphical representation of alternating purine-pyrimidine fragments and their Z-forming potential are presented.
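
The search for alternating purine-pyrimidine tracts longer than 5 base pairs can be sketched in Python as below. This covers only the simplest criterion named in the abstract; the two published sets of Z-DNA criteria and the "quasi alternating" clusters are not reproduced, and the function name `alternating_ry_tracts` is hypothetical.

```python
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def alternating_ry_tracts(seq, min_len=6):
    """Return (start, tract) for maximal strictly alternating R/Y tracts.

    `min_len=6` encodes "longer than 5 base pairs". Bases outside ACGT
    break a tract and are never part of one.
    """
    seq = seq.upper()
    classes = "".join(RY.get(b, "N") for b in seq)
    tracts = []
    start = 0
    for i in range(1, len(classes) + 1):
        at_end = i == len(classes)
        # A tract ends at the sequence end, next to an unknown base, or
        # where two consecutive bases fall in the same class.
        if (at_end or classes[i] == "N" or classes[i - 1] == "N"
                or classes[i] == classes[i - 1]):
            if i - start >= min_len and "N" not in classes[start:i]:
                tracts.append((start, seq[start:i]))
            start = i
    return tracts
```

Assessing under- or over-representation, as the abstract does, would additionally require counting such tracts in shuffled or random sequences of matching composition.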

11.
Summary: We have determined the secondary structure of the human 28S rRNA molecule based on comparative analysis of available eukaryotic cytoplasmic and prokaryotic large-rRNA gene sequences. Examination of large-rRNA sequences of both distantly and closely related species has enabled us to derive a structure that accounts both for highly conserved sequence tracts and for previously unanalyzed variable-sequence tracts that account for the evolutionary differences in size among the large rRNAs. Human 28S rRNA is composed of two different types of sequence tracts: conserved and variable. They differ in composition, degree of conservation, and evolution. The conserved regions demonstrate a striking constancy of size and sequence. We have confirmed that the conserved regions of large-rRNA molecules are capable of forming structures that are superimposable on one another. The variable regions contain the sequences responsible for the 83% increase in size of the human large-rRNA molecule over that of Escherichia coli. Their locations in the gene are maintained during evolution. They are G+C rich and largely nonhomologous, contain simple repetitive sequences, appear to evolve by frequent recombinational events, and are capable of forming large, stable hairpins. The secondary-structure model presented here is in close agreement with existing prokaryotic 23S rRNA secondary-structure models. The introduction of this model helps resolve differences between previously proposed prokaryotic and eukaryotic large-rRNA secondary-structure models.

12.
The K, I and S regions of the mouse major histocompatibility complex (MHC) are composed of long tracts of DNA which differ in sequence divergence. A correlation exists between the location of an MHC gene in a variable or conserved chromosomal tract and the degree of polymorphism and diversity of the proteins encoded by its alleles. Variable tracts appear to be the result of mechanisms which mutate certain coding and non-coding sequences to the same extent and selective pressures operating on the genes.

13.
P Bucher  G Yagil 《DNA sequence》1991,1(3):157-172
A program to analyse the length and frequency distribution of specific base tracts in genomic sequences is described. The frequency of oligopurine.oligopyrimidine tracts (R.Y. tracts) in a database of 163 transcribed genes is analysed and compared. The complete genomes of SV40 virus, N. tabacum chloroplast, yeast 2 micron plasmid, bacteriophage lambda, plasmid pBR322 and the E. coli lac operon are also analysed. A highly significant overrepresentation of oligopurine and oligopyrimidine tracts is observed in all eukaryotic genes examined, as well as in the chloroplast genome. The overrepresentation is evident in all gene subregions of the chloroplast, in the following order: intergenic regions, 3' downstream and 5' upstream (promoter), 5' and 3' untranslated, introns and coding regions. In genes coding for basic proteins, oligopurine rather than oligopyrimidine tracts are found on the coding strand. In prokaryotic genes only the longest R.Y. tracts (≥ 12) are found in excess, and are concentrated near regulatory regions. While a structural role for R.Y. tracts is most likely in intergenic regions, a functional role, as initiation sites for strand separation, is proposed for regulatory gene regions.

14.
In recent years, the amount of molecular sequencing data from Tetrahymena thermophila has dramatically increased. We analyzed G + C content, codon usage, initiator codon context and stop codon sites in the extremely A + T rich genome of this ciliate. Average G + C content was 38% for protein coding regions, 21% for 5' non-coding sequences, 19% for 3' non-coding sequences, 15% for introns, 19% for micronuclear limited sequences and 17% for macronuclear retained sequences flanking micronuclear specific regions. The 75 available T. thermophila protein coding sequences favored codons ending in T and, where possible, avoided those with G in the third position. Highly expressed genes were relatively G + C-rich and exhibited an extremely biased pattern of codon usage while developmentally regulated genes were more A + T-rich and showed less codon usage bias. Regions immediately preceding Tetrahymena translation initiator codons were generally A-rich. For the 60 stop codons examined, the frequency of G in the end + 1 site was much higher than expected whereas C never occupied this position.

15.
Interpolated Markov chains for eukaryotic promoter recognition.   Total citations: 9, self-citations: 0, citations by others: 9
MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.

16.
17.
18.
19.
The mature mRNA always carries nucleotide sequences that faithfully mirror the protein product according to the rules of the genetic code. However, in the chromosome, the nucleotide sequence that represents a certain protein is interrupted by additional sequences. Therefore, most eukaryotic genes are longer than their final mRNA products. The human genome project revealed that only a tiny portion of sequences serves as protein-coding region and almost one quarter of the genome is occupied by non-coding intervening sequences. The elimination of these non-coding regions from the precursor RNA in a process termed splicing must be extremely precise, because even a single nucleotide mistake may cause a fatal error. At present, two types of intervening sequences have been identified in protein-coding genes. One of them, the U2-dependent or major class, is prevalent and represents 99% of known sequences. The other one, the so-called U12-dependent or minor class of introns, is much less abundant in the genome. The basic problem of nuclear splicing concerns (i) the molecular mechanisms which ensure that the coding regions are correctly recognized and spliced together; (ii) the principles and mechanisms that guarantee the high fidelity of the splicing system; (iii) the differences in the excision mechanisms of the two classes of introns. We are going to present models explaining how intervening sequences are accurately removed and the coding regions correctly juxtaposed. The two splicing mechanisms will also be compared.

20.
To study possible involvement of polypurine and polypyrimidine DNA tracts capable of forming triple-stranded structures (the H-form of DNA) in compaction of eukaryotic chromosomes, an in silico search for complementary polypurine and polypyrimidine sequences was carried out within 12 eukaryotic genes. It was shown that, in chromosomal gene loci, 10–11 bp polypurine and polypyrimidine tracts potentially capable of interacting with each other with the formation of triplex structures (“structuring” regions) are located predominantly in introns and gene-flanking regions. In vivo, such DNA-DNA interactions can result in the chromosomal gene domain folding into several small loops. The character of the DNA triplex-mediated compaction of chromosomal gene loci may be related to gene functioning. A similar analysis of long (LINE) and short (SINE) interspersed repeat sequences, as well as of satellite DNA, showed essential resemblance between the compaction mechanisms of coding and noncoding chromosome regions.
