Similar Articles
20 similar articles found
1.
Investigating extended regulatory regions of genomic DNA sequences.   Total citations: 2, self-citations: 0, other citations: 2
MOTIVATION: Despite the growing volume of data on primary nucleotide sequences, the function of regulatory regions remains a major puzzle. Numerous recognition programs that consider a variety of properties of regulatory regions have been developed. The system proposed here reveals specific contextual, conformational and physico-chemical properties through analysis of extended DNA regions. RESULTS: We have developed RegScan, an Internet-accessible computer system designed to analyse the extended regulatory regions of eukaryotic genes. The system comprises the following software: (i) programs for classification, dividing a set of promoters into TATA-containing and TATA-less promoters and into promoters with and without CpG islands; (ii) programs for constructing (a) nucleotide frequency profiles, (b) sequence complexity profiles and (c) profiles of conformational and physico-chemical properties; (iii) a program for constructing sets of degenerate oligonucleotide motifs of a specified length; and (iv) a program for searching for and visualising repeats in nucleotide sequences. Using the system, we demonstrated the following characteristic patterns of vertebrate promoter regions: the TATA box region is flanked by regions with increased G+C content and increased bending stiffness, the TATA box content is asymmetric, and promoter regions are saturated with both direct and inverted repeats. AVAILABILITY: RegScan is available via the Internet at http://www.mgs.bionet.nsc.ru/Systems/RegScan and http://www.cbil.upenn.edu/mgs/systems/regscan/.
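The abstract does not describe how RegScan computes its profiles. As a rough illustration of item (ii)(a), here is a minimal sketch of a generic sliding-window G+C content profile, the kind of profile that would show a TATA box region flanked by G+C-rich regions. The function name, window size and step size are illustrative assumptions, not RegScan's parameters.

```python
def gc_profile(seq: str, window: int = 50, step: int = 10) -> list[tuple[int, float]]:
    """Return (window_start, G+C fraction) for each sliding window.

    Hypothetical helper: window/step defaults are arbitrary, not RegScan's.
    """
    seq = seq.upper()
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    # Slide a fixed-width window along the sequence and record its G+C fraction.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / window))
    return profile


if __name__ == "__main__":
    # Toy sequence: a G+C-rich flank, an A+T-rich core, another G+C-rich flank.
    toy = "GCGCGC" + "TATAAA" + "GCGCGC"
    for start, frac in gc_profile(toy, window=6, step=6):
        print(start, frac)
```

Real promoter analyses would average such profiles over many aligned promoters; this sketch only handles one sequence.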

2.
Finding composite regulatory patterns in DNA sequences   Total citations: 1, self-citations: 0, other citations: 1
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns, which correspond to relatively short contiguous strings. However, many actual regulatory signals are composite patterns: groups of monad patterns that occur near each other. One difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be 'too weak'. Because traditional monad-based motif-finding algorithms usually output only one (or a few) high-scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present MITRA (MIsmatch TRee Algorithm), an approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns through experiments on biological and synthetic data.
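To make the notion of a composite pattern concrete, here is a minimal brute-force sketch that counts how many sequences contain a given two-part (dyad) pattern, allowing mismatches in each monad and a variable spacer. This is not MITRA: MITRA searches the pattern space efficiently with a mismatch tree, whereas this sketch only scores one candidate that you supply. The function names and parameters are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_composite(seq: str, left: str, right: str,
                  min_gap: int, max_gap: int, max_mm: int) -> bool:
    """True if `left` and `right` each occur with <= max_mm mismatches,
    separated by a spacer of min_gap..max_gap bases (left before right)."""
    L, R = len(left), len(right)
    for i in range(len(seq) - L + 1):
        if hamming(seq[i:i + L], left) > max_mm:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + L + gap
            if j + R > len(seq):
                break  # larger gaps would run off the end too
            if hamming(seq[j:j + R], right) <= max_mm:
                return True
    return False


def composite_support(seqs: list[str], left: str, right: str,
                      min_gap: int, max_gap: int, max_mm: int) -> int:
    """Number of sequences containing at least one occurrence of the dyad."""
    return sum(has_composite(s, left, right, min_gap, max_gap, max_mm) for s in seqs)


if __name__ == "__main__":
    seqs = ["ACGTGGTTTT", "ACGAGGGTTAT", "GGGGGGGGGG"]
    print(composite_support(seqs, "ACGT", "TTTT", 2, 3, 1))
```

Each monad here may be individually weak (one mismatch allowed), but requiring both at a constrained distance makes the combined pattern more specific, which is the intuition the abstract describes.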

3.
4.
A genomic DNA sequence of Streptomyces strain ISP 5485 was cloned, sequenced and compared with corresponding entries in nucleic acid databanks. The DNA sequence was unique but showed homology to DNA encoding the condensing enzyme 2-oxoacyl synthase of the deoxyerythronolide B synthase (DEBS) complex from Saccharopolyspora erythraea NRRL 2338. A subfragment of the sequenced DNA, part of the putative 2-oxoacyl synthase gene, was used to construct a gene-specific probe. The PCR-amplified, labelled probe was used in hybridization experiments with 33 streptomycete strains producing different classes of antibiotics. The probe showed widespread homology with DNA considered to be part of analogous genes in the genomes of different polyketide producers. The implications of the probe's homology to bacterial chromosomal DNA are discussed.

5.
RsrI DNA methyltransferase (M.RsrI) from Rhodobacter sphaeroides has been purified to homogeneity, and its gene has been cloned and sequenced. This enzyme methylates the same central adenine residue in the duplex recognition sequence d(GAATTC) as does M.EcoRI. The molecular weight of the reduced, denatured RsrI methyltransferase (MTase) is 33,600 Da. A fragment of R. sphaeroides chromosomal DNA exhibited M.RsrI activity in E. coli and was used to sequence the rsrIM gene. The deduced amino acid sequence of M.RsrI shows partial homology to those of the type II adenine MTases HinfI and DpnA, the N4-cytosine MTases BamHI and PvuII, and the type III adenine MTases EcoP1 and EcoP15. In contrast to their corresponding isoschizomeric endonucleases, the RsrI and EcoRI MTases show very little homology in their deduced amino acid sequences. Either the EcoRI and RsrI restriction-modification systems were assembled independently from closely related endonuclease genes and more distantly related MTase genes, or the MTase genes have diverged more than their partner endonuclease genes. The rsrIM gene sequence has also been determined by Stephenson and Greene (Nucl. Acids Res. (1989) 17, this issue).
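As a small illustration of what "the central adenine in d(GAATTC)" refers to, here is a minimal sketch that finds GAATTC sites in a sequence and reports the top-strand position of the adenine that would be methylated (the second A of GAATTC, written GA*ATTC). Because GAATTC is palindromic, the bottom strand carries the same site, with its methylated adenine paired to the first T. The function name is a hypothetical helper, and the sketch says nothing about whether a site is actually methylated in vivo.

```python
import re

SITE = "GAATTC"  # recognition sequence shared by M.RsrI and M.EcoRI (palindromic)


def methylation_targets(seq: str) -> list[int]:
    """0-based top-strand positions of the central adenine (GA*ATTC) in each site.

    GAATTC cannot overlap itself, so a plain (non-overlapping) search is enough.
    """
    seq = seq.upper()
    return [m.start() + 2 for m in re.finditer(SITE, seq)]


if __name__ == "__main__":
    print(methylation_targets("TTGAATTCAAGAATTC"))
```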

6.
Regulatory DNA has no well-known properties analogous to those of coding sequences: the spatial location of regulatory elements is irregular, consensus regulatory elements are often degenerate, and there are no clear rules governing their evolution. This makes regulatory regions difficult to recognize within genomes. We review developments in the statistical characterization of regulatory regions and methods for recognizing them in eukaryotic genomes.

7.
MOTIVATION: Sequencing of complete eukaryotic genomes and large syntenic genome fragments makes it possible to apply genomic comparison to gene recognition. RESULTS: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in the Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume conservation of the exon-intron structure. Amino acid sequences obtained by formal translation of candidate exons are aligned instead of nucleotide sequences, which allows distant comparisons. The algorithm was tested on samples of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between actual and predicted genes, were obtained for the more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. This value increases to 95% if conservation of the exon-intron structure is assumed. The lower human-mouse figure is caused by extensive sequence conservation in non-coding regions of human and mouse genes, probably due to regulatory elements. AVAILABILITY: Pro-Gen v. 3.0 is available to academic researchers free of charge at http://www.anchorgen.com/pro_gen/pro_gen.html.
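Pro-Gen aligns amino acid sequences obtained by formally translating candidate exon chains. The sketch below shows only that translation step: splice a chain of exons given as half-open (start, end) coordinates and translate it with the standard genetic code. The exon coordinates and function name are hypothetical, and the alignment step itself is omitted; this is not Pro-Gen's code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate_exon_chain(genome: str, exons: list[tuple[int, int]]) -> str:
    """Splice half-open (start, end) exons in order, then translate codon by codon.

    A trailing incomplete codon is ignored; codons containing non-ACGT letters
    become 'X'; stop codons are shown as '*'.
    """
    cds = "".join(genome[s:e] for s, e in exons).upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


if __name__ == "__main__":
    # Two toy exons separated by a 6-bp "intron" (GTAAGT).
    genome = "ATGAAA" + "GTAAGT" + "GGGTTTTAA"
    print(translate_exon_chain(genome, [(0, 6), (12, 21)]))  # MKGF*
```

Translating before aligning is what lets distant homologs be compared: synonymous codon changes that scramble the nucleotide alignment disappear at the protein level.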

8.
In E. coli homologous recombination, a filament of RecA protein formed on DNA searches for and pairs with a homologous sequence within a second DNA molecule with remarkable speed and fidelity. Here, we directly probe the strength of the two-molecule interactions involved in homology search and recognition using dual-molecule manipulation that combines magnetic and optical tweezers. We find that the filament's secondary DNA-binding site interacts with a single strand of the incoming double-stranded DNA (dsDNA) during homology sampling. Recognition requires opening of the helix and is strongly promoted by unwinding torsional stress. Recognition is achieved when both strands of the incoming dsDNA bind to the filament's two ssDNA-binding sites, one strand to each. The data suggest a physical picture of homology recognition in which the fidelity of the search process is governed by the distance between the DNA-binding sites.

9.
Symmetry observations in long nucleotide sequences.   Total citations: 2, self-citations: 0, other citations: 2

10.
11.
Integration of retroviral DNA into the host chromosome requires a virus-encoded integrase (IN). IN recognizes, cuts and then joins specific viral DNA sequences (LTR ends) to essentially random sites in host DNA. We used computer-assisted protein alignments and mutagenesis in an attempt to localize these functions within the avian retroviral IN protein. A comparison of the deduced amino acid sequences of 80 retroviral/retrotransposon IN proteins reveals strong conservation of an HHCC N-terminal 'Zn finger'-like domain and of a central D(35)E region that shows striking similarities with sequences deduced for bacterial IS elements. We demonstrate that the HHCC region is not required for DNA binding but contributes to specific recognition of viral LTRs in the cutting and joining reactions. Deletions that extend into the D(35)E region destroy the ability of IN to bind DNA. We therefore propose that the D(35)E region may specify a DNA-binding/cutting domain that is conserved throughout evolution in enzymes with similar functions.
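Reading "D(35)E" in the usual way, as an aspartate and a glutamate separated by 35 intervening residues, the sketch below scans a protein sequence for every such spacing. This is only a naive pattern scan, assumed for illustration; the authors identified the region by multiple alignment of 80 IN sequences, and a regex hit by itself says nothing about function. The function name is hypothetical.

```python
import re

# D, exactly 35 residues of any kind, then E. The lookahead makes overlapping
# matches visible (each D is tested independently).
D35E = re.compile(r"(?=(D[A-Z]{35}E))")


def find_d35e(protein: str) -> list[int]:
    """0-based positions of every D followed, 36 positions later, by an E."""
    return [m.start() for m in D35E.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "GGD" + "A" * 35 + "EGG"  # synthetic sequence, not a real integrase
    print(find_d35e(toy))
```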

12.
Possible molecular detent in the DNA structure at regulatory sequences   Total citations: 10, self-citations: 0, other citations: 10
A common feature of a number of DNA sites where proteins interact is the sequence GTG/CAC. In the lac operator, this sequence gives rise to a region with a higher imino proton exchange rate well below the optical melting temperature. It is suggested that this reflects a structural feature recognized by proteins that bind specific sites on the DNA molecule.
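GTG/CAC denotes one base-paired trinucleotide written on both strands: a CAC on the top strand is a GTG on the bottom strand. A minimal sketch for locating such sites on either strand follows; the function name is hypothetical, and a sequence match does not imply the structural feature (faster imino proton exchange) that the abstract reports for the lac operator.

```python
def find_gtg_cac(seq: str) -> list[tuple[int, str]]:
    """Return (0-based top-strand position, strand) for each GTG/CAC site.

    '+' means GTG reads on the top strand; '-' means the top strand reads CAC,
    i.e. GTG reads 5'->3' on the bottom strand.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri == "GTG":
            hits.append((i, "+"))
        elif tri == "CAC":
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_gtg_cac("AGTGACACA"))
```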

13.
14.
Use of runs statistics for pattern recognition in genomic DNA sequences.   Total citations: 2, self-citations: 0, other citations: 2
In this article, we introduce the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM). With the goal of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work investigates a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented to determine the exact distribution of this bivariate runs statistic under an independent and identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. Using this algorithm, we studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme provides good initial estimates to jump-start the expectation-maximization (EM) algorithm for HMM parameter estimation and helps prevent the EM estimates from converging to a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles, in conjunction with binary HMMs, to pattern recognition in genomic DNA sequences are illustrated through case studies of DNA bendability signals in human DNA data.
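The bivariate statistic under a binary HMM is beyond a short example, but the core FMCI idea can be shown on a simpler, assumed case: the exact distribution of the number of maximal success runs of length at least k in n IID Bernoulli(p) trials. The imbedded chain tracks (runs counted so far, current run length capped at k), and the distribution is propagated one trial at a time. The choice of statistic and the function name are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict


def runs_at_least_k_distribution(n: int, p: float, k: int) -> dict[int, float]:
    """Exact P(number of maximal success runs of length >= k = m) for
    n IID Bernoulli(p) trials, via a finite Markov chain imbedding.

    Imbedded state: (runs counted so far, current run length capped at k).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    dist = {(0, 0): 1.0}  # state before the first trial
    for _ in range(n):
        nxt = defaultdict(float)
        for (count, run), prob in dist.items():
            # Success: extend the run; count it once, when it first reaches k.
            new_count = count + (1 if run + 1 == k else 0)
            nxt[(new_count, min(run + 1, k))] += prob * p
            # Failure: the current run ends.
            nxt[(count, 0)] += prob * (1 - p)
        dist = nxt
    # Marginalize out the run-length component of the state.
    out = defaultdict(float)
    for (count, _), prob in dist.items():
        out[count] += prob
    return dict(out)


if __name__ == "__main__":
    # n=3, p=0.5, k=2: of the 8 outcomes, 011, 110 and 111 contain one such run.
    print(runs_at_least_k_distribution(3, 0.5, 2))
```

Under a Markov chain or HMM framework the same forward propagation applies, but the imbedded state must also carry the previous symbol or the hidden state.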

15.
16.
An algorithm is proposed for extracting regulatory signals from DNA sequences. The algorithm's complexity is nearly quadratic. Results of testing the algorithm on artificial and natural sequences are presented.

17.
18.
We present data on the frequencies of nucleotides and nucleotide substitutions in conserved DNA regions involved in the regulation of gene expression. Prokaryotic and eukaryotic data are considered separately. In both cases, the DNA strands complementary to those that serve as templates for RNA polymerase have low cytosine frequencies. The most conserved positions also have an increased frequency of adenine. Different substitutions in series of homologous regulatory DNA sequences, relative to their consensus sequences, occur at different frequencies. In prokaryotes, guanine in a consensus sequence is substituted at the lowest frequency and adenine at the highest, whereas in eukaryotes cytosine is substituted at the lowest frequency and guanine at the highest. In both cases, the substituted nucleotides are most frequently replaced with cytosine. Deviations from consensus sequences tend to cluster at adjacent positions: the more pronounced the consequences of a nucleotide substitution, the higher the frequency of substitutions at adjacent positions. Possible explanations for these phenomena are discussed.
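The abstract does not give the authors' exact procedure, but the basic tabulation it describes can be sketched as: build a majority-rule consensus from aligned sites of equal length, then count each deviation as a (consensus base, observed base) pair. The sites in the usage example are made-up toy data, and the tie-breaking rule is an arbitrary assumption.

```python
from collections import Counter


def consensus(sites: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sites.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    return "".join(
        min(Counter(col).items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for col in zip(*sites)
    )


def substitution_counts(sites: list[str], cons: str) -> Counter:
    """Count (consensus base -> observed base) at every deviating position."""
    counts = Counter()
    for site in sites:
        for c, o in zip(cons, site):
            if c != o:
                counts[(c, o)] += 1
    return counts


if __name__ == "__main__":
    toy_sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]  # toy data only
    cons = consensus(toy_sites)
    print(cons, dict(substitution_counts(toy_sites, cons)))
```

Dividing each count by the number of occurrences of the consensus base would give the per-nucleotide substitution frequencies the abstract compares.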

19.
Preferential psoralen photobinding sites were mapped in vitro on restriction fragments spanning the SV40 origin region and surrounding sequences using a new fine-structure analysis technique. Purified DNA fragments were photoreacted with ³H-5-methylisopsoralen (³H-5-MIP), a psoralen derivative that forms only monoadducts. The fragments were then end-labeled and digested with lambda exonuclease, a 5' processive enzyme that we have determined pauses at 5-MIP monoadducts. When photobinding sites were mapped on denaturing sequencing gels, 5-MIP was observed to bind preferentially to 5'-TA sites and, to a lesser degree, to 5'-AT sites. Using this approach, we identified a psoralen-hypersensitive region in which the binding sites were much stronger than those in the surrounding sequences. This region extends from 150 base pairs (bp) on the late side of the enhancers to the early enhancer/promoter boundary. We suggest that this region contains a sequence-directed structural alteration of the DNA helix that can be detected by the psoralen mapping approach described.
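A minimal sketch for listing the candidate dinucleotides named in the abstract (5'-TA, preferred, and 5'-AT, weaker) along a sequence. Both dinucleotides are self-complementary, so scanning one strand also covers the other. This only enumerates sequence candidates; it does not predict binding strength, which, as the hypersensitive region shows, also depends on local DNA structure. The function name is hypothetical.

```python
def psoralen_candidate_sites(seq: str) -> dict[str, list[int]]:
    """0-based positions of 5'-TA and 5'-AT dinucleotides (read 5'->3')."""
    seq = seq.upper()
    sites: dict[str, list[int]] = {"TA": [], "AT": []}
    for i in range(len(seq) - 1):
        di = seq[i:i + 2]
        if di in sites:
            sites[di].append(i)
    return sites


if __name__ == "__main__":
    print(psoralen_candidate_sites("GTATAC"))
```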

20.
A speculative model, based on published sequences, is suggested to explain the specific binding of E. coli RNA polymerase to promoters:
[Figure: promoter pattern diagram (circles and squares marking positions), not reproduced]
A 24-base-pair fragment to the left of the initiation site must not contain G at the positions indicated by circles and must contain T (or G) at the positions marked by squares. In most known cases, the circle and square patterns are arranged relative to each other as shown in the figure. In one case (the fd G3 promoter), the square pattern is shifted 4 base pairs to the right relative to the pattern of universal non-G positions (circles).


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417