Similar articles
 Found 20 similar articles; search took 31 ms
1.

Background  

Extracting motifs from sequences is a mainstay of bioinformatics. We look at the problem of mining structured motifs, which allow variable length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that, given one or more sequences and a structured motif template, extracts all frequent structured motifs that have quorum q. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences.
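To make the problem statement concrete, the following is a minimal sketch of what it means for a structured motif to meet a quorum. It is not the EXMOTIF algorithm itself: it only checks one given candidate by translating it into a regular expression. The function names, the representation of gaps as `(min_gap, max_gap)` pairs, and the reading of the quorum q as "the number of sequences containing at least one occurrence" are all assumptions made for illustration.

```python
import re

def structured_motif_regex(components, gap_ranges):
    """Build a regex for simple components separated by variable-length gaps.

    components: list of simple motif strings, e.g. ["TTGACA", "TATAAT"]
    gap_ranges: list of (min_gap, max_gap) pairs, one per adjacent pair
                of components (hypothetical representation)
    """
    if len(gap_ranges) != len(components) - 1:
        raise ValueError("need one gap range between each pair of components")
    parts = [re.escape(components[0])]
    for (lo, hi), comp in zip(gap_ranges, components[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # gap of lo..hi arbitrary symbols
        parts.append(re.escape(comp))
    return re.compile("".join(parts))

def support(pattern, sequences):
    """Count the sequences that contain at least one occurrence."""
    return sum(1 for s in sequences if pattern.search(s))

def meets_quorum(components, gap_ranges, sequences, q):
    """True if the structured motif occurs in at least q sequences (assumed quorum definition)."""
    return support(structured_motif_regex(components, gap_ranges), sequences) >= q
```

A mining algorithm has to enumerate candidate motifs that fit a template rather than test a single known one, which is where the efficiency concerns raised in the abstract come in.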

2.
Detection of functional DNA motifs via statistical over-representation   Cited by: 14 (self-citations: 0, citations by others: 14)

3.
RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.

4.
MOTIVATION: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs? RESULTS: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. 
The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.

5.
6.
Finding motifs in the twilight zone   Cited by: 8 (self-citations: 0, citations by others: 8)

7.
8.
Lu CH, Lin YS, Chen YC, Yu CS, Chang SY, Hwang JK. Proteins 2006, 63(3):636-643
Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to the progress of the structural genomics initiatives. Although certain structural patterns such as the Asp-His-Ser catalytic triad are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect a general structural motif such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm combining both structural and sequence information to identify the local structure motifs. We applied our method to the following examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acid and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation through detecting structural motifs associated with particular functions.

9.
Mining frequent stem patterns from unaligned RNA sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: In the detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably in both accuracy and efficiency with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.

10.
MOTIVATION: Identification of motifs is one of the critical stages in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are either designed for monad motifs (short contiguous patterns with some mismatches), make assumptions about the spacer lengths, or can handle at most two segments. An effective motif finder for generic spaced motifs is highly desirable. RESULTS: This article proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments in the spaced motif and formulate the motif-finding problem as a frequent submotif mining problem. We provide an algorithm called SPACE to solve the problem. Based on experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al., we show that our algorithm performs better than existing tools for spaced motifs, with improvements in both sensitivity and specificity, and that for monads SPACE performs as well as other tools. AVAILABILITY: The source code is available upon request from the authors.

11.
Identification of two novel arginine binding DNAs.   Cited by: 5 (self-citations: 0, citations by others: 5)
K Harada, A D Frankel. The EMBO Journal 1995, 14(23):5798-5811
RNA tertiary structure is known to play critical roles in RNA-protein recognition and RNA function. To examine how DNA tertiary structure might relate to RNA structure, we performed in vitro selection experiments to identify single-stranded DNAs that specifically bind arginine, and compared the results with analogous experiments performed with RNA. In the case of RNA, a motif related to the arginine binding site in human immunodeficiency virus TAR RNA was commonly found, whereas in the case of DNA, two novel motifs and no TAR-like structures were found. One DNA motif, found in approximately 40% of the cloned sequences, forms a hairpin structure with a highly conserved 10 nucleotide loop, whereas the second motif is especially rich in G residues. Chemical interference and mutagenesis experiments identified nucleotides in both motifs that form specific arginine binding sites, and dimethylsulfate footprinting experiments identified single guanine residues in both that are protected from methylation in the presence of arginine, suggesting possible sites of arginine contact or conformational changes in the DNAs. Circular dichroism experiments indicated that both DNAs undergo conformational changes upon arginine binding and that the arginine guanidinium group alone is responsible for binding. A model for the G-rich motif is proposed in which mixed guanine and adenine quartets may form a novel DNA structure. Arginine binding DNAs and RNAs should provide useful model systems for studying nucleic acid tertiary structure.

12.
The archaeal intron-encoded homing enzymes I-PorI and I-DmoI belong to a family of endonucleases that contain two copies of a characteristic LAGLIDADG motif. These endonucleases cleave their intron- or intein- alleles site-specifically, and thereby facilitate homing of the introns or inteins which encode them. The protein structure and the mechanism of DNA recognition of these homing enzymes are largely unknown. Therefore, we examined these properties of I-PorI and I-DmoI by protein footprinting. Both proteins were susceptible to proteolytic cleavage within regions that are equidistant from each of the two LAGLIDADG motifs. When complexed with their DNA substrates, a characteristic subset of the exposed sites, located in regions immediately after and 40-60 amino acids after each of the LAGLIDADG motifs, were protected. Our data suggest that the enzymes are structured into two tandemly repeated domains, each containing both the LAGLIDADG motif and two putative DNA binding regions. The latter contain a potentially novel DNA binding motif conserved in archaeal homing enzymes. The results are consistent with a model where the LAGLIDADG endonucleases bind to their non-palindromic substrates as monomeric enzymes, with each of the two domains recognizing one half of the DNA substrate.

13.
14.
15.
Analyzing protein-DNA recognition mechanisms   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a computational algorithm that can be used to analyze the generic mechanisms involved in protein-DNA recognition. Our approach is based on energy calculations for the full set of base sequences that can be threaded onto the DNA within a protein-DNA complex. It is able to reproduce experimental consensus binding sequences for a variety of DNA binding proteins and also correlates well with the order of measured binding free energies. These results suggest that the crystal structure of a protein-DNA complex can be used to identify all potential binding sequences. By analyzing the energy contributions that lead to base sequence selectivity, it is possible to quantify the concept of direct versus indirect recognition and to identify a new concept describing whether the protein-DNA interaction and DNA deformation terms select optimal binding sites by acting in accord or in disaccord.

16.
17.
The sap1 gene from Schizosaccharomyces pombe, which is essential for mating-type switching and for growth, encodes a sequence-specific DNA-binding protein with no homology to other known proteins. We have used a reiterative selection procedure to isolate binding sites for sap1, using a bacterially expressed protein and randomized double-strand oligonucleotides. The sap1 homodimer preferentially selects a pentameric motif, TA(A/G)CG, organized as a direct repeat and spaced by 5 nucleotides. Removal of a C-terminal dimerization domain abolishes recognition of the direct repeat and creates a new specificity for a DNA sequence containing the same pentameric motif but organized as an inverted repeat. We present evidence that the orientation of the DNA-binding domain is controlled by two independent oligomerization interfaces. The C-terminal dimerization domain allows a head-to-tail organization of the DNA-binding domains in solution, while an N-terminal domain is involved in a cooperative interaction on the DNA target between pairs of dimers.
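The direct-repeat site described in this abstract can be written as a short pattern. The sketch below is only an illustration of that description, not a method from the paper. It assumes that "spaced by 5 nucleotides" means exactly five arbitrary bases between the two copies of TA(A/G)CG, and the function name is hypothetical.

```python
import re

# TA(A/G)CG, then five arbitrary nucleotides, then TA(A/G)CG again.
# The spacer reading (exactly 5 bases between the copies) is an assumption.
# The lookahead lets overlapping occurrences be reported as well.
DIRECT_REPEAT = re.compile(r"(?=(TA[AG]CG[ACGT]{5}TA[AG]CG))")

def find_direct_repeats(seq):
    """Return (start, matched_site) pairs for every direct-repeat site in seq."""
    return [(m.start(), m.group(1)) for m in DIRECT_REPEAT.finditer(seq.upper())]
```

The inverted-repeat specificity mentioned in the abstract is not sketched here because the abstract gives no spacing for it.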

18.
High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However, this approach often fails: many bound regions do not contain the ‘expected’ motif. This is because binding DNA directly at its recognition site is not the only way the protein can cause the region to immunoprecipitate. Its binding specificity can change through association with different co-factors, it can bind DNA indirectly, through intermediaries, or even enforce its function through long-range chromosomal interactions. Conventional motif discovery methods, though largely capable of identifying overrepresented motifs from bound regions, lack the ability to characterize such diverse modes of protein–DNA binding and binding specificities. We present a novel Bayesian method that identifies distinct protein–DNA binding mechanisms without relying on any motif database. The method successfully identifies co-factors of proteins that do not bind DNA directly, such as mediator and p300. It also predicts literature-supported enhancer–promoter interactions. Even for well-studied direct-binding proteins, this method provides compelling evidence for previously uncharacterized dependencies within positions of binding sites, long-range chromosomal interactions and dimerization.

19.
As part of our analysis of the role of a uniquely clustered set of dam methylation sites (the motif GATC) within the origin of DNA replication in Escherichia coli, we have studied the effect of GATCs in various methylation states on the intrinsic curvature of DNA. We have designed a set of DNA linkers and used commercially available linkers containing GATC motifs. The linkers were ligated and the electrophoretic mobility of the resulting multimers in different states of methylation was tested relative to reference fragments. We report that properly phased GATCs in certain sequence environments modulate DNA curvature and that these effects may be enhanced by N6-adenine methylation of the GATCs. These structural alterations may in turn affect DNA-protein interactions, especially those involving proteins that rely on both primary sequence and structure for recognition. We present an example, where introduction of a GATC within an integration host factor (IHF) binding site, which does not alter the consensus sequence, reduces the binding affinity of the protein for the modified site. Received: 16 December 1997 / Accepted: 24 February 1998
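As a small illustration of what a clustered or phased set of GATC sites looks like in sequence terms, the sketch below lists GATC positions and the distances between consecutive sites. It is not taken from the paper: the function names are hypothetical, and how a spacing relates to "properly phased" is left to the reader, since the abstract does not define the phasing criterion.

```python
def gatc_sites(seq):
    """Return the 0-based start positions of every GATC in seq."""
    seq = seq.upper()
    sites = []
    i = seq.find("GATC")
    while i != -1:
        sites.append(i)
        i = seq.find("GATC", i + 1)
    return sites

def spacings(sites):
    """Return start-to-start distances between consecutive sites."""
    return [b - a for a, b in zip(sites, sites[1:])]
```

Because GATC is its own reverse complement, scanning one strand is enough to find the sites on both.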

20.
GAME: detecting cis-regulatory elements using a genetic algorithm   Cited by: 3 (self-citations: 0, citations by others: 3)
