首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
3.
ABSTRACT: BACKGROUND: Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3-10 amino acids in length that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. RESULTS: The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME is equivalent to that of regular expression methods. However, the two approaches returned different subsets within both a benchmark dataset, and a more realistic discovery dataset. CONCLUSIONS: Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.  相似文献   

4.
5.
RSIR: regularized sliced inverse regression for motif discovery   总被引:3,自引:0,他引:3  
  相似文献   

6.
7.
8.
We examine the problem of extracting maximal irredundant motifs from a string. A combinatorial argument poses a linear bound on the total number of such motifs, thereby opening the way to the quest for the fastest and most efficient methods of extraction. The basic paradigm explored here is that of iterated updates of the set of irredundant motifs in a string under consecutive unit symbol extensions of the string itself. This approach exposes novel characterizations for the base set of motifs in a string, hinged on notions of partial order. Such properties support the design of ad hoc data structures and constructs, and lead to develop an O(n(3)) time incremental discovery algorithm.  相似文献   

9.
10.
11.
12.
Structure motif discovery and mining the PDB   总被引:2,自引:0,他引:2  
MOTIVATION: Many of the most interesting functional and evolutionary relationships among proteins are so ancient that they cannot be reliably detected through sequence analysis and are apparent only through a comparison of the tertiary structures. The conserved features can often be described as structural motifs consisting of a few single residues or Secondary Structure (SS) elements. Confidence in such motifs is greatly boosted when they are found in more than a pair of proteins. RESULTS: We describe an algorithm for the automatic discovery of recurring patterns in protein structures. The patterns consist of individual residues having a defined order along the protein's backbone that come close together in the structure and whose spatial conformations are similar. The residues in a pattern need not be close in the protein's sequence. The work described in this paper builds on an earlier reported algorithm for motif discovery. This paper describes a significant improvement of the algorithm which makes it very efficient. The improved efficiency allows us to use it for doing unsupervised learning of patterns occurring in small subsets in a large set of structures, a non-redundant subset of the Protein Data Bank (PDB) database of all known protein structures.  相似文献   

13.
14.
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.  相似文献   

15.
16.
17.
18.
We developed Trawler, the fastest computational pipeline to date, to efficiently discover over-represented motifs in chromatin immunoprecipitation (ChIP) experiments and to predict their functional instances. When we applied Trawler to data from yeast and mammals, 83% of the known binding sites were accurately called, often with other additional binding sites, providing hints of combinatorial input. Newly discovered motifs and their features (identity, conservation, position in sequence) are displayed on a web interface.  相似文献   

19.
Limitations and potentials of current motif discovery algorithms   总被引:10,自引:1,他引:9       下载免费PDF全文
Hu J  Li B  Kihara D 《Nucleic acids research》2005,33(15):4899-4913
  相似文献   

20.
MOTIVATION: Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. RESULTS: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acids sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. AVAILABILITY: Gemoda is freely available at http://web.mit.edu/bamel/gemoda  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号