Similar articles
 20 similar articles found (search time: 15 ms)
1.
Subtle motifs: defining the limits of motif finding algorithms   Cited by: 4 (self-citations: 0; citations by others: 4)
MOTIVATION: What constitutes a subtle motif? Intuitively, it is a motif that is almost indistinguishable, in the statistical sense, from random motifs. This question has important practical consequences: consider, for example, a biologist who is generating a sample of upstream regulatory sequences with the goal of finding a regulatory pattern that is shared by these sequences. If the sequences are too short, one risks losing some of the regulatory patterns that are located further upstream. Conversely, if the sequences are too long, the motif becomes too subtle and one is then likely to encounter random motifs that are at least as statistically significant as the regulatory pattern itself. In practical terms, one would like to recognize the sequence length threshold, or the twilight zone, beyond which the motifs are in some sense too subtle. RESULTS: The paper defines the motif twilight zone, where every motif finding algorithm would be exposed to random motifs that are as significant as the one being sought. We also propose an objective tool for evaluating the performance of subtle motif finding algorithms. Finally, we apply these tools to evaluate the success of our MULTIPROFILER algorithm in detecting subtle motifs.
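The effect of sequence length described above can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not the paper's definition of the twilight zone; it is a common approximation that assumes a uniform, independent DNA background and treats positions and sequences as independent. The function names and the parameters `l` (motif length), `d` (allowed mismatches), `t` (number of sequences) and `n` (sequence length) are illustrative choices.

```python
from math import comb

def neighbor_prob(l, d):
    """Probability that a random l-mer is within d mismatches of a fixed
    pattern, assuming uniform i.i.d. nucleotides (an assumption)."""
    return sum(comb(l, i) * 0.75**i * 0.25**(l - i) for i in range(d + 1))

def expected_random_motifs(l, d, t, n):
    """Approximate expected number of l-mer patterns that occur, with at
    most d mismatches, in every one of t random sequences of length n."""
    p = neighbor_prob(l, d)
    # Chance that at least one of the n - l + 1 windows of a sequence matches,
    # treating overlapping windows as independent (an approximation).
    hit = 1.0 - (1.0 - p) ** (n - l + 1)
    return 4**l * hit**t

if __name__ == "__main__":
    # Longer sequences give random patterns more chances to look like motifs.
    for n in (300, 600, 1000, 2000):
        print(n, expected_random_motifs(l=15, d=4, t=20, n=n))
```

Under these assumptions, the estimate grows with `n`: once it reaches the order of one or more, random patterns as strong as the planted one are expected by chance alone, which is the intuition behind a length threshold.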

2.
3.

Background  

An important class of interaction switches for biological circuits and disease pathways are short binding motifs. However, the biological experiments needed to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif extraction algorithms to sets of protein sequences, each set interacting with either a common protein or a group of proteins with similar properties. The underlying assumption is that proteins with common interacting partners will share some common binding motifs. Although novel binding motifs have been discovered with such an approach, it is not applicable if a protein interacts with very few other proteins, or when prior knowledge of protein groups is unavailable or erroneous. Experimental noise in the input interaction data can further degrade the already poor performance of such approaches.

4.
Many aspects of cell signalling, trafficking, and targeting are governed by interactions between globular protein domains and short peptide segments. These domains often bind multiple peptides that share a common sequence pattern, or “linear motif” (e.g., SH3 binding to PxxP). Many domains are known, though comparatively few linear motifs have been discovered. Their short length (three to eight residues) and the fact that they often reside in disordered regions of proteins make them difficult to detect through sequence comparison or experiment. Nevertheless, each new motif provides critical molecular details of how interaction networks are constructed, and can explain how one protein is able to bind to very different partners. Here we show that binding motifs can be detected using data from genome-scale interaction studies, thus avoiding the normally slow discovery process. Our approach, based on motif over-representation in non-homologous sequences, rediscovers known motifs and predicts dozens of others. Direct binding experiments reveal that two predicted motifs are indeed protein-binding modules: a DxxDxxxD protein phosphatase 1 binding motif with a KD of 22 μM and a VxxxRxYS motif that binds Translin with a KD of 43 μM. We estimate that there are dozens or even hundreds of linear motifs yet to be discovered that will give molecular insight into protein networks and greatly illuminate cellular processes.
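The idea of motif over-representation can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' scoring scheme: patterns use `x` as a wildcard (as in PxxP), the sequences are assumed to be non-homologous already, and the per-sequence background probability `p_background` is a value the user must supply from some reference set. All function names are hypothetical.

```python
import re
from math import comb

def motif_to_regex(motif):
    """Turn a pattern such as 'PxxP' into a regex, with 'x' as any residue."""
    return re.compile("".join("." if c == "x" else re.escape(c) for c in motif))

def count_hits(motif, sequences):
    """Number of sequences that contain the motif at least once."""
    rx = motif_to_regex(motif)
    return sum(1 for s in sequences if rx.search(s))

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    motif-containing sequences out of n if each matches with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    # Toy sequences standing in for the partners of one interaction hub.
    partners = ["AAPLLPAA", "GGGGGGGG", "MKPAAPTT", "PSSPQQQQ"]
    k = count_hits("PxxP", partners)
    # p_background = 0.2 is an arbitrary illustrative value.
    print(k, binomial_tail(k, len(partners), 0.2))
```

A small tail probability suggests the pattern occurs in more interaction partners than the background rate would predict; a real analysis would also need to correct for the many patterns tested.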

5.
6.
7.
8.
9.
10.
The human immunodeficiency virus type 1 virulence protein Nef interacts with the endosomal sorting machinery via a leucine-based motif. Similar sequences within the cytoplasmic domains of cellular transmembrane proteins bind to the adaptor protein (AP) complexes of coated vesicles to modulate protein traffic, but the molecular basis of the interactions between these motifs and the heterotetrameric complexes is controversial. To identify the target of the Nef leucine motif, the native sequence was replaced with either leucine- or tyrosine-based AP-binding sequences from cellular proteins, and the interactions with AP subunits were correlated with function. Tyrosine motifs predictably modulated the interactions between Nef and the mu subunits of AP-1, AP-2, and AP-3; heterologous leucine motifs caused little change in these interactions. Conversely, leucine motifs mediated a ternary interaction between Nef and hemicomplexes containing the sigma1 plus gamma subunits of AP-1 or the sigma3 plus delta subunits of AP-3, whereas tyrosine motifs did not. Similarly, only leucine motifs supported the Nef-mediated association of AP-1 and AP-3 with endosomal membranes in cells treated with brefeldin A. Functionally, Nef proteins containing leucine motifs down-regulated CD4 from the cell surface and enhanced viral replication, whereas those containing tyrosine motifs were inactive. Apparently, the interaction of Nef with the mu subunits of AP complexes is insufficient for function. A leucine-specific mode of interaction that likely involves AP hemicomplexes is further required for Nef activity. The mu and hemicomplex interactions may cooperate to yield high avidity binding of AP complexes to Nef. This binding likely underlies the unusual ability of Nef to induce the stabilization of these complexes on endosomal membranes, an activity that correlates with enhancement of viral replication.

11.
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). Because SLiMs are difficult to identify and validate experimentally, our understanding of these modules is limited, which argues for the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood that a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.

12.
13.
14.
The Dfp1/Him1 protein of the fission yeast Schizosaccharomyces pombe is the regulatory subunit of Hsk1 kinase, a homologue of budding yeast Cdc7 kinase that is essential for initiation and progression of the S phase of the cell cycle. This protein binds and activates Hsk1 kinase, which phosphorylates the MCM2 protein. Comparison of the amino acid sequences of the Cdc7 regulatory subunits from various eukaryotes revealed the presence of three small stretches of conserved amino acid sequences, namely Dbf4 motifs N, M, and C. We report here that Dbf4 motif M, a unique proline-rich motif, and Dbf4 motif C, a C2H2-type zinc finger motif, are essential for the mitotic functions of the Dfp1/Him1 protein as well as for full-level activation of Hsk1 kinase. In vitro, a small segment containing Dbf4 motif M or C alone binds to and partially activates Hsk1. Co-expression of these two segments augments the extent of activation. Furthermore, a fused polypeptide containing only Dbf4 motifs M and C without any spacer can activate Hsk1 and is capable of rescuing the growth defect of him1 null cells. Insertion of a long stretch of amino acids between motif M and motif C can be tolerated for mitotic functions. On the other hand, internal deletion of Dbf4 motif N, which has some similarity to the BRCA C-terminal domain motif, results in a defect in hydroxyurea-induced checkpoint responses and sensitivity to methyl methane sulfonate, yet mitotic functions and kinase activation are intact. In one-hybrid assays with budding yeast Dbf4, motif N mutants exhibit reduced interaction with a replication origin. Our observations suggest a molecular architecture for Cdc7-Dbf4-related kinase complexes at the origins, in which they are tethered to the replication machinery through Dbf4 motif N and the catalytic subunits are activated through bipartite binding of Dbf4 motifs M and C of the regulatory subunits.

15.
16.
WW domains mediate protein-protein interactions through binding to short proline-rich sequences. Two distinct sequence motifs, PPXY and PPLP, are recognized by different classes of WW domains, and another class binds to phospho-Ser-Pro sequences. We now describe a novel Pro-Arg sequence motif recognized by a different class of WW domains using data from oriented peptide library screening, expression cloning, and in vitro binding experiments. The prototype member of this group is the WW domain of formin-binding protein 30 (FBP30), a p53-regulated molecule whose WW domains bind to Pro-Arg-rich cellular proteins. This new Pro-Arg sequence motif re-classifies the organization of WW domains based on ligand specificity, and the Pro-Arg class now includes the WW domains of FBP21 and FE65. A structural model is presented which rationalizes the distinct motifs selected by the WW domains of YAP, Pin1, and FBP30. The Pro-Arg motif identified for WW domains often overlaps with SH3 domain motifs within protein sequences, suggesting that the same extended proline-rich sequence could form discrete SH3 or WW domain complexes to transduce distinct cellular signals.

17.
Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the main drawback of conventional EM, namely becoming trapped in a local optimum. The newly implemented algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively performs Monte Carlo simulation and parameter updates until convergence. A log-likelihood profiling technique, together with a top-k strategy, is introduced to cope with the phase-shift and multimodality issues in the motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments, and it is successfully applied to subtle motif discovery. MCEMDA compares favorably with other popular PWM-based and word-enumerative motif algorithms when tested on simulated (l, d)-motif cases and on documented prokaryotic and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains using protein benchmarks and shows excellent capability when compared with other multiple sequence alignment methods.
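The alternation between stochastic sampling of motif sites and PWM re-estimation can be sketched as follows. This is a generic, simplified loop written for illustration, not the MCEMDA implementation: it assumes one site per sequence, a uniform DNA background, a fixed iteration count instead of a convergence test, and it omits log-likelihood profiling, the top-k strategy and GMA. All names are hypothetical.

```python
import random

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate a position weight matrix (one dict per column) from
    aligned sites, with pseudocounts so no probability is zero."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        pwm.append({a: counts[a] / total for a in ALPHABET})
    return pwm

def site_weight(pwm, word):
    """Likelihood ratio of a word under the PWM versus a uniform background."""
    w = 1.0
    for col, ch in zip(pwm, word):
        w *= col[ch] / 0.25
    return w

def sample_sites(pwm, sequences, rng):
    """Monte Carlo step: draw one site per sequence, with probability
    proportional to its likelihood ratio under the current PWM."""
    width = len(pwm)
    sites = []
    for seq in sequences:
        starts = list(range(len(seq) - width + 1))
        weights = [site_weight(pwm, seq[i:i + width]) for i in starts]
        i = rng.choices(starts, weights=weights, k=1)[0]
        sites.append(seq[i:i + width])
    return sites

def mc_em(sequences, init_sites, iterations=20, seed=0):
    """Alternate site sampling and PWM updates for a fixed number of rounds."""
    rng = random.Random(seed)
    pwm = build_pwm(init_sites)
    for _ in range(iterations):
        pwm = build_pwm(sample_sites(pwm, sequences, rng))
    return pwm
```

Because sites are sampled rather than chosen greedily, the loop can move away from a poor starting alignment, which is the motivation the abstract gives for a Monte Carlo variant of EM.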

18.
19.
20.
Tao T, Zhai CX, Lu X, Fang H. Applied Bioinformatics, 2004, 3(2-3): 115-124
Automatic discovery of new protein motifs (i.e. amino acid patterns) is one of the major challenges in bioinformatics. Several algorithms have been proposed that can extract statistically significant motif patterns from any set of protein sequences. With these methods, one can generate a large set of candidate motifs that may be biologically meaningful. This article examines methods to predict the functions of these candidate motifs. We use several statistical methods: a popularity method, a mutual information method and probabilistic translation models. These methods capture, from different perspectives, the correlations between the matched motifs of a protein and its assigned Gene Ontology terms that characterise the function of the protein. We evaluate these different methods using the known motifs in the InterPro database. Each method is used to rank candidate terms for each motif. We then use the expected mean reciprocal rank to evaluate the performance. The results show that, in general, all these methods perform well, suggesting that they can all be useful for predicting the function of an unknown motif. Among the methods tested, a probabilistic translation model with a popularity prior performs the best.
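The ranking-based evaluation can be illustrated with the plain mean reciprocal rank. The abstract mentions an "expected" mean reciprocal rank, whose exact definition is not given here, so the sketch below shows only the standard form: for each motif, take the reciprocal of the rank of the first correct term, then average over motifs. Function names and the GO identifiers in the example are purely illustrative.

```python
def reciprocal_rank(ranked_terms, correct_terms):
    """1 / rank of the first correct term in the ranking, or 0.0 if none
    of the correct terms appears."""
    for rank, term in enumerate(ranked_terms, start=1):
        if term in correct_terms:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, truths):
    """Average reciprocal rank over all motifs (one ranking per motif)."""
    scores = [reciprocal_rank(r, t) for r, t in zip(rankings, truths)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical ranked term lists for two motifs and their known terms.
    rankings = [["GO:1", "GO:2", "GO:3"], ["GO:7", "GO:5"]]
    truths = [{"GO:1"}, {"GO:5"}]
    print(mean_reciprocal_rank(rankings, truths))
```

A method that always places a correct term first scores 1.0, and the score falls as correct terms appear lower in the ranking.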
