Similar articles
 Found 20 similar articles (search time: 109 ms)
1.
2.
Xie BB  Chen XL  Zhang XY  He HL  Zhang YZ  Zhou BC 《Proteins》2008,71(3):1461-1474
Identification of protein interaction interfaces is very important for understanding the molecular mechanisms underlying biological phenomena. Here, we present a novel method for predicting protein interaction interfaces from sequences by using the PAM matrix (PIFPAM). Sequence alignments for interacting proteins were constructed and parsed into segments using sliding windows. By calculating a distance matrix for each segment, the correlation coefficients between segments were estimated. The interaction interfaces were predicted by extracting highly correlated segment pairs from the correlation map. The predictions achieved an accuracy of 0.41-0.71 for eight intraprotein interaction examples, and 0.07-0.60 for four interprotein interaction examples. Compared with three previously published methods, PIFPAM predicted more contacting site pairs for 11 out of the 12 example proteins, and predicted at least 34% more contacting site pairs for eight of them. The factors affecting the predictions were also analyzed. Since PIFPAM uses only the alignments of the two interacting proteins as input, it is especially useful when no three-dimensional protein structure data are available.
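
The windowing and correlation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the PIFPAM implementation: the function names are hypothetical, the fraction of mismatched positions stands in for a PAM-based distance, and both alignments are assumed to list the same species in the same row order.

```python
from itertools import combinations
from math import sqrt

def windows(alignment, size, step=1):
    """Yield (start, per-sequence segments) for each sliding window."""
    length = len(alignment[0])
    for start in range(0, length - size + 1, step):
        yield start, [seq[start:start + size] for seq in alignment]

def distance_vector(segments):
    """Upper triangle of the pairwise distance matrix between sequences.

    Distance is the fraction of mismatched positions, a simple stand-in
    for a PAM-based distance (an assumption, not the paper's metric).
    """
    return [
        sum(a != b for a, b in zip(s, t)) / len(s)
        for s, t in combinations(segments, 2)
    ]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant distances carry no correlation signal
    return sxy / sqrt(sxx * syy)

def correlation_map(aln_a, aln_b, size):
    """Correlate every window of alignment A with every window of alignment B."""
    vec_a = [(s, distance_vector(seg)) for s, seg in windows(aln_a, size)]
    vec_b = [(s, distance_vector(seg)) for s, seg in windows(aln_b, size)]
    return {(sa, sb): pearson(va, vb) for sa, va in vec_a for sb, vb in vec_b}
```

Highly correlated window pairs in the returned map would be the candidate interface segments; for an intraprotein case the same alignment can be passed as both arguments.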

3.
4.
O-GlcNAcylation is an inducible, highly dynamic and reversible posttranslational modification, which regulates numerous cellular processes such as gene expression, translation, immune reactions, protein degradation, protein–protein interaction, apoptosis, and signal transduction. In contrast to N-linked glycosylation, O-GlcNAcylation does not display a strict amino acid consensus sequence, although serine or threonine residues flanked by proline and valine are preferred sites of O-GlcNAcylation. Based on this information, computational prediction tools of O-GlcNAc sites have been developed. Here, we retrospectively assessed the performance of two available O-GlcNAc prediction programs, the YinOYang 1.2 server and OGlcNAcScan, by comparing their predictions for recently discovered, experimentally validated O-GlcNAc sites. Both prediction programs efficiently identified O-GlcNAc sites situated in an environment resembling the consensus sequence P-P-V-[ST]-T-A. However, both prediction programs produced numerous false-negative O-GlcNAc predictions when the site of modification was located in an amino acid sequence differing from the known consensus sequence. By searching for a common sequence motif, we found that O-GlcNAcylation of nucleocytoplasmic proteins preferably occurs at serine and threonine residues flanked downstream by proline and valine and upstream by one to two alanines followed by a stretch of serine and threonine residues. However, O-GlcNAcylation of proteins located in the mitochondria or in the secretory lumen occurs at different sites and does not follow a distinct consensus sequence. Thus, our study indicates the limitations of the presently available computational prediction methods for O-GlcNAc sites and suggests that experimental validation is mandatory. Continuous updating and further development of available databases will be the key to improving the performance of O-GlcNAc site prediction.
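
As a small illustration of why strict consensus matching misses sites, the consensus P-P-V-[ST]-T-A quoted above can be written as a regular expression. This is only a sketch: the function name is hypothetical, the [ST] position is assumed to be the modified residue, and neither YinOYang nor OGlcNAcScan is claimed to work this way.

```python
import re

# Strict reading of the consensus P-P-V-[ST]-T-A quoted in the abstract.
# The lookahead wrapper lets overlapping matches be reported.
CONSENSUS = re.compile(r"(?=(PPV[ST]TA))")

def consensus_sites(sequence):
    """Return 1-based positions of the [ST] residue in each consensus match.

    Treating the [ST] position as the modified residue is an assumption.
    """
    # m.start() is the 0-based motif start; the [ST] residue sits 3 further on.
    return [m.start() + 4 for m in CONSENSUS.finditer(sequence.upper())]
```

Any site whose flanking residues deviate from the pattern is simply not returned, which mirrors the false negatives the abstract reports for sites outside the known consensus.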

5.
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc.

6.
We present a new method for predicting protein–ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0% of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25% of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9% of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction.

7.
Assembly of clathrin lattices is mediated by assembly/adaptor proteins that contain domains that bind lipids or membrane-bound cargo proteins and clathrin binding domains (CBDs) that recruit clathrin. Here, we characterize the interaction between clathrin and a large fragment of the CBD of the clathrin assembly protein AP180. Mutational, NMR chemical shift, and analytical ultracentrifugation analyses allowed us to precisely define two clathrin binding sites within this fragment, each of which is found to bind weakly to the N-terminal domain of the clathrin heavy chain (TD). The locations of the two clathrin binding sites are consistent with predictions from sequence alignments of previously identified clathrin binding elements and, by extension, indicate that the complete AP180 CBD contains ~12 degenerate repeats, each containing a single clathrin binding site. Sequence and circular dichroism analyses have indicated that the AP180 CBD is predominantly unstructured and our NMR analyses confirm that this is largely the case for the AP180 fragment characterized here. Unexpectedly, unlike the many proteins that undergo binding-coupled folding upon interaction with their binding partners, the AP180 fragment is similarly unstructured in its bound and free states. Instead, we find that this fragment exhibits localized β-turn-like structures at the two clathrin binding sites both when free and when bound to clathrin. These observations are incorporated into a model in which weak binding by multiple, pre-structured clathrin binding elements regularly dispersed throughout a largely unstructured CBD allows efficient recruitment of clathrin to endocytic sites and dynamic assembly of the clathrin lattice.

8.
H L Monaco  G Zanotti 《Biopolymers》1992,32(4):457-465
We review our work on bovine and human retinol-binding protein (RBP), bovine beta lactoglobulin (BLG), and bovine odorant-binding protein (OBP). These three proteins share a sequence similarity high enough to justify the proposal that their three-dimensional structure ought to be quite similar, and they also share the function of similar or even identical hydrophobic ligand binding, although with a very different degree of specificity. Thus they constitute an ideal system to exhaustively explore the question of three-dimensional structure prediction from sequence similarity and the related question of binding site prediction for similar ligands. We have used X-ray diffraction techniques on single crystals of human and bovine RBP, bovine milk BLG, and bovine nasal mucosa OBP to investigate this problem. The results of these crystallographic studies indicate that to the level of resolution so far attained, the three-dimensional structure of these three proteins is reasonably predicted from the sequence similarity. The fold is the same and structural differences are rather subtle. Finally, we present experimental evidence that the binding sites of RBP, BLG, and OBP are in different regions of the molecules. Thus, it appears that although sequence alignment has correctly predicted the protein fold, it has incorrectly predicted the hydrophobic ligand-binding sites.

9.
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas of 0.69 and 0.81 under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions, we show that for the small and the large ribosomal subunit our approach predicts interacting residues in the system with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data.
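
The two evaluation figures quoted in this abstract, the true-positive rate within the first N predictions and the area under the ROC curve, can be computed from a list of interaction scores and known labels roughly as below. This is a generic sketch with hypothetical function names; it assumes the "true-positive rate within the first N predictions" means the fraction of the N top-scoring pairs that are known interactions.

```python
def top_n_true_positive_rate(scores, labels, n):
    """Fraction of the n highest-scoring predictions that are true interactions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, label in ranked[:n] if label) / n

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a random true pair outscores a random false
    pair (ties count one half). Requires at least one pair of each class.
    """
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking, so the reported 0.69 and 0.81 indicate rankings that are better than chance across all predictions, not only the top few.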

10.
Protein-protein interactions control a large range of biological processes and their identification is essential to understand the underlying biological mechanisms. To complement experimental approaches, in silico methods are available to investigate protein-protein interactions. Cross-docking methods, in particular, can be used to predict protein binding sites. However, proteins can interact with numerous partners and can present multiple binding sites on their surface, which may alter the binding site prediction quality. We evaluate the binding site predictions obtained using complete cross-docking simulations of 358 proteins with 2 different scoring schemes accounting for multiple binding sites. Despite overall good binding site prediction performances, 68 cases were still associated with very low prediction quality, presenting individual area under the specificity-sensitivity ROC curve (AUC) values below the random AUC threshold of 0.5, since cross-docking calculations can lead to the identification of alternate protein binding sites (that are different from the reference experimental sites). For the large majority of these proteins, we show that the predicted alternate binding sites correspond to interaction sites with hidden partners, that is, partners not included in the original cross-docking dataset. Among those new partners, we find proteins, but also nucleic acid molecules. Finally, for proteins with multiple binding sites on their surface, we investigated the structural determinants associated with the binding sites most targeted by the docking partners.

11.
12.
Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
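
The shell-averaging step mentioned in this abstract can be illustrated with a short sketch. It is not the authors' code: the function name is hypothetical, the shell width of 1.25 Å is an assumed value (the abstract only states that six concentric shells are used), and a single numeric property per atom stands in for the "variety of biochemical and biophysical properties".

```python
from math import dist

def shell_averages(site, atoms, n_shells=6, width=1.25):
    """Mean property value in each concentric spherical shell around `site`.

    site:  (x, y, z) coordinates of the candidate site.
    atoms: iterable of ((x, y, z), value) pairs.
    width: shell thickness in angstroms (an assumed value, not from the source).
    Returns n_shells entries; an empty shell is reported as None.
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for coord, value in atoms:
        shell = int(dist(site, coord) // width)  # 0 is the innermost shell
        if shell < n_shells:
            sums[shell] += value
            counts[shell] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because each feature is an average over a shell, small shifts in atom positions change the feature vector only slightly, which is the tolerance to structural rearrangement the abstract describes.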

13.
Despite significant methodological advances in protein structure determination, high-resolution structures of membrane proteins are still rare, leaving sequence-based predictions as the only option for exploring the structural variability of membrane proteins at large scale. Here, a new structural classification approach for α-helical membrane proteins is introduced based on the similarity of predicted helix interaction patterns. Its application to proteins with known 3D structure showed that it is able to reliably detect structurally similar proteins even in the absence of any sequence similarity, reproducing the SCOP and CATH classifications with a sensitivity of 65% at a specificity of 90%. We applied the new approach to enhance our comprehensive structural classification of α-helical membrane proteins (CAMPS), which is primarily based on sequence and topology similarity, in order to find protein clusters that describe the same fold in the absence of sequence similarity. A total of 151 helix architectures were delineated for proteins with more than four transmembrane segments. Interestingly, we observed that proteins with 8 or more transmembrane helices correspond to fewer different architectures than proteins with up to 7 helices, suggesting that in large membrane proteins the evolutionary tendency to re-use already available folds is more pronounced.

14.
The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.

15.
Bhardwaj N  Lu H 《FEBS letters》2007,581(5):1058-1066
Protein-DNA interactions are crucial to many cellular activities such as expression-control and DNA-repair. These interactions between amino acids and nucleotides are highly specific and any aberrance at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved with average sensitivity, specificity and accuracy pegged at 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%. Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. Results presented here demonstrate that machine-learning can be applied to automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.
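
For readers comparing the figures in this abstract, the per-residue metrics can be computed from predicted and observed binding labels as sketched below. The function name is hypothetical, and "net prediction" is assumed here to mean the average of sensitivity and specificity, which the abstract does not define.

```python
def binary_metrics(predicted, actual):
    """Sensitivity, specificity, accuracy and net prediction for 0/1 labels.

    `predicted` and `actual` are equal-length sequences with one entry per
    residue (1 = DNA-binding, 0 = non-binding). Both classes must occur in
    `actual`, otherwise a division by zero is raised.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        # Assumed definition: mean of sensitivity and specificity.
        "net_prediction": (sensitivity + specificity) / 2,
    }
```

Under that assumed definition, the reported sensitivity of 71.3% and specificity of 69.3% would average to 70.3%, in line with the "around 70%" stated in the abstract.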

16.
α2-Macroglobulin (α2M) and its receptor, low density lipoprotein receptor-related protein (LRP), function together to facilitate the cellular uptake and degradation of β-amyloid peptide (Aβ). In this study, we demonstrate that Aβ binds selectively to α2M that has been induced to undergo conformational change by reaction with methylamine. Denatured α2M subunits, which were immobilized on polyvinylidene difluoride membranes, bound Aβ, suggesting that α2M tertiary and quaternary structure are not necessary. To determine whether a specific sequence in α2M is responsible for Aβ binding, we prepared and analyzed defined α2M fragments and glutathione S-transferase-α2M peptide fusion proteins. A single sequence, centered at amino acids (aa) 1314-1365, was identified as the only major Aβ-binding site. Importantly, Aβ did not bind to the previously characterized growth factor-binding site (aa 718-734). Although the Aβ binding sequence is adjacent to the binding site for LRP, the results of experiments with mutated fusion proteins indicate that the two sites are distinct. Furthermore, a saturating concentration of Aβ did not inhibit LRP-mediated clearance of α2M-MA in mice. Using various methods, we determined that the K_D for the interaction of Aβ with its binding site in the individual α2M subunit is 0.7-2.4 μM. The capacity of α2M to bind Aβ and deliver it to LRP may be greater than that predicted by the K_D, because each α2M subunit may bind Aβ and the bound Aβ may multimerize. These studies suggest a model in which α2M has three protein interaction sites with distinct specificities, mediating the interaction with Aβ, growth factors, and LRP.

17.
Translin is a single-stranded RNA- and DNA-binding protein, which has been highly conserved in eukaryotes, from man to Schizosaccharomyces pombe. TRAX is a Translin paralog associated with Translin, which has coevolved with it. We generated structural models of the S. pombe Translin (spTranslin), based on the solved 3D structure of the human ortholog. Using several bioinformatics computation tools, we identified in the equatorial part of the protein a putative nucleic acids interaction surface, which includes many polar and positively charged residues, mostly arginines, surrounding a shallow cavity. Experimental verification of the bioinformatics predictions was obtained by assays of nucleic acids binding to amino acid substitution variants made in this region. Bioinformatics combined with yeast two-hybrid assays and proteomic analyses of deletion variants, also identified at the top of the spTranslin structure a region required for interaction with spTRAX, and for spTranslin dimerization. In addition, bioinformatics predicted the presence of a second protein-protein interaction site at the bottom of the spTranslin structure. Similar nucleic acid and protein interaction sites were also predicted for the human Translin. Thus, our results appear to generally apply to the Translin family of proteins, and are expected to contribute to a further elucidation of their functions.

18.
Structural genomics projects aim to provide a sharp increase in the number of structures of functionally unannotated, and largely unstudied, proteins. Algorithms and tools capable of deriving information about the nature, and location, of functional sites within a structure are increasingly useful therefore. Here, a neural network is trained to identify the catalytic residues found in enzymes, based on an analysis of the structure and sequence. The neural network output, and spatial clustering of the highly scoring residues are then used to predict the location of the active site. A comparison of the performance of differently trained neural networks is presented that shows how information from sequence and structure come together to improve the prediction accuracy of the network. Spatial clustering of the network results provides a reliable way of finding likely active sites. In over 69% of the test cases the active site is correctly predicted, and a further 25% are partially correctly predicted. The failures are generally due to the poor quality of the automatically generated sequence alignments. We also present predictions identifying the active site, and potential functional residues in five recently solved enzyme structures, not used in developing the method. The method correctly identifies the putative active site in each case. In most cases the likely functional residues are identified correctly, as well as some potentially novel functional groups.

19.
Computational prediction of protein functional sites can be a critical first step for analysis of large or complex proteins. Contemporary methods often require several homologous sequences and/or a known protein structure, but these resources are not available for many proteins. Leucine-rich repeats (LRRs) are ligand interaction domains found in numerous proteins across all taxonomic kingdoms, including immune system receptors in plants and animals. We devised Repeat Conservation Mapping (RCM), a computational method that predicts functional sites of LRR domains. RCM utilizes two or more homologous sequences and a generic representation of the LRR structure to identify conserved or diversified patches of amino acids on the predicted surface of the LRR. RCM was validated using solved LRR+ligand structures from multiple taxa, identifying ligand interaction sites. RCM was then used for de novo dissection of two plant microbe-associated molecular pattern (MAMP) receptors, EF-TU RECEPTOR (EFR) and FLAGELLIN-SENSING 2 (FLS2). In vivo testing of Arabidopsis thaliana EFR and FLS2 receptors mutagenized at sites identified by RCM demonstrated previously unknown functional sites. The RCM predictions for EFR, FLS2 and a third plant LRR protein, PGIP, compared favorably to predictions from ODA (optimal docking area), Consurf, and PAML (positive selection) analyses, but RCM also made valid functional site predictions not available from these other bioinformatic approaches. RCM analyses can be conducted with any LRR-containing proteins at www.plantpath.wisc.edu/RCM, and the approach should be modifiable for use with other types of repeat protein domains.

20.