Similar Articles
20 similar articles found (search time: 187 ms)
1.
We present a novel partner‐specific protein–protein interaction site prediction method called PAIRpred. Unlike most existing machine-learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state-of-the-art accuracy in predicting binding sites at the protein level as well as inter‐protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding‐associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/ . Proteins 2014; 82:1142–1155. © 2013 Wiley Periodicals, Inc.
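PAIRpred scores residue pairs through pairwise kernels. As a minimal sketch, assuming an RBF base kernel on per-residue feature vectors (the names `rbf`, `pairwise_kernel` and the `gamma` value are illustrative, not PAIRpred's code), one widely used symmetric pairwise kernel combines residue-level kernel values across the two pairs:

```python
import math

def rbf(x, y, gamma=0.5):
    """RBF kernel between two per-residue feature vectors (illustrative choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def pairwise_kernel(p, q, base=rbf):
    """Symmetric pairwise kernel between residue pairs p = (a, b) and q = (c, d).

    K(p, q) = k(a, c) * k(b, d) + k(a, d) * k(b, c), so swapping the two
    residues inside either pair does not change the value.
    """
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)

# Toy feature vectors for one residue from each protein (hypothetical values).
res_1, res_2 = [1.0, 0.2], [0.3, 0.9]
k_same = pairwise_kernel((res_1, res_2), (res_1, res_2))
k_swapped = pairwise_kernel((res_1, res_2), (res_2, res_1))
```

A Gram matrix of such values over training pairs could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.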

2.
Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, existing methods have by and large handled interacting residue prediction and contact matrix prediction independently, even though, intuitively, good prediction of interacting residues should help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), which integrates contact matrix prediction with ipHMM interacting residue prediction. We propose to leverage what is learned from contact matrix prediction and to use the predicted contact matrix as "feedback" to enhance interacting residue prediction. The CM-ipHMM model showed significant improvement over the previous method, which uses the ipHMM only for predicting interacting residues. This indicates that downstream contact matrix prediction can help interaction site prediction.
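One simple way to feed a predicted contact matrix back into per-residue prediction is to collapse it to a score per residue and mix that with the site probabilities. The sketch below assumes max-pooling over partner residues and a linear blend with a hypothetical weight `w`; it illustrates the feedback idea only and is not the CM-ipHMM formulation.

```python
def residue_scores_from_contacts(cm):
    """Collapse a predicted contact matrix into per-residue interface scores.

    cm[i][j] is the predicted probability that residue i of protein A
    contacts residue j of protein B; each residue's score is its maximum
    contact probability over all partner residues.
    """
    scores_a = [max(row) for row in cm]
    scores_b = [max(col) for col in zip(*cm)]
    return scores_a, scores_b

def blend(site_probs, contact_scores, w=0.5):
    """Hypothetical feedback step: mix per-residue site probabilities with
    contact-derived scores using weight w (0 = ignore the contact matrix)."""
    return [(1 - w) * p + w * c for p, c in zip(site_probs, contact_scores)]

cm = [[0.1, 0.8],
      [0.2, 0.3],
      [0.05, 0.0]]
scores_a, scores_b = residue_scores_from_contacts(cm)
```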

3.
Identifying protein–protein interfaces is crucial for structural biology. Because of the constraints of wet-lab experiments, many computational methods have been proposed. Here we propose a new method that predicts protein–protein interaction interface residues in heterocomplexes purely from evolutionary information, without any information about the partner chains. Unlike traditional approaches that use multiple sequence alignment profiles to represent the conservation level of each residue, we make predictions based on residue conservation scores, so that the dimension of the feature vector for each residue is drastically reduced, by at least a factor of 20 relative to conventional methods. With this representation, a simple linear discriminant function is used for prediction, so the computational complexity of the whole prediction procedure is also greatly decreased. Tests on 69 heterocomplex chains demonstrate that the performance of our approach is indeed superior to that of current existing methods.
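The dimension reduction comes from replacing a 20-column profile row with one conservation score per position. As a sketch, assuming entropy-based conservation (one common choice; the abstract does not say which score the authors use) and hypothetical weights for the linear discriminant:

```python
import math
from collections import Counter

def conservation_score(column, alphabet_size=20):
    """One minus the normalized Shannon entropy of an MSA column (gaps ignored).

    Returns 1.0 for a fully conserved column and values near 0.0 for a
    column in which all amino acids are equally frequent.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log(alphabet_size)

def linear_discriminant(features, weights, bias):
    """Score one residue; predict 'interface' when the result is > 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# One scalar per window position instead of a 20-column profile row.
window = ["AAAA", "AAAC", "ACDE"]
features = [conservation_score(col) for col in window]
```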

4.
Hamilton N  Burrage K  Ragan MA  Huber T 《Proteins》2004,56(4):679-684
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations.
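The "best L/2" and "best L/10" accuracies are top-L/k precisions. A minimal sketch of that metric, assuming scores stored as a dict keyed by residue pairs and a hypothetical minimum sequence separation `min_sep` (the abstract does not state the value used):

```python
def top_lk_precision(scores, true_contacts, length, k=2, min_sep=6):
    """Fraction of the top L/k scored residue pairs that are true contacts.

    scores: dict mapping (i, j) with i < j to a predicted contact score.
    true_contacts: set of (i, j) pairs in contact in the native structure.
    min_sep: minimum sequence separation |i - j| (assumed value).
    """
    candidates = [(s, p) for p, s in scores.items() if p[1] - p[0] >= min_sep]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = [p for _, p in candidates[: max(1, length // k)]]
    if not top:
        return 0.0
    return sum(p in true_contacts for p in top) / len(top)

scores = {(0, 9): 0.9, (1, 8): 0.8, (2, 9): 0.7, (0, 7): 0.6, (3, 9): 0.5, (0, 2): 0.95}
native = {(0, 9), (2, 9)}
```

With `length=10`, `k=2` keeps the five best pairs with separation of at least 6 (the pair `(0, 2)` is excluded), two of which are native contacts.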

5.
Protein–protein interactions play a key part in most biological processes, and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Among other methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross‐docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without any prior knowledge of its potential interaction partners. Evaluating the performance of cross‐docking predictions using the area under the specificity‐sensitivity ROC curve (AUC) gives an AUC value of 0.77 for the complete benchmark (compared to the AUC value of 0.5 obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches scattered on the protein surface shows that their distribution and growth depend on the protein's functional group. Finally, in several cases, the binding‐site predictions resulting from the cross‐docking simulations led to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408–1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
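The ROC AUC quoted above equals the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding one, which is why random predictions give 0.5. A minimal sketch of that rank-based computation, assuming per-residue scores (for instance, how often a residue appears in cross-docked interfaces) and binary labels:

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative.

    Ties count as half a win. O(P * N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```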

6.
Kim WK  Ison JC 《Proteins》2005,61(4):1075-1088
Considering the limited success of the most sophisticated docking methods available and the amount of computation required for systematic docking, cataloging all the known interfaces may be an alternative basis for the prediction of protein tertiary and quaternary structures. We classify domain interfaces according to the geometry of domain-domain association. By applying a simple and efficient method called "interface tag clustering," more than 4,000 distinct types of domain interfaces are collected from Protein Quaternary Structure Server and Protein Data Bank. Given a pair of interacting domains, we define "face" as the set of interacting residues in each single domain and the pair of interacting faces as an "interface." We investigate how the geometry of interfaces relates to a network of interacting protein families, such as how many different binding orientations are possible between two families or whether a family uses distinct surfaces or the same surface when the family has diverse interaction partners from various families. We show there are, on average, 1.2-1.9 different types of interfaces between interacting domains and a significant number of family pairs associate in multiple orientations. In general, a family tends to use distinct faces for each partner when the family has diverse interaction partners. Each face is highly specific to its interaction partner and the binding orientation. The relative positions of interface residues are generally well conserved within the same type of interface even between remote homologs. The classification result is available at http://www.biotec.tu-dresden.de/~wkim/supplement.
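The "face" and "interface" definitions can be made concrete with a distance cutoff. The sketch assumes atom coordinates per residue and a hypothetical 5 Å cutoff; the abstract does not state the contact criterion the authors use.

```python
import math

def faces(coords_a, coords_b, cutoff=5.0):
    """Return (face_a, face_b) for two interacting domains.

    coords_x maps a residue id to a list of atom (x, y, z) tuples. A residue
    belongs to its domain's face if any of its atoms lies within `cutoff`
    (assumed 5 angstroms) of any atom in the other domain; the pair of
    faces is the interface.
    """
    face_a, face_b = set(), set()
    for ra, atoms_a in coords_a.items():
        for rb, atoms_b in coords_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                face_a.add(ra)
                face_b.add(rb)
    return face_a, face_b

domain_a = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}
domain_b = {10: [(3.0, 0.0, 0.0)], 11: [(40.0, 0.0, 0.0)]}
face_a, face_b = faces(domain_a, domain_b)
```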

7.
MOTIVATION: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As sequence-template pairs become increasingly remote in sequence relationship, predicting sequence-template alignments with sequence alignment methods becomes increasingly problematic. Structural information about the template, used in connection with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs. RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures of each training pair are similar (RMSD < ~4 Å), but the sequence relationship is undetectable (average pair-wise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed on a testing set of 1421 protein pairs, in each of which the structures are similar (RMSD < ~4 Å) but the sequences are marginally related at best (average pair-wise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average 27% increase in correctly aligned residues over standard PSI-BLAST alignments for the protein pairs in the testing set.
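Extending a local alignment to a global one "under the constraint of the residue pairs in the local alignments" can be sketched as a global alignment forced through anchor pairs: align each gap between consecutive anchors separately and splice the pieces. This sketch uses plain Needleman-Wunsch with hypothetical identity scores, not the structure-dependent SDSA scoring.

```python
def nw(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    sub = lambda i, j: match if a[i - 1] == b[j - 1] else mismatch
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j), F[i - 1][j] + gap, F[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def anchored_global(a, b, anchors):
    """Global alignment forced through anchor pairs (i, j), e.g. residue pairs
    taken from a local hit; anchors must increase strictly in both i and j."""
    out_a, out_b, pi, pj = "", "", 0, 0
    for i, j in sorted(anchors):
        sa, sb = nw(a[pi:i], b[pj:j])
        out_a += sa + a[i]
        out_b += sb + b[j]
        pi, pj = i + 1, j + 1
    sa, sb = nw(a[pi:], b[pj:])
    return out_a + sa, out_b + sb
```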

8.
Vicatos S  Kaznessis YN 《Proteins》2008,70(2):539-552
We present a method that significantly improves the accuracy of predicted proximal residue pairs in protein molecules. Computational methods for predicting pairs of amino acids that are distant in the protein sequence but close in the protein 3D structure can benefit in silico attempts to recognize the fold of a protein molecule. Unfortunately, currently available methods suffer from low predictive accuracy. In this work, we use Monte Carlo simulations to fold protein molecules with proximal pair predictions used as additional energy constraints. To test our methods, we study molecules with known tertiary structures. With Monte Carlo, we generate ensembles of structures for each set of residue constraints. The distribution of the root mean square deviation of the folded structures from the known native structure reveals clear information about the accuracy of the constraint sets used. With recursive substitutions of constraints, false positive predictions are identified and filtered out, and significant improvements in accuracy are observed.
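Using predicted proximal pairs "as additional energy constraints" in Monte Carlo folding can be sketched as a restraint term added to the energy plus the Metropolis acceptance rule. The flat-bottom harmonic form, the 8 Å threshold `d0` and the force constant `k` are assumptions for illustration, not the authors' parameters.

```python
import math
import random

def restraint_energy(dist, pairs, d0=8.0, k=1.0):
    """Flat-bottom harmonic penalty for predicted proximal pairs.

    dist(i, j) returns the current distance between residues i and j;
    a pair costs nothing within d0 and k * (d - d0)**2 beyond it.
    """
    energy = 0.0
    for i, j in pairs:
        d = dist(i, j)
        if d > d0:
            energy += k * (d - d0) ** 2
    return energy

def metropolis_accept(delta_e, temperature, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```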

9.
With the development of bioinformatics, more and more protein sequence information has become available. Meanwhile, the number of known protein–protein interactions (PPIs) is still very limited. In this article, we propose a new method for predicting interacting protein pairs using a Bayesian method based on a new feature representation. We trained our model using data on 6,459 PPI pairs from the yeast Saccharomyces cerevisiae core subset. On data from six species in the DIP database, our model achieves an average prediction accuracy of 93.67%. The results show that our method is superior to other methods in both computing time and prediction accuracy.

10.
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas under the ROC curve of 0.69 and 0.81, respectively, over all predictions. In the trp operon, it assigns the two largest interaction scores to the only two experimentally known interactions. At the level of residue interactions, we show that for the small and large ribosomal subunits our approach predicts interacting residues with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments, and we analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in light of the rapid growth of available sequence data.
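Direct-Coupling Analysis infers a global statistical (Potts) model, which is too involved for a short example. As a simpler, swapped-in illustration of scoring co-evolution between inter-protein residue pairs, the sketch below computes mutual information between two columns of a joint (concatenated) alignment; unlike DCA, MI does not separate direct from indirect couplings.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (nats) between two aligned columns.

    col_i comes from protein A and col_j from protein B; row k of both
    columns must come from the same paired (concatenated) sequence.
    """
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi
```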

11.
We developed a method called residue contact frequency (RCF), which uses the complex structures generated by the protein–protein docking algorithm ZDOCK to predict interface residues. Unlike interface prediction algorithms that are based on monomers alone, RCF is binding partner specific. We evaluated the performance of RCF using the area under the precision‐recall (PR) curve (AUC) on a large protein docking Benchmark. RCF (AUC = 0.44) performed as well as meta‐PPISP (AUC = 0.43), which is one of the best monomer‐based interface prediction methods. In addition, we tested a support vector machine (SVM) that combines RCF with meta‐PPISP and another monomer‐based interface prediction algorithm, Evolutionary Trace, to further improve performance. We found that the SVM that combined RCF and meta‐PPISP achieved the best performance (AUC = 0.47). We used RCF to predict the binding interfaces of proteins that can bind to multiple partners, and RCF was able to correctly predict interface residues that are unique to the respective binding partners. Furthermore, we found that residues that contributed greatly to binding affinity (hotspot residues) had significantly higher RCF than other residues. Proteins 2014; 82:57–66. © 2013 Wiley Periodicals, Inc.
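As its name suggests, residue contact frequency can be read as the fraction of docked poses in which a residue sits at the interface. A minimal sketch, assuming each pose has already been reduced to the set of receptor residues in contact with the ligand (the number of ZDOCK poses and the contact definition are assumptions, not taken from the abstract):

```python
from collections import Counter

def residue_contact_frequency(pose_interfaces):
    """Map each residue to the fraction of docked poses in which it is
    at the interface.

    pose_interfaces: list of sets, one per docked pose, holding the
    residue ids found at the interface in that pose.
    """
    counts = Counter()
    for interface in pose_interfaces:
        counts.update(interface)
    n = len(pose_interfaces)
    return {res: c / n for res, c in counts.items()}

rcf = residue_contact_frequency([{1, 2}, {2, 3}, {2}])
```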

12.
Wang XF  Chen Z  Wang C  Yan RX  Zhang Z  Song J 《PloS one》2011,6(10):e26767
Integral membrane proteins constitute 25-30% of genomes and play crucial roles in many biological processes. However, membrane proteins account for less than 1% of the structures in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the resulting RF models provide more favorable prediction performance than two state-of-the-art methods, TMHcon and MEMPACK. Under a strict leave-one-protein-out jackknife procedure, they reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further used to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 under the two residue contact definitions, respectively. For the convenience of the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
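The Matthews correlation coefficient reported for helical-pair prediction is computed from the confusion-matrix counts. A short sketch of the standard formula (returning 0.0 when any marginal is empty is a common convention, assumed here):

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```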

13.
mu-Conotoxin GIIIA (mu-CTX) is a high-affinity ligand for the outer vestibule of selected isoforms of the voltage-gated Na(+) channel. The detailed bases for the toxin's high affinity binding and isoform selectivity are unclear. The outer vestibule is lined by four pore-forming (P) loops, each with an acidic residue near the mouth of the vestibule. mu-CTX has seven positively charged residues that may interact with these acidic P-loop residues. Using pair-wise alanine replacement of charged toxin and channel residues, in conjunction with double mutant cycle analysis, we determined coupling energies for specific interactions between each P-loop acidic residue and selected toxin residues to systematically establish quantitative restraints on the toxin orientation in the outer vestibule. Xenopus oocytes were injected with the mutant or native Na(+) channel mRNA, and currents measured by two-electrode voltage clamp. Mutant cycle analysis revealed novel, strong, toxin-channel interactions between K9/E403, K11/D1241, K11/D1532, and R19/D1532. Experimentally determined coupling energies for interacting residue pairs provided restraints for molecular dynamics simulations of mu-CTX docking. Our simulations suggest a refined orientation of the toxin in the pore, with toxin basic side-chains playing key roles in high-affinity binding. This modeling also provides a set of testable predictions for toxin-channel interactions, hitherto not described, that may contribute to high-affinity binding and channel isoform selectivity.
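Double mutant cycle analysis turns four binding measurements (wild-type or mutant toxin against wild-type or mutant channel) into a coupling energy, ΔΔG = RT ln[(K_mw · K_wm) / (K_ww · K_mm)]. The sketch assumes dissociation constants or IC50 values as inputs and a 298.15 K default; sign conventions differ between papers, and here a positive value means the two residues interact favorably. If the mutations act independently the ratio is 1 and the coupling energy is 0.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def coupling_energy(k_ww, k_mw, k_wm, k_mm, temp_k=298.15):
    """Double-mutant-cycle coupling energy in kcal/mol.

    k_xy: dissociation constant (or IC50) for toxin state x and channel
    state y, where 'w' = wild type and 'm' = alanine mutant. Positive
    values indicate a favorable toxin-channel residue interaction under
    the convention assumed here.
    """
    omega = (k_mw * k_wm) / (k_ww * k_mm)
    return R_KCAL * temp_k * math.log(omega)
```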

14.
In this paper we address the problem of extracting features relevant for predicting protein–protein interaction sites from the three-dimensional structures of protein complexes. Our approach is based on information about evolutionary conservation and surface disposition. We implement a neural-network-based system, which uses a cross-validation procedure and correctly detects 73% of the residues involved in protein interactions in a selected database comprising 226 heterodimers. Our analysis confirms that the chemico-physical properties of interacting surfaces are difficult to distinguish from those of the whole protein surface. However, neural networks trained with a reduced representation of the interacting patch and sequence profile are sufficient to generalize over the different features of the contact patches and to predict whether or not a residue on the protein surface is in contact. In a blind test, we report the prediction of the surface interacting sites of three structural components of the DnaK molecular chaperone system, and find close agreement with previously published experimental results. We propose that the predictor can significantly complement results from structural and functional proteomics.

15.
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the length of the protein in residues, has exceeded 70%. For comparison, the best-performing group at CASP12, with 47% precision, would have finished below the top third of the CASP13 groups. Extensively trained deep neural network approaches dominate the top-performing algorithms; they appear to efficiently integrate information on coevolving residues and interacting fragments, or possibly to utilize memories of sequence similarities, and can sometimes deliver accurate results even in the absence of virtually any target-specific evolutionary information. Evaluated by F-score on L contacts, current performance stands at around 24%, which, despite the tremendous impact and the advances in its utility for structure modeling, suggests that there is much room left for further improvement.
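Precision on the top L/5 contacts and F-score on L contacts can diverge because the F-score also penalizes native contacts that the prediction misses. A minimal sketch of precision, recall and F1 for a set of predicted contacts (the toy pairs are hypothetical):

```python
def precision_recall_f1(predicted, true_contacts):
    """Precision, recall and F1 of a set of predicted residue pairs."""
    predicted = set(predicted)
    tp = len(predicted & true_contacts)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_contacts) if true_contacts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

native = {(0, 6), (2, 8), (3, 9), (4, 10)}
p, r, f = precision_recall_f1([(0, 6), (1, 7)], native)
```

Here half of the predictions are correct, yet only a quarter of the native contacts are recovered, so F1 is 1/3.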

16.

Background

Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate.

Results

We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/.

Conclusions

Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
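The core idea of homology-based interface prediction is to transfer interface labels from structurally characterized homologs through an alignment. The sketch below uses a simple per-column majority vote with a hypothetical threshold `min_frac`; HomPPI's actual homolog-selection criteria and scoring are not reproduced here.

```python
def column_flags(aligned_seq, flags):
    """Map alignment columns to the interface flag of the residue there.

    flags[k] refers to the k-th non-gap residue of aligned_seq.
    """
    out, k = {}, 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            out[col] = flags[k]
            k += 1
    return out

def transfer_interface_labels(aligned_query, aligned_homologs, min_frac=0.5):
    """Predict one boolean per query residue by majority vote of homologs.

    aligned_homologs: list of (aligned_sequence, interface_flags) pairs.
    A query residue is predicted 'interface' if at least min_frac of the
    homologs that have a residue in that column mark it as interface.
    """
    per_homolog = [column_flags(s, f) for s, f in aligned_homologs]
    predictions = []
    for col, ch in enumerate(aligned_query):
        if ch == "-":
            continue
        votes = [h[col] for h in per_homolog if col in h]
        predictions.append(bool(votes) and sum(votes) / len(votes) >= min_frac)
    return predictions

homologs = [("ACED", [True, False, False, True]), ("A-ED", [True, False, False])]
pred = transfer_interface_labels("AC-D", homologs)
```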

17.
Polyketides, a diverse group of heteropolymers with antibiotic and antitumor properties, are assembled in bacteria by multiprotein chains of modular polyketide synthase (PKS) proteins. Specific protein-protein interactions determine the order of proteins within a multiprotein chain, and thereby the order in which chemically distinct monomers are added to the growing polyketide product. Here we investigate the evolutionary and molecular origins of protein interaction specificity. We focus on the short, conserved N- and C-terminal docking domains that mediate interactions between modular PKS proteins. Our computational analysis, which combines protein sequence data with experimental protein interaction data, reveals a hierarchical interaction specificity code. PKS docking domains are descended from a single ancestral interacting pair, but have split into three phylogenetic classes that are mutually noninteracting. Specificity within one such compatibility class is determined by a few key residues, which can be used to define compatibility subclasses. We identify these residues using a novel, highly sensitive co-evolution detection algorithm called CRoSS (correlated residues of statistical significance). The residue pairs selected by CRoSS are involved in direct physical interactions in a docked-domain NMR structure. A single PKS system can use docking domain pairs from multiple classes, as well as domain pairs from multiple subclasses of any given class. The termini of individual proteins are frequently shuffled, but docking domain pairs straddling two interacting proteins are linked as an evolutionary module. The hierarchical and modular organization of the specificity code is intimately related to the processes by which bacteria generate new PKS pathways.

18.
To adequately deal with the inherent complexity of interactions between protein side-chains, we develop and describe here a novel method for characterizing protein packing within a fold family. Instead of treating side-chain interactions in absolute terms, from one residue to another, we consider the relative interactions of contacting residue pairs. The basic element, the pair-wise relative contact, is constructed from a sequence alignment and contact analysis of a set of structures and consists of a cluster of similarly oriented, interacting side-chain pairs. To demonstrate this construct's usefulness in analyzing protein structure, we used pair-wise relative contacts to analyze two sets of protein structures as defined by SCOP: the diverse globin-like superfamily (126 structures) and the more uniform heme-binding globin family (a 94-structure subset of the globin-like superfamily). The superfamily structure set produced 1266 unique pair-wise relative contacts, whereas the family structure subset gave 1001. For both sets, we show that these constructs can be used to accurately and automatically differentiate between fold classes. Furthermore, these pair-wise relative contacts correlate well with sequence identity and thus provide a direct relationship between changes in sequence and changes in structure. To capture the complexity of protein packing, pair-wise relative contacts can be superimposed around a single residue to create a multi-body construct called a relative packing group. Constructing convex hulls around the individual packing groups provides a measure of the variation in packing around a residue and defines an approximate volume of space occupied by the groups interacting with that residue. We find that these relative packing groups are useful in understanding the structural quality of sequence or structure alignments. Moreover, they provide the context needed to calculate a value for structural randomness, which is important for properly assessing the quality of a structural alignment. The results of this study provide a framework for future analyses correlating sequence changes with specific structure changes.

19.
Development of sequence-based methods for predicting putative interfacial residues is an extremely important task in modeling 3D structures of protein–protein complexes. In the present paper we used non-gapped sequence segments to predict both interacting and interfacial residues. We demonstrated that continuous sequence segments do occur at protein–protein interfaces and showed that continuous interacting interfacial segments (CIIS) of length nine are present, on average, in 37% of the complexes in our dataset. Our results indicate that CIIS consist mostly of interacting strands and/or loops, while CIIS involving helices are scarce. We scored CIIS using four different scoring mechanisms and found that CIIS scores differ significantly from the scores calculated for random stretches of residues. We argue that such statistical differences, inferred through the corresponding Z-scores, could be used to detect putative interfacial residue segments without using any structural information. This hypothesis was tested on our dataset, and benchmarking resulted in 10–60% prediction accuracy depending on the type of benchmarking and the scoring scheme used in the calculations. Such predictions, which do not depend on the availability of 3D structures of the monomers, can be quite valuable in modeling 3D structures of obligatory complexes, for which structures of separated monomers do not exist.
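The Z-score comparison against random stretches can be sketched as follows. The scoring function here (counting aromatic residues) and the background sequence are hypothetical stand-ins for the four scoring schemes in the paper, and the number of random samples is an assumed parameter.

```python
import random
import statistics

def segment_zscore(segment, background, score_fn, n_random=1000, seed=0):
    """Z-score of score_fn(segment) against equally long random stretches
    drawn from the background sequence."""
    rng = random.Random(seed)
    length = len(segment)
    random_scores = []
    for _ in range(n_random):
        start = rng.randrange(len(background) - length + 1)
        random_scores.append(score_fn(background[start:start + length]))
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random stretches all scored the same; Z-score undefined")
    return (score_fn(segment) - statistics.mean(random_scores)) / sd

def aromatic_count(seg):
    """Hypothetical per-segment score: number of F, W and Y residues."""
    return sum(seg.count(aa) for aa in "FWY")

background = "ACDEFGHIKLMNPQRSTVWY" * 5
z = segment_zscore("WYFWYFWYF", background, aromatic_count)
```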

20.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   Total citations: 1; self-citations: 0; citations by others: 1
Kumar M  Gromiha MM  Raghava GP 《Proteins》2008,71(1):189-194


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417