

A Critical Assessment of Information-guided Protein–Protein Docking Predictions
Authors: Edward S. C. Shih, Ming-Jing Hwang
Institution: Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan
Abstract: The structures of protein complexes are increasingly predicted via protein–protein docking (PPD) using ambiguous interaction data to help guide the docking. These data are often incomplete and contain errors, and could therefore lead to incorrect docking predictions. In this study, we performed a series of PPD simulations to examine the effects of incompletely and incorrectly assigned interface residues on the success rate of PPD predictions. The results for a widely used PPD benchmark dataset, obtained using a new interface information-driven PPD (IPPD) method developed in this work, showed that the success rate for an acceptable top-ranked model varied with the information content used, from as high as 95% when contact relationships (though not contact distances) were known for all residues to 78% when only the interface/non-interface state of the residues was known. However, the success rates decreased rapidly to ∼40% when the interface/non-interface state of 20% of the residues was assigned incorrectly, and to less than 5% for a 40% incorrect assignment. Comparisons with results obtained by re-ranking a global search and with those reported for other data-guided PPD methods showed that, in general, IPPD performed better than re-ranking when the information used was more complete and more accurate, but worse when it was not, and that when using bioinformatics-predicted information on interface residues, IPPD and other data-guided PPD methods performed poorly, at a level similar to simulations with a 40% incorrect assignment. These results provide guidelines for using information about interface residues to improve PPD predictions and reveal a bottleneck for such improvement imposed by the low accuracy of current bioinformatic interface residue predictions.

Proteins work in close association with other proteins to mediate the intricate functions of a cell. An atomic-resolution structure of a protein complex can therefore help one understand a protein's function in detail. Protein–protein docking (PPD), a computational approach that complements experimental structure determinations, has attracted increasing research interest (1, 2), in part because it remains challenging to determine most structures of protein complexes via experimental techniques (3).

To improve the performance of PPD predictions, experimentally derived data (e.g. distances) and information (e.g. the identity of interface residues) have been used either as a filter allowing less plausible docking solutions to be disregarded (4–9) or as a constraint to guide the docking process (10, 11). Various types of data and information have been used to aid PPD (12); these range from distances between, or the relative orientation of, the two interacting proteins to simple identification of the amino acid residues directly involved in the binding of the two proteins (13). Despite considerable success, the caveat for all these data-guided PPD predictions is that the data or information used must be correct in order to avoid spurious results caused by misguiding (12). It is therefore pertinent and important to evaluate the effects of errors in the incorporated data or information on the quality of PPD solutions.

We have recently shown that the use of just a few distance constraints can improve the success rates of PPD such that they rival, or are even better than, those of a global search ranked using a sophisticated energy function, and that errors in the distance data significantly decrease the success rates of prediction (11). However, because distance data for interacting proteins are usually hard to obtain, other types of data or information, even if “ambiguous” (10), are increasingly used in PPD predictions (12, 14).
In this study, we investigated the effects of incompletely and incorrectly assigned interface/non-interface residues, a major source of the so-called ambiguous data, on information-guided PPD predictions.

As illustrated in Fig. 1, the information content of interface/non-interface residues can be rich enough to reveal the identity of every pair of residues in contact, but not their contact distances, or so poor as to reveal the interface/non-interface state of these residues but not their pairing relationship, for one or both of the two interacting proteins. To determine how these different levels of residue information content can help PPD predictions and the extent to which the use of incorrectly assigned residues degrades prediction success rates, we have developed a new interface information-driven PPD method (IPPD) and carried out a series of PPD simulations on a well-tested benchmark dataset. The results showed that when the information content was rich, excellent predictions (success rates for producing an acceptable top-ranked model > 70%) could be made via IPPD or by re-ranking a global search's solutions using the same interface information, and that, encouragingly, the success of predictions remained respectable (top-ranked success rates > 15%) when the content was poor. However, when enough of the interface residues were incorrectly assigned, as would be the case when using interface residues predicted by a state-of-the-art bioinformatics method such as CPORT (15), few models ranked first by IPPD or other PPD methods, including HADDOCK (10), a popular ambiguous data-driven PPD method, came close to being acceptable. These results suggest that we can greatly increase the power of PPD predictions for practical applications only if the accuracy of current bioinformatics methods for predicting the interface residues of protein complexes can be significantly improved.

Fig. 1. Contact matrix of two interacting proteins, A and B, and the contact vectors of their residues. In the contact matrix, Mij = 1 or 0, respectively, denotes contact or a lack of contact between residue i in protein A and residue j in protein B. In the contact vectors, VAi = 1 or 0, respectively, when residue Ai has, or does not have, at least one contact with any residue of protein B.
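
The relationship between the contact matrix and the contact vectors in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' IPPD code: the function names `contact_vectors` and `flip_states`, the toy matrix, and the random-flip noise model are all assumptions made for illustration. The sketch shows how the richer pairwise information collapses to the poorer interface/non-interface state, and one simple way an incorrect assignment of a given fraction of residues could be simulated.

```python
import random


def contact_vectors(M):
    """Collapse a binary contact matrix into per-residue contact vectors.

    M[i][j] is 1 if residue i of protein A contacts residue j of protein B,
    else 0. A residue is an interface residue if it has at least one contact.
    """
    v_a = [1 if any(row) else 0 for row in M]        # one entry per residue of A
    v_b = [1 if any(col) else 0 for col in zip(*M)]  # one entry per residue of B
    return v_a, v_b


def flip_states(vector, fraction, seed=0):
    """Flip the state of a given fraction of residues.

    Hypothetical noise model (an assumption, not taken from the paper):
    residues to flip are chosen uniformly at random.
    """
    rng = random.Random(seed)
    n_flip = round(fraction * len(vector))
    noisy = list(vector)
    for i in rng.sample(range(len(vector)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


# Toy example: protein A has 3 residues, protein B has 4.
M = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
v_a, v_b = contact_vectors(M)
print(v_a)  # [1, 0, 1]
print(v_b)  # [1, 1, 1, 0]

# Misassign 40% of a 5-residue state vector (2 residues flipped).
noisy = flip_states([1, 0, 1, 0, 0], 0.4)
```

Going from `M` to the two vectors discards the pairing relationship, which is why the contact vectors correspond to the poorer level of information described in the text.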