Found 20 similar documents (search time: 941 ms).
1.
We present version 3.0 of our publicly available protein-protein docking benchmark. This update includes 40 new test cases, representing a 48% increase from Benchmark 2.0. For all of the new cases, the crystal structures of both binding partners are available. As with Benchmark 2.0, Structural Classification of Proteins (Murzin et al., J Mol Biol 1995;247:536-540) was used to remove redundant test cases. The 124 unbound-unbound test cases in Benchmark 3.0 are classified into 88 rigid-body cases, 19 medium-difficulty cases, and 17 difficult cases, based on the degree of conformational change at the interface upon complex formation. In addition to providing the community with more test cases for evaluating docking methods, the expansion of Benchmark 3.0 will facilitate the development of new algorithms that require a large number of training examples. Benchmark 3.0 is available to the public at http://zlab.bu.edu/benchmark.
2.
Predicting the structures of protein-protein complexes is of great importance for better understanding molecular recognition processes. During systematic protein-protein docking, the surface of a protein molecule is scanned for putative binding sites of a partner protein. The possibility of including external data on putative binding sites, based on either experiments or bioinformatic predictions, during docking has been systematically explored. The external data were included during docking with a coarse-grained protein model, in the form of force-field weights that bias the docking search towards a predicted or known binding region. The approach was tested on a large set of protein partners in unbound conformations. Docking performance improved significantly when reliable data on the native binding sites were available, even when the data covered only single key amino acids at the binding interface. When binding-site predictions were of limited accuracy, only a modest improvement over unbiased docking was found. Optimising the protocol that biases the search towards predicted binding sites further improved docking performance, yielding approximately 40% acceptable solutions within the top 10 docking predictions, compared with 22% for unbiased docking of unbound protein structures.
3.
pK(a) values of ionizable residues have been calculated using the PROPKA method and the structures of 75 protein-protein complexes and their corresponding free forms. These pK(a) values were used to compute changes in the protonation state of individual residues, net changes in the protonation state of the complex relative to the uncomplexed proteins, and the correction to a binding energy calculated assuming standard protonation states at pH 7. For each complex, two different structures for the uncomplexed form of the proteins were used: the X-ray structures determined for the proteins in the absence of the other protein, and the individual protein structures taken from the structure of the complex (referred to as unbound and bound structures, respectively). Using unbound structures, protein-protein binding is accompanied by a complete (>95%) or significant (>50%) change in the protonation state of at least one residue in 28% and 77% of the cases considered here, respectively. Furthermore, in 36% and 61% of the cases, binding is accompanied by a complete or significant net change in the protonation state of the complex relative to the separated monomers. Using bound structures, the corresponding values are 12%, 51%, 20%, and 48%. Comparison with experimental data suggests that using unbound structures overestimates binding-induced protonation-state changes, whereas using bound structures underestimates them. We therefore conclude that protein-protein binding is often associated with changes in the protonation state of amino acid residues and with changes in the net protonation state of the proteins. The pH-dependent correction to the binding energy changes the binding constant by at least one order of magnitude in 45% and 23% of the cases, using unbound and bound structures, respectively.
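A minimal sketch of how a pK(a) shift translates into a change in protonation state, assuming a single isolated titratable site described by the Henderson-Hasselbalch equation (PROPKA itself, and any coupling between sites, is not modelled here). The >95% and >50% thresholds come from the abstract; the example pK(a) values are hypothetical. The last helper shows why "one order of magnitude in the binding constant" corresponds to RT ln 10, about 1.36 kcal/mol at 298 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_protonated(pka: float, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch: fraction of an isolated site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def protonation_change(pka_free: float, pka_complex: float, ph: float = 7.0) -> float:
    """Change in protonated fraction on binding (complex minus free form)."""
    return fraction_protonated(pka_complex, ph) - fraction_protonated(pka_free, ph)


def classify_change(delta: float) -> str:
    """Thresholds from the abstract: >95% is 'complete', >50% is 'significant'."""
    magnitude = abs(delta)
    if magnitude > 0.95:
        return "complete"
    if magnitude > 0.50:
        return "significant"
    return "minor"


def kcal_per_log10_unit(temperature: float = 298.15) -> float:
    """Free-energy change that shifts a binding constant tenfold: RT ln 10."""
    return R_KCAL * temperature * math.log(10)


# Hypothetical residue whose pKa rises from 6.0 (free) to 8.0 (complex) at pH 7.
delta = protonation_change(6.0, 8.0)
print(round(delta, 3), classify_change(delta))  # 0.818 significant
print(round(kcal_per_log10_unit(), 2))          # 1.36
```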
4.
The number of structures of protein-protein complexes deposited in the Protein Data Bank is growing rapidly. These structures embed important information for predicting the structures of new protein complexes. This motivated us to develop the PPISP method for predicting interface residues in protein-protein complexes. In PPISP, sequence profiles and the solvent accessibility of spatially neighboring surface residues were used as input to a neural network. The network was trained on native interface residues collected from the Protein Data Bank. The prediction accuracy at the time was 70%, with 47% coverage of native interface residues. We have now extensively improved PPISP. The training set now consists of 1156 nonhomologous protein chains. Testing on a set of 100 nonhomologous protein chains showed that prediction accuracy has increased to 80%, with 51% coverage. To address the over-prediction and under-prediction associated with individual neural network models, we developed a consensus method that combines predictions from multiple models with different levels of accuracy and coverage. Applied to a benchmark set of 68 proteins for protein-protein docking, the consensus approach outperformed the best individual models by 3-8 percentage points in accuracy. To demonstrate the predictive power of cons-PPISP, eight complex-forming proteins with interfaces characterized by NMR were tested. These proteins are nonhomologous to the training set and have a total of 144 interface residues identified by chemical shift perturbation. cons-PPISP predicted 174 interface residues with 69% accuracy and 47% coverage, and promises to complement experimental techniques in characterizing protein-protein interfaces.
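A sketch of the two evaluation measures and of one way several models might be combined. Here "accuracy" is taken as the fraction of predicted residues that are true interface residues and "coverage" as the fraction of native interface residues recovered, which is how these terms are commonly used for interface prediction. The abstract does not state the cons-PPISP combination rule, so the vote-threshold consensus below is a hypothetical stand-in, not the published method.

```python
from collections import Counter
from typing import Sequence, Set, Tuple


def accuracy_and_coverage(predicted: Set[int], native: Set[int]) -> Tuple[float, float]:
    """Accuracy: share of predicted residues that are native interface residues.
    Coverage: share of native interface residues that were predicted."""
    true_pos = len(predicted & native)
    accuracy = true_pos / len(predicted) if predicted else 0.0
    coverage = true_pos / len(native) if native else 0.0
    return accuracy, coverage


def vote_consensus(model_predictions: Sequence[Set[int]], min_votes: int) -> Set[int]:
    """Hypothetical rule: keep residues predicted by at least `min_votes` models.
    Raising min_votes trades coverage for accuracy."""
    votes = Counter(res for pred in model_predictions for res in pred)
    return {res for res, n in votes.items() if n >= min_votes}


# Toy residue numbers, not data from the study.
models = [{1, 2, 3, 4}, {2, 3, 5}, {2, 3, 4, 6}]
consensus = vote_consensus(models, min_votes=2)
print(sorted(consensus), accuracy_and_coverage(consensus, {2, 3, 4, 7}))
```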
5.
Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting the functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected onto the protein surface. The common drawbacks of such methods are their limited applicability to proteins with few sequence homologs and their inability to detect interfaces in evolutionarily variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved an overall residue-level precision of 60% at a recall threshold of 50% on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of the proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation took only seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually degraded the prediction. Thorough benchmarking on other datasets from the literature showed that PIER performed better than several alignment-free and alignment-dependent prediction methods.
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
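The "precision at 50% recall" figure can be read as: rank residues by predicted interface score, walk down the list until half of the true interface residues have been found, and report the fraction of residues so far that are true hits. A minimal sketch, assuming higher scores mean "more interface-like" (PIER's actual scoring is not reproduced). Since a random ranking has an expected precision equal to the share of interface residues, the quoted 25% baseline presumably reflects roughly a quarter of residues being labelled interface.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float = 0.5) -> float:
    """Precision at the smallest score cutoff whose recall reaches target_recall.
    labels: 1 for a true interface residue, 0 otherwise."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels contain no interface residues")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, is_interface) in enumerate(ranked, start=1):
        true_pos += is_interface
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    raise ValueError("target_recall must be between 0 and 1")


# Toy scores for six residues, three of which are interface residues.
print(precision_at_recall([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 0, 1, 0, 0, 1]))
```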
6.
In this article we introduce a new method for the identification and the accurate characterization of protein surface cavities. The method is encoded in the program SCREEN (Surface Cavity REcognition and EvaluatioN). As a first test of the utility of our approach we used SCREEN to locate and analyze the surface cavities of a nonredundant set of 99 proteins cocrystallized with drugs. We find that this set of proteins has on average about 14 distinct cavities per protein. In all cases, a drug is bound at one (and sometimes more than one) of these cavities. Using cavity size alone as a criterion for predicting drug-binding sites yields a high balanced error rate of 15.7%, with only 71.7% coverage. Here we characterize each surface cavity by computing a comprehensive set of 408 physicochemical, structural, and geometric attributes. By applying modern machine learning techniques (Random Forests) we were able to develop a classifier that can identify drug-binding cavities with a balanced error rate of 7.2% and coverage of 88.9%. Only 18 of the 408 cavity attributes had a statistically significant role in the prediction. Of these 18 important attributes, almost all involved size and shape rather than physicochemical properties of the surface cavity. The implications of these results are discussed. A SCREEN Web server is available at http://interface.bioc.columbia.edu/screen.
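The balanced error rate (BER) is the mean of the false-positive rate and the false-negative rate, which matters here because the classes are heavily imbalanced (about 14 cavities per protein, typically one of which binds the drug). The sketch below, using toy labels only, shows that calling every cavity "non-binding" scores about 93% plain accuracy but a BER of 50%. How the paper defines "coverage" is not stated in the abstract, so it is not computed.

```python
from typing import Sequence


def balanced_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of false-positive and false-negative rates (1 = drug-binding cavity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return 0.5 * (fpr + fnr)


# One binding cavity among 14; a classifier that always says "non-binding".
truth = [1] + [0] * 13
always_negative = [0] * 14
print(balanced_error_rate(truth, always_negative))  # 0.5
```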
7.
Protein-protein docking (PPD) is a computational process that predicts the structure of a complex of two interacting proteins from their unbound structures. The accuracy of PPD predictions is low, but can be greatly enhanced if experimentally determined distance data are available for incorporation into the prediction. However, the specific effects of distance constraints on PPD predictions are largely uncharacterized. In this study, we systematically simulated the effects of using distance constraints both on a new distance constraint-driven PPD approach, called DPPD, and also, by re-ranking, on a well-established grid-based global search approach. Our results for a PPD benchmark dataset of 84 protein complexes of known structure showed that docking success rates near 100% could be obtained when the number of distance constraints exceeded six (the number of rigid-body degrees of freedom of the system), but that the success rate was significantly reduced by long distance constraints, large binding-induced conformational changes, and large errors in the distance data. Our results also showed that, under most of the simulated conditions, even two or three distance constraints were sufficient to achieve a much better success rate than that obtained by re-ranking the global search results with a sophisticated physicochemical function. Our study provides guidelines for the practical incorporation of experimental distance data to aid PPD predictions.
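A minimal sketch of re-ranking docking poses with distance constraints: each pose is scored by how many upper-bound constraints it satisfies, and ties are broken by the original docking score. The data layout (a dict of labelled atom coordinates per pose), the constraint tuples, and the "lower score is better" convention are assumptions for illustration; this is not the DPPD algorithm itself.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Constraint = Tuple[str, str, float]  # (atom label A, atom label B, upper bound in Angstrom)


def count_satisfied(pose: Dict[str, Coord], constraints: Sequence[Constraint]) -> int:
    """Number of constraints whose atom-atom distance is within its upper bound."""
    return sum(1 for a, b, upper in constraints if math.dist(pose[a], pose[b]) <= upper)


def rerank(poses: Sequence[Dict[str, Coord]], scores: Sequence[float],
           constraints: Sequence[Constraint]) -> List[int]:
    """Pose indices ordered by constraints satisfied (most first), then by docking
    score (lower assumed better)."""
    return sorted(
        range(len(poses)),
        key=lambda i: (-count_satisfied(poses[i], constraints), scores[i]),
    )


constraints = [("A1", "B1", 5.0), ("A2", "B2", 5.0)]
poses = [
    {"A1": (0, 0, 0), "B1": (10, 0, 0), "A2": (0, 0, 0), "B2": (3, 0, 0)},
    {"A1": (0, 0, 0), "B1": (4, 0, 0), "A2": (0, 0, 0), "B2": (4, 0, 0)},
    {"A1": (0, 0, 0), "B1": (20, 0, 0), "A2": (0, 0, 0), "B2": (20, 0, 0)},
]
print(rerank(poses, [-10.0, -5.0, -20.0], constraints))  # [1, 0, 2]
```

The constraint count dominates the sort key, so a pose with a worse physics score can still move to the top if it agrees with more of the experimental distances.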
8.
Vreven T, Hwang H, Pierce BG, Weng Z. Protein Science: A Publication of the Protein Society, 2012, 21(3):396-404.
We present an energy function for predicting binding free energies of protein-protein complexes, using the three-dimensional structures of the complex and unbound proteins as input. Our function is a linear combination of nine terms and achieves a correlation coefficient of 0.63 with experimental measurements when tested on a benchmark of 144 complexes using leave-one-out cross validation. Although we systematically tested both atomic and residue-based scoring functions, the selected function is dominated by residue-based terms. Our function is stable for subsets of the benchmark stratified by experimental pH and extent of conformational change upon complex formation, with correlation coefficients ranging from 0.61 to 0.66.
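A sketch of the evaluation protocol named in the abstract: a linear combination of energy terms fitted by least squares, scored by leave-one-out cross validation and the Pearson correlation between held-out predictions and experiment. The nine published terms are not listed in the abstract, so the feature columns here are hypothetical; the code is pure Python (normal equations plus Gaussian elimination) to stay self-contained.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares weights (last entry is the intercept) via the normal equations."""
    design = [list(r) + [1.0] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return _solve(xtx, xty)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def loo_correlation(rows: List[List[float]], y: List[float]) -> float:
    """Refit without each complex in turn, predict it, correlate with experiment."""
    preds = []
    for i in range(len(y)):
        w = fit_linear(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(sum(wi * xi for wi, xi in zip(w, rows[i] + [1.0])))
    return pearson(preds, y)


# Hypothetical three-term data that is exactly linear, so LOO r should be ~1.
rows = [[float(i), float((i * 7) % 5), float((i * 3) % 4)] for i in range(12)]
y = [2.0 * a - b + 0.5 * c + 1.0 for a, b, c in rows]
print(round(loo_correlation(rows, y), 6))
```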
9.
The polyproline II (PPII) conformation of the protein backbone is an important secondary structure type. It is unusual in that, owing to steric constraints, its main-chain hydrogen-bond donors and acceptors cannot easily be satisfied: it cannot form local hydrogen bonds as alpha-helices do, nor can it easily satisfy the hydrogen-bonding potential of neighboring residues in polyproline conformation as beta-strands do. Here we describe an analysis of polyproline conformations using the HOMSTRAD database of structurally aligned proteins. This allows us not only to determine amino acid propensities from a much larger database than previously used, but also to investigate the conservation of amino acids in polyproline conformations and the conservation of the conformation itself. Although proline is common in polyproline helices, helices without proline account for 46% of the total. No other amino acid appears to be strongly preferred; glycine and aromatic amino acids have low propensities for PPII. Accordingly, the hydrogen-bonding potential of the PPII main chain is satisfied mainly by water molecules and by other parts of the main chain. Side-chain to main-chain interactions are mostly nonlocal. Interestingly, the larger number of unsatisfied H-bond donors and acceptors (compared with alpha-helices and beta-strands) makes PPII conformers well suited to take part in protein-protein interactions.
10.
The tertiary structures of protein complexes provide crucial insight into the molecular mechanisms that regulate their functions and assembly. However, solving protein complex structures by experimental methods is often more difficult than solving single protein structures. Here, we have developed a novel computational multiple protein docking algorithm, Multi‐LZerD, that builds models of multimeric complexes by effectively reusing pairwise docking predictions of component proteins. A genetic algorithm is applied to explore the conformational space, followed by a structure refinement procedure. On a benchmark of eleven hetero‐multimeric complexes, near‐native conformations (root mean square deviation smaller than 2.5 Å) were obtained for all but one of them. We also show that our method copes well with unbound docking cases, outperforming the methodology that can be directly compared with our approach. Multi‐LZerD was able to predict near‐native structures for multimeric complexes of various topologies. Proteins 2012; © 2012 Wiley Periodicals, Inc.
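The near-native criterion above is an RMSD cutoff of 2.5 Å. A minimal sketch of that check, assuming the model and reference coordinates are already superposed in a common frame and listed in matching atom order (the optimal-superposition step, e.g. Kabsch, is deliberately omitted). Which atoms Multi-LZerD includes in its RMSD is not stated in the abstract.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation between matched, pre-superposed coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))


def is_near_native(model: Sequence[Coord], reference: Sequence[Coord],
                   cutoff: float = 2.5) -> bool:
    """Near-native if RMSD is below the cutoff (2.5 Angstrom, as in the abstract)."""
    return rmsd(model, reference) < cutoff


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(shifted, reference), is_near_native(shifted, reference))  # 1.0 True
```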
11.
Despite the increasing number of successful determinations of complex protein structures, understanding of their dynamic properties is still rather limited. Using X-ray crystallography, we demonstrate that ribonuclease A (RNase A) undergoes significant domain motions upon ligand binding. In particular, when cytidine 2'-monophosphate binds to RNase A, the structure of the enzyme becomes more compact. Interestingly, our data also show that these structural alterations are fully reversible in the crystal state. These findings provide a structural basis for the dynamic behavior of RNase A in substrate binding shown by Petsko and coworkers (Rasmussen et al. Nature 1992;357:423-424). These subtle domain motions may be functionally relevant in more complex systems and may play a significant role in the cooperativity of oligomeric enzymes.
12.
Using an efficient iterative method, we have developed a distance-dependent knowledge-based scoring function to predict protein-protein interactions. The function, referred to as ITScore-PP, was derived using the crystal structures of a training set of 851 protein-protein dimeric complexes containing true biological interfaces. The key idea of the iterative method for deriving ITScore-PP is to improve the interatomic pair potentials by iteration until they can distinguish true binding modes from decoy modes for the protein-protein complexes in the training set. The iterative method circumvents the challenging reference-state problem in deriving knowledge-based potentials. The derived scoring function was used to evaluate the ligand orientations generated by ZDOCK 2.1 and the native ligand structures on a diverse set of 91 protein-protein complexes. For the bound test cases, ITScore-PP yielded a success rate of 98.9% when the top 10 ranked orientations were considered. For the more realistic unbound test cases, the corresponding success rate was 40.7%. Furthermore, for faster orientational sampling, several residue-level knowledge-based scoring functions were also derived following a similar iterative procedure. Among them, the scoring function that represents each residue by its side-chain center of mass (SCM), referred to as ITScore-PP(SCM), showed the best performance, yielding success rates of 71.4% and 30.8% for the bound and unbound cases, respectively, when the top 10 orientations were considered. ITScore-PP was further tested on two other published protein-protein docking decoy sets, the ZDOCK decoy set and the RosettaDock decoy set. In addition to predicting binding modes, the binding scores predicted by ITScore-PP also correlated well with experimentally determined binding affinities, yielding a correlation coefficient of R = 0.71 on a test set of 74 protein-protein complexes with known affinities.
ITScore-PP is computationally efficient. The average run time for ITScore-PP was about 0.03 s per orientation (including optimization) on a personal computer with a 3.2 GHz Pentium IV CPU and 3.0 GB RAM. ITScore-PP(SCM) is about an order of magnitude faster than ITScore-PP. ITScore-PP and/or ITScore-PP(SCM) can be combined with efficient protein docking software to study protein-protein recognition.
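A toy sketch of the iterative idea: adjust the pair potential until Boltzmann-weighted pair statistics over each complex's native and decoy modes match those observed in the natives. It assumes an update of the form u(r) <- u(r) + lambda * kT * [g_pred(r) - g_obs(r)], with raw pair counts per distance bin standing in for normalized distribution functions and a single pair type. This is a simplified illustration, not the published ITScore-PP procedure.

```python
import math
from typing import List, Sequence, Tuple

Counts = Sequence[float]  # pair counts per distance bin for one binding mode


def mode_energy(counts: Counts, u: Sequence[float]) -> float:
    """Energy of a binding mode: pair counts in each bin times the bin's potential."""
    return sum(c * ui for c, ui in zip(counts, u))


def expected_counts(ensemble: Sequence[Counts], u: Sequence[float],
                    kT: float = 1.0) -> List[float]:
    """Boltzmann-weighted average histogram over one complex's binding modes."""
    energies = [mode_energy(c, u) for c in ensemble]
    e_min = min(energies)  # shift for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [sum(w * c[b] for w, c in zip(weights, ensemble)) / z for b in range(len(u))]


def iterate_potential(training: Sequence[Tuple[Counts, Sequence[Counts]]], n_bins: int,
                      n_iter: int = 50, lam: float = 0.5, kT: float = 1.0) -> List[float]:
    """training: (native_counts, [decoy_counts, ...]) per complex."""
    u = [0.0] * n_bins
    g_obs = [sum(native[b] for native, _ in training) for b in range(n_bins)]
    for _ in range(n_iter):
        g_pred = [0.0] * n_bins
        for native, decoys in training:
            for b, value in enumerate(expected_counts([native, *decoys], u, kT)):
                g_pred[b] += value
        # Bins over-populated by the ensemble relative to natives become less favorable.
        u = [ui + lam * kT * (gp - go) for ui, gp, go in zip(u, g_pred, g_obs)]
    return u


# One toy complex: native contacts fall in bin 0, the decoy's in bin 1.
training = [([3.0, 0.0], [[0.0, 3.0]])]
print(iterate_potential(training, n_bins=2, n_iter=1))  # [-0.75, 0.75]
```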
13.
Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.
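The distance-matrix idea mentioned at the start of this abstract can be sketched as a search over pairings: reorder one family's evolutionary distance matrix so that it correlates best with the other's, and read off the paired sequences. This is a mirrortree-style simplification with brute force over permutations (feasible only for a handful of sequences); it is not Codep, whose criterion is the number of interdependent alignment sites.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def upper_triangle(d: Sequence[Sequence[float]],
                   order: Optional[Sequence[int]] = None) -> List[float]:
    """Off-diagonal upper-triangle entries of d, with rows/columns taken in `order`."""
    n = len(d)
    idx = list(range(n)) if order is None else list(order)
    return [d[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def best_pairing(dist_a, dist_b) -> Tuple[Tuple[int, ...], float]:
    """Pair family-A sequence i with family-B sequence perm[i], choosing the perm
    that maximizes the correlation between the two distance matrices."""
    target = upper_triangle(dist_a)
    best_perm, best_r = None, float("-inf")
    for perm in permutations(range(len(dist_b))):
        r = _pearson(target, upper_triangle(dist_b, perm))
        if r > best_r:
            best_perm, best_r = perm, r
    return best_perm, best_r


# Toy distances from points on a line; family B is family A in shuffled order.
pos = [0.0, 1.0, 3.0, 7.0]
dist_a = [[abs(x - y) for y in pos] for x in pos]
hidden = (2, 0, 3, 1)
dist_b = [[dist_a[hidden[i]][hidden[j]] for j in range(4)] for i in range(4)]
print(best_pairing(dist_a, dist_b))  # recovers (1, 3, 0, 2) with r close to 1
```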
14.
A genetic algorithm (GA) for protein-protein docking is described, in which the proteins are represented by dot surfaces calculated using the Connolly program. The GA is used to move the surface of one protein relative to the other to locate the area of greatest surface complementarity between the two. Surface dots are deemed complementary if their normals are opposed, their Connolly shape types are complementary, and their hydrogen-bonding or hydrophobic potential is fulfilled. Overlap of the protein interiors is penalized. The GA is tested on 34 large protein-protein complexes in which one or both proteins have been crystallized separately. Parameters are established for which 30 of the complexes have at least one near-native solution ranked in the top 100. We have also successfully reassembled a 1,400-residue heptamer based on the top-ranking GA solution obtained when docking two bound subunits.
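A simplified sketch of the kind of fitness function such a GA would maximize: reward pairs of surface dots that are close and have opposed normals, and penalize overlap of the two interiors (approximated here as inter-protein atom pairs closer than a clash distance). The Connolly shape-type and hydrogen-bond/hydrophobic criteria are omitted, and all cutoffs and the penalty weight are hypothetical values.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Dot = Tuple[Vec, Vec]  # (position, unit surface normal)


def _dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def complementarity_score(dots_a: Sequence[Dot], dots_b: Sequence[Dot],
                          atoms_a: Sequence[Vec], atoms_b: Sequence[Vec],
                          contact: float = 1.5, opposed: float = -0.8,
                          clash: float = 3.0, penalty: float = 10.0) -> float:
    """+1 per dot pair within `contact` whose normals face each other (cosine <= `opposed`);
    -penalty per inter-protein atom pair closer than `clash` (interior overlap)."""
    good = sum(
        1
        for pa, na in dots_a
        for pb, nb in dots_b
        if math.dist(pa, pb) <= contact and _dot(na, nb) <= opposed
    )
    bad = sum(1 for a in atoms_a for b in atoms_b if math.dist(a, b) < clash)
    return good - penalty * bad


dots_a = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
dots_b = [((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0))]
print(complementarity_score(dots_a, dots_b, [(-2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]))  # 1.0
```

In a GA, each individual would encode a rigid-body rotation and translation of protein B, and this score, evaluated on the transformed dots and atoms, would serve as its fitness.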
15.
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be identified well by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as the catalytic sites of enzymes. The resulting weak sequence conservation poses significant challenges for protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through a comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit amino acid substitution patterns distinct from those of other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which uses these substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces and of nonbinding surfaces. BindML is shown to perform well compared with alternative methods for protein binding interface prediction. The methodology developed in this study is versatile in that it can be applied generally to predict other types of functional sites in proteins, such as DNA-, RNA-, and membrane-binding sites.
16.
17.
Wollacott AM, Zanghellini A, Murphy P, Baker D. Protein Science: A Publication of the Protein Society, 2007, 16(2):165-175.
We describe the development of a method for assembling structures of multidomain proteins from structures of isolated domains. The method consists of an initial low-resolution search in which the conformational space of the domain linker is explored using the Rosetta de novo structure prediction method, followed by a high-resolution search in which all atoms are treated explicitly and backbone and side chain degrees of freedom are simultaneously optimized. The method recapitulates, often with very high accuracy, the structures of existing multidomain proteins.
18.
Vreven T, Hwang H, Weng Z. Protein Science: A Publication of the Protein Society, 2011, 20(9):1576-1586.
Most scoring functions for protein-protein docking algorithms are either atom-based or residue-based, the former being able to produce higher-quality structures and the latter being more tolerant of conformational changes upon binding. Earlier, we developed the ZRANK algorithm for reranking docking predictions, with a scoring function that contained only atom-based terms. Here we combine ZRANK's atom-based potentials with five residue-based potentials published by other labs, as well as IFACE, an atom-based potential that we published after ZRANK. We simultaneously optimized the weights for selected combinations of terms in the scoring function, using decoys generated with the protein-protein docking algorithm ZDOCK. We performed rigorous cross validation of the combinations using 96 test cases from a docking benchmark. Judged by the integrative success rate of making 1000 predictions per complex, the addition of IFACE and the best residue-based pair potential reduced the number of cases without a correct prediction by 38% and 27% relative to ZDOCK and ZRANK, respectively. Thus, combining residue-based and atom-based potentials into a scoring function can improve performance for protein-protein docking. The resulting scoring function is called IRAD (integration of residue- and atom-based potentials for docking) and is available at http://zlab.umassmed.edu.
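The two numbers used to compare scoring functions here, a success rate within the top N predictions and the relative reduction in the number of cases without any correct prediction, can be computed from the rank of the first correct prediction for each test case. A minimal sketch, assuming "1000 predictions per complex" means a case fails if no correct pose appears in the top 1000; the ranks below are made up.

```python
from typing import Optional, Sequence


def success_rate(hit_ranks: Sequence[Optional[int]], top_n: int) -> float:
    """hit_ranks: 1-based rank of the first correct prediction per case, or None."""
    hits = sum(1 for r in hit_ranks if r is not None and r <= top_n)
    return hits / len(hit_ranks)


def failure_reduction(baseline_ranks: Sequence[Optional[int]],
                      new_ranks: Sequence[Optional[int]], top_n: int = 1000) -> float:
    """Relative drop in cases with no correct prediction within the top N."""
    def failures(ranks: Sequence[Optional[int]]) -> int:
        return sum(1 for r in ranks if r is None or r > top_n)

    base = failures(baseline_ranks)
    if base == 0:
        raise ValueError("baseline has no failures to reduce")
    return (base - failures(new_ranks)) / base


baseline = [1, 5, None, None, 2000, 30, None, None]
combined = [1, 3, 400, None, 800, 10, None, 50]
print(success_rate(combined, 10), failure_reduction(baseline, combined))  # 0.375 0.6
```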
19.
In this study, the X-ray crystal structure of the complex between Escherichia coli thioredoxin reductase (EC TrxR) and its substrate thioredoxin (Trx) was used as a guide to design a Deinococcus radiodurans TrxR (DR TrxR) mutant with altered Trx specificity. Previous studies have shown that TrxRs have higher affinity for cognate Trxs (from the same species) than for Trxs from different species. Computational alanine scanning mutagenesis and visual inspection of the EC TrxR-Trx interface suggested that only four residues (F81, R130, F141, and F142) account for the majority of the EC TrxR-Trx interface stability. Individual replacement of the equivalent residues in DR TrxR (M84, K137, F148, and F149) with alanine resulted in drastic changes in binding affinity, confirming that these four residues account for most of the TrxR-Trx interface stability. When M84 and K137 were changed to match the equivalent EC TrxR residues (M84F and K137R), the substrate specificity of DR TrxR was switched from its own Trx to EC Trx. The results suggest that a small subset of the TrxR-Trx interface residues is responsible for the majority of Trx binding affinity and species-specific recognition.
20.
Biological Reviews of the Cambridge Philosophical Society, 2018, 93(2):1014-1031.
Whole‐genome or whole‐exome sequencing (WGS/WES) of the affected proband together with normal parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited and accurately identifying the disease‐causing DNM from a group of irrelevant DNMs is complicated. Herein, we provide a general‐purpose discussion of important issues related to pathogenic gene identification based on trio‐based WGS/WES data. Specifically, the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies or software tools used for DNM detection are reviewed, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review both sheds light on how trio‐based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and also discusses existing challenges.
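The basic trio logic for flagging a candidate de novo mutation can be sketched as a genotype-consistency filter: the child is heterozygous for an allele that neither parent carries, with adequate depth in all three samples. The Call record, the depth and allele-fraction thresholds, and the zero-alt-read rule for parents are hypothetical choices for illustration, not those of any specific DNM caller reviewed here.

```python
from dataclasses import dataclass


@dataclass
class Call:
    genotype: str   # "0/0", "0/1" or "1/1"
    depth: int      # read depth at the site
    alt_reads: int  # reads supporting the alternate allele


def is_candidate_dnm(child: Call, mother: Call, father: Call,
                     min_depth: int = 10, min_child_vaf: float = 0.3,
                     max_parent_alt: int = 0) -> bool:
    """Flag a site as a candidate de novo mutation (hypothetical thresholds)."""
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False  # too little coverage to trust any of the three calls
    if child.genotype != "0/1":
        return False
    if mother.genotype != "0/0" or father.genotype != "0/0":
        return False  # allele could have been inherited
    if mother.alt_reads > max_parent_alt or father.alt_reads > max_parent_alt:
        return False  # alt reads in a parent suggest mosaicism or a missed call
    return child.alt_reads / child.depth >= min_child_vaf


child = Call("0/1", 30, 14)
print(is_candidate_dnm(child, Call("0/0", 25, 0), Call("0/0", 28, 0)))  # True
```

Sites passing such a filter would still need the prioritization steps discussed in the review before any one of them is considered disease-causing.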