Similar literature
 20 similar articles found (search time: 31 ms)
1.
Docking represents a versatile and powerful method to predict the geometry of protein–protein complexes. However, despite significant methodological advances, the identification of good docking solutions among a large number of false solutions remains a difficult task. We have previously demonstrated that the formalism of mutual information (MI) from information theory can be adapted to protein docking, and we have now extended this approach to enhance its robustness and applicability. A large dataset consisting of 22,934 docking decoys derived from 203 different protein–protein complexes was used for an MI-based optimization of reduced amino acid alphabets representing the protein–protein interfaces. This optimization relied on a clustering analysis that allows one to estimate the mutual information of whole amino acid alphabets by considering all structural features simultaneously, rather than by treating them individually. This clustering approach is fast and can be applied in a similar fashion to the generation of reduced alphabets for other biological problems such as fold recognition, sequence data mining, or secondary structure prediction. The reduced alphabets derived from the present work were converted into a scoring function for the evaluation of docking solutions, which is available for public use via the web service score-MI: http://score-MI.biochem.uni-erlangen.de
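As an illustration of the quantity the abstract above builds on, the following is a minimal sketch of mutual information between a reduced-alphabet interface feature and a near-native/decoy label. The function names, the two-class alphabet, and the toy labels are hypothetical; the clustering-based optimization described in the abstract is not reproduced here.

```python
import math
from collections import Counter


def reduce_sequence(residues, alphabet):
    """Map each residue to its class in a reduced alphabet (a dict)."""
    return [alphabet[r] for r in residues]


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        # p(x, y) / (p(x) * p(y)) written with raw counts
        ratio = c * n / (count_x[x] * count_y[y])
        mi += p_joint * math.log2(ratio)
    return mi


# Hypothetical two-class alphabet: hydrophobic ("h") versus charged ("c").
toy_alphabet = {"L": "h", "I": "h", "K": "c", "R": "c"}
```

A feature that separates near-native poses from decoys perfectly carries one bit for a balanced binary label, while an uninformative feature carries zero bits.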

2.
Müller W  Sticht H 《Proteins》2007,67(1):98-111
In this work, we developed a protein-specifically adapted scoring function and applied it to the reranking of protein-protein docking solutions generated with a conventional docking program. The approach was validated using experimentally determined structures of the bacterial HPr-protein in complex with four structurally nonhomologous binding partners as an example. A sufficiently large data set for the generation of protein-specifically adapted pair potentials was generated by modeling all orthologous complexes for each type of interaction, resulting in a total of 224 complexes. The parameters for potential generation were systematically varied and resulted in a total of 66,132 different scoring functions that were tested for their ability to successfully rerank 1000 docking solutions generated from modeled structures of the unbound binding partners. Parameters that proved critical for the generation of good scoring functions were the distance cutoff used for the generation of the pair potential, and an additional cutoff that allows a proper weighting of conserved and nonconserved contacts in the interface. Compared to the original scoring function, application of this novel type of scoring function resulted in a significant accumulation of acceptable docking solutions within the first 10 ranks. Depending on the type of complex investigated, one to five acceptable complex geometries are found among the 10 highest-ranked solutions, and for three of the four systems tested, an acceptable solution was placed on the first rank.

3.
The accurate scoring of rigid-body docking orientations represents one of the major difficulties in protein-protein docking prediction. Other challenges are the development of faster and more efficient sampling methods and the introduction of receptor and ligand flexibility during simulations. Overall, good discrimination of near-native docking poses from the very early stages of rigid-body protein docking is an essential step before applying more costly interface refinement to the correct docking solutions. Here we explore a simple approach to scoring of rigid-body docking poses, which has been implemented in a program called pyDock. The scheme is based on Coulombic electrostatics with a distance-dependent dielectric constant, and implicit desolvation energy with atomic solvation parameters previously adjusted for rigid-body protein-protein docking. This scoring function is not highly dependent on the specific geometry of the docking poses and therefore can be used on rigid-body docking sets generated by a variety of methods. We have tested the procedure on a large benchmark set of 80 unbound docking cases. The method is able to detect a near-native solution from 12,000 docking poses and place it within the 100 lowest-energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information. More specifically, a near-native solution will lie within the top 20 solutions in 37% of the cases. The simplicity of the approach allows for a better understanding of the physical principles behind protein-protein association, and provides a fast tool for the evaluation of large sets of rigid-body docking poses in search of the near-native orientation.
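The electrostatic term mentioned in the abstract above can be sketched as a pairwise Coulomb sum with a distance-dependent dielectric. This is only an illustration under stated assumptions: the dielectric form ε(r) = 4r, the distance floor, and the function name are assumptions, not the pyDock implementation, and the desolvation term is omitted.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a commonly used value.
COULOMB_CONSTANT = 332.06


def coulomb_energy(receptor, ligand, dielectric_slope=4.0, min_distance=1.0):
    """Sum pairwise Coulomb energies between two sets of (x, y, z, charge) atoms.

    Assumes a dielectric eps(r) = dielectric_slope * r, so each pair
    contributes 332.06 * qi * qj / (dielectric_slope * r**2) in kcal/mol.
    Distances are floored at min_distance to avoid singularities.
    """
    energy = 0.0
    for xr, yr, zr, qr in receptor:
        for xl, yl, zl, ql in ligand:
            r = max(math.dist((xr, yr, zr), (xl, yl, zl)), min_distance)
            energy += COULOMB_CONSTANT * qr * ql / (dielectric_slope * r * r)
    return energy
```

With this form the interaction decays as 1/r², which damps long-range contributions and is one reason such a term tolerates small geometric errors in rigid-body poses.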

4.
Qian Wang  Luhua Lai 《Proteins》2014,82(10):2472-2482
Target structure‐based virtual screening, which employs protein‐small molecule docking to identify potential ligands, has been widely used in small‐molecule drug discovery. In the present study, we used a protein–protein docking program to identify proteins that bind to a specific target protein. In the testing phase, an all‐to‐all protein–protein docking run on a large dataset was performed. The three‐dimensional rigid docking program SDOCK was used to examine protein–protein docking on all protein pairs in the dataset. Both the binding affinity and features of the binding energy landscape were considered in the scoring function in order to distinguish positive binding pairs from negative binding pairs. Thus, the lowest docking score, the average Z‐score, and the convergence of the low‐score solutions were incorporated in the analysis. The hybrid scoring function was optimized in the all‐to‐all docking test. The docking method and the hybrid scoring function were then used to screen for proteins that bind to tumor necrosis factor‐α (TNFα), which is a well‐known therapeutic target for rheumatoid arthritis and other autoimmune diseases. A protein library containing 677 proteins was used for the screen. Proteins with scores among the top 20% were further examined. Sixteen proteins from the top‐ranking 67 proteins were selected for experimental study. Two of these proteins showed significant binding to TNFα in an in vitro binding study. The results of the present study demonstrate the power and potential application of protein–protein docking for the discovery of novel binding proteins for specific protein targets. Proteins 2014; 82:2472–2482. © 2014 Wiley Periodicals, Inc.

5.
This work describes for the first time the structure of purine nucleoside phosphorylase from Mycobacterium tuberculosis (MtPNP) in complex with sulfate and its natural substrate, 2′-deoxyguanosine, and its application to virtual screening. We report docking studies of a set of molecules against this structure. A polynomial empirical scoring function ranked the docking solutions with good predictive power, which opens the possibility of applying this new criterion to analyze docking solutions and to screen small-molecule databases for new chemical entities that inhibit MtPNP.

6.
Paul N  Rognan D 《Proteins》2002,47(4):521-533
Protein-based virtual screening of chemical libraries is a powerful technique for identifying new molecules that may interact with a macromolecular target of interest. Because of docking and scoring limitations, it is more difficult to apply as a lead optimization method because it requires that the docking/scoring tool is able to propose as few solutions as possible and all of them with a very good accuracy for both the protein-bound orientation and the conformation of the ligand. In the present study, we present a consensus docking approach (ConsDock) that takes advantage of three widely used docking tools (Dock, FlexX, and Gold). The consensus analysis of all possible poses generated by several docking tools is performed sequentially in four steps: (i) hierarchical clustering of all poses generated by a docking tool into families represented by a leader; (ii) definition of all consensus pairs from leaders generated by different docking programs; (iii) clustering of consensus pairs into classes, represented by a mean structure; and (iv) ranking the different means starting from the most populated class of consensus pairs. When applied to a test set of 100 protein-ligand complexes from the Protein Data Bank, ConsDock significantly outperforms single docking with respect to the docking accuracy of the top-ranked pose. In 60% of the cases investigated here, ConsDock was able to rank as top solution a pose within 2 Å RMSD of the X-ray structure. It can be applied as a postprocessing filter to either single- or multiple-docking programs to prioritize three-dimensional guided lead optimization from the most likely docking solution.
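The success criterion quoted above (a top-ranked pose within 2 Å RMSD of the X-ray structure) can be sketched as follows. This is a minimal illustration, assuming both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is performed; the function names are hypothetical and do not come from ConsDock.

```python
import math


def pose_rmsd(pose, reference):
    """Root-mean-square deviation between two poses given as lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(pose, reference)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose))


def is_successful(pose, reference, threshold=2.0):
    """True if the pose lies within `threshold` Angstrom RMSD of the reference."""
    return pose_rmsd(pose, reference) <= threshold
```

The same distance could serve to decide whether two leaders from different programs form a consensus pair, although the cutoff used for that step is not given in the abstract.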

7.
A thorough evaluation of some of the most advanced docking and scoring methods currently available is described, and guidelines for the choice of an appropriate protocol for docking and virtual screening are defined. The generation of a large and highly curated test set of pharmaceutically relevant protein-ligand complexes with known binding affinities is described, and three highly regarded docking programs (Glide, GOLD, and ICM) are evaluated on the same set with respect to their ability to reproduce crystallographic binding orientations. Glide correctly identified the crystallographic pose within 2.0 Å in 61% of the cases, versus 48% for GOLD and 45% for ICM. In general Glide appears to perform most consistently with respect to diversity of binding sites and ligand flexibility, while the performance of ICM and GOLD is more binding site-dependent and it is significantly poorer when binding is predominantly driven by hydrophobic interactions. The results also show that energy minimization and reranking of the top N poses can be an effective means to overcome some of the limitations of a given docking function. The same docking programs are evaluated in conjunction with three different scoring functions for their ability to discriminate actives from inactives in virtual screening. The evaluation, performed on three different systems (HIV-1 protease, IMPDH, and p38 MAP kinase), confirms that the relative performance of different docking and scoring methods is to some extent binding site-dependent. GlideScore appears to be an effective scoring function for database screening, with consistent performance across several types of binding sites, while ChemScore appears to be most useful in sterically demanding sites since it is more forgiving of repulsive interactions. Energy minimization of docked poses can significantly improve the enrichments in systems with sterically demanding binding sites. Overall, Glide appears to be a safe general choice for docking, while the choice of the best scoring tool remains to a larger extent system-dependent and should be evaluated on a case-by-case basis.

8.
Structural characterization of protein-protein interactions is essential for our ability to study life processes at the molecular level. Computational modeling of protein complexes (protein docking) is important as the source of their structure and as a way to understand the principles of protein interaction. Rapidly evolving comparative docking approaches utilize target/template similarity metrics, which are often based on the protein structure. Although structural similarity generally yields good performance, other characteristics of the interacting proteins (e.g., function, biological process, and localization) may improve the prediction quality, especially in the case of weak target/template structural similarity. For the ranking of a pool of models for each target, we tested scoring functions that quantify similarity of Gene Ontology (GO) terms assigned to target and template proteins in three ontology domains: biological process, molecular function, and cellular component (GO-score). The scoring functions were tested in docking of bound, unbound, and modeled proteins. The results indicate that the combined structural and GO-terms functions improve the scoring, especially in the twilight zone of structural similarity, typical for protein models of limited accuracy.

9.
Glycation is a chemical reaction in which a sugar molecule bonds with a protein without the help of enzymes. It is often a cause of disease, and knowledge about glycation is therefore very important. In this paper, we present iProtGly‐SS, a protein lysine glycation site identification method based on features extracted from sequence and secondary structural information. In the experiments, we found the best combination of feature groups to be Amino Acid Composition, Secondary Structure Motifs, and Polarity. We used a support vector machine classifier to train our model on an optimal set of features chosen with a group-based forward feature selection technique. On standard benchmark datasets, our method significantly outperforms existing methods for glycation prediction. A web server for iProtGly‐SS is implemented and publicly available: http://brl.uiu.ac.bd/iprotgly-ss/ .

10.

Background  

The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

11.
May A  Zacharias M 《Proteins》2007,69(4):774-780
A reduced protein model combined with a systematic docking approach has been employed to predict protein-protein complex structures in CAPRI rounds 6-11. The docking approach, termed ATTRACT, is based on energy minimization in translational and rotational degrees of freedom of one protein with respect to the second protein, starting from many thousand initial protein partner placements. It also allows for approximate inclusion of global flexibility of protein partners during systematic docking by conformational relaxation of the partner proteins in precalculated soft collective backbone degrees of freedom. We have submitted models for six targets, achieved acceptable docking solutions for two targets, and predicted >20% correct contacts for five targets. Possible improvements of the docking approach, in particular at the scoring and refinement steps, are discussed.

12.
Protein‐protein interactions control a large range of biological processes and their identification is essential to understand the underlying biological mechanisms. To complement experimental approaches, in silico methods are available to investigate protein‐protein interactions. Cross‐docking methods, in particular, can be used to predict protein binding sites. However, proteins can interact with numerous partners and can present multiple binding sites on their surface, which may alter the binding site prediction quality. We evaluate the binding site predictions obtained using complete cross‐docking simulations of 358 proteins with 2 different scoring schemes accounting for multiple binding sites. Despite overall good binding site prediction performance, 68 cases were still associated with very low prediction quality, presenting individual area under the specificity‐sensitivity ROC curve (AUC) values below the random AUC threshold of 0.5, since cross‐docking calculations can lead to the identification of alternate protein binding sites (that are different from the reference experimental sites). For the large majority of these proteins, we show that the predicted alternate binding sites correspond to interaction sites with hidden partners, that is, partners not included in the original cross‐docking dataset. Among those new partners, we find proteins, but also nucleic acid molecules. Finally, for proteins with multiple binding sites on their surface, we investigated the structural determinants associated with the binding sites most targeted by the docking partners.

13.
The design of sulfated, small, nonsaccharide molecules as modulators of proteins is still in its infancy, as standard drug discovery tools such as libraries of diverse sulfated molecules and in silico docking and scoring protocols have not been firmly established. Databases, such as ZINC, contain too few sulfate-containing nonsaccharide molecules, which severely limits the identification of new hits. Lack of a generally applicable protocol for scaffold hopping limits the development of sulfated small molecules as synthetic mimetics of the highly sulfated glycosaminoglycans. We explored a sequential ligand-based (LBVS) and structure-based virtual screening (SBVS) approach starting from our initial discovery of monosulfated benzofurans to discover alternative scaffolds as allosteric modulators of thrombin, a key coagulation enzyme. Screening the ZINC database containing nearly 1 million nonsulfated small molecules using a pharmacophore developed from the parent sulfated benzofurans, followed by a genetic algorithm-based dual-filter docking and scoring screening, identified a group of 10 promising hits, of which three top-scoring hits were synthesized. Each was found to selectively inhibit human alpha-thrombin, suggesting the feasibility of this approach for scaffold hopping. Michaelis–Menten kinetics showed an allosteric inhibition mechanism for the best molecule, and human plasma studies confirmed good anticoagulation potential as expected. Our simple sequential LBVS and SBVS approach is likely to be useful as a general strategy for identification of sulfated small-molecule hits as modulators of glycosaminoglycan–protein interactions.

14.
Yue Cao  Yang Shen 《Proteins》2020,88(8):1091-1099
Structural information about protein-protein interactions, often missing at the interactome scale, is important for mechanistic understanding of cells and rational discovery of therapeutics. Protein docking provides a computational alternative for such information. However, ranking near-native docked models high among a large number of candidates, often known as the scoring problem, remains a critical challenge. Moreover, estimating model quality, also known as the quality assessment problem, is rarely addressed in protein docking. In this study, the two challenging problems in protein docking are regarded as relative and absolute scoring, respectively, and addressed in one physics-inspired deep learning framework. We represent protein and complex structures as intra- and inter-molecular residue contact graphs with atom-resolution node and edge features. We propose a novel graph convolutional kernel that aggregates interacting nodes’ features through edges so that generalized interaction energies can be learned directly from 3D data. The resulting energy-based graph convolutional networks (EGCN) with multihead attention are trained to predict intra- and inter-molecular energies, binding affinities, and quality measures (interface RMSD) for encounter complexes. Compared to a state-of-the-art scoring function for model ranking, EGCN significantly improves ranking for a critical assessment of predicted interactions (CAPRI) test set involving homology docking, and is comparable or slightly better for Score_set, a CAPRI benchmark set generated by diverse community-wide docking protocols not known to training data. For Score_set quality assessment, EGCN shows about 27% improvement over our previous efforts. Directly learning from 3D structure data in graph representation, EGCN represents the first successful development of graph convolutional networks for protein docking.

15.
Fradera X  Knegtel RM  Mestres J 《Proteins》2000,40(4):623-636
A similarity-driven approach to flexible ligand docking is presented. Given a reference ligand or a pharmacophore positioned in the protein active site, the method allows inclusion of a similarity term during docking. Two different algorithms have been implemented, namely, a similarity-penalized docking (SP-DOCK) and a similarity-guided docking (SG-DOCK). The basic idea is to maximally exploit the structural information about the ligand binding mode present in cases where ligand-bound protein structures are available, information that is usually ignored in standard docking procedures. SP-DOCK and SG-DOCK have been derived as modified versions of the program DOCK 4.0, where the similarity program MIMIC acts as a module for the calculation of similarity indices that correct docking energy scores at certain steps of the calculation. SP-DOCK applies similarity corrections to the set of ligand orientations at the end of the ligand incremental construction process, penalizing the docking energy and, thus, having only an effect on the relative ordering of the final solutions. SG-DOCK applies similarity corrections throughout the entire ligand incremental construction process, thus affecting not only the relative ordering of solutions but also actively guiding the ligand docking. The performance of SP-DOCK and SG-DOCK for binding mode assessment and molecular database screening is discussed. When applied to a set of 32 thrombin ligands for which crystal structures are available, SG-DOCK improves the average RMSD by ca. 1 Å when compared with DOCK. When those 32 thrombin ligands are included into a set of 1,000 diverse molecules from the ACD, DIV, and WDI databases, SP-DOCK significantly improves the retrieval of thrombin ligands within the first 10% of each of the three databases with respect to DOCK, with minimal additional computational cost. In all cases, comparison of SP-DOCK and SG-DOCK results with those obtained by DOCK and MIMIC is performed.

16.
Stratmann D  Boelens R  Bonvin AM 《Proteins》2011,79(9):2662-2670
Despite recent advances in the modeling of protein-protein complexes by docking, additional information is often required to identify the best solutions. For this purpose, NMR data deliver valuable restraints that can be used in the sampling and/or the scoring stage, as in the data-driven docking approach HADDOCK, which can make use of NMR chemical shift perturbation (CSP) data to define the binding site on each protein and drive the docking. We show here that a quantitative use of chemical shifts (CS) in the scoring stage can help to resolve ambiguities. A quantitative CS-RMSD score based on ¹H(α), ¹³C(α), and ¹⁵N chemical shifts always ranks the best solutions at the top, as demonstrated on a small benchmark of four complexes. It is implemented in a new docking protocol, CS-HADDOCK, which combines CSP data as ambiguous interaction restraints in the sampling stage with the CS-RMSD score in the final scoring stage. This combination of qualitative and quantitative use of chemical shifts increases the reliability of data-driven docking for the structure determination of complexes from limited NMR data.
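A chemical-shift RMSD of the kind named in the abstract above can be sketched as the root-mean-square difference between shifts back-calculated from a docking model and the experimental shifts. This is a hedged illustration only: the data layout, the function names, and the pooling of all nuclei into a single RMSD are assumptions, since the abstract does not state how the published score weights or combines the nuclei.

```python
import math


def cs_rmsd(predicted, observed):
    """RMSD in ppm over the (residue, nucleus) keys present in both mappings."""
    shared = sorted(predicted.keys() & observed.keys())
    if not shared:
        raise ValueError("no shared chemical shifts to compare")
    squared = sum((predicted[key] - observed[key]) ** 2 for key in shared)
    return math.sqrt(squared / len(shared))


def rank_models(models, observed):
    """Return model names ordered from lowest to highest CS-RMSD.

    `models` maps a model name to its predicted-shift dictionary.
    """
    return sorted(models, key=lambda name: cs_rmsd(models[name], observed))
```

In practice the predicted shifts would come from an external shift predictor run on each docked model; that step is outside this sketch.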

17.
Mass spectrometry combined with database searching has become the preferred method for identifying proteins in proteomics projects. Proteins are digested by one or several enzymes to obtain peptides, which are analyzed by mass spectrometry. We introduce a new family of scoring schemes, named OLAV, aimed at identifying peptides in a database from their tandem mass spectra. OLAV scoring schemes are based on signal detection theory, and exploit mass spectrometry information more extensively than previously existing schemes. We also introduce a new concept of structural matching that uses pattern detection methods to better separate true from false positives. We show the superiority of OLAV scoring schemes compared to MASCOT, a widely used identification program. We believe that this work introduces a new way of designing scoring schemes that are especially adapted to high-throughput projects such as the GeneProt large-scale human plasma project, where it is impractical to check all identifications manually.

18.
Automated docking of ligands to antibodies: methods and applications   Cited by: 2 (self-citations: 0; citations by others: 2)
Many approaches to studying protein-ligand interactions by computational docking are currently available. Given the structures of a protein and a ligand, the ultimate goal of all docking methods is to predict the structure of the resulting complex. This requires a suitable representation of molecular structures and properties, search algorithms to efficiently scan the configuration space for favorable interaction geometries, and accurate scoring functions to evaluate and rank the generated orientations. For many of the available methods, tests on experimentally known antibody-antigen or antibody-hapten complexes have appeared in the literature. In addition, some of them have been used in predictive studies on antibody-ligand interactions to provide structural insights where adequate experimental information is missing. The AutoDock program is presented as an example of a method for flexibly docking ligands to antibodies. Applying parameters of the second-generation AMBER force field, three antibody-hapten complexes (AN02, DB3, NC6.8) are used as new test cases to analyze the ability of the method to reproduce experimental findings. The X-ray structures could be reconstituted and the corresponding solutions were ranked with the best energy score in all cases. Docking to the free instead of the complexed NC6.8 structure indicated the limits of the rigid protein treatment, although fairly good guesses about the location of the binding site and the contact residues could still be obtained if conformational flexibility was allowed at least in the ligand.

19.
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
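The two column-conservation measures compared in the abstract above can be sketched for a single alignment column as follows. This is an illustration under assumptions: the uniform 20-residue background, the absence of pseudocounts and sequence weighting, and the function names are all hypothetical choices, not the procedure used in the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: uniform over the 20 standard amino acids.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def shannon_entropy(column):
    """Shannon entropy in bits of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def kl_conservation(column, background=UNIFORM_BACKGROUND):
    """Kullback-Leibler divergence in bits of the column from the background."""
    n = len(column)
    return sum(
        (c / n) * math.log2((c / n) / background[aa])
        for aa, c in Counter(column).items()
    )
```

Low entropy and high KL divergence both indicate conservation; with a non-uniform background, the KL measure additionally rewards conservation of residues that are rare overall, which entropy alone cannot capture.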

20.