Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The residue composition of a ligand binding site determines the interactions available for diffusion-mediated ligand binding, and understanding the general composition of these sites is of great importance if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods utilize such heuristic information for improving prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes, and provides a source of diverse, high-quality data for establishing general trends of residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues over-represented in binding regions versus the rest of the protein surface. The Binding MOAD database delineates biologically-relevant “valid” ligands from “invalid” small-molecule ligands bound to the protein. Invalids are present in the crystallization medium and serve no known biological function. Contacts are found to differ between these classes of ligands, indicating that residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface, but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of the variation of residue propensity with respect to the size of the dataset and the content bias inherent in structure sets obtained from a large protein structure database. The optimal size of the dataset for establishing general trends of residue propensities, as well as strategies for assessing the significance of such trends, are suggested for future studies of binding-site composition.
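As a rough illustration of what "over-represented in binding regions" can mean, the sketch below computes a simple propensity: a residue's frequency among binding-site residues divided by its frequency in a reference set of surface residues. The function name, the choice of reference set, and the toy data are assumptions made for illustration; the abstract does not specify the exact formula the authors used.

```python
from collections import Counter


def residue_propensities(site_residues, reference_residues):
    """Return {residue: propensity} for every residue in the reference set.

    Propensity here is (frequency in binding sites) / (frequency in the
    reference surface set). This definition is an illustrative assumption,
    not necessarily the one used in the study.
    """
    if not site_residues or not reference_residues:
        raise ValueError("both residue lists must be non-empty")
    site_counts = Counter(site_residues)
    ref_counts = Counter(reference_residues)
    site_total = len(site_residues)
    ref_total = len(reference_residues)
    propensities = {}
    for residue, ref_count in ref_counts.items():
        site_freq = site_counts[residue] / site_total  # 0 if never in a site
        ref_freq = ref_count / ref_total
        propensities[residue] = site_freq / ref_freq
    return propensities


# Hypothetical toy data: residue names observed on a surface and in sites.
surface = ["TRP"] * 2 + ["LYS"] * 10 + ["GLY"] * 8
site = ["TRP"] * 2 + ["LYS"] * 1 + ["GLY"] * 2
props = residue_propensities(site, surface)
```

A value above 1 suggests enrichment in binding sites and a value below 1 suggests depletion; judging whether such a difference is significant would need the kind of dataset-size analysis the abstract describes.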

2.
《Journal of molecular biology》2019,431(13):2423-2433
The goal of Binding MOAD is to provide users with a data set focused on high-quality X-ray crystal structures that have been solved with biologically relevant ligands bound. Where available, experimental binding affinities (Ka, Kd, Ki, IC50) are provided from the primary literature of the crystal structure. The database has been updated regularly since 2005, and this most recent update has added nearly 7000 new structures (growth of 21%). MOAD currently contains 32,747 structures, composed of 9117 protein families and 16,044 unique ligands. The data are freely available on www.BindingMOAD.org. This paper outlines updates to the data in Binding MOAD as well as improvements made to both the website and its contents. The NGL viewer has been added to improve visualization of the ligands and protein structures. MarvinJS has replaced the outdated MarvinView and works with JChem for small-molecule searching in the database. To add tools for predicting polypharmacology, we have added information about sequence, binding-site, and ligand similarity between entries in the database. A main premise behind polypharmacology is that similar binding sites will bind similar ligands. The large amount of protein–ligand information available in Binding MOAD allows us to compute pairwise ligand and binding-site similarities. Lists of similar ligands and similar binding sites have been added to allow users to identify potential polypharmacology pairs. To show the utility of the polypharmacology data, we detail a few examples from Binding MOAD of drug repurposing targets with their respective similarities.

3.
Jain T  Jayaram B 《FEBS letters》2005,579(29):6659-6666
We report here a computationally fast protocol for predicting binding affinities of non-metallo protein-ligand complexes. The protocol incorporates an all-atom, energy-based empirical scoring function comprising electrostatics, van der Waals, hydrophobicity and loss of conformational entropy of protein side chains upon ligand binding. The method is designed to ensure transferability across diverse systems and has been validated on a heterogeneous dataset of 161 complexes consisting of 55 unique protein targets. The scoring function trained on a dataset of 61 complexes yielded a correlation of r=0.92 for the predicted binding free energies against the experimental binding affinities. Model validation and parameter analysis studies ensure the predictive ability of the scoring function. When tested on the remaining 100 protein-ligand complexes, a correlation of r=0.92 was recovered. The high correlation obtained underscores the potential applicability of the methodology in drug design endeavors. The scoring function has been web-enabled as the binding affinity prediction of protein-ligand (BAPPL) server.

4.
Protein-ligand docking is a computational method to identify the binding mode of a ligand and a target protein, and predict the corresponding binding affinity using a scoring function. This method has great value in drug design. After decades of development, scoring functions nowadays typically can identify the true binding mode, but the prediction of binding affinity still remains a major problem. Here we present CScore, a data-driven scoring function using a modified Cerebellar Model Articulation Controller (CMAC) learning architecture, for accurate binding affinity prediction. The performance of CScore in terms of correlation between predicted and experimental binding affinities is benchmarked under different validation approaches. CScore achieves a prediction with R = 0.7668 and RMSE = 1.4540 when tested on an independent dataset. To the best of our knowledge, this result outperforms other scoring functions tested on the same dataset. The performance of CScore varies on different clusters under the leave-cluster-out validation approach, but still achieves competitive results. Lastly, the target-specific CScore achieves an even better result with R = 0.8237 and RMSE = 1.0872, trained on a much smaller but more relevant dataset for each target. The large body of structural information on protein-ligand complexes and advances in machine learning techniques enable the data-driven approach in binding affinity prediction. CScore is capable of accurate binding affinity prediction. It is also shown that CScore will perform better if sufficient and relevant data are presented. As publicly available structural data grow, further improvement of this scoring scheme can be expected.
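The two figures of merit quoted above, the correlation R and the RMSE between predicted and experimental affinities, can be computed as in the following minimal sketch. It uses the standard textbook definitions of Pearson's correlation coefficient and root-mean-square error; the function names and sample values are hypothetical and are not taken from the CScore study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical affinities (arbitrary units), for illustration only.
experimental = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
r_value = pearson_r(predicted, experimental)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The toy data show why both numbers are reported together: the predictions correlate perfectly with experiment even though each is off by a factor of two, which only an error measure such as RMSE would reveal.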

5.
We propose a self-consistent approach to analyze knowledge-based atom-atom potentials used to calculate protein-ligand binding energies. Ligands complexed to actual protein structures were first built using the SMoG growth procedure (DeWitte & Shakhnovich, 1996) with a chosen input potential. These model protein-ligand complexes were used to construct databases from which knowledge-based protein-ligand potentials were derived. We then tested several different modifications to such potentials and evaluated them on their ability to reconstruct the input potential using the statistical information available from a database composed of model complexes. Our data indicate that the most significant improvement resulted from properly accounting for the following key issues when estimating the reference state: (1) the presence of significant nonenergetic effects that influence the contact frequencies and (2) the presence of correlations in contact patterns due to chemical structure. The most successful procedure was applied to derive an atom-atom potential for real protein-ligand complexes. Despite the simplicity of the model (pairwise contact potential with a single interaction distance), the derived binding free energies showed a statistically significant correlation (approximately 0.65) with experimental binding scores for a diverse set of complexes.

6.
Most protein chains interact with only one ligand, but a small number of protein chains can bind several ligands, and many examples are available in the protein-ligand complex database of the PDB. Among these proteins, some show preferences for the ligands or types of ligands they bind; however, so far we have only a poor understanding of what determines protein-ligand binding and its specificity. Here we investigate the structural and functional properties of proteins in protein-ligand complexes. Analysis of the protein-ligand complex dataset from the PDB structure database reveals that proteins with more interactions have more disordered contact residues. Those proteins containing few disordered contact residues that bind multiple ligands have a tendency to consist of several domains. Analysis of physicochemical properties of hub contact residues binding multiple ligands indicates that they are enriched for hydrophilic, charged, polar and His-Asp catalytic triad residues. Finally, in order to differentiate proteins binding different classes of ligands, we mapped the three most prominent classes of ligands onto different superfamily domains. Our results demonstrate that contact residue disorder and ordered multiple domains are complementary factors that play a crucial role in determining ligand binding specificity and promiscuity.

7.
Cardiolipins (CL) represent unique phospholipids of bacteria and eukaryotic mitochondria with four acyl chains and two phosphate groups that have been implicated in numerous functions from energy metabolism to apoptosis. Many proteins are known to interact with CL, and several cocrystal structures of protein-CL complexes exist. In this work, we describe the collection of the first systematic and, to the best of our knowledge, comprehensive gold-standard data set of all known CL-binding proteins. There are 62 proteins in this data set, 21 of which have nonredundant crystal structures with bound CL molecules available. Using binding patch analysis of amino acid frequencies, secondary structures and loop supersecondary structures, considering phosphate and acyl chain binding regions together and separately, we gained a detailed understanding of the general structural and dynamic features involved in CL binding to proteins. Exhaustive docking of CL to all known structures of proteins experimentally shown to interact with CL demonstrated the validity of the docking approach, and provides a rich source of information for experimentalists who may wish to validate predictions.

8.
9.

Background

Models that are capable of reliably predicting binding affinities for protein-ligand complexes play an important role in the field of structure-guided drug design.

Methods

Here, we begin by applying the computational geometry technique of Delaunay tessellation to each set of atomic coordinates for over 1400 diverse macromolecular structures, for the purpose of deriving a four-body statistical potential that serves as a topological scoring function. Next, we identify a second, independent set of three hundred protein-ligand complexes, having both high-resolution structures and known dissociation constants. Two-thirds of these complexes are randomly selected to train a predictive model of binding affinity as follows: two tessellations are generated in each case, one for the entire complex and another strictly for the isolated protein without its bound ligand, and a topological score is computed for each tessellation with the four-body potential. Predicted protein-ligand binding affinity is then based on an empirically derived linear function of the difference between both topological scores, one that appropriately scales the value of this difference.
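The final step described above, mapping the difference between two topological scores to a binding affinity through an empirically derived linear function, can be sketched as follows. The sketch assumes an ordinary least-squares fit of affinity against score difference on the training complexes; the abstract does not say how the linear function was derived, and the function names and numbers below are hypothetical. The tessellation and four-body potential themselves are not reproduced here.

```python
def fit_linear(score_diffs, affinities):
    """Least-squares slope and intercept for affinity = slope * diff + intercept.

    Using ordinary least squares is an assumption made for illustration.
    """
    n = len(score_diffs)
    if n != len(affinities) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(score_diffs) / n
    mean_y = sum(affinities) / n
    sxx = sum((x - mean_x) ** 2 for x in score_diffs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(score_diffs, affinities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_affinity(score_complex, score_protein, slope, intercept):
    """Predict affinity from the complex and protein-only topological scores."""
    return slope * (score_complex - score_protein) + intercept


# Hypothetical training data: score differences and measured affinities.
slope, intercept = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
prediction = predict_affinity(5.0, 3.0, slope, intercept)
```

Once the slope and intercept are fixed on the training complexes, the same two numbers are reused unchanged for any held-out complex, which is what makes the later validation meaningful.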

Results

A comparison between experimental and calculated binding affinity values over the two hundred complexes reveals a Pearson's correlation coefficient of r = 0.79 with a standard error of SE = 1.98 kcal/mol. To validate the method, we similarly generated two tessellations for each of the remaining protein-ligand complexes, computed their topological scores and the difference between the two scores for each complex, and applied the previously derived linear transformation of this topological score difference to predict binding affinities. For these one hundred complexes, we again observe a correlation of r = 0.79 (SE = 1.93 kcal/mol) between known and calculated binding affinities. Applying our model to an independent test set of high-resolution structures for three hundred diverse enzyme-inhibitor complexes, each with an experimentally known inhibition constant, also yields a correlation of r = 0.79 (SE = 2.39 kcal/mol) between experimental and calculated binding energies.

Conclusions

Lastly, we generate predictions with our model on a diverse test set of one hundred protein-ligand complexes previously used to benchmark 15 related methods, and our correlation of r = 0.66 between the calculated and experimental binding energies for this dataset exceeds those of the other approaches. Compared with these related prediction methods, our approach stands out based on salient features that include the reliability of our model, combined with the rapidity of the generated predictions, which take less than one second for an average-sized complex.

10.
Harris R  Olson AJ  Goodsell DS 《Proteins》2008,70(4):1506-1517
We present a method, termed AutoLigand, for the prediction of ligand-binding sites in proteins of known structure. The method searches the space surrounding the protein and finds the contiguous envelope with the specified volume of atoms, which has the largest possible interaction energy with the protein. It uses a full atomic representation, with atom types for carbon, hydrogen, oxygen, nitrogen and sulfur (and others, if desired), and is designed to minimize the need for artificial geometry. Testing on a set of 187 diverse protein-ligand complexes has shown that the method is successful in predicting the location and approximate volume of the binding site in 73% of cases. Additional testing was performed on a set of 96 protein-ligand complexes with crystallographic structures of apo and holo forms, and AutoLigand was able to predict the binding site in 80% of the apo structures.

11.
The modulation of protein-protein interactions (PPIs) by small drug-like molecules is a relatively new area of research and has opened up new opportunities in drug discovery. However, the progress made in this area is limited to a handful of known cases of small molecules that target specific diseases. With the increasing availability of protein structure complexes, it is highly important to devise strategies exploiting homologous structure space on a large scale for discovering putative PPIs that could be attractive drug targets. Here, we propose a scheme that enables large-scale screening of all protein complexes and finds putative small-molecule and/or peptide binding sites overlapping with protein-protein binding sites (so-called "multibinding sites"). We find more than 600 nonredundant proteins from 60 protein families with multibinding sites. Moreover, we show that the multibinding sites are mostly observed in transient complexes, largely overlap with the binding hotspots and are more evolutionarily conserved than other interface sites. We investigate possible mechanisms of how small molecules may modulate protein-protein binding and discuss examples of new candidates for drug design.

12.
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three-dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins.

13.
Structural genomics (SG) has significantly increased the number of novel protein structures of targets with medical relevance. In the protein kinase area, SG has contributed >50% of all novel kinase structures during the past three years and determined more than 30 novel catalytic domain structures. Many of the released structures are inhibitor complexes, and a number of them have identified new inhibitor binding modes and scaffolds. In addition, generated reagents, assays, and inhibitor screening data provide a diversity of chemogenomic data that can be utilized for early drug development. Here we discuss the currently available structural data for the kinase family, considering novel structures as well as inhibitor complexes. Our analysis revealed that the structural coverage of many kinase families is still rather poor, and inhibitor complexes with diverse inhibitors are only available for a few kinases. However, we anticipate that with the current rate of structure determination and the high-throughput technologies developed by SG programs these gaps will be closed soon. In addition, the generated reagents will put SG initiatives in a unique position to provide data beyond protein structure determination by identifying chemical probes and determining their binding modes and target specificity.

14.
Analysis of the spatial arrangement of protein and water atoms that form polar interactions with ribose has been performed for a structurally non-redundant dataset of ATP, ADP and FAD-protein complexes. The 26 ligand-protein structures were separated into two groups corresponding to the most populated furanose ring conformations (N and S-domains). Four conserved positions were found for S-domain protein-ligand complexes and five for N-domain complexes. Multiple protein folds and secondary structural elements were represented at a single conserved position. The following novel points were revealed: (i) Two complementary positions sometimes combine to describe a putative atomic spatial location for a specific conserved binding spot. (ii) More than one third of the interactions scored were water-mediated. Thus, conserved spatial positions rich in water atoms are a significant feature of ribose-protein complexes.

15.
Selection of representative protein data sets.
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
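The first algorithm described above (successive selection from an ordered list, excluding all neighbors of each selected protein) can be sketched as a simple greedy pass. This is a minimal illustration under stated assumptions: the chain identifiers, the neighbor map, and the function name are hypothetical, and the similarity criterion that defines "neighbor" is left to the caller.

```python
def select_representatives(ordered_chains, neighbors):
    """Greedy selection of a nonredundant set of chains.

    ordered_chains: chain IDs sorted best-first by the property to optimize
                    (for example, resolution).
    neighbors:      dict mapping each chain ID to the set of chains that are
                    considered too similar to it.
    """
    selected = []
    excluded = set()
    for chain in ordered_chains:
        if chain in excluded:
            continue  # already removed as a neighbor of an earlier pick
        selected.append(chain)
        excluded.add(chain)
        excluded.update(neighbors.get(chain, ()))
    return selected


# Hypothetical example: A~B and B~C are similar pairs, D is unrelated.
ordered = ["A", "B", "C", "D"]
similar = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
representatives = select_representatives(ordered, similar)
```

Because chains are visited best-first, each retained chain is the preferred member of its neighborhood, but the result is not guaranteed to be the largest possible nonredundant set; that is the goal of the second, cluster-thinning algorithm mentioned in the abstract.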

16.
A wide range of regulatory processes in the cell are mediated by flexible peptides that fold upon binding to globular proteins. Computational efforts to model these interactions are hindered by the large number of rotatable bonds in flexible peptides relative to typical ligand molecules, and the fact that different peptides assume different backbone conformations within the same binding site. In this study, we present Rosetta FlexPepDock, a novel tool for refining coarse peptide–protein models that allows significant changes in both peptide backbone and side chains. We obtain high resolution models, often of sub‐angstrom backbone quality, over an extensive and general benchmark that is based on a large nonredundant dataset of 89 peptide–protein interactions. Importantly, side chains of known binding motifs are modeled particularly well, typically with atomic accuracy. In addition, our protocol has improved modeling quality for the important application of cross docking to PDZ domains. We anticipate that the ability to create high resolution models for a wide range of peptide–protein complexes will have significant impact on structure‐based functional characterization, controlled manipulation of peptide interactions, and on peptide‐based drug design. Proteins 2010. © 2010 Wiley‐Liss, Inc.

17.
Identifying elements of protein structures that create differences in protein-ligand binding specificity is an essential method for explaining the molecular mechanisms underlying preferential binding. In some cases, influential mechanisms can be visually identified by experts in structural biology, but subtler mechanisms, whose significance may only be apparent from the analysis of many structures, are harder to find. To assist this process, we present a geometric algorithm and two statistical models for identifying significant structural differences in protein-ligand binding cavities. We demonstrate these methods in an analysis of sequentially nonredundant structural representatives of the canonical serine proteases and the enolase superfamily. Here, we observed that statistically significant structural variations identified experimentally established determinants of specificity. We also observed that an analysis of individual regions inside cavities can reveal areas where small differences in shape can correspond to differences in specificity.

18.
Pei J  Wang Q  Liu Z  Li Q  Yang K  Lai L 《Proteins》2006,62(4):934-946
We have developed a new docking method, Pose-Sensitive Inclined (PSI)-DOCK, for flexible ligand docking. An improved SCORE function has been developed and used in PSI-DOCK for binding free energy evaluation. The improved SCORE function was able to reproduce the absolute binding free energies of a training set of 200 protein-ligand complexes with a correlation coefficient of 0.788 and a standard error of 8.13 kJ/mol. For ligand binding pose exploration, a unique searching strategy was designed in PSI-DOCK. In the first step, a tabu-enhanced genetic algorithm with a rapid shape-complementary scoring function is used to roughly explore and store potential binding poses of the ligand. Then, these predicted binding poses are optimized and compete against each other by using a genetic algorithm with the accurate SCORE function to determine the binding pose with the lowest docking energy. The PSI-DOCK 1.0 program is highly efficient in identifying the experimental binding pose. For a test dataset of 194 complexes, PSI-DOCK 1.0 achieved a 67% success rate (RMSD < 2.0 Å) for only one run and a 74% success rate for 10 runs. PSI-DOCK can also predict the docking binding free energy with high accuracy. For a test set of 64 complexes, the correlation between the experimentally observed binding free energies and the docking binding free energies is r = 0.777 with a standard deviation of 7.96 kJ/mol. Moreover, compared with other docking methods, PSI-DOCK 1.0 is extremely easy to use and requires minimal docking preparation. There is no requirement for the users to add hydrogen atoms to proteins because all protein hydrogen atoms and the flexibility of the terminal protein atoms are intrinsically taken into account in PSI-DOCK. There is also no requirement for the users to calculate partial atomic charges because PSI-DOCK does not calculate an electrostatic energy term. These features are not only convenient for the users but also help to avoid the influence of different preparation methods.

19.

Background  

A relevant problem in drug design is the comparison and recognition of protein binding sites. Binding-site recognition is generally based on geometry, often combined with physico-chemical properties of the site, since the conformation, size and chemical composition of the protein surface are all relevant for the interaction with a specific ligand. Several matching strategies have been designed for the recognition of protein-ligand binding sites and of protein-protein interfaces, but the problem cannot be considered solved.

20.
NMR spectroscopy in structure-based drug design
NMR methods for the study of motion in proteins continue to improve, and a number of studies of protein-ligand complexes relevant to drug design have been reported over the past year, for example, studies of fatty-acid-binding protein and SH2 and SH3 domains. These studies have begun to give a picture of the structural dynamics of protein-ligand complexes and to relate the changes in dynamics on ligand binding to the origins of specificity. NMR is also valuable in locating binding sites, both qualitatively from changes in chemical shift and more precisely from distances measured from relaxation effects. The conformation of the bound ligand can provide useful information for drug design, and over the past year improvements in methods have made it easier to obtain quantitative information from transferred nuclear Overhauser effect experiments.
