Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Deng H  Chen G  Yang W  Yang JJ 《Proteins》2006,64(1):34-42
Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding proteins is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins, facilitating the prediction of functions from structural genomic information.
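
The cluster search described in this abstract can be illustrated with a minimal sketch. It is not the GG implementation: the function name `oxygen_clusters`, the 6.0 Å default cutoff, and the use of a plain centroid in place of the paper's geometric center are all assumptions made here for illustration.

```python
from itertools import combinations
from math import dist


def oxygen_clusters(oxygens, cutoff=6.0, size=4):
    """Return (indices, centroid) for each group of `size` oxygens
    whose members all lie within `cutoff` of one another.

    `cutoff` and the centroid step are illustrative assumptions,
    not values or methods taken from the GG paper.
    """
    n = len(oxygens)
    # Graph step: an edge joins two oxygens that are close enough.
    adjacent = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(oxygens[i], oxygens[j]) <= cutoff
    }
    clusters = []
    # Brute-force clique search; fine for small inputs only.
    for group in combinations(range(n), size):
        if all(pair in adjacent for pair in combinations(group, 2)):
            centroid = tuple(
                sum(oxygens[i][k] for i in group) / size for k in range(3)
            )
            clusters.append((group, centroid))
    return clusters
```

A real implementation would also need the check that no other atom lies inside the candidate sphere, which this sketch omits.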

2.
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures, determined in the absence of the carbohydrate ligands, were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.
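
For readers unfamiliar with the residue-based metrics quoted above, the following sketch shows how precision, recall and MCC follow from a confusion matrix. The function name `binary_metrics` is an assumption of this sketch, and any counts passed to it are made up for illustration; they are not data from the study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Precision, recall (sensitivity) and Matthews correlation
    coefficient from residue-level confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc
```

Unlike precision and recall, MCC also accounts for true negatives, which matters here because binding residues are a small minority of all surface residues.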

3.
MOTIVATION: Binding site identification is a classical problem that is important for a range of applications, including the structure-based prediction of function, the elucidation of functional relationships among proteins, protein engineering and drug design. We describe an accurate method of binding site identification, namely FTSite. This method is based on experimental evidence that ligand binding sites also bind small organic molecules of various shapes and polarity. The FTSite algorithm does not rely on any evolutionary or statistical information, but achieves near experimental accuracy: it is capable of identifying the binding sites in over 94% of apo proteins from established test sets that have been used to evaluate many other binding site prediction methods. AVAILABILITY: FTSite is freely available as a web-based server at http://ftsite.bu.edu.

4.
SuperStar is an empirical method for identifying interaction sites in proteins, based entirely on experimental information about non-bonded interactions occurring in small-molecule crystal structures, taken from the IsoStar database. We describe recent modifications and additions to SuperStar, validating the results on a test set of 122 X-ray structures of protein-ligand complexes. In this validation, propensity maps are generated for all the binding sites of these proteins, using four different probes: a charged NH3+ nitrogen atom, a carbonyl oxygen atom, a hydroxyl oxygen atom and a methyl carbon atom. Next, the maps are compared with the experimentally observed positions of ligand atoms of these types. A peak-searching algorithm is introduced that highlights potential interaction hot spots. For the three hydrogen-bonding probes (NH3+ nitrogen atom, carbonyl oxygen atom and hydroxyl oxygen atom) the average distance from the ligand atom to the nearest SuperStar peak is 1.0-1.2 Å (0.8-1.0 Å for solvent-inaccessible ligand atoms). For the methyl carbon atom probe, this distance is about 1.5 Å, probably because interactions to methyl groups are much less directional. The most important addition to SuperStar is the enabling of propensity maps around metal centres (Ca2+, Mg2+ and Zn2+) in protein binding sites. The results are validated on a test set of 24 protein-ligand complexes that have a metal ion in their binding site. Coordination geometries are derived automatically, using only the protein atoms that coordinate to the metal ion. The correct coordination geometry is derived in approximately 75% of the cases. If the derived geometry is assumed during the SuperStar calculation, the average distance from a ligand atom coordinating to the metal ion to the nearest peak in the propensity map for an oxygen probe is 0.87(7) Å. If the correct coordination geometry is imposed, this distance reduces to 0.59(7) Å. This indicates that the SuperStar predictions around metal-binding sites are at least as good as those around other protein groups. Using clustering techniques, a non-redundant set of probes is selected from the set of probes available in the IsoStar database. The performance in SuperStar of all these probes is tested on the test set of protein-ligand complexes. With the exception of the "ether oxygen" probe and the "any NH+" probe, all new probes perform as well as the four probes introduced first.

5.
Comparing the amino acid-based similarity of binding sites between different proteins can facilitate protein function identification. However, a binding site usually consists of several crucial amino acids that are frequently dispersed among different regions of a protein, which makes the comparison of binding sites difficult. In this study, we introduce a new method, named chemical and geometric similarity of binding site (CGS-BSite), to compute ligand binding site similarity based on discrete amino acids with a maximum-weight bipartite matching algorithm. The principle of computing the similarity is to find a Euclidean transformation that brings similar amino acids close to each other in geometric space, and vice versa. CGS-BSite permits site and ligand flexibility and provides stable prediction performance on flexible ligand binding sites. Binding site prediction on three test datasets with the CGS-BSite method has similar performance to the Patch-Surfer method but outperforms five other tested methods, reaching 0.80, 0.71 and 0.85, respectively, based on the area under the receiver operating characteristic curve. It performs marginally better than Patch-Surfer on binding sites with small volume and higher hydrophobicity, and is robust to variation in the volume and hydrophobicity of ligand binding sites. Overall, our method provides an alternative approach to compute ligand binding site similarity and to predict potential special ligand binding sites from the existing ligand targets based on target similarity.
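
The matching step named in this abstract can be sketched as follows. This is a brute-force illustration of maximum-weight bipartite matching for small sites, not the CGS-BSite code; the function name `max_weight_matching` and the weight matrix are hypothetical, and how the paper scores residue pairs is not specified here.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Best one-to-one assignment of site-A residues (rows) to
    site-B residues (columns), maximizing total similarity.

    `weights[i][j]` is an assumed pairwise similarity score;
    requires len(rows) <= len(columns). Exhaustive search, so
    suitable only for a handful of residues.
    """
    rows, cols = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    for chosen in permutations(range(cols), rows):
        score = sum(weights[i][j] for i, j in enumerate(chosen))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(chosen))
    return best_score, best_pairs
```

For larger sites, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive loop.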

6.
Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites), that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR-Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well-defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR-Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source .

7.
Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet-lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state-of-the-art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels .

8.
Harris R  Olson AJ  Goodsell DS 《Proteins》2008,70(4):1506-1517
We present a method, termed AutoLigand, for the prediction of ligand-binding sites in proteins of known structure. The method searches the space surrounding the protein and finds the contiguous envelope with the specified volume of atoms, which has the largest possible interaction energy with the protein. It uses a full atomic representation, with atom types for carbon, hydrogen, oxygen, nitrogen and sulfur (and others, if desired), and is designed to minimize the need for artificial geometry. Testing on a set of 187 diverse protein-ligand complexes has shown that the method is successful in predicting the location and approximate volume of the binding site in 73% of cases. Additional testing was performed on a set of 96 protein-ligand complexes with crystallographic structures of apo and holo forms, and AutoLigand was able to predict the binding site in 80% of the apo structures.

9.
The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrostatic surface of functional-site in proteins). Using this system, functional sites similar to those of phosphoenolpyruvate carboxykinase were detected in several mononucleotide-binding proteins, which have different folds. We also applied our method to a hypothetical protein, MJ0226 from Methanococcus jannaschii, and detected the mononucleotide binding site from the similarity to other proteins having different folds.

10.
11.
Summary: One of the key ingredients in drug discovery is the derivation of conceptual templates called pharmacophores. A pharmacophore model characterizes the physicochemical properties common to all active molecules, called ligands, bound to a particular protein receptor, together with their relative spatial arrangement. Motivated by this important application, we develop a Bayesian hierarchical model for the derivation of pharmacophore templates from multiple configurations of point sets, partially labeled by the atom type of each point. The model is implemented through a multistage template hunting algorithm that produces a series of templates that capture the geometrical relationship of atoms matched across multiple configurations. Chemical information is incorporated by distinguishing between atoms of different elements, whereby atoms of different elements are less likely to be matched than atoms of the same element. We illustrate our method through examples of deriving templates from sets of ligands that all bind structurally related protein active sites and show that the model is able to retrieve the key pharmacophore features in two test cases.

12.
Here, a protein atom-ligand fragment interaction library is described. The library is based on experimentally solved structures of protein-ligand and protein-protein complexes deposited in the Protein Data Bank (PDB) and it is able to characterize binding sites given a ligand structure suitable for a protein. A set of 30 ligand fragment types was defined to include three or more atoms in order to unambiguously define a frame of reference for interactions of ligand atoms with their receptor proteins. Interactions between ligand fragments and 24 classes of protein target atoms plus a water oxygen atom were collected and segregated according to type. The spatial distributions of individual fragment-target atom pairs were visually inspected in order to obtain rough-grained constraints on the interaction volumes. Data fulfilling these constraints were given as input to an iterative expectation-maximization algorithm that produces as output maximum likelihood estimates of the parameters of the finite Gaussian mixture models. Concepts of statistical pattern recognition and the resulting mixture model densities are used (i) to predict the detailed interactions between Chlorella virus DNA ligase and the adenine ring of its ligand and (ii) to evaluate the "error" in prediction for both the training and validation sets of protein-ligand interaction found in the PDB. These analyses demonstrate that this approach can successfully narrow down the possibilities for both the interacting protein atom type and its location relative to a ligand fragment.
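
The expectation-maximization fitting mentioned in this abstract can be illustrated in one dimension. The sketch below is a generic textbook EM loop for a Gaussian mixture, not the authors' procedure (which fits spatial distributions); the function names, the variance floor of 1e-6 and the fixed iteration count are assumptions of this sketch.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1D Gaussian mixture by EM, starting from the given
    initial parameters. Assumes every point has non-zero density
    under at least one component."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [
                weights[c] * gaussian_pdf(x, means[c], variances[c])
                for c in range(k)
            ]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: maximum-likelihood update of each component.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # assumed floor against collapse
    return means, variances, weights
```

Pre-filtering the data with coarse constraints, as the abstract describes, serves the same purpose as a good initial guess here: EM only finds a local maximum of the likelihood.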

13.
In the drug discovery process, improving the ADME/Tox properties of lead compounds, including metabolic stability, is critically important. Cytochrome P450 (CYP) is one of the major metabolizing enzymes, and predicting the sites of metabolism (SOM) on a given lead compound provides key information for modifying the compound to be more stable against metabolism. Two factors are essential in SOM prediction. The first is the accessibility of each substrate atom to the oxygenated Fe atom of heme in a CYP protein, and the other is the oxidative reactivity of each substrate atom. To predict the accessibility of substrate atoms to the heme iron, conventional protein-rigid docking simulations have been applied. However, docking simulations that do not consider protein flexibility often lead to incorrect answers in the case of very flexible proteins such as CYP3A4. In this study, we demonstrated an approach utilizing molecular dynamics (MD) simulation for SOM prediction in which multiple MD runs were executed using different initial structures. We applied this strategy to the CYP3A4 and carbamazepine (CBZ) complex. Through 10 ns MD simulations started from five different CYP3A4-CBZ complex models, our approach correctly predicted the SOM observed in experiments. The experimentally known sites of CBZ epoxidized by CYP3A4 were successfully predicted as the sites most accessible to the heme iron, as judged from a numerical analysis of the calculated ΔG(binding) and the frequency of appearance. In contrast, predictions using protein-rigid docking methods hardly provided the correct SOM due to protein flexibility or inaccuracy of the scoring functions. Our strategy using MD simulation with multiple initial structures will be one of the reliable methods for SOM prediction.

14.
Protein-protein interactions control a large range of biological processes and their identification is essential to understand the underlying biological mechanisms. To complement experimental approaches, in silico methods are available to investigate protein-protein interactions. Cross-docking methods, in particular, can be used to predict protein binding sites. However, proteins can interact with numerous partners and can present multiple binding sites on their surface, which may alter the binding site prediction quality. We evaluate the binding site predictions obtained using complete cross-docking simulations of 358 proteins with 2 different scoring schemes accounting for multiple binding sites. Despite overall good binding site prediction performances, 68 cases were still associated with very low prediction quality, presenting individual area under the specificity-sensitivity ROC curve (AUC) values below the random AUC threshold of 0.5, since cross-docking calculations can lead to the identification of alternate protein binding sites (that are different from the reference experimental sites). For the large majority of these proteins, we show that the predicted alternate binding sites correspond to interaction sites with hidden partners, that is, partners not included in the original cross-docking dataset. Among those new partners, we find proteins, but also nucleic acid molecules. Finally, for proteins with multiple binding sites on their surface, we investigated the structural determinants associated with the binding sites the most targeted by the docking partners.

15.
Allosteric regulation involves conformational transitions or fluctuations between a few closely related states, caused by the binding of effector molecules. We introduce a quantity called binding leverage that measures the ability of a binding site to couple to the intrinsic motions of a protein. We use Monte Carlo simulations to generate potential binding sites and either normal modes or pairs of crystal structures to describe relevant motions. We analyze single catalytic domains and multimeric allosteric enzymes with complex regulation. For the majority of the analyzed proteins, we find that both catalytic and allosteric sites have high binding leverage. Furthermore, our analysis of the catabolite activator protein, which is allosteric without conformational change, shows that its regulation involves other types of motion than those modulated at sites with high binding leverage. Our results point to the importance of incorporating dynamic information when predicting functional sites. Because it is possible to calculate binding leverage from a single crystal structure it can be used for characterizing proteins of unknown function and predicting latent allosteric sites in any protein, with implications for drug design.

16.
Phage display enables the presentation of a large number of peptides on the surface of phage particles. Such libraries can be tested for binding to target molecules of interest by means of affinity selection. Here we present SiteLight, a novel computational tool for binding site prediction using phage display libraries. SiteLight is an algorithm that maps the 1D peptide library onto a three-dimensional (3D) protein surface. It is applicable to complexes made up of a protein Template and any type of molecule termed Target. Given the three-dimensional structure of a Template and a collection of sequences derived from biopanning against the Target, the Template interaction site with the Target is predicted. We have created a large diverse data set for assessing the ability of SiteLight to correctly predict binding sites. SiteLight predictive mapping enables discrimination between the binding and nonbinding parts of the surface. This prediction can be used to effectively reduce the surface by 75% without excluding the binding site. In 63% of the cases we have tested, there is at least one binding site prediction that overlaps the interface by at least 50%. These results suggest the applicability of phage display libraries for automated binding site prediction on three-dimensional structures. For most effective binding site prediction we propose using a random phage display library twice, to scan both binding partners of a given complex. The derived peptides are mapped to the other binding partner (now used as a Template). Here, the surface of each partner is reduced by 75%, focusing their relative positions with respect to each other significantly. Such information can be utilized to improve docking algorithms and scoring functions.

17.
This is the first of four papers that begin to explore the possibility of automated site-directed drug design. A general outline is given of the logical steps involved in approaching the problem. The starting point is the process of knowledge acquisition about the site. An algorithm is described here for the construction of a map of hydrogen-bonding regions at protein surfaces directly from the Brookhaven Protein Data Bank coordinates. Hydrogen-bonding atoms are located, intramolecular bonds are searched for, hydrogen-bonding atoms at the surface are found and hydrogen-bonding regions are computed at the accessible surface. A grid is placed within each region discovered and the probability of hydrogen bonding at each grid point is computed. The output of the program is a map of hydrogen-bonding regions displayed within a user-defined window. This information can be used as part of a knowledge base for the automatic construction of novel ligands to fit specified binding sites.

18.
Cellular functions are regulated by molecules that interact with proteins and alter their activities. To enable such control, protein activity, and therefore protein conformational distributions, must be susceptible to alteration by molecular interactions at functional sites. Here we investigate whether interactions at functional sites cause a large change in the protein conformational distribution. We apply a computational method, called dynamics perturbation analysis (DPA), to identify sites at which interactions have a large allosteric potential D(x), which is the Kullback-Leibler divergence between protein conformational distributions with and without an interaction. In DPA, a protein is decorated with surface points that interact with neighboring protein atoms, and D(x) is calculated for each of the points in a coarse-grained model of protein vibrations. We use DPA to examine hundreds of protein structures from a standard small-molecule docking test set, and find that ligand-binding sites have elevated values of D(x): for 95% of proteins, the probability of randomly obtaining values as high as those in the binding site is 10^-3 or smaller. We then use DPA to develop a computational method to predict functional sites in proteins, and find that the method accurately predicts ligand-binding-site residues for proteins in the test set. The performance of this method compares favorably with that of a cleft analysis method. The results confirm that interactions at small-molecule binding sites cause a large change in the protein conformational distribution, and motivate using DPA for large-scale prediction of functional sites in proteins. They also suggest that natural selection favors proteins whose activities are capable of being regulated by molecular interactions.
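
The Kullback-Leibler divergence underlying D(x) can be illustrated for discrete distributions. This sketch is not the DPA calculation, which works on continuous conformational distributions from a vibrational model; the function name `kl_divergence`, the discrete setting, and the choice of which distribution comes first are assumptions made here.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for two
    discrete distributions over the same states.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0
    contribute nothing.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The divergence is zero only when the two distributions coincide and is not symmetric in its arguments, so a large value signals that the interaction reshapes the distribution substantially.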

19.
The binding sites of Mn2+, Co2+, and Gd3+ have been determined in triclinic lysozyme at pH 4.5 to 4.6. Mn2+ and Co2+ bind a site approximately 2.5 Å from 1 of the oxygen atoms of the Glu-35 chain. The occupancy of the Mn2+ site is 0.22, corresponding to 1 bound ion for each 4.6 protein molecules. The occupancy of the Co2+ site is much lower, about 0.048. Gd3+ appears to be bound at two sites, the main one 2.5 Å from an oxygen atom of the Glu-35 side chain, the other 3.1 Å from an oxygen atom of the Asp-52 chain. The occupancy of both Gd3+ sites is low, 0.036 and 0.016, the latter being so low that the presence of the ion at this site is in doubt. The binding site of Mn2+ in the di(N-acetylglucosamine)-lysozyme complex has also been determined. It does not differ significantly from the Mn2+ binding site in the native protein, but the occupancy is lower, 0.16.

20.
An automated computer-based method for mapping of protein surface cavities was developed and applied to a set of 176 metalloproteinases containing zinc cations in their active sites. With very few exceptions, the cavity search routine detected the active site among the five largest cavities and produced reasonable active site surfaces. Cavities were described by means of solvent-accessible surface patches. For a given protein, these patches were calculated in three steps: (i) definition of cavity atoms forming surface cavities by a grid-based technique; (ii) generation of solvent accessible surfaces; (iii) assignment of an accessibility value and a generalized atom type to each surface point. Topological correlation vectors were generated from the set of surface points forming the cavities, and projected onto the plane by a self-organizing network. The resulting map of 865 enzyme cavities displays clusters of active sites that are clearly separated from the other cavities. It is demonstrated that both fully automated recognition of active sites, and prediction of enzyme class can be performed for novel protein structures at high accuracy.
