Similar articles
1.
Nucleoside triphosphate (NTP) ligands are of high biological importance and are essential for all life forms. A pre‐requisite for them to participate in diverse biochemical processes is their recognition by diverse proteins. It is thus of great interest to understand the basis for such recognition in different proteins. Towards this, we have used a structural bioinformatics approach and analyzed the structures of 4677 NTP complexes available in the Protein Data Bank (PDB). Binding sites were extracted and compared exhaustively using PocketMatch, a sensitive in‐house site comparison algorithm, which resulted in grouping the entire dataset into 27 site‐types. Each of these site‐types represents a structural motif comprising two or more residue conservations, derived using another in‐house tool for superposing binding sites, PocketAlign. The 27 site‐types could be grouped further into 9 super‐types by considering partial similarities in the sites, which indicated that the individual site‐types comprise different combinations of one or more site features. A scan across the PDB using the 27 structural motifs determined the motifs to be specific to NTP binding sites, and a computational alanine mutagenesis indicated that the residues identified as highly conserved in the motifs also contribute most to binding. Alternate orientations of the ligand in several site‐types were observed and rationalized, indicating the possibility of some residues serving as anchors for NTP recognition. The presence of multiple site‐types and the grouping of multiple folds into each site‐type is strongly suggestive of convergent evolution. Knowledge of determinants obtained from this study will be useful for detecting function in unknown proteins. Proteins 2017; 85:1699–1712. © 2017 Wiley Periodicals, Inc.

2.
3.
The recognition of cryptic small-molecule binding sites in protein structures is important for understanding off-target side effects and for recognizing potential new indications for existing drugs. Current methods focus on the geometry and detailed chemical interactions within putative binding pockets, but may not recognize distant similarities where dynamics or modified interactions allow one ligand to bind apparently divergent binding pockets. In this paper, we introduce an algorithm that seeks similar microenvironments within two binding sites, and assesses overall binding site similarity by the presence of multiple shared microenvironments. The method has relatively weak geometric requirements (to allow for conformational change or dynamics in both the ligand and the pocket) and uses multiple biophysical and biochemical measures to characterize the microenvironments (to allow for diverse modes of ligand binding). We term the algorithm PocketFEATURE, since it focuses on pockets using the FEATURE system for characterizing microenvironments. We validate PocketFEATURE first by showing that it can better discriminate sites that bind similar ligands from those that do not, and by showing that we can recognize FAD-binding sites on a proteome scale with an area under the curve (AUC) of 92%. We then apply PocketFEATURE to evolutionarily distant kinases, for which the method recognizes several proven distant relationships, and predicts unexpected shared ligand binding. Using experimental data from ChEMBL and Ambit, we show that at a high significance level, 40 kinase pairs are predicted to share ligands. Some of these pairs offer new opportunities for inhibiting two proteins in a single pathway.

4.
The lack of crystal structure data for folate binding proteins has left many questions unanswered (for example, which residues are important in the active site and binding domain, and which amino acid residues are involved in interactions between ligand and receptor). Using sequence alignment and PROSITE motif identification, we attempted to identify evolutionarily significant residues that are functionally important for ligand binding and that form catalytic sites. We analyzed 46 different folate receptor (FR) and folate binding protein (FBP) sequences from various organisms, obtained from GenBank. Multiple sequence alignment identified 44 highly conserved identical amino acid residues, including 10 cysteine residues, and 12 motifs, including ECSPNLGPW (which might contribute to the structural stability of FR).

5.
Brakoulias A, Jackson RM. Proteins 2004, 56(2):250-260
A method is described for the rapid comparison of protein binding sites using geometric matching to detect similar three-dimensional structure. The geometric matching detects common atomic features through identification of the maximum common sub-graph or clique. These features are not necessarily evident from sequence or from global structural similarity, giving additional insight into molecular recognition not evident from current sequence or structural classification schemes. Here we use the method to produce an all-against-all comparison of phosphate binding sites in a number of different nucleotide phosphate-binding proteins. The similarity search is combined with clustering of similar sites to allow a preliminary structural classification. Clustering by site similarity produces a classification of binding sites for the 476 representative local environments, producing ten main clusters representing half of the representative environments. The similarities make sense in terms of both structural and functional classification schemes. The ten main clusters represent a very limited number of unique structural binding motifs for phosphate. These are the structural P-loop, di-nucleotide binding motif [FAD/NAD(P)-binding and Rossmann-like fold] and FAD-binding motif. Similar classification schemes for nucleotide binding proteins have also been arrived at independently by others using different methods.
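To make the clique idea in this abstract concrete, the following is a minimal sketch of geometric matching through a correspondence graph. It is not the authors' implementation: the atom-type labels, the distance tolerance `tol`, and the function names `correspondence_graph` and `max_clique` are all illustrative assumptions. Each node pairs one atom from site A with a same-type atom from site B; two nodes are connected when the corresponding intra-site distances agree, so a maximum clique is a largest set of mutually consistent atom pairings.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def correspondence_graph(site_a, site_b, tol=0.5):
    """Build a correspondence graph for two sites.

    Each site is a list of (atom_type, (x, y, z)) tuples. A node (i, j)
    pairs atom i of site A with atom j of site B when their types match.
    Two nodes are adjacent when the A-side and B-side distances differ
    by at most `tol` (hypothetical tolerance, in the coordinate units).
    """
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched only once
        d_a = dist(site_a[i][1], site_a[k][1])
        d_b = dist(site_b[j][1], site_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one maximum clique (sorted) using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# Toy example: site B is site A translated, reordered, plus one extra atom.
site_a = [("N", (0, 0, 0)), ("O", (3, 0, 0)), ("O", (0, 4, 0))]
site_b = [("O", (13, 0, 0)), ("N", (10, 0, 0)), ("O", (10, 4, 0)),
          ("C", (20, 20, 20))]
matching = max_clique(correspondence_graph(site_a, site_b))
print(matching)  # [(0, 1), (1, 0), (2, 2)]
```

Exact maximum-clique search is exponential in the worst case, so a sketch like this is only practical for small sites; the size of the returned clique could serve as a crude similarity score.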

6.
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structural and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition, including structure-based ligand design, and for understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

7.
Drug repositioning applies established drugs to new disease indications with increasing success. A pre-requisite for drug repurposing is drug promiscuity (polypharmacology), a drug's ability to bind to several targets. There is a long-standing debate on the reasons for drug promiscuity. Based on large compound screens, hydrophobicity and molecular weight have been suggested as key reasons. However, the results are sometimes contradictory and leave space for further analysis. Protein structures offer a structural dimension to explain promiscuity: Can a drug bind multiple targets because the drug is flexible or because the targets are structurally similar or even share similar binding sites? We present a systematic study of drug promiscuity based on structural data of PDB target proteins with a set of 164 promiscuous drugs. We show that there is no correlation between the degree of promiscuity and ligand properties such as hydrophobicity or molecular weight but a weak correlation to conformational flexibility. However, we do find a correlation between promiscuity and structural similarity as well as binding site similarity of protein targets. In particular, 71% of the drugs have at least two targets with similar binding sites. In order to overcome issues in detection of remotely similar binding sites, we employed a score for binding site similarity: LigandRMSD measures the similarity of the aligned ligands and uncovers remote local similarities in proteins. It can be applied to arbitrary structural binding site alignments. Three representative examples, namely the anti-cancer drug methotrexate, the natural product quercetin and the anti-diabetic drug acarbose, are discussed in detail. Our findings suggest that global structural and binding site similarity play a more important role in explaining the observed drug promiscuity in the PDB than physicochemical drug properties like hydrophobicity or molecular weight. Additionally, we find ligand flexibility to have a minor influence.
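The abstract does not give a formula for LigandRMSD, so the following is only a sketch of one plausible reading: after two binding sites have been superimposed by some alignment method, the root-mean-square deviation is taken between matched atoms of the two copies of the same ligand. The function name `ligand_rmsd`, the assumption that atoms are supplied in matching order, and the assumption that both coordinate sets are already in the common frame are all hypothetical.

```python
from math import sqrt


def ligand_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) ligand atoms.

    Assumes (hypothetically) that coords_b has already been transformed
    by the binding-site alignment, so no further fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return sqrt(squared / len(coords_a))


# Two atoms, each displaced by a (3, 4, 0) vector of length 5.
print(ligand_rmsd([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (4, 4, 0)]))  # 5.0
```

Under this reading, a low value would indicate that the site alignment also brings the bound ligands into agreement, which is the kind of remote local similarity the abstract describes.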

8.
We applied an automatic and unsupervised system to a nearly complete database of mammalian odor receptor genes. The generated motifs and gene classification were subjected to extensive and systematic downstream analysis to obtain biological insights. Two major results from this analysis were: (1) a map of sequence motifs that may correlate with function and (2) the corresponding receptor classes in which members of each class are likely to share specific functions. We have discovered motifs that have been implicated in structural integrity and posttranslational modification, as well as motifs very likely to be directly involved in ligand binding. We further propose a combinatorial molecular hypothesis, based on unique combinations of the observed motifs, that provides a foundation for understanding the generation of a large number of ligand binding sites.

9.
Knowing the ligand or peptide binding site in proteins is highly important to guide drug discovery, but experimental elucidation of the binding site is difficult. Therefore, various computational approaches have been developed to identify potential binding sites in protein structures. However, protein and ligand flexibility are often neglected in these methods due to efficiency considerations, despite the recognition that protein–ligand interactions can be strongly affected by mutual structural adaptations. This is particularly true if the binding site is unknown, as the screening will typically be performed based on an unbound protein structure. Herein we present DynaBiS, a hierarchical sampling algorithm to identify flexible binding sites for a target ligand with explicit consideration of protein and ligand flexibility, inspired by our previously presented flexible docking algorithm DynaDock. DynaBiS applies soft-core potentials between the ligand and the protein, thereby allowing a certain protein–ligand overlap, resulting in efficient sampling of conformational adaptation effects. We evaluated DynaBiS and other commonly used binding site identification algorithms against a diverse evaluation set consisting of 26 proteins featuring peptide as well as small ligand binding sites. We show that DynaBiS outperforms the other evaluated methods for the identification of protein binding sites for large and highly flexible ligands such as peptides, whether a holo or an apo structure is used as input.

10.
Recognition templates encapsulate the structural and energetic features for the specific recognition of a given ligand by a protein active site. These templates identify the major interactions used for specific recognition and may be used to find specific binding sites in proteins of unknown function. We present a grid-based method for deriving recognition templates for adenylate groups from a set of diverse nucleotide-binding proteins. The templates reveal the basis of specific binding of adenylate, including tight shape complementarity and specific hydrogen bonds, and underscore the importance of a key steric contact for excluding guanylate from adenylate-specific sites. We demonstrate the utility of recognition templates in identifying specific adenylate-binding sites in a diverse set of dinucleotide-binding proteins.

11.
RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
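The step of turning single-strandedness values into a prior over motif start positions can be sketched as follows. This is an illustration under stated assumptions, not the MEMERIS code: the per-base unpaired probabilities are assumed to be given (for example, from an RNA folding program), the window score is taken as their mean over the motif width, and the `pseudocount` parameter and the function name `motif_start_prior` are hypothetical.

```python
def motif_start_prior(unpaired_prob, width, pseudocount=0.01):
    """Turn per-base unpaired probabilities into a prior over motif starts.

    unpaired_prob: list of probabilities in [0, 1], one per nucleotide.
    width: motif length; start s covers positions s .. s + width - 1.
    pseudocount: small positive value (an assumption of this sketch) so
    that fully paired windows keep a non-zero prior.
    """
    n = len(unpaired_prob)
    if width < 1 or width > n:
        raise ValueError("width must be between 1 and the sequence length")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    # Mean single-strandedness of each window, plus the pseudocount.
    scores = [sum(unpaired_prob[s:s + width]) / width + pseudocount
              for s in range(n - width + 1)]
    total = sum(scores)
    return [score / total for score in scores]


# A sequence whose 3' half is unpaired: later starts get a higher prior.
prior = motif_start_prior([0.0, 0.0, 1.0, 1.0], width=2)
print(prior)
```

A motif finder that multiplies its position likelihoods by such a prior would favor occurrences in single-stranded regions, which is the behavior the abstract reports.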

12.
Sujatha MS, Balaji PV. Proteins 2004, 55(1):44-65
Galactose-binding proteins characterize an important subgroup of sugar-binding proteins that are involved in a variety of biological processes. Structural studies have shown that the Gal-specific proteins encompass a diverse range of primary and tertiary structures. The binding sites for galactose also seem to vary in different protein-galactose complexes. No common binding site features that are shared by the Gal-specific proteins to achieve ligand specificity are so far known. With the assumption that common recognition principles will exist for common substrate recognition, the present study was undertaken to identify and characterize any unique galactose-binding site signature by analyzing the three-dimensional (3D) structures of 18 protein-galactose complexes. These proteins belong to 7 nonhomologous families; thus, there is no sequence or structural similarity across the families. Within each family, the binding site residues and their relative distances were well conserved, but there were no similarities across families. A novel, yet simple, approach was adopted to characterize the binding site residues by representing their relative spatial dispositions in polar coordinates. A combination of the deduced geometrical features with the structural characteristics, such as solvent accessibility and secondary structure type, furnished a potential galactose-binding site signature. The signature was evaluated by incorporation into the program COTRAN to search for potential galactose-binding sites in proteins that share the same fold as the known galactose-binding proteins. COTRAN is able to detect galactose-binding sites with a very high specificity and sensitivity. The deduced galactose-binding site signature is strongly validated and can be used to search for galactose-binding sites in proteins. PROSITE-type signature sequences have also been inferred for galectin and C-type animal lectin-like fold families of Gal-binding proteins.
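Representing residue positions in polar coordinates, as this abstract describes, can be illustrated with a standard Cartesian-to-spherical conversion. The reference frame used in the paper is not stated in the abstract, so the choice of origin (for instance, a ligand atom or centroid) and the function name `to_spherical` are assumptions of this sketch.

```python
from math import acos, atan2, degrees, sqrt


def to_spherical(point, origin=(0.0, 0.0, 0.0)):
    """Convert a Cartesian point to (r, theta, phi) relative to `origin`.

    r is the distance, theta the polar angle from the +z axis and phi the
    azimuth in the xy-plane, both in degrees. The origin is a hypothetical
    reference point such as a ligand centroid.
    """
    x, y, z = (p - o for p, o in zip(point, origin))
    r = sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin
    theta = degrees(acos(z / r))
    phi = degrees(atan2(y, x))
    return r, theta, phi


# A residue atom 2 units directly "above" a reference point at (1, 2, 3).
print(to_spherical((1.0, 2.0, 5.0), origin=(1.0, 2.0, 3.0)))  # (2.0, 0.0, 0.0)
```

The distance r is invariant to rotation, but the two angles depend on how the axes are oriented, so any comparison across complexes would first need a consistent ligand-based frame.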

13.
Targeting non‐native‐ligand binding sites for potential investigative and therapeutic applications is an attractive strategy in proteins that share common native ligands, as in the Rab1 protein. Rab1 is a subfamily of the Rab proteins, which are members of the Ras GTPase superfamily. All Ras GTPase superfamily members bind the native ligands GTP and GDP, which switch the proteins on and off, respectively. Rab1 is physiologically essential for autophagy and for transport between the endoplasmic reticulum and the Golgi apparatus. Pathologically, Rab1 is implicated in human cancers, a neurodegenerative disease, cardiomyopathy, and bacteria‐caused infectious diseases. We have performed structural analyses on the Rab1 protein using a unique ensemble of clustering methods, including multi‐step principal component analysis, non‐negative matrix factorization, and independent component analysis, to identify representative Rab1 structures better than a single clustering method alone does. We then used the identified representative Rab1 structures, resolved in multiple ligand states, to map their known and novel binding sites. We report here at least one novel binding site on Rab1, involving Rab1‐specific residues, that could be further explored for the rational design and development of investigative probes and/or therapeutic small molecules against the Rab1 protein. Proteins 2017; 85:859–871. © 2016 Wiley Periodicals, Inc.

14.
Due to Ca2+‐dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet‐lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet‐lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state‐of‐the‐art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large‐margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm, which allows more effective learning over imprecisely annotated CaM‐binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome‐wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif‐based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub‐sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS, together with a webserver implementation, is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels

15.
MOTIVATION: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. RESULTS: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5'-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects. AVAILABILITY: MASIA web site: http://www.scsb.utmb.edu/masia/masia.html SUPPLEMENTARY INFORMATION: The dendrogram of 42 APE sequences used to derive motifs is available on http://www.scsb.utmb.edu/comp_biol.html/DNA_repair/publication.html

16.
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the Cα atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA+, FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are comparable to, or statistically significantly more accurate than, those of all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. These new biological insights include: (i) detecting hinge regions in proteins where domains or sub-domains show flexibility; (ii) discovering similar small molecule binding sites in proteins of different folds; and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates, as exemplified by the RNA structure superimposition benchmark.

17.
18.
Zhang Z, Grigorov MG. Proteins 2006, 62(2):470-478
Increasing attention has been devoted to the characterization of complex networks within the protein world. This work reports how we uncovered networked structures that reflect the structural similarities among protein binding sites. First, a dataset of 211 binding sites was compiled by removing the redundant proteins in the Protein Ligand Database (PLD) (http://www-mitchell.ch.cam.ac.uk/pld/). Using a clique detection algorithm, we performed all-against-all comparisons among the 211 binding sites. Within the set of nodes representing the binding sites, an edge was added whenever a pair of binding sites had a similarity higher than a threshold value. The generated similarity networks revealed that many nodes had few links and only a few were highly connected, but due to the limited data available it was not possible to definitively prove a scale-free architecture. Within the same dataset, the binding site similarity networks were compared with the sequence and fold similarity networks. In the protein world, indications were found that structure is better conserved than sequence, but on its own, sequence was better conserved than the subset of functional residues forming the binding site. Because a binding site is strongly linked with protein function, the identification of protein binding site similarity networks could accelerate the functional annotation of newly identified genes. In view of this, we have discussed several potential applications of binding site similarity networks, such as the construction of novel binding site classification databases, as well as the implications for protein molecular design in general and computational chemogenomics in particular.
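The network construction described in this abstract (one node per binding site, an edge wherever pairwise similarity exceeds a threshold) can be sketched in a few lines. The similarity values, the threshold of 0.5, and the function names below are illustrative assumptions; the abstract gives neither the similarity measure's scale nor the threshold used.

```python
from collections import Counter


def similarity_network(names, sim, threshold):
    """Build an undirected graph as {name: set of neighbours}.

    sim is a symmetric matrix (list of lists) of pairwise similarities;
    an edge is added when sim[i][j] is strictly greater than threshold.
    """
    adj = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] > threshold:
                adj[names[i]].add(names[j])
                adj[names[j]].add(names[i])
    return adj


def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return dict(sorted(counts.items()))


# Made-up similarities for four hypothetical sites A-D.
names = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.8, 0.7],
    [0.9, 1.0, 0.2, 0.1],
    [0.8, 0.2, 1.0, 0.3],
    [0.7, 0.1, 0.3, 1.0],
]
network = similarity_network(names, sim, threshold=0.5)
print(degree_distribution(network))  # {1: 3, 3: 1}
```

In this toy case one hub (A) is linked to three sparsely connected nodes, the same qualitative pattern of few highly connected nodes that the abstract reports; judging whether a real network is scale-free would require fitting the degree distribution on far more data.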

19.
Monaco HL, Zanotti G. Biopolymers 1992, 32(4):457-465
We review our work on bovine and human retinol-binding protein (RBP), bovine beta-lactoglobulin (BLG), and bovine odorant-binding protein (OBP). These three proteins share a sequence similarity high enough to justify the proposal that their three-dimensional structures ought to be quite similar, and they also share the function of binding similar or even identical hydrophobic ligands, although with very different degrees of specificity. Thus they constitute an ideal system to exhaustively explore the question of three-dimensional structure prediction from sequence similarity and the related question of binding site prediction for similar ligands. We have used x-ray diffraction techniques on single crystals of human and bovine RBP, bovine milk BLG, and bovine nasal mucosa OBP to investigate this problem. The results of these crystallographic studies indicate that, to the level of resolution so far attained, the three-dimensional structure of these three proteins is reasonably predicted from the sequence similarity. The fold is the same and structural differences are rather subtle. Finally, we present experimental evidence that the binding sites of RBP, BLG, and OBP are in different regions of the molecules. Thus, it appears that although sequence alignment has correctly predicted the protein fold, it has incorrectly predicted the hydrophobic ligand-binding sites.

20.