Found 20 similar articles (search time: 31 ms)
1.
Metal ions are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Current tools for predicting metal-protein interactions are based on proteins crystallized with their metal ions present (holo forms). However, a majority of resolved structures are free of metal ions (apo forms). Moreover, metal binding is a dynamic process, often involving conformational rearrangement of the binding pocket. Thus, effective predictions need to be based on the structure of the apo state. Here, we report an approach that identifies transition metal-binding sites in apo forms with a resulting selectivity >95%. Applying the approach to apo forms in the Protein Data Bank and structural genomics initiative identifies a large number of previously unknown, putative metal-binding sites, and their amino acid residues, in some cases providing a first clue to the function of the protein.
2.
Sodhi JS, Bryson K, McGuffin LJ, Ward JJ, Wernisch L, Jones DT. Journal of Molecular Biology, 2004, 342(1): 307-320
The accurate prediction of the biochemical function of a protein is becoming increasingly important, given the unprecedented growth of both structural and sequence databanks. Consequently, computational methods are required to analyse such data in an automated manner to ensure genomes are annotated accurately. Protein structure prediction methods, for example, are capable of generating approximate structural models on a genome-wide scale. However, the detection of functionally important regions in such crude models, as well as structural genomics targets, remains an extremely important problem. The method described in the current study, MetSite, represents a fully automatic approach for the detection of metal-binding residue clusters applicable to protein models of moderate quality. The method involves using sequence profile information in combination with approximate structural data. Several neural network classifiers are shown to be able to distinguish metal sites from non-sites with a mean accuracy of 94.5%. The method was demonstrated to identify metal-binding sites correctly in LiveBench targets where no obvious metal-binding sequence motifs were detectable using InterPro. Accurate detection of metal sites was shown to be feasible for low-resolution predicted structures generated using mGenTHREADER where no side-chain information was available. High-scoring predictions were observed for a recently solved hypothetical protein from Haemophilus influenzae, indicating a putative metal-binding site.
3.
Regions of rare conformation were located in 300 protein crystal structures representing seven major protein folds. A distance matrix algorithm was used to search rapidly for 9-residue fragments of rare backbone conformation using a comparison to a relational database of encoded fragments derived from the database of nonredundant structures. Rare fragments were found in 61% of the analyzed protein structures. Detailed analysis was performed for 78 proteins of different folds. The rare fragments were located near functional sites in 72% of the protein structures. The rare fragments often formed parts of ligand-binding sites (59%), protein-protein interfaces (8%), and domain-domain contacts (5%). Of the remaining structures, 5% had a high average B-factor or high local B-factors. Statistical analysis suggests that the association between ligands and rare regions does not occur by chance alone. The present study is likely to underestimate the number of functional sites, because not all analyzed protein structures contained a ligand. The results suggest that rapid searches for regions with rare local backbone conformations can assist in prediction of functional sites in novel proteins.
4.
Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding proteins is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins, facilitating the prediction of functions from structural genomic information.
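The two steps named in this abstract (a graph step that finds oxygen clusters, then a geometric step that locates each cluster's center) can be illustrated with a minimal sketch. It is not the GG algorithm itself: connected components of a distance graph stand in for the graph step, a plain centroid stands in for the geometric center, and the function names and the `cutoff` value are assumptions made only for illustration. The "four or more oxygen atoms" criterion is the only part taken from the abstract.

```python
import math
from itertools import combinations


def oxygen_clusters(oxygens, cutoff=6.0):
    """Group oxygen atoms into connected components of a distance graph.

    `cutoff` (in angstroms) is a hypothetical value, not one from the paper.
    """
    n = len(oxygens)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(oxygens[i], oxygens[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def candidate_sites(oxygens, cutoff=6.0, min_size=4):
    """Return (atom indices, centroid) for clusters of at least `min_size` oxygens."""
    sites = []
    for component in oxygen_clusters(oxygens, cutoff):
        if len(component) >= min_size:
            points = [oxygens[i] for i in component]
            center = tuple(sum(axis) / len(points) for axis in zip(*points))
            sites.append((component, center))
    return sites
```

A real implementation would also need the empty-sphere check described in the abstract (no other atoms inside the sphere), which this sketch omits.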
5.
Protein function prediction using local 3D templates (cited 8 times: 0 self-citations, 8 by others)
The prediction of a protein's function from its 3D structure is becoming more and more important as the worldwide structural genomics initiatives gather pace and continue to solve 3D structures, many of which are of proteins of unknown function. Here, we present a methodology for predicting function from structure that shows great promise. It is based on 3D templates that are defined as specific 3D conformations of small numbers of residues. We use four types of template, covering enzyme active sites, ligand-binding residues, DNA-binding residues and reverse templates. The latter are templates generated from the target structure itself and scanned against a representative subset of all known protein structures. Together, the templates provide a fairly thorough coverage of the known structures and ensure that if there is a match to a known structure it is unlikely to be missed. A new scoring scheme provides a highly sensitive means of discriminating between true positive and false positive template matches. In all, the methodology provides a powerful new tool for function prediction to complement those already in use.
6.
Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways. First, we search for a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Bank and make functional predictions.
7.
The predictability of catalytic and binding sites from apo structures is addressed for proteins that undergo significant conformational change upon binding. Theoretical microscopic titration curves (THEMATICS), an electrostatics-based method for the prediction of functional sites, is performed on a test set of 24 proteins with both apo and holo structures available. For 23 of these 24 proteins (96%), THEMATICS predicts the correct catalytic or binding site for both the apo and holo forms. For only one of the 24 proteins, THEMATICS makes the correct prediction for the holo structure but fails for the apo structure. The metrics used by THEMATICS to identify functional residues generally are larger in absolute value for the functional residues in the holo forms compared to the corresponding residues in the apo forms. However, even in the apo forms, these identifying metrics are still statistically significantly larger for functional residues than for residues not involved in catalysis or binding. This indicates that some of the unusual electrostatic properties of functional residues are preserved in the apo conformation. Evidence is presented that certain residues immediately surrounding the active catalytic and binding residues impart functionally important chemical and electrostatic properties to the active residues. At least parts of these microenvironments exist in the unbound conformations, such that THEMATICS is able to distinguish the functional residues even in the apo structures.
8.
Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
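The shell-averaging step described in this abstract can be sketched as follows. The abstract specifies six concentric spherical shells around the site of interest; the shell width, the function name, and the representation of each atom as a coordinate plus a single numeric property are assumptions made for illustration, and the Bayesian classifier that consumes these features is not shown.

```python
import math


def shell_averages(site, atoms, n_shells=6, shell_width=1.25):
    """Average one atomic property inside concentric shells around `site`.

    `atoms` is a list of ((x, y, z), value) pairs. `shell_width` (angstroms)
    is a hypothetical choice; only the number of shells comes from the abstract.
    Returns one average per shell, or None for a shell that contains no atoms.
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for coords, value in atoms:
        distance = math.dist(site, coords)
        index = int(distance // shell_width)  # shell 0 is the innermost
        if index < n_shells:
            sums[index] += value
            counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Averaging within shells rather than recording exact atom positions is what lets the features tolerate the small conformational shifts between unbound and bound forms that the abstract mentions.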
9.
The geometry of metal coordination by proteins is well understood, but the evolution of metal binding sites has been less studied. Here we present a study on a small number of well-documented structural calcium and zinc binding sites, concerning how the geometry diverges between relatives, how often nonrelatives converge towards the same structure, and how often these metal binding sites are lost in the course of evolution. Both calcium and zinc binding site structure is observed to be conserved; structural differences between those atoms directly involved in metal binding in related proteins are typically less than 0.5 Å root mean square deviation, even in distant relatives. Structural templates representing these conserved calcium and zinc binding sites were used to search the Protein Data Bank for cases where unrelated proteins have converged upon the same residue selection and geometry for metal binding. This allowed us to identify six "archetypal" metal binding site structures: two archetypal zinc binding sites, both of which had independently evolved on a large number of occasions, and four diverse archetypal calcium binding sites, where each had evolved independently on only a handful of occasions. We found that it was common for distant relatives of metal-binding proteins to lack metal-binding capacity. This occurred for 13 of the 18 metal binding sites we studied, even though in some of these cases the original metal had been classified as "essential for protein folding." For most of the calcium binding sites studied (seven out of eleven cases), the lack of metal binding in relatives was due to point mutation of the metal-binding residues, whilst for zinc binding sites, lack of metal binding in relatives always involved more extensive changes, with loss of secondary structural elements or loops around the binding site.
10.
Catalytic site structure is normally highly conserved between distantly related enzymes. As a consequence, templates representing catalytic sites have the potential to succeed at function prediction in cases where methods based on sequence or overall structure fail. There are many methods for searching protein structures for matches to structural templates, but few validated template libraries to use with these methods. We present a library of structural templates representing catalytic sites, based on information from the scientific literature. Furthermore, we analyse homologous template families to discover the diversity within families and the utility of templates for active site recognition. Templates representing the catalytic sites of homologous proteins mostly differ by less than 1 Å root mean square deviation, even when the sequence similarity between the two proteins is low. Within these sets of homologues there is usually no discernible relationship between catalytic site structure similarity and sequence similarity. Because of this structural conservation of catalytic sites, the templates can discriminate between matches to related proteins and random matches with over 85% sensitivity and predictive accuracy. Templates based on protein backbone positions are more discriminating than those based on side-chain atoms. These analyses show encouraging prospects for prediction of functional sites in structural genomics structures of unknown function, and will be of use in analyses of convergent evolution and exploring relationships between active site geometry and chemistry. The template library can be queried via a web server and is available for download.
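The root mean square deviation used here to compare templates can be written as $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} \lVert a_i - b_i \rVert^2}$ over $N$ paired atoms. The sketch below computes it for two atom sets that are assumed to be already paired and optimally superposed; the superposition step is omitted. The function names and the 1.0 Å threshold in `matches_template` are illustrative assumptions that echo the figure quoted in the abstract and are not a cutoff stated by the authors.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, already superposed coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def matches_template(template, candidate, threshold=1.0):
    """Hypothetical acceptance test: RMSD below `threshold` angstroms."""
    return rmsd(template, candidate) < threshold
```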
11.
Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to the progress of the structural genomics initiatives. Although certain structural patterns such as the Asp-His-Ser catalytic triad are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect general structural motifs such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm combining both structural and sequence information to identify local structural motifs. We applied our method to the following examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acid and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation through detecting structural motifs associated with particular functions.
12.
PIER: protein interface recognition for structural proteomics (cited 1 time: 0 self-citations, 1 by others)
Recent advances in structural proteomics call for development of fast and reliable automatic methods for prediction of functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected on protein surface. The common drawback of such methods is their limited applicability to the proteins with a sparse set of sequential homologs, as well as inability to detect interfaces in evolutionary variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved the overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation only took seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. 
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
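The residue-level precision and recall figures quoted for PIER follow the standard definitions: precision is the fraction of predicted interface residues that are true interface residues, and recall is the fraction of true interface residues that were predicted. A minimal sketch, with a hypothetical function name and residue identifiers represented as plain integers:

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall for one protein.

    `predicted` and `actual` are collections of residue identifiers.
    Returns (precision, recall); an empty set yields 0.0 for its ratio.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

For example, predicting five residues of which three lie in a six-residue interface gives a precision of 0.6 and a recall of 0.5. These numbers are made up for illustration and are not taken from the PIER benchmark.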
13.
An innovative bioinformatic method has been designed and implemented to detect similar three-dimensional (3D) sites in proteins. This approach allows the comparison of protein structures or substructures and detects local spatial similarities: this method is completely independent from the amino acid sequence and from the backbone structure. In contrast to already existing tools, the basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed. The implementation of this heuristic constitutes a software package named SuMo (Surfing the Molecules), which allows the dynamic definition of chemical groups, the selection of sites in the proteins, and the management and screening of databases. To show the relevance of this approach, we focused on two extreme examples illustrating convergent and divergent evolution. In two unrelated serine proteases, SuMo detects one common site, which corresponds to the catalytic triad. In the legume lectin family, composed of >100 structures that share similar sequences and folds but may have lost their ability to bind a carbohydrate molecule, SuMo discriminates between functional and non-functional lectins with a selectivity of 96%. The time needed for searching a given site in a protein structure is typically 0.1 s on a PIII 800MHz/Linux computer; thus, in further studies, SuMo will be used to screen the PDB.
14.
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
15.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.
16.
A novel variation in electrophoretic phenotype is described for a mouse salivary androgen binding protein (Abp). Crosses show that the variation is inherited in an autosomal codominant manner and protein characterization studies show that the variant Abp differs in isoelectric point from the common form of the protein. Those observations suggest that the variation involves the structural gene for the mouse salivary Abp. The genetic studies also show that the electrophoretic mobility of the variant Abp can be influenced by the sex-limited saliva pattern (Ssp) gene. The Ssp^S allele alters the electrophoretic mobility of Abp in males at puberty or in females which have received exogenous testosterone [Karn, R. C., Dlouhy, S. R., Springer, K. R., Hjorth, J. P., and Nielsen, J. T. (1982). Biochem Genet. 20:493]. This study shows that Abp and Ssp are distinct genes which are not closely linked and that Ssp^S is trans active in F1 (Abp^a/Abp^b, Ssp^S/Ssp^F) males. SRD was supported in part by PHS General Medical Training Grant T32 GMO7468 and the Indiana University School of Medicine Research Program in Academic Medicine. RCK was supported in part by PHS Career Development Award 1 KO4 AMOO284.
17.
The identification of metal-binding ligand residues in metalloproteins using nuclear magnetic resonance spectroscopy.
S. D. Scrofani, P. E. Wright, H. J. Dyson. Protein Science: a publication of the Protein Society, 1998, 7(11): 2476-2479
The identification of metal-binding ligands in metalloproteins is an important step in gaining detailed information regarding the environment of the active site. Traditionally, techniques such as 13Cd-substitution for the active metal followed by isotope-filtered NMR techniques have been used to this end. However, for medium to high molecular weight proteins (>20 kDa), these experiments may not be beneficial due to extensive 1H spectral overlap. Here, we describe an alternative approach, where metal-binding ligands such as histidine and cysteine are specifically 15N backbone labeled, excess EDTA is added and changes to (1H-15N) HSQC spectra are followed. Under these conditions, the amide groups of all 15N labeled histidine and cysteine residues, which were either ligands or residues close to the active site, were identified unambiguously for metallo-beta-lactamase from Bacteroides fragilis.
18.
Virtual drug screening using protein-ligand docking techniques is a time-consuming process, which requires high computational power for binding affinity calculation. There are millions of chemical compounds available for docking. Eliminating compounds that are unlikely to exhibit high binding affinity from the screening set should speed up the virtual drug screening procedure. We performed docking of 6353 ligands against twenty-one protein X-ray crystal structures. The docked ligands were ranked according to their calculated binding affinities, from which the top five hundred and the bottom five hundred were selected. We found that the volume and number of rotatable bonds of the top five hundred docked ligands are similar to those found in the crystal structures and corresponded with the volume of the binding sites. In contrast, the bottom five hundred set contains ligands that are either too large to enter the binding site, or too small to bind with high specificity and affinity to the binding site. A pre-docking filter that takes into account shapes and volumes of the binding sites as well as ligand volumes and flexibilities can filter out low binding affinity ligands from the screening sets, and thereby speed up the virtual drug screening procedure.
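A pre-docking filter of the kind proposed in this abstract might look like the sketch below. Everything specific in it is an assumption for illustration: the `Ligand` fields, the idea of expressing size as a fill ratio (ligand volume divided by binding-site volume), and all three threshold values. The abstract only states that ligands too large to enter the site, or too small to bind specifically, can be screened out using volumes and flexibility.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    volume: float          # cubic angstroms
    rotatable_bonds: int   # simple proxy for ligand flexibility


def predock_filter(ligands, site_volume, min_fill=0.3, max_fill=1.0, max_rotatable=10):
    """Keep ligands whose size and flexibility are plausible for the site.

    `min_fill`, `max_fill` and `max_rotatable` are hypothetical thresholds,
    not values reported in the study.
    """
    kept = []
    for ligand in ligands:
        fill = ligand.volume / site_volume
        if min_fill <= fill <= max_fill and ligand.rotatable_bonds <= max_rotatable:
            kept.append(ligand)
    return kept
```

Because the filter needs only precomputed descriptors, it runs in time linear in the number of ligands and can be applied before any docking calculation is started.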
19.
Phosphorylation is a crucial step in many cellular processes, ranging from metabolic reactions involved in energy transformation to signaling cascades. In many instances, protein domains specifically recognize the phosphogroup. Knowledge of the binding site provides insights into the interaction, and it can also be exploited for therapeutic purposes. Previous studies have shown that proteins interacting with phosphogroups are highly heterogeneous, and no single property can be used to reliably identify the binding site. Here we present an energy-based computational procedure that exploits the protein three-dimensional structure to identify binding sites involved in the recognition of phosphogroups. The procedure is validated on three datasets containing more than 200 proteins binding to ATP, phosphopeptides, and phosphosugars. A comparison against three other generic binding site identification approaches shows higher accuracy values for our method, with a correct identification rate in the 80–90% range for the top three predicted sites. Addition of conservation information further improves the performance. The method presented here can be used as a first step in functional annotation or to guide mutagenesis experiments and further studies such as molecular docking. Proteins 2012;. © 2012 Wiley Periodicals, Inc.
20.
Understanding and characterizing the biochemical and evolutionary information within the wealth of protein sequence and structural data, particularly at functionally important sites, is very important. A comprehensive analysis of physico-chemical properties and evolutionary conservation patterns at the molecular and biological function level is expected to yield important clues for identifying similar sites in as-yet uncharacterized proteins. We present a library of protein functional templates (PFTs) designed to represent the compositional and evolutionary conservation patterns of functional sites at the molecular and biological function level. Subsequently we developed LIMACS (LInear MAtching of Conservation Scores), a software tool that uses the template library for the prediction of functionally important sites in a multiple sequence alignment, transferring the molecular function annotation from the most-similar functional site in the template library to a predicted site.