Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
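The idea of comparing short sequence profiles with a KL-based distance can be illustrated with a minimal sketch. The abstract does not give the exact definition, so everything here is an assumption made for illustration: the pseudocount smoothing, the symmetrized form of the divergence, the per-column summation, and the function names `build_profile` and `profile_distance`.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(residues, pseudocount=1.0):
    """Smoothed amino acid frequencies for one profile column."""
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {a: (residues.count(a) + pseudocount) / total for a in AMINO_ACIDS}


def build_profile(windows, pseudocount=1.0):
    """Turn equal-length sequence windows into per-column distributions."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    return [
        column_frequencies([w[i] for w in windows], pseudocount)
        for i in range(length)
    ]


def kl_divergence(p, q):
    """KL divergence D(p || q); pseudocounts keep every q[a] above zero."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)


def profile_distance(profile_a, profile_b):
    """Symmetrized KL divergence summed over columns (an assumed choice)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(
        0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        for p, q in zip(profile_a, profile_b)
    )


# Hypothetical 5-residue windows centred on a catalytic residue.
acid_like = build_profile(["GDEAW", "GDEGW", "ADESW"])
base_like = build_profile(["KHRTL", "KHKTL", "RHRSL"])
print(profile_distance(acid_like, base_like))
```

Under these assumptions, a query window would be assigned the role of whichever known profile lies closest, which is one plausible reading of how such a distance could support both prediction and annotation.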

2.
Theoretical microscopic titration curves (THEMATICS) is a computational method for the identification of active sites in proteins through deviations in computed titration behavior of ionizable residues. While the sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was not as high, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained, with only 4% of all residues selected. The average false positive rate, using the CSA annotations, is only 3.2%, far lower than that of other 3D-structure-based methods. THEMATICS-SVM returns higher precision, lower false positive rate, and better overall performance, compared with other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.

3.
L Han, YJ Zhang, J Song, MS Liu, Z Zhang. PLoS ONE, 2012, 7(7): e41370
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, given that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a false positive rate of ≤ 10%, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
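A relative distance-to-center measure of the kind the abstract calls Dscore might be sketched as follows. This is not the published formula: using the unweighted centroid of one representative atom per residue and normalizing by the largest distance are assumptions, and `relative_distances` is a hypothetical helper name.

```python
import math


def centroid(coords):
    """Unweighted geometric center of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def relative_distances(coords):
    """Distance of each residue to the centroid, scaled to the range [0, 1].

    Values near 0 mean the residue sits close to the protein center;
    values near 1 mean it lies at the periphery.
    """
    center = centroid(coords)
    dists = [math.dist(c, center) for c in coords]
    longest = max(dists)
    if longest == 0.0:
        return [0.0] * len(coords)
    return [d / longest for d in dists]


# Hypothetical coordinates (e.g. one C-alpha atom per residue), in angstroms.
toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(relative_distances(toy_coords))
```

How such a value is combined with the microenvironment score is described in the abstract only as a nonlinear integration, so the sketch stops at the distance term.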

4.
Structural genomics projects are determining the three-dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three-dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous residue positions based on a specific training set of residue functions. In order to evaluate this pipeline for automated protein annotation, we applied it to the challenging problem of prediction of catalytic residues in enzymes. We also ranked the features based on their ability to discriminate catalytic from noncatalytic residues. When applying our method to a well-annotated set of protein structures, we found that top-ranked features were a measure of sequence conservation, a measure of structural conservation, a degree of uniqueness of a residue's structural environment, solvent accessibility, and residue hydrophobicity. We also found that features based on structural conservation were complementary to those based on sequence conservation and that they were capable of increasing predictor performance. Using a family nonredundant version of the ASTRAL 40 v1.65 data set, we estimated that the true catalytic residues were correctly predicted in 57.0% of the cases, with a precision of 18.5%. When testing on proteins containing novel folds not used in training, the best features were highly correlated with the training on families, thus validating the approach to nonhomologous catalytic residue prediction in general. We then applied the method to 2781 coordinate files from the structural genomics target pipeline and identified both highly ranked and highly clustered groups of predicted catalytic residues.

5.
The catalytic or functionally important residues of a protein are known to exist in evolutionarily constrained regions. However, the patterns of residue conservation alone are sometimes not very informative, depending on the homologous sequences available for a given query protein. Here, we present an integrated method to locate the catalytic residues in an enzyme from its sequence and structure. Mutations of functional residues usually decrease the activity, but concurrently often increase stability. Also, catalytic residues tend to occupy partially buried sites in holes or clefts on the molecular surface. After confirming these general tendencies by carrying out statistical analyses on 49 representative enzymes, these data together with amino acid conservation were evaluated. This novel method exhibited better prediction sensitivity than traditional methods that consider only the residue conservation. We applied it to some so-called "hypothetical" proteins, with known structures but undefined functions. The relationships among the catalytic, conserved, and destabilizing residues in enzymatic proteins are discussed.

6.
A long-standing goal in biology is to establish the link between function, structure, and dynamics of proteins. Considering that protein function at the molecular level is understood by the ability of proteins to bind to other molecules, the limited structural data of proteins in association with other bio-molecules represents a major hurdle to understanding protein function at the structural level. Recent reports show that protein function can be linked to protein structure and dynamics through network centrality analysis, suggesting that the structures of proteins bound to natural ligands may be inferred computationally. In the present work, a new method is described to discriminate protein conformations relevant to the specific recognition of a ligand. The method relies on a scoring system that matches critical residues with central residues in different structures of a given protein. Central residues are the most traversed residues with the same frequency in networks derived from protein structures. We tested our method in a set of 24 different proteins and more than 260,000 structures of these in the absence of a ligand or bound to it. To illustrate the usefulness of our method in the study of the structure/dynamics/function relationship of proteins, we analyzed mutants of the yeast TATA-binding protein with impaired DNA binding. Our results indicate that critical residues for an interaction are preferentially found as central residues of protein structures in complex with a ligand. Thus, our scoring system effectively distinguishes protein conformations relevant to the function of interest.
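One common way to quantify how often a residue is traversed in a contact network is betweenness centrality. The sketch below uses Brandes' algorithm on an unweighted, undirected graph; whether the authors use exactly this measure is not stated in the abstract, and the toy residue labels and contact list are hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Betweenness centrality (Brandes) for an unweighted, undirected graph.

    `adjacency` maps each node to a list of its neighbours.
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest paths
        dist = {v: -1 for v in adjacency}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                          # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                          # accumulate path dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


def most_traversed(adjacency):
    """Residues sharing the highest betweenness value."""
    scores = betweenness(adjacency)
    top = max(scores.values())
    return sorted(v for v, s in scores.items() if s == top)


# Hypothetical contact network: residues are nodes, contacts are edges.
contacts = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(most_traversed(contacts))
```

In practice the adjacency would be built from a structure, for example by linking residues whose atoms lie within a chosen distance cutoff, and the calculation would be repeated over each conformation being compared.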

7.
A novel method for identifying the protein structure elements responsible for catalytic activity is proposed. The structural organization of various hydrolases was studied using the ANIS (ANalysis of Informational Structure) method. ANIS reveals a hierarchy of ELements of Information Structure (ELIS) from the protein's amino acid sequence. The ELIS correspond to variable-length sites with an increased density of structural information. The amino acid residues forming the enzyme catalytic site were shown to belong to different top-ranking ELIS located in the contact area of the corresponding spatial structure clusters. In the protein spatial structure, catalytic sites are located in the area of contact between fragments of the polypeptide chain (structural blocks) belonging to different top-ranking ELIS. From these results we conclude that the structural blocks corresponding to top-ranking ELIS are crucial for protein functioning. Such regions are structurally independent, and their determinate mobility relative to each other is vital for an efficient enzymatic reaction to occur.

8.
A method is presented that positions polar hydrogen atoms in protein structures by optimizing the total hydrogen bond energy. For this goal, an empirical hydrogen bond force field was derived from small molecule crystal structures. Bifurcated hydrogen bonds are taken into account. The procedure also predicts ionization states of His, Asp, and Glu residues. During optimization, sidechain conformations of His, Gln, and Asn residues are allowed to change their last χ angle by 180° to compensate for crystallographic misassignments. Crystal structure symmetry is taken into account where appropriate. The results can have significant implications for molecular dynamics simulations, protein engineering, and docking studies. The largest impact, however, is in protein structure verification: over 85% of protein structures tested can be improved by using our procedure. Proteins 26:363–376 © 1996 Wiley-Liss, Inc.

9.
The human APOBEC3G (A3G) protein is a cellular polynucleotide cytidine deaminase that acts as a host restriction factor of retroviruses, including HIV-1 and various transposable elements. Recently, three NMR and two crystal structures of the catalytic deaminase domain of A3G have been reported, but these are in disagreement over the conformation of a terminal β-strand, β2, as well as the identification of a putative DNA binding site. We here report molecular dynamics simulations with all of the solved A3G catalytic domain structures, taking into account solubility enhancing mutations that were introduced during derivation of three out of the five structures. In the course of these simulations, we observed a general trend towards increased definition of the β2 strand for those structures that have a distorted starting conformation of β2. Solvent density maps around the protein as calculated from MD simulations indicated that this distortion is dependent on preferential hydration of residues within the β2 strand. We also demonstrate that the identification of a pre-defined DNA binding site is prevented by the inherent flexibility of loops that determine access to the deaminase catalytic core. We discuss the implications of our analyses for the as yet unresolved structure of the full-length A3G protein and its biological functions with regard to hypermutation of DNA.

10.
L N Gastinel, C Cambillau, Y Bourne. The EMBO Journal, 1999, 18(13): 3546-3557
beta1,4-galactosyltransferase T1 (beta4Gal-T1, EC 2.4.1.90/38), a Golgi resident membrane-bound enzyme, transfers galactose from uridine diphosphogalactose to the terminal beta-N-acetylglucosamine residues forming the poly-N-acetyllactosamine core structures present in glycoproteins and glycosphingolipids. In mammals, beta4Gal-T1 binds to alpha-lactalbumin, a protein that is structurally homologous to lysozyme, to produce lactose. beta4Gal-T1 is a member of a large family of homologous beta4galactosyltransferases that use different types of glycoproteins and glycolipids as substrates. Here we solved and refined the crystal structures of recombinant bovine beta4Gal-T1 to 2.4 Å resolution in the presence and absence of the substrate uridine diphosphogalactose. The crystal structure of the bovine substrate-free beta4Gal-T1 catalytic domain showed a new fold consisting of a single conical domain with a large open pocket at its base. In the substrate-bound complex, the pocket encompassed residues interacting with uridine diphosphogalactose. The structure of the complex contained clear regions of electron density for the uridine diphosphate portion of the substrate, where its beta-phosphate group was stabilized by hydrogen-bonding contacts with conserved residues including the Asp252ValAsp254 motif. These results help the interpretation of engineered beta4Gal-T1 point mutations. They suggest a mechanism possibly involved in galactose transfer and enable identification of the critical amino acids involved in alpha-lactalbumin interactions.

11.
The automatic identification of catalytic residues still remains an important challenge in structural bioinformatics. Sequence-based methods are good alternatives when the query shares a high percentage of identity with a well-annotated enzyme. However, when the homology is not apparent, which occurs with many structures from the structural genome initiative, structural information should be exploited. A local structural comparison is preferred to a global structural comparison when predicting functional residues. CMASA is a recently proposed method for predicting catalytic residues based on a local structure comparison. The method achieves high accuracy and a high value for the Matthews correlation coefficient. However, point substitutions or a lack of relevant data strongly affect the performance of the method. In the present study, we propose a simple extension to the CMASA method to overcome this difficulty. Extensive computational experiments are shown as proof of concept instances, as well as for a few real cases. The results show that the extension performs well when the catalytic site contains mutated residues or when some residues are missing. The proposed modification could correctly predict the catalytic residues of a mutant thymidylate synthase, 1EVF. It also successfully predicted the catalytic residues for 3HRC despite the lack of information for a relevant side chain atom in the PDB file.

12.
The crystal structures of alpha-galactosidase from the mesophilic fungus Trichoderma reesei and its complex with the competitive inhibitor, beta-d-galactose, have been determined at 1.54 Å and 2.0 Å resolution, respectively. The alpha-galactosidase structure was solved by the quick cryo-soaking method using a single Cs derivative. The refined crystallographic model of the alpha-galactosidase consists of two domains, an N-terminal catalytic domain of the (beta/alpha)8 barrel topology and a C-terminal domain which is formed by an antiparallel beta-structure. The protein contains four N-glycosylation sites located in the catalytic domain. Some of the oligosaccharides were found to participate in inter-domain contacts. The galactose molecule binds to the active site pocket located in the center of the barrel of the catalytic domain. Analysis of the alpha-galactosidase-galactose complex reveals the residues of the active site and offers a structural basis for identification of the putative mechanism of the enzymatic reaction. The structure of the alpha-galactosidase closely resembles those of the glycoside hydrolase family 27. The conservation of two catalytic Asp residues, identified for this family, is consistent with a double-displacement reaction mechanism for the alpha-galactosidase. Modeling of possible substrates into the active site reveals specific hydrogen bonds and hydrophobic interactions that could explain peculiarities of the enzyme kinetics.

13.
A novel method for identifying the protein structure elements responsible for catalytic activity is proposed. The structural organization of various hydrolases was studied using the ANIS (ANalysis of Informational Structure) method. ANIS reveals a hierarchy of ELements of Information Structure (ELIS) from the protein's amino acid sequence. The ELIS correspond to variable-length sites with an increased density of structural information. The amino acid residues forming the enzyme catalytic site were shown to belong to different top-ranking ELIS located in the contact area of the corresponding spatial structure clusters. In the protein spatial structure, catalytic sites are located in the area of contact between fragments of the polypeptide chain (structural blocks) belonging to different top-ranking ELIS. From these results we conclude that the structural blocks corresponding to top-ranking ELIS are crucial for protein functioning. Such regions are structurally independent, and their determinate mobility relative to each other is vital for an efficient enzymatic reaction to occur.

14.
Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively on protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves on automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.

15.
Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics projects are now solved without their biochemical functions characterized. For proteins binding to similar substrates or ligands and carrying out similar functions, their binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate such selection pressure due to protein function from selection pressure due to protein folding, and evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by solving the problem of uncovering the amino acid residue substitution pattern due to protein function and separating it from the amino acid substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probabilistic model called the computed binding profile that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions with accuracy. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: at the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.
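For readers unfamiliar with the metrics quoted here, the sketch below computes sensitivity, specificity, and the Matthews correlation coefficient from a confusion matrix. The formulas are the standard ones; the counts in the usage lines are invented for illustration and are not taken from the study.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true positives that were recovered (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that were correctly rejected."""
    return tn / (tn + fp)


def matthews_coefficient(tp, fp, tn, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts, purely to show the call pattern.
tp, fp, tn, fn = 80, 5, 900, 15
print(sensitivity(tp, fn), specificity(tn, fp))
print(matthews_coefficient(tp, fp, tn, fn))
```

A coefficient reported as a percentage, as in the abstract, corresponds to this value multiplied by 100.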

16.
We have applied the calculation of mechanical properties to a dataset of almost 100 enzymes to determine the extent to which catalytic residues have distinct properties. Specifically, we have calculated force constants describing the ease of moving any given amino acid residue with respect to the other residues in the protein. The results show that catalytic residues are invariably associated with high force constants. Choosing an appropriate cutoff enables the detection of roughly 80% of catalytic residues with only 25% false positives. It is shown that neither multidomain structures nor the presence or absence of bound ligands hinders successful detection. It is, however, noted that active sites near the protein surface are more difficult to detect and that non-catalytic but structurally key residues may also exhibit high force constants.

17.
Protein Kinase-Like Non-Kinases (PKLNKs), commonly known as “pseudokinases”, are homologous to eukaryotic Ser/Thr/Tyr protein kinases (PKs) but lack the crucial aspartate residue in the catalytic loop, indispensable for phosphotransferase activity. Therefore, they are predicted to be “catalytically inactive” enzyme homologs. Analysis of protein-kinase-like sequences from Arabidopsis thaliana led to the identification of more than 120 pseudokinases lacking the catalytic aspartate, the majority of which are closely related to the plant-specific receptor-like kinase family. These pseudokinases engage in different biological processes, enabled by their diverse domain architectures and specific subcellular localizations. Structural comparison of pseudokinases with active and inactive conformations of canonical PKs, of both plant and animal origin, revealed unique structural differences. The currently available crystal structures of pseudokinases show that the loop topologically equivalent to the activation segment of PKs adopts a distinct-folded conformation, packing against the pseudoenzyme core, in contrast to the extended and inhibitory geometries observed for active and inactive states, respectively, of catalytic PKs. A salt bridge between the ATP-binding Lys and DFG-Asp, as well as hydrophobic interactions between the conserved nonpolar residue C-terminal to the equivalent DFG motif and nonpolar residues in the C-helix, mediate such a conformation in pseudokinases. This results in enhanced solvent accessibility of the pseudocatalytic loop in pseudokinases, which can possibly serve as an interacting surface while associating with other proteins. Specifically, our analysis identified several residues that may be involved in pseudokinase regulation and hints at the repurposing of pseudocatalytic residues to achieve mechanistic control over noncatalytic functions of pseudoenzymes.

18.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

19.
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
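Two of the imbalance remedies listed in the abstract, under-sampling the majority class and penalizing minority-class errors more heavily, can be sketched without any machine learning library. The sampling ratio, the inverse-frequency weighting rule, and the helper names `undersample` and `class_weights` are assumptions for illustration; the abstract does not specify the authors' settings.

```python
import random


def undersample(labels, ratio, seed=0):
    """Keep every positive and at most `ratio` negatives per positive.

    `labels` is a list of 1 (catalytic) and 0 (non-catalytic);
    the return value is a sorted list of retained indices.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(negatives), int(ratio * len(positives)))
    return sorted(positives + rng.sample(negatives, keep))


def class_weights(labels):
    """Per-class error penalties inversely proportional to class frequency."""
    counts = {c: labels.count(c) for c in set(labels)}
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


# Hypothetical protein: 2 catalytic residues among 100.
labels = [1, 1] + [0] * 98
kept = undersample(labels, ratio=5)
weights = class_weights([labels[i] for i in kept])
print(len(kept), weights)
```

The resulting weights would then be passed to a classifier that accepts per-class misclassification costs, so that a missed catalytic residue costs more than a false alarm.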

20.
Pairs of helices in transmembrane (TM) proteins are often tightly packed. We present a scoring function and a computational methodology for predicting the tertiary fold of a pair of alpha-helices such that its chances of being tightly packed are maximized. Since the number of TM protein structures solved to date is small, it seems unlikely that a reliable scoring function derived statistically from the known set of TM protein structures will be available in the near future. We therefore constructed a scoring function based on the qualitative insights gained in the past two decades from the solved structures of TM and soluble proteins. In brief, we reward the formation of contacts between small amino acid residues such as Gly, Cys, and Ser, that are known to promote dimerization of helices, and penalize the burial of large amino acid residues such as Arg and Trp. As a case study, we show that our method predicts the native structure of the TM homodimer glycophorin A (GpA) to be, in essence, at the global score optimum. In addition, by correlating our results with empirical point mutations on this homodimer, we demonstrate that our method can be a helpful adjunct to mutation analysis. We present a data set of canonical alpha-helices from the solved structures of TM proteins and provide a set of programs for analyzing it (http://ashtoret.tau.ac.il/~sarel). From this data set we derived 11 helix pairs, and conducted searches around their native states as a further test of our method. Approximately 73% of our predictions showed a reasonable fit (RMS deviation <2 Å) with the native structures compared to the success rate of 8% expected by chance. The search method we employ is less effective for helix pairs that are connected via short loops (<20 amino acid residues), indicating that short loops may play an important role in determining the conformation of alpha-helices in TM proteins.
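The qualitative rule in this abstract, rewarding contacts between small residues and penalizing burial of large ones, might be sketched as below. This is a toy illustration rather than the published scoring function: the unit reward and penalty, the use of "appears in any inter-helix contact" as a stand-in for burial, and the example sequences are all assumptions.

```python
SMALL = {"G", "C", "S"}   # Gly, Cys, Ser: named in the abstract as small
LARGE = {"R", "W"}        # Arg, Trp: named in the abstract as large


def pair_score(contacts, seq_a, seq_b, reward=1.0, penalty=1.0):
    """Score one packing of two helices.

    `contacts` lists (i, j) pairs: residue i of helix A touches residue j
    of helix B. Small-small contacts add `reward`; each large residue that
    takes part in any contact (treated here as buried) subtracts `penalty`.
    """
    score = 0.0
    buried = set()
    for i, j in contacts:
        if seq_a[i] in SMALL and seq_b[j] in SMALL:
            score += reward
        buried.add(("a", i))
        buried.add(("b", j))
    for chain, k in buried:
        residue = seq_a[k] if chain == "a" else seq_b[k]
        if residue in LARGE:
            score -= penalty
    return score


# Hypothetical five-residue helix fragments and contact lists.
print(pair_score([(1, 1), (4, 4)], "LGAVG", "LGAVG"))
print(pair_score([(1, 1), (4, 4)], "LGAVG", "LGAVW"))
```

A conformational search would evaluate such a score over many candidate packings and keep the highest-scoring one, which is the sense in which the abstract places the native glycophorin A structure at the global score optimum.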
