Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
We present a new method for predicting protein–ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0% of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25% of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9% of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in ligand-binding site prediction.

2.
MOTIVATION: An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. RESULTS: We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that were not previously known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method; and (5) improves sensitivity through a new approach to assess statistical significance. AVAILABILITY: Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID

3.
Semi-supervised protein classification using cluster kernels   (cited 2 times: 0 self-citations, 2 by others)
MOTIVATION: Building an accurate protein classification system depends critically upon choosing a good representation of the input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data (examples with known 3D structures, organized into structural classes), whereas in practice, unlabeled data are far more plentiful. RESULTS: In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. AVAILABILITY: Source code is available at www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot. The Spider matlab package is available at www.kyb.tuebingen.mpg.de/bs/people/spider. SUPPLEMENTARY INFORMATION: www.kyb.tuebingen.mpg.de/bs/people/weston/semiprot.

4.
Chen YC, Wu CY, Lim C. Proteins, 2007, 67(3):671-680
Binding of polyanionic DNA depends on the cluster of electropositive atoms in the binding site of a DNA-binding protein. Such a cluster of electropositive protein atoms would be electrostatically unfavorable without stabilizing interactions from the respective electronegative DNA atoms and would likely be evolutionarily conserved due to its critical biological role. Consequently, our strategy for predicting DNA-binding residues is based on detecting a cluster of evolutionarily conserved surface residues that are electrostatically stabilized upon mutation to negatively charged Asp/Glu residues. The method requires as input the protein structure and sufficient sequence homologs to define each residue's relative conservation, and it yields as output experimentally testable residues that are predicted to bind DNA. By incorporating characteristic DNA-binding site features (i.e., electrostatic strain and amino acid conservation), the new method yields a prediction accuracy of 83%, which is much higher than methods based on only electrostatic strain (57%) or conservation alone (50%). It is also less sensitive to protein conformational changes upon DNA binding than methods that mainly depend on the 3D protein structure.

5.
Identification of catalytic residues can help unveil interesting attributes of enzyme function for various therapeutic and industrial applications. Based on their biochemical roles, the number of catalytic residues and sequence lengths of enzymes vary. This article describes a prediction approach (PINGU) for such a scenario. It uses models trained using physicochemical properties and evolutionary information of 650 non-redundant enzymes (2136 catalytic residues) in a support vector machine architecture. Independent testing on 200 non-redundant enzymes (683 catalytic residues) in predefined prediction settings, i.e., with the number of non-catalytic residues per catalytic residue ranging from 1 to 30, suggested that the prediction approach was highly sensitive and specific, i.e., 80% or above, over the incremental challenges. To learn more about the discriminatory power of PINGU in real scenarios, where the prediction challenge is variable and susceptible to high false positives, the best model from independent testing was used on 60 diverse enzymes. Results suggested that PINGU was able to identify most catalytic residues and non-catalytic residues properly with 80% or above accuracy, sensitivity and specificity. The effect of false positives on precision was addressed in this study by application of predicted ligand-binding residue information as a post-processing filter. An overall improvement of 20% in F-measure and 0.138 in Correlation Coefficient with 16% enhanced precision could be achieved. On account of its encouraging performance, PINGU is hoped to have eventual applications in boosting enzyme engineering and novel drug discovery.
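
The evaluation terms used in the abstract above (sensitivity, specificity, precision, F-measure, correlation coefficient) can be illustrated with a minimal Python sketch. It is not taken from the PINGU paper: the function name `classification_metrics` is hypothetical, the counts in the usage note are invented for illustration, and reading "Correlation Coefficient" as the Matthews correlation coefficient is an assumption.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not the PINGU authors' code.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # fraction of catalytic residues found
    specificity = ratio(tn, tn + fp)   # fraction of non-catalytic residues rejected
    precision = ratio(tp, tp + fp)     # fraction of predictions that are correct
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Assumption: "Correlation Coefficient" means the Matthews correlation coefficient.
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

With invented counts of 8 true positives, 2 false negatives, 80 true negatives and 20 false positives, sensitivity and specificity are both 0.8 while precision is only 8/28, which shows why a post-processing filter that removes false positives can raise precision and F-measure when non-catalytic residues heavily outnumber catalytic ones.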

6.
Peptides containing fewer than 50 amino acids show little ordered structure under physiological conditions. In this paper it is shown that in the receptor environment, secondary structure could be induced in small peptides that involves 87% of all the amino acid residues. The statistical methods of Chou and Fasman are used to predict the conformation of 41 peptide hormones or neuromodulators in the proteinaceous environment of the receptor, and four distinct conformational groupings are elucidated. Beta-bend, beta-structure and alpha-helical conformations are possible for distinct groups of linear peptides, and disulfide-bridge-containing peptides show a common beta-bend beta-structure conformation at the receptor. In the predicted receptor conformation, the peptides show hydrophobic and hydrophilic domains that must reflect the distribution of corresponding regions in the ligand-binding site of the receptor. The predicted ligand conformation should allow a more rational approach to interpreting existing structure-activity studies and the design of new analogs of pharmacological interest.

7.
The catalytic or functionally important residues of a protein are known to exist in evolutionarily constrained regions. However, the patterns of residue conservation alone are sometimes not very informative, depending on the homologous sequences available for a given query protein. Here, we present an integrated method to locate the catalytic residues in an enzyme from its sequence and structure. Mutations of functional residues usually decrease the activity, but concurrently often increase stability. Also, catalytic residues tend to occupy partially buried sites in holes or clefts on the molecular surface. After confirming these general tendencies by carrying out statistical analyses on 49 representative enzymes, these data together with amino acid conservation were evaluated. This novel method exhibited better prediction sensitivity than traditional methods that consider only the residue conservation. We applied it to some so-called "hypothetical" proteins, with known structures but undefined functions. The relationships among the catalytic, conserved, and destabilizing residues in enzymatic proteins are discussed.

8.
The type 1 sigma receptor (sigmaR1) has been shown to participate in a variety of functions in the central nervous system. To identify the specific regions of the brain that are involved in sigmaR1 function, we analyzed the expression pattern of the receptor mRNA in the mouse brain by in situ hybridization. SigmaR1 mRNA was detectable primarily in the cerebral cortex, hippocampus, and Purkinje cells of cerebellum. To identify the critical anionic amino acid residues in the ligand-binding domain of sigmaR1, we employed two different approaches: chemical modification of anionic amino acid residues and site-directed mutagenesis. Chemical modification of anionic amino acids in sigmaR1 with 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide reduced the ligand-binding activity markedly. Since it is known that a splice variant of this receptor which lacks exon 3 does not have the ability to bind sigma ligands, the ligand-binding domain with its critical anionic amino acid residues is likely to be present in or around the region coded by exon 3. Therefore, each of the anionic amino acids in this region was mutated individually and the influence of each mutation on ligand binding was assessed. These studies have identified two anionic amino acids, D126 and E172, that are obligatory for ligand binding. Even though the ligand-binding function was abolished by these two mutations, the expression of these mutants was normal at the protein level. These results show that sigmaR1 is expressed at high levels in specific areas of the brain that are involved in memory, emotion and motor functions. The results also provide important information on the chemical nature of the ligand-binding site of sigmaR1 that may be of use in the design of sigmaR1-specific ligands with potential for modulation of sigmaR1-related brain functions.

9.
Ligand-gated ion channels of the Cys loop family are receptors for small amine-containing neurotransmitters. Charged amino acids are strongly conserved in the ligand-binding domain of these receptor proteins. To investigate the role of particular residues in ligand binding of the serotonin 5-HT3AS receptor (5-HT3R), glutamate amino acid residues at three different positions, Glu97, Glu224, and Glu235, in the extracellular N-terminal domain were substituted with aspartate and glutamine using site-directed mutagenesis. Wild type and mutant receptor proteins were expressed in HEK293 cells and analyzed by electrophysiology, radioligand binding, fluorescence measurements, and immunochemistry. A structural model of the ligand-binding domain of the 5-HT3R based on the acetylcholine binding protein revealed the position of the mutated amino acids. Our results demonstrate that mutations of Glu97, distant from the ligand-binding site, had little effect on the receptor, whereas mutations of Glu224 and Glu235, close to the predicted binding site, are indeed important for ligand binding. Mutations E224Q, E224D, and E235Q decreased EC50 and Kd values 5-20-fold, whereas E235D was functionally expressed at a low level and had a more than 100-fold increased EC50 value. Comparison of the fluorescence properties of a fluorescein-labeled antagonist upon binding to wild type 5-HT3R and E235Q allowed us to localize Glu235 within a distance of 1 nm around the ligand-binding site, as proposed by our model.

10.
SUMMARY: There are many resources that contain information about binary interactions between proteins. However, protein interactions are defined by only a subset of residues in any protein. We have implemented a web resource that allows the investigation of protein interactions in the Protein Data Bank structures at the level of Pfam domains and amino acid residues. This detailed knowledge relies on the fact that there are a large number of multidomain proteins and protein complexes being deposited in the structure databases. The resource, called iPfam, is hosted within the Pfam UK website. Most resources focus on the interactions between proteins; iPfam includes these as well as interactions between domains in a single protein. AVAILABILITY: iPfam is available on the Web for browsing at http://www.sanger.ac.uk/Software/Pfam/iPfam/; the source data for iPfam are freely available in relational tables via the ftp site ftp://ftp.sanger.ac.uk/pub/databases/Pfam/database_files/.

11.
This study describes the further extension of the resonant recognition model for the analysis and prediction of protein-protein and protein-DNA structure/function dependencies. The model is based on the significant correlation between spectra of numerical presentations of the amino acid or nucleotide sequences of proteins and their coded biological activity. According to this physico-mathematical method, it is possible to define amino acids in the sequence which are predicted to be the most critical for protein function. Using sperm whale myoglobin, human hemoglobin and hen egg white lysozyme as model protein examples, sets of predicted amino acids, or so-called 'hot spots', have been identified within the tertiary structure. It was found for each protein that the predicted 'hot spots', which are distributed along the primary sequence, are spatially grouped in a dome-like arrangement over the active site. The identified amino acids did not correspond to the amino acid residues which are involved in the chemical reaction site of these proteins. It is thus proposed that the resonant recognition model helps to identify amino acid residues which are important for the creation of the molecular structure around the catalytic active site and also the associated physical field conditions required for biorecognition, docking of the specific substrate and full biological activity.

12.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions. Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

13.
The gene, spsB, encoding a type I signal peptidase has been cloned from the gram-positive eubacterium Staphylococcus aureus. The gene encodes a protein of 191 amino acid residues with a calculated molecular mass of 21,692 Da. Comparison of the protein sequence with those of known type I signal peptidases indicates conservation of amino acid residues known to be important or essential for catalytic activity. The enzyme has been expressed to high levels in Escherichia coli and has been demonstrated to possess enzymatic activity against E. coli preproteins in vivo. Experiments whereby the spsB gene was transferred to a plasmid that is temperature sensitive for replication indicate that spsB is an essential gene. We identified an open reading frame immediately upstream of the spsB gene which encodes a type I signal peptidase homolog of 174 amino acid residues with a calculated molecular mass of 20,146 Da that is predicted to be devoid of catalytic activity.

14.
The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) is a free, one-stop web service for protein bioinformatic analysis. It currently offers 34 interconnected external and in-house tools, whose functionality covers sequence similarity searching, alignment construction, detection of sequence features, structure prediction, and sequence classification. This breadth has made the Toolkit an important resource for experimental biology and for teaching bioinformatic inquiry. Recently, we replaced the first version of the Toolkit, which was released in 2005 and had served around 2.5 million queries, with an entirely new version, focusing on improved features for the comprehensive analysis of proteins, as well as on promoting teaching. For instance, our popular remote homology detection server, HHpred, now allows pairwise comparison of two sequences or alignments and offers additional profile HMMs for several model organisms and domain databases. Here, we introduce the new version of our Toolkit and its application to the analysis of proteins.

15.
AAindex: amino acid index database   (cited 12 times: 0 self-citations, 12 by others)
AAindex is a database of amino acid indices and amino acid mutation matrices. An amino acid index is a set of 20 numerical values representing various physico-chemical and biochemical properties of amino acids. An amino acid mutation matrix is generally a set of 20 × 20 numerical values representing similarity of amino acids. AAindex consists of two sections: AAindex1 for the collection of published amino acid indices and AAindex2 for the collection of published amino acid mutation matrices. Each entry of either AAindex1 or AAindex2 consists of the definition, the reference information, a list of related entries in terms of the correlation coefficient and the actual data. The database may be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.ad.jp/aaindex/) or may be downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/db/genomenet/aaindex/).

16.
The Escherichia coli RrmJ gene product has recently been shown to be the 23S rRNA:U2552 specific 2'-O-ribose methyltransferase (MTase) (RrmJ). Its structure has been solved and refined to 1.5 Å resolution, demonstrating conservation of the three-dimensional fold and key catalytic side chains with the vaccinia virus VP39 protein, which functions as an mRNA 5'm(7)G-cap-N-specific 2'-O-ribose MTase. Using the amino acid sequence of RrmJ as an initial probe in an iterative search of sequence databases, we identified a homologous domain in the sequence of the L protein of non-segmented, negative-sense, single-stranded RNA viruses. The plausibility of the prediction was confirmed by homology modeling and checking whether important residues at substrate/ligand-binding sites were conserved. The predicted structural compatibility and the conservation of the active site between the novel putative MTase domain and genuine 2'-O-ribose MTases, together with the available results of biochemical studies, strongly suggest that this domain is a 5'm(7)G-cap-N-specific 2'-O-ribose MTase (i.e. the cap 1 MTase). Evolutionary relationships between these proteins are also discussed.

17.
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could also improve the results of domain meta-predictors.
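
The general idea described in the abstract above (mapping a sequence through an amino acid index, smoothing the result, and flagging positions by Z-score) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Armadillo implementation: the function names, the window size, the threshold and the toy index in the usage note are all hypothetical, and the toy index is not the published DLI.

```python
from statistics import mean, pstdev


def smoothed_profile(sequence, index, window=5, default=0.0):
    """Map residues to index values, then apply a centered moving average.

    `index` is any dict from one-letter residue codes to numbers
    (hypothetical values; residues missing from it get `default`).
    """
    raw = [index.get(residue, default) for residue in sequence]
    half = window // 2
    profile = []
    for i in range(len(raw)):
        # The window is truncated at the sequence ends.
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))
    return profile


def z_scores(values):
    """Standardize values; a flat or empty profile yields no signal."""
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def candidate_linker_positions(sequence, index, window=5, threshold=1.0):
    """Return 0-based positions whose smoothed Z-score reaches the threshold."""
    scores = z_scores(smoothed_profile(sequence, index, window))
    return [i for i, score in enumerate(scores) if score >= threshold]
```

As a usage example with an invented index that only encodes the abstract's qualitative statement that Pro and Gly favor linkers, `candidate_linker_positions("AAAAAAAAAAPGPGPAAAAAAAAAA", {"P": 1.0, "G": 1.0})` flags positions 10 through 14, the Pro/Gly-rich stretch. A real application would substitute a published index and tune the window and threshold.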

18.
Analysis of catalytic residues in enzyme active sites   (cited 13 times: 0 self-citations, 13 by others)
We present an analysis of the residues directly involved in catalysis in 178 enzyme active sites. Specific criteria were derived to define a catalytic residue, and used to create a catalytic residue dataset, which was then analysed in terms of properties including secondary structure, solvent accessibility, flexibility, conservation, quaternary structure and function. The results indicate the dominance of a small set of amino acid residues in catalysis and give a picture of a general active site environment. It is hoped that this information will provide a better understanding of the molecular mechanisms involved in catalysis and a heuristic basis for predicting catalytic residues in enzymes of unknown function.

19.
20.
Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. In order to incorporate amino acid similarity into conservation measures, one approach is to group amino acids into disjoint sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define the relative entropy of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 perform better than many existing conservation measures in identifying catalytic residues, in particular the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, two new conservation measures are obtained by combining them with the classical JSD. Experimental results suggest that these combination measures perform very well in identifying catalytic residues.
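
For readers unfamiliar with the baseline measures named in the abstract above, the following Python sketch computes relative entropy and Jensen–Shannon divergence per alignment column. It illustrates only the classical RE and JSD scores, not RVD or RVD2, whose definitions are not given in the abstract. The function names, the pseudocount and the uniform background distribution are assumptions made for illustration; published methods typically use an empirical background and sequence weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real methods usually use empirical frequencies.
UNIFORM_BACKGROUND = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)


def column_distribution(column, pseudocount=1e-6):
    """Residue frequencies for one alignment column (gaps and unknowns ignored)."""
    counts = Counter(residue for residue in column if residue in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence between distributions p and q (0 to 1 bit)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)


def conservation_scores(alignment, background=UNIFORM_BACKGROUND):
    """JSD of each column against the background; higher means more conserved."""
    return [
        jensen_shannon_divergence(column_distribution(column), background)
        for column in zip(*alignment)
    ]
```

For the toy alignment `["ACD", "ACE", "AGF"]`, the first column (all Ala) scores highest, the second (Cys, Cys, Gly) lower, and the third (three different residues) lowest, which matches the intuition that invariant columns are the strongest candidates for functional residues.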
