Similar articles
 20 similar articles found (search time: 31 ms)
1.
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
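As a rough illustration of the "local amino acid sequence profile" idea, the sketch below extracts a fixed-length window around every peptide bond of a substrate, which is the usual first step before any per-site features are computed. The function name `cleavage_windows`, the P4-P4' window size and the `X` padding character are assumptions made for this example; they are not taken from PROSPER itself.

```python
def cleavage_windows(seq, left=4, right=4, pad="X"):
    """Yield (bond_index, window) for every peptide bond in seq.

    bond_index i denotes the bond between seq[i-1] (P1) and seq[i] (P1').
    The window holds `left` residues before the bond and `right` after it,
    padded with `pad` where the window runs past either terminus.
    """
    padded = pad * left + seq + pad * right
    for i in range(1, len(seq)):
        # seq[i-1] sits at padded index i-1+left, so the window starts at i.
        yield i, padded[i:i + left + right]


# Toy sequence, used only to show the shape of the output.
windows = dict(cleavage_windows("MKTAYIAK"))
```

Each window could then be turned into numeric features (for example residue identity per position) and passed to a classifier; that step is left out here.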

2.
The O(R) regions from several lambdoid bacteriophages contain the three regulatory sites O(R)1, O(R)2 and O(R)3, to which the Cro and CI proteins can bind. These sites show imperfect dyad symmetry, have similar sequences, and generally lie on the same face of the DNA double helix. We have developed a computational method that analyzes the O(R) regions of additional phages and predicts the location of these three sites. After tuning the method to predict known O(R) sites accurately, we used it to predict unknown sites, and ultimately compiled a database of 32 known and predicted O(R) binding site sets. We then identified sequences of the recognition helices (RH) for the cognate Cro proteins through manual inspection of multiple sequence alignments. Comparison of Cro RH and consensus O(R) half-site sequences revealed strong one-to-one correlations between two amino acids at each of three RH positions and two bases at each of three half-site positions (H1-->2, H3-->5 and H6-->6). In each of these three cases, one of the two amino acid/base-pairings corresponds to a contact observed in the crystal structure of a lambda Cro/consensus operator complex. The alternate amino acid/base combinations were rationalized using structural models. We suggest that the pairs of amino acid residues act as binary switches that efficiently modulate specificity for different consensus half-site variants during evolution. The observation of structurally reasonable amino acid-to-base correlations suggests that Cro proteins share some common rules of recognition despite their functional and structural diversity.
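The notion of "imperfect dyad symmetry" can be made concrete with a small score: the fraction of positions at which a candidate site matches its own reverse complement. This is only a minimal sketch of one plausible scoring term; the abstract does not describe the authors' actual method, and the function names and toy sequences below are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site):
    """Return the reverse complement of an upper-case DNA string."""
    return site.translate(COMPLEMENT)[::-1]


def dyad_symmetry(site):
    """Fraction of positions where the site equals its reverse complement.

    1.0 means a perfect palindrome; lower values mean imperfect symmetry.
    For odd-length sites the central base can never match, so the maximum
    is (n - 1) / n.
    """
    rc = reverse_complement(site)
    matches = sum(a == b for a, b in zip(site, rc))
    return matches / len(site)
```

Sliding such a score along an operator region and keeping high-scoring windows of the expected length would give candidate sites, which could then be filtered by sequence similarity to known sites.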

3.
Structure-based prediction of DNA target sites by regulatory proteins   Total citations: 15, self-citations: 0, citations by others: 15
Kono H, Sarai A. Proteins, 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues to the mechanism of target recognition by regulatory proteins. Analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, analysis of the spatial distribution of amino acids around bases shows that even those amino acids with a strong base preference, such as Arg with G, are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in sequence recognition, and its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate a possibility that the structural data can be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data.
These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, predicted binding sites in real promoters, and compared the predictions with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
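The combinatorial "threading" step can be pictured as follows: fix the set of amino acid/base contacts seen in a complex, then score every candidate DNA sequence by summing a pairwise potential over those contacts and rank the sequences by energy. The sketch below is a deliberately simplified version. The potential values, the contact list and all names are hypothetical, and the lookup is a flat table rather than the distance-dependent potentials the abstract derives from spatial distributions.

```python
from itertools import product

# Hypothetical pairwise potentials (arbitrary units, lower = more favourable).
POTENTIAL = {
    ("ARG", "G"): -1.5, ("ARG", "A"): -0.2, ("ARG", "C"): 0.3, ("ARG", "T"): 0.1,
    ("ASN", "A"): -1.0, ("ASN", "G"): 0.0, ("ASN", "C"): 0.2, ("ASN", "T"): -0.1,
}

# Hypothetical contacts: (DNA position, contacting amino acid).
CONTACTS = [(0, "ARG"), (1, "ASN"), (2, "ARG")]


def thread_energy(dna):
    """Sum the pairwise potential over all contacts for one DNA sequence."""
    return sum(POTENTIAL[(aa, dna[pos])] for pos, aa in CONTACTS)


def rank_sites(length=3):
    """Enumerate all DNA sequences of the given length, best energy first."""
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    return sorted(candidates, key=thread_energy)


best = rank_sites()[0]
```

Exhaustive enumeration grows as 4^n, so for realistic site lengths the energies would more likely be converted to a per-position scoring matrix when the contacts are independent, as they are in this toy setup.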

4.
Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty in experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, support vector machine, to train a predictive model with not only the amino acid composition, but also the composition of k-spaced amino acid pairs. After the selection of the best model using a ten-fold cross-validation approach, the selected model significantly outperformed existing tools based on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
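The "composition of k-spaced amino acid pairs" feature counts how often two residues occur with exactly k residues between them inside a sequence fragment, and normalizes by the number of such pairs. A minimal sketch is given below; the function name and the toy fragment are assumptions for illustration and do not come from SuccSite.

```python
from collections import Counter


def k_spaced_pairs(fragment, k):
    """Relative frequency of residue pairs separated by k intervening residues.

    k = 0 gives adjacent dipeptides, k = 1 gives pairs of the form A.B, etc.
    Returns an empty dict if the fragment is too short to contain any pair.
    """
    counts = Counter(
        fragment[i] + fragment[i + k + 1]
        for i in range(len(fragment) - k - 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}
```

In practice the frequencies for several values of k would be concatenated into one fixed-length vector (400 entries per k for the 20 standard amino acids) before being passed to a support vector machine.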

5.
Characterization of in vitro substrates of protein kinases by peptide library screening provides a wealth of information on the substrate specificity of kinases for amino acids at particular positions relative to the site of phosphorylation, but provides no information concerning interdependence among positions. High-throughput techniques have recently made it feasible to identify large numbers of in vivo kinase substrates. We used data from experiments on the kinases ATM/ATR and CDK1, and curated CK2 substrates to evaluate the prevalence of interactions between substrate positions within a motif and the utility of these interactions in predicting kinase substrates. Among these data, evidence of interpositional sequence dependencies is strikingly rare, and what dependency exists does little to aid in the prediction of novel kinase substrates. Significant increases in the ability of models to predict kinase-substrate specificity beyond position-independent models must come largely from inclusion of elements of biological and cellular context, rather than further analysis of substrate sequences alone. Our results suggest that, evolutionarily, kinase substrate fitness exists in a smooth energetic landscape. Taken with results from others indicating that phosphopeptide-binding domains do exhibit interpositional dependence, our data suggest that incorporation of new substrate molecules into phospho-signalling networks may be rate-limited by the evolution of suitability for binding by phosphopeptide-binding domains.

6.
7.
A sequence-coupled (Markov chain) model is proposed to predict the cleavage sites in proteins by proteases with extended specificity subsites. In addition to the probability of an amino acid occurring at each of these subsites, as observed from a training set of oligopeptides known to be cleavable by HIV protease, the conditional probabilities reflecting the neighbor-coupled effect along the subsite sequence are also taken into account. These conditional probabilities are derived from an expanded training set consisting of sufficiently large peptide sequences generated by the Monte Carlo sampling process. Very high accuracy was obtained in predicting protein cleavage sites by both HIV-1 and HIV-2 proteases. The new method provides a rapid and accurate means for analyzing the specificity of HIV protease, and hence can be used to help find effective inhibitors of HIV protease as potential drugs against AIDS. The principle of this method can also be used to study the specificity of any multisubsite enzyme.
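The sequence-coupled idea can be sketched as a position-specific first-order Markov chain: the probability of a peptide is the probability of its first residue times, for each later subsite, the probability of that residue given its left neighbour. The code below is a minimal version under stated assumptions: add-one smoothing stands in for the Monte Carlo expansion of the training set mentioned in the abstract, the training peptides are toy strings rather than real HIV protease substrates, and all names are made up for this example.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def train(peptides):
    """Count first residues and position-specific neighbour pairs."""
    length = len(peptides[0])
    first = Counter(p[0] for p in peptides)
    pairs = [Counter((p[i], p[i + 1]) for p in peptides) for i in range(length - 1)]
    prev = [Counter(p[i] for p in peptides) for i in range(length - 1)]
    return first, pairs, prev, len(peptides)


def log_prob(peptide, model, pseudo=1.0):
    """Log-probability of a peptide under the smoothed Markov chain."""
    first, pairs, prev, total = model
    k = len(ALPHABET)
    score = math.log((first[peptide[0]] + pseudo) / (total + pseudo * k))
    for i in range(len(peptide) - 1):
        a, b = peptide[i], peptide[i + 1]
        # Conditional probability of b at subsite i+1 given a at subsite i.
        score += math.log((pairs[i][(a, b)] + pseudo) / (prev[i][a] + pseudo * k))
    return score


model = train(["AAAA", "AAAC"])  # toy peptides, not real substrates
```

A cleavage predictor along these lines would typically train a second chain on non-cleavable peptides and classify by the difference of the two log-probabilities.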

8.
Protein tyrosine sulfation is an important post-translational modification of proteins that go through the secretory pathway. No clear-cut acceptor motif can be defined that allows the prediction of tyrosine sulfation sites in polypeptide chains. The Sulfinator is a software tool that can be used to predict tyrosine sulfation sites in protein sequences with an overall accuracy of 98%. Four different Hidden Markov Models were constructed, each of them specialized to recognize sulfated tyrosine residues depending on their location within the sequence: near the N-terminus, near the C-terminus, in the center of a window with a size of at least 25 amino acids, as well as in windows containing several tyrosine residues. AVAILABILITY: The Sulfinator is accessible at (http://www.expasy.org/tools/sulfinator/). Supplementary information: Sulfinator documentation is accessible at (http://www.expasy.org/tools/sulfinator/sulfinator-doc.html).

9.
The amino acid sequences of five monoclonal antibodies (designated mAbs A-E) which bind to the dopaminergic D-2 antagonist, haloperidol, with a variety of affinities (Kd = 4-810 nM), have been used to build theoretical, three-dimensional, computer models of the variable region combining sites. Physicochemical interactions which have been previously determined from in vitro binding data have been used to orient the drug molecule within the combining site model. The results indicate that hydrophobic, aromatic, and ionic amino acids are involved in specific interactions with the antagonist molecule. For example, fluorescence quenching data suggest that a tryptophan residue is intimately involved in the binding of haloperidol by mAb A. Examination of the modeled structure reveals five tryptophans within the variable fragment, only one of which (H-50) is within the classical beta-barrel binding pocket and is readily accessible to the antigen. Haloperidol's relatively electron-poor fluorophenyl ring system stacks with the electron-rich tryptophan ring system at a distance of 3.3 Å and, in so doing, places haloperidol's positively charged piperidinyl nitrogen atom within hydrogen bond distance of the negatively charged Glu-95 and Asp-100A residues of the H3 loop (Glu-H-95 and Asp-H-100A). This type of analysis for each antibody provides an interesting profile of changes in amino acid composition and hypervariable loop length which markedly affect binding affinity and specificity for a series of proteins which have similar combining sites.

10.
Sequence analysis of the group of proteins known to be associated with hereditary diseases allows the detection of key distinctive features shared within this group. The disease proteins are characterized by greater length of their amino acid sequence, a broader phylogenetic extent, and specific conservation and paralogy profiles compared with all human proteins. This unique property pattern provides insights into the global nature of hereditary diseases and moreover can be used to predict novel disease genes. We have developed a computational method that allows the detection of genes likely to be involved in hereditary disease in the human genome. The probability score assignments for the human genome are accessible at http://maine.ebi.ac.uk:8000/services/dgp.

11.
Currently there exist several computational methods for predicting the functional sites in a set of homologous proteins based on their sequences. Due to difficulties in defining the functional site in a protein, it is not trivial to compare the performance of these methods, evaluate their limitations and quantify improvements by new approaches. Here, we use extensive mutation data from two proteins, Lac repressor and subtilisin, to perform such an analysis. Along with the evaluation of existing approaches, we describe a site class model of evolution as a tool to predict functional sites in proteins. The results indicate that this model, which simulates the evolution process at the amino acid level using site-specific substitution matrices, provides the most accurate information on functional sites in a given protein family. Secondly, we present an application of this model to neurotransmitter transporters, a superfamily of proteins of which we have limited experimental knowledge. Based on this application we present testable hypotheses regarding the mechanism of action of these proteins.

12.
Identifying local conformational changes induced by subtle differences in amino acid sequences is critical in exploring the functional variations of proteins. In this study, we designed a computational scheme to predict the dihedral angle variations for different amino acid sequences by using a conditional random field. This computational tool achieved accuracies of 87% and 84% in 10-fold cross-validation on a large data set for φ and ψ, respectively. The prediction accuracies of φ and ψ are positively correlated with each other for most of the 20 types of amino acids. Helical amino acids achieve higher prediction accuracy in general, while amino acids in beta sheets have higher accuracy in specific angular regions. The prediction accuracy of φ is negatively correlated with amino acid flexibility as represented by the Vihinen index. The prediction accuracy of φ can also be negatively correlated with the dispersion of the angle distribution.

13.
D'Amico S, Gerday C, Feller G. Gene, 2000, 253(1): 95-105
The alpha-amylase sequences contained in databanks were screened for the presence of amino acid residues Arg195, Asn298 and Arg/Lys337 forming the chloride-binding site of several specialized alpha-amylases allosterically activated by this anion. This search yielded 38 alpha-amylases potentially binding a chloride ion. All belong to animals, including mammals, birds, insects, acari, nematodes, molluscs and crustaceans, and they are also found in three extremophilic Gram-negative bacteria. An evolutionary distance tree based on complete amino acid sequences was constructed, revealing four distinct clusters of species. On the basis of multiple sequence alignment and homology modeling, invariable structural elements were defined, corresponding to the active site, the substrate binding site, the accessory binding sites, the Ca(2+) and Cl(-) binding sites, a protease-like catalytic triad and disulfide bonds. The sequence variations within functional elements allowed engineering strategies to be proposed, aimed at identifying and modifying the specificity, activity and stability of chloride-dependent alpha-amylases.
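The screening criterion itself is simple enough to state as code. The sketch below assumes that each candidate sequence has already been aligned to a reference alpha-amylase, so that residues can be looked up by reference numbering; the function name and the dictionary layout are assumptions for this example, and the alignment step is not shown.

```python
def may_bind_chloride(residue_at):
    """Check the three chloride-site positions named in the abstract.

    `residue_at` maps reference-numbered positions to one-letter residues,
    as obtained from an alignment to the reference sequence.
    """
    return (
        residue_at.get(195) == "R"              # Arg195
        and residue_at.get(298) == "N"          # Asn298
        and residue_at.get(337) in ("R", "K")   # Arg/Lys337
    )
```

Positions missing from the mapping (for example because of an alignment gap) make the check fail, which errs on the side of not reporting a chloride site.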

14.
Several algorithms have been developed that use amino acid sequences to predict whether or not a protein or a region of a protein is disordered. These algorithms make accurate predictions for disordered regions that are 30 amino acids or longer, but it is unclear whether the predictions can be directly related to the backbone dynamics of individual amino acid residues. The nuclear Overhauser effect between the amide nitrogen and hydrogen (NHNOE) provides an unambiguous measure of backbone dynamics at single residue resolution and is an excellent tool for characterizing the dynamic behavior of disordered proteins. In this report, we show that the NHNOE values for several members of a family of disordered proteins are highly correlated with the output from three popular algorithms used to predict disordered regions from amino acid sequence. This is the first test between an experimental measure of residue specific backbone dynamics and disorder predictions. The results suggest that some disorder predictors can accurately estimate the backbone dynamics of individual amino acids in a long disordered region.

15.
Computational design of protein-ligand interfaces finds optimal amino acid sequences within a small-molecule binding site of a protein for tight binding of a specific small molecule. It requires a search algorithm that can rapidly sample the vast sequence and conformational space, and a scoring function that can identify low energy designs. This review focuses on recent advances in computational design methods and their application to protein-small molecule binding sites. Strategies for increasing affinity, altering specificity, creating broad-spectrum binding, and building novel enzymes from scratch are described. Future prospects for applications in drug development are discussed, including limitations that will need to be overcome to achieve computational design of protein therapeutics with novel modes of action.

16.
Haloalkane dehalogenases catalyse environmentally important dehalogenation reactions. These microbial enzymes represent objects of interest for protein engineering studies, attempting to improve their catalytic efficiency or broaden their substrate specificity towards environmental pollutants. This paper presents the results of a comparative study of haloalkane dehalogenases originating from different organisms. Protein sequences and the models of tertiary structures of haloalkane dehalogenases were compared to investigate the protein fold, reaction mechanism and substrate specificity of these enzymes. Haloalkane dehalogenases contain the structural motifs of alpha/beta-hydrolases and epoxidases within their sequences. They contain a catalytic triad with two different topological arrangements. The presence of a structurally conserved oxyanion hole suggests the two-step reaction mechanism previously described for haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. The differences in substrate specificity of haloalkane dehalogenases originating from different species might be related to the size and geometry of an active site and its entrance and the efficiency of the transition state and halide ion stabilization by active site residues. Structurally conserved motifs identified within the sequences can be used for the design of specific primers for the experimental screening of haloalkane dehalogenases. Those amino acids which were predicted to be functionally important represent possible targets for future site-directed mutagenesis experiments.

17.
We have studied the relationship between amino acid sequence and substrate specificity in a DNA glycosylase family by characterizing experimentally the specificity of four new members of the family. We show that principal component analysis (PCA) of the sequence family correctly predicts the substrate specificity of one of the novel homologs even though conventional sequence analysis methods fail to group this homolog with other sequences of the same specificity. PCA also suggested, correctly, that another homolog characterized previously differs in its specificity from those sequences with which it clusters by conventional criteria. These results suggest that principal component analysis of sequence families can be a useful tool in annotating genome sequences when there is ambiguity concerning which subfamily a new homolog belongs to. Published 2000 Wiley-Liss, Inc.

18.
We present a new support vector machine (SVM)-based approach to predict the substrate specificity of subtypes of a given protein sequence family. We demonstrate the usefulness of this method on the example of aryl acid-activating and amino acid-activating adenylation domains (A domains) of nonribosomal peptide synthetases (NRPS). The residues of gramicidin synthetase A that are within 8 Å of the substrate amino acid, and the corresponding positions of other adenylation domain sequences with 397 known and unknown specificities, were extracted and used to encode this physico-chemical fingerprint into normalized real-valued feature vectors based on the physico-chemical properties of the amino acids. The SVM software package SVM(light) was used for training and classification, with transductive SVMs to take advantage of the information inherent in unlabeled data. Specificities for very similar substrates that frequently show cross-specificities were pooled into so-called composite specificities, and predictive models were built for them. The reliability of the models was confirmed in cross-validations and in comparison with a currently used sequence-comparison-based method. When comparing the predictions for 1230 NRPS A domains that are currently detectable in UniProt, the new method was able to give a specificity prediction in an additional 18% of the cases compared with the old method. For 70% of the sequences both methods agreed; for <6% they did not, mainly on low-confidence predictions by the existing method. None of the predictive methods could infer any specificity for 2.4% of the sequences, suggesting completely new types of specificity.
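Encoding binding-pocket residues as "normalized real-valued feature vectors" can be illustrated with a single property. The sketch below uses the Kyte-Doolittle hydropathy scale, rescaled to [0, 1], purely as a stand-in: the abstract does not say which physico-chemical properties were used, so both the choice of scale and the function name `encode` are assumptions.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
LO, HI = min(KD.values()), max(KD.values())


def encode(residues):
    """Map each pocket residue to its hydropathy rescaled to [0, 1]."""
    return [(KD[r] - LO) / (HI - LO) for r in residues]
```

With several properties per residue, the per-property vectors would be concatenated so that every adenylation domain yields a vector of the same length, which is what an SVM expects as input.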

19.
The advent of whole genome sequencing has led to an increasing number of proteins with known amino acid sequences. Despite many efforts, the number of proteins with resolved three-dimensional structures is still low. One of the challenging tasks structural biologists face is predicting the interaction of a metal ion with a protein for which the structure is unknown. Based on the information available in the Protein Data Bank, a site (METALACTIVE INTERACTION) has been generated which displays information on significant high-preferential and low‐preferential combinations of endogenous ligands for 49 metal ions. Users can also obtain information about the residues present in the first and second coordination spheres, which play a major role in maintaining the structure and function of metalloproteins in biological systems. In this paper, a novel computational tool (ZINCCLUSTER) is developed, which can predict the zinc binding sites of proteins even if only the primary sequence is known. The purpose of this tool is to predict the active site cluster of an uncharacterized protein based on its primary sequence or a 3D structure. The tool can predict amino acids interacting with a metal or vice versa. This tool is based on the occurrence of significant triplets, and in tests it showed higher prediction accuracy than other available techniques.

20.
HIV-1 protease is a small homodimeric enzyme that ensures maturation of HIV virions by cleaving the viral precursor Gag and Gag-Pol polyproteins into structural and functional elements. The cleavage sites in the viral polyproteins share neither sequence homology nor a binding motif, and the specificity of the HIV-1 protease is therefore only partially understood. Using an extensive data set collected from 16 years of HIV proteome research, we have here created a general and predictive rule-based model for HIV-1 protease specificity based on rough sets. We demonstrate that HIV-1 protease specificity is much more complex than previously anticipated and cannot be defined based solely on the amino acids at the substrate's scissile bond or on any other single substrate amino acid position. Our results show that the combination of at least three particular amino acids is needed in the substrate for a cleavage event to occur. Only by combining and analyzing massive amounts of HIV proteome data was it possible to discover these novel and general patterns of physico-chemical substrate cleavage determinants. Our study is an example of how computational biology methods can advance the understanding of viral interactomes.
