Similar Documents
20 similar documents found (search time: 296 ms)
1.
Phosphorylation is a crucial way of controlling protein activity in many eukaryotic organisms in vivo. Experimental methods for determining phosphorylation sites in substrates are usually restricted by the in vitro conditions of the enzymes and are very time- and labor-intensive. Although some in silico methods and web servers have been introduced for the automatic detection of phosphorylation sites, more sophisticated methods are still urgently needed to improve prediction performance. Protein primary sequences can help predict phosphorylation sites catalyzed by different protein kinases, and most computational approaches use a short local peptide to make predictions. However, useful information may be lost if conserved residues that are not close to the phosphorylation site are left out of the prediction, which can hamper the results. This paper presents a novel prediction method, IEPP (Information-Entropy based Phosphorylation Prediction), for the automatic detection of potential phosphorylation sites. In prediction, the positions around the phosphorylation sites are selected or excluded according to their entropy values. The algorithm was compared with other methods, such as GSP and PPSP, on the ABL, MAPK and PKA protein kinase (PK) families. Superior prediction accuracy was obtained on several measures, including sensitivity (Sn) and specificity (Sp). Furthermore, compared with several online prediction web servers on newly discovered phosphorylation sites, IEPP also yielded the best performance. IEPP is another useful computational resource for identifying PK-specific phosphorylation sites, with the advantages of simplicity, efficiency and convenience.
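The entropy-based position selection described above can be illustrated with a minimal Python sketch. It assumes equal-length peptides aligned on the phosphorylated residue and uses a hypothetical cutoff (`threshold`) to keep low-entropy, that is more conserved, positions; the actual IEPP scoring and selection rules may differ, and the function names are illustrative only.

```python
import math
from collections import Counter


def position_entropies(peptides):
    """Shannon entropy (bits) of each column in a set of equal-length peptides."""
    entropies = []
    for i in range(len(peptides[0])):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def select_positions(peptides, threshold):
    """Keep positions whose entropy is at or below `threshold` (hypothetical cutoff)."""
    return [i for i, h in enumerate(position_entropies(peptides)) if h <= threshold]


peptides = ["RRASL", "RKSSL", "RRGSV", "RRTSL"]  # toy peptides, site at index 3
print(select_positions(peptides, threshold=1.0))  # [0, 1, 3, 4]
```

In this toy set, column 2 (A/S/G/T, 2 bits) is dropped, while the fully or mostly conserved columns are kept regardless of their distance from the site.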

2.
Phosphorylation is one of the most important post-translational modifications, and identifying protein phosphorylation sites is particularly important for research on disease diagnosis. However, experimental detection of phosphorylation sites is labor-intensive, so computational methods that provide an additional reference for phosphorylation sites would be beneficial. Here we developed a novel sequence-based method for predicting serine, threonine and tyrosine phosphorylation sites. The nearest neighbor algorithm was employed as the prediction engine. Peptides of a fixed length of thirteen amino acid residues surrounding the phosphorylation sites were extracted with a sliding window along the protein chains concerned. Each peptide was encoded as a vector of 6,072 features derived from the Amino Acid Index (AAIndex) database for classification. Incremental Feature Selection, a feature selection procedure based on the Maximum Relevance Minimum Redundancy (mRMR) method, was used to select a compact feature set to further improve classification performance. Three predictors were established for the three types of phosphorylation sites, achieving overall accuracies of 66.64%, 66.11% and 66.69%, respectively. These rates were obtained by rigorous jackknife cross-validation tests.
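A 13-residue window corresponds to six residues on each side of the candidate S, T or Y. The sketch below extracts such windows and classifies a query with a 1-nearest-neighbor rule. As a simplifying assumption, it compares raw peptides by Hamming distance instead of the 6,072 AAIndex-derived features and mRMR selection used in the study; the padding symbol and all function names are hypothetical.

```python
def extract_windows(sequence, residues="STY", half_width=6, pad="X"):
    """Return (position, peptide) for every S/T/Y, using a 2*half_width+1 window.

    Residues near the termini are padded with a placeholder symbol (an assumption;
    the study does not say how terminal windows were handled).
    """
    padded = pad * half_width + sequence + pad * half_width
    return [(i, padded[i:i + 2 * half_width + 1])
            for i, aa in enumerate(sequence) if aa in residues]


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_label(query, training):
    """1-NN: return the label of the training peptide closest to `query`."""
    _, label = min(training, key=lambda item: hamming(query, item[0]))
    return label


print(extract_windows("MSKY"))
# [(1, 'XXXXXMSKYXXXX'), (3, 'XXXMSKYXXXXXX')]
```

Swapping `hamming` for a distance over encoded feature vectors would bring the sketch closer to the feature-based setup described in the abstract.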

3.
4.
EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluating the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week to cope with the large number of existing prediction servers and the constant changes in prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, the sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. Once a week, the predictions are compared with the experimental structures and the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for hundreds to thousands of proteins, depending on the prediction method. This large sample ensures that methods are compared reliably. As a result, EVA provides useful information to both developers and users of prediction methods.

5.
Kinase-mediated phosphorylation is a key post-translational modification that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for identifying phosphorylation sites are labour-intensive and expensive. In addition, the manifold increase in protein sequences in the databanks over the years calls for fast and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods for predicting phosphorylation sites have been proposed, but there is still much scope for improvement. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. We used Alergia, a popular GI algorithm, to infer a Deterministic Stochastic Finite State Automaton (DSFA), which is equivalent to the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets that we generated reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner and performs significantly better than other existing phosphorylation site prediction methods. We also compared our inferred DSFA with those produced by two other GI algorithms. The DSFA generated by our method performs best, which indicates that our method is robust and has the potential to predict phosphorylation sites in a kinase-specific manner.
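Alergia builds a prefix tree from the training strings and merges states whose outgoing and stopping frequencies are statistically compatible according to a Hoeffding bound. The Python sketch below shows only that compatibility test plus the probability that a finished DSFA assigns to a peptide; the full merge loop is omitted, and the state numbering, transition table and `alpha` value are hypothetical.

```python
import math


def hoeffding_different(f1, n1, f2, n2, alpha=0.05):
    """Alergia-style test: True if f1/n1 and f2/n2 differ by more than the Hoeffding bound."""
    if n1 == 0 or n2 == 0:
        return False  # no evidence either way; treat as compatible
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound


def string_probability(transitions, final_prob, start, sequence):
    """Probability of `sequence` under a deterministic stochastic finite-state automaton.

    transitions: {(state, symbol): (next_state, probability)}
    final_prob:  {state: probability of stopping in that state}
    """
    state, p = start, 1.0
    for symbol in sequence:
        if (state, symbol) not in transitions:
            return 0.0  # undefined transition: the automaton rejects the string
        state, step = transitions[(state, symbol)]
        p *= step
    return p * final_prob.get(state, 0.0)


# Toy automaton (hypothetical): read S with prob. 0.5, then P with prob. 0.8, then stop.
toy = {(0, "S"): (1, 0.5), (1, "P"): (2, 0.8)}
print(string_probability(toy, {2: 1.0}, 0, "SP"))  # 0.4
print(hoeffding_different(90, 100, 10, 100))       # True
```

A kinase-specific classifier along these lines would infer one automaton from known sites and score candidate peptides by their probability under it.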

6.
In the last decade, directed evolution has become a routine approach for engineering proteins with novel or altered properties. Concurrently, a trend away from purely 'blind' randomization strategies and towards more 'semi-rational' approaches has also become apparent. In this review, we discuss ways in which structural information and predictive computational tools are playing an increasingly important role in guiding the design of randomized libraries: web servers such as ConSurf-HSSP and SCHEMA allow the prediction of sites to target for producing functional variants, while algorithms such as GLUE, PEDEL and DRIVeR are useful for estimating library completeness and diversity. In addition, we review recent methodological developments that facilitate the construction of unbiased libraries, which are inherently more diverse than biased libraries and therefore more likely to yield improved variants.
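Library completeness can be illustrated for the simplest case. If a library contains V equiprobable variants and L clones are sampled, the expected number of distinct variants is V(1 - (1 - 1/V)^L), which approaches V(1 - e^(-L/V)) for large V. The sketch below assumes uniform variant frequencies; tools such as GLUE and PEDEL also handle more realistic, non-uniform distributions, so this is not their implementation, and the function names are illustrative.

```python
import math


def expected_distinct(variants, clones):
    """Expected number of distinct variants among `clones` draws from `variants`
    equally likely variants (sampling with replacement)."""
    return variants * (1.0 - (1.0 - 1.0 / variants) ** clones)


def completeness(variants, clones):
    """Expected fraction of the library represented at least once."""
    return expected_distinct(variants, clones) / variants


print(round(expected_distinct(100, 100), 1))  # 63.4
print(round(1 - math.exp(-1), 3))             # 0.632, the large-V approximation
```

Screening as many clones as there are variants therefore covers only about 63% of a uniform library, which is why completeness estimates matter when sizing a screen.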

7.

Background  

Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites.

8.
Protein phosphorylation mediated by a family of enzymes called cyclin-dependent kinases (Cdks) plays a central role in the cell-division cycle of eukaryotes. Phosphorylation by Cdks directs the cell cycle by modifying the function of regulators of key processes such as DNA replication and mitotic progression. Here, we present a novel computational procedure to predict substrates of the cyclin-dependent kinase Cdc28 (Cdk1) in Saccharomyces cerevisiae. Most current computational phosphorylation site prediction procedures focus solely on local sequence characteristics. In the present procedure, we model Cdk substrates based on both local and global characteristics of the substrates: we define the local sequence motifs that represent Cdc28 phosphorylation sites and then model the clustering of these motifs within the protein sequences. This restraint reflects the observation that many known Cdk substrates contain multiple clustered phosphorylation sites. The strategy defines a subset of the proteome that is highly enriched for Cdk substrates, as validated by comparison with a set of bona fide, published, experimentally characterized Cdk substrates that was, to our knowledge, comprehensive at the time of writing. To corroborate our model, we compared its predictions with three independent experimental Cdk proteomic datasets and found significant overlap. Finally, we used mass spectrometry to directly detect in vivo phosphorylation at Cdk motifs in selected putative substrates.
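The minimal Cdk consensus site is S/T-P (the full consensus is S/T-P-x-K/R). A minimal sketch of the clustering idea counts how many S/T-P motifs fall inside any stretch of a given length. The window length, the toy sequence and the function names are hypothetical, and the study's actual motif definitions and scoring are not reproduced here.

```python
import re

# Lookahead so that overlapping hits (e.g. "SPTP") are all reported.
CDK_MINIMAL = re.compile(r"(?=[ST]P)")


def motif_positions(sequence, pattern=CDK_MINIMAL):
    """0-based start positions of every match of `pattern`."""
    return [m.start() for m in pattern.finditer(sequence)]


def max_motifs_in_window(positions, window):
    """Largest number of motif starts inside any stretch of `window` residues.

    `positions` must be sorted (motif_positions returns them in order).
    """
    best, left = 0, 0
    for right in range(len(positions)):
        while positions[right] - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


seq = "MSPKTP" + "A" * 10 + "SPR"  # toy sequence
hits = motif_positions(seq)
print(hits)                                   # [1, 4, 16]
print(max_motifs_in_window(hits, window=10))  # 2
```

Ranking proteins by this maximum, rather than by the total motif count, rewards the clustered arrangement that the abstract describes.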

9.
Binding of short antigenic peptides to major histocompatibility complex (MHC) molecules is a core step in the adaptive immune response. Precise identification of MHC-restricted peptides is of great significance for understanding the mechanism of the immune response and for promoting the discovery of immunogenic epitopes. However, because of extreme MHC polymorphism and the high cost of biochemical experiments, no experimentally measured binding data exist for most MHC molecules. To predict peptides binding to these MHC molecules, computational approaches called pan-specific methods have recently received keen interest. Pan-specific methods use experimentally obtained binding data for multiple alleles, from which binding peptides (binders) can be predicted not only for these alleles but also for alleles with no known binders. To investigate the possibility of further improving the performance and usability of pan-specific methods, this article extensively reviews existing pan-specific methods and their web servers. We first present a general framework of pan-specific methods. We then compare their strategies and performance as well as the utilities of their web servers. Finally, we discuss future directions for improving pan-specific methods for MHC-peptide binding prediction.

10.
Phosphorylation is catalyzed by protein kinases and is essential for regulating biological processes. Identifying phosphorylation sites together with their corresponding kinases contributes to the understanding of molecular mechanisms. Mass spectrometry analysis of phosphoproteomes generates a large number of phosphorylated sites. However, experimental methods are costly and time-consuming, and most experimentally determined phosphorylation sites lack kinase information. Computational methods are therefore urgently needed to address the kinase identification problem. To this end, we propose a new kernel-based machine learning method called Supervised Laplacian Regularized Least Squares (SLapRLS), which constructs kernels in a new way based on the similarity matrix and minimizes both the structural risk and the overall inconsistency between labels and similarities. Results on both Phospho.ELM and an additional independent test dataset indicate that SLapRLS identifies kinases more effectively than other existing algorithms.
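SLapRLS itself combines a kernel built from the similarity matrix with a supervised loss. As a much simpler illustration of the underlying idea, penalizing disagreement between labels and similarities, the sketch below solves the transductive problem of minimizing ||f - y||^2 + mu * f^T L f, where L = D - W is the graph Laplacian of a similarity matrix W. Setting the gradient to zero gives (I + mu L) f = y, which is solved here by Jacobi iteration. This is not the SLapRLS algorithm; `mu`, the toy matrix and the function name are assumptions.

```python
def laplacian_smooth(W, y, mu=1.0, iterations=200):
    """Approximately solve (I + mu * L) f = y with L = D - W, by Jacobi iteration.

    W must be a symmetric, non-negative similarity matrix; I + mu * L is then
    strictly diagonally dominant, so the iteration converges.
    """
    n = len(y)
    degree = [sum(row) for row in W]
    f = list(y)
    for _ in range(iterations):
        f = [
            (y[i] + mu * sum(W[i][j] * f[j] for j in range(n) if j != i))
            / (1.0 + mu * (degree[i] - W[i][i]))
            for i in range(n)
        ]
    return f


# Toy example: items 0 and 1 are similar, item 2 is isolated; only item 0 is labeled.
W = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
scores = laplacian_smooth(W, [1.0, 0.0, 0.0])
print([round(s, 3) for s in scores])  # [0.667, 0.333, 0.0]
```

The unlabeled item that is similar to a labeled one inherits part of its score, while the isolated item does not; a larger `mu` strengthens this smoothing.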

11.
Huang JH, Cao DS, Yan J, Xu QS, Hu QN, Liang YZ. Biochimie, 2012, 94(8): 1697-1704.
As the most frequent drug targets, G protein-coupled receptors (GPCRs) are a large family of seven-transmembrane receptors that sense molecules outside the cell and activate intracellular signal transduction pathways. The activity and lifetime of activated receptors are regulated by receptor phosphorylation. Therefore, determining the exact positions of phosphorylation sites in GPCR sequences could provide useful clues for drug design and other biotechnology applications. Experimental identification of phosphorylation sites is expensive and laborious, so there is significant interest in developing computational methods for the reliable prediction of phosphorylation sites from amino acid sequences. In this article, we present a simple and effective method for recognizing phosphorylation sites in human GPCRs by combining amino acid hydrophobicity with a support vector machine. The prediction accuracy, sensitivity, specificity, Matthews correlation coefficient and area under the curve were 0.964, 0.790, 0.999, 0.866 and 0.941 for phosphoserine; 0.954, 0.800, 0.985, 0.828 and 0.958 for phosphothreonine; and 0.976, 0.820, 0.993, 0.861 and 0.959 for phosphotyrosine. Such a fast and accurate prediction method should accelerate the identification of GPCR phosphorylation sites and thereby facilitate drug discovery.
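The abstract does not say which hydrophobicity scale was used. Assuming the Kyte-Doolittle hydropathy scale, a peptide window can be turned into a numeric feature vector with one value per residue, as sketched below, before being passed to an SVM. The value used for non-standard or padding symbols and the function name are hypothetical.

```python
# Kyte-Doolittle hydropathy index (assumed scale; the study may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(peptide, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Map each residue of a peptide window to its hydrophobicity value."""
    return [scale.get(aa, unknown) for aa in peptide]


print(encode_window("RSPX"))  # [-4.5, -0.8, -1.6, 0.0]
```

The resulting fixed-length vectors can be used as input to any SVM implementation.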

12.
Zhou F, Xue Y, Yao X, Xu Y. Nature Protocols, 2006, 1(3): 1318-1321.
Post-translational modifications (PTMs) of proteins play essential roles in governing protein function and dynamics and are implicated in many cellular processes. Several types of PTMs have been investigated computationally, including phosphorylation, sumoylation, palmitoylation, and lysine and arginine methylation. Because the large diversity in the user interfaces (UIs) of different PTM prediction servers could hinder experimental biologists from using them, we propose a protocol for a unified UI for PTM prediction servers, based on our own work and that of other groups on PTM site prediction. By following this protocol, tool developers can provide a uniform UI regardless of the PTM type and the underlying computational algorithm. With such a uniform UI, experimental biologists who have learned to use one compliant PTM prediction server would be able to use any other. A typical PTM prediction server compliant with this unified UI takes several minutes to calculate the prediction results for a protein 1,000 amino acids in length.

13.
Li T, Du P, Xu N. PLoS ONE, 2010, 5(11): e15411.
Phosphorylation is an important type of protein post-translational modification. Identifying the possible phosphorylation sites of a protein is important for understanding its functions. Unbiased screening for phosphorylation sites by in vitro or in vivo experiments is time-consuming and expensive; in silico prediction can provide functional candidates and help narrow down the experimental effort. Most existing prediction algorithms consider only the polypeptide sequence around the phosphorylation sites. However, protein phosphorylation in vivo is a very complex biological process, and the polypeptide sequences around the potential sites are not sufficient to determine the phosphorylation status of those residues. In the current work, we integrated various data sources, such as protein functional domains, protein subcellular location and protein-protein interactions, along with the polypeptide sequences to predict protein phosphorylation sites. The heterogeneous information significantly boosted prediction accuracy for some kinase families. To demonstrate a potential application of our method, we scanned a set of human proteins and predicted putative phosphorylation sites for the Cyclin-dependent kinase, Casein kinase 2, Glycogen synthase kinase 3, Mitogen-activated protein kinase, protein kinase A and protein kinase C families (available at http://cmbi.bjmu.edu.cn/huphospho). The predicted phosphorylation sites can serve as candidates for further experimental validation. Our strategy may also be applicable to the in silico identification of other post-translational modification substrates.

14.
Protein phosphorylation is a ubiquitous post-translational modification that plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods focus mainly on predicting phosphorylation sites, but few consider whether a phosphorylation site is functional. Because functional phosphorylation sites are more valuable for further experimental research, and a proportion of phosphorylation sites have no direct functional effects, predicting functional phosphorylation sites is necessary in this research area. Previous studies have shown that functional phosphorylation sites are more evolutionarily conserved than non-functional ones. We therefore developed a web server that integrates existing phosphorylation site prediction methods with both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites in the human, rat and mouse proteomes and built a database of the predicted sites. Analysis of the overall prediction results showed that protein phosphorylation plays an important role in all the enriched KEGG pathways, and analysis of protein-specific results demonstrated the usefulness of our method for studies of individual proteins. Our method should help characterize the most likely functional phosphorylation sites for further studies in this area.
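One way to combine a site predictor with absolute and relative conservation scores is sketched below: a predicted site is kept only if its conservation score passes an absolute cutoff and its z-score relative to the rest of the protein passes a relative cutoff. The cutoffs, the z-score definition of relative conservation and the function names are assumptions, not the published server's criteria.

```python
import statistics


def relative_conservation(scores, index):
    """z-score of one residue's conservation against all residues of the protein."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return 0.0 if sd == 0 else (scores[index] - mean) / sd


def likely_functional(predicted_sites, conservation, abs_min=0.5, rel_min=0.0):
    """Keep predicted sites that are conserved in both absolute and relative terms."""
    return [
        i for i in predicted_sites
        if conservation[i] >= abs_min
        and relative_conservation(conservation, i) >= rel_min
    ]


conservation = [0.2, 0.9, 0.4, 0.8, 0.1]            # toy per-residue scores
print(likely_functional([1, 2, 3], conservation))  # [1, 3]
```

The relative criterion guards against proteins that are uniformly conserved, where an absolute cutoff alone would keep every predicted site.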

15.
Phosphorylation is one of the most common forms of protein modification. The most frequent targets of protein phosphorylation in eukaryotes are serine and threonine residues, although tyrosine residues are also phosphorylated. Many current methods for detecting and localizing protein phosphorylation sites are based on mass spectrometry and are biased against the analysis of tyrosine-phosphorylated residues because of the stability and low reactivity of phosphotyrosines. To address the lack of sensitive methods for detecting phosphotyrosine-containing peptides, we recently developed a method that is unaffected by the more predominant serine or threonine phosphorylation within cells. It is based on the specific detection of the phosphotyrosine immonium ion at 216.043 Da and does not require prior knowledge of the protein sequence. In this report, we describe the first application of this method in a proteomic strategy. Using anti-phosphotyrosine antibodies for immunoprecipitation followed by one-dimensional gel electrophoresis, we identified 10 proteins in the epidermal growth factor receptor signaling pathway, 8 of which had previously been shown to be involved in epidermal growth factor signaling. Most importantly, in addition to several known tyrosine phosphorylation sites, we identified five novel sites on SHIP-2, Hrs, Cbl, STAM and STAM2, most of which were not predicted to be phosphorylated. Because of its sensitivity and selectivity, this approach will be useful for proteomic studies of tyrosine phosphorylation in a number of signal transduction pathways.
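The diagnostic mass can be checked from elemental composition. The phosphotyrosine immonium ion is C8H11NO4P+ (the tyrosine immonium ion, C8H10NO+, plus HPO3), so its monoisotopic m/z is the sum of the atomic masses minus one electron. The short Python calculation below gives about 216.042, which agrees with the value quoted above to within about 0.001; the function name is illustrative.

```python
# Monoisotopic atomic masses (u) and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "P": 30.97376151}
ELECTRON = 0.00054858


def cation_mz(formula):
    """m/z of a singly charged cation with the given elemental composition."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral - ELECTRON


tyr_immonium = cation_mz({"C": 8, "H": 10, "N": 1, "O": 1})
ptyr_immonium = cation_mz({"C": 8, "H": 11, "N": 1, "O": 4, "P": 1})
print(f"{tyr_immonium:.3f} {ptyr_immonium:.3f}")  # 136.076 216.042
```

The 80-unit shift between the two ions is the added HPO3 group, which is what makes the phosphotyrosine marker distinct from the unmodified tyrosine immonium ion.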

16.
A number of complementary methods have been developed for predicting protein-protein interaction sites. We sought to increase prediction robustness and accuracy by combining results from different predictors, and we report here a meta web server, meta-PPISP, built on three individual web servers: cons-PPISP (http://pipe.scs.fsu.edu/ppisp.html), Promate (http://bioportal.weizmann.ac.il/promate) and PINUP (http://sparks.informatics.iupui.edu/PINUP/). A linear regression method, using the raw scores of the three servers as input, was trained on a set of 35 nonhomologous proteins. Cross-validation showed that meta-PPISP outperforms all three individual servers: at coverages identical to those of the individual methods, its accuracy is higher by 4.8 to 18.2 percentage points. Similar improvements in accuracy are also seen on CAPRI and other targets. AVAILABILITY: meta-PPISP can be accessed at http://pipe.scs.fsu.edu/meta-ppisp.html
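Combining the three servers' raw scores by linear regression can be sketched without dependencies by solving the least-squares normal equations. The column order, the toy data (generated from a known linear rule so the fit can be checked) and the function names are assumptions; the published training set of 35 proteins is not reproduced.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(scores, targets):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in scores]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xty)


def predict(weights, row):
    """Combined score for one residue from its per-server scores."""
    return weights[0] + sum(w * s for w, s in zip(weights[1:], row))


# Toy data: (server1, server2, server3) scores; targets follow 1 + 2a + 3b - c.
server_scores = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
targets = [1, 3, 4, 0, 5]
weights = fit_linear(server_scores, targets)
print([round(w, 6) for w in weights])  # [1.0, 2.0, 3.0, -1.0]
```

The normal equations are fine for three inputs; for larger or ill-conditioned problems a numerical library's least-squares solver would be the safer choice.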

17.
Fold recognition techniques assist the exploration of protein structures, and web-based servers are part of the standard set of tools used in the analysis of biochemical problems. Despite their success, current methods are only able to predict the correct fold in a relatively small number of cases. We propose an approach that improves the selection of correct folds from among the results of two methods implemented as web servers (SAMT99 and 3DPSSM). Our approach is based on the training of a system of neural networks with models generated by the servers and a set of associated characteristics such as the quality of the sequence-structure alignment, distribution of sequence features (sequence-conserved positions and apolar residues), and compactness of the resulting models. Our results show that it is possible to detect adequate folds to model 80% of the sequences with a high level of confidence. The improvements achieved by taking into account sequence characteristics open the door to future improvements by directly including such factors in the step of model generation. This approach has been implemented as an automatic system LIBELLULA, available as a public web server at http://www.pdg.cnb.uam.es/servers/libellula.html.

18.
Lysine acetylation is an essentially reversible and highly regulated post-translational modification that regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive, so there is significant interest in developing computational methods for the reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine (SVM) classifiers for this task. Experimentally determined acetylated lysine sites were extracted from the Swiss-Prot database and the scientific literature. Experimental results show that the SVM ensemble outperforms a single SVM classifier and other computational methods, such as PAIL and LysAcet, in predicting acetylated lysine sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation site prediction available at http://www.aporc.org/EnsemblePail/.
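The abstract does not state how the member SVMs' outputs are combined. A common choice, assumed here, is a simple majority vote; the sketch uses threshold functions as hypothetical stand-ins for trained SVMs.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by most classifiers (use an odd number to avoid ties)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Stand-ins for trained SVMs: each maps a score to 1 (acetylated) or 0.
members = [lambda s: int(s > 0.2), lambda s: int(s > 0.5), lambda s: int(s > 0.8)]
print(majority_vote(members, 0.6))  # 1
print(majority_vote(members, 0.3))  # 0
```

Averaging the members' decision values and thresholding the mean is an equally simple alternative when the SVMs expose continuous scores.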

19.
Computational prediction of nucleic acid binding sites in proteins is necessary for disentangling functional mechanisms in most biological processes and for exploring binding mechanisms. Several strategies have been proposed, but state-of-the-art approaches show great diversity in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithms used for prediction; iv) the performance measures; and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets comprising more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have improved greatly over the years; ii) some approaches suffer from theoretical defects, and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces; and iv) dataset bias may exist in some methods.
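For reference, the binary criteria named above are computed from confusion-matrix counts as sketched below; the Matthews correlation coefficient (MCC) is included as well. The function name and the toy counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall, Sn
    specificity = tn / (tn + fp)  # Sp
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "precision": precision,
            "accuracy": accuracy, "MCC": mcc}


m = binary_metrics(tp=8, fp=5, tn=85, fn=2)
print({k: round(v, 3) for k, v in m.items()})
# {'Sn': 0.8, 'Sp': 0.944, 'precision': 0.615, 'accuracy': 0.93, 'MCC': 0.664}
```

With imbalanced data like this toy example, accuracy (0.93) looks much better than precision or MCC, which is one reason assessments report several criteria side by side.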

20.
The accurate prediction of the biochemical function of a protein is becoming increasingly important, given the unprecedented growth of both structural and sequence databanks. Consequently, computational methods are required to analyse such data in an automated manner to ensure genomes are annotated accurately. Protein structure prediction methods, for example, are capable of generating approximate structural models on a genome-wide scale. However, the detection of functionally important regions in such crude models, as well as structural genomics targets, remains an extremely important problem. The method described in the current study, MetSite, represents a fully automatic approach for the detection of metal-binding residue clusters applicable to protein models of moderate quality. The method involves using sequence profile information in combination with approximate structural data. Several neural network classifiers are shown to be able to distinguish metal sites from non-sites with a mean accuracy of 94.5%. The method was demonstrated to identify metal-binding sites correctly in LiveBench targets where no obvious metal-binding sequence motifs were detectable using InterPro. Accurate detection of metal sites was shown to be feasible for low-resolution predicted structures generated using mGenTHREADER where no side-chain information was available. High-scoring predictions were observed for a recently solved hypothetical protein from Haemophilus influenzae, indicating a putative metal-binding site.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)