Similar literature
20 similar documents found (search time: 15 ms)
1.
Reversible protein phosphorylation is one of the most important post-translational modifications and regulates various cellular processes. Identification of kinase-specific phosphorylation sites is helpful for understanding the phosphorylation mechanism and regulation processes. Although a number of computational approaches have been developed, few studies currently consider the hierarchical structures of kinases, and most existing tools use only local sequence information to construct predictive models. In this work, we conduct a systematic and hierarchy-specific investigation of protein phosphorylation site prediction in which protein kinases are clustered into hierarchical structures with four levels: kinase, subfamily, family and group. To enhance phosphorylation site prediction at all hierarchical levels, functional information of proteins, including gene ontology (GO) and protein–protein interaction (PPI), is adopted in addition to primary sequence to construct prediction models based on random forest. Analysis of selected GO and PPI features shows that functional information is critical in determining protein phosphorylation sites at every hierarchical level. Furthermore, the prediction results on Phospho.ELM and an additional testing dataset demonstrate that the proposed method remarkably outperforms existing phosphorylation prediction methods at all hierarchical levels. The proposed method is freely available at http://bioinformatics.ustc.edu.cn/phos_pred/.

2.
Viruses infect humans and progress inside the body leading to various diseases and complications. The phosphorylation of viral proteins catalyzed by host kinases plays crucial regulatory roles in enhancing replication and inhibition of normal host-cell functions. Due to its biological importance, there is a desire to identify the protein phosphorylation sites on human viruses. However, the use of mass spectrometry-based experiments is proven to be expensive and labor-intensive. Furthermore, previous studies which have identified phosphorylation sites in human viruses do not include the investigation of the responsible kinases. Thus, we are motivated to propose a new method to identify protein phosphorylation sites with its kinase substrate specificity on human viruses. The experimentally verified phosphorylation data were extracted from virPTM - a database containing 301 experimentally verified phosphorylation data on 104 human kinase-phosphorylated virus proteins. In an attempt to investigate kinase substrate specificities in viral protein phosphorylation sites, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. The experimental human phosphorylation sites are collected from Phospho.ELM, grouped according to its kinase annotation, and compared with the virus MDD clusters. This investigation identifies human kinases such as CK2, PKB, CDK, and MAPK as potential kinases for catalyzing virus protein substrates as confirmed by published literature. Profile hidden Markov model is then applied to learn a predictive model for each subgroup. A five-fold cross validation evaluation on the MDD-clustered HMMs yields an average accuracy of 84.93% for Serine, and 78.05% for Threonine. 
Furthermore, an independent test set collected from UniProtKB and Phospho.ELM is used to compare predictive performance with three popular kinase-specific phosphorylation site prediction tools. In the independent testing, the high sensitivity and specificity of the proposed method demonstrate the predictive effectiveness of the identified substrate motifs and the importance of investigating potential kinases for viral protein phosphorylation sites.

3.
Wang  Cui-cui  Fang  Yaping  Xiao  Jiamin  Li  Menglong 《Amino acids》2011,40(1):239-248
RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we introduce a computational method for predicting RNA-binding sites in proteins based on support vector machines, using a variety of features from amino acid sequence information including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. Considering the influence of the surrounding residues of an amino acid and the dependency effect from the neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The method is evaluated by outer fivefold cross-validation on a data set of 77 RNA-binding proteins (RBP77). It achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is employed to further evaluate the performance, and the method achieves an overall accuracy of 82.36% with an MCC of 0.44. The result shows that our method has good generalization abilities in predicting RNA-binding sites for novel proteins. Compared with other previous methods, our method performs well on the same data set. The prediction results suggest that the features used are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at .

4.
5.
6.
Gao S  Xu S  Fang Y  Fang J 《Proteome science》2012,10(Z1):S7

Background

Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods

A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).

Results

Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.

Conclusions

The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across 4 kinase families.

7.
Li T  Li F  Zhang X 《Proteins》2008,70(2):404-414
Protein phosphorylation plays important roles in a variety of cellular processes. Detecting possible phosphorylation sites and their corresponding protein kinases is crucial for studying the function of many proteins. This article presents a new prediction system, called PhoScan, to predict phosphorylation sites in a kinase-family-specific way. Common phosphorylation features and kinase-specific features are extracted from substrate sequences of different protein kinases based on the analysis of published experiments, and a scoring system is developed for evaluating the possibility that a peptide can be phosphorylated by the protein kinase at the specific site in its sequence context. PhoScan can achieve a specificity of above 90% with sensitivity around 90% at kinase-family level on the data tested. The system is applied to a set of human proteins collected from Swiss-Prot, and sets of putative phosphorylation sites are predicted for the protein kinase A, cyclin-dependent kinase, and casein kinase 2 families. PhoScan is available at http://bioinfo.au.tsinghua.edu.cn/phoscan/.

8.

Background

Genome-wide association (GWA) study has recently become a powerful approach for detecting genetic variants for common diseases without prior knowledge of the variant's location or function. Generally, in GWA studies, the most significant single-nucleotide polymorphisms (SNPs) associated with top-ranked p values are selected in stage one, with follow-up in stage two. The value of selecting SNPs based on statistically significant p values is obvious. However, when minor allele frequencies (MAFs) are relatively low, less-significant p values can still correspond to higher odds ratios (ORs), which might be more useful for prediction of disease status. Therefore, if SNPs are selected using an approach based only on significant p values, some important genetic variants might be missed. We proposed a hybrid approach for selecting candidate SNPs from the discovery stage of GWA study, based on both p values and ORs, and conducted a simulation study to demonstrate the performance of our approach.

Results

The simulation results showed that our hybrid ranking approach was more powerful than the existing ranked p value approach for identifying relatively less-common SNPs. Meanwhile, the type I error probabilities of the hybrid approach are well controlled at the end of the second stage of the two-stage GWA study.

Conclusions

In GWA studies, SNPs should be considered for inclusion based not only on ranked p values but also on ranked ORs.
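The abstract above does not say how p values and ORs are combined, so the following is only a minimal sketch under a stated assumption: candidates are taken as the union of the top-k SNPs by smallest p value and the top-k by largest OR. The selection rule, the function names, the SNP identifiers and all counts are hypothetical and are not taken from the paper; only the allelic odds ratio formula is standard.

```python
def odds_ratio(case_minor, case_major, control_minor, control_major):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_minor * control_major) / (case_major * control_minor)


def hybrid_select(snps, k):
    """Hypothetical hybrid rule: union of the top-k SNPs by smallest
    p value and the top-k SNPs by largest odds ratio."""
    by_p = sorted(snps, key=lambda s: s["p"])[:k]
    by_or = sorted(snps, key=lambda s: s["or"], reverse=True)[:k]
    chosen = {s["id"] for s in by_p} | {s["id"] for s in by_or}
    return sorted(chosen)


# Made-up discovery-stage data: (id, p value, allele counts).
# snp_c has a low minor allele frequency, a modest p value and a high OR.
raw = [
    ("snp_a", 1e-8, (400, 600, 300, 700)),
    ("snp_b", 1e-6, (450, 550, 380, 620)),
    ("snp_c", 1e-3, (30, 970, 10, 990)),
    ("snp_d", 0.2, (200, 800, 195, 805)),
]
snps = [{"id": i, "p": p, "or": odds_ratio(*counts)} for i, p, counts in raw]

# Ranking by p value alone with k=1 would keep only snp_a;
# the hybrid rule also keeps the rarer, higher-OR snp_c.
selected = hybrid_select(snps, k=1)
print(selected)
```

With these made-up numbers, a p-value-only ranking would miss snp_c, which is the kind of less-common variant the abstract argues can be overlooked.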

9.
With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions and predicting novel protein–protein interactions. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources could effectively improve the coverage of the currently insufficient protein interactome (the false-negative issue), it could also introduce potential false-positive interactions that could hurt the performance of protein complex prediction. Our proposed InteHC method can effectively address these issues to facilitate accurate protein complex prediction, and it is summarized in the following three steps. First, for each individual source/feature, InteHC computes a matrix of affinity scores for protein pairs, which indicate their propensity to interact or to belong to the same complex. Second, InteHC computes a final score matrix, which is the weighted sum of affinity scores from individual sources. In particular, the weights indicating the reliability of individual sources are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is performed on the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results obtained by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC. Proteins 2013; 81:2023–2033. © 2013 Wiley Periodicals, Inc.
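The last two steps described in this abstract (a weighted sum of per-source affinity matrices, followed by hierarchical clustering) can be sketched roughly as below. This is an illustration, not the InteHC implementation. The weights are fixed placeholders rather than learned by a ranking SVM. Average linkage and the stopping threshold are assumptions, and the protein indices and scores are made up.

```python
from itertools import combinations


def combine_affinities(matrices, weights):
    """Weighted sum of per-source affinity matrices (lists of lists)."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]


def average_linkage_clusters(score, threshold):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest mean pairwise score until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(score))]

    def mean_score(a, b):
        return sum(score[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = max(
            (((a, b), mean_score(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Two made-up sources for four proteins (indices 0..3).
ppi = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
coexpression = [[0, 0.8, 0.1, 0.0], [0.8, 0, 0.2, 0.1],
                [0.1, 0.2, 0, 0.6], [0.0, 0.1, 0.6, 0]]
weights = [0.7, 0.3]  # placeholder reliabilities, not learned

combined = combine_affinities([ppi, coexpression], weights)
complexes = average_linkage_clusters(combined, threshold=0.5)
print(complexes)
```

In this toy input, proteins 0 and 1 score highly in both sources, as do proteins 2 and 3, so the clustering stops with two predicted complexes.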

10.
Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step to delineate molecular dynamics and plasticity underlying a variety of cellular processes. Although nearly 10 kinase-specific prediction programs have been developed, numerous PKs have been casually classified into subgroups without a standard rule. For large-scale predictions, the false positive rate has also never been addressed. In this work, we adopted a well-established rule to classify PKs into a hierarchical structure with four levels, including group, family, subfamily, and single PK. In addition, we developed a simple approach to estimate the theoretically maximal false positive rates. The online service and local packages of GPS (Group-based Prediction System) 2.0 were implemented in Java with a modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand-alone software for predicting phosphorylation, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs in hierarchy. A large-scale prediction of more than 13,000 mammalian phosphorylation sites by GPS 2.0 was exhibited with great performance and remarkable accuracy. Using Aurora-B as an example, we also conducted a proteome-wide search and provided systematic prediction of Aurora-B-specific substrates including protein-protein interaction information. Thus, GPS 2.0 is a useful tool for predicting protein phosphorylation sites and their cognate kinases and is freely available online.

11.

Background  

Most of the existing in silico phosphorylation site prediction systems use a machine learning approach that requires preparing a good set of classification data in order to build the classification knowledge. Furthermore, phosphorylation is catalyzed by kinase enzymes, and hence the kinase information of the phosphorylated sites has been used as major classification data in most of the existing systems. Since the number of kinase annotations in protein sequences is far smaller than the number of proteins sequenced to date, prediction systems that use information from the small clique of kinase-annotated proteins cannot be considered reliable for predicting outside the clique. Hence these systems are not generalized. In this paper, a novel generalized prediction system, PPRED (Phosphorylation PREDictor), is proposed that ignores the kinase information and uses only the evolutionary information of proteins for classifying phosphorylation sites.

12.
Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods mainly focus on the prediction of phosphorylation sites, but few consider whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved in evolution than non-functional phosphorylation sites. Thus, we developed a web server that integrates existing phosphorylation site prediction methods with both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database for the predicted sites. By analysis of the overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By analysis of protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method would help to characterize the most likely functional phosphorylation sites for further studies in this research area.

13.
14.
Thymidylate synthase (TS) was found to be a substrate for both catalytic subunits of human CK2, with phosphorylation by CK2α and CK2α′ characterized by similar Km values, 4.6 μM and 4.2 μM, respectively, but different efficiencies, the apparent turnover number with CK2α being 10-fold higher. With both catalytic subunits, phosphorylation of human TS, like calmodulin and BID, was strongly inhibited in the presence of the regulatory subunit CK2β, the holoenzyme being activated by polylysine. Phosphorylation of recombinant human, rat, mouse and Trichinella spiralis TS proteins was compared, with the human enzyme being apparently a much better substrate than the others. Following hydrolysis and TLC, phosphoserine was detected in human and rat, and phosphotyrosine in T. spiralis, TS, used as substrates for CK2α. MALDI-TOF MS analysis led to identification of phosphorylated Ser124 in human TS, within a sequence LGFS124TREEGD, atypical for a CK2 substrate recognition site. The phosphorylation site is located in a region considered important for the catalytic mechanism or regulation of human TS, corresponding to the loop 107-128. Following phosphorylation by CK2α, resulting in incorporation of 0.4 mol of phosphate per mol of dimeric TS, human TS exhibits unaltered Km values for dUMP and N5,10-methylenetetrahydrofolate, but a 50% lower turnover number, pointing to a strong influence of Ser124 phosphorylation on its catalytic efficiency.

15.
Artemis protein has irreplaceable functions in V(D)J recombination and nonhomologous end joining (NHEJ) as a hairpin and 5' and 3' overhang endonuclease. The kinase activity of the DNA-dependent protein kinase catalytic subunit (DNA-PKcs) is necessary in activating Artemis as an endonuclease. Here we report that three basal phosphorylation sites and 11 DNA-PKcs phosphorylation sites within the mammalian Artemis are all located in the C-terminal domain. All but one of these phosphorylation sites deviate from the SQ or TQ motif of DNA-PKcs that was predicted previously from in vitro phosphorylation studies. Phosphatase-treated mammalian Artemis and Artemis that is mutated at the three basal phosphorylation sites still retain DNA-PKcs-dependent endonucleolytic activities, indicating that basal phosphorylation is not required for the activation. In vivo studies of Artemis lacking the C-terminal domain have been reported to be sufficient to complement V(D)J recombination in Artemis null cells. Therefore, the C-terminal domain may have a negative regulatory effect on the Artemis endonucleolytic activities, and phosphorylation by DNA-PKcs in the C-terminal domain may relieve this inhibition.

16.
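Taking the phosphoproteome of Xiaohei poplar (Populus simonii × P. nigra) as the research object, an artificial neural network was used to express the nonlinear relationship between phosphorylation at residue sites such as serine and threonine and the structural features of the amino acid sequence. A back-propagation (BP) artificial neural network model was established, then trained and analyzed with phosphorylation data. The suitable structure obtained was 21×16:8:4, with a fitting accuracy of 90%; Acc, Sn, Sp and MCC were 78%, 89%, 67% and 0.57, respectively. Comparative analysis shows that the model has strong predictive ability.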
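For readers unfamiliar with the four metrics reported in this entry (Acc, Sn, Sp, MCC), the sketch below computes them from a confusion matrix using their standard definitions. The counts are hypothetical: they assume a balanced test set of 100 positive and 100 negative sites, which the abstract does not state, and were chosen only because they reproduce the reported Sn and Sp.

```python
from math import sqrt


def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)   # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)   # specificity: fraction of non-sites rejected
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sn, sp, mcc


# Hypothetical balanced test set: 100 positives, 100 negatives.
acc, sn, sp, mcc = metrics(tp=89, fn=11, tn=67, fp=33)
print(f"Acc={acc:.2f} Sn={sn:.2f} Sp={sp:.2f} MCC={mcc:.2f}")
```

Under that balanced-set assumption the four values come out as 0.78, 0.89, 0.67 and 0.57, which is consistent with the figures quoted in the entry.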

17.
One of the most important goals of biological investigation is to uncover gene functional relations. In this study we propose a framework for extraction and integration of gene functional relations from diverse biological data sources, including gene expression data, biological literature and genomic sequence information. We introduce a two-layered Bayesian network approach to integrate relations from multiple sources into a genome-wide functional network. An experimental study was conducted on a test-bed of Arabidopsis thaliana. Evaluation of the integrated network demonstrated that relation integration could improve the reliability of relations by combining evidence from different data sources. Domain expert judgments on the gene functional clusters in the network confirmed the validity of our approach for relation integration and network inference.

18.
19.
Zhao XM  Wang Y  Chen L  Aihara K 《Proteins》2008,72(1):461-473
Domains are structural and functional units of proteins and play an important role in functional genomics. Theoretically, the functions of a protein can be directly inferred if the biological functions of its component domains are determined. Despite the important role that domains play, only a small number of domains have been annotated so far, and few works have been performed to predict the functions of domains. Hence, it is necessary to develop automatic methods for predicting domain functions based on various available data. In this article, two new methods, that is, the threshold-based classification method and the support vector machines method, are proposed for protein domain function prediction by integrating heterogeneous information sources, including protein-domain mapping features, domain-domain interactions, and domain coexisting features. We show that the integration of heterogeneous information sources improves not only prediction accuracy but also annotation reliability when compared with the methods using only individual information sources.

20.
Luo R  Zhou C  Lin J  Yang D  Shi Y  Cheng G 《Journal of Proteomics》2012,75(3):868-877
Schistosomes are the causative agents of human schistosomiasis and related animal disease. Reversible protein phosphorylation plays a key role in signaling processes that are vital for cells and organisms. However, it remains undercharacterized in schistosomes. In the present study, we characterized in vivo protein phosphorylation events in different developmental stages (schistosomula and adult worms) of Schistosoma japonicum by using microvolume immobilized metal-ion affinity chromatography (IMAC) pipette tips coupled to nanoLC-ESI-MS/MS. In total, 127 distinct phosphorylation sites were identified in 92 proteins in S. japonicum. A comparison of the phosphopeptides identified in the schistosomula and the adult worms revealed 30 phosphoproteins detected in both stages. These proteins included several signal molecules and enzymes such as 14-3-3 protein, cysteine string protein, heat shock protein 90, epidermal growth factor receptor pathway substrate 8, proliferation-associated protein 2G4, peptidyl-prolyl isomerase G, phosphofructokinase and thymidylate kinase. Additionally, the phosphorylation sites were examined for phosphorylation-specific motifs and evolutionary conservation. The study represents the first attempt to determine in vivo protein phosphorylation in S. japonicum by using a phosphoproteomic approach. By providing an inventory of phosphorylated proteins, the results may help to further understand the mechanisms involved in schistosome development and growth, and may lead to the development of novel vaccine candidates and drug targets for schistosomiasis control.
