

Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features
Authors: Shun-Long Weng, Kai-Yao Huang, Fergie Joanda Kaunang, Chien-Hsun Huang, Hui-Ju Kao, Tzu-Hao Chang, Hsin-Yao Wang, Jang-Jih Lu, Tzong-Yi Lee
Institutions: 1. Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, Taiwan; 2. Mackay Medicine, Nursing and Management College, Taipei, Taiwan; 3. Department of Medicine, Mackay Medical College, New Taipei City, Taiwan; 4. Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsin-Chu, Taiwan; 5. Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan; 6. Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, Taiwan; 7. Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan; 8. Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan; 9. Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan, Taiwan; 10. Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
Abstract:

Background

Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). However, because little is known about the substrate specificity of carbonylation, we set out to develop a systematic method for a comprehensive investigation of protein carbonylation sites.

Results

After redundant data from multiple carbonylation-related articles were removed, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128, and 129 carbonylation sites on K, R, T, and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used both to build predictive models from the training dataset and to compare their predictive effectiveness with that of other feature types, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties (AAindex). Analysis of position-specific amino acid composition revealed that positively charged amino acids (K and R) are markedly enriched around carbonylated sites, suggesting that they may play a functional role in discriminating carbonylation sites from non-carbonylation sites. A variety of predictive models were built using these features and three different machine learning methods. In five-fold cross-validation, models trained with the PWM feature gave higher sensitivity on the positive training dataset, whereas models trained with the AAindex feature achieved higher specificity on the negative training dataset. Moreover, the model trained with hybrid features (PWM, AAC, and AAindex) achieved the best Matthews correlation coefficient (MCC) values: 0.432, 0.472, 0.443, and 0.467 for K, R, T, and P residues, respectively.
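Several of the sequence-based features named above can be illustrated with a minimal Python sketch. It assumes a window centred on the candidate K, R, T, or P residue, a hypothetical half-width (10 by default, 3 in the toy usage), padding with `X` beyond the protein termini, and AAPC computed over adjacent pairs only. The abstract states none of these details, so `extract_window`, `aac`, `aapc`, `build_pwm`, and `pwm_features` are illustrative names, not the authors' code. PSSM, ASA, and AAindex features depend on external tools or databases and are omitted here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def extract_window(sequence: str, site: int, half_width: int = 10, pad: str = "X") -> str:
    """Return 2*half_width+1 residues centred on the 0-based index `site`, padded past the termini (assumed scheme)."""
    padded = pad * half_width + sequence + pad * half_width
    # In `padded`, the residue originally at `site` sits at index site + half_width.
    return padded[site : site + 2 * half_width + 1]


def aac(window: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue in the window (padding ignored)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def aapc(window: str) -> list[float]:
    """Amino acid pair composition over adjacent residue pairs only (an assumption; 400 values)."""
    pairs = [window[i : i + 2] for i in range(len(window) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid) or 1
    return [counts[p] / total for p in PAIRS]


def build_pwm(positive_windows: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-position residue frequencies from equal-length positive windows, smoothed with a pseudocount."""
    pwm = []
    for pos in range(len(positive_windows[0])):
        counts = Counter(w[pos] for w in positive_windows if w[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pwm.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return pwm


def pwm_features(window: str, pwm: list[dict[str, float]]) -> list[float]:
    """Encode a window as the PWM frequency of its residue at each position (padding maps to 0.0)."""
    return [pwm[i].get(r, 0.0) for i, r in enumerate(window)]


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence
    positives = [extract_window(protein, s, half_width=3) for s in (1, 7)]  # two toy K sites
    pwm = build_pwm(positives)
    hybrid = pwm_features(positives[0], pwm) + aac(positives[0])  # PWM + AAC concatenated
    print(len(hybrid))
```

Concatenating per-position PWM values with AAC, as in the usage lines, mirrors the idea of a hybrid feature vector; adding AAindex values would additionally require looking up property scales in the AAindex database.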

Conclusion

When compared with an existing prediction tool, the selected models trained with hybrid features achieved promising accuracy on an independent test dataset. In short, this work not only characterized the substrate preference of protein carbonylation but also demonstrated that the proposed method offers a feasible means of accelerating the preliminary discovery of protein carbonylation sites.
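The sensitivity, specificity, accuracy, and MCC figures reported in this abstract are standard confusion-matrix statistics. The sketch below computes them and generates five-fold cross-validation splits. It assumes binary labels (1 = carbonylated site, 0 = non-carbonylated site) and uses plain shuffled folds; the abstract does not say whether the authors stratified their folds, and `evaluate` and `k_fold_splits` are hypothetical helper names.

```python
import math
import random
from typing import Iterator


def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, accuracy, and MCC for binary labels (1 = site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on positives
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on negatives
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for k shuffled folds; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    print(evaluate([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is why it is a common single-number summary when positive and negative sites are imbalanced.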
This article is indexed in SpringerLink and other databases.