iNSP-GCAAP: Identifying nonclassical secreted proteins using global composition of amino acid properties
Authors: Trang T. T. Do, Thanh-Hoang Nguyen-Vo, Hung T. Pham, Quang H. Trinh, Binh P. Nguyen
Affiliation: 1. School of Innovation, Design and Technology, Wellington Institute of Technology, Lower Hutt, New Zealand; 2. School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand; 3. Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam; 4. School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
Abstract: Nonclassical secreted proteins (NSPs) are proteins released into the extracellular environment through biological transport pathways other than the Sec/Tat system. Because experimental determination of NSPs is often costly and requires skilled handling, computational approaches are needed. In this study, we introduce iNSP-GCAAP, a computational framework for identifying NSPs. We encode sequence data using the global composition of a customized set of amino acid properties and use the random forest (RF) algorithm for classification. We developed our model on the training dataset introduced by Zhang et al. (Bioinformatics, 36(3), 704–712, 2020) and tested it on the independent test set from the same study. The area under the receiver operating characteristic curve on that test set was 0.9256, higher than that of other state-of-the-art methods evaluated on the same datasets. The framework is also deployed as a user-friendly web application to help the research community predict NSPs.
Keywords: global composition; Gram-positive bacteria; nonclassical secreted proteins; prediction; random forest
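
Illustrative sketch: the abstract describes encoding each sequence by the global composition of amino acid properties, but does not list the customized property set. The Python snippet below is a minimal sketch of that kind of encoding, assuming "global composition" means the fraction of residues in the whole sequence that fall into each property group. The groups in `PROPERTY_GROUPS` and the function name `global_composition` are hypothetical, chosen only for illustration, and are not the authors' feature set or code.

```python
# Hypothetical property groups for illustration only; the customized set
# used by iNSP-GCAAP is not given in the abstract.
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGHP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def global_composition(sequence, groups=PROPERTY_GROUPS):
    """Return one value per property group: the fraction of residues
    in the full sequence that belong to that group."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A residue may belong to more than one group (e.g. H above),
    # so the fractions do not have to sum to 1.
    return [
        sum(1 for residue in seq if residue in members) / len(seq)
        for members in groups.values()
    ]


# Toy peptide, not a real NSP: M is hydrophobic, K positive, D and E negative.
features = global_composition("MKDE")
print(features)  # [0.25, 0.0, 0.25, 0.5]
```

Fixed-length vectors of this kind could then be passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`), which is the classification step named in the abstract; that step is not shown here.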