A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins |
| |
Authors: | Yong-Chun Zuo, Wei Chen, Guo-Liang Fan, Qian-Zhong Li |
| |
Affiliation: | 1. School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China 2. Center of Genomics and Computational Biology, College of Sciences, Hebei United University, Tangshan, 063000, China |
| |
Abstract: | The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a two-class K-nearest neighbor (KNN) classifier, KNN-ID, to discriminate thermophilic from mesophilic proteins. Instead of extracting features from protein sequences as done previously, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with several recently published methods showed that KNN-ID outperforms them. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein sequence identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
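The two steps described in the abstract (a composition-based similarity distance between sequence pairs, followed by a K-nearest neighbor vote) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used definition of the diversity measure, D(X) = N log N - Σ n_i log n_i over the 20 amino acid counts, with ID(X, Y) = D(X + Y) - D(X) - D(Y); the abstract itself does not state the formula. The function names, the label strings, and the default k = 3 are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols are ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the amino acid counts of a sequence as a 20-element list."""
    counts = Counter(seq)
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D = N log N - sum(n_i log n_i), skipping zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_id_predict(query, training, k=3):
    """Label a query sequence by majority vote among its k nearest
    training sequences, using ID as the distance.

    `training` is a list of (sequence, label) pairs.
    """
    q = composition(query)
    scored = sorted(
        (increment_of_diversity(q, composition(seq)), label)
        for seq, label in training
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Under this definition, two sequences with proportional compositions have an ID of zero, and the value grows as the compositions diverge, which is why it can serve as the distance in the neighbor search. Calling `knn_id_predict(query, training, k)` with a list of labeled sequences returns the majority label among the k closest ones; an odd k avoids ties in the two-class setting.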