Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets
Authors: Cheng Hua; Kim Bong-Hyun; Grishin Nick V
Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA
Abstract:A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
Keywords: PDB, Protein Data Bank; SCOP, Structural Classification of Proteins; SSE, secondary structural element; SVM, support vector machine; OrnDC-C, ornithine decarboxylase C-terminal domain; MoeA-I, molybdenum cofactor biosynthesis protein MoeA domain I; CBD, collagen-binding domain; CBM, carbohydrate-binding module; AHM, alignment-based Hausdorff measure; LHM, loop-based Hausdorff measure
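The abstract defines remote homologs as domains in the same SCOP superfamily but from different families, and reports the fraction of such pairs that the classifier recovers. The following is a minimal sketch of how that definition and the recovery fraction (recall) could be computed. It assumes domains are labeled with SCOP-style `class.fold.superfamily.family` strings; the domain identifiers, labels, and predicted pairs are toy values invented for illustration, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def parse_sccs(sccs):
    """Split a SCOP-style label such as 'c.1.2.3' into
    (class, fold, superfamily, family)."""
    cls, fold, superfamily, family = sccs.split(".")
    return cls, int(fold), int(superfamily), int(family)


def is_remote_homolog_pair(sccs_a, sccs_b):
    """True if both domains share class, fold, and superfamily
    but belong to different families."""
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    return a[:3] == b[:3] and a[3] != b[3]


def remote_homolog_recall(domains, predicted_pairs):
    """Fraction of remote-homolog pairs (per the labels in `domains`)
    that appear among the predicted homologous pairs."""
    truth = {
        frozenset((x, y))
        for x, y in combinations(sorted(domains), 2)
        if is_remote_homolog_pair(domains[x], domains[y])
    }
    if not truth:
        return 0.0
    predicted = {frozenset(pair) for pair in predicted_pairs}
    return len(truth & predicted) / len(truth)


# Toy data (assumed, not from the paper): domain id -> SCOP-style label.
domains = {
    "dom1": "c.1.2.1",
    "dom2": "c.1.2.3",
    "dom3": "c.1.2.4",
    "dom4": "c.1.2.1",
    "dom5": "b.40.4.5",
}

# Pairs a hypothetical classifier called homologous.
predicted = [("dom1", "dom2"), ("dom2", "dom3"), ("dom1", "dom5")]

recall = remote_homolog_recall(domains, predicted)
print(f"remote-homolog recall: {recall:.2f}")
```

In this toy set, five pairs satisfy the remote-homolog definition and two of them are among the predictions. The pair `("dom1", "dom5")` crosses classes, so it does not count toward recall; pairs of this kind correspond to the cross-superfamily relationships that the abstract says are examined separately.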
This article is indexed in ScienceDirect, PubMed, and other databases.