An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences
Authors: Janelle B Leuthaeuser, Brandon E Turner, Don Nguyendac, Gabrielle Shea, Kiran Kumar, Julia D Hayden, Angela F Harper, Shoshana D Brown, John H Morris, Thomas E Ferrin, Patricia C Babbitt, Jacquelyn S Fetrow
Institution: 1. Molecular Genetics and Genomics Program, Wake Forest School of Medicine, Winston‐Salem, North Carolina; 2. Department of Physics, Wake Forest University, Winston‐Salem, North Carolina; 3. Biochemistry Program, Dickinson College, Carlisle, Pennsylvania; 4. Department of Pharmaceutical Chemistry, University of California, San Francisco, California; 5. Department of Chemistry, University of Richmond, Richmond, Virginia
Abstract: Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants, and DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that uses active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (as curated by the Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches, whereas clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, and each candidate is evaluated with DASP2 to determine whether it self‐identifies. If it does, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a member of a functionally relevant group or a singlet. TuLIP is validated on enolase and glutathione transferase structures, two superfamilies well curated by SFLD. Correlation is strong, although the small number of structures prevents statistically significant analysis. TuLIP‐identified enolase clusters are then used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, a false negative rate of 4%, and a maximum false positive rate of 4%. F‐measure and performance analysis of the enolase search results, together with a comparison to GEMMA and SCI‐PHY, demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
Keywords: functional site profile; active site profile; mechanistic determinants; isofunctional clusters; function annotation; functionally relevant clustering; misannotation
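
The divisive loop described in the abstract (split a cluster, test each candidate for self-identification, stop at functionally relevant groups or singlets) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the TuLIP implementation: `split` and `self_identifies` are hypothetical callables standing in for the clustering step and the DASP2 self-identification check, the two-level aspect of TuLIP is not modeled, and the toy labels exist only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a structure identifier; any type works


def divisive_clustering(
    structures: Sequence[S],
    split: Callable[[List[S]], List[List[S]]],   # hypothetical: proposes candidate subclusters
    self_identifies: Callable[[List[S]], bool],  # hypothetical stand-in for the DASP2 check
) -> Tuple[List[List[S]], List[S]]:
    """Return (functionally relevant groups, singlets)."""
    groups: List[List[S]] = []
    singlets: List[S] = []
    pending: List[List[S]] = [list(structures)]
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1:
            # A lone structure cannot be divided further.
            singlets.append(cluster[0])
        elif self_identifies(cluster):
            # The candidate passes the check: accept it and stop dividing.
            groups.append(cluster)
        else:
            children = split(cluster)
            if len(children) < 2:
                # Assumption: if no division is possible, fall back to singlets
                # so that the loop always terminates.
                singlets.extend(cluster)
            else:
                pending.extend(children)
    return groups, singlets


# Toy stand-ins (assumptions for illustration only): each structure carries a
# known family label, a cluster "self-identifies" when it holds one family,
# and splitting simply halves the list.
def toy_split(cluster):
    mid = len(cluster) // 2
    return [cluster[:mid], cluster[mid:]]


def toy_self_identifies(cluster):
    return len({family for _, family in cluster}) == 1


toy_structures = [("a", "X"), ("b", "X"), ("c", "Y"), ("d", "Z")]
toy_groups, toy_singlets = divisive_clustering(
    toy_structures, toy_split, toy_self_identifies
)
print(toy_groups, toy_singlets)
```

In this toy run the mixed cluster keeps being divided until only single-family groups and singlets remain; in practice the quality of the result would depend entirely on how the real splitting and self-identification steps behave.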