Graph-based clustering for finding distant relationships in a large set of protein sequences

Authors:	Kawaji Hideya, Takenaka Yoichi, Matsuda Hideo

Affiliation:	Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan.

Abstract:	MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it remains difficult to cluster distantly related proteins, which share only regional similarity among their sequences. An algorithm for clustering such distantly related proteins is therefore needed. RESULTS: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges that have small weights, which correspond to low sequence similarities. To find appropriate partitions, we introduce a score that combines the normalized cut and locally minimal cut capacities. We applied the method to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show 76% recall (20,529 proteins) of the 26,917 proteins classified by InterPro. The method also finds relationships not found by other clustering methods. AVAILABILITY: The complete results of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, together with other supplementary information, are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
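
The graph representation and the normalized cut mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy protein identifiers, the similarity scores, and the cutoff are all hypothetical, and the score shown is the standard normalized cut, cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), rather than the combined score the paper introduces.

```python
from collections import defaultdict


def build_graph(scores, cutoff):
    """Build a weighted undirected graph from pairwise similarity scores.

    Only pairs scoring at or above `cutoff` become edges (treating the
    cutoff as inclusive is an assumption of this sketch).
    """
    graph = defaultdict(dict)
    for (a, b), score in scores.items():
        if score >= cutoff:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def normalized_cut(graph, part_a, part_b):
    """Standard normalized cut of the bipartition (part_a, part_b)."""
    part_a, part_b = set(part_a), set(part_b)
    # Total weight of edges crossing the partition.
    cut = sum(w for u in part_a for v, w in graph[u].items() if v in part_b)
    # Total weight of edges incident to each side (sum of weighted degrees).
    assoc_a = sum(w for u in part_a for w in graph[u].values())
    assoc_b = sum(w for u in part_b for w in graph[u].values())
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a side with no edges is not a useful cluster
    return cut / assoc_a + cut / assoc_b


# Hypothetical similarity scores between five made-up proteins.
scores = {
    ("P1", "P2"): 90, ("P2", "P3"): 85, ("P1", "P3"): 80,
    ("P3", "P4"): 35, ("P4", "P5"): 95, ("P2", "P5"): 10,
}
graph = build_graph(scores, cutoff=30)  # drops the P2-P5 pair

# Cutting the single weak P3-P4 edge versus cutting the strong P4-P5 edge.
weak_edge_cut = normalized_cut(graph, {"P1", "P2", "P3"}, {"P4", "P5"})
strong_edge_cut = normalized_cut(graph, {"P1", "P2", "P3", "P4"}, {"P5"})
```

In this toy graph the partition that removes only the low-weight edge receives the smaller normalized cut, which matches the abstract's idea of partitioning by removing edges that correspond to low sequence similarity.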

Keywords:
This article is indexed in PubMed, Oxford, and other databases.
