首页 | 本学科首页   官方微博 | 高级检索  
   检索      


Parallel hash-based EST clustering algorithm for gene sequencing
Authors:Mudhireddy R  Ercal F  Frank R
Institution:Computer Science Department, University of Missouri-Rolla, Rolla, Missouri 65409, USA.
Abstract:EST clustering is a simple, yet effective method to discover all the genes present in a variety of species. Although using ESTs is a cost-effective approach in gene discovery, the amount of data, and hence the computational resources required, make it a very challenging problem. Time and storage requirements for EST clustering problems are prohibitively expensive. Existing tools have quadratic time complexity resulting from all against all sequence comparisons. With the rapid growth of EST data we need better and faster clustering tools. In this paper, we present HECT (Hash based EST Clustering Tool), a novel time- and memory-efficient algorithm for EST clustering. We report that HECT can cluster a 10,000 Human EST dataset (which is also used in benchmarking d2_cluster), in 207 minutes on a 1 GHz Pentium III processor which is 36 times faster than the original d2_cluster algorithm. A parallel version of HECT (PECT) is also developed and used to cluster 269,035 soybean EST sequences on IA-32 Linux cluster at National Center for Supercomputing Applications at UIUC. The parallel algorithm exhibited excellent speedup over its sequential counterpart and its memory requirements are almost negligible making it suitable to run virtually on any data size. The performance of the proposed clustering algorithms is compared against other known clustering techniques and results are reported in the paper.
Keywords:
本文献已被 PubMed 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号