PSI: indexing protein structures for fast similarity search |
| |
Authors: | Camoglu, Orhan; Kahveci, Tamer; Singh, Ambuj K. |
| |
Institution: | Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. orhan@cs.ucsb.edu |
| |
Abstract: | MOTIVATION: We consider the problem of finding similarities in protein structure databases. Current techniques compare the query protein sequentially against every protein in the database. The cost of a similarity query therefore grows linearly with the size of the database. As databases of experimentally determined and theoretically estimated protein structures grow, scalable search techniques are needed. RESULTS: Our techniques extract feature vectors from triplets of secondary structure elements (SSEs) and index these vectors with a multidimensional index structure. For a given query protein, the index is used to quickly prune unpromising proteins from the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model that estimates the goodness of a match using the SSEs. Experimental results show that our techniques make the pruning step of VAST 3 to 3.5 times faster while maintaining similar sensitivity. |
| |
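The index-then-prune idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which features are extracted or which index structure is used, so everything concrete here is an assumption: each SSE is modeled as a 3-D line segment, each triplet is described by three midpoint distances and three inter-axis angles, and a uniform grid stands in for the multidimensional index. The names `triplet_features`, `TripletIndex`, `DIST_CELL`, and `ANGLE_CELL` are hypothetical. The statistical scoring model and the final VAST alignment are not shown.

```python
from collections import defaultdict
from itertools import combinations, product
import math

DIST_CELL = 4.0    # assumed grid cell width for distances (angstroms)
ANGLE_CELL = 0.5   # assumed grid cell width for angles (radians)

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _mid(seg):
    return tuple((a + b) / 2 for a, b in zip(*seg))

def _angle(u, v):
    cos = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding

def triplet_features(sses):
    """Yield one 6-D vector per SSE triplet (assumed feature choice):
    three midpoint distances followed by three inter-axis angles."""
    for a, b, c in combinations(sses, 3):
        pairs = [(a, b), (a, c), (b, c)]
        dists = [_norm(_sub(_mid(p), _mid(q))) for p, q in pairs]
        angles = [_angle(_sub(p[1], p[0]), _sub(q[1], q[0])) for p, q in pairs]
        yield tuple(dists + angles)

class TripletIndex:
    """Uniform grid over feature space; a stand-in for a real
    multidimensional index structure."""

    def __init__(self):
        self.cells = defaultdict(set)

    @staticmethod
    def _key(vec):
        return tuple(
            math.floor(x / (DIST_CELL if i < 3 else ANGLE_CELL))
            for i, x in enumerate(vec)
        )

    def add(self, protein_id, sses):
        for vec in triplet_features(sses):
            self.cells[self._key(vec)].add(protein_id)

    def candidates(self, query_sses, min_votes=1):
        """Return ids of proteins sharing at least min_votes triplets whose
        features fall in the same or an adjacent grid cell as a query triplet."""
        votes = defaultdict(int)
        for vec in triplet_features(query_sses):
            key = self._key(vec)
            hits = set()
            for off in product((-1, 0, 1), repeat=len(key)):
                cell = tuple(k + o for k, o in zip(key, off))
                hits.update(self.cells.get(cell, ()))
            for pid in hits:
                votes[pid] += 1
        return {pid for pid, n in votes.items() if n >= min_votes}

# Toy data: each SSE is a (start, end) pair of 3-D points.
protein_a = [((0, 0, 0), (10, 0, 0)), ((0, 5, 0), (10, 5, 0)), ((0, 10, 0), (0, 10, 10))]
protein_b = [((0, 0, 0), (10, 0, 0)), ((0, 40, 0), (0, 40, 10)), ((50, 0, 0), (50, 10, 0))]
# The query is protein_a shifted in space; the features are translation-invariant.
query = [tuple(tuple(x + 100 for x in p) for p in seg) for seg in protein_a]

index = TripletIndex()
index.add("A", protein_a)
index.add("B", protein_b)
print(index.candidates(query))  # prints {'A'}
```

Only the proteins returned by `candidates` would be passed on to the expensive pairwise alignment, which is where the reduction in query cost comes from. The cell widths trade pruning power against the risk of discarding true matches and would need tuning on real data.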
Keywords: | |
This article is indexed in PubMed and other databases. |
|