首页 | 本学科首页   官方微博 | 高级检索  
   检索      


An efficient algorithm for finding short approximate non-tandem repeats
Authors:Adebiyi E F  Jiang T  Kaufmann M
Institution:Wilhelm-Schickard-Institut für Informatik, Universit?t Tübingen, Sand 13, Tübingen, 72076, Germany. adebiyi@informatik.uni-tuebingen.de
Abstract:We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Sigma and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact repeats) required by the algorithm, and prove a sublinear bound on their expected numbers. Using this result, we present a sub-quadratic algorithm for finding all short (i.e., of length O(log N)) approximate repeats. The running time of our algorithm is O(DN(3pow(epsilon)-1)log N), where epsilon = D/P and pow(epsilon) is an increasing, concave function that is 0 when epsilon = 0 and about 0.9 for DNA and protein sequences.
Keywords:
本文献已被 PubMed 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号