Statistical distance between texts and filtration methods in sequence comparison |
| |
Authors: | Pevzner Pavel A |
| |
Institution: | Department of Mathematics, University of Southern California Los Angeles, CA 90089-1113, USA and Laboratory of Mathematical Methods, Institute of Genetics of Microorganisms Moscow 113545, USSR |
| |
Abstract: | Upon searching local similarities in long sequences, the necessity of a rapid similarity search becomes acute. Quadratic complexity of dynamic programming algorithms forces the employment of filtration methods that allow elimination of the sequences with a low similarity level. The paper is devoted to the theoretical substantiations of the filtration method based on the statistical distance between texts. The notion of the filtration efficiency is introduced and the efficiency of several filters is estimated. It is shown that the efficiency of the statistical l-tuple filtration upon DNA database search is associated with a potential extension of the original four-letter alphabet and grows exponentially with increasing l. The formula that allows one to estimate the filtration parameters is presented. |
| |
Keywords: | |
This article is indexed in Oxford and other databases.
|
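Illustrative sketch (not from the paper): the abstract describes filtering out low-similarity sequences by a statistical distance over l-tuples before running quadratic dynamic programming. The record does not give the paper's formula, so the code below assumes one simple form of such a distance, the sum of squared differences between overlapping l-tuple counts. The function names `ltuple_counts`, `ltuple_distance`, and `passes_filter`, and the `threshold` parameter, are hypothetical.

```python
from collections import Counter


def ltuple_counts(seq, l):
    """Count every overlapping l-tuple (substring of length l) in seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def ltuple_distance(a, b, l):
    """Assumed distance: sum of squared differences of l-tuple counts.

    This is an illustrative choice, not necessarily the paper's definition.
    """
    ca, cb = ltuple_counts(a, l), ltuple_counts(b, l)
    # Counter returns 0 for l-tuples that are absent from one sequence.
    return sum((ca[t] - cb[t]) ** 2 for t in ca.keys() | cb.keys())


def passes_filter(a, b, l, threshold):
    """Keep a pair for the expensive alignment step only if it is close enough."""
    return ltuple_distance(a, b, l) <= threshold


if __name__ == "__main__":
    query = "ACGTACGT"
    database = ["ACGTACGA", "TTTTTTTT"]
    # Only sequences that survive the cheap filter would be aligned in full.
    survivors = [s for s in database if passes_filter(query, s, l=2, threshold=4)]
    print(survivors)
```

Computing the counts takes time linear in the sequence length, which is the reason such a filter is cheap relative to dynamic programming; the threshold here is arbitrary and would in practice be set from the filtration parameters the paper estimates.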