Discovering simple DNA sequences by the algorithmic significance method |
| |
Authors: | Milosavljevic, Aleksandar; Jurka, Jerzy |
| |
Affiliation: | Linus Pauling Institute of Science and Medicine, 440 Page Mill Rd, Palo Alto, CA 94306, USA |
| |
Abstract: | A new method, algorithmic significance, is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain words and thus can be encoded in a small number of bits. Such definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identification of simple sequences based on the proposed method has been installed at the Internet address pythia@anl.gov. |
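The idea in the abstract (encode a sequence concisely by reusing repeated words, then ask how many bits were saved) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `min_encoding_bits` and `significance_bound`, the cost model (a 1-bit flag plus 2 bits per literal nucleotide, or a 1-bit flag plus a fixed-width position and length for a copy of an earlier word), and the null model of 2 bits per nucleotide. The dynamic program below is a simple quadratic-or-worse version, not the linear-time algorithm the abstract mentions. The bound `2 ** -d` for a saving of `d` bits is the usual compression-based argument and is assumed, rather than confirmed by the abstract, to match the authors' significance test.

```python
import math


def min_encoding_bits(seq: str) -> float:
    """Minimal encoding length of seq under a toy literal/pointer code.

    Assumed cost model (illustrative only, not the paper's):
      literal nucleotide: 1 flag bit + 2 bits
      pointer to an earlier word: 1 flag bit + position + length,
      each stored in ceil(log2(n)) bits.
    """
    n = len(seq)
    if n == 0:
        return 0
    literal = 1 + 2
    pointer = 1 + 2 * max(1, math.ceil(math.log2(n)))

    # best[i] = fewest bits needed to encode the prefix seq[:i]
    best = [0] + [math.inf] * n
    for i in range(1, n + 1):
        # Option 1: emit the last nucleotide as a literal.
        best[i] = best[i - 1] + literal
        # Option 2: emit the last `length` nucleotides as a copy of a word
        # that occurs entirely inside the already encoded prefix.
        for length in range(1, i // 2 + 1):
            word = seq[i - length:i]
            if seq.find(word, 0, i - length) == -1:
                break  # a longer suffix cannot occur earlier either
            best[i] = min(best[i], best[i - length] + pointer)
    return best[n]


def significance_bound(seq: str) -> float:
    """Upper bound 2**-d on the chance of saving d bits under the null model.

    The null model is assumed to spend exactly 2 bits per nucleotide.
    """
    saved = 2 * len(seq) - min_encoding_bits(seq)
    return min(1.0, 2.0 ** -saved)
```

Under these assumptions, a microsatellite-like input such as `"AC" * 20` compresses below the 80 bits of the null model and receives a bound well under 1, while a short sequence with no repeated words, such as `"ACGT"`, costs more than its null encoding and receives the trivial bound of 1.0.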
| |
Keywords: | |
This article is indexed in Oxford and other databases. |
|