(1) Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 BT Road, 700 108 Kolkata, India
Abstract:
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences.
These techniques are based on frequency distributions of DNA words in a large sequence, and have been implemented in a software
package called SWORDS. Using sequences available in public-domain databases on the Internet, we demonstrate how SWORDS can
be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences
and assess their statistical significance.
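
As a rough illustration of what a frequency distribution of DNA words looks like in practice, the following minimal Python sketch counts overlapping words of a fixed length k in a sequence. It is not the SWORDS implementation or its interface: the function names `word_frequencies` and `relative_frequencies` are hypothetical, and the choices to use overlapping windows and to skip windows containing bases other than A, C, G and T are assumptions made for this example.

```python
from collections import Counter


def word_frequencies(sequence, k):
    """Count overlapping words of length k in a DNA sequence.

    Hypothetical helper for illustration; not part of SWORDS.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw word counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    counts = word_frequencies("ACGTACGT", 2)
    for word, freq in sorted(relative_frequencies(counts).items()):
        print(word, counts[word], round(freq, 3))
```

Comparing such distributions between two sequences, or against the frequencies expected under a chosen null model, is one plausible starting point for the kind of significance assessment described above.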