SEGMENT: identifying compositional domains in DNA sequences |
| |
Authors: | Oliver J L Román-Roldán R Pérez J Bernaola-Galván P |
| |
Institution: | Department of Genetics, Faculty of Sciences, University of Granada, Spain. oliver@ugr.es |
| |
Abstract: | MOTIVATION: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed. RESULTS: Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented on a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity. |
| |
Keywords: | |
本文献已被 PubMed 等数据库收录! |
|