A strategy for finding regions of similarity in complete genome sequences |
| |
Authors: | Vincens, P; Buffat, L; Andre, C; Chevrolat, JP; Boisvieux, JF; Hazout, S |
| |
Affiliation: | Departement de Biologie (FR 36), Ecole Normale Superieure, 46 rue d'Ulm, 75230 Paris Cedex 05, France. |
| |
Abstract: | MOTIVATION: Complete genomic sequences will become available in the future. New methods are required to handle very large sequences (beyond 100 kb) efficiently. One of the main aims of such work is to increase our understanding of genome organization and evolution, which requires studying the locations of regions of similarity. RESULTS: We present here a new tool, ASSIRC ('Accelerated Search for SImilarity Regions in Chromosomes'), for finding regions of similarity in genomic sequences. The method involves three steps: (i) identification of short exact chains of fixed size, called 'seeds', common to both sequences, using hashing functions; (ii) extension of these seeds into putative regions of similarity by a 'random walk' procedure; (iii) final selection of regions of similarity by assessing alignments of the putative sequences. We used simulations to estimate the proportion of regions of similarity that go undetected for given region sizes, base identity proportions and seed sizes. This approach can be tailored to the user's specifications. We looked for regions of similarity between two yeast chromosomes (V and IX). The efficiency of the approach was compared with that of the conventional programs BLAST and FASTA by assessing the CPU time required and the regions of similarity found for the same data set. AVAILABILITY: Source programs are freely available at the following address: ftp://ftp.biologie.ens.fr/pub/molbio/assirc.tar.gz CONTACT: vincens@biologie.ens.fr, hazout@urbb.jussieu.fr |
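Steps (i) and (ii) of the method can be illustrated with a small sketch. This is not the ASSIRC source: the abstract does not specify the hashing function or the exact rules of the 'random walk', so the sketch assumes a plain dictionary keyed by k-mers for seed finding and, for extension, an ungapped walk along the seed's diagonal in which each match adds +1 and each mismatch adds -1, stopping once the score drops more than a hypothetical `drop` threshold below its maximum (an X-drop style rule). The function names `find_seeds`, `walk` and `extend` are illustrative only, and step (iii), the final alignment assessment, is not shown.

```python
from collections import defaultdict


def find_seeds(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Step (i): list every (i, j) with a[i:i+k] == b[j:j+k], using a hash table."""
    index: dict[str, list[int]] = defaultdict(list)  # k-mer -> start positions in a
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        # .get() avoids inserting empty entries for k-mers absent from a
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return seeds


def walk(a: str, b: str, ia: int, ib: int, delta: int, drop: int) -> int:
    """Score walk along one diagonal: +1 per match, -1 per mismatch (assumed scoring).

    Stops when the score falls more than `drop` below its maximum (or a
    sequence ends) and returns the number of steps at which the maximum
    was reached, i.e. the length of the best extension in this direction.
    """
    score = best = best_len = steps = 0
    while 0 <= ia < len(a) and 0 <= ib < len(b):
        score += 1 if a[ia] == b[ib] else -1
        steps += 1
        if score > best:
            best, best_len = score, steps
        elif score < best - drop:
            break
        ia += delta
        ib += delta
    return best_len


def extend(a: str, b: str, i: int, j: int, k: int, drop: int = 3) -> tuple[int, int, int]:
    """Step (ii): grow seed (i, j) in both directions; return (start_a, start_b, length)."""
    right = walk(a, b, i + k, j + k, +1, drop)
    left = walk(a, b, i - 1, j - 1, -1, drop)
    return i - left, j - left, k + left + right


if __name__ == "__main__":
    a = "TTTTACGTACGTAAGG"
    b = "CCACGTACGTAACC"
    print((4, 2) in find_seeds(a, b, k=6))  # True
    print(extend(a, b, 4, 2, k=6))          # (4, 2, 10): a[4:14] == b[2:12] == "ACGTACGTAA"
```

Hashing makes seed lookup roughly linear in sequence length instead of comparing every pair of positions, which is the property that matters for chromosome-sized inputs; a real tool would also merge seeds lying on the same diagonal before extending them.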
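The simulation idea mentioned in the abstract (estimating how many regions of similarity are missed for a given region size, base identity and seed size) can be sketched as a Monte Carlo estimate. The abstract does not describe the authors' simulation model, so this sketch assumes that each position of a region matches independently with probability `identity`, that there are no insertions or deletions, and that a region is missed when it contains no exact run of `k` matching bases (no seed). The names `has_seed` and `miss_rate` are hypothetical.

```python
import random


def has_seed(length: int, identity: float, k: int, rng: random.Random) -> bool:
    """Simulate one region; True if it contains k consecutive matching bases."""
    run = 0
    for _ in range(length):
        if rng.random() < identity:  # position matches with probability `identity`
            run += 1
            if run >= k:
                return True
        else:
            run = 0
    return False


def miss_rate(length: int, identity: float, k: int, trials: int = 10_000, seed: int = 0) -> float:
    """Estimated fraction of regions with no seed of size k (i.e. undetected)."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    missed = sum(not has_seed(length, identity, k, rng) for _ in range(trials))
    return missed / trials


if __name__ == "__main__":
    for k in (8, 10, 12):
        print(k, miss_rate(length=200, identity=0.7, k=k))
```

Under these assumptions, larger seeds make hashing more selective (fewer spurious hits) but raise the miss rate for divergent regions, which is the trade-off such simulations help a user tune.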
| |
Keywords: | |
This article is indexed in databases including Oxford.
|