首页 | 本学科首页   官方微博 | 高级检索  
     


Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps
Authors:Aires-de-Sousa João  Aires-de-Sousa Luisa
Affiliation:Departamento de Química, CQFB, campus Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2829-516 Monte de Caparica Clínica, Portugal. jas@fct.unl.pt
Abstract:MOTIVATION: We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. RESULTS: To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. AVAILABILITY: Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
Keywords:
本文献已被 PubMed Oxford 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号