A symbolic-numeric approach to find patterns in genomes. Application to the translation initiation sites of E. coli |
| |
Authors: | Delamarche C, Guerdoux-Jamet P, Gras R, Nicolas J |
| |
Affiliation: | UPRES-A 6026 CNRS, équipe 'Canaux et Récepteurs Membranaires', bâtiment 13, Campus de Beaulieu, 35042 Rennes cedex, France. |
| |
Abstract: | DNA sequence data provided by genome sequencing programs open new research prospects. In this respect, computational investigations are of major importance for discovering new 'functional/structural patterns' and for improving knowledge of biological processes. For example, even though the principal steps of translation initiation in prokaryotes are known, it is difficult to pinpoint the exact pattern in the mRNA that is recognized by the ribosome. In this study, we carried out a systematic context analysis of the complete genome of E. coli around codons in competition for translation initiation. Using a combinatorial approach, we first show that it is possible to accurately define the initiation site by looking for the localization of patterns representing various combinations of trinucleotides. We then combined this approach with a statistical analysis based on the frequencies of these patterns. This leads to a decision tree able to discriminate true from false starts with a recognition level near 90%. Our method may help to precisely localize the beginning of open reading frames and to point to likely mistakes for some genes in the database. The method may be included as a component of a gene recognition system, is not restricted to a particular genome or to two-class discrimination, and may be applied to a broader class of biological patterns. |
| |
Keywords: | |
This article is indexed in PubMed and other databases. |
|
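
The context analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the set of candidate start codons (ATG, GTG, TTG), the window sizes, and all function names are assumptions chosen for illustration. The sketch only shows how candidate initiation codons might be located and how the trinucleotides around each one could be indexed by their position relative to the codon, so that pattern frequencies can later be compared between true and false starts.

```python
from collections import Counter

# Assumption: the abstract does not list the competing codons;
# ATG, GTG and TTG are the start codons commonly used in E. coli.
START_CODONS = ("ATG", "GTG", "TTG")


def candidate_starts(seq):
    """Return the 0-based position of every candidate start codon in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] in START_CODONS]


def trinucleotide_context(seq, pos, upstream=6, downstream=6):
    """Map relative offset -> overlapping trinucleotide around a candidate.

    Offset 0 is the first base of the candidate codon at position pos.
    The window sizes are hypothetical defaults, not values from the paper.
    """
    seq = seq.upper()
    lo = max(0, pos - upstream)
    hi = min(len(seq), pos + 3 + downstream)
    return {i - pos: seq[i:i + 3] for i in range(lo, hi - 2)}


def pattern_counts(seq, positions, **window):
    """Count (offset, trinucleotide) pairs over a set of candidate sites."""
    counts = Counter()
    for pos in positions:
        counts.update(trinucleotide_context(seq, pos, **window).items())
    return counts
```

Counts computed separately for annotated (true) starts and for the remaining (false) candidates would give the kind of positional frequency features that a decision tree could split on; the tree construction itself is not specified in the abstract and is left out here.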