ABSTRACT

BACKGROUND: Understanding the history of life requires that we understand the transfer of genetic material across phylogenetic boundaries. Detecting genes that were acquired by means other than vertical descent is a basic step in that process. Detection by discordant phylogenies is computationally expensive and not always definitive, so many researchers have instead used easily computed compositional features. However, different compositional methods produce different predictions, and the effectiveness of any one method is not well established.

RESULTS: The ability of octamer frequency comparisons to detect genes artificially seeded into cyanobacterial genomes was markedly increased by using, as the training set, genes that are highly conserved across all bacteria. Using a subset of octamer frequencies in such tests also increased effectiveness, but the gain depended on the specific target genome and on the source of the contaminating genes. The presence of high-frequency octamers and the GC content of the contaminating genes were important considerations. A method combining the best practices from these tests, the Core Gene Similarity (CGS) method, was devised; it performed better than simple octamer frequency analysis, codon bias, or GC contrasts in detecting seeded genes or naturally occurring transposons. A comparison of predictions with phylogenetic trees suggests that the method is effective only for horizontal transfer events that occurred recently in evolutionary time.

CONCLUSIONS: The CGS method may be an improvement over existing surrogate methods for detecting genes of foreign origin.
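To make the idea of an octamer frequency comparison against a training set more concrete, the following Python sketch pools octamer counts over a set of training genes (for example, highly conserved core genes) and scores a query gene by how typical its octamers are of that pool. This is an illustrative assumption, not the CGS procedure, whose scoring details are not given here: the smoothed average log-probability score and the helper names `kmer_counts`, `train_profile`, and `mean_log_prob` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

K = 8  # word length: octamers

def kmer_counts(seq: str, k: int = K) -> Counter:
    """Count overlapping k-mers, skipping any window with a non-ACGT character."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

def train_profile(training_genes: Iterable[str], k: int = K) -> Counter:
    """Pool k-mer counts over the training set (e.g. conserved core genes)."""
    profile: Counter = Counter()
    for gene in training_genes:
        profile.update(kmer_counts(gene, k))
    return profile

def mean_log_prob(gene: str, profile: Counter, k: int = K,
                  pseudocount: float = 1.0) -> float:
    """Average log-probability of the gene's k-mers under the pooled profile.

    Hypothetical scoring rule (an assumption, not the CGS method): each of
    the 4**k possible words gets a pseudocount so unseen words keep a
    nonzero probability. Lower values mean the gene's composition is less
    like the training set, i.e. a stronger candidate for foreign origin.
    """
    denom = sum(profile.values()) + pseudocount * 4 ** k
    counts = kmer_counts(gene, k)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("gene has no valid k-mers")
    log_sum = sum(c * math.log((profile[w] + pseudocount) / denom)
                  for w, c in counts.items())
    return log_sum / n

if __name__ == "__main__":
    # Toy data only: a repetitive "core" sequence and two query genes.
    core_genes = ["ATGGCTAGCTAGGCTAGCTAGGATCGATCG" * 10]
    profile = train_profile(core_genes)
    native_like = core_genes[0][:100]
    gc_rich = "GGGGCCCC" * 12
    for name, seq in [("native_like", native_like), ("gc_rich", gc_rich)]:
        print(name, round(mean_log_prob(seq, profile), 3))
```

Ranking genes by this score puts the most compositionally atypical ones first; a practical screen would still need a threshold, which could be calibrated on artificially seeded genes as in the tests described above.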