Phylogeny-driven target selection for large-scale genome-sequencing (and other) projects
Authors: | Markus Göker, Hans-Peter Klenk |
Affiliation: | Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany |
Abstract: | Despite the steadily decreasing costs of genome sequencing, prioritizing organisms for sequencing remains important in large-scale projects. Phylogeny-based selection is of interest because it identifies the organisms whose genomes can be expected to differ most from those that have already been sequenced. Here, we describe a computationally simple method, easy to apply in practice, that infers a phylogenetic scoring independent of which set of organisms has previously been targeted. The scoring itself, as well as the pre- and post-processing of the data, is illustrated using two real-world examples in which the method has already been applied to select targets for genome sequencing: phase I of the JGI CSP Genomic Encyclopedia of Bacteria and Archaea, targeting 1,000 type strains, and, on a smaller scale, the phylogenomics of the Roseobacter clade. Potential artifacts of the method are discussed, and the method is compared with a selection approach based on taxonomic classification. |
Keywords: | phylogenetic diversity, genomics, taxon selection, 16S rRNA, tree of life, Genomic Encyclopedia, Roseobacter clade |
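The abstract does not spell out how the phylogenetic scoring is computed. As a hedged illustration only, the sketch below computes the fair-proportion (evolutionary distinctiveness) index. This is a well-known per-organism score derived from a rooted tree alone, so, like the scoring described above, it does not depend on which organisms have already been targeted. It is a swapped-in technique and not necessarily the authors' method. The `Node` class, the function names, and the three-leaf example tree are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical rooted-tree node; `length` is the branch leading to this node."""
    name: str | None = None
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf_counts(node: Node, counts: dict[int, int]) -> int:
    """Record, for every node (keyed by id), how many leaves lie below it."""
    if not node.children:
        n = 1
    else:
        n = sum(leaf_counts(child, counts) for child in node.children)
    counts[id(node)] = n
    return n


def fair_proportion_scores(root: Node) -> dict[str, float]:
    """Fair-proportion index: each branch length is shared equally among the
    leaves below that branch, and a leaf's score is the sum of its shares
    along the path from the root. The scores sum to the total tree length."""
    counts: dict[int, int] = {}
    leaf_counts(root, counts)
    scores: dict[str, float] = {}

    def walk(node: Node, acc: float) -> None:
        acc += node.length / counts[id(node)]
        if not node.children:
            scores[node.name] = acc
        for child in node.children:
            walk(child, acc)

    walk(root, 0.0)
    return scores


# Hypothetical example: A and B are close relatives, C is an isolated lineage.
tree = Node(children=[
    Node(length=0.1, children=[Node("A", 0.1), Node("B", 0.1)]),
    Node("C", 0.5),
])
scores = fair_proportion_scores(tree)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # C
```

In this toy tree, C receives the highest score because its long branch is not shared with any other organism. A and B split their common ancestral branch, so each scores lower. Sorting organisms by such a score gives a ranking in which the most phylogenetically isolated lineages come first.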