PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species |
| |
Authors: | Derrick E. Fouts, Lauren Brinkac, Erin Beck, Jason Inman, Granger Sutton |
| |
Affiliation: | J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA |
| |
Abstract: | The Pan-genome Ortholog Clustering Tool (PanOCT) performs pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters, which homology-only clustering methods cannot do. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters on which the methods disagreed were inspected for evidence of correctness, resulting in 85 high-confidence, manually curated clusters that were used to compare all four methods. |
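The abstract's central idea, using conserved gene neighborhood to choose between paralogs that homology alone cannot separate, can be illustrated with a toy sketch. This is not PanOCT's actual algorithm or scoring scheme; the function names (`neighbors`, `neighborhood_score`, `best_ortholog`), the window size, the family labels, and the two example genomes are all hypothetical, and genomes are treated as linear gene lists for simplicity.

```python
from typing import Dict, List, Optional, Set


def neighbors(genome: List[str], gene: str, window: int = 2) -> Set[str]:
    """Return the genes within `window` positions of `gene` (linear genome, assumed)."""
    i = genome.index(gene)
    lo = max(0, i - window)
    return set(genome[lo:i] + genome[i + 1:i + 1 + window])


def neighborhood_score(genome_a: List[str], gene_a: str,
                       genome_b: List[str], gene_b: str,
                       family: Dict[str, str], window: int = 2) -> int:
    """Count homolog families shared by the flanking genes of gene_a and gene_b."""
    fams_a = {family[g] for g in neighbors(genome_a, gene_a, window)}
    fams_b = {family[g] for g in neighbors(genome_b, gene_b, window)}
    return len(fams_a & fams_b)


def best_ortholog(genome_a: List[str], gene_a: str, genome_b: List[str],
                  family: Dict[str, str], window: int = 2) -> Optional[str]:
    """Among homologs of gene_a in genome_b, pick the one with the most conserved neighborhood."""
    candidates = [g for g in genome_b if family[g] == family[gene_a]]
    return max(
        candidates,
        key=lambda g: neighborhood_score(genome_a, gene_a, genome_b, g, family, window),
        default=None,
    )


# Hypothetical data: strain B carries two copies (b3, b6) of family F3.
strain_a = ["a1", "a2", "a3", "a4", "a5"]
strain_b = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
family = {
    "a1": "F1", "a2": "F2", "a3": "F3", "a4": "F4", "a5": "F5",
    "b1": "F1", "b2": "F2", "b3": "F3", "b4": "F4", "b5": "F9",
    "b6": "F3", "b7": "F8", "b8": "F7",
}

score_b3 = neighborhood_score(strain_a, "a3", strain_b, "b3", family)
score_b6 = neighborhood_score(strain_a, "a3", strain_b, "b6", family)
chosen = best_ortholog(strain_a, "a3", strain_b, family)
print(chosen)  # prints b3
```

In this toy data, both b3 and b6 are homologs of a3, so a homology-only method has no basis for choosing between them. The flanking genes of b3 share three families with those of a3, while those of b6 share only one, so the neighborhood comparison selects b3.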