Using Homolog Groups to Create a Whole-Genomic Tree of Free-Living Organisms: An Update |
| |
Authors: | Christopher H. House (1), Sorel T. Fitz-Gibbon (2) |
| |
Affiliation: | (1) Penn State Astrobiology Research Center and Department of Geosciences, Pennsylvania State University, 212 Deike Building, University Park, PA 16802, USA; (2) IGPP Center for Astrobiology and Department of Microbiology and Molecular Genetics, University of California, Los Angeles, CA 90095-1489, USA |
| |
Abstract: | Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor on the specific alignment used to compare gene sequences, because the protein-encoding gene families are formed by grouping any proteins with a pairwise similarity score greater than a preset value. Because of this all-inclusive grouping, the method is resilient to some effects of lateral gene transfer: a transfer is masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA, with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma species included, fall to the base of the Bacterial domain, a result expected given the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.
Received: 30 May 2001 / Accepted: 10 October 2001 |
| |
Keywords: | Homologs, Tree of life, Genome, Archaea, Bacteria |
This article is indexed in PubMed, SpringerLink, and other databases. |
|
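The grouping and character-coding steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the families are formed by single-linkage grouping (any two proteins scoring above the threshold end up in the same family), and every name in it (`homolog_groups`, `presence_absence`, `partial_genome`, the toy protein identifiers and scores) is hypothetical.

```python
import random
from collections import defaultdict


def homolog_groups(proteins, scores, threshold):
    """Group proteins by single linkage (an assumption, see lead-in):
    any pair with a similarity score above `threshold` is merged."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, score in scores:
        if score > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return list(groups.values())


def presence_absence(groups, genome_of):
    """Code each homolog group as one character: 1 if the genome
    contains at least one member of the group, otherwise 0."""
    genomes = sorted(set(genome_of.values()))
    matrix = {g: [] for g in genomes}
    for group in groups:
        present = {genome_of[p] for p in group}
        for g in genomes:
            matrix[g].append(1 if g in present else 0)
    return matrix


def partial_genome(orfs, fraction, seed=None):
    """Draw a random subset of ORFs to mimic an artificial partial genome."""
    rng = random.Random(seed)
    k = max(1, round(len(orfs) * fraction))
    return rng.sample(sorted(orfs), k)


# Toy data (hypothetical): five proteins from three genomes A, B, C.
proteins = ["A1", "A2", "B1", "B2", "C1"]
genome_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}
scores = [("A1", "B1", 120.0), ("A1", "A2", 95.0),
          ("B2", "C1", 80.0), ("A2", "C1", 10.0)]

groups = homolog_groups(proteins, scores, threshold=50.0)
matrix = presence_absence(groups, genome_of)
```

In this toy case genome A already holds two members of the first group, so acquiring another homolog of that group by lateral transfer would leave its character state at 1, which is the masking effect the abstract describes. The resulting 0/1 matrix is the kind of character dataset a tree-building program would take as input.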