The Genome of Thermosipho africanus TCF52B: Lateral Genetic Connections to the Firmicutes and Archaea
Authors: Camilla L Nesbø, Eric Bapteste, Bruce Curtis, Håkon Dahle, Philippe Lopez, Dave Macleod, Marlena Dlutek, Sharen Bowman, Olga Zhaxybayeva, Nils-Kåre Birkeland, W Ford Doolittle
Institution: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada (1); UMR CNRS 7138 Systématique, Adaptation et Evolution, Université Pierre et Marie Curie, Paris, France (2); Centre for Geobiology, University of Bergen, Bergen, Norway (3); Department of Process Engineering and Applied Science, Dalhousie University, Halifax, Nova Scotia B3J 1Z1, Canada (4); Department of Biology, University of Bergen, Bergen, Norway (5).
Abstract: Lateral gene transfers (LGT) (also called horizontal gene transfers) have been a major force shaping the Thermosipho africanus TCF52B genome, whose sequence we describe here. Firmicutes emerge as the principal LGT partner. Twenty-six percent of phylogenetic trees suggest LGT with this group, while 13% of the open reading frames indicate LGT with Archaea.

Thermosipho africanus TCF52B was isolated from produced fluids of a high-temperature oil reservoir in the North Sea using fish waste as the only substrate (4). Phylogenetic analyses based on the 16S rRNA gene sequence and DNA-DNA hybridization placed it as a strain of Thermosipho africanus, which was first isolated from a shallow marine hydrothermal system in Djibouti, Africa (8, 21).

The complete genome sequence of this strain was determined by the conventional whole-genome shotgun strategy. Genomic libraries containing 1- to 4-kb and 40-kb fragments were constructed, and sequence chromatograms were produced using a MegaBACE 1000 capillary DNA sequencer (GE Healthcare). Nucleotide skews were computed as described previously (11). Automated open reading frame (ORF) identification and annotation were performed using the annotation software Manatee made available by TIGR (23). Pseudogenes were identified by BLAST searches of neighboring ORFs with the same or similar annotations and by using the program Psi-phi (9, 10), and clustered regularly interspaced short palindromic repeat loci (CRISPRs) were identified using the web site http://crispr.u-psud.fr/crispr/CRISPRHomePage.php with the default parameters (6). Maximum-likelihood (ML) trees (WAG + Γ + I model, four categories) were constructed from protein-coding ORFs using PHYML and the PhyloGenie package (5). Recently, several Thermotogales genomes have become available in GenBank. As these genomes had not been published yet, we did not include them in any “genome-scale” analyses (i.e., the phylogenetic analyses).
We did, however, include them in the BLAST analyses of mobile Thermosipho africanus genes.

The genome of Thermosipho africanus strain TCF52B is a single circular chromosome consisting of 2,016,657 bp with an average G+C content of 30.8%. Strand asymmetries, such as GC skew and tetramer skews, are pronounced and show two clear singularity points, located at roughly 8 kb and 1,033 kb from the +1 site (see Fig. S1 in the supplemental material). Since these two points are diametrically opposed on the circular chromosome, dividing it into two halves with opposite compositional skews, they make good candidates for the putative origin and termination of replication. The 1,033-kb region is likely to harbor the origin, since GC skew becomes positive past this location, as in most bacterial genomes with a known origin.

The genome contains 2,000 potential coding sequences, of which 1,913 are putative protein-coding ORFs, 30 are putatively assigned as pseudogenes, and 57 encode RNA. A comparison to the genome of Thermotoga maritima is given in Table 1. The Thermosipho africanus genome is about 156 kb larger than the Thermotoga maritima genome and carries 36 more ORFs. The genome contains duplicated regions comprising paralogous gene copies, CRISPRs, and mobile genetic elements, which collectively provide considerable indirect evidence for genomic instability and acquisition of exogenous genetic information.
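
The strand-asymmetry argument can be illustrated with a windowed GC skew, (G - C)/(G + C), accumulated along the chromosome. The following is a minimal sketch and not the procedure of reference 11; the window size, the function names, and the toy sequence are assumptions made for illustration. In this toy case the cumulative curve reaches its minimum where the skew turns positive (the candidate origin) and its maximum at the opposite switch point (the candidate terminus).

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_extrema(skews, window=1000):
    """Return approximate (origin, terminus) coordinates from the cumulative skew."""
    total, cumulative = 0.0, []
    for s in skews:
        total += s
        cumulative.append(total)
    # The cumulative value after window i corresponds to coordinate (i + 1) * window.
    ori_idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    ter_idx = max(range(len(cumulative)), key=cumulative.__getitem__)
    return (ori_idx + 1) * window, (ter_idx + 1) * window


# Toy "chromosome" (hypothetical data): a C-rich half followed by a G-rich half.
toy = "ACCT" * 500 + "AGGT" * 500  # 4,000 bp
skews = gc_skew(toy, window=500)
origin, terminus = cumulative_extrema(skews, window=500)

# Reported singularity points (in kb) on the 2,016.657-kb chromosome:
# a value close to 0.5 means the two points are diametrically opposed.
fraction_apart = (1033 - 8) / 2016.657
```

On a real genome the windows would be much larger than in this toy case, and a sliding window or smoothing step may be needed before the two extrema stand out clearly.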

TABLE 1. General features of the Thermosipho africanus genome, with a comparison to Thermotoga maritima

Feature                                      Thermosipho africanus                Thermotoga maritima
Length of sequence (bp)                      2,016,657                            1,860,725
G+C content (%)                              30.8                                 46
No. of:
    ORFs                                     1,913                                1,877
    Pseudogenes (disrupted reading frame)    30 (17 transposase and integrases)   3 (1 transposase) (28 according to http://www-bio3d-igbmc.u-strasbg.fr/ICDS/)
    rRNAs                                    3 16S-23S-5S                         1 16S-23S-5S
    tRNAs                                    48 (11 clusters, 19 single genes)    46 (10 clusters, 19 single genes)
CRISPR direct repeats
    CRISPR 1, 2, 4                           GTTTAGAATCTACCTATGAGGAATGAAAAC       TTTCCATACCTCTAAGGAATTATTGAAACA
    CRISPR 3, 5, 6, 7, 11                    GTTTTCATTCCTCATAGGTAGATTCTAAAC
    CRISPR 8, 9, 12                          RTTTCAATTCCTRCAAGGTAAGGTACAAAC
    CRISPR 10                                GTTTCAATCCCTAATAGGTATGCTAAAAAC
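
As a quick consistency check on the direct repeats in Table 1, the sketch below records the four Thermosipho africanus repeat groups and tallies the loci they cover and the repeat lengths. It assumes that the 60-character string in the first CRISPR row is two 30-bp cells run together and uses only the first 30 bp as the Thermosipho africanus entry. The names `REPEATS`, `IUPAC`, and `expand` are hypothetical. R is the IUPAC code for A or G.

```python
from itertools import product

# Direct repeats from Table 1, keyed by the CRISPR loci that share them.
# Assumption: the first entry is the first 30 bp of the 60-character string in the table.
REPEATS = {
    (1, 2, 4): "GTTTAGAATCTACCTATGAGGAATGAAAAC",
    (3, 5, 6, 7, 11): "GTTTTCATTCCTCATAGGTAGATTCTAAAC",
    (8, 9, 12): "RTTTCAATTCCTRCAAGGTAAGGTACAAAC",
    (10,): "GTTTCAATCCCTAATAGGTATGCTAAAAAC",
}

# Only the IUPAC codes that occur in Table 1 are listed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def expand(seq):
    """List every concrete sequence encoded by a repeat containing IUPAC codes."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]


n_groups = len(REPEATS)                                # number of repeat groups
n_loci = sum(len(loci) for loci in REPEATS)            # CRISPR loci covered
lengths = {len(seq) for seq in REPEATS.values()}       # distinct repeat lengths (bp)
variants = expand(REPEATS[(8, 9, 12)])                 # concrete forms of the degenerate repeat
```

Under that assumption the table accounts for 12 loci in four groups, and the two R positions in the repeat shared by CRISPR 8, 9, and 12 give four possible concrete sequences.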
CRISPR structures comprise direct genomic repeats of 24 to 47 bp in length separated by variable-length spacers (1, 13, 22) and are thought to function as a prokaryotic “immune system.” Due to their patchy distribution in prokaryotes, CRISPRs are often assumed to undergo frequent lateral transfer. Thermosipho africanus displays 12 CRISPRs spread over its chromosome (Fig. 1), compared to 8 such loci in Thermotoga maritima (15). These 12 CRISPRs fall into four groups based on the sequence of their direct repeats (Table 1). CRISPR-associated proteins, encoded by CRISPR-associated (Cas) genes near CRISPR repeats, function somehow in CRISPR biology, and Cas gene phylogenies provide some of the most compelling evidence for CRISPR mobility (7). In Thermosipho africanus the phylogenetic origins of the Cas genes appear to be especially complex. Most interestingly, they do not show strong affinities with other Thermotogales sequences. Although Thermotoga maritima MSB8 harbors many Cas genes (26 in reference 7), in almost every case the Thermosipho africanus and Thermotoga maritima genes do not branch together in ML trees; they are sisters in only 3 of 25 trees (Thermosipho africanus has 30 Cas genes).

FIG. 1. Distribution of CRISPR loci and mobile elements along the Thermosipho africanus genome, as well as phylogenetic “affiliation” of genes along the chromosome and the GC contents of genes. Outer circle, phylogenetic affiliation of the sister of Thermosipho africanus in phylogenetic trees estimated from predicted ORFs. The following color coding for the sister in the phylogenetic tree was used: green, self; red, Thermotogales; yellow, Firmicutes; blue, Archaea; orange, “others” as defined in Fig. 2; pink, complex; gray, complex including Thermotogales; light blue, no tree. Second and third circles, distribution of the mobile elements along the Thermosipho africanus chromosome.
Mobile elements in forward orientation are indicated in red, and mobile elements in reverse orientation are indicated in blue. Fourth circle, distribution of CRISPRs and Cas genes along the genome. CRISPR repeats are in green, and Cas genes are in purple. Innermost circle, distribution of gene GC content. Genes having a GC content above the mean are in red, while those with a GC content below the mean are in green. The three spikes in GC content correspond to rRNA operons.

Seventy-eight ORFs were annotated as encoding transposases or integrases, and at least 61 of these are likely to be active genes (Fig. 1). (In contrast, the Thermotoga maritima genome contains only 12 ORFs annotated as encoding transposases.) All 78 fall into one of eight groups of highly similar sequences, and each of the 78 is sister to another (see Table S1 in the supplemental material), indicating recent intragenomic transposition and/or lateral gene transfers (LGT) from a closely related lineage. Remarkably, only four of these eight families had homologs in other Thermotogales genomes, and there are no homologs in its closest relative, Thermosipho melanesiensis (see Table S1 in the supplemental material). We did, however, detect likely inactive homologs in Thermosipho melanesiensis for three of the groups (see Table S1 in the supplemental material).

We attempted to calculate ML phylogenetic trees from each of the 1,913 ORFs and obtained trees from 1,578 (82%), using the PhyloGenie package. The distribution of the “immediate sisters” (nearest neighbors) of Thermosipho africanus in the trees is shown in Fig. 2. In 60% of the trees the sister was another Thermotogales bacterium, in most cases Thermotoga maritima, since this was the only other complete Thermotogales genome included in the analysis. For 9% of the treeable ORFs, the sister gene originated from within its own genome.

FIG. 2. Distribution of Thermosipho africanus sister taxon or clade in 1,578 phylogenetic trees for potentially protein-coding ORFs. “Other group” means that the organism(s) in the sister group belonged to a taxonomic group that was not Thermotogales, Firmicutes, or Archaea. “Complex” means that the sister clade was composed of organisms from several different taxonomic groups, and “complex including Thermotogales” means that another Thermotogales sequence was included in this clade.

The phylogenetic analysis revealed that 58 ORFs (3.7%) had Archaea as immediate sister in the tree. This is considerably lower than the 24% first reported for the Thermotoga maritima genome (16). A lower value was to be expected, for two reasons. First, growth of the bacterial gene and genome data has outpaced that for Archaea, so that bacterial best hits to patchily distributed genes with ambiguous phylogenetic signals have become differentially more likely. Second, the Thermotoga maritima genome will itself be sister for all or most Thermosipho africanus genes that were transferred prior to their divergence and are still present in both.

We therefore visually inspected each of the trees in order to also obtain information on LGT that predate the split between Thermosipho and Thermotoga (see Fig. S2 in the supplemental material). This also allowed us to detect transfers where the genes involved have later been duplicated in the Thermosipho africanus genome (so that the sister in the tree was another Thermosipho africanus gene). This analysis suggested that a total of 202 ORFs (∼13%) have been involved in LGT with Archaea (including both ancient and recent events). Among these, 125 (∼62%) also involve Thermotoga maritima, while 77 (∼38%) have no close homolog in Thermotoga maritima.
This latter number is of course an overestimate of the number of potential recent transfers, as many of the transferred genes might have been lost by Thermotoga maritima MSB8, but these numbers do suggest that LGT between the Thermotogales and the Archaea is still an ongoing process. Thermophilic Archaea such as members of the genera Archaeoglobus (2) and Thermococcus (3, 14) are among the few other organisms considered to be native to oil reservoirs, the habitat from which this strain was isolated (4). Moreover, a recent reanalysis of the Thermotoga maritima genome reported 11.3% archaeal genes in this genome, consistent with our findings (20).

A large proportion of the ORFs have a close phylogenetic relationship with Firmicutes, with 8% of the ORFs having Firmicutes as sister in the tree (Fig. 2). This connection has also been observed earlier in phylogenetic analyses (17, 19, 20). To further investigate this, we performed the same analysis of the trees in which Thermosipho africanus clusters with Firmicutes as we did for Archaea (see Fig. S3 in the supplemental material). In total there are 417 (26%) trees that suggest LGT between these lineages. For 244 (58.5%) of these trees the LGT predated the Thermosipho/Thermotoga split, as there was also a close homolog in Thermotoga maritima MSB8, while there was no close Thermotoga maritima homolog in 173 (41.5%) of the trees. Moreover, Thermotogales and Firmicutes were sisters, rather than nested one within the other, in 62 (3.9%) of the trees. One could interpret this as evidence that these two phyla are indeed sisters or that there has been substantial transfer between them, though the true phylogenetic position of the Thermotogales is elsewhere (likely deeper) in the tree.
Alternatively, of course, the notion of a unique “true” phylogenetic position could be questioned.

A high level of LGT between Thermotogales and Firmicutes might in any case be expected, since some members of the Firmicutes, e.g., the Thermoanaerobales, frequently cohabit with Thermotogales in natural environments. For instance, Thermotogales and the Firmicutes genera Thermoanaerobacter and Desulfotomaculum are the only bacteria thought to be indigenous to oil reservoirs (4, 12, 18). Moreover, most of the mobile elements found scattered in the Thermosipho africanus genome seem to have recently originated from Firmicutes, further supporting the importance of LGT between these lineages.
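
The proportions quoted in the tree analysis can be recomputed directly from the counts given in the text. The sketch below is only an arithmetic check under the assumption that the percentages for Archaea and Firmicutes are taken relative to the 1,578 treeable ORFs; the helper name `pct` and the variable names are hypothetical.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


ORFS = 1913   # putative protein-coding ORFs
TREES = 1578  # ORFs for which an ML tree was obtained

treeable = pct(TREES, ORFS)            # reported as 82%

# Archaea
archaea_sister = pct(58, TREES)        # reported as 3.7%
archaea_lgt = pct(202, TREES)          # reported as ~13%
archaea_shared = pct(125, 202)         # also in Thermotoga maritima, ~62%
archaea_unshared = pct(77, 202)        # no close T. maritima homolog, ~38%

# Firmicutes
firmicutes_lgt = pct(417, TREES)       # reported as 26%
firmicutes_shared = pct(244, 417)      # reported as 58.5%
firmicutes_unshared = pct(173, 417)    # reported as 41.5%
sister_phyla = pct(62, TREES)          # reported as 3.9%

# The two subsets should add up to the reported totals.
subsets_consistent = (125 + 77 == 202) and (244 + 173 == 417)
```

With 1,578 as the denominator, each recomputed value rounds to the figure reported in the text, which supports reading the percentages as fractions of the treeable ORFs rather than of all 1,913 ORFs.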