Evolutionary Dynamics of Clustered Irregularly Interspaced Short Palindromic Repeat Systems in the Ocean Metagenome |
| |
Authors: | Valery A. Sorokin, Mikhail S. Gelfand, Irena I. Artamonova |
| |
Affiliation: | N. I. Vavilov Institute of General Genetics, Russian Academy of Sciences, ul. Gubkina 3, Moscow 119991 (1); Faculty of Bioengineering and Bioinformatics, M. V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow 119992 (2); A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny Pereulok 19, Moscow 127994, Russia (3) |
| |
Abstract: | Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.

Prokaryotes are highly diverse (33). One of the explanations of this diversity is the high extinction rate, due to genetic aggression, which leads to the clearance of ecological niches and, as a result, may allow new prokaryotic species to emerge. 
In the absence of host defense, viral infection of prokaryotic colonies results in colony extinction or the fixation of a fraction of the invader's genetic material in the host genome, profoundly affecting the life cycle of the host (32). Thus, bacteria and archaea have developed various kinds of defense mechanisms to resist this pressure; the best studied of these mechanisms are the restriction-modification systems (4).

Along with well-known prokaryotic defense mechanisms, such as rapid evolution of cell receptors or the use of restriction-modification or toxin-antitoxin systems (see, e.g., references 6, 21, and 25), newly discovered clustered regularly interspaced short palindromic repeat (CRISPR) systems seem to play an important role in protecting the cell from archaeal virus or bacteriophage assaults (reviewed in reference 36). A typical CRISPR system is a genetic locus comprising CRISPR-associated (cas) genes coding for proteins of several distinct functional classes (8, 19, 29) and a CRISPR cassette. A CRISPR cassette is formed by almost identical direct repeats with an average length of 32 nucleotides (nt), which are separated by similarly sized, unique spacers. A considerable proportion of spacers is similar to known phage or virus sequences, suggesting that the system is involved in antivirus defense (8, 29, 31). This involvement was experimentally demonstrated when a CRISPR system was shown to be essential for cell survival after invasion by foreign DNA (5). The mechanism is thought to be analogous to eukaryotic RNA interference (29), but it has not yet been characterized in detail.

CRISPR cassettes retain information that could be used to reveal the evolutionary history of individual systems. First, it has been shown that CRISPR-associated genes could be divided into eight subtypes according to operon organization and gene phylogeny (19). Second, the repeats of different CRISPR cassettes may be similar, which might indicate a common origin of such cassettes. 
The first attempt to cluster CRISPR cassettes by the similarity of repeat sequences resulted in 12 clusters (27). In that study, the cassettes were obtained by the application of PILER-CR to completely sequenced genomes. Third, pairwise comparison of spacers could also reveal the specific evolutionary history of individual CRISPR cassettes.

So far, most large-scale studies of CRISPR systems have been restricted to well-studied organisms with completely sequenced genomes (5, 9, 20, 28, 30). However, the dynamic interaction between viruses or phages and microorganisms in natural environments is of particular interest (2, 10, 15, 23, 35, 38, 40-42). It may be studied using CRISPRs in a metagenome, that is, sequenced DNA fragments collected in one geographical location and therefore representing one ecological niche with all its inhabitants. This approach is interesting for two reasons. First, metagenomic samples provide a common census of coexisting organisms, i.e., in many cases, both the infecting viruses and phages and their victims. Second, most bacteria and archaea from metagenomic samples cannot be cultivated, and hence little is known about their CRISPR systems.

To date, three studies have considered host-virus interactions in metagenomes. One study used two thermophilic Synechococcus isolates from microbial mats in hot springs at Yellowstone National Park to demonstrate fast coevolution of the host and phage genomes (22). Two studies described archaeal and bacterial interactions with viruses and phages, respectively, in acidophilic biofilms (2, 39). All environmental communities analyzed so far are extreme and are dominated by few species. Natural samples containing many diverse coexisting organisms may arguably be more interesting.

The largest available metagenome, produced by the Sorcerer II Global Ocean Sampling (GOS) expedition, comprises samples of genetic material collected from more than 50 geographical locations of the Pacific and Atlantic oceans (34). 
This variety provides an opportunity to study the evolution of phage-host interactions reflected in CRISPRs.

Three algorithms, PILER-CR (14), the CRISPR recognition tool (CRT) (7), and CRISPRFinder (18), have been developed as tools for the discovery of new CRISPR cassettes. All these algorithms define candidate CRISPR cassette sequences as short direct repeats separated by short unique spacers; they then use a variety of standard repeat-finding techniques. However, the implementation details differ.

PILER-CR constructs local alignments of the input sequence to itself; each hit between two close regions is a candidate for an alignment of a repeat with its neighbor copy. In terms of dynamic programming, taking into account the repeat structure of a CRISPR cassette implies looking for hits only within a relatively narrow band around the main diagonal of the dot plot. This process is followed by several refinement steps.

CRT does not use alignments to identify candidate repeats; rather, it derives them directly from the analysis of an input sequence. It is based on finding series of short repeats of a specified length (searching for exact k-mer matches) and then extending these repeats (increasing k-mer length) while allowing for a certain level of mismatches.

Finally, CRISPRFinder relies on a suffix-tree-based algorithm for repeat discovery, again with additional refinement.

All three algorithms were used for the CRISPR cassette search in this study. |
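The k-mer seeding idea described above for CRT can be illustrated with a minimal Python sketch. It is not the CRT implementation: the function names (`find_repeat_arrays`, `extend_repeat`), the default parameters (`k`, `min_period`, `max_period`, `min_copies`, `min_agreement`), and the column-agreement rule used for mismatch-tolerant extension are all assumptions chosen for illustration.

```python
from collections import Counter


def find_repeat_arrays(seq, k=8, min_period=30, max_period=100, min_copies=3):
    """Find exact k-mer seeds that recur at roughly regular spacing.

    Hypothetical helper: returns a list of (seed, start_positions) tuples.
    """
    seq = seq.upper()
    arrays = []
    i = 0
    while i + k <= len(seq):
        seed = seq[i:i + k]
        positions = [i]
        while True:
            # Look for the next exact copy of the seed within the allowed period.
            lo = positions[-1] + min_period
            hi = positions[-1] + max_period + k
            j = seq.find(seed, lo, hi)
            if j == -1:
                break
            positions.append(j)
        if len(positions) >= min_copies:
            arrays.append((seed, positions))
            i = positions[-1] + k  # skip past the last copy of this seed
        else:
            i += 1
    return arrays


def extend_repeat(seq, positions, k, min_agreement=0.75):
    """Grow the repeat to the right of the seed, tolerating some mismatches.

    A column is accepted while the most common base across all copies
    reaches min_agreement (an assumed threshold, not taken from CRT).
    """
    length = k
    spacing = min(b - a for a, b in zip(positions, positions[1:]))
    while length < spacing and positions[-1] + length < len(seq):
        column = [seq[p + length] for p in positions]
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) < min_agreement:
            break
        length += 1
    return length
```

The sketch extends only to the right of the seed and keeps the first seed it finds; a real tool would also extend leftward, merge overlapping candidates, and check that spacers are dissimilar from one another.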
| |