Similar literature
 20 similar documents found (search time: 31 ms)
1.

Background

Bacteria and archaea develop immunity against invading genomes by incorporating pieces of the invaders' sequences, called spacers, into a clustered regularly interspaced short palindromic repeats (CRISPR) locus between repeats, forming arrays of repeat-spacer units. When spacers are expressed, they direct CRISPR-associated (Cas) proteins to silence complementary invading DNA. In order to characterize the invaders of human microbiomes, we use spacers from CRISPR arrays that we had previously assembled from shotgun metagenomic datasets, and identify contigs that contain these spacers' targets.

Results

We discover 95,000 contigs that are putative invasive mobile genetic elements, some targeted by hundreds of CRISPR spacers. We find that oral sites in healthy human populations have a much greater variety of mobile genetic elements than stool samples. Mobile genetic elements carry genes encoding diverse functions: only 7% of the mobile genetic elements are similar to known phages or plasmids, although a much greater proportion contain phage- or plasmid-related genes. A small number of contigs share similarity with known integrative and conjugative elements, providing the first examples of CRISPR defenses against this class of element. We provide detailed analyses of a few large mobile genetic elements of various types, and a relative abundance analysis of mobile genetic elements and putative hosts, exploring the dynamic activities of mobile genetic elements in human microbiomes. A joint analysis of mobile genetic elements and CRISPRs shows that protospacer-adjacent motifs drive their interaction network; however, some CRISPR-Cas systems target mobile genetic elements lacking motifs.

Conclusions

We identify a large collection of invasive mobile genetic elements in human microbiomes, an important resource for further study of the interaction between the CRISPR-Cas immune system and invaders.
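
The spacer-to-contig matching described in this abstract can be illustrated with a minimal Python sketch. It assumes exact substring matching on either strand, which is a simplification: the abstract does not say how targets were matched, and a real analysis would likely use an alignment tool that tolerates mismatches. The function names `reverse_complement` and `find_spacer_targets` and the input layout (dictionaries of ID to sequence) are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_spacer_targets(spacers: dict, contigs: dict) -> dict:
    """Map each contig ID to the IDs of spacers found in it on either strand.

    Assumption: a 'target' is an exact copy of the spacer (or of its reverse
    complement) inside the contig. Contigs with no hit are omitted.
    """
    hits = {}
    for contig_id, contig in contigs.items():
        contig = contig.upper()
        for spacer_id, spacer in spacers.items():
            spacer = spacer.upper()
            if spacer in contig or reverse_complement(spacer) in contig:
                hits.setdefault(contig_id, []).append(spacer_id)
    return hits
```

Contigs hit by many distinct spacers would then be the strongest candidates for mobile genetic elements under this simplified criterion.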

2.

Background

CRISPR has become a hot topic as a powerful technique for genome editing in humans and other higher organisms. The original CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats coupled with CRISPR-associated proteins) is an important adaptive defence system for prokaryotes that provides resistance against invading elements such as viruses and plasmids. A CRISPR cassette contains short nucleotide sequences called spacers. These unique regions retain a history of the interactions between prokaryotes and their invaders in individual strains and ecosystems. One important ecosystem in the human body is the human gut, a rich habitat populated by a great diversity of microorganisms. Gut microbiomes are important for human physiology and health. Metagenome sequencing has been widely applied to study gut microbiomes. Most efforts in metagenome studies have focused on profiling taxonomic compositions and gene catalogues and identifying their associations with human health. Less attention has been paid to the analysis of the ecosystems of microbiomes themselves, especially their CRISPR composition.

Results

We conducted a preliminary analysis of CRISPR sequences in a human gut metagenomic data set from Chinese individuals, comprising type 2 diabetes patients and healthy controls. Applying an available CRISPR-identification algorithm, PILER-CR, we identified 3169 CRISPR cassettes in the data, from which we constructed a set of 1302 unique repeat sequences and 36,709 spacers. A more extensive analysis was made of the CRISPR repeats: these repeats were submitted to a more comprehensive clustering and classification using the web server tool CRISPRmap. All repeats were compared with known CRISPRs in the database CRISPRdb. A total of 784 repeats had matches in the database, and the remaining 518 repeats from our set are potentially novel.

Conclusions

The computational analysis of CRISPR composition based on contigs from metagenome sequencing data is feasible. It provides an efficient approach for finding potential novel CRISPR arrays and for analysing the ecosystem and history of human microbiomes.

3.
4.
5.
6.

Background

The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.

Results

We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets.

Conclusions

We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.
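
As a rough illustration of what a "distribution of overlapping k-mers" can look like in practice, the following Python sketch counts every overlapping k-mer in a set of reads and then builds an abundance histogram (how many distinct k-mers occur exactly n times). The abstract does not state the value of k or how the distribution was summarized, so the function names `kmer_counts` and `abundance_histogram`, the handling of `N` bases, and the histogram form are all assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count all overlapping k-mers across the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Return {n: number of distinct k-mers seen exactly n times}, sorted by n."""
    return dict(sorted(Counter(counts.values()).items()))
```

A data set dominated by a few very abundant k-mers (for example, adapter or "empty" ligation sequences) would show up as a heavy tail in such a histogram, which is one plausible way a contamination signal could be read off.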

7.

Background

Viruses have unique properties, such as small genomes and regions of high similarity, whose effects on metagenomic assemblies have not been characterized so far. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. Further, it investigates the suitability of different methods to estimate viral diversity in metagenomes.

Results

We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler proved subpar compared to IDBA_UD and CAMERA, which are metagenomic-specific. Up to a saturation point, Illumina platforms proved more capable of reconstructing large portions of viral genomes compared to 454. Read length was an important factor for limiting chimericity, while scaffolding marginally improved contig length and accuracy. The genome length of the various viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, which were both orders of magnitude above expected.

Conclusions

Assemblers designed specifically for the analysis of metagenomes should be used to facilitate the creation of high-quality long contigs. Despite the high coverage possible, scientists should not expect to always obtain complete genomes, because their reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods to estimate viral diversity relevant. While none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-989) contains supplementary material, which is available to authorized users.

8.

Background

Metaviriomes, the viral genomes present in an environment, have been studied by direct sequencing of the viral DNA or by cloning in small insert libraries. The short reads generated by both approaches make it very difficult to assemble and annotate such flexible genomic entities. Many environmental viruses belong to unknown groups or prey on uncultured and little known cellular lineages, and hence might not be present in databases.

Methodology and Principal Findings

Here we have used a different approach, the cloning of viral DNA into fosmids before sequencing, to obtain natural contigs that are close to the size of a viral genome. We have studied a relatively low diversity extreme environment: saturated NaCl brines, which simplifies the analysis and interpretation of the data. Forty-two different viral genomes were retrieved, and some of these were almost complete, and could be tentatively identified as head-tail phages (Caudovirales).

Conclusions and Significance

We found a cluster of phage genomes that most likely infect Haloquadratum walsbyi, the square archaeon and major component of the community in these hypersaline habitats. The identity of the prey could be confirmed by the presence of CRISPR spacer sequences shared by the virus and one of the available strain genomes. Other viral clusters detected appeared to prey on the Nanohaloarchaea and on the bacterium Salinibacter ruber, covering most of the diversity of microbes found in this type of environment. This approach appears then as a viable alternative to describe metaviriomes in a much more detailed and reliable way than by the more common approaches based on direct sequencing. An example of transfer of a CRISPR cluster including repeats and spacers was accidentally found supporting the dynamic nature and frequent transfer of this peculiar prokaryotic mechanism of cell protection.

9.
Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.

10.

Background

Metagenomics is a relatively new but fast growing field within environmental biology and medical sciences. It enables researchers to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology are not efficient in capturing the structure of the microbial community in an environment. Nowadays, high-throughput next-generation sequencing technologies are powerfully driving the metagenomic studies. However, there is an urgent need to develop efficient statistical methods and computational algorithms to rapidly analyze the massive metagenomic short sequencing data and to accurately detect the features/functions present in the microbial community. Although several issues about functions of metagenomes at pathways or subsystems level have been investigated, there is a lack of studies focusing on functional analysis at a low level of a hierarchical functional tree, such as SEED subsystem tree.

Results

A two-step statistical procedure (metaFunction) is proposed to detect all possible functional roles at the low level from a metagenomic sample/community. In the first step a statistical mixture model is proposed at the base of gene codons to estimate the abundances for the candidate functional roles, with sequencing error being considered. As a gene could be involved in multiple biological processes the functional assignment is therefore adjusted by utilizing an error distribution in the second step. The performance of the proposed procedure is evaluated through comprehensive simulation studies. Compared with other existing methods in metagenomic functional analysis the new approach is more accurate in assigning reads to functional roles, and therefore at more general levels. The method is also employed to analyze two real data sets.

Conclusions

metaFunction is a powerful tool for accurately profiling functions in a metagenomic sample.

11.

Background

Yersinia pestis, the pathogen of plague, has greatly influenced human history on a global scale. Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR), an element participating in immunity against phage invasion, is composed of short repeated sequences separated by unique spacers and provides the basis of the spoligotyping technology. In the present research, three CRISPR loci were analyzed in 125 strains of Y. pestis from 26 natural plague foci of China, the former Soviet Union and Mongolia, to validate the CRISPR-based genotyping method and to better understand the adaptive microevolution of Y. pestis.

Methodology/Principal Findings

Using PCR amplification, sequencing and online data processing, a high degree of genetic diversity was revealed in all three CRISPR elements. The distribution of spacers and their arrays in Y. pestis strains is strongly region- and focus-specific, allowing the construction of a hypothetical evolutionary model of Y. pestis. This model suggests a transmission route of microtus strains that encircled the Takla Makan Desert and the ZhunGer Basin. Starting from Tadjikistan, one branch passed through the Kunlun Mountains and moved to the Qinghai-Tibet Plateau. Another branch went north via the Pamirs Plateau, the Tianshan Mountains, the Altai Mountains and the Inner Mongolian Plateau. Other Y. pestis lineages might have originated from certain areas along those routes.

Conclusions/significance

CRISPR can provide important information for genotyping and evolutionary research of bacteria, which will help to trace the source of outbreaks. The resulting data will make possible the development of very low cost and high-resolution assays for the systematic typing of any new isolate.

12.
13.

Background

In the honeybee Apis mellifera, the bacterial gut community is consistently colonized by eight distinct phylotypes of bacteria. Managed bee colonies are of considerable economic interest and it is therefore important to elucidate the diversity and role of this microbiota in the honeybee. In this study, we have sequenced the genomes of eleven strains of lactobacilli and bifidobacteria isolated from the honey crop of the honeybee A. mellifera.

Results

Single gene phylogenies confirmed that the isolated strains represent the diversity of lactobacilli and bifidobacteria in the gut, as previously identified by 16S rRNA gene sequencing. Core genome phylogenies of the lactobacilli and bifidobacteria further indicated extensive divergence between strains classified as the same phylotype. Phylotype-specific protein families included unique surface proteins. Within phylotypes, we found a remarkably high level of gene content diversity. Carbohydrate metabolism and transport functions contributed up to 45% of the accessory genes, with some genomes having a higher content of genes encoding phosphotransferase systems for the uptake of carbohydrates than any previously sequenced genome. These genes were often located in highly variable genomic segments that also contained genes for enzymes involved in the degradation and modification of sugar residues. Strain-specific gene clusters for the biosynthesis of exopolysaccharides were identified in two phylotypes. The dynamics of these segments contrasted with low recombination frequencies and conserved gene order structures for the core genes. Hits for CRISPR spacers were almost exclusively found within phylotypes, suggesting that the phylotypes are associated with distinct phage populations.

Conclusions

The honeybee gut microbiota has been described as consisting of a modest number of phylotypes; however, the genomes sequenced in the current study demonstrated a very high level of gene content diversity within all three described phylotypes of lactobacilli and bifidobacteria, particularly in terms of metabolic functions and surface structures, where many features were strain-specific. Together, these results indicate niche differentiation within phylotypes, suggesting that the honeybee gut microbiota is more complex than previously thought.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1476-6) contains supplementary material, which is available to authorized users.

14.
Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
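
The idea that a terminally redundant sequence is likely a complete circular genome can be sketched as a simple check: if the start of an assembled sequence is repeated at its end, the assembly has probably wrapped around a circle. The Python sketch below assumes an exact prefix/suffix match and a hypothetical minimum overlap of 20 bases; the abstract does not give the criterion the authors actually used, and the function name `terminal_redundancy` is illustrative only.

```python
def terminal_redundancy(seq: str, min_overlap: int = 20) -> int:
    """Return the length of the longest prefix of seq that is also its suffix.

    Only overlaps of at least min_overlap bases and at most half the sequence
    length are considered; 0 is returned when no such overlap exists.
    Assumption: the repeat at both ends is an exact copy.
    """
    seq = seq.upper()
    for length in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0
```

A non-zero result flags the sequence as a candidate circular genome; trimming one copy of the overlap would give the non-redundant genome length.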

15.

SUMMARY

Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) genes are present in many bacterial and archaeal genomes. Since the discovery of the typical CRISPR loci in the 1980s, well before their physiological role was revealed, their variable sequences have been used as a complementary typing tool in diagnostic, epidemiologic, and evolutionary analyses of prokaryotic strains. The discovery that CRISPR spacers are often identical to sequence fragments of mobile genetic elements was a major breakthrough that eventually led to the elucidation of CRISPR-Cas as an adaptive immunity system. Key elements of this unique prokaryotic defense system are small CRISPR RNAs that guide nucleases to complementary target nucleic acids of invading viruses and plasmids, generally followed by the degradation of the invader. In addition, several recent studies have pointed at direct links of CRISPR-Cas to regulation of a range of stress-related phenomena. An interesting example concerns a pathogenic bacterium that possesses a CRISPR-associated ribonucleoprotein complex that may play a dual role in defense and/or virulence. In this review, we describe recently reported cases of potential involvement of CRISPR-Cas systems in bacterial stress responses in general and bacterial virulence in particular.

16.

Background

Marine ecosystem function is largely determined by matter and energy transformations mediated by microbial community interaction networks. Viral infection modulates network properties through mortality, gene transfer and metabolic reprogramming.

Results

Here we explore the nature and extent of viral metabolic reprogramming throughout the Pacific Ocean depth continuum. We describe 35 marine viral gene families with potential to reprogram metabolic flux through central metabolic pathways recovered from Pacific Ocean waters. Four of these families have been previously reported but 31 are novel. These known and new carbon pathway auxiliary metabolic genes were recovered from a total of 22 viral metagenomes in which viral auxiliary metabolic genes were differentiated from low-level cellular DNA inputs based on small subunit ribosomal RNA gene content, taxonomy, fragment recruitment and genomic context information. Auxiliary metabolic gene distribution patterns reveal that marine viruses target overlapping, but relatively distinct pathways in sunlit and dark ocean waters to redirect host carbon flux towards energy production and viral genome replication under low nutrient, niche-differentiated conditions throughout the depth continuum.

Conclusions

Given half of ocean microbes are infected by viruses at any given time, these findings of broad viral metabolic reprogramming suggest the need for renewed consideration of viruses in global ocean carbon models.

17.

Background/Aim

The human intestinal microbiota plays an important role in modulation of mucosal immune responses. To study interactions between intestinal epithelial cells (IECs) and commensal bacteria, a functional metagenomic approach was developed. One interest of metagenomics is to provide access to genomes of uncultured microbes. We aimed at identifying bacterial genes involved in regulation of NF-κB signaling in IECs. A high throughput cell-based screening assay allowing rapid detection of NF-κB modulation in IECs was established using the reporter-gene strategy to screen metagenomic libraries issued from the human intestinal microbiota.

Methods

A plasmid containing the secreted alkaline phosphatase (SEAP) gene under the control of NF-κB binding elements was stably transfected in HT-29 cells. The reporter clone HT-29/kb-seap-25 was selected and characterized. Then, a first screening of a metagenomic library from Crohn's disease patients was performed to identify NF-κB modulating clones. Furthermore, genes potentially involved in the effect of one stimulatory metagenomic clone were determined by sequence analysis combined with mutagenesis by transposition.

Results

The two proinflammatory cytokines, TNF-α and IL-1β, were able to activate the reporter system, reflecting the activation of the NF-κB signaling pathway, and the NF-κB inhibitors BAY 11-7082, caffeic acid phenethyl ester and MG132 were effective. A screening of 2640 metagenomic clones led to the identification of 171 modulating clones. Among them, one stimulatory metagenomic clone, 52B7, was further characterized. Sequence analysis revealed that its metagenomic DNA insert might belong to a new Bacteroides strain, and we identified 2 loci, encoding an ABC transport system and a putative lipoprotein, potentially involved in the effect of 52B7 on NF-κB.

Conclusions

We have established a robust high throughput screening assay for metagenomic libraries derived from the human intestinal microbiota to study bacteria-driven NF-κB regulation. This opens a strategic path toward the identification of bacterial strains and molecular patterns presenting a potential therapeutic interest.

18.

Background

The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host and the diet adopted by the host. Although the importance of gut microbiota of humans has been well demonstrated, there is a paucity of research regarding non-human primates (NHPs), especially herbivorous NHPs.

Results

In this study, an analysis of 97,942 pyrosequencing reads generated from Rhinopithecus bieti fecal DNA extracts was performed to better understand the microbial diversity and functional capacity of the R. bieti gut microbiome. The taxonomic analysis of the metagenomic reads indicated that R. bieti fecal microbiomes were dominated by the phyla Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria. The comparative analysis of taxonomic classification revealed that the metagenome of R. bieti was characterized by an overrepresentation of bacteria of the phyla Fibrobacteres and Spirochaetes as compared with other animals. Primary functional categories were associated mainly with protein, carbohydrates, amino acids, DNA and RNA metabolism, cofactors, cell wall and capsule and membrane transport. Comparing glycoside hydrolase profiles of R. bieti with those of other animals revealed that the R. bieti microbiome was most closely related to cow rumen.

Conclusions

Metagenomic and functional analysis demonstrated that R. bieti possesses a broad diversity of bacteria and numerous glycoside hydrolases responsible for lignocellulosic biomass degradation which might reflect the adaptations associated with a diet rich in fibrous matter. These results would contribute to the limited body of NHPs metagenome studies and provide a unique genetic resource of plant cell wall degrading microbial enzymes. However, future studies on the metagenome sequencing of R. bieti regarding the effects of age, genetics, diet and environment on the composition and activity of the metagenomes are required.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1378-7) contains supplementary material, which is available to authorized users.

19.

Motivation

Carbohydrate Active enzyme (CAZyme) families, encoded by human gut microflora, play a crucial role in breakdown of complex dietary carbohydrates into components that can be absorbed by our intestinal epithelium. Since nutritional wellbeing of an individual is dependent on the nutrient harvesting capability of the gut microbiome, it is important to understand how CAZyme repertoire in the gut is influenced by factors like age, geography and food habits.

Results

This study reports a comprehensive in-silico analysis of CAZyme profiles in the gut microbiomes of 448 individuals belonging to different geographies, using similarity searches of the corresponding gut metagenomic contigs against the carbohydrate active enzymes database. The study identifies a core group of 89 CAZyme families that are present across 85% of the gut microbiomes. The study detects several geography/age-specific trends in gut CAZyme repertoires of the individuals. Notably, a group of CAZymes having a positive correlation with BMI has been identified. Further, this group of BMI-associated CAZymes is observed to be specifically abundant in the Firmicutes phylum. One of the major findings from this study is the identification of three distinct groups of individuals, referred to as 'CAZotypes', having similar CAZyme profiles. Distinct taxonomic drivers for these CAZotypes as well as the probable dietary basis for such trends have also been elucidated. The results of this study provide a global view of CAZyme profiles across individuals of various geographies and age-groups. These results reiterate the need for a more precise understanding of the role of carbohydrate active enzymes in human nutrition.
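
The notion of a "core group" of families present across 85% of microbiomes can be made concrete with a small prevalence calculation. The Python sketch below assumes each sample is represented as a set of detected family names and that "core" means present in at least the given fraction of samples; the exact definition used in the study is not stated in the abstract, and the function name `core_families` and the toy family labels in the usage are illustrative.

```python
from collections import Counter


def core_families(profiles: dict, min_prevalence: float = 0.85) -> set:
    """Return families detected in at least min_prevalence of the samples.

    profiles maps a sample ID to the set of family names detected in it.
    Assumption: presence/absence only; abundance is ignored.
    """
    if not profiles:
        return set()
    n_samples = len(profiles)
    seen_in = Counter()
    for families in profiles.values():
        seen_in.update(set(families))
    return {fam for fam, n in seen_in.items() if n / n_samples >= min_prevalence}
```

For example, with four samples and a threshold of 0.75, a family found in three of the four samples counts as core, while one found in only two does not.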

20.
Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.

Prokaryotes are highly diverse (33). One of the explanations of this diversity is the high extinction rate, due to genetic aggression, which leads to the clearance of ecological niches and, as a result, may allow new prokaryotic species to emerge.
In the absence of host defense, viral infection of prokaryotic colonies results in colony extinction or the fixation of a fraction of the invader's genetic material in the host genome, profoundly affecting the life cycle of the host (32). Thus, bacteria and archaea have developed various kinds of defense mechanisms to resist this pressure; the best studied of these mechanisms is restriction-modification systems (4).

Along with well-known prokaryotic defense mechanisms, such as rapid evolution of cell receptors or the use of restriction-modification or toxin-antitoxin systems (see, e.g., references 6, 21, and 25), newly discovered clustered regularly interspaced short palindromic repeat (CRISPR) systems seem to play an important role in protecting the cell from archaeal virus or bacteriophage assaults (reviewed in reference 36). A typical CRISPR system is a genetic locus comprising CRISPR-associated (cas) genes coding for proteins of several distinct functional classes (8, 19, 29) and a CRISPR cassette. A CRISPR cassette is formed by almost identical direct repeats with an average length of 32 nucleotides (nt), which are separated by similarly sized, unique spacers. A considerable proportion of spacers is similar to known phage or virus sequences, suggesting that the system is involved in antivirus defense (8, 29, 31). This involvement was experimentally demonstrated when a CRISPR system was shown to be essential for cell survival after invasion by foreign DNA (5). The mechanism is thought to be analogous to eukaryotic RNA interference (29), but it has not been characterized in detail yet.

CRISPR cassettes retain information that could be used to reveal the evolutionary history of individual systems. First, it has been shown that CRISPR-associated genes could be divided into eight subtypes according to operon organization and gene phylogeny (19). Second, the repeats of different CRISPR cassettes may be similar, which might indicate a common origin of such cassettes.
The first attempt to cluster CRISPR cassettes by the similarity of repeat sequences resulted in 12 clusters (27). In that study, the cassettes were obtained by the application of PILER-CR to completely sequenced genomes. Third, pairwise comparison of spacers could also reveal the specific evolutionary history of individual CRISPR cassettes.

So far, most large-scale studies of CRISPR systems have been restricted to well-studied organisms with completely sequenced genomes (5, 9, 20, 28, 30). However, the dynamic interaction between viruses or phages and microorganisms in natural environments is of particular interest (2, 10, 15, 23, 35, 38, 40-42). It may be studied using CRISPRs in a metagenome, that is, sequenced DNA fragments collected in one geographical location and therefore representing one ecological niche with all its inhabitants. This approach is interesting for two reasons. First, metagenomic samples provide a common census of coexisting organisms, i.e., in many cases, both the infecting viruses and phages and their victims. Second, most bacteria and archaea from metagenomic samples cannot be cultivated, and hence little is known about their CRISPR systems.

To date, three studies have considered host-virus interactions in metagenomes. One study used two thermophilic Synechococcus isolates from microbial mats in hot springs at Yellowstone National Park to demonstrate fast coevolution of the host and phage genomes (22). Two studies described archaeal and bacterial interactions with viruses and phages, respectively, in acidophilic biofilms (2, 39). All environmental communities analyzed so far are extreme and are dominated by few species. Natural samples containing many diverse coexisting organisms may arguably be more interesting.

The largest available metagenome, produced by the Sorcerer II Global Ocean Sampling (GOS) expedition, comprises samples of genetic material collected from more than 50 geographical locations of the Pacific and Atlantic oceans (34).
This variety provides an opportunity to study the evolution of phage-host interactions reflected in CRISPRs.

Three algorithms, PILER-CR (14), the CRISPR recognition tool (CRT) (7), and CRISPRFinder (18), have been developed as tools for the discovery of new CRISPR cassettes. All these algorithms define candidate CRISPR cassette sequences as short direct repeats separated by short unique spacers; they then use a variety of standard repeat-finding techniques. However, the implementations differ in their specific details.

PILER-CR constructs local alignments of the input sequence to itself; each hit between two close regions is a candidate for an alignment of a repeat with its neighbor copy. In terms of dynamic programming, taking into account the repeat structure of a CRISPR cassette implies looking for hits only within a relatively narrow band around the main diagonal of the dot plot. This process is followed by several refinement steps.

CRT does not use alignments to identify candidate repeats; rather, it derives them directly from the analysis of an input sequence. It is based on finding series of short repeats of a specified length (searching for exact k-mer matches) and then extending these repeats (increasing k-mer length) while allowing for a certain level of mismatches.

Finally, CRISPRFinder uses a suffix-tree-based algorithm for repeat discovery, again with additional refinement.

All three algorithms were used for the CRISPR cassette search in this study.
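
The seed-and-extend idea described for CRT can be illustrated with a minimal sketch. This is not the CRT implementation: the function names, the default parameters, and the right-only extension are assumptions made for illustration. It looks for an exact k-mer that recurs at a CRISPR-like spacing, then lengthens the repeat while the copies still agree within a mismatch tolerance.

```python
def extend_right(seq, starts, k, max_mismatch_frac=0.0):
    """Grow the repeat to the right while the copies agree column by column."""
    length = k
    while max(starts) + length < len(seq):
        column = [seq[s + length] for s in starts]
        most_common = max(set(column), key=column.count)
        mismatches = len(column) - column.count(most_common)
        if mismatches / len(column) > max_mismatch_frac:
            break
        length += 1
    return length


def find_repeat_arrays(seq, k=8, min_period=30, max_period=60,
                       min_copies=3, max_mismatch_frac=0.0):
    """Return (repeat_starts, repeat_length) for each candidate array.

    The period is the distance between the starts of consecutive repeat
    copies, i.e. repeat length plus spacer length (hypothetical defaults).
    """
    arrays = []
    i = 0
    while i + k <= len(seq):
        seed = seq[i:i + k]
        starts = [i]
        while True:
            lo = starts[-1] + min_period
            hi = starts[-1] + max_period + k  # str.find end bound is exclusive
            nxt = seq.find(seed, lo, hi)
            if nxt == -1:
                break
            starts.append(nxt)
        if len(starts) >= min_copies:
            length = extend_right(seq, starts, k, max_mismatch_frac)
            arrays.append((starts, length))
            i = starts[-1] + length  # skip past the array just reported
        else:
            i += 1
    return arrays


def spacers_of(seq, starts, length):
    """Extract the sequences lying between consecutive repeat copies."""
    return [seq[a + length:b] for a, b in zip(starts, starts[1:])]
```

For a sequence `seq`, `find_repeat_arrays(seq)` returns the candidate arrays, and `spacers_of(seq, starts, length)` gives the spacers of one of them. A real tool would also extend to the left, tolerate mismatches in the seed, and filter false positives, as the study above found necessary.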
