Similar articles
1.
Recent studies have highlighted the surprising richness of soil bacterial communities; however, bacteria are not the only microorganisms found in soil. To our knowledge, no study has compared the diversities of the four major microbial taxa, i.e., bacteria, archaea, fungi, and viruses, from an individual soil sample. We used metagenomic and small-subunit RNA-based sequence analysis techniques to compare the estimated richness and evenness of these groups in prairie, desert, and rainforest soils. By grouping sequences at the 97% sequence similarity level (an operational taxonomic unit [OTU]), we found that the archaeal and fungal communities were consistently less even than the bacterial communities. Although total richness levels are difficult to estimate with a high degree of certainty, the estimated number of unique archaeal or fungal OTUs appears to rival or exceed the number of unique bacterial OTUs in each of the collected soils. In this first study to comprehensively survey viral communities using a metagenomic approach, we found that soil viruses are taxonomically diverse and distinct from the communities of viruses found in other environments that have been surveyed using a similar approach. Within each of the four microbial groups, we observed minimal taxonomic overlap between sites, suggesting that soil archaea, bacteria, fungi, and viruses are globally as well as locally diverse.
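Grouping sequences into OTUs at 97% similarity and then comparing evenness can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes pre-aligned, equal-length sequences, uses simple greedy centroid clustering with position-by-position identity, and takes Pielou's J' as one common evenness index (the abstract does not name the clustering tool or index used).

```python
import math
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering: unique sequences are processed by decreasing
    abundance and join the first centroid they match at >= threshold identity."""
    counts = Counter(seqs)
    centroids, otu_sizes = [], []
    for seq, n in counts.most_common():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otu_sizes[i] += n
                break
        else:  # no centroid close enough: start a new OTU
            centroids.append(seq)
            otu_sizes.append(n)
    return centroids, otu_sizes


def pielou_evenness(sizes):
    """Pielou's J' = H' / ln(S); 1.0 means all OTUs are equally abundant."""
    total, s = sum(sizes), len(sizes)
    if s < 2:
        return 0.0
    h = -sum((n / total) * math.log(n / total) for n in sizes if n > 0)
    return h / math.log(s)
```

For example, a 100-bp read differing from a centroid at 2 positions (98% identity) joins that OTU, while one differing at 5 positions (95%) founds a new OTU.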

3.
Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel lowest common ancestor (LCA) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. In comparison with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% of MEGAN's when running on a fat node. CloudLCA can be run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis; it generates the same output as MEGAN, which can be imported directly into MEGAN for downstream analysis. Moreover, CloudLCA is a general solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
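The per-read step that CloudLCA parallelizes can be illustrated serially. This is a minimal sketch, not CloudLCA's or MEGAN's implementation: the taxonomy is a hypothetical child-to-parent map, and a read is assigned to the lowest common ancestor of all taxa it hits.

```python
from typing import Dict, Iterable, List, Optional


def path_to_root(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [taxon, parent, grandparent, ..., root]; the root maps to None."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lca(taxa: Iterable[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Lowest common ancestor of a set of taxa in a rooted taxonomy."""
    paths = [path_to_root(t, parent) for t in taxa]
    if not paths:
        return None
    # Ancestors shared by every path; the first one met walking up is the deepest.
    common = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:
        if node in common:
            return node
    return None
```

Because each read's LCA is computed independently, the work splits naturally across processes or nodes, which is what makes a parallel version attractive.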

4.
One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is highly relevant for metagenomic diagnostics and microbial community comparison. However, most existing approaches quantify at low resolution (e.g. at the phylum level), rely on the presence of specific marker genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method that estimates true genome abundances from read alignments by accounting for similarities between reference genomes in a non-negative LASSO approach. We demonstrate GASiC's superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets derived from both bacterial DNA and viral RNA. We further discuss our approach as an alternative to PCR-based DNA quantification.
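The idea of correcting read counts for reference similarity can be sketched as a small non-negative LASSO problem. This is an illustration under stated assumptions, not GASiC's code. `M[i][j]` is taken to be the fraction of reads from genome j that map to reference i; it is simply given here, and how GASiC estimates it is not covered. The observed per-reference counts `c` are modeled as `M x`. The solver is plain projected gradient descent, chosen for brevity rather than speed.

```python
from typing import List


def nonneg_lasso(M: List[List[float]], c: List[float], lam: float = 0.0,
                 step: float = 0.1, iters: int = 5000) -> List[float]:
    """Minimise ||M x - c||^2 + lam * sum(x) subject to x >= 0.

    `step` must stay below 1 / (2 * largest eigenvalue of M^T M) to converge;
    0.1 suits the small example below.
    """
    n_rows, n_cols = len(M), len(M[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        # Residual of the current estimate: r = M x - c
        r = [sum(M[i][j] * x[j] for j in range(n_cols)) - c[i] for i in range(n_rows)]
        # Gradient of the objective; on x >= 0 the L1 term contributes lam per coordinate
        grad = [2.0 * sum(M[i][j] * r[i] for i in range(n_rows)) + lam for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant
        x = [max(0.0, xj - step * g) for xj, g in zip(x, grad)]
    return x
```

In this toy case, 100 reads come from genome A only, but half of them also map to a similar genome B. Naive counting credits B with 50 reads, while the corrected estimate is about `[100, 0]`. Dividing by `sum(x)` then gives relative abundances.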

5.
Physical partitioning techniques are routinely employed during sample preparation to segregate the prokaryotic and eukaryotic fractions of metagenomic samples. Despite these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported contaminating eukaryotic sequences in metagenomic data sets. Contaminating sequences originate not only from the genomes of micro-eukaryotic species but also from the genomes of (higher) eukaryotic host cells; the latter usually occurs in host-associated metagenomes. Identifying and removing contaminating sequences is important, since these sequences not only affect estimates of microbial diversity but also reduce the accuracy of several downstream analyses. The computational techniques currently used to identify contaminating eukaryotic sequences are alignment-based and are therefore slow, inefficient, and demanding of computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that, on a desktop with modest hardware specifications, Eu-Detect rapidly segregates DNA sequence fragments of prokaryotic and eukaryotic origin with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.
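The abstract does not describe Eu-Detect's features. As a generic illustration of what "alignment-free" can mean, the sketch below compares a read's k-mer frequency profile with per-class centroid profiles built from labeled training fragments. The choice of k = 3, the cosine-similarity rule, and the toy training sequences are all assumptions, not Eu-Detect's method.

```python
import math
from itertools import product
from typing import Dict, List


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    """Relative frequency of each of the 4**k DNA k-mers; k-mers with non-ACGT characters are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples: Dict[str, List[str]], k: int = 3) -> Dict[str, List[float]]:
    """Average k-mer profile per class label (e.g. 'prokaryotic', 'eukaryotic')."""
    centroids = {}
    for label, seqs in examples.items():
        profiles = [kmer_profile(s, k) for s in seqs]
        centroids[label] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def classify(seq: str, centroids: Dict[str, List[float]], k: int = 3) -> str:
    """Assign the label whose centroid is most similar (cosine) to the read's profile."""
    profile = kmer_profile(seq, k)
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))
```

No read is ever aligned to a reference. Each read costs one pass to count k-mers plus one comparison per class, which is where the speed of alignment-free methods comes from.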

6.
Current methods for identifying unknown insect (class Insecta) cytochrome c oxidase subunit I (COI barcode) sequences often rely on distance thresholds that can be difficult to define, on sequence similarity cut-offs, or on monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for their taxonomic assignments. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignment for large batches of insect COI sequences, such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the BLAST-based methods commonly used in environmental sequence surveys. We developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data System. We found that type I error rates (incorrect taxonomic assignments with high bootstrap support) were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
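A compact sketch of a word-based naïve Bayesian classifier with bootstrap support, in the spirit of Wang et al. (2007), follows. The formulas are written as we understand them from that paper and should be checked against it. A word's prior is P = (n(w) + 0.5) / (N + 1), and its conditional probability is (m(w) + P) / (M + 1). Each bootstrap trial resamples one-eighth of the query's words with replacement. The class name, the toy training data, and the use of k = 4 in the example (instead of the usual 8) are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def words(seq: str, k: int) -> Set[str]:
    """Set of overlapping k-mers ('words') in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesClassifier:
    """Hypothetical RDP-style word-based classifier (sketch, not the RDP code)."""

    def __init__(self, training: List[Tuple[str, str]], k: int = 8):
        self.k = k
        n_total = len(training)                                   # N
        seq_words = [(taxon, words(seq, k)) for taxon, seq in training]
        seqs_with_word = Counter(w for _, ws in seq_words for w in ws)  # n(w)
        self.prior = {w: (n + 0.5) / (n_total + 1) for w, n in seqs_with_word.items()}
        self.default_prior = 0.5 / (n_total + 1)                  # word unseen in training
        self.taxon_size = Counter(taxon for taxon, _ in seq_words)  # M per taxon
        self.taxon_word_count: Dict[str, Counter] = defaultdict(Counter)  # m(w) per taxon
        for taxon, ws in seq_words:
            self.taxon_word_count[taxon].update(ws)

    def _log_prob(self, taxon: str, w: str) -> float:
        p_w = self.prior.get(w, self.default_prior)
        m = self.taxon_word_count[taxon][w]
        return math.log((m + p_w) / (self.taxon_size[taxon] + 1))

    def _best(self, query_words: List[str]) -> str:
        return max(self.taxon_size,
                   key=lambda t: sum(self._log_prob(t, w) for w in query_words))

    def classify(self, seq: str, n_boot: int = 100, seed: int = 0) -> Tuple[str, float]:
        """Return (taxon, bootstrap support in [0, 1])."""
        all_words = sorted(words(seq, self.k))  # sorted for reproducibility
        if not all_words:
            raise ValueError("query is shorter than the word length k")
        best = self._best(all_words)
        rng = random.Random(seed)
        subset = max(1, len(all_words) // 8)
        hits = sum(self._best(rng.choices(all_words, k=subset)) == best
                   for _ in range(n_boot))
        return best, hits / n_boot
```

The bootstrap value is the fraction of trials that agree with the full-query assignment. Applying a cut-off to it, and truncating the lineage where support drops, is what makes the assignments rank-flexible.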

7.
In this study, we report the first 16S rRNA gene sequences from highly saline brine sediments taken at a depth of 1,515 m in the Kebrit Deep, northern Red Sea. Microbial DNA extracted directly from the sediments was subjected to PCR amplification with primers specific for bacterial and archaeal 16S rRNA gene sequences. The PCR products were cloned, and a total of 11 clone types (6 bacterial and 5 archaeal) were distinguished by restriction endonuclease digestion. Phylogenetic analysis revealed that most of the cloned sequences were unique, showing no close association with sequences of cultivated organisms or sequences derived from environmental samples. The bacterial clone sequences form a novel phylogenetic lineage (KB1 group) that branches between the Aquificales and the Thermotogales. The archaeal clone sequences group within the Euryarchaeota. Some of the sequences cluster with the group II and group III uncultivated archaeal sequence clones, while two clone groups form separate branches. Our results suggest that hitherto unknown archaea and bacteria may thrive in highly saline brines of the Red Sea under extreme environmental conditions. Received: 5 February 1999 / Accepted: 14 July 1999

9.
There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely-used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program MeganServer that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce
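Principal coordinate analysis (PCoA), one of the new features listed above, starts from a matrix of pairwise distances between samples. The sketch below is a hedged, stand-alone illustration, not MEGAN CE code. It computes Bray-Curtis dissimilarities between per-sample taxon counts, which is one common choice of distance assumed here, and extracts the first principal coordinate by classical scaling with power iteration.

```python
import math
import random
from typing import List


def bray_curtis(a: List[float], b: List[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(samples: List[List[float]]) -> List[List[float]]:
    return [[bray_curtis(s, t) for t in samples] for s in samples]


def pcoa_first_axis(d: List[List[float]], iters: int = 500) -> List[float]:
    """First principal coordinate via classical scaling and power iteration.

    Power iteration finds the eigenvalue of largest magnitude; if that one is
    negative (possible for non-Euclidean distances), zeros are returned.
    """
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]
    col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    # Double-centred matrix B = -1/2 * J D^2 J
    b = [[a[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]
    rng = random.Random(0)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    eig = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))  # Rayleigh quotient
    return [x * math.sqrt(max(eig, 0.0)) for x in v]
```

Usage: `pcoa_first_axis(distance_matrix(counts_per_sample))`, where each sample is a list of taxon counts in a shared order. For real analyses, a linear-algebra library that returns all axes would be the practical choice.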

10.
The internal transcribed spacer (ITS) of the nuclear ribosomal DNA region is a widely used species marker for plants and fungi. Recent metagenomic studies using next-generation sequencing, however, generate only partial ITS sequences. Here we compare the performance of partial and full-length ITS sequences with several classification methods. We compiled a full-length ITS data set and created short fragments to simulate the read lengths commonly recovered from current next-generation sequencing platforms. We compared recovery, erroneous recovery, and coverage for the following methods: best BLAST hit classification, MEGAN classification, and automated phylogenetic assignment using the Statistical Assignment Program (SAP). We found that summarizing results at more inclusive taxonomic ranks increased recovery and reduced erroneous recovery. The similarity-based methods BLAST and MEGAN performed consistently across most fragment lengths. For the phylogeny-based method SAP, queries of 400 bp or longer worked best. Overall, BLAST had the highest recovery rates and MEGAN had the lowest erroneous recovery rates. When selecting a high-throughput ITS classification method, one should consider read length, an acceptable trade-off between maximizing the total number of classifications and minimizing the number of erroneous classifications, and the computational speed of the assignment method.
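Why summarizing at more inclusive ranks raises recovery and lowers erroneous recovery can be shown by scoring the same assignments at two ranks. The metric definitions here are simplified assumptions: recovery is the share of queries correctly named at that rank, erroneous recovery is the share wrongly named, and unassigned queries count as neither. They may differ from the study's exact definitions, and the lineages are invented.

```python
from typing import List, Sequence, Tuple

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def score_at_rank(pairs: Sequence[Tuple[List[str], List[str]]], rank: str) -> Tuple[float, float]:
    """Return (recovery, erroneous recovery) as fractions of all queries at `rank`.

    Each pair is (true_lineage, predicted_lineage), both listed from kingdom down.
    A predicted lineage shorter than RANKS means the method stopped above some rank;
    such queries count as unassigned (neither correct nor erroneous) below that rank.
    """
    depth = RANKS.index(rank)
    correct = wrong = 0
    for true, pred in pairs:
        if len(pred) <= depth:
            continue
        if pred[depth] == true[depth]:
            correct += 1
        else:
            wrong += 1
    return correct / len(pairs), wrong / len(pairs)
```

In this example, a query assigned to the wrong genus within the right family counts as an error at species level but as correct at family level. A query left unassigned below family becomes a recovery once results are summarized at family.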

11.
Taxonomic classification of the thousands to millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), owing to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms has been explored in detail, the influence of the training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets (from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, which we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, a 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and candidate phyla TM6 and TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, with the greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases.
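Trimming reference sequences to the primer region, as described above, can be sketched as two steps. First, locate the forward primer. Second, locate the reverse complement of the reverse primer downstream of it, expanding IUPAC degenerate bases into regular-expression character classes. The primers in the example are hypothetical placeholders, not the study's primers. Exact matching (no mismatches) is a simplifying assumption, since real pipelines usually tolerate some mismatches.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regular-expression character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, including degenerate IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    return "".join(IUPAC[base] for base in primer.upper())


def trim_to_amplicon(ref: str, fwd: str, rev: str, keep_primers: bool = False) -> Optional[str]:
    """Return the region of `ref` between the primers, or None if either primer is absent."""
    ref = ref.upper()
    f = re.search(primer_regex(fwd), ref)
    if f is None:
        return None
    # The reverse primer binds the opposite strand, so search for its reverse complement
    r = re.search(primer_regex(revcomp(rev)), ref[f.end():])
    if r is None:
        return None
    start = f.start() if keep_primers else f.end()
    end = f.end() + (r.end() if keep_primers else r.start())
    return ref[start:end]
```

References without a match to both primers return `None`. Whether to drop such references or keep them untrimmed is a separate design decision for the training set.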

12.
Archaea-specific radA primers were used with PCR to amplify fragments of radA genes from 11 cultivated archaeal species and one marine sponge tissue sample that contained essentially an archaeal monoculture. The amino acid sequences encoded by the PCR fragments, three RadA protein sequences previously published (21), and two new complete RadA sequences were aligned with representative bacterial RecA proteins and eucaryal Rad51 and Dmc1 proteins. The alignment supported the existence of four insertions and one deletion in the archaeal and eucaryal sequences relative to the bacterial sequences. The sizes of three of the insertions were found to have taxonomic and phylogenetic significance. Comparative analysis of the RadA sequences, omitting amino acids in the insertions and deletions, shows a cladal distribution of species that largely mimics the one obtained by a similar analysis of archaeal 16S rRNA sequences. The PCR technique was also used to amplify fragments of 15 radA genes from uncultured natural sources. Phylogenetic analysis of the amino acid sequences encoded by these fragments reveals several clades with affinity, sometimes only distant, to the putative RadA proteins of several species of Crenarchaeota. The two most deeply branching archaeal radA genes found had some amino acid deletion and insertion patterns characteristic of bacterial recA genes. Possible explanations are discussed. Finally, signature codons are presented to distinguish among RecA protein family members.

13.
Insertion sequences (ISs) can constitute an important component of prokaryotic (bacterial and archaeal) genomes. Over 1,500 individual ISs are currently included in the ISfinder database (www-is.biotoul.fr), and these represent only a small portion of those in the available prokaryotic genome sequences and of those being discovered in ongoing sequencing projects. Despite this diversity, the transposition mechanisms of only a few of these ubiquitous mobile genetic elements are known, and all of them are bacterial ISs. This review presents an overview of ISs within the archaeal kingdom. We first provide a general historical summary of the known properties and behaviors of archaeal ISs. We then consider how transposition might be regulated in some cases by small antisense RNAs and by termination codon readthrough. This is followed by an extensive analysis of the IS content in the sequenced archaeal genomes present in the public databases as of June 2006, which provides an overview of their distribution among the major archaeal classes and species. We show that the diversity of archaeal ISs is very great and comparable to that of bacteria. We compare archaeal ISs to known bacterial ISs and find that most are clearly members of families first described for bacteria. Several cases of lateral gene transfer between bacteria and archaea are clearly documented, notably for methanogenic archaea. However, several archaeal ISs do not have bacterial equivalents but can be grouped into Archaea-specific groups or families. In addition to ISs, we identify and list nonautonomous IS-derived elements, such as miniature inverted-repeat transposable elements. Finally, we present a possible scenario for the evolutionary history of ISs in the Archaea.

14.
We investigated the influence of environmental parameters and spatial distance on bacterial, archaeal and viral community composition at 13 sites along a 3200-km voyage from Halifax to Kugluktuk (Canada) through the Labrador Sea, Baffin Bay and the Arctic Archipelago. Variation partitioning was used to disentangle the effects of environmental parameters, spatial distance and spatially correlated environmental parameters on prokaryotic and viral communities. Viral and prokaryotic community composition were related in the Labrador Sea but were independent of each other in Baffin Bay and the Arctic Archipelago. In oceans, the dominant dispersal mechanism for prokaryotes and viruses is the movement of water masses; thus, dispersal for both groups is passive and similar. Nevertheless, spatial distance explained 7–19% of the variation in viral community composition in the Arctic Archipelago but was not a significant predictor of bacterial or archaeal community composition in either sampling area, suggesting a decoupling of the processes regulating community composition within these taxonomic groups. According to metacommunity theory, patterns in bacterial and archaeal community composition suggest a role for species sorting, whereas patterns in viral community composition are consistent with species sorting in the Labrador Sea and suggest a potential role for mass effects in the Arctic Archipelago. Given that a specific prokaryotic taxon may be infected by multiple viruses with high reproductive potential, our results suggest that viral community composition was subject to high turnover relative to prokaryotic community composition in the Arctic Archipelago.

15.
Next-generation sequencing has dramatically changed the landscape of microbial ecology, making large-scale, in-depth diversity studies widely accessible. However, determining the accuracy of taxonomic and quantitative inferences, and comparing results obtained with different approaches, are complicated by incongruent experimental and computational data types and by lack of knowledge of the true ecological diversity. Here we used highly diverse bacterial and archaeal synthetic communities assembled from pure genomic DNAs to compare inferences from metagenomic and SSU rRNA amplicon sequencing. Both Illumina and 454 metagenomic data outperformed amplicon sequencing in quantifying community composition, but the outcome depended on analysis parameters and platform. New approaches to processing and classifying amplicons can reconstruct the taxonomic composition of the community with high reproducibility within primer sets, but all tested primer sets led to significant taxon-specific biases. Controlled synthetic communities assembled to broadly mimic the phylogenetic richness of target environments can provide important validation for fine-tuning the experimental and computational parameters used to characterize natural communities.

16.
As an evolutionary marker, 23S ribosomal RNA (rRNA) offers more diagnostic sequence stretches and greater sequence variation than 16S rRNA, yet it is still not as widely used. Based on 80 metagenome samples from the Global Ocean Sampling (GOS) Expedition, the usefulness and taxonomic resolution of 23S rRNA were compared with those of 16S rRNA. Since 23S rRNA is approximately twice as large as 16S rRNA, twice as many 23S rRNA gene fragments as 16S rRNA gene fragments were retrieved from the GOS reads, and the 23S rRNA gene fragments were generally about 100 bp longer. Datasets for 16S and 23S rRNA sequences revealed similar relative abundances for major marine bacterial and archaeal taxa. However, 16S rRNA sequences had better taxonomic resolution owing to their significantly larger reference database. Reevaluation of the specificity of previously published PCR amplification primers and group-specific fluorescence in situ hybridization probes against this metagenomic set of non-amplified 23S rRNA sequences revealed that, of 16 primers investigated, only two had more than 90% target group coverage. Evaluations of two probes, BET42a and GAM42a, were in accordance with previous evaluations, except for a discrepancy in the target group coverage of the GAM42a probe when evaluated against the GOS metagenomic dataset.
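Target-group coverage, as used above, is the fraction of target-group sequences that a primer would match. The following is a hedged sketch of how such a figure can be computed. It scans every primer-length window of each target, treats IUPAC degenerate bases in the primer as matching any of their bases, and allows an optional number of mismatches. Checking only the given strand is an assumption, the primer and sequences are invented, and the study's exact matching criteria are not stated in the abstract.

```python
from typing import List

# Bases matched by each IUPAC code in the primer
IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
              "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
              "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def mismatches(primer: str, window: str) -> int:
    """Number of positions where the target base is not allowed by the primer code."""
    return sum(1 for p, b in zip(primer, window) if b not in IUPAC_SETS[p])


def matches(primer: str, target: str, max_mismatches: int = 0) -> bool:
    primer, target = primer.upper(), target.upper()
    n = len(primer)
    return any(mismatches(primer, target[i:i + n]) <= max_mismatches
               for i in range(len(target) - n + 1))


def coverage(primer: str, targets: List[str], max_mismatches: int = 0) -> float:
    """Fraction of target sequences containing at least one acceptable primer site."""
    if not targets:
        return 0.0
    return sum(matches(primer, t, max_mismatches) for t in targets) / len(targets)
```

Raising `max_mismatches` inflates coverage, so any reported coverage figure is only meaningful together with the mismatch tolerance used.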

17.
Metagenomic Characterization of Chesapeake Bay Virioplankton
Viruses are ubiquitous and abundant throughout the biosphere. In marine systems, virus-mediated processes can have significant impacts on microbial diversity and on global biogeochemical cycling. However, viral genetic diversity remains poorly characterized. To address this shortcoming, a metagenomic library was constructed from Chesapeake Bay virioplankton. The resulting sequences constitute the largest collection of long-read double-stranded DNA (dsDNA) viral metagenome data reported to date. BLAST homology comparisons showed that Chesapeake Bay virioplankton contained a high proportion of unknown (homologous only to environmental sequences) and novel (no significant homolog) sequences. This analysis suggests that dsDNA viruses are likely one of the largest reservoirs of unknown genetic diversity in the biosphere. The taxonomic origin of BLAST homologs to viral library sequences agreed well with reported abundances of co-occurring bacterial subphyla within the estuary and indicated that cyanophages were abundant. However, the low proportion of Siphophage homologs contradicts a previous assertion that this family comprises most bacteriophage diversity. Identification and analyses of cyanobacterial homologs of the psbA gene illustrated the value of metagenomic studies of virioplankton. The phylogeny of inferred PsbA protein sequences suggested that Chesapeake Bay cyanophage strains are endemic to that environment. The ratio of psbA homologous sequences to total cyanophage sequences in the metagenome indicated that the psbA gene may be nearly universal in Chesapeake Bay cyanophage genomes. Furthermore, the low frequency of psbD homologs in the library supports the prediction that Chesapeake Bay cyanophage populations are dominated by Podoviridae.

18.
The molecular evolution of the V6 and V9 domains of the mitochondrial SSU-rDNA was investigated to evaluate the use of these sequences as DNA barcodes in the division Basidiomycota. PCR products from 27 isolates belonging to 11 Tricholoma species were sequenced. Both domains had identical sequences in isolates belonging to the same species. All the species possess distinctive V9 sequences owing to point mutations and insertion/deletion events. Secondary structures revealed that the insertion-deletion events occurred in regions not directly involved in maintaining the standard SSU-rRNA structure. The inserted sequences possess conserved motifs that enable their alignment among phylogenetically distant species. Hence, because it has identical sequences within species, an adequate level of divergence, and is easy to amplify and align, the V9 domain represents an alternative molecular marker for the division Basidiomycota and opens the way for this sequence to be used as a specific molecular marker in the fungal kingdom.

19.
Protein sequences with similarities to Escherichia coli RecA were compared across the major kingdoms of eubacteria, archaebacteria, and eukaryotes. The archaeal sequences branch monophyletically and are most closely related to the eukaryotic paralogous Rad51 and Dmc1 groups. A multiple alignment of the sequences suggests a modular structure of RecA-like proteins consisting of distinct segments, some of which are conserved only within subgroups of sequences. The eukaryotic and archaeal sequences share an N-terminal domain which may play a role in interactions with other factors and nucleic acids. Several positions in the alignment blocks are highly conserved within the eubacteria as one group and within the eukaryotes and archaebacteria as a second group, but compared between the groups these positions display nonconservative amino acid substitutions. Conservation within the RecA-like core domain identifies possible key residues involved in ATP-induced conformational changes. We propose that RecA-like proteins derive evolutionarily from an assortment of independent domains and that the functional homologs of RecA in noneubacteria comprise an array of RecA-like proteins acting in series or cooperatively. Received: 25 October 1996 / Accepted: 31 December 1996

20.
Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which current methods mainly use either composition-based or homology-based approaches. Although homology-based methods are more sensitive and accurate, their main drawback is the time needed to generate BLAST alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignment using an ORF-based approach. By using BLAT as a faster alignment method in place of BLASTX, the analysis time has been reduced several-fold. MetaBin was benchmarked using both simulated and real metagenomic datasets and can be used for both single and paired-end sequence reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can taxonomically bin short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out taxonomic analysis by submitting either reads or BLASTX output. It provides several options, including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. The MetaBin web server and a standalone version for high-throughput analysis are freely available at http://metabin.riken.jp/.
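The abstract does not detail how MetaBin extracts ORFs before aligning them. A generic six-frame scan for open reading frames in a short read might look like the sketch below. Here a stop-free stretch of at least `min_codons` codons counts as an ORF even without a start codon, because short reads rarely contain complete genes. That rule and the default threshold are assumptions for illustration only.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def six_frame_orfs(read: str, min_codons: int = 15) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, nucleotide ORF) for every stop-free stretch of
    at least `min_codons` codons in the three frames of each strand."""
    read = read.upper()
    orfs = []
    for strand, seq in (("+", read), ("-", revcomp(read))):
        for frame in range(3):
            current = []
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon in STOPS:
                    if len(current) >= min_codons:
                        orfs.append((strand, frame, "".join(current)))
                    current = []
                else:
                    current.append(codon)
            # A stretch running off the end of the read still counts (partial gene)
            if len(current) >= min_codons:
                orfs.append((strand, frame, "".join(current)))
    return orfs
```

The resulting ORFs could then be translated and aligned with a fast aligner, which is the role BLAT plays in MetaBin according to the abstract.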


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417