首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
As metagenomic studies continue to increase in their number, sequence volume and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor to meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software including the implementation of a dynamic weighting of taxonomic and functional annotation, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation are demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations including biological pathway and individual enzyme abundances from hundreds of community samples is demonstrated by providing scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals. Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep.  相似文献   

2.
A data set consisting of DNA sequences from a large-scale shotgun DNA cloning and sequencing project has been collected and posted for public release. The purpose is to propose a standard genomic DNA sequencing data set by which various algorithms and implementations can be tested. This set of data is divided into two subsets, one containing raw DNA sequence data (1023 clones) and the other consisting of the corresponding partially refined or edited DNA sequence data (820 clones). Suggested criteria or guidelines for this data refinement are presented so that algorithms for preprocessing and screening raw sequences may be developed. Development of such preprocessing, screening, aligning, and assembling algorithms will expedite large-scale DNA sequencing projects so that the complete unambiguous consensus DNA sequences will be made available to the general research community in a quicker manner. Smaller scale routine DNA sequencing projects will also be greatly aided by such computational efforts.  相似文献   

3.
Here we demonstrate a cost effective and scalable microbial ecology sequencing platform using the Ion Torrent Personal Genome Machine (PGM). We assessed both PCR amplified 16S rRNA and shotgun metagenomic approaches and generated 100,000+ to 1,000,000+ reads using 'post-light' based sequencing technology within different sized semi-conductor chips. Further development of Golay barcoded Ion Tags allowed multiplex analyses of microbial communities with substantially reduced costs compared with platforms such as 454/GS-FLX. Using these protocols we assessed the bacterial and archaeal dynamics within covered anaerobic digesters used to treat piggery wastes. Analysis of these sequence data showed that these novel methanogenic waste treatment systems are dominated by bacterial taxa, in particular Clostridium, Synergistia and Bacteroides that were maintained as a stable community over extended time periods. Archaeal community dynamics were more stochastic with the key methanogenic taxa more difficult to resolve, principally due to the poor congruence seen between community structures generated either by nested PCR or metagenomic approaches for archaeal analyses. Our results show that for microbial community structure and function analyses, the PGM platform provides a low cost, scalable and high throughput solution for both Tag sequencing and metagenomic analyses.  相似文献   

4.

Background

The 16S rRNA gene-based amplicon sequencing analysis is widely used to determine the taxonomic composition of microbial communities. Once the taxonomic composition of each community is obtained, evolutionary relationships among taxa are inferred by a phylogenetic tree. Thus, the combined representation of taxonomic composition and phylogenetic relationships among taxa is a powerful method for understanding microbial community structure; however, applying phylogenetic tree-based representation with information on the abundance of thousands or more taxa in each community is a difficult task. For this purpose, we previously developed the tool VITCOMIC (VIsualization tool for Taxonomic COmpositions of MIcrobial Community), which is based on the genome-sequenced microbes’ phylogenetic information. Here, we introduce VITCOMIC2, which incorporates substantive improvements over VITCOMIC that were necessary to address several issues associated with 16S rRNA gene-based analysis of microbial communities.

Results

We developed VITCOMIC2 to provide (i) sequence identity searches against broad reference taxa including uncultured taxa; (ii) normalization of 16S rRNA gene copy number differences among taxa; (iii) rapid sequence identity searches by applying the graphics processing unit-based sequence identity search tool CLAST; (iv) accurate taxonomic composition inference and nearly full-length 16S rRNA gene sequence reconstructions for metagenomic shotgun sequencing; and (v) an interactive user interface for simultaneous representation of the taxonomic composition of microbial communities and phylogenetic relationships among taxa. We validated the accuracy of processes (ii) and (iv) by using metagenomic shotgun sequencing data from a mock microbial community.

Conclusions

The improvements incorporated into VITCOMIC2 enable users to acquire an intuitive understanding of microbial community composition based on the 16S rRNA gene sequence data obtained from both metagenomic shotgun and amplicon sequencing.
  相似文献   

5.
Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large data sets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated our metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads and provide the largest metagenomic profiling to date of the human gut. It can be accessed at http://huttenhower.sph.harvard.edu/metaphlan/.  相似文献   

6.
The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: reference shotgun sequences isolated from different anatomical sites on the human body and shotgun metagenomic sequences from the microbial communities of each site. The alignment strategy for characterizing these metagenomic communities using available reference sequence is important to the success of HMP data analysis. Six next-generation aligners were used to align a community of known composition against a database comprising reference organisms known to be present in that community. All aligners report nearly complete genome coverage (>97%) for strains with over 6X depth of coverage, however they differ in speed, memory requirement and ease of use issues such as database size limitations and supported mapping strategies. The selected aligner was tested across a range of parameters to maximize sensitivity while maintaining a low false positive rate. We found that constraining alignment length had more impact on sensitivity than does constraining similarity in all cases tested. However, when reference species were replaced with phylogenetic neighbors, similarity begins to play a larger role in detection. We also show that choosing the top hit randomly when multiple, equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. The results of this study identified a strategy that was used to map over 3 tera-bases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.  相似文献   

7.
Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or chimeras. However, searching rRNA genes from large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, to achieve the faster identification of short-subunit rRNA gene fragments and enabled unsupervised community analysis with shotgun data. It also includes classification and copy number correction, and the output can be used by traditional amplicon analysis platforms. Shotgun metagenome data using this pipeline yielded higher diversity estimates than amplicon data but retained the grouping of samples in ordination analyses. We applied this pipeline to soil samples with paired shotgun and amplicon data and confirmed bias against Verrucomicrobia in a commonly used V6-V8 primer set, as well as discovering likely bias against Actinobacteria and for Verrucomicrobia in a commonly used V4 primer set. This pipeline can utilize all variable regions in SSU rRNA and also can be applied to large-subunit (LSU) rRNA genes for confirmation of community structure. The pipeline can scale to handle large amounts of soil metagenomic data (5 Gb memory and 5 central processing unit hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license.  相似文献   

8.
DNA sequencing has become an integrated part of microbial ecology, and taxonomic marker genes such as the SSU and LSU rRNA are frequently used to assess community structure. One solution for taxonomic community analysis based on shotgun metagenomic data is the Metaxa2 software, which can extract and classify sequence fragments belonging to the rRNA genes. This paper describes the Metaxa2 Diversity Tools, a set of new open-source software programs that extends the capabilities of the Metaxa2 software. These tools allow for better handling of data from multiple samples, improved species classifications, rarefaction analysis accounting for unclassified entries, and determination of significant differences in community composition of different samples. We demonstrate the performance of the software tools on rRNA data extracted from different shotgun metagenomes, and find the tools to streamline and improve the assessments of community diversity, particularly for samples from environments for which few reference genomes are available. Finally, we establish that our resampling algorithm for determining community dissimilarity is robust to differences in coverage depth, suggesting that it forms a complement to multidimensional visualization approaches for finding differences between communities. The Metaxa2 Diversity Tools are included in recent versions (2.1 and later) of Metaxa2 (http://microbiology.se/software/metaxa2/) and facilitate implementation of Metaxa2 within software pipelines for taxonomic analysis of environmental communities.  相似文献   

9.
ToxoDB: accessing the Toxoplasma gondii genome   总被引:1,自引:0,他引:1  
ToxoDB (http://ToxoDB.org) provides a genome resource for the protozoan parasite Toxoplasma gondii. Several sequencing projects devoted to T. gondii have been completed or are in progress: an EST project (http://genome.wustl.edu/est/index.php?toxoplasma=1), a BAC clone end-sequencing project (http://www.sanger.ac.uk/Projects/T_gondii/) and an 8X random shotgun genomic sequencing project (http://www.tigr.org/tdb/e2k1/tga1/). ToxoDB was designed to provide a central point of access for all available T. gondii data, and a variety of data mining tools useful for the analysis of unfinished, un-annotated draft sequence during the early phases of the genome project. In later stages, as more and different types of data become available (microarray, proteomic, SNP, QTL, etc.) the database will provide an integrated data analysis platform facilitating user-defined queries across the different data types.  相似文献   

10.
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity?  相似文献   

11.
12.
Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which utilizes di-codon frequencies estimated by the GC content of a given sequence with other various measures. MetaGene can predict a whole range of prokaryotic genes based on the anonymous genomic sequences of a few hundred bases, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to wide variety of metagenomic projects and expands the utility of metagenomics.  相似文献   

13.
As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well studied superfamilies.  相似文献   

14.
With the current fast accumulation of microbial community samples and related metagenomic sequencing data, data integration and analysis system is urgently needed for in-depth analysis of large number of metagenomic samples (also referred to as “microbial communities”) of interest. Although several existing databases have collected a large number of metagenomic samples, they mostly serve as data repositories with crude annotations, and offer limited functionality for analysis. Moreover, the few available tools for comparative analysis in the literature could only support the comparison of a few pre-defined set of metagenomic samples. To facilitate comprehensive comparative analysis on large amount of diverse microbial community samples, we have designed a Meta-Mesh system for a variety of analyses including quantitative analysis of similarities among microbial communities and computation of the correlation between the meta-information of these samples. We have used Meta-Mesh for systematically and efficiently analyses on diverse sets of human associate-habitat microbial community samples. Results have shown that Meta-Mesh could serve well as an efficient data analysis platform for discovery of clusters, biomarker and other valuable biological information from a large pool of human microbial samples.  相似文献   

15.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.  相似文献   

16.
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.  相似文献   

17.
Metagenomic analyses of an uncultured viral community from human feces   总被引:24,自引:0,他引:24       下载免费PDF全文
Here we present the first metagenomic analyses of an uncultured viral community from human feces, using partial shotgun sequencing. Most of the sequences were unrelated to anything previously reported. The recognizable viruses were mostly siphophages, and the community contained an estimated 1,200 viral genotypes.  相似文献   

18.
ABSTRACT: BACKGROUND: Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms--FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia, and Prodigal--for accuracy on short (75-1000 bp) fragments containing sequence error from previously published artificial data and "real" metagenomic datasets. RESULTS: While gene prediction tools have similar accuracies predicting genes on error-free fragments, in the presence of sequencing errors considerable differences between tools become evident. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than does MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions. CONCLUSIONS: Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs.  相似文献   

19.
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号