20 similar articles found (search time: 31 ms)
1.
MetaSim: a sequencing simulator for genomics and metagenomics (cited by: 1 total, 0 self-citations, 1 by others)
Background
The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets.
Methodology/Principal Findings
To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.
Conclusions/Significance
MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
2.
Background
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a prokaryotic adaptive defence system that provides resistance against alien replicons such as viruses and plasmids. Spacers in a CRISPR cassette confer immunity against viruses and plasmids containing regions complementary to the spacers and hence they retain a footprint of interactions between prokaryotes and their viruses in individual strains and ecosystems. The human gut is a rich habitat populated by numerous microorganisms, but a large fraction of these are unculturable and little is known about them in general and their CRISPR systems in particular.
Results
We used human gut metagenomic data from three open projects in order to characterize the composition and dynamics of CRISPR cassettes in the human-associated microbiota. Applying available CRISPR-identification algorithms and a previously designed filtering procedure to the assembled human gut metagenomic contigs, we found 388 CRISPR cassettes, 373 of which had repeats not observed previously in complete genomes or other datasets. Only 171 of 3,545 identified spacers were coupled with protospacers from the human gut metagenomic contigs. The number of matches to GenBank sequences was negligible, providing protospacers for 26 spacers.
Reconstruction of CRISPR cassettes allowed us to track the dynamics of spacer content. In agreement with other published observations, we show that spacers shared by different cassettes (and hence likely older ones) tend to lie toward the trailer ends, whereas spacers with matches in the metagenomes are distributed unevenly across cassettes, demonstrating a preference to form clusters closer to the active end of a CRISPR cassette, adjacent to the leader, and hence suggesting dynamical interactions between prokaryotes and viruses in the human gut. Remarkably, spacers match protospacers in the metagenome of the same individual with frequency comparable to a random control, but may match protospacers from metagenomes of other individuals.
Conclusions
The analysis of assembled contigs is complementary to the approach based on the analysis of original reads and hence provides additional data about composition and evolution of CRISPR cassettes, revealing the dynamics of CRISPR-phage interactions in metagenomes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-202) contains supplementary material, which is available to authorized users.
3.
4.
Background
Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing.
Methodology/Principal Findings
We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized.
Conclusions/Significance
We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different protocols are not suitable for comparative metagenomics.
5.
Omar Lakhdari Antonietta Cultrone Julien Tap Karine Gloux Françoise Bernard S. Dusko Ehrlich Fabrice Lefèvre Joël Doré Hervé M. Blottière 《PloS one》2010,5(9)
Background/Aim
The human intestinal microbiota plays an important role in modulation of mucosal immune responses. To study interactions between intestinal epithelial cells (IECs) and commensal bacteria, a functional metagenomic approach was developed. One advantage of metagenomics is that it provides access to genomes of uncultured microbes. We aimed at identifying bacterial genes involved in regulation of NF-κB signaling in IECs. A high throughput cell-based screening assay allowing rapid detection of NF-κB modulation in IECs was established using the reporter-gene strategy to screen metagenomic libraries issued from the human intestinal microbiota.
Methods
A plasmid containing the secreted alkaline phosphatase (SEAP) gene under the control of NF-κB binding elements was stably transfected in HT-29 cells. The reporter clone HT-29/kb-seap-25 was selected and characterized. Then, a first screening of a metagenomic library from Crohn's disease patients was performed to identify NF-κB modulating clones. Furthermore, genes potentially involved in the effect of one stimulatory metagenomic clone were determined by sequence analysis combined with mutagenesis by transposition.
Results
The two proinflammatory cytokines, TNF-α and IL-1β, were able to activate the reporter system, reflecting the activation of the NF-κB signaling pathway, and the NF-κB inhibitors BAY 11-7082, caffeic acid phenethyl ester and MG132 were effective. A screening of 2640 metagenomic clones led to the identification of 171 modulating clones. Among them, one stimulatory metagenomic clone, 52B7, was further characterized. Sequence analysis revealed that its metagenomic DNA insert might belong to a new Bacteroides strain and we identified 2 loci encoding an ABC transport system and a putative lipoprotein potentially involved in the effect of 52B7 on NF-κB.
Conclusions
We have established a robust high throughput screening assay for metagenomic libraries derived from the human intestinal microbiota to study bacteria-driven NF-κB regulation. This opens a strategic path toward the identification of bacterial strains and molecular patterns presenting a potential therapeutic interest.
6.
Pier Luigi Buttigieg Wolfgang Hankeln Ivaylo Kostadinov Renzo Kottmann Pelin Yilmaz Melissa Beth Duhaime Frank Oliver Glöckner 《PloS one》2013,8(3)
Background
The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation.
Methodology/Principal findings
We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious “guilty-by-association” approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated with photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs.
Conclusion/Significance
Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. Targeted attempts at DUF characterization in the laboratory or in silico may draw from these insights, and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.
7.
Background
Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck.
Results
To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets.
Conclusions
RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0503-6) contains supplementary material, which is available to authorized users.
8.
Michael Cantor Henrik Nordberg Tatyana Smirnova Matthias Hess Susannah Tringe Inna Dubchak 《BMC bioinformatics》2015,16(1)
Background
Metagenomics, the sequencing of DNA collected from an entire microbial community, enables the study of natural microbial consortia in their native habitats. Metagenomics studies produce huge volumes of data, including both the sequences themselves and metadata describing their abundance, assembly, predicted functional characteristics and environmental parameters. The ability to explore these data visually is critically important to meaningful biological interpretation. Current genomics applications cannot effectively integrate sequence data, assembly metadata, and annotation to support both genome and community-level inquiry.
Results
Elviz (Environmental Laboratory Visualization) is an interactive web-based tool for the visual exploration of assembled metagenomes and their complex metadata. Elviz allows scientists to navigate metagenome assemblies across multiple dimensions and scales, plotting parameters such as GC content, relative abundance, phylogenetic affiliation and assembled contig length. Furthermore, Elviz enables interactive exploration using real-time plot navigation, search, filters, axis selection, and the ability to drill from a whole-community profile down to individual gene annotations. Thus, scientists engage in a rapid feedback loop of visual pattern identification, hypothesis generation, and hypothesis testing.
Conclusions
Compared to the current alternative of generating a succession of static figures, Elviz can greatly accelerate metagenome analysis. Elviz can be used to explore both user-submitted datasets and numerous metagenome studies publicly available at the Joint Genome Institute (JGI). Elviz is freely available at http://genome.jgi.doe.gov/viz and runs on most current web browsers.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0566-4) contains supplementary material, which is available to authorized users.
9.
Milica Ciric Christina D Moon Sinead C Leahy Christopher J Creevey Eric Altermann Graeme T Attwood Jasna Rakonjac Dragana Gagic 《BMC genomics》2014,15(1)
Background
In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For the metasecretome (the collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of the total metagenome, and are poorly represented in the dataset due to overall low coverage of the metagenomic gene pool, even in large-scale projects.
Results
By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of a complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre.
Conclusions
As a method, metasecretome phage display combined with next-generation sequencing has the power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, the metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-356) contains supplementary material, which is available to authorized users.
10.
Agnes Dettai Cyril Gallut Sophie Brouillet Joel Pothier Guillaume Lecointre Régis Debruyne 《PloS one》2012,7(12)
Background
Researchers sorely need markers and approaches for biodiversity exploration (both specimen-linked and metagenomic) using the full potential of next generation sequencing technologies (NGST). Currently, most studies rely on expensive multiple tagging, PCR primer universality and/or the use of few markers, sometimes with insufficient variability.
Methodology/Principal Findings
We propose a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. It relies on the properties of metazoan mitogenomes for enrichment, on careful choice of the organisms to multiplex, as well as on the wide collection of accumulated mitochondrial reference datasets for post-sequencing sorting and identification instead of individual tagging. Multiple divergent organisms can be sequenced simultaneously, and their complete mitogenome obtained at a very low cost. We provide in silico testing of dataset assembly for a selected set of example datasets.
Conclusions/Significance
This approach generates large mitogenome datasets. These sequences are useful for phylogenetics, molecular identification and molecular ecology studies, and are compatible with all existing projects or available datasets based on mitochondrial sequences, such as the Barcode of Life project. Our method can yield sequences both from identified samples and metagenomic samples. The use of the same datasets for both kinds of studies makes for a powerful approach, especially since the datasets have a high variability even at species level, and would be a useful complement to the less variable 18S rDNA currently prevailing in metagenomic studies.
11.
Bo Xu Weijiang Xu Junjun Li Liming Dai Caiyun Xiong Xianghua Tang Yunjuan Yang Yuelin Mu Junpei Zhou Junmei Ding Qian Wu Zunxi Huang 《BMC genomics》2015,16(1)
Background
The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host and the diet adopted by the host. Although the importance of gut microbiota of humans has been well demonstrated, there is a paucity of research regarding non-human primates (NHPs), especially herbivorous NHPs.
Results
In this study, an analysis of 97,942 pyrosequencing reads generated from Rhinopithecus bieti fecal DNA extracts was performed to better understand the microbial diversity and functional capacity of the R. bieti gut microbiome. The taxonomic analysis of the metagenomic reads indicated that R. bieti fecal microbiomes were dominated by Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria phyla. The comparative analysis of taxonomic classification revealed that the metagenome of R. bieti was characterized by an overrepresentation of bacteria of the phyla Fibrobacteres and Spirochaetes as compared with other animals. Primary functional categories were associated mainly with protein, carbohydrates, amino acids, DNA and RNA metabolism, cofactors, cell wall and capsule and membrane transport. Comparing glycoside hydrolase profiles of R. bieti with those of other animals revealed that the R. bieti microbiome was most closely related to cow rumen.
Conclusions
Metagenomic and functional analysis demonstrated that R. bieti possesses a broad diversity of bacteria and numerous glycoside hydrolases responsible for lignocellulosic biomass degradation which might reflect the adaptations associated with a diet rich in fibrous matter. These results would contribute to the limited body of NHP metagenome studies and provide a unique genetic resource of plant cell wall degrading microbial enzymes. However, future studies on the metagenome sequencing of R. bieti regarding the effects of age, genetics, diet and environment on the composition and activity of the metagenomes are required.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1378-7) contains supplementary material, which is available to authorized users.
12.
13.
Yanjiao Zhou Hongyu Gao Kathie A Mihindukulasuriya Patricio S La Rosa Kristine M Wylie Tatiana Vishnivetskaya Mircea Podar Barb Warner Phillip I Tarr David E Nelson J Dennis Fortenberry Martin J Holland Sarah E Burr William D Shannon Erica Sodergren George M Weinstock 《Genome biology》2013,14(1):R1
Background
Characterizing the biogeography of the microbiome of healthy humans is essential for understanding microbial associated diseases. Previous studies mainly focused on a single body habitat from a limited set of subjects. Here, we analyzed one of the largest microbiome datasets to date and generated a biogeographical map that annotates the biodiversity, spatial relationships, and temporal stability of 22 habitats from 279 healthy humans.
Results
We identified 929 genera from more than 24 million 16S rRNA gene sequences of 22 habitats, and we provide a baseline of inter-subject variation for healthy adults. The oral habitat has the most stable microbiota with the highest alpha diversity, while the skin and vaginal microbiota are less stable and show lower alpha diversity. The level of biodiversity in one habitat is independent of the biodiversity of other habitats in the same individual. The abundances of a given genus at a body site in which it dominates do not correlate with the abundances at body sites where it is not dominant. Additionally, we observed that the human microbiota exhibits both cosmopolitan and endemic features. Finally, comparing datasets of different projects revealed a project-based clustering pattern, emphasizing the significance of standardization of metagenomic studies.
Conclusions
The data presented here extend the definition of the human microbiome by providing a more complete and accurate picture of human microbiome biogeography, addressing questions best answered by a large dataset of subjects and body sites that are deeply sampled by sequencing.
14.
15.
Jérôme Lluch Florence Servant Sandrine Païssé Carine Valle Sophie Valière Claire Kuchly Gaëlle Vilchez Cécile Donnadieu Michael Courtney Rémy Burcelin Jacques Amar Olivier Bouchez Benjamin Lelouvier 《PloS one》2015,10(11)
Background
Substantial progress in high-throughput metagenomic sequencing methodologies has enabled the characterisation of bacteria from various origins (for example gut and skin). However, the recently-discovered bacterial microbiota present within animal internal tissues has remained unexplored due to technical difficulties associated with these challenging samples.
Results
We have optimized a specific 16S rDNA-targeted metagenomics sequencing (16S metabarcoding) pipeline based on the Illumina MiSeq technology for the analysis of bacterial DNA in human and animal tissues. This was successfully achieved in various mouse tissues despite the high abundance of eukaryotic DNA and PCR inhibitors in these samples. We extensively tested this pipeline on mock communities, negative controls, positive controls and tissues and demonstrated the presence of novel tissue specific bacterial DNA profiles in a variety of organs (including brain, muscle, adipose tissue, liver and heart).
Conclusion
The high throughput and excellent reproducibility of the method ensured exhaustive and precise coverage of the 16S rDNA bacterial variants present in mouse tissues. This optimized 16S metagenomic sequencing pipeline will allow the scientific community to catalogue the bacterial DNA profiles of different tissues and will provide a database to analyse host/bacterial interactions in relation to homeostasis and disease.
16.
Weizhong Li 《BMC bioinformatics》2009,10(1):359
Background
The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand.
17.
18.
Amirhossein Shamsaddini Yang Pan W Evan Johnson Konstantinos Krampis Mariya Shcheglovitova Vahan Simonyan Amy Zanne Raja Mazumder 《BMC genomics》2014,15(1)
Background
Understanding the taxonomic composition of a sample, whether from patient, food or environment, is important to several types of studies including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis.
Results
Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because the task of mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is a computationally intensive task. We have developed a robust subsampling-based algorithm implemented in a tool called CensuScope meant to take a ‘sneak peek’ into the population distribution and estimate taxonomic composition as if a census was taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI’s nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of the accuracy for the results.
Conclusion
CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis we are able to provide some recommendations in terms of the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-918) contains supplementary material, which is available to authorized users.
19.
Background
Metagenomics is a relatively new but fast growing field within environmental biology and medical sciences. It enables researchers to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology are not efficient in capturing the structure of the microbial community in an environment. Nowadays, high-throughput next-generation sequencing technologies are powerfully driving metagenomic studies. However, there is an urgent need to develop efficient statistical methods and computational algorithms to rapidly analyze the massive metagenomic short sequencing data and to accurately detect the features/functions present in the microbial community. Although several issues about functions of metagenomes at the pathway or subsystem level have been investigated, there is a lack of studies focusing on functional analysis at a low level of a hierarchical functional tree, such as the SEED subsystem tree.
Results
A two-step statistical procedure (metaFunction) is proposed to detect all possible functional roles at the low level from a metagenomic sample/community. In the first step, a statistical mixture model is proposed at the base of gene codons to estimate the abundances for the candidate functional roles, with sequencing error being considered. As a gene could be involved in multiple biological processes, the functional assignment is adjusted in the second step by utilizing an error distribution. The performance of the proposed procedure is evaluated through comprehensive simulation studies. Compared with other existing methods in metagenomic functional analysis, the new approach is more accurate in assigning reads to functional roles, and therefore also at more general levels. The method is also employed to analyze two real data sets.
Conclusions
metaFunction is a powerful tool for accurately profiling functions in a metagenomic sample.
20.