Similar literature
1.
Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla that cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high-throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
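
As an illustration of what an oligonucleotide signature is, the following minimal Python sketch counts overlapping words of length k in a DNA fragment and compares two signatures. It is a simplification, not the study's pipeline: the function names, the plain relative-frequency normalization, and the Euclidean distance are assumptions chosen for clarity, and reverse complements are not merged.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_signature(seq, k=4):
    """Return relative frequencies of all 4**k DNA words of length k.

    k=4 gives a tetranucleotide signature, k=7 a heptanucleotide one.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows containing N etc.
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures built with the same k."""
    return sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))
```

A fragment could then be assigned to the reference genome whose signature is nearest. The signature has 4**k entries (256 for k=4, 16,384 for k=7), which is one way to see the trade-off between resolving power and computational time that the abstract describes.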

2.
Caves are relatively accessible subterranean habitats ideal for the study of subsurface microbial dynamics and metabolisms under oligotrophic, non-photosynthetic conditions. A 454-pyrotag analysis of the V6 region of the 16S rRNA gene was used to systematically evaluate the bacterial diversity of ten cave surfaces within Kartchner Caverns, a limestone cave. Results showed an average of 1,994 operational taxonomic units (97% cutoff) per speleothem and a broad taxonomic diversity that included 21 phyla and 12 candidate phyla. Comparative analysis of speleothems within a single room of the cave revealed three distinct bacterial taxonomic profiles dominated by either Actinobacteria, Proteobacteria, or Acidobacteria. A gradient in observed species richness along the sampling transect revealed that the communities with lower diversity corresponded to those dominated by Actinobacteria while the more diverse communities were those dominated by Proteobacteria. A 16S rRNA gene clone library from one of the Actinobacteria-dominated speleothems identified clones with 99% identity to chemoautotrophs and previously characterized oligotrophs, providing insights into potential energy dynamics supporting these communities. The robust analysis conducted for this study demonstrated a rich bacterial diversity on speleothem surfaces. Further, it was shown that seemingly comparable speleothems supported divergent phylogenetic profiles, suggesting that these communities are very sensitive to subtle variations in nutritional inputs and environmental factors typifying speleothem surfaces in Kartchner Caverns.

3.
Carbonate caves represent subterranean ecosystems that are largely devoid of phototrophic primary production. In semiarid and arid regions, allochthonous organic carbon inputs entering caves with vadose-zone drip water are minimal, creating highly oligotrophic conditions; however, past research indicates that carbonate speleothem surfaces in these caves support diverse, predominantly heterotrophic prokaryotic communities. The current study applied a metagenomic approach to elucidate the community structure and potential energy dynamics of microbial communities colonizing speleothem surfaces in Kartchner Caverns, a carbonate cave in semiarid southeastern Arizona, USA. Manual inspection of a speleothem metagenome revealed a community genetically adapted to low-nutrient conditions with indications that a nitrogen-based primary production strategy is probable, including contributions from both Archaea and Bacteria. Genes for all six known CO2-fixation pathways were detected in the metagenome, and RuBisCo genes representative of the Calvin–Benson–Bassham cycle were over-represented in Kartchner speleothem metagenomes relative to bulk soil, rhizosphere soil and deep-ocean communities. Intriguingly, quantitative PCR found Archaea to be significantly more abundant in the cave communities than in soils above the cave. MEtaGenome ANalyzer (MEGAN) analysis of speleothem metagenome sequence reads found Thaumarchaeota to be the third most abundant phylum in the community, and identified taxonomic associations to this phylum for indicator genes representative of multiple CO2-fixation pathways. The results revealed that this oligotrophic subterranean environment supports a unique chemoautotrophic microbial community with potentially novel nutrient cycling strategies. These strategies may provide key insights into other ecosystems dominated by oligotrophy, including aphotic subsurface soils or aquifers and photic systems such as arid deserts.

4.
One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is of high relevance for metagenomic diagnostics or microbial community comparison. However, the majority of existing approaches quantify at low resolution (e.g. at phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method to estimate true genome abundances via read alignment by considering reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC’s superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets of both bacterial DNA and viral RNA origin. We further discuss our approach as an alternative to PCR-based DNA quantification.
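
The idea of correcting read counts for genome similarity with a non-negative LASSO can be sketched as a small optimization problem. The Python below is a toy illustration and not GASiC's implementation: the similarity matrix `A`, the function name `nonneg_lasso`, and the projected-gradient solver are all assumptions made for this example.

```python
def nonneg_lasso(A, r, lam=0.0, iters=5000):
    """Minimize 0.5*||A c - r||^2 + lam*sum(c) subject to c >= 0.

    A[i][j]: assumed fraction of reads from genome j that align to genome i.
    r[i]:    observed number of reads aligned to genome i.
    Returns the estimated corrected abundance c[j] of each genome.
    """
    n = len(A[0])
    # Safe step size: the gradient's Lipschitz constant is at most the
    # squared Frobenius norm of A.
    step = 1.0 / sum(a * a for row in A for a in row)
    c = [0.0] * n
    for _ in range(iters):
        resid = [sum(row[j] * c[j] for j in range(n)) - ri
                 for row, ri in zip(A, r)]
        grad = [sum(A[i][j] * resid[i] for i in range(len(A))) + lam
                for j in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        c = [max(0.0, cj - step * gj) for cj, gj in zip(c, grad)]
    return c
```

For two genomes that share 80% of their reads, `nonneg_lasso([[1.0, 0.8], [0.8, 1.0]], [100.0, 80.0])` should attribute essentially all reads to the first genome: the 80 reads observed on the second genome are explained by similarity alone. Because the abundances are constrained to be non-negative, the L1 penalty reduces to `lam * sum(c)`.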

5.
The present study attempted to demonstrate the capability of microbial strains from the unexplored Labit cave in India to precipitate calcite, providing evidence for biotic processes involved in the formation of speleothem deposits. Six calcifying bacterial strains, the majority belonging to the genus Bacillus, were isolated from the cave. SEM studies revealed an array of in vitro crystal polymorphs generated by the isolated bacteria, which are similar to microscopic observations of natural formations in speleothems. The EDX spectrum showed that the precipitated crystals were predominantly composed of calcium carbonate, indicating the relevance of bacterial biofilm in cave geomicrobiology and the biogenic evolution of cave formations in the studied cave; this is further supported by XRF analysis and Raman spectroscopy.

6.
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

7.
Given the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as “marker” genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities (e.g., construction of species trees, phylogeny-based assignment of metagenomic sequence reads to taxonomic groups, phylogeny-based assessment of alpha- and beta-diversity of microbial communities from metagenomic data). We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for “all bacteria and archaea”, 114 for “all bacteria” (greatly expanding on the ∼30 commonly used), and 100s to 1000s for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than previously possible.

8.
Assembling individual genomes from complex community metagenomic data remains a challenging issue for environmental studies. We evaluated the quality of genome assemblies from community short read data (Illumina 100 bp paired-end sequences) using datasets recovered from freshwater and soil microbial communities as well as in silico simulations. Our analyses revealed that the genome of a single genotype (or species) can be accurately assembled from a complex metagenome when it shows at least about 20× coverage. At lower coverage, however, the derived assemblies contained a substantial fraction of non-target sequences (chimeras), which explains, at least in part, the higher number of hypothetical genes recovered in metagenomic relative to genomic projects. We also provide examples of how to detect intrapopulation structure in metagenomic datasets and estimate the type and frequency of errors in assembled genes and contigs from datasets of varied species complexity.
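
To make the 20× coverage threshold concrete, the short Python sketch below applies the standard relation coverage = reads × read length / genome size to a single genome within a community. The function names and the `read_fraction` parameter (the share of all reads that originate from the target genome) are assumptions introduced for this example, not quantities defined in the study.

```python
import math


def expected_coverage(total_reads, read_length, genome_size, read_fraction):
    """Mean coverage of one genome, given the fraction of reads it contributes."""
    return total_reads * read_fraction * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size, read_fraction):
    """Total community reads required for the genome to reach target_coverage."""
    return math.ceil(target_coverage * genome_size / (read_length * read_fraction))
```

Under these assumptions, a 5 Mbp genome contributing 1% of the reads in a 100 bp dataset would need on the order of 100 million reads in total to reach 20× coverage, which illustrates why low-abundance community members are difficult to assemble accurately.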

9.

Background

Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.

Conclusions

By taking account of the similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

10.

Background

Understanding the taxonomic composition of a sample, whether from patient, food or environment, is important to several types of studies including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis.

Results

Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm implemented in a tool called CensuScope meant to take a ‘sneak peek’ into the population distribution and estimate taxonomic composition as if a census was taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI’s nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of the accuracy for the results.
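
The repeated-subsampling idea described above can be sketched in a few lines of Python. This is an illustrative toy, not CensuScope's code: it assumes each read already carries a taxon label (standing in for the alignment against nt), and the function name, the default of 250 reads and 50 iterations, and the strict-majority rule are assumptions made for this example.

```python
import random
from collections import Counter


def subsample_census(read_taxa, sample_size=250, iterations=50, seed=0):
    """Estimate taxon frequencies from repeated small random subsamples.

    read_taxa: list holding the taxon label of every read (hypothetical
               stand-in for mapping each sampled read to a database).
    Returns {taxon: mean relative frequency} for taxa that appear in more
    than half of the iterations.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(read_taxa))
    seen_in = Counter()   # number of iterations in which a taxon appeared
    freq_sum = Counter()  # sum of per-iteration relative frequencies
    for _ in range(iterations):
        counts = Counter(rng.sample(read_taxa, n))  # sample without replacement
        for taxon, count in counts.items():
            seen_in[taxon] += 1
            freq_sum[taxon] += count / n
    return {t: freq_sum[t] / iterations
            for t in seen_in if seen_in[t] > iterations / 2}
```

Only `sample_size × iterations` reads are ever inspected, which is what keeps the approach cheap compared with mapping every read; rare taxa that show up in only a few iterations are filtered out by the majority rule.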

Conclusion

CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis we are able to provide some recommendations in terms of the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-918) contains supplementary material, which is available to authorized users.

11.
Fan L, McElroy K, Thomas T. PLoS ONE. 2012;7(6):e39948
Direct sequencing of environmental DNA (metagenomics) has great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

12.
The taxonomic analysis of sequencing data has become important in many areas of life sciences. However, currently available tools for that purpose either consume large amounts of RAM or yield insufficient quality and robustness. Here, we present kASA, a k-mer based tool capable of identifying and profiling metagenomic DNA or protein sequences with high computational efficiency and a user-definable memory footprint. We ensure both high sensitivity and precision by using an amino acid-like encoding of k-mers together with a range of multiple k’s. Custom algorithms and data structures optimized for external memory storage enable a full-scale taxonomic analysis without compromise on laptop, desktop, and HPCC.

13.
Graves’ Disease is the most common organ-specific autoimmune disease and has been linked in small pilot studies to taxonomic markers within the gut microbiome. Important limitations of this work include small sample sizes and low-resolution taxonomic markers. Accordingly, we studied 162 gut microbiomes of mild and severe Graves’ disease (GD) patients and healthy controls. Taxonomic and functional analyses based on metagenome-assembled genomes (MAGs) and MAG-annotated genes, together with predicted metabolic functions and metabolite profiles, revealed a well-defined network of MAGs, genes and clinical indexes separating healthy from GD subjects. A supervised classification model identified a combination of biomarkers including microbial species, MAGs, genes and SNPs, with predictive power superior to models from any single biomarker type (AUC = 0.98). Global, cross-disease multi-cohort analysis of gut microbiomes revealed high specificity of these GD biomarkers, notably discriminating against Parkinson’s Disease, and suggesting that non-invasive stool-based diagnostics will be useful for these diseases. Subject terms: Microbiome, Biomarkers, Population genetics

14.
Understanding the evolutionary dynamics of influenza viruses is essential to control both avian and human influenza. Here, we analyze host-specific and segment-specific Tajima’s D trends of influenza A virus through a systematic review using viral sequences registered in the National Center for Biotechnology Information. To avoid bias from viral population subdivision, viral sequences were stratified according to their sampling locations and sampling years. As a result, we obtained a total of 580 datasets, each of which consists of nucleotide sequences of influenza A viruses isolated from a single population of hosts at a single sampling site within a single year. By analyzing nucleotide sequences in the datasets, we found that Tajima’s D values of viral sequences differed depending on hosts and gene segments. Tajima’s D values of viruses isolated from chicken and human samples were negative, suggesting purifying selection or rapid population growth of the viruses. Negative Tajima’s D values in rapidly growing viral populations were also observed in computer simulations. Tajima’s D values of PB2, PB1, PA, NP, and M genes of the viruses circulating in wild mallards were close to zero, suggesting that these genes have evolved neutrally in a constant-sized population. On the other hand, Tajima’s D values of HA and NA genes of these viruses were positive, indicating that HA and NA have undergone balancing selection in wild mallards. Taken together, these results indicated the existence of unknown factors that maintain viral subtypes in wild mallards.
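
For readers unfamiliar with the statistic, the following Python sketch computes Tajima’s D from a set of aligned sequences using the standard formula (Tajima, 1989). It is a minimal illustration rather than the analysis code used in the study: the function name is an assumption, and gaps or ambiguous bases are treated as ordinary characters.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (needs at least 4).

    Returns None when there are no segregating sites, where D is undefined.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants (each mutation carried by a single sequence) makes the statistic negative, whereas variants held at intermediate frequency make it positive; these are the patterns the abstract associates with population growth or purifying selection and with balancing selection, respectively.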

15.
Biodiversity is a complex, yet essential, concept for undergraduate students in ecology and other natural sciences to grasp. As beginner scientists, students must learn to recognize, describe, and interpret patterns of biodiversity across various spatial scales and understand their relationships with ecological processes and human influences. It is also increasingly important for undergraduate programs in ecology and related disciplines to provide students with experiences working with large ecological datasets to develop students’ data science skills and their ability to consider how ecological processes that operate at broader spatial scales (macroscale) affect local ecosystems. To support the goals of improving student understanding of macroscale ecology and biodiversity at multiple spatial scales, we formed an interdisciplinary team that included grant personnel, scientists, and faculty from ecology and spatial sciences to design a flexible learning activity to teach macroscale biodiversity concepts using large datasets from the National Ecological Observatory Network (NEON). We piloted this learning activity in six courses enrolling a total of 109 students, ranging from midlevel ecology and GIS/remote sensing courses, to upper‐level conservation biology. Using our classroom experiences and a pre/postassessment framework, we evaluated whether our learning activity resulted in increased student understanding of macroscale ecology and biodiversity concepts and increased familiarity with analysis techniques, software programs, and large spatio‐ecological datasets. Overall, results suggest that our learning activity improved student understanding of biological diversity, biodiversity metrics, and patterns of biodiversity across several spatial scales. Participating faculty reflected on what went well and what would benefit from changes, and we offer suggestions for implementation of the learning activity based on this feedback. 
This learning activity introduced students to macroscale ecology and built student skills in working with big data (i.e., large datasets) and performing basic quantitative analyses, skills that are essential for the next generation of ecologists.

16.
Madagascar is well known for its diverse fauna and flora, being home to many species not found anywhere else in the world. However, its biodiversity in the recent past included a range of extinct enigmatic fauna, such as elephant birds, giant lemurs and dwarfed hippopotami. The ‘Malagasy aardvark’ (Plesiorycteropus) has remained one of Madagascar’s least well-understood extinct species since its discovery in the 19th century. Initially considered a close relative of the aardvark (Orycteropus) within the order Tubulidentata, more recent morphological analyses challenged this placement on the grounds that the identifiably derived traits supporting this allocation were adaptations to digging rather than shared ancestry. Because the skeletal evidence showed many morphological traits diagnostic of different eutherian mammal orders, they could not be used to resolve its closest relatives. As a result, the genus was tentatively assigned its own taxonomic order ‘Bibymalagasia’, yet how this order relates to other eutherian mammal orders remains unclear despite numerous morphological investigations. This research presents the first known molecular sequence data for Plesiorycteropus, obtained from the bone protein collagen (I), which places the ‘Malagasy aardvark’ as more closely related to tenrecs than aardvarks. More specifically, Plesiorycteropus was recovered within the order Tenrecoidea (golden moles and tenrecs) within Afrotheria, suggesting that the taxonomic order ‘Bibymalagasia’ is obsolete. This research highlights the potential for collagen sequencing in investigating the phylogeny of extinct species as a viable alternative to ancient DNA (aDNA) sequencing, particularly in cases where aDNA cannot be recovered.

17.
Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of ‘fitness for use’ for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC’s standard checklists for genomics and metagenomics and (b) TDWG’s Darwin Core standard, used primarily in taxonomy and systematic biology.

18.
Microbial communities present in diverse environments from deep seas to human body niches play significant roles in the complex ecosystem and human health. Characterizing their structural and functional diversities is indispensable, and many approaches, such as microscopic observation, DNA fingerprinting, and PCR-based marker gene analysis, have been successfully applied to identify microorganisms. Since the revolutionary improvement of DNA sequencing technologies, direct and high-throughput analysis of genomic DNA from a whole environmental community without prior cultivation has become the mainstream approach, overcoming the constraints of the classical approaches. Here, we first briefly review the history of environmental DNA analysis applications with a focus on profiling the taxonomic composition and functional potentials of microbial communities. To this end, we aim to introduce the shotgun metagenomic sequencing (SMS) approach, which is used for the untargeted (“shotgun”) sequencing of all (“meta”) microbial genomes (“genomic”) present in a sample. SMS data analyses are performed in silico using various software programs; however, in silico analysis is typically regarded as a burden on wet-lab experimental microbiologists. Therefore, in this review, we present microbiologists who are unfamiliar with in silico analyses with a basic and practical SMS data analysis protocol. This protocol covers all the bioinformatics processes of the SMS analysis in terms of data preprocessing, taxonomic profiling, functional annotation, and visualization.

19.
Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.

20.
Trace amounts of sulphur in speleothems suggest that stalagmites may act as archives of sulphur deposition, thereby recording aspects of atmospheric variability in sulphur content. Accurate interpretation of this novel sulphur archive depends upon understanding how biogeochemical cycling in the soil and epikarst above the cave may modify the precursor atmospheric values of sulphur concentration and isotopic composition prior to incorporation into the speleothem record. Dual isotope analysis of δ34S-SO4 and δ18O-SO4 is used to trace biogeochemical transformations of atmospheric sulphur through the cave system at Grotta di Ernesto in the Italian Alps and builds towards a framework for interpretation of speleothem sulphur archives which depends on overlying ecosystem dynamics and karst hydrological properties. A three-component model of atmospheric sulphate signal modification is proposed, driven by (1) vegetation and soil cycling; (2) the degree of groundwater mixing in the karst aquifer; and (3) redox status. The relative influence of each process is specific to individual drip flow sites and associated stalagmites, rendering each sulphur archive a unique signal of environmental conditions. Under conditions found in the soil and epikarst above Grotta di Ernesto, the dual isotope signatures of sulphate sulphur and oxygen incorporated into speleothem carbonate closely reflect past conditions of industrial sulphur loading to the atmosphere and the extent of signal modification through biogeochemical cycling and aquifer mixing.
