Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
2.
Metagenomics holds the promise of greatly advancing the study of diversity in natural communities, but novel theoretical and methodological approaches must first be developed and adjusted for these data sets. We evaluated widely used macroecological metrics of taxonomic diversity on a simulated set of metagenomic samples, using phylogenetically meaningful protein-coding genes as ecological proxies. To our knowledge, this is the first approach of this kind to evaluate taxonomic diversity metrics derived from metagenomic data sets. We demonstrate that abundance matrices derived from protein-coding marker genes reproduce the structure of the original community more faithfully than those derived from the SSU-rRNA gene. We also found that the most commonly used diversity metrics are biased estimators of community structure and differ significantly from their corresponding real parameters, and that these biases are most likely caused by insufficient sampling and differences in community phylogenetic composition. Our results suggest that the ranking of samples using multidimensional metrics makes a good qualitative alternative for contrasting community structure and that these comparisons can be greatly improved with the incorporation of metrics for both community structure and phylogenetic diversity. These findings will help to achieve a standardized framework for community diversity comparisons derived from metagenomic data sets.

3.
Next‐generation sequencing has dramatically changed the landscape of microbial ecology, with large‐scale and in‐depth diversity studies now widely accessible. However, determining the accuracy of taxonomic and quantitative inferences and comparing results obtained with different approaches are complicated by incongruence of experimental and computational data types and also by lack of knowledge of the true ecological diversity. Here we used highly diverse bacterial and archaeal synthetic communities assembled from pure genomic DNAs to compare inferences from metagenomic and SSU rRNA amplicon sequencing. Both Illumina and 454 metagenomic data outperformed amplicon sequencing in quantifying the community composition, but the outcome was dependent on analysis parameters and platform. New approaches in processing and classifying amplicons can reconstruct the taxonomic composition of the community with high reproducibility within primer sets, but all tested primer sets lead to significant taxon‐specific biases. Controlled synthetic communities assembled to broadly mimic the phylogenetic richness in target environments can provide important validation for fine‐tuning experimental and computational parameters used to characterize natural communities.

4.
Fan L, McElroy K, Thomas T. PLoS ONE, 2012, 7(6): e39948
Direct sequencing of environmental DNA (metagenomics) has great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

5.
Metagenomics: Read Length Matters   Cited by: 7 (self-citations: 0, citations by others: 7)
Obtaining an unbiased view of the phylogenetic composition and functional diversity within a microbial community is one central objective of metagenomic analysis. New technologies, such as 454 pyrosequencing, have dramatically reduced sequencing costs, to a level where metagenomic analysis may become a viable alternative to more-focused assessments of the phylogenetic (e.g., 16S rRNA genes) and functional diversity of microbial communities. To determine whether the short (~100 to 200 bp) sequence reads obtained from pyrosequencing are appropriate for the phylogenetic and functional characterization of microbial communities, the results of BLAST and COG analyses were compared for long (~750 bp) and randomly derived short reads from each of two microbial and one virioplankton metagenome libraries. Overall, BLASTX searches against the GenBank nr database found far fewer homologs within the short-sequence libraries. This was especially pronounced for a Chesapeake Bay virioplankton metagenome library. Increasing the short-read sampling depth or the length of derived short reads (up to 400 bp) did not completely resolve the discrepancy in BLASTX homolog detection. Only in cases where the long-read sequence had a close homolog (low BLAST E-score) did the derived short-read sequence also find a significant homolog. Thus, more-distant homologs of microbial and viral genes are not detected by short-read sequences. Among COG hits, derived short reads sampled at a depth of two short reads per long read missed up to 72% of the COG hits found using long reads. Noting the current limitation in computational approaches for the analysis of short sequences, the use of short-read-length libraries does not appear to be an appropriate tool for the metagenomic characterization of microbial communities.
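The "randomly derived short reads" described in this abstract can be illustrated with a minimal sketch. The function name `derive_short_reads`, its parameters, and the uniform choice of start positions are assumptions made for illustration; the abstract does not state how the authors sampled their fragments.

```python
import random


def derive_short_reads(long_read, short_len=100, per_long=2, rng=None):
    """Cut `per_long` random substrings of length `short_len` from one long read.

    Hypothetical helper: start positions are drawn uniformly, which is an
    assumption, not a detail taken from the abstract.
    """
    rng = rng or random.Random()
    if len(long_read) < short_len:
        return []  # the long read is too short to yield a fragment
    max_start = len(long_read) - short_len
    starts = [rng.randint(0, max_start) for _ in range(per_long)]
    return [long_read[s:s + short_len] for s in starts]


if __name__ == "__main__":
    # A made-up 800 bp "long read"; two derived 100 bp reads per long read,
    # matching the sampling depth mentioned in the abstract.
    long_read = "ACGT" * 200
    for short in derive_short_reads(long_read, 100, 2, random.Random(0)):
        print(len(short), short[:20])
```

Each derived fragment would then be searched (for example with BLASTX) in the same way as its parent long read, so that hits can be compared pairwise.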

6.
7.

Background

The 16S rRNA gene-based amplicon sequencing analysis is widely used to determine the taxonomic composition of microbial communities. Once the taxonomic composition of each community is obtained, evolutionary relationships among taxa are inferred by a phylogenetic tree. Thus, the combined representation of taxonomic composition and phylogenetic relationships among taxa is a powerful method for understanding microbial community structure; however, applying phylogenetic tree-based representation with information on the abundance of thousands or more taxa in each community is a difficult task. For this purpose, we previously developed the tool VITCOMIC (VIsualization tool for Taxonomic COmpositions of MIcrobial Community), which is based on the genome-sequenced microbes’ phylogenetic information. Here, we introduce VITCOMIC2, which incorporates substantive improvements over VITCOMIC that were necessary to address several issues associated with 16S rRNA gene-based analysis of microbial communities.

Results

We developed VITCOMIC2 to provide (i) sequence identity searches against broad reference taxa including uncultured taxa; (ii) normalization of 16S rRNA gene copy number differences among taxa; (iii) rapid sequence identity searches by applying the graphics processing unit-based sequence identity search tool CLAST; (iv) accurate taxonomic composition inference and nearly full-length 16S rRNA gene sequence reconstructions for metagenomic shotgun sequencing; and (v) an interactive user interface for simultaneous representation of the taxonomic composition of microbial communities and phylogenetic relationships among taxa. We validated the accuracy of processes (ii) and (iv) by using metagenomic shotgun sequencing data from a mock microbial community.

Conclusions

The improvements incorporated into VITCOMIC2 enable users to acquire an intuitive understanding of microbial community composition based on the 16S rRNA gene sequence data obtained from both metagenomic shotgun and amplicon sequencing.

8.
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

9.
The study of diversity gradients due to elevation dates back to the foundation of biogeography and ecology. Although elevation-driven patterns of plant diversity have been reported for centuries, uncertainty still exists about the assembly rules that drive these patterns. In this study, we identified the factors driving community assembly for tree and herb species along an elevation gradient. To this end, we applied an integrated method using both functional traits and phylogeny, called the mean pairwise functional-phylogenetic distance, to understand the assembly rules for woody and herbaceous species communities along an elevation gradient. At higher elevation sites, woody and herbaceous communities were composed of species having similar traits. The phylogenetic trends for woody species were consistent with the functional trends; closely related species co-occurred more frequently than expected at higher elevations. Phylogenetic trends for herb species were opposite to the functional trends; species with similar traits but having a random phylogenetic distribution co-occurred at higher elevations. We suggest that the community assembly rules for woody and herb species vary with elevation, and that functional constraints due to environmental filtering at higher elevation act as assembly rules along gradients in both woody and herbaceous communities, even though their phylogenetic backgrounds differ.

10.
Aim Increasingly, ecologists are using evolutionary relationships to infer the mechanisms of community assembly. However, modern communities are being invaded by non‐indigenous species. Since natives have been associated with one another through evolutionary time, the forces promoting character and niche divergence should be high. On the other hand, exotics have evolved elsewhere, meaning that conserved traits may be more important in their new ranges. Thus, co‐occurrence over sufficient time‐scales for reciprocal evolution may alter how phylogenetic relationships influence assembly. Here, we examined the phylogenetic structure of native and exotic plant communities across a large‐scale gradient in species richness and asked whether local assemblages are composed of more or less closely related natives and exotics and whether phylogenetic turnover among plots and among sites across this gradient is driven by turnover in close or distant relatives differentially for natives and exotics. Location Central and northern California, USA. Methods We used data from 30 to 50 replicate plots at four sites and constructed a maximum likelihood molecular phylogeny using the genes matK, rbcL, ITS1 and 5.8S. We compared community‐level measures of native and exotic phylogenetic diversity and among‐plot phylobetadiversity. Results There were few exotic clades, but they tended to be widespread. Exotic species were phylogenetically clustered within communities and showed low phylogenetic turnover among communities. In contrast, the more species‐rich native communities showed higher phylogenetic dispersion and turnover among sites. Main conclusions The assembly of native and exotic subcommunities appears to reflect the evolutionary histories of these species and suggests that shared traits drive exotic patterns while evolutionary differentiation drives native assembly. Current invasions appear to be causing phylogenetic homogenization at regional scales.

11.
Phototrophic microbial mat communities from 60 °C and 65 °C regions in the effluent channels of Mushroom and Octopus Springs (Yellowstone National Park, WY, USA) were investigated by shotgun metagenomic sequencing. Analyses of assembled metagenomic sequences resolved six dominant chlorophototrophic populations and permitted the discovery and characterization of undescribed but predominant community members and their physiological potential. Linkage of phylogenetic marker genes and functional genes showed novel chlorophototrophic bacteria belonging to uncharacterized lineages within the order Chlorobiales and within the Kingdom Chloroflexi. The latter is the first chlorophototrophic member of Kingdom Chloroflexi that lies outside the monophyletic group of chlorophototrophs of the Order Chloroflexales. Direct comparison of unassembled metagenomic sequences to genomes of representative isolates showed extensive genetic diversity, genomic rearrangements and novel physiological potential in native populations as compared with genomic references. Synechococcus spp. metagenomic sequences showed a high degree of synteny with the reference genomes of Synechococcus spp. strains A and B′, but synteny declined with decreasing sequence relatedness to these references. There was evidence of horizontal gene transfer among native populations, but the frequency of these events was inversely proportional to phylogenetic relatedness.

12.
Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large data sets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated our metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads and provide the largest metagenomic profiling to date of the human gut. It can be accessed at http://huttenhower.sph.harvard.edu/metaphlan/.

13.
Lake Lanier is an important freshwater lake for the southeast United States, as it represents the main source of drinking water for the Atlanta metropolitan area and is popular for recreational activities. Temperate freshwater lakes such as Lake Lanier are underrepresented among the growing number of environmental metagenomic data sets, and little is known about how functional gene content in freshwater communities relates to that of other ecosystems. To better characterize the gene content and variability of this freshwater planktonic microbial community, we sequenced several samples obtained around a strong summer storm event and during the fall water mixing using a random whole-genome shotgun (WGS) approach. Comparative metagenomics revealed that the gene content was relatively stable over time and more related to that of another freshwater lake and the surface ocean than to soil. However, the phylogenetic diversity of Lake Lanier communities was distinct from that of soil and marine communities. We identified several important genomic adaptations that account for these findings, such as the use of potassium (as opposed to sodium) osmoregulators by freshwater organisms and differences in the community average genome size. We show that the lake community is predominantly composed of sequence-discrete populations and describe a simple method to assess community complexity based on population richness and evenness and to determine the sequencing effort required to cover diversity in a sample. This study provides the first comprehensive analysis of the genetic diversity and metabolic potential of a temperate planktonic freshwater community and advances approaches for comparative metagenomics.

14.
The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms, and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
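The distinction this abstract draws between gene abundance and organismal abundance can be made concrete with a small sketch. It assumes the per-taxon 16S copy numbers are already known; the method in the abstract estimates them by phylogenetic placement and ancestral state estimation, which is not reproduced here. The function name and the taxon labels are hypothetical.

```python
def organism_abundance(gene_counts, copy_number):
    """Convert 16S gene counts into relative organismal abundances.

    Each taxon's gene count is divided by its (assumed known) genomic 16S
    copy number, and the results are renormalized to sum to 1.
    """
    cells = {taxon: gene_counts[taxon] / copy_number[taxon] for taxon in gene_counts}
    total = sum(cells.values())
    return {taxon: value / total for taxon, value in cells.items()}


if __name__ == "__main__":
    # Two made-up taxa with equal gene counts but different copy numbers.
    gene_counts = {"taxon_A": 100, "taxon_B": 100}
    copy_number = {"taxon_A": 1, "taxon_B": 10}
    print(organism_abundance(gene_counts, copy_number))
```

In this toy case the two taxa look equally abundant by gene count, yet taxon_A accounts for 100/110 of the estimated organisms once copy number is taken into account.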

15.
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ('Hill diversities'), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
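For readers unfamiliar with the metrics named in this abstract, the sketch below computes the textbook forms of the Hill diversity of order q and the classic Chao1 richness estimator from a list of per-species counts. These are the standard plug-in formulas, not the lower and upper bounds that the authors construct; the function names are chosen here for illustration.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q from observed counts (plug-in estimate).

    q = 0 gives observed richness, q = 1 the exponential of Shannon entropy,
    and q = 2 the inverse Simpson index.
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def chao1(counts):
    """Classic Chao1 richness estimate from singleton and doubleton counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0  # common fallback when no doubletons


if __name__ == "__main__":
    sample = [5, 1, 1, 2]  # made-up counts for four observed species
    print(hill_number(sample, 0), hill_number(sample, 1), hill_number(sample, 2))
    print(chao1(sample))
```

Chao1 depends only on the rarest observed species (singletons and doubletons), which is why it is sensitive to the unseen tail of the abundance distribution, whereas the q = 1 and q = 2 Hill diversities are dominated by common species.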

16.
ABSTRACT: BACKGROUND: Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers. RESULTS: At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.
CONCLUSIONS: Classification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.
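The signature-peptide lookup described in this abstract might be sketched as follows. Everything here is an illustrative assumption: the toy reference tree, the three made-up 10-mer signatures, the helper names, and in particular the conflict rule (falling back to the lowest common ancestor when hits lie on different lineages), which is one possible reading of "the most specific node that is consistent with" the observed signatures. Translation of a DNA read into peptides is omitted.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def classify(peptide, signatures, parent, k=10):
    """Assign a translated fragment to a tree node via exact k-mer matches.

    `signatures` maps a peptide k-mer to the tree node it is a signature of;
    `parent` maps each node to its parent (the root maps to None).
    """
    kmers = (peptide[i:i + k] for i in range(len(peptide) - k + 1))
    hits = {signatures[km] for km in kmers if km in signatures}
    if not hits:
        return None  # no signature peptide found
    paths = {node: ancestors(node, parent) for node in hits}
    deepest = max(hits, key=lambda node: len(paths[node]))
    if hits <= set(paths[deepest]):
        return deepest  # all hits lie on one lineage: take the most specific
    # Hits on different lineages: assumed fallback to lowest common ancestor.
    common = set.intersection(*(set(p) for p in paths.values()))
    return max(common, key=lambda node: len(ancestors(node, parent)))


if __name__ == "__main__":
    parent = {"root": None, "Proteobacteria": "root", "Firmicutes": "root",
              "Gammaproteobacteria": "Proteobacteria"}
    signatures = {"MKTAYIAKQR": "Proteobacteria",
                  "QISFVKSHFS": "Gammaproteobacteria",
                  "AAAAAAAAAA": "Firmicutes"}
    print(classify("MKTAYIAKQRQISFVKSHFS", signatures, parent))
```

Because each lookup is a hash-table probe rather than an alignment, this style of classification scales linearly with read length, which is in line with the throughput advantage the abstract reports over BLASTX-based methods.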

17.
Accurate phylogenetic classification of variable-length DNA fragments   Cited by: 1 (self-citations: 0, citations by others: 1)
Metagenome studies have retrieved vast amounts of sequence data from a variety of environments leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.
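A composition-based classifier works from the oligonucleotide content of a fragment rather than from alignments. The sketch below shows only that feature-extraction step: a normalized k-mer frequency vector. The choice of k = 4 and the function name are assumptions for illustration; the abstract does not specify PhyloPythia's features or its learning model, and none of that is reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the relative frequency of every DNA k-mer in `seq`.

    The vector has 4**k entries in lexicographic order (AAAA, AAAC, ...).
    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTNNACGT", k=2)
    print(len(profile), round(sum(profile), 6))
```

Vectors of this kind, computed for fragments of known origin, could serve as training input to any standard classifier, and a fragment of unknown origin would be assigned from its own profile.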

18.
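Molecular phylogenetic study of selected subfamilies of Catantopidae based on COII gene sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
Ma Lan, Huang Yuan. Acta Entomologica Sinica (昆虫学报), 2006, 49(6): 982-990
A 585 bp fragment of the COII gene was determined by direct sequencing of PCR products for 22 species belonging to 16 genera in 10 subfamilies of Catantopidae (斑腿蝗科). The base composition of the sequences was analyzed and the phylogenetic signal of the data sets was assessed. Finally, with Filchnerella sunanensis (肃南短鼻蝗) of Pamphagidae (癞蝗科) as the outgroup, phylogenetic trees were constructed using the neighbor-joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference methods to resolve the phylogenetic relationships among the subfamilies represented by these species. The results show that the base composition of the COII sequences of the 22 Catantopidae species is strongly biased toward A+T. Analysis of the phylogenetic signal in the full data set (the 585 bp COII fragment) and in the data sets partitioned by first, second and third codon positions showed that all data sets contain some phylogenetic information. The consensus trees obtained with the four methods indicate that: (1) Calliptaminae (星翅蝗亚科), Cyrtacanthacridinae (刺胸蝗亚科), Eyprepocnemidinae (黑背蝗亚科) and Catantopinae (斑腿蝗亚科) are closely related; (2) Caryandinae (卵翅蝗亚科) is closely related to Oxyinae (稻蝗亚科) and should probably be placed within Oxyinae, while Spathosterninae (板胸蝗亚科) is relatively close to these two subfamilies; (3) Melanoplinae (黑蝗亚科) and Podisminae (秃蝗亚科) should probably be merged into a single subfamily; (4) the four genera of Coptacridinae (切翅蝗亚科) did not cluster together, indicating that these genera differ considerably and do not form a monophyletic group; (5) Melanoplinae and Podisminae are closely related to each other and relatively distant from the other subfamilies in this study. The results indicate that the COII gene is an effective molecular marker for resolving phylogenetic relationships among genera and species below the subfamily level in Catantopidae.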

19.
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

20.
The microbial mats of Guerrero Negro (GN), Baja California Sur, Mexico historically were considered a simple environment, dominated by cyanobacteria and sulfate-reducing bacteria. Culture-independent rRNA community profiling instead revealed these microbial mats as among the most phylogenetically diverse environments known. A preliminary molecular survey of the GN mat based on only ∼1500 small subunit rRNA gene sequences discovered several new phylum-level groups in the bacterial phylogenetic domain and many previously undetected lower-level taxa. We determined an additional ∼119 000 nearly full-length sequences and 28 000 >200 nucleotide 454 reads from a 10-layer depth profile of the GN mat. With this unprecedented coverage of long sequences from one environment, we confirm the mat is phylogenetically stratified, presumably corresponding to light and geochemical gradients throughout the depth of the mat. Previous shotgun metagenomic data from the same depth profile show the same stratified pattern and suggest that metagenome properties may be predictable from rRNA gene sequences. We verify previously identified novel lineages and identify new phylogenetic diversity at lower taxonomic levels, for example, thousands of operational taxonomic units at the family-genus levels differ considerably from known sequences. The new sequences populate parts of the bacterial phylogenetic tree that previously were poorly described, but indicate that any comprehensive survey of GN diversity has only begun. Finally, we show that taxonomic conclusions are generally congruent between Sanger and 454 sequencing technologies, with the taxonomic resolution achieved dependent on the abundance of reference sequences in the relevant region of the rRNA tree of life.
