首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

Background  

Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However, at present, taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data due to the stringency of BLAST cut-offs. In an attempt to provide a comprehensive overview of metagenomic data, we re-analyzed the discarded metagenomes by using less stringent cut-offs. Additionally, we introduced a new criterion, namely, the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of our approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity. We also compared the consistency between our taxonomic binning and those reported in the original studies.  相似文献   

2.
3.

Background

In recent years several different fields, such as ecology, medicine and microbiology, have experienced an unprecedented development due to the possibility of direct sequencing of microbioimic samples. Among problems that researchers in the field have to deal with, taxonomic classification of metagenomic reads is one of the most challenging. State of the art methods classify single reads with almost 100% precision. However, very often, the performance in terms of recall falls at about 50%. As a consequence, state-of-the-art methods are indeed capable of correctly classify only half of the reads in the sample. How to achieve better performances in terms of overall quality of classification remains a largely unsolved problem.

Results

In this paper we propose a method for metagenomics CLassification Improvement with Overlapping Reads (CLIOR), that exploits the information carried by the overlapping reads graph of the input read dataset to improve recall, f-measure, and the estimated abundance of species. In this work, we applied CLIOR on top of the classification produced by the classifier Clark-l. Experiments on simulated and synthetic metagenomes show that CLIOR can lead to substantial improvement of the recall rate, sometimes doubling it. On average, on simulated datasets, the increase of recall is paired with an higher precision too, while on synthetic datasets it comes at expenses of a small loss of precision. On experiments on real metagenomes CLIOR is able to assign many more reads while keeping the abundance ratios in line with previous studies.

Conclusions

Our results showed that with CLIOR is possible to boost the recall of a state-of-the-art metagenomic classifier by inferring and/or correcting the assignment of reads with missing or erroneous labeling. CLIOR is not restricted to the reads classification algorithm used in our experiments, but it may be applied to other methods too. Finally, CLIOR does not need large computational resources, and it can be run on a laptop.
  相似文献   

4.
The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems featured by long reads, high accuracy, and ultra-high throughput. Based on the mechanism of emulsion PCR, a unique DNA template would only generate a unique sequence read after being amplified and sequenced on GS FLX. However, biased amplification of DNA templates might occur in the process of emulsion PCR, which results in production of artificial duplicate reads. Under the condition that each DNA template is unique to another, 3.49%-18.14% of total reads in GS FLX-sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misunderstanding of sequencing data and special attention should be paid to the potential biases they introduced to the data.  相似文献   

5.
The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary ‘next-generation'' DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.  相似文献   

6.
SUMMARY: Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. AVAILABILITY: The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.  相似文献   

7.
8.

Background  

Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less efforts and costs than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454-sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads.  相似文献   

9.
10.
11.
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors’ knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded fromhttp://hmpdacc.org). MALINA is made freely available on the web athttp://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.  相似文献   

12.
Conventional pyrosequencing using 2′-deoxyadenosine-5′-O-(1-thiotriphosphate) (dATPαS) is problematic due to the high cost of the substrate (dATPαS) and deterioration in the accuracy of incorporation to read through poly(T) regions. One reason for these problems is that dATPαS has a sulfur on the α-phosphate and also has isomers (Sp and Rp). To solve these problems, 11 nucleotide substrates, which could replace dATPαS in pyrosequencing, were newly synthesized. All substrates were modified on the seventh or eighth position of the adenine base from normal dATP. We found that the substrate that had an ethenyl-linked modified group on the seventh position of the adenine base had low activity in the luciferase reaction and high incorporation efficiency with the thymine base. One substrate in particular had 10-fold better incorporation efficiency than dATPαS. The new nucleotide substrate satisfied all conditions as a replacement of dATPαS.  相似文献   

13.
The taxonomic analysis of sequencing data has become important in many areas of life sciences. However, currently available tools for that purpose either consume large amounts of RAM or yield insufficient quality and robustness. Here, we present kASA, a k-mer based tool capable of identifying and profiling metagenomic DNA or protein sequences with high computational efficiency and a user-definable memory footprint. We ensure both high sensitivity and precision by using an amino acid-like encoding of k-mers together with a range of multiple k’s. Custom algorithms and data structures optimized for external memory storage enable a full-scale taxonomic analysis without compromise on laptop, desktop, and HPCC.  相似文献   

14.
15.
16.
In this study, 341, 246, and 386 positive clones with endo-β-1,4-glucanase, β-glucosidase, and endo-β-1,4-xylanase activities, respectively, were identified by screening from a metagenomic fosmid library constructed from a biogas digester. Subsequently, pools of 4, 10, and 16 positive clones were subjected to 454 pyrosequencing in different subruns. In total, 21 unique glycosyl hydrolase (GH) genes were predicted by bioinformatic analysis, which showed similarities to their nearest neighbors from 39 % to 72 %. In addition to bioinformatics prediction, nine GH genes were expressed and purified to identify their activity with four kinds of substrates. The activities of the most expressed proteins were consistent with their annotation based on bioinformatics prediction; however, three GH genes belonging to the GH5 family showed different activities from their annotation. An efficient acidic cellulase En1 had an optimal condition at 55 °C, pH 5.5, with a specific activity toward carboxymethylcellulose at 118 U/mg and K m at 12.8 g/L. This study demonstrated that there are diverse GHs in the biogas digester system with potential industrial application in lignocellulose hydrolysis, and their activities should be investigated with different substrates before their application. Additionally, pool sequencing of positive fosmid clones might be a cost-effective approach to obtain functional genes from metagenomic libraries.  相似文献   

17.
Fan L  McElroy K  Thomas T 《PloS one》2012,7(6):e39948
Direct sequencing of environmental DNA (metagenomics) has a great potential for describing the 16S rRNA gene diversity of microbial communities. However current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomicpyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.  相似文献   

18.
19.
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10 010 RSRs from a metagenomic library of microorganisms found in human faecal samples. Then searched them using the program BLASTN against a prokaryotic sequence database to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.  相似文献   

20.

Background

The microbial communities populating human and natural environments have been extensively characterized with shotgun metagenomics, which provides an in-depth representation of the microbial diversity within a sample. Microbes thriving in urban environments may be crucially important for human health, but have received less attention than those of other environments. Ongoing efforts started to target urban microbiomes at a large scale, but the most recent computational methods to profile these metagenomes have never been applied in this context. It is thus currently unclear whether such methods, that have proven successful at distinguishing even closely related strains in human microbiomes, are also effective in urban settings for tasks such as cultivation-free pathogen detection and microbial surveillance. Here, we aimed at a) testing the currently available metagenomic profiling tools on urban metagenomics; b) characterizing the organisms in urban environment at the resolution of single strain and c) discussing the biological insights that can be inferred from such methods.

Results

We applied three complementary methods on the 1614 metagenomes of the CAMDA 2017 challenge. With MetaMLST we identified 121 known sequence-types from 15 species of clinical relevance. For instance, we identified several Acinetobacter strains that were close to the nosocomial opportunistic pathogen A. nosocomialis. With StrainPhlAn, a generalized version of the MetaMLST approach, we inferred the phylogenetic structure of Pseudomonas stutzeri strains and suggested that the strain-level heterogeneity in environmental samples is higher than in the human microbiome. Finally, we also probed the functional potential of the different strains with PanPhlAn. We further showed that SNV-based and pangenome-based profiling provide complementary information that can be combined to investigate the evolutionary trajectories of microbes and to identify specific genetic determinants of virulence and antibiotic resistances within closely related strains.

Conclusion

We show that strain-level methods developed primarily for the analysis of human microbiomes can be effective for city-associated microbiomes. In fact, (opportunistic) pathogens can be tracked and monitored across many hundreds of urban metagenomes. However, while more effort is needed to profile strains of currently uncharacterized species, this work poses the basis for high-resolution analyses of microbiomes sampled in city and mass transportation environments.

Reviewers

This article was reviewed by Alexandra Bettina Graf, Daniel Huson and Trevor Cickovski.
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号