Found 20 similar articles (search time: 15 ms)
1.
Francis C Weng Chien-Hao Su Ming-Tsung Hsu Tse-Yi Wang Huai-Kuang Tsai Daryi Wang 《BMC bioinformatics》2010,11(1):565
Background
Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However, at present, taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data due to the stringency of BLAST cut-offs. In an attempt to provide a comprehensive overview of metagenomic data, we re-analyzed the discarded metagenomes by using less stringent cut-offs. Additionally, we introduced a new criterion, namely, the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of our approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity. We also compared the consistency between our taxonomic binning and those reported in the original studies.
2.
Aggressive assembly of pyrosequencing reads with mates (total citations: 2, self-citations: 0, citations by others: 2)
Miller JR Delcher AL Koren S Venter E Walenz BP Brownley A Johnson J Li K Mobarry C Sutton G 《Bioinformatics (Oxford, England)》2008,24(24):2818-2824
3.
Background
In recent years several different fields, such as ecology, medicine and microbiology, have experienced an unprecedented development due to the possibility of direct sequencing of microbiomic samples. Among the problems that researchers in the field have to deal with, taxonomic classification of metagenomic reads is one of the most challenging. State-of-the-art methods classify single reads with almost 100% precision. However, very often, the performance in terms of recall falls to about 50%. As a consequence, state-of-the-art methods are capable of correctly classifying only half of the reads in the sample. How to achieve better performance in terms of overall quality of classification remains a largely unsolved problem.
Results
In this paper we propose a method for metagenomics CLassification Improvement with Overlapping Reads (CLIOR) that exploits the information carried by the overlapping reads graph of the input read dataset to improve recall, f-measure, and the estimated abundance of species. In this work, we applied CLIOR on top of the classification produced by the classifier Clark-l. Experiments on simulated and synthetic metagenomes show that CLIOR can lead to substantial improvement of the recall rate, sometimes doubling it. On average, on simulated datasets, the increase of recall is paired with a higher precision too, while on synthetic datasets it comes at the expense of a small loss of precision. In experiments on real metagenomes, CLIOR is able to assign many more reads while keeping the abundance ratios in line with previous studies.
Conclusions
Our results showed that with CLIOR it is possible to boost the recall of a state-of-the-art metagenomic classifier by inferring and/or correcting the assignment of reads with missing or erroneous labeling. CLIOR is not restricted to the read classification algorithm used in our experiments, but may be applied to other methods too. Finally, CLIOR does not need large computational resources, and it can be run on a laptop.
4.
The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems characterized by long reads, high accuracy, and ultra-high throughput. Because the system relies on emulsion PCR, a unique DNA template should generate only one sequence read after being amplified and sequenced on the GS FLX. However, biased amplification of DNA templates might occur during emulsion PCR, which results in the production of artificial duplicate reads. Under the condition that every DNA template is distinct from the others, 3.49%-18.14% of the total reads in GS FLX sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misinterpretation of sequencing data, and special attention should be paid to the potential biases they introduce.
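As a rough illustration of the duplicate-read fraction reported above, the following minimal sketch counts extra copies of identical reads. It assumes, as a simplification, that a duplicate is any read whose sequence exactly matches an earlier read; the abstract does not state the criterion the authors used, and the function name `duplicate_fraction` is hypothetical.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Return the fraction of reads that are extra copies of an identical read.

    Assumption: duplicates are defined by exact sequence identity, which is
    a simplification and not necessarily the criterion used in the study.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    # Every copy beyond the first of each distinct sequence counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


if __name__ == "__main__":
    example_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
    print(duplicate_fraction(example_reads))
```

Exact matching will miss duplicates that differ only by sequencing errors, so a real analysis would likely need a more tolerant comparison.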
5.
The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer, methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary 'next-generation', DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.
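The GC-content comparison described above rests on a simple per-sequence statistic. The sketch below shows one way to compute it and average it over a library; the function names and the toy libraries are hypothetical, and ambiguous bases such as N are simply counted as non-GC, which is an assumption rather than the study's method.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a sequence (0.0 for empty input)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def mean_gc(library):
    """Return the unweighted mean GC content over all reads in a library."""
    if not library:
        return 0.0
    return sum(gc_content(read) for read in library) / len(library)


if __name__ == "__main__":
    # Toy data only; these are not reads from the study.
    fosmid_like = ["GGCC", "GCGC"]
    shotgun_like = ["ATAT", "ATGC"]
    print(mean_gc(fosmid_like), mean_gc(shotgun_like))
```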
6.
CF Davenport J Neugebauer N Beckmann B Friedrich B Kameri S Kokott M Paetow B Siekmann M Wieding-Drewes M Wienhöfer S Wolf B Tümmler V Ahlers F Sprengel 《PloS one》2012,7(8):e41224
SUMMARY: Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. AVAILABILITY: The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.
7.
8.
Wolfgang Gerlach Sebastian Jünemann Felix Tille Alexander Goesmann Jens Stoye 《BMC bioinformatics》2009,10(1):430
Background
Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454-sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads.
9.
10.
11.
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using the coverage depth resulting from the alignment of the reads to a catalogue of reference sequences, which is built into the pipeline and contains prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module, which contains methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, allowing their comparative analysis together with user samples, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, and Python, with all major browsers supported.
12.
Conventional pyrosequencing using 2′-deoxyadenosine-5′-O-(1-thiotriphosphate) (dATPαS) is problematic due to the high cost of the substrate (dATPαS) and deterioration in the accuracy of incorporation to read through poly(T) regions. One reason for these problems is that dATPαS has a sulfur on the α-phosphate and also has isomers (Sp and Rp). To solve these problems, 11 nucleotide substrates, which could replace dATPαS in pyrosequencing, were newly synthesized. All substrates were modified on the seventh or eighth position of the adenine base from normal dATP. We found that the substrate that had an ethenyl-linked modified group on the seventh position of the adenine base had low activity in the luciferase reaction and high incorporation efficiency with the thymine base. One substrate in particular had 10-fold better incorporation efficiency than dATPαS. The new nucleotide substrate satisfied all conditions as a replacement of dATPαS.
13.
The taxonomic analysis of sequencing data has become important in many areas of life sciences. However, currently available tools for that purpose either consume large amounts of RAM or yield insufficient quality and robustness. Here, we present kASA, a k-mer based tool capable of identifying and profiling metagenomic DNA or protein sequences with high computational efficiency and a user-definable memory footprint. We ensure both high sensitivity and precision by using an amino acid-like encoding of k-mers together with a range of multiple k’s. Custom algorithms and data structures optimized for external memory storage enable a full-scale taxonomic analysis without compromise on laptop, desktop, and HPCC.
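To make the idea of using "a range of multiple k's" concrete, here is a minimal sketch that extracts all k-mers of a sequence for every k in a given range. It is only an illustration of the general k-mer concept: it does not implement kASA's amino acid-like encoding or its external-memory data structures, and the function name `kmers_for_range` is hypothetical.

```python
def kmers_for_range(seq, k_min, k_max):
    """Return a dict mapping each k in [k_min, k_max] to the list of k-mers in seq.

    Overlapping k-mers are extracted with a sliding window of step 1.
    If k exceeds the sequence length, the list for that k is empty.
    """
    return {
        k: [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for k in range(k_min, k_max + 1)
    }


if __name__ == "__main__":
    for k, kmers in kmers_for_range("ACGTA", 2, 3).items():
        print(k, kmers)
```

Shorter k-mers match more permissively and longer ones more specifically, which is the general trade-off a multi-k approach can exploit.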
14.
15.
16.
Xing Yan Alei Geng Jun Zhang Yongjun Wei Lei Zhang Changli Qian Qianfu Wang Shengyue Wang Zhihua Zhou 《Applied microbiology and biotechnology》2013,97(18):8173-8182
In this study, 341, 246, and 386 positive clones with endo-β-1,4-glucanase, β-glucosidase, and endo-β-1,4-xylanase activities, respectively, were identified by screening a metagenomic fosmid library constructed from a biogas digester. Subsequently, pools of 4, 10, and 16 positive clones were subjected to 454 pyrosequencing in different subruns. In total, 21 unique glycosyl hydrolase (GH) genes were predicted by bioinformatic analysis, which showed similarities to their nearest neighbors from 39% to 72%. In addition to bioinformatics prediction, nine GH genes were expressed and purified to identify their activity with four kinds of substrates. The activities of the most expressed proteins were consistent with their annotation based on bioinformatics prediction; however, three GH genes belonging to the GH5 family showed different activities from their annotation. An efficient acidic cellulase, En1, had an optimal condition at 55 °C, pH 5.5, with a specific activity toward carboxymethylcellulose of 118 U/mg and a Km of 12.8 g/L. This study demonstrated that there are diverse GHs in the biogas digester system with potential industrial application in lignocellulose hydrolysis, and their activities should be investigated with different substrates before their application. Additionally, pool sequencing of positive fosmid clones might be a cost-effective approach to obtain functional genes from metagenomic libraries.
17.
Direct sequencing of environmental DNA (metagenomics) has great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.
18.
19.
A comparison of random sequence reads versus 16S rDNA sequences for estimating the biodiversity of a metagenomic library (total citations: 2, self-citations: 0, citations by others: 2)
Manichanh C Chapple CE Frangeul L Gloux K Guigo R Dore J 《Nucleic acids research》2008,36(16):5180-5188
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation, and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10 010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them using the program BLASTN against a prokaryotic sequence database to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.
20.
Moreno Zolfo Francesco Asnicar Paolo Manghi Edoardo Pasolli Adrian Tett Nicola Segata 《Biology direct》2018,13(1):9