首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 14 毫秒
1.
2.
3.
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors’ knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded fromhttp://hmpdacc.org). MALINA is made freely available on the web athttp://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.  相似文献   

4.
5.
Compared with traditional algorithms for long metagenomic sequence classification, characterizing microorganisms’ taxonomic and functional abundance based on tens of millions of very short reads are much more challenging. We describe an efficient composition and phylogeny-based algorithm [Metagenome Composition Vector (MetaCV)] to classify very short metagenomic reads (75–100 bp) into specific taxonomic and functional groups. We applied MetaCV to the Meta-HIT data (371-Gb 75-bp reads of 109 human gut metagenomes), and this single-read-based, instead of assembly-based, classification has a high resolution to characterize the composition and structure of human gut microbiota, especially for low abundance species. Most strikingly, it only took MetaCV 10 days to do all the computation work on a server with five 24-core nodes. To our knowledge, MetaCV, benefited from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within affordable time.  相似文献   

6.

Background

Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck.

Results

To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS – Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets.

Conclusions

RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0503-6) contains supplementary material, which is available to authorized users.  相似文献   

7.
8.
Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protect against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.  相似文献   

9.
We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences.  相似文献   

10.
SUMMARY: Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. AVAILABILITY: The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.  相似文献   

11.
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomics analysis of shotgun DNA sequences. The program includes
  1. a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
  2. a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
  3. two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
  4. effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.
GSTaxClassifier takes input nucleotide sequences and using a modified Bayesian model evaluates the genomic signatures between metagenomic query sequences and reference genome databases. The simulation studies of a numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies, which is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.  相似文献   

12.
The ribosomal rRNA genes are widely used as genetic markers for taxonomic identification of microbes. Particularly the small subunit (SSU; 16S/18S) rRNA gene is frequently used for species‐ or genus‐level identification, but also the large subunit (LSU; 23S/28S) rRNA gene is employed in taxonomic assignment. The metaxa software tool is a popular utility for extracting partial rRNA sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. This study describes a comprehensive update to metaxa – metaxa 2 – that extends the capabilities of the tool, introducing support for the LSU rRNA gene, a greatly improved classifier allowing classification down to genus or species level, as well as enhanced support for short‐read (100 bp) and paired‐end sequences, among other changes. The performance of metaxa 2 was compared to other commonly used taxonomic classifiers, showing that metaxa 2 often outperforms previous methods in terms of making correct predictions while maintaining a low misclassification rate. metaxa 2 is freely available from http://microbiology.se/software/metaxa2/ .  相似文献   

13.
14.
Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ∼10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.  相似文献   

15.

Background  

Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates.  相似文献   

16.
Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our new classification algorithm, RITA (Rapid Identification of Taxonomic Assignments), uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and classified sequences with higher precision than the current best rank-flexible classifier. The accuracy on short reads can be increased by exploiting paired-end information, if available, which we demonstrate on a recently published bovine rumen data set. Finally, we develop a variant of RITA that incorporates accelerated homology search techniques, and generate predictions on a set of human gut metagenomes that were previously assigned to different 'enterotypes'. RITA is freely available in Web server and standalone versions.  相似文献   

17.
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10 010 RSRs from a metagenomic library of microorganisms found in human faecal samples. Then searched them using the program BLASTN against a prokaryotic sequence database to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.  相似文献   

18.
Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which the present methods mainly use either composition- or homology-based approaches. Though the homology-based methods are more sensitive and accurate, they suffer primarily due to the time needed to generate the Blast alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignments using an ORF-based approach. By implementing Blat as the faster alignment method in place of Blastx, the analysis time has been reduced by severalfold. It is benchmarked using both simulated and real metagenomic datasets, and can be used for both single and paired-end sequence reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can be used for the taxonomic binning of short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out the taxonomic analysis, by either submitting reads or Blastx output. It provides several options including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. MetaBin web server and a standalone version for high-throughput analysis are available freely at http://metabin.riken.jp/.  相似文献   

19.
20.
Reddy RM  Mohammed MH  Mande SS 《Gene》2012,505(2):259-265
Phylogenetic assignment of individual sequence reads to their respective taxa, referred to as 'taxonomic binning', constitutes a key step of metagenomic analysis. Existing binning methods have limitations either with respect to time or accuracy/specificity of binning. Given these limitations, development of a method that can bin vast amounts of metagenomic sequence data in a rapid, efficient and computationally inexpensive manner can profoundly influence metagenomic analysis in computational resource poor settings. We introduce TWARIT, a hybrid binning algorithm, that employs a combination of short-read alignment and composition-based signature sorting approaches to achieve rapid binning rates without compromising on binning accuracy and specificity. TWARIT is validated with simulated and real-world metagenomes and the results demonstrate significantly lower overall binning times compared to that of existing methods. Furthermore, the binning accuracy and specificity of TWARIT are observed to be comparable/superior to them. A web server implementing TWARIT algorithm is available at http://metagenomics.atc.tcs.com/Twarit/  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号