
1.

Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin

Sharma VK Kumar N Prakash T Taylor TD 《PloS one》2012,7(4):e34030

Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which the present methods mainly use either composition- or homology-based approaches. Though the homology-based methods are more sensitive and accurate, they suffer primarily due to the time needed to generate the Blast alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignments using an ORF-based approach. By implementing Blat as the faster alignment method in place of Blastx, the analysis time has been reduced by severalfold. It is benchmarked using both simulated and real metagenomic datasets, and can be used for both single and paired-end sequence reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can be used for the taxonomic binning of short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out the taxonomic analysis, by either submitting reads or Blastx output. It provides several options including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. MetaBin web server and a standalone version for high-throughput analysis are available freely at http://metabin.riken.jp/.

2.

GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis

Fahong Yu Yijun Sun Li Liu William Farmerie 《Bioinformation》2009,4(1):46-49

GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomics analysis of shotgun DNA sequences. The program includes

a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.

GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies. It is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.
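The general idea behind a genomic-signature classifier of this kind can be illustrated with a minimal sketch. It is not GSTaxClassifier's code or its modified Bayesian model: it is a plain naive Bayes scorer over tetranucleotide frequencies, assuming equal priors for all references and add-one smoothing. The function names `kmer_profile` and `classify` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4, pseudocount=1.0):
    """Smoothed log-probabilities of every ACGT k-mer in one reference sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Assumption: add-one style smoothing so unseen k-mers keep a nonzero probability.
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: math.log((counts[m] + pseudocount) / total) for m in kmers}


def classify(query, profiles, k=4):
    """Return (best_reference, scores) under a naive Bayes model with equal priors."""
    scores = {}
    for name, profile in profiles.items():
        score = 0.0
        for i in range(len(query) - k + 1):
            kmer = query[i:i + k]
            if kmer in profile:  # skip k-mers containing N or other symbols
                score += profile[kmer]
        scores[name] = score
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy reference sequences, invented purely for illustration.
    references = {
        "at_rich": "ATATTAAT" * 50,
        "gc_rich": "GCGCCGGC" * 50,
    }
    profiles = {name: kmer_profile(seq) for name, seq in references.items()}
    best, _ = classify("TATAATATTA", profiles)
    print(best)
```

A real tool would build the profiles from reference genomes and report predictions at several taxonomic ranks; the sketch only shows the scoring step.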

3.

SPHINX--an algorithm for taxonomic binning of metagenomic sequences

Mohammed MH Ghosh TS Singh NK Mande SS 《Bioinformatics (Oxford, England)》2011,27(1):22-30


4.

Crass: identification and reconstruction of CRISPR from unassembled metagenomic data

Connor T. Skennerton Michael Imelfort Gene W. Tyson 《Nucleic acids research》2013,41(10):e105

Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.

5.

MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks

Gori F Folino G Jetten MS Marchiori E 《Bioinformatics (Oxford, England)》2011,27(2):196-203


6.

Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Shibu Yooseph Weizhong Li Granger Sutton 《BMC bioinformatics》2008,9(1):182

Background

The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools.

7.

Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

Liu B Gibbons T Ghodsi M Treangen T Pop M 《BMC genomics》2011,12(Z2):S4


8.

Rapid identification, differentiation, and proposed new taxonomic classification of Bifidobacterium lactis

Ventura M Zink R 《Applied and environmental microbiology》2002,68(12):6429-6434

Identification of Bifidobacterium lactis and Bifidobacterium animalis is problematic because of phenotypic and genetic homogeneities and has raised the question of whether they belong to one unique taxon. Analysis of the 16S-23S internally transcribed spacer region of B. lactis DSM10140(T), B. animalis ATCC 25527(T), and six potential B. lactis strains suggested two distinct clusters. Two specific 16S-23S spacer rRNA gene-targeted primers have been developed for specific detection of B. animalis. All of the molecular techniques used (B. lactis or B. animalis PCR primers, enterobacterial repetitive intergenic consensus PCR) demonstrated that B. lactis and B. animalis form two main groups and suggest a revision of the strains assigned to B. animalis. We propose that B. lactis should be separated from B. animalis at the subspecies level.

9.

NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads

Rosen GL Reichenberger ER Rosenfeld AM 《Bioinformatics (Oxford, England)》2011,27(1):127-129


10.

INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

Mohammed MH Ghosh TS Reddy RM Reddy CV Singh NK Mande SS 《BMC genomics》2011,12(Z3):S4


11.

Meta-IDBA: a de Novo assembler for metagenomic data

Peng Y Leung HC Yiu SM Chin FY 《Bioinformatics (Oxford, England)》2011,27(13):i94-101


12.

SPA: a short peptide assembler for metagenomic data

Youngik Yang Shibu Yooseph 《Nucleic acids research》2013,41(8):e91


13.

WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

Wolfgang Gerlach Sebastian Jünemann Felix Tille Alexander Goesmann Jens Stoye 《BMC bioinformatics》2009,10(1):430

Background

Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454-sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads.

14.

Ab initio gene identification in metagenomic sequences

Wenhan Zhu Alexandre Lomsadze Mark Borodovsky 《Nucleic acids research》2010,38(12):e132

We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousands of new genes could be added to existing annotations of several human and mouse gut metagenomes.

15.

Standard data model representation for taxonomic information

Kennedy J Hyam R Kukla R Paterson T 《Omics : a journal of integrative biology》2006,10(2):220-230

The names used by biologists to label the observations they make are imprecise. This is an issue as workers increasingly seek to exploit data gathered from multiple, unrelated sources online. Even when the international codes of nomenclature are followed strictly, the resulting names (Taxon Names) do not uniquely identify the taxa (Taxon Concepts) that have been described by taxonomists but merely groups of type specimens. A standard data model for exchange of taxonomic information is described. It addresses this issue by facilitating explicit communication of information about Taxon Concepts and their associated names. A representation of this model as an XML Schema is introduced and the implications of the use of Globally Unique Identifiers discussed.

16.

The tiny mayfly in the room: implications of size-dependent invertebrate taxonomic identification for biomonitoring data properties

Jessica M. Orlofske Donald J. Baird 《Aquatic Ecology》2013,47(4):481-494


17.

Reconstruction of ribosomal RNA genes from metagenomic data

Fan L McElroy K Thomas T 《PloS one》2012,7(6):e39948

Direct sequencing of environmental DNA (metagenomics) has a great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

18.

Programs for evaluating and characterising bacterial taxonomic data

Bryant T. N. 《Bioinformatics (Oxford, England)》1987,3(1):45-48

Three programs are described for evaluating and characterising data collected during numerical taxonomic studies of bacteria. The program VARIANCE compares replicate cultures and evaluates the reproducibility of each character. It also identifies those characters that should be excluded from subsequent taxonomic analysis because of their poor reproducibility. GPROPS summarises the properties of clusters of strains that have been defined from a cluster analysis; it can produce a probabilistic identification matrix and compares each strain within a cluster with the Hypothetical Mean Organism (HMO) of that cluster. OVCLUST is an implementation of the program described by Sneath (1979) which calculates overlap statistics between major clusters. These programs are designed to complement the CLUSTAN package (Wishart, 1982) which is often used for cluster analysis of bacterial taxonomic data. The programs were written in FORTRAN 77 and implemented on an IBM PC using MS-DOS. Received on November 13, 1986; accepted on January 8, 1987.
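The kind of reproducibility check attributed to VARIANCE can be sketched roughly as follows. This is not the original FORTRAN 77 program or its statistic: it assumes binary (0/1) test results for pairs of replicate cultures, uses a simple per-character disagreement rate, and applies a hypothetical exclusion threshold `max_rate`. Both function names are invented for illustration.

```python
def character_disagreement(replicate_pairs):
    """Fraction of replicate pairs that disagree, for each character.

    replicate_pairs is a list of (results_a, results_b) tuples, where each
    element is a list of 0/1 test results with one entry per character.
    """
    n_chars = len(replicate_pairs[0][0])
    mismatches = [0] * n_chars
    for results_a, results_b in replicate_pairs:
        for j in range(n_chars):
            if results_a[j] != results_b[j]:
                mismatches[j] += 1
    return [m / len(replicate_pairs) for m in mismatches]


def unreliable_characters(replicate_pairs, max_rate=0.1):
    """Indices of characters whose disagreement rate exceeds max_rate.

    Assumption: max_rate is an illustrative cut-off, not a value from the paper.
    """
    rates = character_disagreement(replicate_pairs)
    return [j for j, rate in enumerate(rates) if rate > max_rate]


if __name__ == "__main__":
    # Toy replicate data for three characters, invented for illustration.
    pairs = [
        ([1, 0, 1], [1, 0, 0]),
        ([0, 0, 1], [0, 0, 1]),
        ([1, 1, 0], [1, 1, 1]),
        ([1, 0, 0], [1, 0, 0]),
    ]
    print(character_disagreement(pairs))
    print(unreliable_characters(pairs))
```

Characters flagged this way would be dropped before the cluster analysis step.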

19.

A simple method, using stripdex equipment, for the assessment of yeast taxonomic data and identification keys

R R Davenport 《The Journal of applied bacteriology》1974,37(2):269-271


20.

Efficient identification of amino acid types for fast protein backbone assignments

Horng D. Ou Helen C. Lai Zach Serber Volker Dötsch 《Journal of biomolecular NMR》2001,21(3):269-273

We describe a procedure that allows for very efficient identification of amino acid types in proteins by selective ¹⁵N-labeling. The usefulness of selective incorporation of ¹⁵N-labeled amino acids into proteins for the backbone assignment has been recognized for several years. However, widespread use of this method has been hindered by the need to purify each selectively labeled sample and by the relatively high cost of labeling with ¹⁵N-labeled amino acids. Here we demonstrate that purification of the selectively ¹⁵N-labeled samples is not necessary and that background-free HSQC spectra containing only the peaks of the overexpressed heterologous protein can be obtained in crude lysates from as little as 100 ml cultures, thus saving time and money. This method can be used for fast and automated backbone assignment of proteins.