Similar Documents
 20 similar documents found (search time: 15 ms)
1.
Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis; current methods mainly use either composition-based or homology-based approaches. Although homology-based methods are more sensitive and accurate, their main drawback is the time needed to generate the Blast alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignment using an ORF-based approach. By using Blat, a faster alignment method, in place of Blastx, the analysis time has been reduced severalfold. MetaBin was benchmarked on both simulated and real metagenomic datasets and can be used for both single-end and paired-end reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can perform taxonomic binning of short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can carry out the taxonomic analysis from either submitted reads or Blastx output. It provides several options, including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. The MetaBin web server and a standalone version for high-throughput analysis are freely available at http://metabin.riken.jp/.
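As a rough illustration of an ORF-based step, the sketch below scans all six reading frames of a read for stretches free of stop codons. It is a minimal sketch under our own assumptions, not MetaBin's code: the `orf_fragments` helper and its `min_codons` cutoff are hypothetical, and a real pipeline would translate each fragment and align it (e.g. with Blat) against a reference protein database.

```python
# Minimal sketch (hypothetical helper, not MetaBin's implementation):
# find stop-codon-free stretches ("ORF fragments") in all six frames of a read.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_fragments(read: str, min_codons: int = 10):
    """Yield (strand, frame, start, end) for stop-free stretches of >= min_codons.

    Coordinates are 0-based, end-exclusive, and relative to the strand scanned
    (the reverse complement for "-"). Fragments may run off either end of the
    read, because short reads rarely contain a complete gene.
    """
    read = read.upper()
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            start = frame
            # Walk codon by codon; a stop codon closes the current fragment.
            for i in range(frame, len(seq) - 2, 3):
                if seq[i:i + 3] in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i
                    start = i + 3
            # Emit the trailing fragment that runs to the last full codon.
            end = frame + ((len(seq) - frame) // 3) * 3
            if (end - start) // 3 >= min_codons:
                yield strand, frame, start, end
```

Each yielded fragment would then be translated and passed to the aligner; keeping open-ended fragments matters for reads as short as 45 bp.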

2.
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomic analysis of shotgun DNA sequences. The program includes
  1. a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
  2. a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
  3. two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
  4. effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.
GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures of metagenomic query sequences against reference genome databases. Simulation studies on numerical datasets showed that GSTaxClassifier could serve as a useful program for metagenomic studies. It is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.
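The general idea of Bayesian classification on genomic signatures can be sketched with a plain naive Bayes model over tetranucleotide counts. This is not GSTaxClassifier's modified Bayesian model; the function names, the Laplace pseudocount, the uniform prior over references and the choice k = 4 are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count k-mers in seq, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )


def train_profiles(references: dict, k: int = 4, pseudocount: float = 1.0) -> dict:
    """Turn each reference genome into Laplace-smoothed log k-mer probabilities."""
    all_kmers = ["".join(p) for p in product(BASES, repeat=k)]
    profiles = {}
    for name, genome in references.items():
        counts = kmer_counts(genome, k)
        total = sum(counts.values()) + pseudocount * len(all_kmers)
        profiles[name] = {
            km: math.log((counts[km] + pseudocount) / total) for km in all_kmers
        }
    return profiles


def classify(read: str, profiles: dict, k: int = 4) -> dict:
    """Return posterior probabilities over references, assuming a uniform prior."""
    counts = kmer_counts(read, k)
    log_lik = {
        name: sum(n * prof[km] for km, n in counts.items())
        for name, prof in profiles.items()
    }
    top = max(log_lik.values())
    # Subtract the maximum log-likelihood before exponentiating to avoid underflow.
    weights = {name: math.exp(ll - top) for name, ll in log_lik.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}
```

With two toy references, one AT-rich and one GC-rich, an AT-rich read places most of its posterior mass on the AT-rich reference. Extending this to taxonomic ranks would mean summing posteriors over the references that share a genus, family, and so on.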

3.
4.
Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophages (phages). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phages and their hosts. All current tools for CRISPR identification were developed to analyse completed genomes and are not well suited to metagenomic datasets, in which CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of the CRISPR in the dataset. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed with assembly-based approaches. Using Crass, we were able to detect CRISPR containing spacers with sequence homology to phages in the system, which would not have been identified with other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPR in metagenomic datasets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.
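The signal that assembly-free CRISPR detection relies on, a short direct repeat recurring within a single read at spacer-sized intervals, can be sketched as below. This covers only a first, within-read step under assumed parameters (a repeat length `k` of 23 bp and spacers of 26-50 bp); it is not Crass's algorithm, which goes on to cluster and extend candidate repeats across reads. The `find_direct_repeats` helper is hypothetical.

```python
from collections import defaultdict


def find_direct_repeats(read: str, k: int = 23,
                        min_spacer: int = 26, max_spacer: int = 50) -> dict:
    """Return {kmer: positions} for k-mers that recur with CRISPR-like spacing.

    Treats k as the repeat length, so the spacer between two consecutive
    copies is (distance between their start positions) - k.
    """
    positions = defaultdict(list)
    for i in range(len(read) - k + 1):
        positions[read[i:i + k]].append(i)
    candidates = {}
    for kmer, pos in positions.items():
        # Keep the k-mer if any pair of consecutive copies is one spacer apart.
        if any(min_spacer <= (b - a) - k <= max_spacer for a, b in zip(pos, pos[1:])):
            candidates[kmer] = pos
    return candidates
```

When the true repeat is longer than `k`, several overlapping k-mers will be reported for the same locus and would need to be merged, and the spacer estimate is inflated by the difference in length.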

5.
6.

Background  

The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets contain organisms with varying GC composition, codon usage biases, etc., so gene identification is challenging. The vast amount of sequence data also calls for faster protein family classification tools.

7.
8.
Identification of Bifidobacterium lactis and Bifidobacterium animalis is problematic because of their phenotypic and genetic homogeneity, which has raised the question of whether they belong to a single taxon. Analysis of the 16S-23S internal transcribed spacer region of B. lactis DSM10140(T), B. animalis ATCC 25527(T), and six potential B. lactis strains suggested two distinct clusters. Two primers targeting the 16S-23S rRNA spacer region were developed for the specific detection of B. animalis. All of the molecular techniques used (B. lactis- or B. animalis-specific PCR primers, enterobacterial repetitive intergenic consensus PCR) demonstrated that B. lactis and B. animalis form two main groups and suggested a revision of the strains assigned to B. animalis. We propose that B. lactis should be separated from B. animalis at the subspecies level.
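Whether a primer pair amplifies only the intended target can be screened in silico before wet-lab testing. The sketch below uses exact string matching only; the `in_silico_amplicon` helper, the `max_len` limit and any sequences passed to it are hypothetical, not the primers from the study. Real primer evaluation would also tolerate mismatches and consider melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str, max_len: int = 1000):
    """Return the predicted amplicon length, or None if no product is expected.

    The forward primer must occur verbatim on the template, and the reverse
    complement of the reverse primer must occur downstream of it.
    """
    template = template.upper()
    f = template.find(forward.upper())
    if f < 0:
        return None
    r_site = reverse_complement(reverse.upper())
    r = template.find(r_site, f + len(forward))
    if r < 0:
        return None
    length = r + len(r_site) - f
    return length if length <= max_len else None
```

Running such a check against spacer sequences from both species is a quick way to flag primer pairs that would also amplify the non-target taxon.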

9.
10.
11.
Meta-IDBA: a de Novo assembler for metagenomic data   (total citations: 1; self-citations: 0; citations by others: 1)

12.
13.

Background  

Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques such as 454 or Solexa-Illumina open up new possibilities, as they can produce huge amounts of data in much less time and with less effort and cost than the traditional Sanger technique. However, the reads produced are much shorter (35-100 base pairs with Illumina, 100-500 base pairs with 454 sequencing). CARMA is a new software pipeline for characterising the species composition and genetic potential of microbial samples using short, unassembled reads.

14.
Ab initio gene identification in metagenomic sequences   (total citations: 1; self-citations: 0; citations by others: 1)
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between the frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has since been used for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing the parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, of gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. We also show that, by applying the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
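The core of the refined approach, predicting a coding-region oligonucleotide frequency from nucleotide composition, can be illustrated with a least-squares polynomial fit. This is a sketch under stated assumptions: one polynomial per codon is fitted to (GC content, codon frequency) pairs from training genomes, the fragment's own GC content is used as the predictor, and the helper names are hypothetical. The logistic variant and the separate bacterial and archaeal models described in the abstract are not shown.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def fit_polynomial(xs, ys, degree: int = 2):
    """Least-squares coefficients c0..cd for y ~ sum(c_j * x**j).

    Solves the normal equations; needs at least degree + 1 distinct x values.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))) / a[i][i]
    return coeffs


def predict(coeffs, x: float) -> float:
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** j for j, c in enumerate(coeffs))
```

In use, `fit_polynomial` would be called once per codon on training data, and `predict(coeffs, gc_content(fragment))` would then give that codon's estimated frequency for a new fragment, supplying gene-finder parameters without any training on the fragment itself.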

15.
The names used by biologists to label their observations are imprecise. This is an issue as workers increasingly seek to exploit data gathered online from multiple, unrelated sources. Even when the international codes of nomenclature are followed strictly, the resulting names (Taxon Names) do not uniquely identify the taxa (Taxon Concepts) described by taxonomists, but merely groups of type specimens. A standard data model for the exchange of taxonomic information is described. It addresses this issue by facilitating explicit communication of information about Taxon Concepts and their associated names. A representation of this model as an XML Schema is introduced, and the implications of using Globally Unique Identifiers are discussed.
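The distinction between a name and a concept can be made concrete with a small data model in which each concept carries its own identifier, points to a name, and records whose circumscription it follows. The classes, element names and attributes below are hypothetical and are not taken from the schema the abstract describes; the `urn:example:` identifiers stand in for real GUIDs.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TaxonName:
    guid: str
    scientific_name: str


@dataclass
class TaxonConcept:
    guid: str
    name: TaxonName
    according_to: str  # the author/publication whose circumscription this concept follows


def concept_to_xml(concept: TaxonConcept) -> str:
    """Serialize a concept, referencing its name by GUID (hypothetical element names)."""
    root = ET.Element("TaxonConcept", id=concept.guid)
    name_el = ET.SubElement(root, "Name", ref=concept.name.guid)
    name_el.text = concept.name.scientific_name
    ET.SubElement(root, "AccordingTo").text = concept.according_to
    return ET.tostring(root, encoding="unicode")
```

Two concepts that share the name `Aus bus` but cite different authorities then serialize to distinct records, which is the information a bare name cannot carry.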

16.
17.
Fan L, McElroy K, Thomas T. PLoS ONE 2012, 7(6): e39948
Direct sequencing of environmental DNA (metagenomics) has great potential for describing the 16S rRNA gene diversity of microbial communities. However, current approaches that use this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomic pyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation, and gave taxonomic resolution down to the genus level. The strategy was further compared with PCR-based methods for determining the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

18.
Three programs are described for evaluating and characterising data collected during numerical taxonomic studies of bacteria. The program VARIANCE compares replicate cultures and evaluates the reproducibility of each character. It also identifies those characters that should be excluded from subsequent taxonomic analysis because of their poor reproducibility. GPROPS summarises the properties of clusters of strains defined by a cluster analysis; it can produce a probabilistic identification matrix and compares each strain within a cluster with the Hypothetical Mean Organism (HMO) of that cluster. OVCLUST is an implementation of the program described by Sneath (1979), which calculates overlap statistics between major clusters. These programs are designed to complement the CLUSTAN package (Wishart, 1982), which is often used for cluster analysis of bacterial taxonomic data. The programs were written in FORTRAN 77 and implemented on an IBM PC running MS-DOS. Received on November 13, 1986; accepted on January 8, 1987.
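The two computations behind VARIANCE's reproducibility check and GPROPS's HMO comparison can be sketched for binary (positive/negative) characters. The agreement measure, the 0.9 threshold and the mean-absolute-difference distance are assumptions for illustration, not the formulas used by the original FORTRAN programs.

```python
def character_reproducibility(rep_a, rep_b):
    """Per character, the fraction of strains whose two replicate cultures agree.

    rep_a[s][c] and rep_b[s][c] are binary results (1 = positive) for strain s,
    character c, from two independent replicate cultures.
    """
    n_chars = len(rep_a[0])
    return [
        sum(a[c] == b[c] for a, b in zip(rep_a, rep_b)) / len(rep_a)
        for c in range(n_chars)
    ]


def unreliable_characters(rep_a, rep_b, min_agreement=0.9):
    """Indices of characters too poorly reproducible to keep in the analysis."""
    scores = character_reproducibility(rep_a, rep_b)
    return [c for c, r in enumerate(scores) if r < min_agreement]


def hypothetical_mean_organism(cluster):
    """Per-character proportion of positive strains in a cluster (the HMO)."""
    n = len(cluster)
    return [sum(strain[c] for strain in cluster) / n for c in range(len(cluster[0]))]


def distance_to_hmo(strain, hmo):
    """Mean absolute difference between one strain and its cluster's HMO."""
    return sum(abs(x - m) for x, m in zip(strain, hmo)) / len(hmo)
```

The HMO proportions double as the per-cluster columns of a probabilistic identification matrix, since each entry is the estimated probability that a member of the cluster is positive for that character.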

19.
20.
We describe a procedure that allows very efficient identification of amino acid types in proteins by selective 15N-labeling. The usefulness of selectively incorporating 15N-labeled amino acids into proteins for backbone assignment has been recognized for several years. However, widespread use of this method has been hindered by the need to purify each selectively labeled sample and by the relatively high cost of 15N-labeled amino acids. Here we demonstrate that purification of the selectively 15N-labeled samples is not necessary, and that background-free HSQC spectra containing only the peaks of the overexpressed heterologous protein can be obtained in crude lysates from cultures as small as 100 ml, saving both time and money. This method can be used for fast and automated backbone assignment of proteins.
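One way to check an amino acid type assignment is to compare the number of peaks in a selectively labeled HSQC spectrum with the number expected from the protein sequence. The sketch below counts expected backbone amide peaks under simplifying assumptions that are ours, not the authors': no metabolic scrambling of the label, no peak for proline (which has no amide proton), no observable peak for the N-terminal residue, and side-chain NH groups (e.g. Asn, Gln, Trp, Arg) ignored. The helper name is hypothetical.

```python
def expected_backbone_peaks(sequence: str, labeled: str) -> int:
    """Expected number of backbone amide HSQC peaks when only the amino acid
    types in `labeled` (one-letter codes) carry 15N.

    Assumes no label scrambling, skips proline (no amide proton) and the
    N-terminal residue, and ignores side-chain NH groups.
    """
    labeled_types = set(labeled.upper())
    seq = sequence.upper()
    return sum(
        1 for i, aa in enumerate(seq)
        if i > 0 and aa in labeled_types and aa != "P"
    )
```

A labeled sample whose spectrum shows noticeably more peaks than this count would point to scrambling of the label into other amino acid types.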
