Similar literature
 Found 20 similar articles; search took 31 ms
1.

Background

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of the sequenced reads may therefore be classified as unknown, which greatly impairs our understanding of the whole sample.

Result

Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs for acceleration. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.

Conclusion

Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.

Reviewers

This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
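
As a rough illustration of what "based on sequence composition" can mean, the sketch below turns a read into a normalized k-mer frequency profile and assigns it to the reference with the closest profile. This is not MetaBinG2's actual model or its GPU implementation; the function names, the choice of k = 3, and the Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Return normalized frequencies of all 4**k A/C/G/T k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def nearest_reference(read, references, k=3):
    """Assign a read to the reference with the closest k-mer profile.

    `references` maps a (hypothetical) label to a reference sequence.
    Squared Euclidean distance is an illustrative choice only.
    """
    read_profile = kmer_profile(read, k)

    def distance(name):
        ref_profile = kmer_profile(references[name], k)
        return sum((a - b) ** 2 for a, b in zip(read_profile, ref_profile))

    return min(references, key=distance)
```

Because the profile depends only on the read itself, this kind of scoring needs no alignment against reference genomes, which is one reason composition-based methods parallelize well.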

2.

Background

Crohn’s disease is associated with gut dysbiosis. Independent studies have shown an increase in the abundance of certain bacterial species, particularly Escherichia coli with the adherent-invasive pathotype, in the gut. The role of these species in this disease needs to be elucidated.

Methods

We performed a metagenomic study investigating the gut microbiota of patients with Crohn’s disease. A metagenomic reconstruction of the consensus genome content of the species was used to assess the genetic variability.

Results

The abnormal shifts in the microbial community structures in Crohn’s disease were heterogeneous among the patients. The metagenomic data suggested the existence of multiple E. coli strains within individual patients. We discovered that the genetic diversity of the species was high and that only a few samples manifested similarity to the adherent-invasive varieties. The other species demonstrated genetic diversity comparable to that observed in the healthy subjects. Our results were supported by a comparison of the sequenced genomes of isolates from the same microbiota samples and a meta-analysis of published gut metagenomes.

Conclusions

The genomic diversity of Crohn’s disease-associated E. coli within and among the patients paves the way towards an understanding of the microbial mechanisms underlying the onset and progression of the Crohn’s disease and the development of new strategies for the prevention and treatment of this disease.

3.

Objectives

To analyze the microbial diversity and gene content of a thermophilic cellulose-degrading consortium from hot springs in Xiamen, China using 454 pyrosequencing for discovering cellulolytic enzyme resources.

Results

A thermophilic cellulose-degrading consortium, XM70, isolated from a hot spring, used sugarcane bagasse as its sole carbon and energy source. DNA sequencing of the XM70 sample resulted in 349,978 reads with an average read length of 380 bases, accounting for 133,896,867 bases of sequence information. The characterization of sequencing reads and assembled contigs revealed that most microbes were derived from four genera: Geobacillus (Firmicutes), Thermus, Bacillus, and Anoxybacillus. Twenty-eight homologous genes belonging to 15 glycoside hydrolase families were detected, including several cellulase genes. A novel hot spring metagenome-derived thermophilic cellulase was expressed and characterized.

Conclusions

The application value of thermostable sugarcane bagasse-degrading enzymes is shown for production of cellulosic biofuel. The practical power of using a short-read-based metagenomic approach for harvesting novel microbial genes is also demonstrated.

4.

Background

Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism.

Results

The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates.

Conclusions

In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/.

5.

Background

With the advances in the next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomy assignments in metagenomic samples, we developed a method that allows multiple mismatch probabilities from different genomes.

Results

We extended the algorithm for taxonomic assignment of metagenomic sequence reads (TAMER) by developing an improved method that can set a different mismatch probability for each genome rather than imposing a single parameter for all genomes, thereby obtaining a greater degree of accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was comprehensively tested on simulated and real datasets. The results indicate that TADIP improves on the performance of TAMER, especially in large datasets with high complexity.

Conclusions

TADIP was developed as a statistical model to improve the estimation accuracy of taxonomy assignments. Based on its varying mismatch probability setting and correlated variance matrix setting, its performance was enhanced for high-complexity samples when compared with TAMER.

6.

Background

Hot spring bacteria have unique biological adaptations to survive the extreme conditions of these environments; these bacteria produce thermostable enzymes that can be used in biotechnological and industrial applications. However, sequencing these bacteria is complex, since it is not possible to culture them. As an alternative, shotgun genome sequencing of whole microbial communities can be used. The problem is that classifying sequences within a metagenomic dataset is very challenging, particularly when they include unknown microorganisms, since these lack a genomic reference. We failed to recover a bacterial genome from a hot spring metagenome using the available software tools, so we developed a new tool that allowed us to recover most of this genome.

Results

We present a proteobacterial draft genome reconstructed from a Colombian Andes hot spring metagenome. The genome seems to be from a new lineage within the family Rhodanobacteraceae of the class Gammaproteobacteria, closely related to the genus Dokdonella. We were able to generate this genome thanks to CLAME. CLAME, from the Spanish “CLAsificador MEtagenomico”, is a tool to group reads into bins. We show that most reads from each bin belong to a single chromosome. CLAME is very effective at recovering most of the reads belonging to the predominant species within a metagenome.

Conclusions

We developed a tool that can be used to extract genomes (or parts of them) from a complex metagenome.

7.

Background

More than 100 different pathogens can cause encephalitis. Testing for all the neurological pathogens by conventional methods can be difficult. Metagenomic next-generation sequencing (NGS) could identify the infectious agents in a target-independent manner. The role of this novel method in clinical diagnostic microbiology still needs to be evaluated. In the present study, we used metagenomic NGS to search for an infectious etiology in a human immunodeficiency virus (HIV)-infected patient with lethal diffuse brain lesions. Sequences mapping to Toxoplasma gondii were unexpectedly detected.

Case presentation

A 31-year-old HIV-infected patient presented to hospital in a critically ill condition with a Glasgow coma scale score of 3. Brain magnetic resonance imaging showed diffuse brain abnormalities with contrast enhancement. Metagenomic NGS was performed on DNA extracted from 300 μL of the patient’s cerebrospinal fluid (CSF) with the BGISEQ-50 platform. Sequencing identified 65,357 sequence reads uniquely aligned to the Toxoplasma gondii genome. The presence of the Toxoplasma gondii genome in CSF was further verified by Toxoplasma gondii-specific polymerase chain reaction and Sanger sequencing. Altogether, these results confirmed the diagnosis of toxoplasmic encephalitis.

Conclusions

This study suggests that metagenomic NGS may be a useful diagnostic tool for toxoplasmic encephalitis. As metagenomic NGS is able to identify all pathogens in a single run, it may be a promising strategy to explore the clinical causative pathogens in central nervous system infections with atypical features.

8.

Background

The microbial communities populating human and natural environments have been extensively characterized with shotgun metagenomics, which provides an in-depth representation of the microbial diversity within a sample. Microbes thriving in urban environments may be crucially important for human health, but have received less attention than those of other environments. Ongoing efforts have started to target urban microbiomes at a large scale, but the most recent computational methods to profile these metagenomes have never been applied in this context. It is thus currently unclear whether such methods, which have proven successful at distinguishing even closely related strains in human microbiomes, are also effective in urban settings for tasks such as cultivation-free pathogen detection and microbial surveillance. Here, we aimed at a) testing the currently available metagenomic profiling tools on urban metagenomics; b) characterizing the organisms in urban environments at the resolution of single strains; and c) discussing the biological insights that can be inferred from such methods.

Results

We applied three complementary methods on the 1614 metagenomes of the CAMDA 2017 challenge. With MetaMLST we identified 121 known sequence-types from 15 species of clinical relevance. For instance, we identified several Acinetobacter strains that were close to the nosocomial opportunistic pathogen A. nosocomialis. With StrainPhlAn, a generalized version of the MetaMLST approach, we inferred the phylogenetic structure of Pseudomonas stutzeri strains and suggested that the strain-level heterogeneity in environmental samples is higher than in the human microbiome. Finally, we also probed the functional potential of the different strains with PanPhlAn. We further showed that SNV-based and pangenome-based profiling provide complementary information that can be combined to investigate the evolutionary trajectories of microbes and to identify specific genetic determinants of virulence and antibiotic resistances within closely related strains.

Conclusion

We show that strain-level methods developed primarily for the analysis of human microbiomes can be effective for city-associated microbiomes. In fact, (opportunistic) pathogens can be tracked and monitored across many hundreds of urban metagenomes. Although more effort is needed to profile strains of currently uncharacterized species, this work lays the basis for high-resolution analyses of microbiomes sampled in city and mass transportation environments.

Reviewers

This article was reviewed by Alexandra Bettina Graf, Daniel Huson and Trevor Cickovski.

9.
10.

Background

A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed.

Results

We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure.

Conclusions

Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.
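
To make the role of the union-find data structure concrete, here is a minimal sketch that merges reads into clusters whenever they share a k-mer. It is an assumption-laden toy, not the authors' framework: a plain Python dictionary stands in for the bidirectional Burrows-Wheeler index (so it is not space-efficient), reverse complements are ignored, and the names `UnionFind` and `cluster_reads` and the choice of k are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_reads(reads, k=4):
    """Group read indices so that reads sharing any k-mer end up together."""
    uf = UnionFind(len(reads))
    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in first_seen:
                uf.union(idx, first_seen[kmer])
            else:
                first_seen[kmer] = idx
    clusters = {}
    for idx in range(len(reads)):
        clusters.setdefault(uf.find(idx), []).append(idx)
    return sorted(clusters.values())
```

For example, `cluster_reads(["ACGTAC", "GTACGG", "TTTTTT", "CTTTTA"])` groups reads 0 and 1 (shared `GTAC`) and reads 2 and 3 (shared `TTTT`). The query time t in the abstract's bound corresponds to the cost of the `find` and `union` calls here.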

11.

Background and aims

Bacterial Non-Specific Acid Phosphatase (NSAP) enzymes are capable of dephosphorylating diverse organic phosphoesters but are rarely studied: their distribution in natural and managed environments is poorly understood. The aim of this study was to generate new insight into the environmental distribution of NSAPs and establish their potential global relevance to cycling of organic phosphorus.

Methods

We employed bioinformatic tools to determine NSAP diversity and subcellular localization in microbial genomes; used the corresponding NSAP gene sequences to census metagenomes from diverse ecosystems; studied the effect of long-term land management upon NSAP diversity and abundance.

Results

Periplasmic class B NSAPs are poorly represented in marine and terrestrial environments, reflecting their association with enteric and pathogenic bacteria. Periplasmic class A and outer membrane-associated class C NSAPs are cosmopolitan. NSAPs are more abundant in marine than terrestrial ecosystems and class C more abundant than class A genes, except in an acidic peat where class A genes dominate. A clear effect of land management upon gene abundance was identified.

Conclusions

NSAP genes are cosmopolitan. Class C genes are more widely distributed: their association with the outer-membrane of cells gives them a clear role in the cycling of organic phosphorus, particularly in soils.

12.
13.

Introduction

Collecting feces is easy. It provides direct access to endogenous and microbial metabolites.

Objectives

Given the lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.

14.

Purpose of Review

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens.

Recent Findings

Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host.

Summary

Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

15.

Background

Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies.

Results

We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm.

Conclusions

Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.

16.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated with open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.

17.
Gao S, Xu S, Fang Y, Fang J. Proteome Science, 2012, 10(Suppl 1): S7

Background

Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods

A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).

Results

Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.

Conclusions

The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across 4 kinase families.

18.
19.

Background

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned.

Results

In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented.

Conclusion

Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
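
The de Bruijn graph that the short reads are assembled into can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it builds an uncorrected graph from error-free reads, ignores reverse complements, and does not implement Jabba's graph correction, MEM seeding, or long-read alignment. The function names and k = 4 are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k=4):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_unique_path(graph, start, max_steps=1000):
    """Follow edges from `start` while exactly one successor exists.

    Returns the sequence spelled along the path; `max_steps` guards
    against walking around a cycle forever.
    """
    seq, node = start, start
    for _ in range(max_steps):
        successors = graph.get(node, ())
        if len(successors) != 1:
            break
        node = next(iter(successors))
        seq += node[-1]
    return seq
```

With the two overlapping reads `ACGTT` and `CGTTG`, starting from node `ACG` spells `ACGTTG`, the sequence both reads were drawn from. A long read would then be corrected by finding the path in such a graph that best matches it.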

20.

Background

In recent years, the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations has become popular.

Methods

The physical basis of these intuitive maps is clarified by means of analytically solvable problems.

Results

Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.

Conclusion

Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.
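
For orientation, the HC transformation is commonly written as a rotated in-plane gradient of the magnetic field component normal to the measurement plane. The abstract itself does not state the formula, so the notation below is an assumption: B_z is the measured normal field component, and e_x, e_y are unit vectors in the sensor plane.

```latex
\vec{c}(x, y) = \frac{\partial B_z}{\partial y}\,\vec{e}_x - \frac{\partial B_z}{\partial x}\,\vec{e}_y
```

The map shows the magnitude and direction of this vector at each sensor position, which is why it reads like a current distribution even though it is computed directly from the field data.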


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号