Similar literature
1.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.

Conclusion

Incomplete raw data processing shows unexplored potential of current and legacy data.

2.

Background

Crohn’s disease is associated with gut dysbiosis. Independent studies have shown an increase in the abundance of certain bacterial species, particularly Escherichia coli with the adherent-invasive pathotype, in the gut. The role of these species in this disease needs to be elucidated.

Methods

We performed a metagenomic study investigating the gut microbiota of patients with Crohn’s disease. A metagenomic reconstruction of the consensus genome content of the species was used to assess the genetic variability.

Results

The abnormal shifts in the microbial community structures in Crohn’s disease were heterogeneous among the patients. The metagenomic data suggested the existence of multiple E. coli strains within individual patients. We discovered that the genetic diversity of the species was high and that only a few samples manifested similarity to the adherent-invasive varieties. The other species demonstrated genetic diversity comparable to that observed in the healthy subjects. Our results were supported by a comparison of the sequenced genomes of isolates from the same microbiota samples and a meta-analysis of published gut metagenomes.

Conclusions

The genomic diversity of Crohn’s disease-associated E. coli within and among the patients paves the way towards an understanding of the microbial mechanisms underlying the onset and progression of Crohn’s disease and the development of new strategies for its prevention and treatment.

3.

Background

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of the sequences in a sample may therefore be classified as unknown, which greatly impairs our understanding of the whole sample.

Result

Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs for acceleration. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.

Conclusion

Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.

Reviewers

This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.

4.

Background

The 16S rRNA gene-based amplicon sequencing analysis is widely used to determine the taxonomic composition of microbial communities. Once the taxonomic composition of each community is obtained, evolutionary relationships among taxa are inferred by a phylogenetic tree. Thus, the combined representation of taxonomic composition and phylogenetic relationships among taxa is a powerful method for understanding microbial community structure; however, applying phylogenetic tree-based representation with information on the abundance of thousands or more taxa in each community is a difficult task. For this purpose, we previously developed the tool VITCOMIC (VIsualization tool for Taxonomic COmpositions of MIcrobial Community), which is based on the genome-sequenced microbes’ phylogenetic information. Here, we introduce VITCOMIC2, which incorporates substantive improvements over VITCOMIC that were necessary to address several issues associated with 16S rRNA gene-based analysis of microbial communities.

Results

We developed VITCOMIC2 to provide (i) sequence identity searches against broad reference taxa including uncultured taxa; (ii) normalization of 16S rRNA gene copy number differences among taxa; (iii) rapid sequence identity searches by applying the graphics processing unit-based sequence identity search tool CLAST; (iv) accurate taxonomic composition inference and nearly full-length 16S rRNA gene sequence reconstructions for metagenomic shotgun sequencing; and (v) an interactive user interface for simultaneous representation of the taxonomic composition of microbial communities and phylogenetic relationships among taxa. We validated the accuracy of processes (ii) and (iv) by using metagenomic shotgun sequencing data from a mock microbial community.
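As an illustration of step (ii), the following is a minimal sketch of 16S rRNA gene copy number normalization. It is not VITCOMIC2's implementation: the function name, the taxon names, and the copy numbers are hypothetical values chosen only to show the arithmetic, which divides each taxon's read count by its gene copy number and then rescales to relative abundances.

```python
def normalize_by_copy_number(read_counts, copy_numbers):
    """Divide each taxon's read count by its 16S rRNA gene copy number,
    then rescale so the corrected abundances sum to 1."""
    corrected = {taxon: read_counts[taxon] / copy_numbers[taxon]
                 for taxon in read_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical read counts and copy numbers, for illustration only.
counts = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
abundances = normalize_by_copy_number(counts, copies)
print(abundances)
```

In this toy input, taxon_A contributes most of the reads only because it carries more gene copies; after correction it accounts for a quarter of the estimated community.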

Conclusions

The improvements incorporated into VITCOMIC2 enable users to acquire an intuitive understanding of microbial community composition based on the 16S rRNA gene sequence data obtained from both metagenomic shotgun and amplicon sequencing.

5.

Background

Flow cytometry, with its high-throughput nature combined with the ability to measure an increasing number of cell parameters at once, can surpass the throughput of prevalent genomic and metagenomic approaches in the study of microbiomes. Novel computational approaches to analyzing flow cytometry data will yield greater insight and actionability than the traditional tools used in the analysis of microbiomes. This paper demonstrates the fruitfulness of machine learning in analyzing microbial flow cytometry data generated in anaerobic microbiome perturbation experiments.

Results

Autoencoders were found to be powerful in detecting anomalies in flow cytometry data from anaerobic microbiomes perturbed by nanoparticles and carbon sources but were marginal in predicting perturbations due to antibiotics. A comparison between different algorithms based on predictive capabilities suggested that gradient boosting (GB) and deep learning (DL), i.e. a feed-forward artificial neural network with three hidden layers, were marginally better under the tested conditions at predicting overall community structure, while distributed random forests (DRF) worked better for predicting the most important putative microbial group(s) in the anaerobic digesters, viz. methanogens, and can be optimized with better parameter tuning. Predictive classification patterns with DL were found to be comparable to previously demonstrated multivariate analysis. The potential applications of this approach have been demonstrated for monitoring the syntrophic resilience of anaerobic microbiomes perturbed by synthetic nanoparticles as well as antibiotics.

Conclusion

Machine learning can benefit the microbial flow cytometry research community by providing rapid screening and characterization tools to discover patterns in the dynamic response of microbiomes to several stimuli.

6.

Introduction

Tandem mass spectrometry (MS/MS) has been widely used for identifying metabolites in many areas. However, computationally identifying metabolites from MS/MS data is challenging because the fragmentation rules, which determine the precedence of chemical bond dissociation, are unknown. Although this problem has been tackled in different ways, the lack of computational tools to flexibly represent the adjacent structures of chemical bonds remains a long-standing bottleneck for studying fragmentation rules.

Objectives

This study aimed to develop computational methods for investigating fragmentation rules by analyzing annotated MS/MS data.

Methods

We implemented a computational platform, MIDAS-G, for investigating fragmentation rules. MIDAS-G processes a metabolite as a simple graph and uses graph grammars to recognize specific chemical bonds and their adjacent structures. We can apply MIDAS-G to investigate fragmentation rules by adjusting bond weights in the scoring model of the metabolite identification tool and comparing metabolite identification performances.

Results

We used MIDAS-G to investigate four bond types on real annotated MS/MS data in experiments. The experimental results matched data collected from wet labs and literature. The effectiveness of MIDAS-G was confirmed.

Conclusion

We developed a computational platform for investigating fragmentation rules of tandem mass spectrometry. This platform is freely available for download.

7.

Background

One of the recent challenges of computational biology is the development of new algorithms, tools and software to facilitate predictive modeling of big data generated by high-throughput technologies in biomedical research.

Results

To meet these demands, we developed PROPER, a package for visual evaluation of ranking classifiers for biological big data mining studies in the MATLAB environment.

Conclusion

PROPER is an efficient tool for optimization and comparison of ranking classifiers, providing over 20 different two- and three-dimensional performance curves.

8.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated with open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.

9.
Zheng D, Gerstein MB. Genome Biology, 2006, 7(Suppl 1): S13.1-S13.10

Background

Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions).

Results

Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.

Conclusion

Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computational pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.

10.

Background

Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences, and these can be exploited to perform an unbiased SSU-based taxonomic analysis.

Results

Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.

Conclusions

RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.

11.

Introduction

Mass spectrometry imaging (MSI) experiments result in complex multi-dimensional datasets, which require specialist data analysis tools.

Objectives

We have developed massPix—an R package for analysing and interpreting data from MSI of lipids in tissue.

Methods

massPix produces single ion images, performs multivariate statistics and provides putative lipid annotations based on accurate mass matching against generated lipid libraries.

Results

Classification of tissue regions with high spectral similarity can be carried out by principal components analysis (PCA) or k-means clustering.

Conclusion

massPix is an open-source tool for the analysis and statistical interpretation of MSI data, and is particularly useful for lipidomics applications.

12.

Background

Viral vaccine target discovery requires understanding the diversity of both the virus and the human immune system. The readily available and rapidly growing pool of viral sequence data in the public domain enables the identification and characterization of immune targets relevant to adaptive immunity. A systematic bioinformatics approach is necessary to facilitate the analysis of such large datasets for the selection of potential candidate vaccine targets.

Results

This work describes a computational methodology to achieve this analysis, with data of dengue, West Nile, hepatitis A, HIV-1, and influenza A viruses as examples. Our methodology has been implemented as an analytical pipeline that brings significant advancement to the field of reverse vaccinology, enabling systematic screening of known sequence data in nature for identification of vaccine targets. The pipeline comprises six key steps: (i) comprehensive and extensive collection of sequence data of viral proteomes (the virome), (ii) data cleaning, (iii) large-scale sequence alignments, (iv) peptide entropy analysis, (v) intra- and inter-species variation analysis of conserved sequences, including human homology analysis, and (vi) functional and immunological relevance analysis.
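Step (iv) can be illustrated with a minimal sketch of Shannon entropy computed over the peptides that occupy each sliding window of an alignment. This is an assumption about how such an analysis might look, not the authors' pipeline: the function names, the window length `k` (nonamers by default), and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def peptide_entropy(peptides):
    """Shannon entropy (in bits) of the peptide variants in one window."""
    counts = Counter(peptides)
    n = len(peptides)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_entropies(aligned_seqs, k=9):
    """Entropy of every k-mer window across equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    return [
        peptide_entropy([seq[i:i + k] for seq in aligned_seqs])
        for i in range(length - k + 1)
    ]


# Toy alignment (hypothetical): the two sequences differ at the last residue.
toy_alignment = ["MKTAYIAKQR", "MKTAYIAKQL"]
print(window_entropies(toy_alignment))
```

An entropy of zero marks a window in which every sequence carries the same peptide, so low-entropy windows are the candidates for conserved targets.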

Conclusion

These steps are combined into the pipeline ensuring that a more refined process, as compared to a simple evolutionary conservation analysis, will facilitate a better selection of vaccine targets and their prioritization for subsequent experimental validation.

13.

Background

Clinical decision support systems can effectively overcome the limits of doctors’ knowledge and reduce the possibility of misdiagnosis, thereby enhancing health care. Traditional genetic data storage and analysis methods, which are based on stand-alone environments, struggle to meet the computational requirements of rapidly growing genetic data because of their limited scalability.

Methods

In this paper, we propose a distributed gene clinical decision support system named GCDSS and implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that leverages a batch processing strategy to map reads on Apache Spark.

Results

Experiments show that the distributed gene clinical decision support system GCDSS and the distributed read mapping algorithm CloudBWA have outstanding performance and excellent scalability. Compared with state-of-the-art distributed algorithms, CloudBWA achieves up to 2.63 times speedup over SparkBWA. Compared with stand-alone algorithms, CloudBWA with 16 cores achieves up to 11.59 times speedup over BWA-MEM with 1 core.

Conclusions

GCDSS is a distributed gene clinical decision support system based on cloud computing techniques. In particular, we incorporated a distributed genetic data analysis pipeline framework into GCDSS. To speed up its data processing, we propose CloudBWA, a novel distributed read mapping algorithm that leverages batch processing in the mapping stage on the Apache Spark platform.

14.

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advancements of single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. Subsequently, a challenging computational problem is to cluster high dimensional noisy datasets with substantially fewer cells than the number of genes.

Methods

In this paper, we introduced a consensus clustering framework, conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.
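The idea of fusing basic partitions can be illustrated with a generic co-association sketch. This is not conCluster's algorithm, which the abstract does not specify: here two cells are merged when they share a cluster in more than a chosen fraction of the basic partitions, and the function names, the threshold, and the toy partitions are all hypothetical.

```python
def co_association_counts(partitions):
    """Count, for every pair of cells, how many partitions co-cluster them."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def consensus_clusters(partitions, threshold=0.5):
    """Merge cells co-clustered in more than `threshold` of the partitions."""
    counts = co_association_counts(partitions)
    n = len(counts)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if counts[i][j] / len(partitions) > threshold:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three hypothetical basic partitions of four cells.
basic_partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(basic_partitions))
```

The label values in each basic partition are arbitrary; only co-membership matters, which is why the first two partitions above count as agreeing with each other.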

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.

15.

Objective

To fabricate a novel microbial photobioelectrochemical cell using silicon microfabrication techniques.

Results

High-density photosynthetic cells were immobilized in a microfluidic chamber, and ultra-microelectrodes in a microtip array were inserted into the cytosolic space of the cells to directly harvest photosynthetic electrons. In this way, the microbial photobioelectrochemical cell operated without the aid of electron mediators. Both the short circuit current and the open circuit voltage of the microbial photobioelectrochemical cell responded to light stimuli, reaching as high as 250 pA and 45 mV, respectively.

Conclusion

A microbial photobioelectrochemical cell was fabricated with potential use in next-generation photosynthesis-based solar cells and sensors.

16.

Background

Fragment-based approaches have now become an important component of the drug discovery process. At the same time, pharmaceutical chemists are more often turning to the natural world and its extremely large and diverse collection of natural compounds to discover new leads that can potentially be turned into drugs. In this study we introduce and discuss a computational pipeline to automatically extract statistically overrepresented chemical fragments in therapeutic classes and search for similar fragments in a large database of natural products. By systematically identifying enriched fragments in therapeutic groups, we are able to extract and focus on a few fragments that are likely to be active or structurally important.

Results

We show that several therapeutic classes (including antibacterial, antineoplastic, and drugs active on the cardiovascular system, among others) have enriched fragments that are also found in many natural compounds. Further, our method is able to detect fragments shared by a drug and a natural product even when the global similarity between the two molecules is generally low.

Conclusions

A further development of this computational pipeline is to help predict putative therapeutic activities of natural compounds, and to help identify novel leads for drug discovery.

17.

Background and aims

Biocrust morphology is often used to infer ecological function, but morphologies vary widely in pigmentation and thickness. Little is known about the links between biocrust morphology and the composition of the constituent microbial community. This study aimed to examine these links using dryland crusts varying in stage and morphology.

Methods

We compared the microbial composition of three biocrust developmental stages (Early, Mid, Late) with bare soil (Bare) using high-throughput Illumina MiSeq sequencing. We used standard diversity measures and network analysis to explore how microbe-microbe associations changed with biocrust stage.

Results

Biocrust richness and diversity increased with increasing stage, and there were marked differences in the microbial signatures among stages. Bare and Late stages were dominated by Alphaproteobacteria, but Cyanobacteria was the dominant phylum in Early and Mid stages. The greatest differences in microbial taxa were between Bare and Late stages. Network analysis indicated highly-connected hubs indicative of small networks.

Conclusions

Our results indicate that readily discernible biocrust features may be good indicators of microbial composition and structure. These findings are important for land managers seeking to use biocrusts as indicators of ecosystem health and function. Treating biocrusts as a single unit without considering crust stage is likely to provide misleading information on their functional roles.

18.

Introduction

Human gut microbes and their metabolites are involved in multiple host metabolic pathways. Dysbiosis of the gut microbiota and altered metabolite profiles have been reported in diseased states. In a region like Assam, where 12.4% of the population is tribal, evaluating the influence of ethnicity on gut microbiota and metabolites has become important in order to distinguish it from the effects of disease.

Objective

To study the influence of ethnicity on fecal metabolite profile and their association with the gut microbiota composition.

Methods

In this study, we determined the untargeted fecal metabolites from five ethnic groups of Assam (Tai-Aiton, Bodo, Karbi, Tea-tribe and Tai-Phake) using GC–MS and compared them among the tribes for common and unique metabolites. Metabolites of microbial origin were related to the available metagenomic data on gut bacterial profiles of the same ethnic groups, and functional analysis was carried out based on HMDB.

Results

The core fecal metabolite profile of the Tea-tribe contained aniline, benzoate and acetaldehyde. PLS-DA based on the metabolites suggested that the individuals grouped based on their ethnicity. A PCA plot of the data on bacterial abundance at the genus level indicated clustering of individuals based on ethnicity. Positive correlations were observed between propionic acid and the genus Clostridium (R = 0.43 and p = 0.03), butyric acid and the genus Lactobacillus (R = 0.45 and p = 0.024), acetic acid and the genus Bacteroides (R = 0.63 and p = 0.001) and methane and the genus Escherichia (R = 0.58 and p = 0.002).

Conclusion

Results of this study indicated that ethnicity influences both gut bacterial profile and their metabolites.

19.

Introduction

Microbial cells secrete many metabolites during growth, including important intermediates of the central carbon metabolism. This has not been taken into account by researchers when modeling microbial metabolism for metabolic engineering and systems biology studies.

Materials and Methods

The uptake of metabolites by microorganisms is well studied, but our knowledge of how and why they secrete different intracellular compounds is poor. The secretion of metabolites by microbial cells has traditionally been regarded as a consequence of intracellular metabolic overflow.

Conclusions

Here, we provide evidence based on time-series metabolomics data that microbial cells eliminate some metabolites in response to environmental cues, independent of metabolic overflow. Moreover, we review the different mechanisms of metabolite secretion and explore how this knowledge can benefit metabolic modeling and engineering.

20.

Background

Adverse drug reactions (ADRs) are unintended and harmful reactions caused by normal uses of drugs. Predicting and preventing ADRs in the early stage of the drug development pipeline can help to enhance drug safety and reduce financial costs.

Methods

In this paper, we developed machine learning models, including a deep learning framework, that can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a priori.

Results

We evaluated the performance of our model with ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Via feature analysis on drug structures, we identified important molecular substructures that are associated with specific ADRs and assessed their associations via statistical analysis.

Conclusions

The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.
