共查询到20条相似文献,搜索用时 31 毫秒
1.
Background
Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample.Result
Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of MetaSUB Inter-City Challenge provided by CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.Conclusion
Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.Reviewers
This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.2.
Shuang Wang Jihoon Kim Xiaoqian Jiang Stefan F Brunner Lucila Ohno-Machado 《BMC medical genomics》2014,7(Z1):S9
Background
Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such large scale has increased the demand for parallel computing.Methods
Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., ≤ 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair.Results
Speeds over 5.36 Giga Cell Updates Per Second (GCUPs) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620@2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time. In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original miRanda implementations through multiple test datasets.Conclusions
We offer a GPU-based alternative to high performance compute (HPC) that can be developed locally at a relatively small cost. The community of GPU developers in the biomedical research community, particularly for genome analysis, is still growing. With increasing shared resources, this community will be able to advance CMTI in a very significant manner. Our source code is available at https://sourceforge.net/projects/cudamiranda/.3.
4.
D. Jacob C. Deborde M. Lefebvre M. Maucourt A. Moing 《Metabolomics : Official journal of the Metabolomic Society》2017,13(4):36
Introduction
Concerning NMR-based metabolomics, 1D spectra processing often requires an expert eye for disentangling the intertwined peaks.Objectives
The objective of NMRProcFlow is to assist the expert in this task in the best way without requirement of programming skills.Methods
NMRProcFlow was developed to be a graphical and interactive 1D NMR (1H & 13C) spectra processing tool.Results
NMRProcFlow (http://nmrprocflow.org), dedicated to metabolic fingerprinting and targeted metabolomics, covers all spectra processing steps including baseline correction, chemical shift calibration and alignment.Conclusion
Biologists and NMR spectroscopists can easily interact and develop synergies by visualizing the NMR spectra along with their corresponding experimental-factor levels, thus setting a bridge between experimental design and subsequent statistical analyses.5.
N. Cesbron A.-L. Royer Y. Guitton A. Sydor B. Le Bizec G. Dervilly-Pinel 《Metabolomics : Official journal of the Metabolomic Society》2017,13(8):99
Introduction
Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.Objectives
In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.Results
A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.6.
Background
Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.Results
Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.Conclusions
Suitably processed gap patterns extracted from genome-wide alignment provide a surprisingly clear phylogenetic signal and an allow the inference of accurate phylogenetic trees.7.
Background
Adverse drug reactions (ADRs) are unintended and harmful reactions caused by normal uses of drugs. Predicting and preventing ADRs in the early stage of the drug development pipeline can help to enhance drug safety and reduce financial costs.Methods
In this paper, we developed machine learning models including a deep learning framework which can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a-priori.Results
We evaluated the performance of our model with ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Via feature analysis on drug structures, we identified important molecular substructures that are associated with specific ADRs and assessed their associations via statistical analysis.Conclusions
The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.8.
Rachel A. Spicer Christoph Steinbeck 《Metabolomics : Official journal of the Metabolomic Society》2018,14(1):16
Introduction
Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.Objectives
(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.Methods
A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.Results
Journals that support data sharing are not necessarily those with the most papers associated to open metabolomics data.Conclusion
Further efforts are required to improve data sharing in metabolomics.9.
Miriam Banas Sindy Neumann Johannes Eiglsperger Eric Schiffer Franz Josef Putz Simone Reichelt-Wurm Bernhard Karl Krämer Philipp Pagel Bernhard Banas 《Metabolomics : Official journal of the Metabolomic Society》2018,14(9):116
Introduction
Allograft rejection is still an important complication after kidney transplantation. Currently, monitoring of these patients mostly relies on the measurement of serum creatinine and clinical evaluation. The gold standard for diagnosing allograft rejection, i.e. performing a renal biopsy is invasive and expensive. So far no adequate biomarkers are available for routine use.Objectives
We aimed to develop a urine metabolite constellation that is characteristic for acute renal allograft rejection.Methods
NMR-Spectroscopy was applied to a training cohort of transplant recipients with and without acute rejection.Results
We obtained a metabolite constellation of four metabolites that shows promising performance to detect renal allograft rejection in the cohorts used (AUC of 0.72 and 0.74, respectively).Conclusion
A metabolite constellation was defined with the potential for further development of an in-vitro diagnostic test that can support physicians in their clinical assessment of a kidney transplant patient.10.
Background
To reproduce and report a bioinformatics analysis, it is important to be able to determine the environment in which a program was run. It can also be valuable when trying to debug why different executions are giving unexpectedly different results.Results
Log::ProgramInfo is a Perl module that writes a log file at the termination of execution of the enclosing program, to document useful execution characteristics. This log file can be used to re-create the environment in order to reproduce an earlier execution. It can also be used to compare the environments of two executions to determine whether there were any differences that might affect (or explain) their operation.Availability
The source is available on CPAN (Macdonald and Boutros, Log-ProgramInfo. http://search.cpan.org/~boutroslb/Log-ProgramInfo/).Conclusion
Using Log::ProgramInfo in programs creating result data for publishable research, and including the Log::ProgramInfo output log as part of the publication of that research is a valuable method to assist others to duplicate the programming environment as a precursor to validating and/or extending that research.11.
Background
In recent years the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations became popular.Methods
The physical basis of these intuitive maps is clarified by means of analytically solvable problems.Results
Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.Conclusion
Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.12.
Introduction
Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.Objectives
Assessment of the quality of raw data processing in untargeted metabolomics.Methods
Five published untargeted metabolomics studies, were reanalyzed.Results
Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.Conclusion
Incomplete raw data processing shows unexplored potential of current and legacy data.13.
Background
Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.Methods
A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).Results
Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.Conclusions
The selected features can be used to build efficient multitask classifiers with good performance, suggesting they are important to protein phosphorylation across 4 kinase families.14.
Renato de Souza Pinto Lemgruber Kaspar Valgepea Mark P. Hodson Ryan Tappel Sean D. Simpson Michael Köpke Lars K. Nielsen Esteban Marcellin 《Metabolomics : Official journal of the Metabolomic Society》2018,14(3):35
Introduction
Quantification of tetrahydrofolates (THFs), important metabolites in the Wood–Ljungdahl pathway (WLP) of acetogens, is challenging given their sensitivity to oxygen.Objective
To develop a simple anaerobic protocol to enable reliable THFs quantification from bioreactors.Methods
Anaerobic cultures were mixed with anaerobic acetonitrile for extraction. Targeted LC–MS/MS was used for quantification.Results
Tetrahydrofolates can only be quantified if sampled anaerobically. THF levels showed a strong correlation to acetyl-CoA, the end product of the WLP.Conclusion
Our method is useful for relative quantification of THFs across different growth conditions. Absolute quantification of THFs requires the use of labelled standards.15.
Marta Michalczuk Beata Urban Tadeusz Porowski Anna Wasilewska Alina Bakunowicz-Łazarczyk 《Metabolomics : Official journal of the Metabolomic Society》2018,14(6):82
Introduction
Citrate is an old metabolite which is best known for the role in the Krebs cycle. Citrate is widely used in many branches of medicine. In ophthalmology citrate is considered as a therapeutic agent and an useful diagnostic tool—biomarker.Objectives
To summarize the published literature on citrate usage in the leading causes of blindness and highlight the new possibilities for this old metabolite.Methods
We conducted a systematic search of the scientific literature about citrate usage in ophthalmology up to January 2018. The reference lists of identified articles were searched for providing in-depth information.Results
This systematic review included 30 articles. The role of citrate in the leading causes of blindness is presented.Conclusions
Citrate might help inhibit cataract progression, in case of questions confirm glaucoma diagnosis or improve cornea repair treatment as adjuvant agent (therapy of ulcerating cornea after alkali injury, crosslinking procedure). However, the knowledge about possible citrate usage in ophthalmology is not widely known. Promoting recent scientific knowledge about citrate usage in ophthalmology may not only benefit of medical improvement but may also limit economic costs caused by leading causes of blindness. Further studies on citrate usage in ophthalmology should continuously be the field of scientific interest.16.
Ferran Casbas Pinto Srinivarao Ravipati David A. Barrett T. Charles Hodgman 《Metabolomics : Official journal of the Metabolomic Society》2017,13(7):81
Introduction
It is difficult to elucidate the metabolic and regulatory factors causing lipidome perturbations.Objectives
This work simplifies this process.Methods
A method has been developed to query an online holistic lipid metabolic network (of 7923 metabolites) to extract the pathways that connect the input list of lipids.Results
The output enables pathway visualisation and the querying of other databases to identify potential regulators. When used to a study a plasma lipidome dataset of polycystic ovary syndrome, 14 enzymes were identified, of which 3 are linked to ELAVL1—an mRNA stabiliser.Conclusion
This method provides a simplified approach to identifying potential regulators causing lipid-profile perturbations.17.
Background
One of the recent challenges of computational biology is development of new algorithms, tools and software to facilitate predictive modeling of big data generated by high-throughput technologies in biomedical research.Results
To meet these demands we developed PROPER - a package for visual evaluation of ranking classifiers for biological big data mining studies in the MATLAB environment.Conclusion
PROPER is an efficient tool for optimization and comparison of ranking classifiers, providing over 20 different two- and three-dimensional performance curves.18.
Jamie V. de Seymour Stephanie Tu Xiaoling He Hua Zhang Ting-Li Han Philip N. Baker Karolina Sulek 《Metabolomics : Official journal of the Metabolomic Society》2018,14(6):79
Introduction
Intrahepatic cholestasis of pregnancy (ICP) is a common maternal liver disease; development can result in devastating consequences, including sudden fetal death and stillbirth. Currently, recognition of ICP only occurs following onset of clinical symptoms.Objective
Investigate the maternal hair metabolome for predictive biomarkers of ICP.Methods
The maternal hair metabolome (gestational age of sampling between 17 and 41 weeks) of 38 Chinese women with ICP and 46 pregnant controls was analysed using gas chromatography–mass spectrometry.Results
Of 105 metabolites detected in hair, none were significantly associated with ICP.Conclusion
Hair samples represent accumulative environmental exposure over time. Samples collected at the onset of ICP did not reveal any metabolic shifts, suggesting rapid development of the disease.19.
Sonia Liggi Christine Hinz Zoe Hall Maria Laura Santoru Simone Poddighe John Fjeldsted Luigi Atzori Julian L. Griffin 《Metabolomics : Official journal of the Metabolomic Society》2018,14(4):52
Introduction
Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need of multiple software packages for each step of the processing workflow.Objectives
Merge in the same platform the steps required for metabolomics data processing.Methods
KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.Results
The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.Conclusion
KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.20.