Found 20 similar documents (search time: 15 ms)
1.
Background
Sequence alignment data is often ordered by coordinate (id of the reference sequence plus position on the sequence where the fragment was mapped) when stored in BAM files, as this simplifies the extraction of variants between the mapped data and the reference or of variants within the mapped data. In this order, paired reads are usually separated in the file, which complicates some other applications like duplicate marking or conversion to the FastQ format, which require access to the full information of the pairs.
Results
In this paper we introduce biobambam, a set of tools based on the efficient collation of alignments in BAM files by read name. The employed collation algorithm avoids time- and space-consuming sorting of alignments by read name where this is possible without using more than a specified amount of main memory. Using this algorithm, tasks like duplicate marking in BAM files and conversion of BAM files to the FastQ format can be performed very efficiently with limited resources. We also make the collation algorithm available in the form of an API for other projects. This API is part of the libmaus package.
Conclusions
In comparison with previous approaches to problems involving the collation of alignments by read name, like the BAM to FastQ or duplicate marking utilities, our approach can often perform an equivalent task more efficiently in terms of the required main memory and run-time. Our BAM to FastQ conversion is faster than all widely known alternatives including Picard and bamUtil. Our duplicate marking is about as fast as the closest competitor bamUtil for small data sets and faster than all known alternatives on large and complex data sets.
2.
John M. Gaspar 《BMC bioinformatics》2018,19(1):536
Background
Advances in Illumina DNA sequencing technology have produced longer paired-end reads that increasingly have sequence overlaps. These reads can be merged into a single read that spans the full length of the original DNA fragment, allowing for error correction and accurate determination of read coverage. Extant merging programs utilize simplistic or unverified models for the selection of bases and quality scores for the overlapping region of merged reads.
Results
We first examined the baseline quality score - error rate relationship using sequence reads derived from PhiX. In contrast to numerous published reports, we found that the quality scores produced by Illumina were not substantially inflated above the theoretical values, once the reference genome was corrected for unreported sequence variants. The PhiX reads were then used to create empirical models of sequencing errors in overlapping regions of paired-end reads, and these models were incorporated into a novel merging program, NGmerge. We demonstrate that NGmerge corrects errors and ambiguous bases better than other merging programs, and that it assigns quality scores for merged bases that accurately reflect the error rates. Our results also show that, contrary to published analyses, the sequencing errors of paired-end reads are not independent.
Conclusions
We provide a free and open-source program, NGmerge, that performs better than existing read merging programs. NGmerge is available on GitHub (https://github.com/harvardinformatics/NGmerge) under the MIT License; it is written in C and supported on Linux.
3.
Daniel Cañueto Josep Gómez Reza M. Salek Xavier Correig Nicolau Cañellas 《Metabolomics : Official journal of the Metabolomic Society》2018,14(3):24
Introduction
Adoption of automatic profiling tools for 1H-NMR-based metabolomic studies still lags behind other approaches in the absence of the flexibility and interactivity necessary to adapt to the properties of study data sets of complex matrices.
Objectives
To provide an open source tool that fully integrates these needs and enables the reproducibility of the profiling process.
Methods
rDolphin incorporates novel techniques to optimize exploratory analysis, metabolite identification, and validation of profiling output quality.
Results
The information and quality achieved in two public datasets of complex matrices are maximized.
Conclusion
rDolphin is an open-source R package (http://github.com/danielcanueto/rDolphin) able to provide the best balance between accuracy, reproducibility and ease of use.
4.
Chao Xie Chin Lui Wesley Goi Daniel H. Huson Peter F. R. Little Rohan B. H. Williams 《BMC bioinformatics》2016,17(19):508
Background
Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences and these can be exploited to perform an unbiased SSU-based taxonomic analysis.
Results
Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.
Conclusions
RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.
5.
Background
Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required.
Results
We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure.
Conclusions
Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
6.
7.
Background
Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies.
Results
We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm.
Conclusions
Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
8.
Connor Black Olivier P. Chevallier Simon A. Haughey Julia Balog Sara Stead Steven D. Pringle Maria V. Riina Francesca Martucci Pier L. Acutis Mike Morris Dimitrios S. Nikolopoulos Zoltan Takats Christopher T. Elliott 《Metabolomics : Official journal of the Metabolomic Society》2017,13(12):153
Introduction
Fish fraud detection is mainly carried out using a genomic profiling approach requiring long and complex sample preparations and assay running times. Rapid evaporative ionisation mass spectrometry (REIMS) can circumvent these issues without sacrificing the quality of results.
Objectives
To demonstrate that REIMS can be used as a fast profiling technique capable of achieving accurate species identification without the need for any sample preparation. Additionally, we wanted to demonstrate that other aspects of fish fraud other than speciation are detectable using REIMS.
Methods
478 samples of five different white fish species were subjected to REIMS analysis using an electrosurgical knife. Each sample was cut 8–12 times with each one lasting 3–5 s and chemometric models were generated based on the mass range m/z 600–950 of each sample.
Results
The identification of 99 validation samples provided a 98.99% correct classification in which species identification was obtained near-instantaneously (≈ 2 s) unlike any other form of food fraud analysis. Significant time comparisons between REIMS and polymerase chain reaction (PCR) were observed when analysing 6 mislabelled samples, demonstrating how REIMS can be used as a complementary technique to detect fish fraud. Additionally, we have demonstrated that the catch method of fish products can be detected using REIMS, a concept never previously reported.
Conclusions
REIMS has been proven to be an innovative technique to aid the detection of fish fraud and has the potential to be utilised by fisheries to conduct their own quality control (QC) checks for fast accurate results.
9.
Hongchao Ji Zhimin Zhang Hongmei Lu 《Metabolomics : Official journal of the Metabolomic Society》2018,14(5):68
Introduction
Untargeted and targeted analyses are two classes of metabolic study. Both strategies have been advanced by high resolution mass spectrometers coupled with chromatography, which have the advantages of high mass sensitivity and accuracy. State-of-the-art methods for mass spectrometric data sets do not always quantify metabolites of interest in a targeted assay efficiently and accurately.
Objectives
TarMet can quantify targeted metabolites as well as their isotopologues through a reactive and user-friendly graphical user interface.
Methods
TarMet accepts vendor-neutral data files (NetCDF, mzXML and mzML) as inputs. Then it extracts ion chromatograms, detects peak position and bounds and confirms the metabolites via the isotope patterns. It can integrate peak areas for all isotopologues automatically.
Results
TarMet detects more isotopologues and quantifies them better than state-of-the-art methods, and it handles isotope tracer assays well.
Conclusion
TarMet is a better tool for targeted metabolic and stable isotope tracer analyses.
10.
Clara Pérez-Rambla Leonor Puchades-Carrasco María García-Flores José Rubio-Briones José Antonio López-Guerrero Antonio Pineda-Lucena 《Metabolomics : Official journal of the Metabolomic Society》2017,13(5):52
Introduction
Prostate cancer (PCa) is one of the most common malignancies in men worldwide. Serum prostate specific antigen (PSA) level has been extensively used as a biomarker to detect PCa. However, PSA is not cancer-specific and various non-malignant conditions, including benign prostatic hyperplasia (BPH), can cause a rise in PSA blood levels, thus leading to many false positive results.
Objectives
In this study, we evaluated the potential of urinary metabolomic profiling for discriminating PCa from BPH.
Methods
Urine samples from 64 PCa patients and 51 individuals diagnosed with BPH were analysed using 1H nuclear magnetic resonance (1H-NMR). Comparative analysis of urinary metabolomic profiles was carried out using multivariate and univariate statistical approaches.
Results
The urine metabolomic profile of PCa patients is characterised by increased concentrations of branched-chain amino acids (BCAA), glutamate and pseudouridine, and decreased concentrations of glycine, dimethylglycine, fumarate and 4-imidazole-acetate compared with individuals diagnosed with BPH.
Conclusion
PCa patients have a specific urinary metabolomic profile. The results of our study underscore the clinical potential of metabolomic profiling to uncover metabolic changes that could be useful to discriminate PCa from BPH in a clinical context.
11.
Daqiang Pan Caroline Lindau Simon Lagies Nils Wiedemann Bernd Kammerer 《Metabolomics : Official journal of the Metabolomic Society》2018,14(5):59
Introduction
Subcellular compartmentalization enables eukaryotic cells to carry out different reactions at the same time, resulting in different metabolite pools in the subcellular compartments. Thus, mutations affecting the mitochondrial energy metabolism could cause different metabolic alterations in mitochondria compared to the cytoplasm. Given that the metabolite pool in the cytosol is larger than that of other subcellular compartments, metabolic profiling of total cells could miss these compartment-specific metabolic alterations.
Objectives
To reveal compartment-specific metabolic differences, mitochondria and the cytoplasmic fraction of baker’s yeast Saccharomyces cerevisiae were isolated and subjected to metabolic profiling.
Methods
Mitochondria were isolated through differential centrifugation and were analyzed together with the remaining cytoplasm by gas chromatography–mass spectrometry (GC–MS) based metabolic profiling.
Results
Seventy-two metabolites were identified, of which eight were found exclusively in mitochondria and sixteen exclusively in the cytoplasm. Based on the metabolic signature of mitochondria and of the cytoplasm, mutants of the succinate dehydrogenase (respiratory chain complex II) and of the FOF1-ATP-synthase (complex V) can be discriminated in both compartments by principal component analysis from wild-type and each other. These mitochondrial oxidative phosphorylation machinery mutants altered not only citric acid cycle related metabolites but also amino acids, fatty acids, purine and pyrimidine intermediates and others.
Conclusion
By applying metabolomics to isolated mitochondria and the corresponding cytoplasm, compartment-specific metabolic signatures can be identified. This subcellular metabolomics analysis is a powerful tool to study the molecular mechanism of compartment-specific metabolic homeostasis in response to mutations affecting the mitochondrial metabolism.
12.
Sonia Liggi Christine Hinz Zoe Hall Maria Laura Santoru Simone Poddighe John Fjeldsted Luigi Atzori Julian L. Griffin 《Metabolomics : Official journal of the Metabolomic Society》2018,14(4):52
Introduction
Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.
Objectives
To merge into a single platform the steps required for metabolomics data processing.
Methods
KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.
Results
The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.
Conclusion
KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.
13.
Christoph Böttcher Andrea Krähmer Melanie Stürtz Sabine Widder Hartwig Schulz 《Metabolomics : Official journal of the Metabolomic Society》2017,13(4):35
Introduction
Onion (Allium cepa) represents one of the most important horticultural crops and is used as food, spice and medicinal plant almost worldwide. Onion bulbs accumulate a broad range of primary and secondary metabolites which impact nutritional, sensory and technological properties.
Objectives
To complement existing analytical methods targeting individual compound classes this work aimed at the development and validation of an analytical workflow for comprehensive metabolite profiling of onion bulbs.
Method
Metabolite profiling was performed by liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry (LC/ESI-QTOFMS). For annotation of metabolites accurate mass tandem mass spectrometry experiments were carried out.
Results
On the basis of LC/ESI-QTOFMS and two chromatographic methods an analytical workflow was developed which facilitates profiling of polar and semi-polar onion metabolites including fructooligosaccharides, proteinogenic amino acids, peptides, S-substituted cysteine conjugates, flavonoids and saponins. To minimize enzymatic conversion of S-alk(en)ylcysteine sulfoxides, a sample preparation and extraction protocol for fresh onions was developed comprising cryohomogenization and a low-temperature quenching step. A total of 123 metabolites were annotated and characterized by chromatographic and tandem mass spectral data. For validation, recovery rates and matrix effects were determined for 15 model compounds. Repeatability and linearity were assessed for more than 80 endogenous metabolites.
Conclusion
As exemplarily demonstrated by comparative metabolic analysis of six onion cultivars the established analytical workflow in combination with targeted and non-targeted data analysis strategies can be successfully applied for comprehensive metabolite profiling of onion bulbs.
14.
15.
Frida Torell Kate Bennett Stefan Rännar Katrin Lundstedt-Enkel Torbjörn Lundstedt Johan Trygg 《Metabolomics : Official journal of the Metabolomic Society》2017,13(6):66
Introduction
Post-collection handling, storage and transportation can affect the quality of blood samples. Pre-analytical biases can easily be introduced and can jeopardize accurate profiling of the plasma metabolome. Consequently, a mouse study must be carefully planned in order to avoid any kind of bias that can be introduced, in order not to compromise the outcome of the study. The storage and shipment of the samples should be made in such a way that the freeze–thaw cycles are kept to a minimum. In order to keep the latent effects on the stability of the blood metabolome to a minimum it is essential to study the effect that post-collection and pre-analytical errors have on the metabolome.
Objectives
The aim of this study was to investigate the effects of thawing on the metabolic profiles of different sample types.
Methods
In the present study, a metabolomics approach was utilized to obtain a thawing profile of plasma samples obtained on three different days of experiment. The plasma samples were collected from the tail on day 1 and 3, while retro-orbital sampling was used on day 5. The samples were analysed using gas chromatography time-of-flight mass spectrometry (GC TOF-MS).
Results
The thawed plasma samples were found to be characterized by higher levels of amino acids, fatty acids, glycerol metabolites and purine and pyrimidine metabolites as a result of protein degradation, cell degradation and increased phospholipase activity. The consensus profile was thereafter compared to the previously published study comparing thawing profiles of tissue samples from gut, kidney, liver, muscle and pancreas.
Conclusions
The comparison between thawed organ samples and thawed plasma samples indicates that the organ samples are more sensitive to thawing; however, thawing still affected all investigated sample types.
16.
Margaux Luck Neila Talbi Laurent Gouya Cédric Caradeuc Hervé Puy Gildas Bertho Nicolas Pallet 《Metabolomics : Official journal of the Metabolomic Society》2018,14(1):10
Introduction
Metabolomic profiling combines Nuclear Magnetic Resonance spectroscopy with supervised statistical analysis, which might allow a better understanding of the mechanisms of a disease.
Objectives
In this study, the urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease, and to propose new pathophysiological hypotheses.
Methods
Urine 1H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets (bins), corresponding to rules, were extracted and a logistic regression was trained.
Results
The results generated by our rule-mining algorithm were consistent with those obtained using partial least square discriminant analysis (PLS-DA) and the predictive performance of the model was significant. Buckets that were identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found in higher concentrations in the urine of aAIP compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients.
Conclusion
These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and support the relationship that exists between heme synthesis and mitochondrial energetic metabolism.
17.
Do universal codon-usage patterns minimize the effects of mutation and translation error? Total citations: 2 (self-citations: 1, citations by others: 1)
Background
Do species use codons that reduce the impact of errors in translation or replication? The genetic code is arranged in a way that minimizes errors, defined as the sum of the differences in amino-acid properties caused by single-base changes from each codon to each other codon. However, the extent to which organisms optimize the genetic messages written in this code has been far less studied. We tested whether codon and amino-acid usages from 457 bacteria, 264 eukaryotes, and 33 archaea minimize errors compared to random usages, and whether changes in genome G+C content influence these error values.
Results
We tested the hypotheses that organisms choose their codon usage to minimize errors, and that the large observed variation in G+C content in coding sequences, but the low variation in G+U or G+A content, is due to differences in the effects of variation along these axes on the error value. Surprisingly, the biological distribution of error values has far lower variance than randomized error values, but error values of actual codon and amino-acid usages are actually greater than would be expected by chance.
Conclusion
These unexpected findings suggest that selection against translation error has not produced codon or amino-acid usages that minimize the effects of errors, and that even messages with very different nucleotide compositions somehow maintain a relatively constant error value. They raise the question: why do all known organisms use highly error-minimizing genetic codes, but fail to minimize the errors in the mRNA messages they encode?
18.
Background
An important feature in many genomic studies is quality control and normalization. This is particularly important when analyzing epigenetic data, where the process of obtaining measurements can be bias prone. The GAW20 data was from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), a study with multigeneration families, where DNA cytosine-phosphate-guanine (CpG) methylation was measured pre- and posttreatment with fenofibrate. We performed quality control assessment of the GAW20 DNA methylation data, including normalization, assessment of batch effects and detection of sample swaps.
Results
We show that even after normalization, the GOLDN methylation data has systematic differences pre- and posttreatment. Through investigation of (a) CpG sites containing a single nucleotide polymorphism, (b) the stability of breeding values for methylation across time points, and (c) autosomal gender-associated CpGs, 13 sample swaps were detected, 11 of which were posttreatment.
Conclusions
This paper demonstrates several ways to perform quality control of methylation data in the absence of raw data files and highlights the importance of normalization and quality control of the GAW20 methylation data from the GOLDN study.
19.
Tafadzwa Chihanga Sarah M. Hausmann Shuisong Ni Michael A. Kennedy 《Metabolomics : Official journal of the Metabolomic Society》2018,14(3):28
Introduction
Comparative metabolic profiling of different human cancer cell lines can reveal metabolic pathways up-regulated or down-regulated in each cell line, potentially providing insight into distinct metabolism taking place in different types of cancer cells. It is noteworthy, however, that human cell lines available from public repositories are deposited with recommended media for optimal growth, and if cell lines to be compared are cultured on different growth media, this introduces a potentially serious confounding variable in metabolic profiling studies designed to identify intrinsic metabolic pathways active in each cell line.
Objectives
The goal of this study was to determine if the culture media used to grow human cell lines had a significant impact on the measured metabolic profiles.
Methods
NMR-based metabolic profiles of hydrophilic extracts of three human pancreatic cancer cell lines, AsPC-1, MiaPaCa-2 and Panc-1, were compared after culture on Dulbecco’s Modified Eagle Medium (DMEM) or Roswell Park Memorial Institute (RPMI-1640) medium.
Results
Comparisons of the same cell lines cultured on different media revealed that the concentrations of many metabolites depended strongly on the choice of culture media. Analyses of different cell lines grown on the same media revealed insight into their metabolic differences.
Conclusion
The choice of culture media can significantly impact metabolic profiles of human cell lines and should be considered an important variable when designing metabolic profiling studies. Also, the metabolic differences of cells cultured on media recommended for optimal growth in comparison to a second growth medium can reveal critical insight into metabolic pathways active in each cell line.
20.
Carl Brunius Lin Shi Rikard Landberg 《Metabolomics : Official journal of the Metabolomic Society》2016,12(11):173