Similar Articles
20 similar articles found
1.

Background

Sequence alignment data is often ordered by coordinate (the id of the reference sequence plus the position on that sequence where the fragment was mapped) when stored in BAM files, as this simplifies the extraction of variants between the mapped data and the reference, or of variants within the mapped data. In this order, however, paired reads are usually separated in the file, which complicates applications such as duplicate marking or conversion to the FastQ format that require access to the full information of each pair.

Results

In this paper we introduce biobambam, a set of tools based on the efficient collation of alignments in BAM files by read name. The collation algorithm avoids time- and space-consuming sorting of alignments by read name wherever this is possible without exceeding a specified amount of main memory. Using this algorithm, tasks such as duplicate marking in BAM files and conversion of BAM files to the FastQ format can be performed very efficiently with limited resources. We also make the collation algorithm available as an API for other projects. This API is part of the libmaus package.

Conclusions

In comparison with previous approaches to problems involving the collation of alignments by read name, such as BAM to FastQ conversion or duplicate marking utilities, our approach can often perform an equivalent task more efficiently in terms of the required main memory and run-time. Our BAM to FastQ conversion is faster than all widely known alternatives, including Picard and bamUtil. Our duplicate marking is about as fast as the closest competitor, bamUtil, for small data sets and faster than all known alternatives on large and complex data sets.
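
The general idea behind read-name collation without a full sort can be illustrated with a short sketch (this is not biobambam's implementation; the simplified record type, buffer size and spill handling are assumptions made for the example): stream the coordinate-ordered records, hold reads whose mates have not yet appeared in a bounded in-memory table keyed by read name, and emit pairs as soon as both mates are seen.

```python
from collections import namedtuple

# Simplified alignment record; a real BAM record carries many more fields.
Read = namedtuple("Read", ["name", "flag", "seq", "qual"])

def collate_by_name(records, max_buffered=1_000_000):
    """Pair reads by name while streaming coordinate-ordered input.

    Yields (read1, read2) as soon as both mates have been seen.  Reads whose
    mate has not appeared once the buffer is full are set aside; a full
    implementation (as described for biobambam) would spill them to disk,
    name-sort only that overflow and merge it back, keeping memory bounded.
    """
    pending = {}    # read name -> first mate seen so far
    overflow = []   # reads set aside when the buffer limit is reached

    for read in records:
        mate = pending.pop(read.name, None)
        if mate is not None:
            yield mate, read                    # pair completed
        elif len(pending) < max_buffered:
            pending[read.name] = read           # wait for the mate
        else:
            overflow.append(read)               # bounded memory: defer

    # Leftovers: unpaired reads plus the deferred overflow.
    for read in list(pending.values()) + overflow:
        yield read, None
```

Pairs recovered this way can then be written out as interleaved FastQ or handed to a duplicate-marking step without ever name-sorting the whole file.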

2.

Background

Advances in Illumina DNA sequencing technology have produced longer paired-end reads that increasingly have sequence overlaps. These reads can be merged into a single read that spans the full length of the original DNA fragment, allowing for error correction and accurate determination of read coverage. Extant merging programs utilize simplistic or unverified models for the selection of bases and quality scores for the overlapping region of merged reads.

Results

We first examined the baseline quality score–error rate relationship using sequence reads derived from PhiX. In contrast to numerous published reports, we found that the quality scores produced by Illumina were not substantially inflated above the theoretical values, once the reference genome was corrected for unreported sequence variants. The PhiX reads were then used to create empirical models of sequencing errors in overlapping regions of paired-end reads, and these models were incorporated into a novel merging program, NGmerge. We demonstrate that NGmerge corrects errors and ambiguous bases better than other merging programs, and that it assigns quality scores for merged bases that accurately reflect the error rates. Our results also show that, contrary to published analyses, the sequencing errors of paired-end reads are not independent.

Conclusions

We provide a free and open-source program, NGmerge, that performs better than existing read merging programs. NGmerge is available on GitHub (https://github.com/harvardinformatics/NGmerge) under the MIT License; it is written in C and supported on Linux.
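
What paired-end merging involves can be sketched as follows (an illustrative naive rule, not NGmerge's empirically derived quality model; the minimum overlap and mismatch fraction are arbitrary example parameters): reverse-complement the second read, scan for the longest acceptable overlap, and resolve each overlapping position in favour of the higher-quality base.

```python
def revcomp(seq):
    return seq.translate(str.maketrans("ACGTN", "TACGN"))[::-1]

def merge_pair(r1, q1, r2, q2, min_overlap=20, max_mismatch_frac=0.1):
    """Merge a read pair into one fragment-spanning sequence.

    r1/r2 are sequence strings in sequencing orientation, q1/q2 are lists of
    quality scores.  The suffix of read 1 is aligned against the prefix of the
    reverse-complemented read 2; the longest overlap below the mismatch
    threshold is accepted.  In the overlap, the base with the higher quality
    is kept (NGmerge instead uses empirical error models for this step).
    """
    r2, q2 = revcomp(r2), q2[::-1]
    best = None
    for olen in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        a, b = r1[-olen:], r2[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            best = olen
            break
    if best is None:
        return None                         # no acceptable overlap found

    merged_seq, merged_qual = [], []
    for x, qx, y, qy in zip(r1[-best:], q1[-best:], r2[:best], q2[:best]):
        if x == y:
            merged_seq.append(x)
            merged_qual.append(max(qx, qy))
        else:
            base, qual = (x, qx) if qx >= qy else (y, qy)   # keep better base
            merged_seq.append(base)
            merged_qual.append(qual)

    seq = r1[:-best] + "".join(merged_seq) + r2[best:]
    qual = q1[:-best] + merged_qual + q2[best:]
    return seq, qual
```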

3.

Introduction

Adoption of automatic profiling tools for 1H-NMR-based metabolomic studies still lags behind other approaches because existing tools lack the flexibility and interactivity necessary to adapt to the properties of data sets from complex matrices.

Objectives

To provide an open source tool that fully integrates these needs and enables the reproducibility of the profiling process.

Methods

rDolphin incorporates novel techniques to optimize exploratory analysis, metabolite identification, and validation of profiling output quality.

Results

rDolphin maximized the information recovered and the quality of the profiling output in two public datasets of complex matrices.

Conclusion

rDolphin is an open-source R package (http://github.com/danielcanueto/rDolphin) able to provide the best balance between accuracy, reproducibility and ease of use.

4.

Background

Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences, and these can be exploited to perform an unbiased SSU-based taxonomic analysis.

Results

Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.

Conclusions

RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.

5.

Background

Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes quickly. However, SAMtools requires additional implementation work to be used in parallel, for example with OpenMP (Open Multi-Processing) libraries. Given the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required.

Results

We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure.

Conclusions

Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
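
Cljam builds on Clojure's support for parallel programming; the underlying chunk-and-reduce pattern can be sketched generically with Python's standard multiprocessing module (a generic illustration, not cljam's code — the tuple-based record format and the mapped-read counting task are assumptions made for the example).

```python
from multiprocessing import Pool

def count_mapped(chunk):
    """Worker: count records in one chunk that are flagged as mapped.

    Each record here is a (name, flag) tuple; bit 0x4 of a SAM flag means
    'segment unmapped'.
    """
    return sum(1 for _name, flag in chunk if not flag & 0x4)

def parallel_mapped_count(records, processes=4, chunk_size=100_000):
    """Split the record list into chunks and reduce the per-chunk counts."""
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with Pool(processes) as pool:
        return sum(pool.map(count_mapped, chunks))

if __name__ == "__main__":
    # Toy input: 10 reads, every third one flagged unmapped.
    reads = [(f"read{i}", 0x4 if i % 3 == 0 else 0x0) for i in range(10)]
    print(parallel_mapped_count(reads, processes=2, chunk_size=3))  # -> 6
```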

6.
7.

Background

Although single-molecule sequencing technology is still improving, the length of the generated sequences is already a clear advantage in genome assembly. Prior work that utilizes long reads for genome assembly has mostly focused on correcting sequencing errors and improving the contiguity of de novo assemblies.

Results

We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm.

Conclusions

Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with relatively small amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
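
To give a flavour of what a path-cover formulation looks like in scaffolding (a generic greedy heuristic under assumed inputs; it is not the maximum alternating path cover model or the 2-approximation algorithm from the paper): treat contigs as vertices and long-read links as weighted edges, then greedily select heavy edges while keeping every vertex at degree at most two and avoiding cycles, so the chosen edges form vertex-disjoint paths (scaffolds).

```python
def greedy_path_cover(edges):
    """Generic greedy path cover over weighted contig links.

    edges: iterable of (weight, u, v) tuples.
    Edges are taken in order of decreasing weight; an edge is kept only if
    both endpoints still have degree < 2 and it does not close a cycle, so
    the selected edges always form vertex-disjoint paths.
    """
    parent = {}   # union-find forest over path components
    degree = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    chosen = []
    for weight, u, v in sorted(edges, reverse=True):
        if degree.get(u, 0) >= 2 or degree.get(v, 0) >= 2:
            continue                        # would break the path structure
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # would close a cycle
        parent[ru] = rv                     # merge the two path components
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        chosen.append((u, v))
    return chosen

# Toy example: contigs a-d with long-read support weights.
links = [(5, "a", "b"), (3, "b", "c"), (4, "c", "d"), (2, "a", "d")]
print(greedy_path_cover(links))   # -> path a-b-c-d
```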

8.

Introduction

Fish fraud detection is mainly carried out using a genomic profiling approach that requires long and complex sample preparations and assay running times. Rapid evaporative ionisation mass spectrometry (REIMS) can circumvent these issues without any loss in the quality of the results.

Objectives

To demonstrate that REIMS can be used as a fast profiling technique capable of achieving accurate species identification without the need for any sample preparation. Additionally, we wanted to demonstrate that aspects of fish fraud other than speciation are detectable using REIMS.

Methods

478 samples of five different white fish species were subjected to REIMS analysis using an electrosurgical knife. Each sample was cut 8–12 times, with each cut lasting 3–5 s, and chemometric models were generated from the mass range m/z 600–950 of each sample.

Results

The identification of 99 validation samples provided a 98.99% correct classification rate, with species identification obtained near-instantaneously (≈2 s), unlike any other form of food fraud analysis. Substantial differences in analysis time between REIMS and polymerase chain reaction (PCR) were observed when analysing 6 mislabelled samples, demonstrating how REIMS can be used as a complementary technique to detect fish fraud. Additionally, we have demonstrated that the catch method of fish products can be detected using REIMS, a concept never previously reported.

Conclusions

REIMS has proven to be an innovative technique to aid the detection of fish fraud and has the potential to be utilised by fisheries to conduct their own quality control (QC) checks for fast, accurate results.
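
The chemometric step — turning each burn's spectrum over m/z 600–950 into a fixed-length vector and classifying it by species — can be illustrated generically; the 1 Da bin width, total-ion-current normalisation and nearest-centroid classifier below are assumptions for the sketch, not the models built in the study.

```python
def bin_spectrum(peaks, lo=600.0, hi=950.0, width=1.0):
    """Sum intensities into fixed-width m/z bins and normalise to the total
    ion current, giving a fixed-length feature vector per spectrum.

    peaks: list of (mz, intensity) pairs from one REIMS burn.
    """
    nbins = int((hi - lo) / width)
    vec = [0.0] * nbins
    for mz, intensity in peaks:
        if lo <= mz < hi:
            vec[int((mz - lo) / width)] += intensity
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def nearest_centroid(train, labels, query):
    """Classify a binned spectrum by Euclidean distance to class centroids."""
    groups = {}
    for vec, lab in zip(train, labels):
        groups.setdefault(lab, []).append(vec)
    centroids = {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
                 for lab, vecs in groups.items()}

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda lab: dist(centroids[lab], query))
```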

9.

Introduction

Untargeted and targeted analyses are two classes of metabolic study. Both strategies have been advanced by high-resolution mass spectrometers coupled with chromatography, which offer high mass sensitivity and accuracy. However, state-of-the-art methods for processing mass spectrometric data sets do not always quantify metabolites of interest in a targeted assay efficiently and accurately.

Objectives

To provide TarMet, a tool that can quantify targeted metabolites as well as their isotopologues through a reactive and user-friendly graphical user interface.

Methods

TarMet accepts vendor-neutral data files (NetCDF, mzXML and mzML) as inputs. It then extracts ion chromatograms, detects peak positions and bounds, and confirms metabolites via their isotope patterns. It can integrate peak areas for all isotopologues automatically.

Results

TarMet detects more isotopologues and quantifies them better than state-of-the-art methods, and it handles isotope tracer assays well.

Conclusion

TarMet is a better tool for targeted metabolic and stable isotope tracer analyses.
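
The operations described — extracting an ion chromatogram for a target m/z, integrating the peak area, and repeating this for each isotopologue — can be sketched as below; the in-memory scan format, the ppm tolerance and the fixed retention-time bounds are simplifying assumptions, not TarMet's peak-detection logic.

```python
def extract_eic(scans, target_mz, ppm=10.0):
    """Build an extracted ion chromatogram (EIC) for one target m/z.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples.
    Returns (retention_time, summed_intensity) points, summing all signals
    within +/- ppm of the target in each scan.
    """
    tol = target_mz * ppm * 1e-6
    return [(rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
            for rt, peaks in scans]

def integrate_peak(eic, rt_lo, rt_hi):
    """Trapezoidal integration of the EIC between two retention-time bounds."""
    pts = [(rt, i) for rt, i in eic if rt_lo <= rt <= rt_hi]
    return sum(0.5 * (i1 + i2) * (rt2 - rt1)
               for (rt1, i1), (rt2, i2) in zip(pts, pts[1:]))

# Isotopologues (M+0, M+1, ...) of a singly charged ion are spaced by
# roughly 1.00336 m/z (the 13C-12C mass difference); quantify each one by
# repeating the extraction at the shifted m/z.
def isotopologue_areas(scans, mono_mz, n_isotopes, rt_lo, rt_hi, ppm=10.0):
    return [integrate_peak(extract_eic(scans, mono_mz + k * 1.00336, ppm),
                           rt_lo, rt_hi)
            for k in range(n_isotopes)]
```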

10.

Introduction

Prostate cancer (PCa) is one of the most common malignancies in men worldwide. Serum prostate specific antigen (PSA) level has been extensively used as a biomarker to detect PCa. However, PSA is not cancer-specific and various non-malignant conditions, including benign prostatic hyperplasia (BPH), can cause a rise in PSA blood levels, thus leading to many false positive results.

Objectives

In this study, we evaluated the potential of urinary metabolomic profiling for discriminating PCa from BPH.

Methods

Urine samples from 64 PCa patients and 51 individuals diagnosed with BPH were analysed using 1H nuclear magnetic resonance (1H-NMR). Comparative analysis of urinary metabolomic profiles was carried out using multivariate and univariate statistical approaches.

Results

The urine metabolomic profile of PCa patients is characterised by increased concentrations of branched-chain amino acids (BCAA), glutamate and pseudouridine, and decreased concentrations of glycine, dimethylglycine, fumarate and 4-imidazole-acetate compared with individuals diagnosed with BPH.

Conclusion

PCa patients have a specific urinary metabolomic profile. The results of our study underscore the clinical potential of metabolomic profiling to uncover metabolic changes that could be useful to discriminate PCa from BPH in a clinical context.

11.

Introduction

Subcellular compartmentalization enables eukaryotic cells to carry out different reactions at the same time, resulting in different metabolite pools in the subcellular compartments. Thus, mutations affecting the mitochondrial energy metabolism could cause different metabolic alterations in mitochondria compared to the cytoplasm. Given that the metabolite pool in the cytosol is larger than that of other subcellular compartments, metabolic profiling of total cells could miss these compartment-specific metabolic alterations.

Objectives

To reveal compartment-specific metabolic differences, mitochondria and the cytoplasmic fraction of baker’s yeast Saccharomyces cerevisiae were isolated and subjected to metabolic profiling.

Methods

Mitochondria were isolated through differential centrifugation and were analyzed together with the remaining cytoplasm by gas chromatography–mass spectrometry (GC–MS) based metabolic profiling.

Results

Seventy-two metabolites were identified, of which eight were found exclusively in mitochondria and sixteen exclusively in the cytoplasm. Based on the metabolic signatures of mitochondria and of the cytoplasm, mutants of succinate dehydrogenase (respiratory chain complex II) and of the FoF1-ATP synthase (complex V) can be discriminated from the wild type and from each other in both compartments by principal component analysis. These mutants of the mitochondrial oxidative phosphorylation machinery altered not only citric acid cycle-related metabolites but also amino acids, fatty acids, purine and pyrimidine intermediates and others.

Conclusion

By applying metabolomics to isolated mitochondria and the corresponding cytoplasm, compartment-specific metabolic signatures can be identified. This subcellular metabolomics analysis is a powerful tool to study the molecular mechanism of compartment-specific metabolic homeostasis in response to mutations affecting the mitochondrial metabolism.

12.

Introduction

Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.

Objectives

To merge the steps required for metabolomics data processing into a single platform.

Methods

KniMet is a workflow for the processing of mass spectrometry-based metabolomics data built on the KNIME Analytics Platform.

Results

The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.

Conclusion

KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.
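
The listed steps are generic enough to sketch outside KNIME as well; below is a minimal version of feature filtering, missing-value imputation and normalisation on a samples × features table (the 80% presence rule, half-minimum imputation and total-signal normalisation are common conventions used here as assumptions, not KniMet's exact settings; batch correction and annotation are omitted).

```python
import pandas as pd

def process_feature_table(df, min_presence=0.8):
    """Minimal metabolomics feature-table processing.

    df: samples as rows, features (e.g. m/z-RT pairs) as columns, NaN = missing.
    Steps: (1) drop features detected in fewer than `min_presence` of the
    samples, (2) impute remaining missing values with half the feature
    minimum, (3) normalise each sample to its total signal.
    """
    # 1. feature filtering by presence rate
    keep = df.notna().mean(axis=0) >= min_presence
    out = df.loc[:, keep]

    # 2. half-minimum imputation per feature
    out = out.fillna(out.min(axis=0) / 2.0)

    # 3. total-intensity (sum) normalisation per sample
    out = out.div(out.sum(axis=1), axis=0)
    return out
```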

13.

Introduction

Onion (Allium cepa) represents one of the most important horticultural crops and is used as food, spice and medicinal plant almost worldwide. Onion bulbs accumulate a broad range of primary and secondary metabolites which impact nutritional, sensory and technological properties.

Objectives

To complement existing analytical methods targeting individual compound classes, this work aimed at the development and validation of an analytical workflow for comprehensive metabolite profiling of onion bulbs.

Method

Metabolite profiling was performed by liquid chromatography coupled with electrospray ionization quadrupole time-of-flight mass spectrometry (LC/ESI-QTOFMS). For annotation of metabolites, accurate-mass tandem mass spectrometry experiments were carried out.

Results

On the basis of LC/ESI-QTOFMS and two chromatographic methods, an analytical workflow was developed that facilitates profiling of polar and semi-polar onion metabolites, including fructooligosaccharides, proteinogenic amino acids, peptides, S-substituted cysteine conjugates, flavonoids and saponins. To minimize enzymatic conversion of S-alk(en)ylcysteine sulfoxides, a sample preparation and extraction protocol for fresh onions was developed comprising cryohomogenization and a low-temperature quenching step. A total of 123 metabolites were annotated and characterized by chromatographic and tandem mass spectral data. For validation, recovery rates and matrix effects were determined for 15 model compounds. Repeatability and linearity were assessed for more than 80 endogenous metabolites.

Conclusion

As demonstrated by a comparative metabolic analysis of six onion cultivars, the established analytical workflow, in combination with targeted and non-targeted data analysis strategies, can be successfully applied for comprehensive metabolite profiling of onion bulbs.

14.
15.

Introduction

Post-collection handling, storage and transportation can affect the quality of blood samples. Pre-analytical biases can easily be introduced and can jeopardize accurate profiling of the plasma metabolome. Consequently, a mouse study must be carefully planned to avoid introducing any such bias that could compromise the outcome of the study. The storage and shipment of the samples should be arranged so that freeze–thaw cycles are kept to a minimum. To keep latent effects on the stability of the blood metabolome to a minimum, it is essential to study the effects that post-collection handling and pre-analytical errors have on the metabolome.

Objectives

The aim of this study was to investigate the effects of thawing on the metabolic profiles of different sample types.

Methods

In the present study, a metabolomics approach was used to obtain a thawing profile of plasma samples collected on three different days of the experiment. The plasma samples were collected from the tail on days 1 and 3, while retro-orbital sampling was used on day 5. The samples were analysed using gas chromatography time-of-flight mass spectrometry (GC-TOF-MS).

Results

The thawed plasma samples were characterized by higher levels of amino acids, fatty acids, glycerol metabolites and purine and pyrimidine metabolites as a result of protein degradation, cell degradation and increased phospholipase activity. The consensus profile was then compared to a previously published study of thawing profiles of tissue samples from the gut, kidney, liver, muscle and pancreas.

Conclusions

The comparison between thawed organ samples and thawed plasma samples indicates that the organ samples are more sensitive to thawing; however, thawing still affected all investigated sample types.

16.

Introduction

Metabolomic profiling combines nuclear magnetic resonance (NMR) spectroscopy with supervised statistical analysis, which may allow a better understanding of the mechanisms of a disease.

Objectives

In this study, urinary metabolic profiling of individuals with porphyrias was performed to predict different types of the disease and to propose new pathophysiological hypotheses.

Methods

Urine 1H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets (bins) corresponding to rules were extracted, and a logistic regression model was trained.

Results

The results generated by our rule-mining algorithm were consistent with those obtained using partial least squares discriminant analysis (PLS-DA), and the predictive performance of the model was significant. Buckets identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate and pyruvate, which were found at higher concentrations in the urine of aAIP patients compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients.

Conclusion

These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and support the relationship between heme synthesis and mitochondrial energy metabolism.

17.

Background

Do species use codons that reduce the impact of errors in translation or replication? The genetic code is arranged in a way that minimizes errors, defined as the sum of the differences in amino-acid properties caused by single-base changes from each codon to each other codon. However, the extent to which organisms optimize the genetic messages written in this code has been far less studied. We tested whether codon and amino-acid usages from 457 bacteria, 264 eukaryotes, and 33 archaea minimize errors compared to random usages, and whether changes in genome G+C content influence these error values.

Results

We tested the hypotheses that organisms choose their codon usage to minimize errors, and that the large observed variation in G+C content in coding sequences, but the low variation in G+U or G+A content, is due to differences in the effects of variation along these axes on the error value. Surprisingly, the biological distribution of error values has far lower variance than randomized error values, but error values of actual codon and amino-acid usages are actually greater than would be expected by chance.

Conclusion

These unexpected findings suggest that selection against translation error has not produced codon or amino-acid usages that minimize the effects of errors, and that even messages with very different nucleotide compositions somehow maintain a relatively constant error value. They raise the question: why do all known organisms use highly error-minimizing genetic codes, but fail to minimize the errors in the mRNA messages they encode?
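
A minimal sketch of how such an error value can be computed from a codon usage (the tiny code-table fragment, the hydropathy numbers and the absolute-difference weighting are placeholders, not the study's definitions): for every used codon, enumerate all single-base neighbours, sum the usage-weighted property differences, and compare the total against the same quantity for randomised usages.

```python
BASES = "UCAG"

def single_base_neighbors(codon):
    """All codons differing from `codon` at exactly one position."""
    for i, b in enumerate(codon):
        for alt in BASES:
            if alt != b:
                yield codon[:i] + alt + codon[i + 1:]

def usage_error_value(usage, genetic_code, prop):
    """Usage-weighted mutational error value.

    usage:        dict codon -> relative frequency (weights)
    genetic_code: dict codon -> amino acid (stop codons omitted)
    prop:         dict amino acid -> numeric property (e.g. hydropathy)

    For every used codon, sum |property change| over all single-base
    neighbours that still encode an amino acid, weighted by usage.
    """
    total = 0.0
    for codon, freq in usage.items():
        aa = genetic_code[codon]
        for nb in single_base_neighbors(codon):
            if nb in genetic_code:          # skip stops / codons not listed
                total += freq * abs(prop[aa] - prop[genetic_code[nb]])
    return total

# Toy example with placeholder values (NOT the study's data or property scale):
code = {"UUU": "F", "UUC": "F", "UUA": "L", "CUU": "L", "AUU": "I"}
hydropathy = {"F": 2.8, "L": 3.8, "I": 4.5}
usage = {"UUU": 0.6, "CUU": 0.4}
print(usage_error_value(usage, code, hydropathy))
```

Randomised usages for comparison would be generated by shuffling the frequencies over synonymous or all codons and recomputing the same quantity.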

18.

Background

An important feature of many genomic studies is quality control and normalization. This is particularly important when analyzing epigenetic data, where the process of obtaining measurements can be prone to bias. The GAW20 data were from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), a study with multigeneration families in which DNA cytosine-phosphate-guanine (CpG) methylation was measured pre- and posttreatment with fenofibrate. We performed quality control assessment of the GAW20 DNA methylation data, including normalization, assessment of batch effects and detection of sample swaps.

Results

We show that even after normalization, the GOLDN methylation data has systematic differences pre- and posttreatment. Through investigation of (a) CpG sites containing a single nucleotide polymorphism, (b) the stability of breeding values for methylation across time points, and (c) autosomal gender-associated CpGs, 13 sample swaps were detected, 11 of which were posttreatment.

Conclusions

This paper demonstrates several ways to perform quality control of methylation data in the absence of raw data files and highlights the importance of normalization and quality control of the GAW20 methylation data from the GOLDN study.
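
One generic way to hunt for sample swaps along the lines described — exploiting probes whose values should be stable within an individual over time, such as SNP-affected CpGs — is to match each post-treatment sample to its most correlated pre-treatment sample; the sketch below is such a correlation check under assumed inputs, not the GAW20 analysis itself.

```python
from statistics import correlation   # Pearson r; requires Python >= 3.10

def best_matches(pre, post):
    """Match each post-treatment sample to its most similar pre-treatment
    sample, using correlation over probes whose values should be stable
    within an individual (e.g. SNP-affected CpGs).

    pre, post: dict sample_id -> list of beta values (same probe order).
    Returns {post_id: (best_pre_id, r)}; a best match that is not the same
    individual flags a possible sample swap.
    """
    result = {}
    for pid, pvals in post.items():
        r, best = max((correlation(pvals, qvals), qid)
                      for qid, qvals in pre.items())
        result[pid] = (best, r)
    return result
```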

19.

Introduction

Comparative metabolic profiling of different human cancer cell lines can reveal metabolic pathways up-regulated or down-regulated in each cell line, potentially providing insight into distinct metabolism taking place in different types of cancer cells. It is noteworthy, however, that human cell lines available from public repositories are deposited with recommended media for optimal growth, and if cell lines to be compared are cultured on different growth media, this introduces a potentially serious confounding variable in metabolic profiling studies designed to identify intrinsic metabolic pathways active in each cell line.

Objectives

The goal of this study was to determine if the culture media used to grow human cell lines had a significant impact on the measured metabolic profiles.

Methods

NMR-based metabolic profiles of hydrophilic extracts of three human pancreatic cancer cell lines, AsPC-1, MiaPaCa-2 and Panc-1, were compared after culture on Dulbecco’s Modified Eagle Medium (DMEM) or Roswell Park Memorial Institute (RPMI-1640) medium.

Results

Comparisons of the same cell lines cultured on different media revealed that the concentrations of many metabolites depended strongly on the choice of culture media. Analyses of different cell lines grown on the same media revealed insight into their metabolic differences.

Conclusion

The choice of culture media can significantly impact metabolic profiles of human cell lines and should be considered an important variable when designing metabolic profiling studies. Also, the metabolic differences of cells cultured on media recommended for optimal growth in comparison to a second growth medium can reveal critical insight into metabolic pathways active in each cell line.

20.

Introduction

Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to broad coverage of metabolites, high sensitivity and simple sample preparation. However, data generated from multiple batches are affected by measurement errors inherent to alterations in signal intensity, drift in mass accuracy and retention times between samples both within and between batches. These measurement errors reduce repeatability and reproducibility and may thus decrease the power to detect biological responses and obscure interpretation.

Objective

Our aim was to develop procedures to address and correct for within- and between-batch variability in processing multiple-batch untargeted LC-MS metabolomics data to increase their quality.

Methods

Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, through aggregating feature presence/missingness on batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within batch. Furthermore, a heuristic criterion was developed for the feature-wise choice of reference-based or population-based between-batch normalisation.

Results

In authentic data, between-batch alignment resulted in picking 15 % more features and deconvoluting 15 % of features previously erroneously aligned. Within-batch correction provided a decrease in median quality control feature coefficient of variation from 20.5 to 15.1 %. Algorithms are open source and available as an R package (‘batchCorr’).

Conclusions

The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.
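
A much-simplified version of within-batch drift correction can be sketched as follows (a single linear trend fitted to QC injections, whereas batchCorr uses a cluster-based approach allowing multiple drift patterns within a batch; the input layout is an assumption for the example): fit the trend of one feature's intensity against injection order in the QC samples, then divide every sample by the fitted trend.

```python
def correct_drift(injection_order, intensities, qc_index):
    """Correct one feature's within-batch intensity drift using QC injections.

    injection_order: list of injection positions for all samples.
    intensities:     matching list of feature intensities.
    qc_index:        indices of the QC samples within those lists.

    A least-squares line is fitted to the QC intensities against injection
    order; every sample is then divided by the fitted trend, rescaled so the
    batch mean is preserved.
    """
    xs = [injection_order[i] for i in qc_index]
    ys = [intensities[i] for i in qc_index]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx if sxx else 0.0
    intercept = ybar - slope * xbar

    trend = [slope * x + intercept for x in injection_order]
    mean_trend = sum(trend) / len(trend)
    return [y * mean_trend / t if t > 0 else y
            for y, t in zip(intensities, trend)]
```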
