Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD), or randomly, e.g. as a consequence of sample preparation.
Methods
We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.
Results
Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.
Conclusion
Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation be performed on observations with variable pre-selection, since it showed robust results in all evaluation schemes.
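The recommended approach can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. It assumes Pearson correlation for variable pre-selection, a z-scored root-mean-square distance over the pre-selected variables, and a plain mean over the k nearest samples. The function name `knn_impute_preselect` and the defaults for `k` and `n_vars` are hypothetical.

```python
import numpy as np


def knn_impute_preselect(X, k=3, n_vars=5):
    """Hypothetical sketch: KNN imputation on observations with variable pre-selection.

    For each variable j with missing values, the n_vars variables most strongly
    correlated with j (pairwise-complete Pearson r) are pre-selected. Distances
    between samples use only those variables (z-scored), and each missing value
    is replaced by the mean of variable j over the k nearest samples in which
    j was observed.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    # z-score each variable so no single intensity scale dominates the distance
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)
    p = X.shape[1]
    for j in range(p):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        # variable pre-selection by absolute correlation with variable j
        scores = []
        for m in range(p):
            ok = ~miss & ~np.isnan(X[:, m])
            if m == j or ok.sum() < 3:
                continue
            r = np.corrcoef(X[ok, j], X[ok, m])[0, 1]
            if np.isfinite(r):
                scores.append((abs(r), m))
        selected = [m for _, m in sorted(scores, reverse=True)[:n_vars]]
        donors = np.flatnonzero(~miss)
        for i in np.flatnonzero(miss):
            dists = []
            for d in donors:
                shared = [m for m in selected
                          if not (np.isnan(Z[i, m]) or np.isnan(Z[d, m]))]
                if shared:
                    diff = Z[i, shared] - Z[d, shared]
                    dists.append((float(np.sqrt(np.mean(diff ** 2))), d))
            neighbours = [d for _, d in sorted(dists)[:k]]
            # fall back to the observed mean if no neighbour could be found
            out[i, j] = X[neighbours, j].mean() if neighbours else np.nanmean(X[:, j])
    return out


# Toy example: column 1 is twice column 0, with one value missing in row 2.
X = np.array([[1, 2, 5], [2, 4, 3], [3, np.nan, 4],
              [4, 8, 1], [5, 10, 2], [6, 12, 6]], dtype=float)
print(knn_impute_preselect(X, k=2, n_vars=1)[2, 1])
```

Pre-selecting correlated variables restricts the distance to metabolites that carry information about the missing one, which is the intuition behind the "variable pre-selection" in the recommendation.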
Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to its broad metabolite coverage, high sensitivity and simple sample preparation. However, data generated from multiple batches are affected by measurement errors arising from alterations in signal intensity and from drift in mass accuracy and retention time, both within and between batches. These measurement errors reduce repeatability and reproducibility and may thus decrease the power to detect biological responses and obscure interpretation.
Objective
Our aim was to develop procedures to address and correct for within- and between-batch variability in processing multiple-batch untargeted LC-MS metabolomics data to increase their quality.
Methods
Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, through aggregating feature presence/missingness on batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within batch. Furthermore, a heuristic criterion was developed for the feature-wise choice of reference-based or population-based between-batch normalisation.
Results
In authentic data, between-batch alignment resulted in picking 15 % more features and deconvoluting 15 % of features previously erroneously aligned. Within-batch correction provided a decrease in median quality control feature coefficient of variation from 20.5 to 15.1 %. Algorithms are open source and available as an R package (‘batchCorr’).
Conclusions
The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.
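To make the quality metric in the Results concrete, the sketch below computes a feature's coefficient of variation (CV) across QC injections and applies a deliberately simplified drift correction. The correction is a swapped-in technique, not batchCorr's cluster-based method. It assumes a single linear drift per feature, fitted to QC intensities against injection order with `numpy.polyfit`. The function names are hypothetical.

```python
import numpy as np


def cv_percent(values):
    """Coefficient of variation (%) of one feature across QC injections."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


def linear_qc_drift_correct(intensity, injection_order, is_qc):
    """Hypothetical, simplified within-batch drift correction for one feature.

    Fits a straight line to QC intensities versus injection order (assumes a
    single linear drift pattern) and divides every injection by that trend,
    rescaled to the mean QC intensity.
    """
    intensity = np.asarray(intensity, dtype=float)
    order = np.asarray(injection_order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    slope, intercept = np.polyfit(order[is_qc], intensity[is_qc], deg=1)
    trend = slope * order + intercept
    return intensity * intensity[is_qc].mean() / trend


# Synthetic feature: intensity rises by 10 units per injection; every 4th is a QC.
order = np.arange(20)
is_qc = order % 4 == 0
raw = 1000.0 + 10.0 * order
corrected = linear_qc_drift_correct(raw, order, is_qc)
print(round(cv_percent(raw[is_qc]), 2), round(cv_percent(corrected[is_qc]), 2))
```

A single straight line cannot capture the multiple drift patterns within a batch that the cluster-based approach is designed to handle. The sketch only shows how a QC-based correction lowers the QC CV.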
FA esters of hydroxy FAs (FAHFAs) are lipokines with extensive structural and regional isomeric diversity that impact multiple physiological functions, including insulin sensitivity and glucose homeostasis. Because of their low molar abundance, FAHFAs are typically quantified using highly sensitive LC-MS/MS methods. Numerous relevant MS databases house in silico-spectra that allow identification and speciation of FAHFAs. These provisional chemical feature assignments provide a useful starting point but could lead to misidentification. To address this possibility, we analyzed human serum with a commonly applied high-resolution LC-MS untargeted metabolomics platform. We found that many chemical features are putatively assigned to the FAHFA lipid class based on exact mass and fragmentation patterns matching spectral databases. Careful validation using authentic standards revealed that many investigated signals provisionally assigned as FAHFAs are in fact FA dimers formed in the LC-MS pipeline. These isobaric FA dimers differ structurally only by the presence of an olefinic bond. Furthermore, stable isotope-labeled oleic acid spiked into human serum at subphysiological concentrations showed concentration-dependent formation of a diverse repertoire of FA dimers that analytically mimicked FAHFAs. Conversely, validated FAHFA species did not form spontaneously in the LC-MS pipeline. Together, these findings underscore that FAHFAs are endogenous lipid species. However, nonbiological FA dimers forming in the setting of high concentrations of FFAs can be misidentified as FAHFAs. Based on these results, we assembled a FA dimer database to identify nonbiological FA dimers in untargeted metabolomics datasets.
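A short mass calculation shows why exact mass alone cannot separate these species. The example assumes 9-OAHSA as the FAHFA, formed from oleic acid (C18H34O2) and hydroxystearic acid (C18H36O3) with loss of water. It also assumes the artifact has the combined composition of two oleic acids (2 × C18H34O2). Both give C36H68O4, so a deprotonated oleic acid dimer, [2M-H]-, lands at the same m/z as deprotonated OAHSA. Element masses are standard monoisotopic values, and `mono_mass` is a hypothetical helper that handles only simple formulas.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466621


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C18H34O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


oleic = mono_mass("C18H34O2")
hydroxystearic = mono_mass("C18H36O3")
water = mono_mass("H2O")

oahsa = oleic + hydroxystearic - water  # ester bond formation releases water
oleic_dimer = 2 * oleic                 # assumed composition of the artifact

oahsa_mz = oahsa - PROTON               # [M-H]- of the FAHFA
dimer_mz = oleic_dimer - PROTON         # [2M-H]- of oleic acid
print(f"{oahsa_mz:.4f} {dimer_mz:.4f}")
```

Because the two ions are indistinguishable by m/z, validation with authentic standards (and, here, labeled oleic acid) is what separates genuine FAHFAs from pipeline artifacts.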
1H nuclear magnetic resonance (1H NMR)-based metabolomics was used to elucidate the sub-lethal toxicity of the persistent environmental contaminant phenanthrene to earthworms. Earthworms were exposed to 0.05, 0.2 and 0.4 mg/cm2 of phenanthrene (corresponding to 1/32 to 1/4 of the 48-h LC50, the concentration that causes 50 % mortality) via contact tests over 1, 2 and 3 days of dermal contact. 1H NMR-based metabolomic analysis of the polar and non-polar fractions of the earthworm tissue extracts revealed stronger toxic responses in Eisenia fetida with both longer exposure times and higher phenanthrene concentrations. Principal component analysis (PCA) of the polar fraction showed significant separation between control and exposed earthworms along PC1 for all phenanthrene concentrations on each day. The PCA of the non-polar fraction showed significant separation between the controls and exposed earthworms only on the first day of exposure. These results suggested that alanine, glutamate, maltose, and fatty acids were potential indicators of phenanthrene exposure. Disruption of energy production due to inactivation of the Krebs cycle enzyme succinate dehydrogenase was also postulated in exposed earthworms. Cross-validated partial least squares regression models showed that the polar metabolic profile of E. fetida was weakly but significantly correlated with phenanthrene exposure concentrations after day 1 and day 2 of exposure. Overall, this study indicates that with longer exposures, contact time becomes more important than concentration in discriminating between control and exposed earthworms. This study also shows that NMR-based metabolomics has promise as a powerful ecotoxicological tool for elucidating the mode of toxicity of contaminants.
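The PCA separation described above can be illustrated with a minimal sketch. It uses synthetic data, not the study's spectra. It applies mean-centring only, without the Pareto or unit-variance scaling that many metabolomics workflows add, and it computes the decomposition with `numpy.linalg.svd`. The function name `pca` and the synthetic group shift are assumptions for illustration.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by singular value decomposition of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components], Vt[:n_components]


# Synthetic "spectra": 10 controls and 10 exposed animals, 20 bins each;
# exposure shifts the first five bins.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 0.5, size=(10, 20))
exposed = rng.normal(0.0, 0.5, size=(10, 20))
exposed[:, :5] += 3.0
scores, explained, loadings = pca(np.vstack([control, exposed]))
print(f"PC1 explains {explained[0]:.0%}; mean PC1 score control "
      f"{scores[:10, 0].mean():.2f} vs exposed {scores[10:, 0].mean():.2f}")
```

When the exposure effect dominates the variance, the two groups fall on opposite sides of zero along PC1, and the PC1 loadings point to the bins (metabolites) responsible.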
Metabolomics has advanced significantly in the past 10 years with important developments related to hardware, software and methodologies and an increasing complexity of applications. In discovery-based investigations, applying untargeted analytical methods, thousands of metabolites can be detected with no or limited prior knowledge of the metabolite composition of samples. In these cases, metabolite identification is required following data acquisition and processing. Currently, the process of metabolite identification in untargeted metabolomic studies is a significant bottleneck in deriving biological knowledge from metabolomic studies. In this review we highlight the different traditional and emerging tools and strategies applied to identify subsets of metabolites detected in untargeted metabolomic studies applying various mass spectrometry platforms. We indicate the workflows which are routinely applied and highlight the current limitations which need to be overcome to provide efficient, accurate and robust identification of metabolites in untargeted metabolomic studies. These workflows apply to the identification of metabolites, for which the structure can be assigned based on entries in databases, and for those which are not yet stored in databases and which require a de novo structure elucidation.
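The first step of most database-driven identification workflows, matching an observed m/z against database entries within a ppm tolerance, can be sketched as follows. The two-entry database, the [M+H]+ adduct and the 5 ppm default are illustrative assumptions. Real workflows consider many adducts and confirm candidates with MS/MS spectra and retention time. The function names are hypothetical.

```python
PROTON = 1.007276466621

# Hypothetical two-entry database of neutral monoisotopic masses (Da).
DATABASE = {"glucose": 180.063388, "caffeine": 194.080376}


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_protonated(mz, tol_ppm=5.0, database=DATABASE):
    """Return (name, ppm error) for entries whose [M+H]+ lies within tol_ppm."""
    hits = []
    for name, neutral_mass in database.items():
        error = ppm_error(mz, neutral_mass + PROTON)
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return hits


print(match_protonated(181.0707))
print(match_protonated(200.0))
```

A match at this stage is only a putative annotation. Features with no hit are the candidates for the de novo structure elucidation described above.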
In metabolomics, tissues typically are extracted by grinding in liquid nitrogen followed by the stepwise addition of solvents. This is time-consuming and difficult to automate, and the multiple steps can introduce variability. Here we optimize tissue extraction methods compatible with high-throughput, reproducible nuclear magnetic resonance (NMR) spectroscopy- and mass spectrometry (MS)-based metabolomics. Previously, we concluded that methanol/chloroform/water extraction is preferable for metabolomics, and we further optimized this here using fish liver and an automated Precellys 24 bead-based homogenizer, allowing rapid extraction of multiple samples without carryover. We compared three solvent addition strategies: stepwise, two-step, and all solvents simultaneously. Then we evaluated strategies for improved partitioning of metabolites between solvent phases, including the addition of extra water and different partition times. Polar extracts were analyzed by NMR and principal components analysis, and the two-step approach was preferable based on lipid partitioning, reproducibility, yield, and throughput. Longer partitioning or extra water increased yield and decreased lipids in the polar phase but caused metabolic decay in these extracts. Overall, we conclude that the two-step method with extra water provides good quality data but that the two-step method with 10 min partitioning provides a more accurate snapshot of the metabolome. Finally, when validating the two-step strategy using NMR and MS metabolomics, we showed that technical variability was considerably smaller than biological variability.
Collecting feces is easy, and feces provide direct access to endogenous and microbial metabolites.
Objectives
Given the lack of consensus on fecal sample preparation, especially in animal species, we developed a robust protocol for untargeted LC-HRMS fingerprinting.
Methods
The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.
Results
A rapid and simple protocol was developed, involving extraction of feces with methanol (1/3, m/v) followed by centrifugation and a filtration step (10 kDa cut-off).
Conclusion
The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
Metabolomics - Understanding the interaction between organisms and the environment is important for predicting and mitigating the effects of global phenomena such as climate change, and the fate,...
For pediatric diseases like childhood leukemia, a short latency period points to in-utero exposures as potentially important risk factors. Untargeted metabolomics of small molecules in archived newborn dried blood spots (DBS) offers an avenue for discovering early-life exposures that contribute to disease risks.
Objectives
The purpose of this study was to develop a quantitative method for untargeted analysis of archived newborn DBS for use in an epidemiological study (California Childhood Leukemia Study, CCLS).
Methods
Using experimental DBS from the blood of an adult volunteer, we optimized extraction of small molecules and integrated measurement of potassium as a proxy for blood hematocrit. We then applied this extraction method to 4.7-mm punches from 106 control DBS samples from the CCLS. Sample extracts were analyzed with liquid chromatography-high-resolution mass spectrometry (LC-HRMS), and an untargeted workflow was used to screen for metabolites that discriminate population characteristics such as sex, ethnicity, and birth weight.
Results
Thousands of small molecules were measured in extracts of archived DBS. Normalizing for potassium levels removed variability related to varying hematocrit across DBS punches. Of the roughly 1000 prevalent small molecules that were tested, multivariate linear regression detected significant associations with ethnicity (three metabolites) and birth weight (15 metabolites) after adjusting for multiple testing.
Conclusions
This untargeted workflow can be used for analysis of small molecules in archived DBS to discover novel biomarkers, to provide insights into the initiation and progression of diseases, and to provide guidance for disease prevention.
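The potassium normalization described in the Methods and Results can be sketched as a per-sample scaling. This sketch assumes that feature intensities scale proportionally with the potassium measured in the same punch. It divides each sample by its potassium level relative to the cohort median. The study's exact normalization model may differ, and `normalize_by_potassium` is a hypothetical name.

```python
import numpy as np


def normalize_by_potassium(intensities, potassium):
    """Hypothetical sketch: scale each DBS punch by potassium relative to the median.

    intensities: array of shape (samples, features)
    potassium:   one potassium measurement per sample (hematocrit proxy)
    Assumes intensities are proportional to potassium (blood/red-cell content).
    """
    X = np.asarray(intensities, dtype=float)
    k = np.asarray(potassium, dtype=float)
    factor = k / np.median(k)  # > 1 for punches with more potassium than the median
    return X / factor[:, np.newaxis]


raw = np.array([[10.0, 20.0], [20.0, 40.0], [15.0, 30.0]])
print(normalize_by_potassium(raw, [1.0, 2.0, 1.5]))
```

Dividing by a ratio to the median, rather than by raw potassium, keeps normalized intensities on roughly the original scale.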
Fruit color is thought to be an adaptation to different animal seed dispersers. The fruits of Kadsura coccinea show diverse colors, but the metabolic mechanism underlying these color differences still needs clarification. Here, we performed a targeted metabolome analysis based on ultra-performance liquid chromatography-tandem mass spectrometry of the fruits of two K. coccinea cultivars, ‘Dahong No. 1’ (red peel) and ‘Jinhu’ (yellow peel). A total of seventeen anthocyanins were identified in the fruit peels of ‘Dahong No. 1’, whereas no anthocyanins were detected in ‘Jinhu’. Our results suggest that the color differences between the two cultivars can be explained by variations in the abundance of cyanidin 3-O-rutinoside, cyanidin 3-O-glucoside and delphinidin 3-O-glucoside, which together account for 91.64% of total anthocyanin content. This study provides new insights into the metabolic causes of color variation in K. coccinea fruits and thus a theoretical basis for their future development and utilization.
Cheese intake has been shown to decrease total cholesterol and LDL cholesterol concentrations compared with butter of equal fat content. Untargeted metabolite profiling may reveal markers of cheese exposure and may also yield markers that help explain how cheese intake affects cholesterol concentrations. In a cross-over intervention study, twenty-three subjects collected 2 × 24 h urine samples after 6 weeks of cheese intake and after 6 weeks of butter intake with equal amounts of fat. The samples were analyzed by UPLC-QTOF/MS. A two-step univariate data analysis approach using linear mixed models was applied separately to the positive and negative ionization modes: in the first step, a total of 44 features related to treatment were identified, and in the second step, 36 of these features were related to total cholesterol concentrations. Compared with butter intake of equal fat content, cheese intake increased urinary indoxyl sulfate, xanthurenic acid, tyramine sulfate, 4-hydroxyphenylacetic acid, isovalerylglutamic acid and several acylglycines, including isovalerylglycine, tiglylglycine and isobutyrylglycine. The biological mechanisms linking these metabolites to cholesterol concentrations need to be explored further.
Severe acute malnutrition (SAM) is a major cause of child mortality worldwide; however, the pathogenesis of SAM remains poorly understood. Recent studies have uncovered an altered gut microbiota composition in children with SAM, suggesting a role for microbes in the pathogenesis of malnutrition.
Objectives
To elucidate the metabolic consequences of SAM and whether these changes are associated with changes in gut microbiota composition.
Methods
We applied an untargeted multi-platform metabolomics approach [gas chromatography–mass spectrometry (GC-MS) and liquid chromatography–mass spectrometry (LC-MS)] to stool and plasma samples from 47 Nigerian children with SAM and 11 control children. The composition of the stool microbiota was assessed by 16S rRNA gene sequencing.
Results
The plasma metabolome discriminated children with SAM from controls, whereas no significant differences were observed in the microbial or small-molecule composition of stool. The abundance of 585 plasma features, approximately 15% of the metabolome, was significantly altered in malnourished children (Wilcoxon test, FDR-corrected P < 0.1). Consistent with previous studies, children with SAM exhibited a marked reduction in amino acids/dipeptides and phospholipids, and an increase in acylcarnitines. We also identified numerous metabolic perturbations that have not been reported previously, including increased disaccharides, truncated fibrinopeptides, angiotensin I, dihydroxybutyrate, lactate, and heme, and decreased bioactive lipids belonging to the eicosanoid and docosanoid families.
Conclusion
Our findings provide a deeper understanding of the metabolic consequences of malnutrition. Further research is required to determine if specific metabolites may guide improved management, and/or act as novel biomarkers for assessing response to treatment.
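The FDR correction reported in the Results can be illustrated with the Benjamini-Hochberg (BH) procedure. The abstract does not state which FDR method was used, so BH is an assumption here. The sketch takes one p-value per feature (e.g. from a Wilcoxon test) and returns adjusted values to compare with the 0.1 threshold. The p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity (and a cap of 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005, 0.6]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.1 for q in qvals]
print([round(q, 3) for q in qvals], significant)
```

With thousands of plasma features, a correction of this kind keeps the expected share of false discoveries among the reported features near the chosen level.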
Sulfur mustard (SM) is a potent alkylating agent, and its effects on cells and tissues are varied and complex. Because noninvasive approaches for diagnosing sulfur mustard exposed individuals (SMEIs) are limited, there is a great need to develop novel techniques and biomarkers for this condition. We present here the first nuclear magnetic resonance (NMR) and gas chromatography-mass spectrometry (GC/MS) metabolic profiling of serum from SMEIs and healthy controls, aimed at identifying novel serum biomarkers for better diagnostics. Notably, the SMEIs had been exposed to SM 30 years earlier, yet differences between the two groups could still be found. The pathways showing differences between SMEIs and healthy controls are related to lipid metabolism, ketogenesis, the tricarboxylic acid (TCA) cycle and amino acid metabolism.