20 similar documents found (search time: 15 ms)
1.
Extracting biomedical information from large metabolomic datasets by multivariate data analysis is a complex task. Common challenges include, among others, screening for differentially produced metabolites, estimating fold changes, and classifying samples. Before these analysis steps, it is important to minimize contributions from unwanted biases and experimental variance; this is the goal of data preprocessing. In this work, data normalization methods were compared systematically using two datasets generated by nuclear magnetic resonance (NMR) spectroscopy. Two types of normalization methods were used: one aiming to remove unwanted sample-to-sample variation, the other adjusting the variance of the different metabolites through variable scaling and variance stabilization. The impact of each method on sample classification was evaluated on urinary NMR fingerprints obtained from healthy volunteers and from patients with autosomal dominant polycystic kidney disease (ADPKD). Performance in screening for differentially produced metabolites was investigated on a dataset following a Latin-square design, in which varied amounts of 8 different metabolites were spiked into a human urine matrix while the total spike-in amount was kept constant. In addition, specific tests were conducted to investigate systematically how the different preprocessing methods influence the structure of the analyzed data. In conclusion, preprocessing methods originally developed for DNA microarray analysis, in particular Quantile and Cubic-Spline Normalization, performed best in reducing bias, accurately detecting fold changes, and classifying samples.
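Quantile normalization, one of the two best performers in this comparison, forces every sample onto a common intensity distribution. The NumPy sketch below follows the usual microarray-style definition with features in rows and samples in columns; it is an illustration under that layout assumption, not the study's code, and it does not average tied ranks as some implementations do.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a features x samples matrix.

    Every column is replaced, rank by rank, with the mean of the sorted
    columns, so all samples end up with the same distribution. Tied values
    receive distinct reference values (no rank averaging).
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)                 # row indices that sort each column
    reference = np.sort(X, axis=0).mean(axis=1)   # mean of the k-th smallest values
    normalized = np.empty_like(X)
    for j in range(X.shape[1]):
        # The k-th smallest entry of column j receives reference[k].
        normalized[order[:, j], j] = reference
    return normalized


# Toy example: 4 features (e.g. spectral bins) measured in 3 samples.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
Q = quantile_normalize(X)
print(np.round(Q, 3))
```

For a samples x bins matrix, as is common for NMR data, the same function can be applied to `X.T` and the result transposed back.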
2.
The genetic homogeneity of the people of Sardinia makes the island an ideal place to study diseases with a genetic component, such as type 1 diabetes, whose incidence there is among the highest worldwide. The principal objective of this study was to use 1H high-resolution NMR spectroscopy and supervised methods of multivariate data analysis to highlight the importance of variation in low-concentration metabolites between healthy and diabetic Sardinian children. To this end, statistical analyses were performed after removal of the dominant signals of sugars and citrate (related to carbohydrate metabolism) and of hippurate (a metabolite of bacterial origin), whose presence overwhelmed the contribution of all other compounds to the classification. Variable influence in the statistical model showed that other metabolites derived from gut microbial metabolism (p-cresol sulphate and phenylacetylglycine) were heavily involved in the classification. This suggests the importance of changes in gut microbiota composition associated with type 1 diabetes in children.
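Removing dominant resonances before multivariate modeling amounts to deleting the corresponding chemical-shift variables from the data matrix. The NumPy sketch below does this for a binned spectral matrix; the exclusion windows in the usage lines are hypothetical placeholders, not the ranges used in the study.

```python
import numpy as np


def exclude_regions(ppm, spectra, regions):
    """Drop variables whose chemical shift lies in any excluded ppm window.

    ppm     : 1-D array with one chemical shift per variable (column)
    spectra : 2-D array, samples x variables
    regions : iterable of (low, high) windows, bounds inclusive
    """
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.asarray(spectra)
    keep = np.ones(ppm.shape, dtype=bool)
    for low, high in regions:
        keep &= ~((ppm >= low) & (ppm <= high))   # mask out this window
    return ppm[keep], spectra[:, keep]


# Hypothetical windows standing in for the signals to be removed.
ppm = np.array([1.0, 2.0, 2.6, 3.0, 7.6, 8.0])
spectra = np.arange(12).reshape(2, 6)
kept_ppm, kept_spectra = exclude_regions(ppm, spectra, [(2.5, 2.7), (7.5, 7.9)])
print(kept_ppm, kept_spectra)
```

After exclusion, spectra are often re-normalized (for example to constant total area) so that the removed signals no longer influence the scaling; the abstract does not say whether this was done.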
4.
In order to make sense of the sheer volume of metabolomic data that can be generated using current technology, robust data analysis tools are essential. We propose the use of the growing self-organizing map (GSOM) algorithm and demonstrate, on simulated and real-world time-course metabolomic datasets, that it permits a deeper analysis of metabolomics data than the widely used batch-learning self-organizing map, hierarchical cluster analysis and partitioning around medoids algorithms. We then applied GSOM to a recently published dataset representing metabolome response patterns of three wheat cultivars subjected to field-simulated cyclic drought stress. The novel, information-rich analysis provided by the proposed GSOM framework can easily be extended to other high-throughput metabolomics studies.
5.
MOTIVATION: The focus of this paper is on two new normalization methods for cDNA microarrays. After image analysis has been performed on a microarray, and before differentially expressed genes can be detected, some form of normalization must be applied. Normalization removes biases towards one or other of the fluorescent dyes used to label each mRNA sample, allowing proper evaluation of differential gene expression. RESULTS: The two normalization methods presented here build on previously described non-linear normalization techniques. We extend these techniques in two ways. First, we introduce a normalization method that deals with smooth spatial trends in intensity across microarrays, an important issue that must be addressed. Second, we deal with normalization of a new type of cDNA microarray experiment that is becoming prevalent: the small-scale specialty or 'boutique' array, in which large proportions of the genes on the microarrays are expected to be highly differentially expressed. AVAILABILITY: The normalization methods described in this paper are available on request from the authors via http://www.pi.csiro.au/gena/ in a software suite called tRMA: tools for R Microarray Analysis. Images and data used in this paper are also available via the same link.
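The idea of correcting smooth spatial trends can be illustrated by fitting a smooth surface to the log-ratios M = log2(R/G) as a function of spot position and subtracting it. The sketch below uses an ordinary least-squares quadratic surface in NumPy; this is an assumed simplification for illustration, not the method implemented in tRMA, whose details the abstract does not give. The simulated gradient is hypothetical.

```python
import numpy as np


def log_ratio(red, green):
    """M = log2(R / G) for each spot of a two-color array."""
    return np.log2(np.asarray(red, float)) - np.log2(np.asarray(green, float))


def remove_spatial_trend(M, row, col):
    """Subtract a smooth quadratic surface in spot coordinates from M.

    A least-squares quadratic in (row, col) is used here as a simple
    stand-in for a non-linear spatial smoother.
    """
    M = np.asarray(M, float)
    r = np.asarray(row, float)
    c = np.asarray(col, float)
    # Design matrix for a full quadratic surface: 1, r, c, r^2, c^2, r*c.
    A = np.column_stack([np.ones_like(r), r, c, r**2, c**2, r * c])
    coef, *_ = np.linalg.lstsq(A, M, rcond=None)
    return M - A @ coef


# Simulated 20 x 20 array with no true differential expression, but with a
# smooth spatial gradient in the red channel (a hypothetical artifact).
rows, cols = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
rows, cols = rows.ravel(), cols.ravel()
green = np.full(rows.size, 1000.0)
red = green * 2.0 ** (0.3 + 0.02 * rows - 0.01 * cols + 0.001 * rows * cols)
M_raw = log_ratio(red, green)
M_norm = remove_spatial_trend(M_raw, rows, cols)
print(M_raw.max(), np.abs(M_norm).max())
```

With real arrays the residual will not vanish, because genuine differential expression and noise remain after the spatial surface is removed.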
6.
Background Microarray technology allows the expression levels of thousands of genes to be monitored simultaneously. This technique helps us to understand gene regulation, as well as gene-by-gene interactions, more systematically. In microarray experiments, however, many undesirable systematic variations are observed, and some variation is commonly observed even in replicated experiments. Normalization is the process of removing such sources of variation that affect the measured gene expression levels. Although a number of normalization methods have been proposed, it has been difficult to decide which perform best. Normalization plays an important role early in microarray data analysis, and the results of subsequent analyses depend heavily on it. Results In this paper, we use the variability among replicated slides to compare the performance of normalization methods. We also compare normalization methods with regard to bias and mean square error using simulated data. Conclusions Our results show that intensity-dependent normalization often performs better than global normalization methods, and that linear and nonlinear normalization methods perform similarly. These conclusions are based on analysis of 36 cDNA microarrays of 3,840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells. Simulation studies confirm our findings.
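The contrast between global and intensity-dependent normalization, judged by bias and mean square error on simulated data, can be illustrated with a toy simulation in NumPy. The dye-bias curve, noise level and gene count below are invented for illustration, and the polynomial fit is a named stand-in for the loess curve often used; the numbers say nothing about the study's results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-color data: true log-ratios are 0 for every gene, plus a
# curved, intensity-dependent dye bias and measurement noise (hypothetical).
n_genes = 2000
A = rng.uniform(6.0, 14.0, n_genes)          # average log2 intensity
true_M = np.zeros(n_genes)
dye_bias = 0.05 * (A - 10.0) ** 2 - 0.3
M = true_M + dye_bias + rng.normal(0.0, 0.2, n_genes)


def global_normalize(M):
    """Global normalization: subtract one constant (the median log-ratio)."""
    return M - np.median(M)


def intensity_normalize(M, A, degree=2):
    """Intensity-dependent normalization: subtract a fitted trend of M on A.

    A low-degree polynomial stands in for a loess curve.
    """
    coef = np.polyfit(A, M, degree)
    return M - np.polyval(coef, A)


def bias_and_mse(estimate, truth):
    """Mean error (bias) and mean squared error against the known truth."""
    error = estimate - truth
    return error.mean(), np.mean(error ** 2)


bias_g, mse_g = bias_and_mse(global_normalize(M), true_M)
bias_i, mse_i = bias_and_mse(intensity_normalize(M, A), true_M)
print(f"global:              bias={bias_g:+.3f}  MSE={mse_g:.3f}")
print(f"intensity-dependent: bias={bias_i:+.3f}  MSE={mse_i:.3f}")
```

Because the simulated bias depends on intensity, a single global constant cannot remove it, which is the situation in which intensity-dependent methods are expected to help.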
7.
Background With the development of DNA hybridization microarray technologies, it is now possible to assess the expression levels of thousands to tens of thousands of genes simultaneously. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Because of technical biases, normalization of the intensity levels is a prerequisite for further statistical analyses. Choosing a suitable approach to normalization can therefore be critical and deserves judicious consideration.
8.
Hematopoietic stem cell transplantation is the oldest and most successful form of stem cell therapy. High-dose therapy (HDT) followed by hematopoietic stem cell transplantation allows physicians to administer increased amounts of chemotherapy and/or radiation while minimizing negative side effects such as damage to blood-producing bone marrow cells. Although HDT is successful in treating a wide range of cancers, it leads to lethal therapy-related myelodysplastic syndrome or acute myeloid leukemia (t-MDS/AML) in 5–10% of patients undergoing autologous hematopoietic cell transplantation for Hodgkin lymphoma and non-Hodgkin lymphoma. In this study, we carried out metabolomic analysis of peripheral blood stem cell samples collected from a cohort of patients before hematopoietic cell transplantation to gain insights into the molecular and cellular pathogenesis of t-MDS. Nonparametric tests and multivariate analyses were used to compare metabolite concentrations in samples from patients who developed t-MDS within 5 years of transplantation with those from patients who did not. The results suggest that the development of t-MDS is associated with dysfunctions in cellular metabolic pathways. The top canonical pathways suggested by the metabolomic analysis include alanine and aspartate metabolism, glyoxylate and dicarboxylate metabolism, phenylalanine metabolism, the citric acid cycle, and aminoacyl-tRNA biosynthesis. Dysfunctions in these pathways indicate mitochondrial dysfunction that would decrease the ability to detoxify reactive oxygen species generated by chemotherapy and radiation therapy, thereby leading to cancer-causing mutations. These observations suggest predisposing factors for the development of t-MDS.
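A typical nonparametric test for comparing one metabolite between two patient groups is the Mann-Whitney U (Wilcoxon rank-sum) test; the abstract does not name the specific test used. The standard-library sketch below computes U from pooled average ranks and a two-sided p-value from the normal approximation, assuming no tie correction and no continuity correction; the concentrations are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    No tie or continuity correction is applied, so p-values are only
    approximate, especially for small groups.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2     # U statistic for group x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return u1, p


# Hypothetical concentrations of one metabolite in the two patient groups.
t_mds = [1.0, 2.0, 3.0]
no_t_mds = [4.0, 5.0, 6.0]
u, p = mann_whitney_u(t_mds, no_t_mds)
print(u, round(p, 4))
```

When many metabolites are screened this way, the per-metabolite p-values are typically also corrected for multiple testing; the abstract does not say which correction, if any, was applied.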
9.
Background Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.
11.
Simple total tag count normalization is inadequate for microRNA sequencing data generated with next-generation sequencing technology. However, a systematic evaluation of normalization methods for microRNA sequencing data has so far been lacking. We comprehensively evaluate seven commonly used normalization methods: global normalization, Lowess normalization, the trimmed mean of M-values (TMM) method, quantile normalization, scaling normalization, variance stabilization, and the invariant method. We assess these methods on two individual experimental data sets using the empirical statistical metrics of mean square error (MSE) and the Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods against results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform best, whereas TMM, a method applied to RNA-sequencing normalization, performs worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Compared with the choice of model used for DE, the choice of normalization method is the primary factor affecting the DE results. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution.
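Two ingredients of this kind of evaluation are easy to sketch: a simple global (total-count) normalization and the two-sample Kolmogorov-Smirnov statistic used to compare distributions. The NumPy code below is illustrative only; the pseudocount and the toy libraries are assumptions, and the abstract does not give the study's exact metric definitions.

```python
import numpy as np


def log_cpm(counts, pseudocount=1.0):
    """Global (total-count) normalization to log2 counts per million."""
    counts = np.asarray(counts, dtype=float)
    return np.log2((counts + pseudocount) / counts.sum() * 1e6)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                 # ECDFs only jump at data points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Hypothetical miRNA counts: library 2 is library 1 sequenced twice as deeply.
lib1 = np.array([10, 20, 30, 40])
lib2 = 2 * lib1
d_raw = ks_statistic(np.log2(lib1), np.log2(lib2))
d_norm = ks_statistic(log_cpm(lib1, pseudocount=0), log_cpm(lib2, pseudocount=0))
print(d_raw, d_norm)
```

Here a pure sequencing-depth difference shifts the raw distributions apart (D = 0.5) and total-count normalization removes it entirely (D = 0); real libraries also differ in composition, which is where the compared methods diverge.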
12.
Background The quality of microarray data can seriously affect the accuracy of downstream analyses. To reduce variability and enhance signal reproducibility in these data, many normalization methods have been proposed and evaluated, most of them for data obtained from cDNA microarrays and Affymetrix GeneChips. CodeLink Bioarrays are a newly emerged single-color oligonucleotide microarray platform. To date, no studies have been reported that evaluate normalization methods for CodeLink Bioarrays.
13.
In metabolomics, time-resolved, dynamic or temporal data are increasingly being collected. The number of methods for analyzing such data, however, is very limited, and in most cases the dynamic nature of the data is not even taken into account. This paper reviews current methods for analyzing dynamic metabolomic data. Moreover, some methods from other fields of science that may be useful for analyzing such data are described in some detail. The methods are placed in a general framework after a formal definition of what constitutes a ‘dynamic’ method is given. Some of the methods are illustrated with real-life metabolomics examples.
14.
NMR can be used in food analysis for origin discrimination and biomarker discovery using a metabolomic approach. Here, we present an example of this strategy to discriminate honey samples of different botanical origins. The NMR spectra of 353 chloroform extracts of selected honey samples were analyzed to detect possible markers of their floral origin. Six monofloral Italian honey types (acacia, linden, orange, eucalyptus, chestnut, and honeydew) were analyzed together with polyfloral samples. Specific markers were identified for each monofloral origin: two for acacia (chrysin and pinocembrin), one for chestnut (??-LACT-3-PKA), two for orange (8-hydroxylinalool and caffeine), one for eucalyptus (dehydrovomifoliol), one for honeydew (a diacylglyceryl ether) and two for linden (4-(1-hydroxy-1-methylethyl)cyclohexa-1,3-diene-carboxylic acid and 4-(1-methylethenyl)cyclohexa-1,3-diene-carboxylic acid). An NMR-based metabolomic approach using O2PLS-DA multivariate data analysis allowed us to discriminate the different types of honey. Two different classifiers were built based on different multivariate techniques. The high precision of the classification suggests that this approach could be useful for developing generally applicable metabolomic tools to discriminate the origin of honey samples.
16.
Members of the heat shock protein-90 (Hsp90) family are key regulators of biological processes through dynamic interaction with a multitude of protein partners. However, the transient nature of these interactions hinders the identification of Hsp90 interactors. Here we show that chemical cross-linking with ethylene glycol bis(succinimidylsuccinate), but not with shorter cross-linkers, generated an abundant 240-kDa heteroconjugate of the molecular chaperone Hsp90 in different cell types. The combined use of pharmacological and genetic approaches allowed the characterization of the subunit composition and subcellular compartmentalization of this multimeric protein complex, termed p240. The in situ formation of p240 did not require the N-terminal domain or the ATPase activity of Hsp90. Using subcellular fractionation techniques and a cell-impermeant cross-linker, subpopulations of p240 were found in both the plasma membrane and the mitochondria. The Hsp90-interacting proteins, including Hsp70, p60Hop and the scaffolding protein filamin A, had no role in governing the formation of p240. Therefore, chemical cross-linking combined with proteomic methods has the potential to unravel the protein components of the p240 complex and, more importantly, may provide an approach to expand the range of tools available for studying the Hsp90 interactome.
17.
The high spectral congestion typically observed in one-dimensional (1D) 1H nuclear magnetic resonance (NMR) spectra of tissue extracts and biofluids limits the metabolic information that can be extracted. This study evaluates the application to metabolomics of two-dimensional J-resolved (JRES) spectroscopy, which can provide proton-decoupled projected 1D spectra (p-JRES). The approach is illustrated by an investigation of embryogenesis in Japanese medaka (Oryzias latipes), an established fish model for developmental toxicology. When combined with optimized spectral pre-processing (2), including a 0.005-ppm bin width for data segmentation and a logarithmic transformation, the reduced congestion in the p-JRES spectra increases the likelihood that a specific metabolite can be accurately integrated and thus increases the extractable information content of the spectra. Principal component analysis of the p-JRES spectra reveals the concept of a developmental trajectory that summarizes the changes in the NMR-visible metabolome throughout medaka embryogenesis. Advantages and potential disadvantages of the p-JRES approach are discussed.
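The segmentation and transformation steps mentioned here can be sketched in NumPy: intensities are summed into 0.005-ppm bins and then log-transformed. The abstract only says "logarithmic transformation"; the sketch substitutes a generalized log (glog), which stays finite for the zero or negative intensities a plain log cannot handle, and its parameter `lam` is an assumption. The toy spectrum is hypothetical.

```python
import numpy as np


def bin_spectrum(ppm, intensity, width=0.005):
    """Segment a 1-D spectrum into fixed-width ppm bins and integrate each bin.

    Bins start at the smallest chemical shift; returns (bin_centers, bin_sums).
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    lo = ppm.min()
    idx = np.floor((ppm - lo) / width).astype(int)   # bin index of each point
    sums = np.bincount(idx, weights=intensity)       # summed intensity per bin
    centers = lo + (np.arange(sums.size) + 0.5) * width
    return centers, sums


def glog(x, lam=1.0):
    """Generalized log, defined for zero and negative values.

    lam is an assumed tuning parameter in this sketch.
    """
    x = np.asarray(x, dtype=float)
    return np.log(x + np.sqrt(x ** 2 + lam))


ppm = np.array([0.000, 0.001, 0.004, 0.006, 0.012])
intensity = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
centers, sums = bin_spectrum(ppm, intensity)
transformed = glog(sums)
print(centers, sums, transformed)
```

Real pipelines usually also align spectra before binning, since a bin this narrow is sensitive to small peak shifts between samples.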
18.
Traditional medicine markets are supplied with medicinal plant material throughout the year; however, internal factors (e.g. plant age, genetic variability and differential gene expression) and external factors (e.g. water and nutrient availability, rainfall, photoperiod and herbivory) affect secondary metabolite production in plants. In this study, seasonal variability in metabolite production in Curtisia dentata trees from two geographically separated regions in South Africa was compared. NMR analysis of C. dentata stem bark samples yielded spectral data that were processed in MestReNova, and multivariate data analysis was then performed using Soft Independent Modeling of Class Analogy (SIMCA) software. This study shows not only seasonal, regional and yearly differences in secondary metabolite production in C. dentata trees, but also that production patterns of hydrophilic and lipophilic chemical compounds in individual trees vary. Sucrose, isoeugenol and betulinic acid were used in a targeted analysis to show the variation of individual compounds in individual trees in response to seasonal and geographical differences. Therefore, the season and year, as well as the region, harvesting site and specific trees from which plant material is collected, affect the concentrations of chemical compounds extracted from C. dentata stem bark for the preparation of remedies.
19.
Silica nanoparticles are increasingly used in biomedical fields owing to their excellent solubility, high stability and favorable biocompatibility. However, although they are considered to have low genotoxicity, their adverse biological effects have attracted particular concern from both the scientific community and the public. In this study, human cervical adenocarcinoma cells (HeLa line) were exposed to 0.01 or 1.0 mg/mL of hydrophilic silica nanoparticles. 1H NMR spectroscopy coupled with multivariate statistical analysis was used to characterize variations in intracellular metabolites and compositional changes in the corresponding culture media. At the early stage of silica nanoparticle exposure, no obvious dose effect on the HeLa cell metabolome was observed, implying that the cellular stress response regulated the metabolic variations of HeLa cells. Silica nanoparticles induced increases in lipids (including triglycerides, LDL and VLDL) and in the lactate/alanine ratio, and decreases in alanine, ATP, choline, creatine, glycine, glycerol, isoleucine, leucine, phenylalanine, tyrosine and valine, changes involved in membrane modification, carbohydrate and protein catabolism, and the stress response. Subsequently, a complex synergistic effect of the stress response and toxicological effects dominated the biochemical processes and metabolic response, as demonstrated by reversed changes in some metabolites, including acetate, ADP, ATP, choline, creatine, glutamine, glycine, lysine, methionine, phenylalanine and valine, between 6 and 48 h after treatment with silica nanoparticles. The toxicological effects induced by high-dose silica nanoparticles could be derived from the elevated levels of ATP and ADP, the utilization of glucose and amino acids, and the production of metabolic end products such as glutamate, glycine, lysine, methionine, phenylalanine and valine.
The results indicate that the physiological responses to silica nanoparticles should be investigated further in animal models and in humans before their practical use. NMR-based metabolomic analysis helps to understand the biological mechanisms of silica nanoparticles and their metabolic fate, and it offers an ideal platform for establishing the biosafety of existing and new nanomaterials.
20.
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. 
Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.