Similar Articles
1.

Background  

Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2.

2.
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open-source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms, in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine, as well as our in-house developed modules, on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows and, interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of the feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open-source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for the framework through its integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy), allowing scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.
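The abstract does not spell out how features from different workflows are matched or how their feature matrices are merged. The sketch below assumes a simple rule, purely for illustration: two features are the same if their m/z values agree within a ppm tolerance and their retention times within a fixed window, and the union keeps every feature of the first list plus the unmatched features of the second. The `Feature` type, the tolerances and the function names are hypothetical; this is not msCompare's scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical LC-MS feature: mass-to-charge ratio and retention time (minutes)."""
    mz: float
    rt: float


def features_match(a: Feature, b: Feature, ppm_tol: float = 10.0, rt_tol: float = 0.5) -> bool:
    """Treat two features as identical if m/z agrees within ppm_tol and RT within rt_tol."""
    ppm_error = abs(a.mz - b.mz) / a.mz * 1e6
    return ppm_error <= ppm_tol and abs(a.rt - b.rt) <= rt_tol


def union_features(list_a, list_b, ppm_tol: float = 10.0, rt_tol: float = 0.5):
    """Keep all of list_a, then add each feature of list_b that matches nothing in list_a."""
    merged = list(list_a)
    for fb in list_b:
        if not any(features_match(fa, fb, ppm_tol, rt_tol) for fa in list_a):
            merged.append(fb)
    return merged


workflow_a = [Feature(500.2500, 10.0), Feature(650.3000, 20.0)]
workflow_b = [Feature(500.2520, 10.1), Feature(720.4000, 30.0)]
print(len(union_features(workflow_a, workflow_b)))  # prints 3
```

The pairwise scan is quadratic in the number of features; for large feature lists, sorting by m/z and searching only a tolerance window would be the usual refinement.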

3.
We established a step-by-step, experiment-guided metabolomics procedure, based on LC-ESI-MS analysis, to generate a detailed picture of the changing metabolic profiles during late berry development in the important Italian grapevine cultivar Corvina. We sampled berries at four developmental time points and three post-harvest time points during the withering process, and used chromatograms of methanolic extracts to test the performance of the MetAlign and MZmine data mining programs. MZmine achieved better resolution and therefore generated a more useful data matrix. Both the quantitative performance of the analytical platform and the matrix effect were then assessed, and the final dataset was investigated by multivariate data analysis. Our analysis confirmed the results of previous studies but also revealed some novel findings, including the prevalence of two specific flavonoids in unripe berries and important differences between the developmental profiles of flavones and flavanones. This suggests that specific individual metabolites could have different functions, and that flavones and flavanones probably play quite distinct biological roles. Moreover, hypothesis-free multivariate analysis of subsets of the wide data matrix revealed relationships between the various classes of metabolites, such as those between anthocyanins and hydroxycinnamic acids and between flavan-3-ols and anthocyanins.

4.
The diagnosis of cancer by examination of the urine has the potential to improve patient outcomes through earlier detection. Because urine contains metabolic signatures of many biochemical pathways, this biofluid is ideally suited for metabolomic analysis, especially for diseases of the kidney and urinary system. In this pilot study, we test three independent analytical techniques for their suitability for detecting renal cell carcinoma (RCC) in the urine of affected patients. Hydrophilic interaction chromatography (HILIC-LC-MS), reversed-phase ultra performance liquid chromatography (RP-UPLC-MS), and gas chromatography time-of-flight mass spectrometry (GC-TOF-MS) were all used as complementary separation techniques. The combination of these techniques is best suited to covering a very large part of the urine metabolome, as it enables detection of both the lipophilic and the hydrophilic metabolites present. We also demonstrate that sample pretreatment with urease dramatically alters the metabolome composition beyond the removal of urea. Two new freely available peak alignment methods, MZmine and XCMS, are used for peak detection and retention time alignment. The results are analyzed by a feature selection algorithm followed by univariate analysis of variance (ANOVA) and a multivariate partial least squares (PLS) approach. From more than 2000 mass spectral features detected in the urine, we identify several significant components that discriminate between RCC patients and controls despite the relatively small sample size. The feature selection process condensed the significant features to fewer than 30 components in each data set. In future work, these potential biomarkers will be further validated in a larger patient cohort. Such investigation will likely lead to clinically applicable assays for earlier diagnosis of RCC, as well as other malignancies, and thereby to improved patient prognosis.
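The univariate ANOVA step ranks features by how strongly their intensities differ between groups. A minimal pure-Python sketch of the one-way ANOVA F statistic, used here to rank hypothetical features and keep the top ones, is shown below. The data layout and the `rank_features` helper are assumptions; the study's own feature selection algorithm is not described in the abstract.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic; groups is a list of lists of intensities for one feature."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # No within-group spread: any between-group difference is maximally separated.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, top_n):
    """table maps a hypothetical feature ID to its per-group intensity lists; return top_n IDs by F."""
    scores = {name: anova_f(groups) for name, groups in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


table = {
    "feature_1": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # e.g. controls vs. patients
    "feature_2": [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0]],
}
print(rank_features(table, top_n=1))  # ['feature_1']
```

With thousands of features, the resulting p-values would normally also be corrected for multiple testing before any feature is called significant.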

5.
6.
Metabolic footprinting has been applied as a non-invasive approach to study the behaviour and responses of cultured cells to a range of genetic and environmental perturbations. Gas chromatography interfaced with time-of-flight mass spectrometry (GC-ToF-MS) has become a powerful tool for the analysis of metabolome-derived samples. Two data analysis strategies are generally used to interrogate and understand the biological patterns within the multi-dimensional data. The first, and more common, strategy applies multivariate analysis after chromatographic and mass spectral deconvolution; the second applies multivariate analysis directly to non-deconvoluted data. Here, both strategies were assessed for the separation and classification of metabolic footprints (exometabolomes) of two strains of Candida albicans grown on three different carbon sources (glycerol, glucose and galactose). We describe a semi-automated approach that simultaneously processes all samples by applying principal components analysis (PCA) to the chromatographic dimension data, optionally after data pre-processing. The preprocessed, non-deconvoluted total ion chromatogram (TIC) data showed good separation of the classes defined by growth on different carbon sources; when the two strains grown on the same carbon source were compared, separation was achieved after preprocessing for strains grown on glucose and glycerol. The discrimination observed was greater for preprocessed, non-deconvoluted TIC data than for preprocessed, non-deconvoluted single ion chromatogram data. The results of the proposed approach were compared with those produced by MZmine. The MZmine results showed separation in PCA space according to carbon source, but no separation was seen for strains grown on the same carbon source.
Our research showed that the non-deconvoluted strategy is suitable for fast comparison of large sets of GC-MS data, although it does not directly provide biological information. The non-deconvoluted strategy also avoids the problems of analyzing complex samples with deconvolution software.
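A total ion chromatogram (TIC) is obtained by summing all ion intensities in each scan. The abstract does not specify the preprocessing applied before PCA, so the sketch below assumes total-area normalization of each TIC followed by mean-centering of each retention-time point across samples, a common preparation for PCA. The scan layout and helper names are hypothetical, and the PCA step itself is left to a numerical library.

```python
def total_ion_chromatogram(scans):
    """Sum every ion intensity in each scan; scans is a list of [(mz, intensity), ...] lists."""
    return [sum(intensity for _mz, intensity in scan) for scan in scans]


def normalize_total_area(tic):
    """Scale a TIC so its points sum to 1 (assumed preprocessing; removes overall loading differences)."""
    total = sum(tic)
    return [v / total for v in tic] if total else [0.0] * len(tic)


def mean_center(samples):
    """Subtract each column (retention-time point) mean across samples, as usually done before PCA."""
    n = len(samples)
    col_means = [sum(col) / n for col in zip(*samples)]
    return [[v - m for v, m in zip(row, col_means)] for row in samples]


scans = [[(100.0, 10.0), (101.0, 30.0)], [(100.0, 60.0)]]
tic = total_ion_chromatogram(scans)  # [40.0, 60.0]
print(normalize_total_area(tic))  # [0.4, 0.6]
```

Each sample's normalized TIC becomes one row of the matrix passed to `mean_center`, so all samples must share the same retention-time grid.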

7.
Despite advances in metabolic and postmetabolic labeling methods for quantitative proteomics, there remains a need for improved label-free approaches. This need is particularly pressing for workflows that incorporate affinity enrichment at the peptide level, where isobaric chemical labels such as isobaric tags for relative and absolute quantitation and tandem mass tags may prove problematic, or where stable isotope labeling with amino acids in cell culture cannot be readily applied. Skyline is a freely available, open-source software tool for quantitative data processing and proteomic analysis. We expanded the capabilities of Skyline to process ion intensity chromatograms of peptide analytes from full-scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments. Moreover, unlike existing programs, Skyline MS1 filtering can be used with mass spectrometers from four major vendors, which allows results to be compared directly across laboratories. The new quantitative and graphical tools now available in Skyline specifically support interrogation of multiple acquisitions for MS1 filtering, including visual inspection of peak picking and both automated and manual integration, key features often lacking in existing software. In addition, Skyline MS1 filtering displays retention time indicators from the underlying MS/MS data contained in the spectral library to ensure proper peak selection. The modular structure of Skyline also provides well-defined, customizable data reports and thus allows users to connect directly to existing statistical programs for post hoc data analysis. To demonstrate the utility of the MS1 filtering approach, we carried out experiments on several MS platforms and specifically examined the performance of this method in quantifying two important post-translational modifications, acetylation and phosphorylation, in peptide-centric affinity workflows of increasing complexity using mouse and human models.
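The general idea behind MS1 filtering can be illustrated by extracting a precursor ion chromatogram from full-scan spectra and integrating it. The Python sketch below sums intensities within a ppm window around a target m/z in each scan and integrates the resulting trace with the trapezoidal rule. The scan representation, the 10 ppm default and the function names are assumptions for illustration; this is not Skyline's implementation.

```python
def extract_ion_chromatogram(scans, target_mz, ppm_tol=10.0):
    """scans: list of (rt, [(mz, intensity), ...]); return [(rt, summed intensity in window)]."""
    half_width = target_mz * ppm_tol / 1e6
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= half_width)
        xic.append((rt, intensity))
    return xic


def integrate_trapezoid(xic, rt_start, rt_end):
    """Peak area between rt_start and rt_end using the trapezoidal rule."""
    points = [(rt, i) for rt, i in xic if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(points, points[1:]):
        area += (rt1 - rt0) * (i0 + i1) / 2
    return area


scans = [
    (10.0, [(500.0, 0.0)]),
    (10.5, [(500.001, 100.0), (600.0, 50.0)]),  # the 600.0 ion falls outside the window
    (11.0, [(499.999, 0.0)]),
]
xic = extract_ion_chromatogram(scans, target_mz=500.0)
print(integrate_trapezoid(xic, 10.0, 11.0))  # 50.0
```

In practice the integration boundaries would come from peak picking (or, as the abstract notes, from manual adjustment), and the window would be centered on the precursor's isotope peaks.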

8.
MOTIVATION: High-throughput 'ChIP-chip' and 'ChIP-seq' methodologies generate data sets large enough that their analysis poses significant informatics challenges, particularly for research groups with modest computational support. To address this challenge, we devised a software platform for storing, analyzing and visualizing high-resolution genome-wide binding data. GeneTrack automates several steps of a typical data processing pipeline, including smoothing and peak detection, and facilitates dissemination of the results via the web. Our software is freely available via the Google Project Hosting environment at http://genetrack.googlecode.com
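Smoothing and peak detection are only named in the abstract. As a generic illustration (not GeneTrack's code), the sketch below smooths per-position read counts with a normalized Gaussian kernel and then calls peaks greedily: it takes the highest remaining position and suppresses any candidate within an exclusion distance of a peak already called. The `sigma`, `min_height` and `exclusion` parameters are assumptions.

```python
import math


def gaussian_smooth(values, sigma=2.0):
    """Convolve with a Gaussian kernel truncated at 3 sigma, renormalizing weights at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        acc = weight = 0.0
        for offset, w in zip(offsets, kernel):
            j = i + offset
            if 0 <= j < len(values):
                acc += w * values[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def find_peaks(signal, min_height, exclusion=5):
    """Greedy peak calling with an exclusion zone around each accepted peak."""
    order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    taken = []
    for i in order:
        if signal[i] < min_height:
            break  # all remaining positions are lower
        if all(abs(i - t) > exclusion for t in taken):
            taken.append(i)
    return sorted(taken)


counts = [0.0] * 40
counts[10] = counts[30] = 20.0
print(find_peaks(gaussian_smooth(counts), min_height=1.0))  # [10, 30]
```

The exclusion zone keeps the shoulders of one smoothed peak from being reported as separate binding events.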

9.
MOTIVATION: Independent component analysis (ICA) is a signal processing technique that can be used to recover independent signals from a set of their linear mixtures. We propose ICA for the analysis of signals obtained from large proteomics investigations, such as clinical multi-subject studies based on MALDI-TOF MS profiling. The method is validated on simulated and experimental data to demonstrate its capability to correctly extract protein profiles from MALDI-TOF mass spectra. RESULTS: A comparison of peak detection against one open-source and two commercial methods shows its superior reliability in reducing the false discovery rate of protein peak masses. Moreover, integrating ICA with statistical tests that detect differences in peak intensities between experimental groups allows identification of protein peaks that could be indicators of a diseased state. This data-driven approach proves to be a promising tool for biomarker-discovery studies based on MALDI-TOF MS technology. AVAILABILITY: The MATLAB implementation of the method described in the article, together with the simulated and experimental data, is freely available at http://www.unich.it/proteomica/bioinf/.

10.

Background

Label-free quantitation of mass spectrometric data is one of the simplest and least expensive methods for differential expression profiling of proteins and metabolites. Accurate, high-performance computational methods for label-free quantitation are still in high demand in biomarker and drug discovery research. However, the most advanced recent LC-MS instruments generate huge amounts of analytical data at high scan speed, accuracy and resolution, which is often impossible to interpret manually. Moreover, current label-free methods still need improvement in several respects, such as reducing false positive/negative candidate peaks, expanding scalability, and enhancing and automating data processing. AB3D (A simple label-free quantitation algorithm for Biomarker Discovery in Diagnostics and Drug discovery using LC-MS) addresses these issues and can perform label-free quantitation using MS1 data in proteomics studies.

Results

We developed AB3D, a label-free peak detection and quantitation algorithm that uses MS1 spectral data. To test the algorithm, practical applications of AB3D were evaluated on three LC-MS datasets. AB3D was then compared with widely used software tools, namely MZmine 2, MSight, SuperHirn and OpenMS, on the same LC-MS datasets. All quantitative results were confirmed manually, and we found that AB3D properly identified and quantified known peptides with fewer false positives and false negatives than the four other existing software tools, using either a standard peptide mixture or real complex biological samples of Bartonella quintana (strain JK31). Moreover, AB3D showed the best reliability when the variability between two technical replicates of a complex peptide mixture of HeLa and BSA samples was compared. In terms of performance, AB3D is about 1.2 to 15 times faster than the four other existing software tools.

Conclusions

AB3D is a simple and fast algorithm for label-free quantitation of MS1 mass spectrometry data in large-scale LC-MS data analysis, with a higher true-positive rate and a reasonable false-positive rate. Furthermore, AB3D demonstrated the best reproducibility and is about 1.2 to 15 times faster than the four existing software tools.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0376-0) contains supplementary material, which is available to authorized users.
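The abstract reports that AB3D was judged by the variability between two technical replicates but does not name the metric. One common choice, assumed here purely for illustration, is the per-feature coefficient of variation (CV) across replicates, summarized by its median over the features quantified in both runs. The dictionary layout and function names are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (NaN if the mean is zero)."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


def median_replicate_cv(replicate_a, replicate_b):
    """Median CV over features quantified in both replicates (dicts: feature ID -> intensity)."""
    shared = replicate_a.keys() & replicate_b.keys()
    if not shared:
        raise ValueError("no features quantified in both replicates")
    cvs = [coefficient_of_variation([replicate_a[k], replicate_b[k]]) for k in shared]
    return median(cvs)


run_1 = {"pep_A": 90.0, "pep_B": 200.0, "pep_C": 50.0}
run_2 = {"pep_A": 110.0, "pep_B": 200.0, "pep_D": 10.0}
print(median_replicate_cv(run_1, run_2))  # median of CV(pep_A) and CV(pep_B) = 0.0
```

Restricting the comparison to shared features keeps missing values from inflating or deflating the summary; how many features drop out is itself worth reporting.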

11.
An integrated software system for analyzing ChIP-chip and ChIP-seq data
Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. Nature Biotechnology 2008, 26(11):1293-1300

12.
The combined method of LC-MS/MS is increasingly being used to explore differences in the proteomic composition of complex biological systems. The reliability and utility of such comparative protein expression profiling studies is critically dependent on an accurate and rigorous assessment of quantitative changes in the relative abundance of the myriad of proteins typically present in a biological sample such as blood or tissue. In this review, we provide an overview of key statistical and computational issues relevant to bottom-up shotgun global proteomic analysis, with an emphasis on methods that can be applied to improve the dependability of biological inferences drawn from large proteomic datasets. Focusing on a start-to-finish approach, we address the following topics: 1) low-level data processing steps, such as formation of a data matrix, filtering, and baseline subtraction to minimize noise, 2) mid-level processing steps, such as data normalization, alignment in time, peak detection, peak quantification, peak matching, and error models, to facilitate profile comparisons; and, 3) high-level processing steps such as sample classification and biomarker discovery, and related topics such as significance testing, multiple testing, and choice of feature space. We report on approaches that have recently been developed for these steps, discussing their merits and limitations, and propose areas deserving of further research.
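The review lists multiple testing among the high-level processing steps without prescribing a method. One widely used option, shown here as an assumption rather than the review's recommendation, is the Benjamini-Hochberg procedure, which converts per-feature p-values into adjusted values that control the false discovery rate. The function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```

Features whose adjusted value falls below the chosen FDR threshold (for example 0.05) would be retained as candidate differences.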

13.
The paper presents two analyses of the MALDI-TOF mass spectrometry dataset. Both analyses use a support vector machine to build a prediction model. The first analysis, which was our contribution to the competition, uses the spectral data as given, without further processing. In the second analysis, we added a preprocessing step consisting of peak detection, peak alignment and feature selection based on statistical tests. The experimental results suggest that preprocessing with feature selection improves prediction accuracy.

14.
Image registration has been used to support pixel-level data analysis of pedobarographic image data sets. Some registration methods have focused on robustness at the expense of speed, but a recent approach based on external contours offered both high computational speed and high accuracy. However, since contours can be influenced by local perturbations, we sought more global methods. We therefore propose two new registration methods based on the Fourier transform, one using cross-correlation and one using phase correlation, both of which offer high computational speed. Both proposed methods showed high accuracy for the similarity measures considered, using control geometric transformations. Both methods also showed high computational speed which, combined with their accuracy and robustness, allows their use in near-real-time applications. Furthermore, both methods were robust to moderate levels of noise and consequently, unlike the contour-based method, do not require a noise-removal step.
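Phase correlation estimates a translation from the normalized cross-power spectrum of two signals: if one signal is a shifted copy of the other, the inverse transform of that spectrum peaks at the shift. The sketch below shows the idea in one dimension with a naive DFT for readability; the study works with 2D pedobarographic images, where the same steps would apply with a 2D FFT. Function names and the test signal are illustrative assumptions, and only circular (wrap-around) shifts are recovered exactly.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(reference, moved):
    """Estimate the circular shift d such that moved[t] == reference[t - d]."""
    cross = []
    for f, g in zip(dft(reference), dft(moved)):
        r = g * f.conjugate()
        mag = abs(r)
        # Keep only phase information; skip frequencies with no energy.
        cross.append(r / mag if mag > 1e-12 else 0j)
    correlation = idft(cross)
    return max(range(len(correlation)), key=lambda t: correlation[t].real)


reference = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
moved = [reference[(t - 4) % 12] for t in range(12)]
print(phase_correlation_shift(reference, moved))  # 4
```

Because the magnitude is normalized away, the correlation peak stays sharp regardless of overall intensity, which is one reason phase correlation tolerates moderate noise and contrast changes.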

15.
NMR-based metabolomics requires robust automated methodologies, and the accuracy of NMR-based metabolomics data is greatly influenced by the reproducibility of data acquisition and processing methods. Effective water resonance signal suppression and reproducible spectral phasing and baseline traces across series of related samples are crucial for statistical analysis. We assess robustness, repeatability, sensitivity, selectivity, and practicality of commonly used solvent peak suppression methods in the NMR analysis of biofluids with respect to the automated processing of the NMR spectra and the impact of pulse sequence and data processing methods on the sensitivity of pattern recognition and statistical analysis of the metabolite profiles. We introduce two modifications to the excitation sculpting pulse sequence whereby the excitation solvent suppression pulse cascade is preceded by low-power water resonance presaturation pulses during the relaxation delay. Our analysis indicates that combining water presaturation with excitation sculpting water suppression delivers the most reproducible and information-rich NMR spectra of biofluids.

16.
MOTIVATION: Experimental techniques in proteomics have developed rapidly over the last few years, and the volume and complexity of the data have grown at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. To facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile, user-friendly applications that allows rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools that can easily be combined into analysis pipelines, even by non-experts, and used in proteomics workflows. These applications range from utilities (file format conversion, peak picking) through wrappers for established applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. We describe the basic concepts and current abilities of TOPP and illustrate them with two example applications: the identification of peptides from a raw dataset through database search, and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de

17.
Endophytic fungi associated with medicinal plants are a potential source of novel chemistry and biology that may find applications as pharmaceuticals and agrochemicals. In this study, a combination of metabolomics and bioactivity‐guided approaches was employed to isolate secondary metabolites with cytotoxicity against cancer cells from an endophytic Aspergillus aculeatus. The endophyte was isolated from the Egyptian medicinal plant Terminalia laxiflora and identified using molecular biological methods. Metabolomics and dereplication studies were carried out using the MZmine software coupled with the universal Dictionary of Natural Products database. Metabolic profiling, aided by multivariate data analysis, was performed at different stages of the growth curve to choose the optimal culture method for up‐scaling. The optimized culture method yielded a crude extract rich in biologically active secondary metabolites. Crude extracts were fractionated using different high‐throughput chromatographic techniques. Purified compounds were identified by HR‐ESI‐MS and 1D‐ and 2D‐NMR. This study introduced a new dereplication method using both high‐resolution mass spectrometry and NMR spectroscopy. The metabolites were putatively identified by applying a chemotaxonomic filter. We also present a short review of the diverse chemistry of terrestrial endophytic strains of Aspergillus, which has become part of our dereplication work and should be of wide interest to those working in this field.

18.
Nmrglue, an open source Python package for working with multidimensional NMR data, is described. When used in combination with other Python scientific libraries, nmrglue provides a highly flexible and robust environment for spectral processing, analysis and visualization and includes a number of common utilities such as linear prediction, peak picking and lineshape fitting. The package also enables existing NMR software programs to be readily tied together, currently facilitating the reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON, and Rowland NMR Toolkit file formats. In addition to standard applications, the versatility offered by nmrglue makes the package particularly suitable for tasks that include manipulating raw spectrometer data files, automated quantitative analysis of multidimensional NMR spectra with irregular lineshapes such as those frequently encountered in the context of biomacromolecular solid-state NMR, and rapid implementation and development of unconventional data processing methods such as covariance NMR and other non-Fourier approaches. Detailed documentation, install files and source code for nmrglue are freely available at http://nmrglue.com. The source code can be redistributed and modified under the New BSD license.

19.
Statistical methods for microarray assays
The paper briefly reviews statistical methods used in DNA microarray studies. All stages of the experiment are considered: planning, data collection, data preprocessing, analysis and validation. Among the data analysis methods, algorithms for estimating differential expression, multivariate approaches, clustering methods, and classification and discrimination are reviewed. The need for routine statistical data processing protocols is stressed, as is the need to search for links between microarray data analysis and quantitative genetic models.

20.
SUMMARY: Data processing, analysis and visualization (datPAV) is an exploratory tool that allows experimentalists to quickly assess the general characteristics of their data. This platform-independent software is designed as a generic tool to process and visualize data matrices. The tool explores the organization of the data, detects errors and supports basic statistical analyses. Processed data can be reused, so that different step-by-step data processing/analysis workflows can be created for detailed investigation. The visualization option produces publication-ready graphics. Applications of the tool are demonstrated on the web site for three cases: metabolomics, environmental and hydrodynamic data analysis. AVAILABILITY: datPAV is available free for academic use at http://www.sdwa.nus.edu.sg/datPAV/.
