Similar articles
1.
Serum protein profiling by mass spectrometry is a promising method for early detection of cancer. We have implemented a combined strategy based on matrix-assisted laser desorption ionization mass spectrometry (MALDI MS) and statistical data analysis for serum protein profiling and applied it in a well-described breast cancer case-control study. A rigorous sample collection protocol ensured high-quality specimens and reduced bias from preanalytical factors. Preoperative serum samples obtained from 48 breast cancer patients and 28 controls were used to generate MALDI MS protein profiles, with nine mass spectrometric protein profiles acquired for each serum sample. A total of 533 common peaks were defined, representing a 'reference protein profile'. Among these 533 common peaks, we identified 72 peaks exhibiting statistically significant intensity differences (p < 0.01) between cases and controls. A diagnostic rule based on these 72 mass values was constructed and exhibited a cross-validated sensitivity and specificity of approximately 85% for the detection of breast cancer. With this method, it was possible to distinguish early-stage cancers from controls without major loss of sensitivity and specificity. We conclude that optimized serum sample handling and mass spectrometry data acquisition strategies, combined with statistical analysis, provide a viable platform for serum protein profiling in cancer diagnosis.
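
The per-peak screening step can be illustrated with a short sketch. It assumes intensities have already been extracted at each common peak, one row per sample, and it uses a Welch t statistic with a permutation p-value. The abstract does not name the test, so this choice, the permutation count, and the function names (`welch_t`, `permutation_p`, `significant_peaks`) are assumptions for illustration only.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def permutation_p(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Welch t statistic."""
    rng = random.Random(seed)
    observed = abs(welch_t(cases, controls))
    pooled = list(cases) + list(controls)
    n = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cases and controls
        if abs(welch_t(pooled[:n], pooled[n:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps p strictly positive


def significant_peaks(case_rows, control_rows, alpha=0.01):
    """Indices of peaks whose case/control intensities differ at level alpha.

    Each row is one sample: a list of intensities at the common peaks.
    """
    selected = []
    for j in range(len(case_rows[0])):
        a = [row[j] for row in case_rows]
        b = [row[j] for row in control_rows]
        if permutation_p(a, b) < alpha:
            selected.append(j)
    return selected
```

With hundreds of common peaks, some will pass p < 0.01 by chance alone, so any diagnostic rule built on the selected peaks has to be assessed by cross-validation, as the study does.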

2.
The random forest classification method was applied to classify samples from 76 breast cancer patients and 77 controls whose proteomic profiles had been obtained using mass spectrometry. The analysis consisted of two stages: detection of peaks from the profiles, and construction of a classification rule using random forests. Using a peak detection method based on finding common local maxima in the smoothed sample spectra, 444 peaks were detected; these were reduced to 365 robust peaks found in at least 7 out of 10 random subsets of samples. Subjects were classified as cases or controls using the random forest algorithm applied to the 365 peaks. Based on the predicted status of out-of-bag samples, the total error rate was 16.3%, with a sensitivity of 81.6% and a specificity of 85.7%. Importance measures were calculated for each peak to identify regions of the spectrum influencing the classification, and the four most important peaks were identified as mz3863_13, mz2943_12, mz3193_44 and mz8925_94. Combining initial peak detection with the random forest algorithm provides a high-performance classification system for proteomic data, with unbiased estimates of future performance.
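
The robustness filter (keeping peaks that recur in at least 7 of 10 random subsets of samples) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes all spectra share one m/z grid, detects peaks as local maxima of a moving-average-smoothed mean spectrum above a hypothetical intensity `floor`, and draws half of the samples per subset (also an assumption).

```python
import random


def smooth(y, half_width=2):
    """Centered moving average; the window shrinks at the edges."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def common_local_maxima(spectra, floor=0.0):
    """Grid indices that are local maxima of the smoothed mean spectrum.

    A moving average is linear, so smoothing the mean spectrum equals
    averaging the individually smoothed spectra.
    """
    n = len(spectra[0])
    mean = [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]
    m = smooth(mean)
    return {i for i in range(1, n - 1)
            if m[i] > m[i - 1] and m[i] >= m[i + 1] and m[i] > floor}


def robust_peaks(spectra, n_subsets=10, min_hits=7, frac=0.5, seed=0, floor=0.0):
    """Keep peaks detected in at least `min_hits` of `n_subsets` random subsets."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(spectra)))
    counts = {}
    for _ in range(n_subsets):
        for idx in common_local_maxima(rng.sample(spectra, k), floor):
            counts[idx] = counts.get(idx, 0) + 1
    return sorted(i for i, c in counts.items() if c >= min_hits)
```

Matching peaks by exact grid index is a simplification; on real spectra a small m/z tolerance would usually be needed when counting recurrences.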

3.
MOTIVATION: Application of mass spectrometry in proteomics is a breakthrough in high-throughput analyses. Early applications have focused on protein expression profiles to differentiate among various types of tissue samples (e.g. normal versus tumor). Here our goal is to use mass spectra to differentiate bacterial species using whole-organism samples. The raw spectra are similar to spectra of tissue samples, raising some of the same statistical issues (e.g. non-uniform baselines and higher noise associated with higher baseline), but are substantially noisier. As a result, new preprocessing procedures are required before these spectra can be used for statistical classification. RESULTS: In this study, we introduce novel preprocessing steps that can be used with any mass spectra. These comprise a standardization step and a denoising step. The noise level for each spectrum is determined using only data from that spectrum. Only spectral features that exceed a threshold defined by the noise level are subsequently used for classification. Using this approach, we trained the Random Forest program to classify 240 mass spectra into four bacterial types. The method resulted in zero prediction errors in the training samples and in two test datasets having 240 and 300 spectra, respectively.
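
A per-spectrum noise threshold of the kind described can be sketched as below. The abstract does not give the estimator, so this sketch assumes one common choice: the noise standard deviation is estimated from the spectrum's own first differences using the median absolute deviation (MAD), and a feature is kept when it exceeds the spectrum median by more than a hypothetical k = 3 noise units. The standardization step is not shown.

```python
import statistics


def noise_level(y):
    """Robust noise estimate that uses only this spectrum.

    First differences largely cancel slowly varying signal and baseline; for
    i.i.d. Gaussian noise with standard deviation s they have standard
    deviation s * sqrt(2), and MAD / 0.6745 estimates a Gaussian standard deviation.
    """
    d = [b - a for a, b in zip(y, y[1:])]
    med = statistics.median(d)
    mad = statistics.median([abs(x - med) for x in d])
    return mad / 0.6745 / 2 ** 0.5


def features_above_noise(y, k=3.0):
    """Indices whose intensity exceeds the spectrum median by more than k noise units."""
    sigma = noise_level(y)
    base = statistics.median(y)
    return [i for i, v in enumerate(y) if v - base > k * sigma]
```

Because the threshold is derived from each spectrum separately, a noisier spectrum automatically gets a higher cut-off.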

4.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other using extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments; it revealed similarities between mass spectra and was a useful tool for highlighting peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification by peptide mass fingerprinting. Complex mass spectra containing more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically larger than the number of peaks that matched calculated peak masses in databases, suggesting that modified peptides are probably among the shared peaks. Clustering increased the number of peaks associated with a given protein.
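
Matching spectra by their peak mass lists and grouping them can be sketched as follows. The similarity measure (a Dice coefficient on peaks matched within a hypothetical 0.2 Da tolerance), the single-linkage rule, and the cut-off are illustrative assumptions; the abstract does not specify them. Cutting a single-linkage dendrogram at a distance threshold is equivalent to taking connected components, which is what the union-find loop computes.

```python
def shared_peaks(a, b, tol=0.2):
    """Greedily count one-to-one matches between two peak mass lists within tol."""
    a, b = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def dice_distance(a, b, tol=0.2):
    """1 - Dice similarity: 0 for identical lists, 1 for no shared peaks."""
    return 1.0 - 2.0 * shared_peaks(a, b, tol) / (len(a) + len(b))


def single_linkage_clusters(peak_lists, cutoff, tol=0.2):
    """Cut a single-linkage dendrogram at `cutoff` via union-find."""
    parent = list(range(len(peak_lists)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(peak_lists)):
        for j in range(i + 1, len(peak_lists)):
            if dice_distance(peak_lists[i], peak_lists[j], tol) <= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(peak_lists)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```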

5.
Surface-enhanced laser desorption/ionization (SELDI) time-of-flight (TOF) is a mass spectrometry technology for measuring the composition of a sampled protein mixture. A mass spectrum contains peaks corresponding to proteins in the sample, and the peak areas are proportional to the measured concentrations of the corresponding proteins. Quantifying peak areas is difficult for existing methods because peak shapes are not constant across a spectrum and because peaks often overlap. We present a new method for quantifying peak areas that decomposes a spectrum into peaks and a baseline using statistical finite mixture models. We illustrate the method in detail on 8 samples of adipose tissue culture media, and globally on 64 serum samples, comparing it with the standard Ciphergen method. Both methods give similar estimates for singleton peaks, but not for overlapping peaks: the Ciphergen method overestimates the heights of such peaks, whereas our method still gives appropriate estimates. Peak quantification is an important step in preprocessing SELDI-TOF data, and improvements therein will pay off in the later biomarker discovery phase.
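
The idea of decomposing overlapping peaks with a finite mixture model can be sketched with an EM fit of a Gaussian mixture, treating each bin's intensity as a weight. This is an illustration under stated assumptions, not the authors' model: the peaks are assumed Gaussian, the baseline is assumed to have been removed beforehand (their model estimates it jointly), and the initial guesses must be supplied. Each fitted peak's area is its mixture weight times the integrated intensity, so two overlapping peaks share the signal between them instead of each being credited with the full height at its apex.

```python
import math


def fit_gaussian_mixture(x, y, means, sds, n_iter=200):
    """EM fit of a Gaussian mixture to a baseline-corrected spectrum.

    x: evenly spaced m/z grid; y: nonnegative intensities, used as weights.
    means, sds: initial guesses, one per peak.
    Returns (weights, means, sds, areas); areas are in intensity * m/z units.
    """
    k = len(means)
    means, sds = list(means), list(sds)
    weights = [1.0 / k] * k
    total = sum(y)
    dx = x[1] - x[0]
    for _ in range(n_iter):
        # E-step: share of each bin's intensity attributed to each peak
        # (the common 1/sqrt(2*pi) factor cancels and is omitted)
        resp = []
        for xi in x:
            dens = [w * math.exp(-0.5 * ((xi - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sds)]
            tot = sum(dens) or 1e-300
            resp.append([d / tot for d in dens])
        # M-step: intensity-weighted updates of weight, centre and width
        for c in range(k):
            rc = [r[c] * yi for r, yi in zip(resp, y)]
            nc = sum(rc)
            weights[c] = nc / total
            means[c] = sum(r * xi for r, xi in zip(rc, x)) / nc
            var = sum(r * (xi - means[c]) ** 2 for r, xi in zip(rc, x)) / nc
            sds[c] = max(math.sqrt(var), 1e-6)
    areas = [w * total * dx for w in weights]
    return weights, means, sds, areas
```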

6.
Background
Mass spectrometry protein profiling is a promising tool for biomarker discovery in clinical proteomics. However, a reliable approach for separating protein signals from noise is required. In this paper, we propose LIMPIC, a computational method for detecting protein peaks in linear-mode MALDI-TOF data. LIMPIC is based on novel techniques for background noise reduction and baseline removal. Peak detection takes into account the non-homogeneous noise level across the mass spectrum. Peaks collected from multiple spectra are then compared and classified according to a detection rate parameter, which separates the protein signals from other disturbances.
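
The detection-rate idea can be sketched by pooling peaks from several spectra, grouping them by m/z, and recording the fraction of spectra that contribute to each group. The relative tolerance (0.3% here), the grouping rule (a new group starts at a gap larger than the tolerance), and the 0.5 cut-off are hypothetical values, not LIMPIC's parameters.

```python
def group_peaks(peak_lists, rel_tol=0.003):
    """Pool (m/z, spectrum) pairs and split them into groups at large gaps."""
    pts = sorted((mz, sid) for sid, peaks in enumerate(peak_lists) for mz in peaks)
    groups = []
    for mz, sid in pts:
        if groups and mz - groups[-1][-1][0] <= rel_tol * mz:
            groups[-1].append((mz, sid))
        else:
            groups.append([(mz, sid)])
    return groups


def detection_rates(peak_lists, rel_tol=0.003):
    """(mean m/z, fraction of spectra contributing) for each peak group."""
    n = len(peak_lists)
    rates = []
    for g in group_peaks(peak_lists, rel_tol):
        contributing = {sid for _, sid in g}
        rates.append((sum(mz for mz, _ in g) / len(g), len(contributing) / n))
    return rates


def keep_frequent(peak_lists, min_rate=0.5, rel_tol=0.003):
    """m/z of groups detected in at least `min_rate` of the spectra."""
    return [mz for mz, rate in detection_rates(peak_lists, rel_tol) if rate >= min_rate]
```

Gap-based grouping can chain neighbouring peaks together in crowded regions, so a real implementation would need a more careful grouping rule.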

7.
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) was used to determine elemental and biomolecular ions from isolated protein samples. We identified a set of 23 mass-to-charge ratio (m/z) peaks that serve as signatures for distinguishing biological samples. The 23 peaks were identified using singular value decomposition (SVD) and canonical analysis (CA) to find the underlying structure in the complex mass spectral data sets. From the transformed data, SVD was used to identify sets of m/z peaks, and we used these patterns from the TOF-SIMS data to predict the biological source from which individual mass spectra were generated. The signatures were validated on an additional data set, distinct from the training set used to identify them. We present a simple method, which avoids overfitting, for identifying the multiple variables required to classify samples from mass spectra. This is important in a variety of mass spectrometry studies, including the identification of proteins in complex mixtures and of new biomarkers.
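
How SVD can surface a set of co-varying m/z peaks is shown in this sketch: the leading right singular vector of the column-centered samples-by-peaks matrix is obtained by power iteration on XᵀX, and the peaks with the largest absolute loadings form a candidate signature. The pure-Python power iteration, the iteration count, and the helper names are assumptions; the canonical analysis step used in the study is not shown.

```python
import random


def leading_right_singular_vector(X, n_iter=500, seed=0):
    """Top right singular vector of the column-centered matrix X (power iteration)."""
    n_rows, n_cols = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    Xc = [[row[j] - col_means[j] for j in range(n_cols)] for row in X]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_cols)]
    for _ in range(n_iter):
        u = [sum(r[j] * v[j] for j in range(n_cols)) for r in Xc]  # u = Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # Xc^T u
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def top_loading_peaks(X, labels, k=2):
    """Labels of the k peaks with the largest absolute loadings."""
    v = leading_right_singular_vector(X)
    order = sorted(range(len(v)), key=lambda j: -abs(v[j]))
    return [labels[j] for j in order[:k]]
```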

8.
Mass spectrometry data are often corrupted by noise, which makes it very difficult to detect low-abundance peaks while also limiting false-positive detections caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks generally do not recur consistently across spectra: when all identified peaks (including false positives) from multiple spectra are aligned, the false-positive peaks are less consistent than the true peaks. We therefore use information from other spectra to reduce false-positive peaks. The new method improves peak detection over traditional single-spectrum methods, and the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/~eeyu/mspeak.htm.

9.
MOTIVATION: Independent component analysis (ICA) is a signal processing technique that can recover independent signals from a set of their linear mixtures. We propose ICA for the analysis of signals obtained from large proteomics investigations, such as clinical multi-subject studies based on MALDI-TOF MS profiling. The method is validated on simulated and experimental data, demonstrating its ability to correctly extract protein profiles from MALDI-TOF mass spectra. RESULTS: A comparison of peak detection against one open-source and two commercial methods shows its superior reliability in reducing the false discovery rate of protein peak masses. Moreover, integrating ICA with statistical tests for differences in peak intensities between experimental groups makes it possible to identify protein peaks that could be indicators of a diseased state. This data-driven approach proves to be a promising tool for biomarker discovery studies based on MALDI-TOF MS technology. AVAILABILITY: The MATLAB implementation of the method described in the article, together with the simulated and experimental data, is freely available at http://www.unich.it/proteomica/bioinf/.

10.
Summary. In this article, we apply the recently developed Bayesian wavelet-based functional mixed model methodology to analyze MALDI-TOF mass spectrometry proteomic data. By modeling mass spectra as functions, this approach avoids reliance on peak detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of peaks in the spectra. For example, this provides a straightforward way to account for systematic block and batch effects that characterize these data. From the model output, we identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level. We apply this method to two cancer studies.

11.
Kwon D, Vannucci M, Song JJ, Jeong J, Pfeiffer RM. Proteomics 2008;8(15):3019-3029
In recent years there has been increasing interest in using protein mass spectrometry to discriminate diseased from healthy individuals, with the aim of discovering molecular markers for disease. A crucial step before any statistical analysis is the preprocessing of the mass spectrometry data, and statistical results are typically strongly affected by the specific preprocessing techniques used. One important preprocessing step is the removal of chemical and instrumental noise from the mass spectra. Wavelet denoising is a standard approach, but existing techniques assume a homogeneous error structure and do not accommodate errors that vary across the mass spectrum. In this paper we propose a novel wavelet denoising approach that handles heterogeneous errors by incorporating a variance change point detection method into the thresholding procedure. We study our method on real and simulated mass spectrometry data and show that it improves the performance of peak detection methods.
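
Wavelet denoising with a noise level that changes along the spectrum can be sketched as follows: a single-level orthonormal Haar transform, a MAD-based noise estimate computed separately in each segment of detail coefficients, and soft thresholding with the universal threshold σ√(2 ln n). The segment boundaries are passed in by hand here; the paper locates variance change points automatically, which this sketch does not implement. Single-level Haar and the universal threshold are assumed choices.

```python
import math
import statistics


def haar_forward(y):
    """One level of the orthonormal Haar transform (len(y) must be even)."""
    r = math.sqrt(2.0)
    approx = [(y[i] + y[i + 1]) / r for i in range(0, len(y), 2)]
    detail = [(y[i] - y[i + 1]) / r for i in range(0, len(y), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / r, (a - d) / r])
    return out


def soft(x, t):
    """Soft thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise_segments(y, boundaries):
    """Haar denoising with a separate noise estimate in each segment.

    boundaries: sorted, strictly increasing indices into the len(y)//2 detail
    coefficients where the noise variance is assumed to change (each segment
    must be non-empty).
    """
    approx, detail = haar_forward(y)
    edges = [0] + list(boundaries) + [len(detail)]
    new_detail = []
    for lo, hi in zip(edges, edges[1:]):
        seg = detail[lo:hi]
        sigma = statistics.median([abs(d) for d in seg]) / 0.6745  # MAD noise estimate
        t = sigma * math.sqrt(2.0 * math.log(len(y)))  # universal threshold
        new_detail.extend(soft(d, t) for d in seg)
    return haar_inverse(approx, new_detail)
```

With a single global threshold, the quiet segment would be over-smoothed or the noisy one under-smoothed; estimating σ per segment avoids that trade-off.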

12.
Wagner M, Naik D, Pothen A. Proteomics 2003;3(9):1692-1698
We report our results in classifying protein matrix-assisted laser desorption/ionization time-of-flight mass spectra obtained from serum samples into diseased and healthy groups. We discuss in detail five of the steps in preprocessing the mass spectral data for biomarker discovery, as well as our criterion for choosing a small set of peaks for classifying the samples. Cross-validation studies with four selected proteins yielded misclassification rates in the 10-15% range for all the classification methods. Three of these proteins or protein fragments are down-regulated and one is up-regulated in lung cancer, the disease under consideration in this data set. When cross-validation studies are performed, care must be taken to ensure that the test set does not influence the choice of the peaks used in the classification: misclassification rates are expected to be lower (optimistically biased) when both the training and test sets are used to select the peaks than when only the training set is used. This expectation was confirmed for various statistical discrimination methods when thirteen peaks were used in cross-validation studies. One particular classification method, a linear support vector machine, exhibited especially robust performance when the number of peaks was varied from four to thirteen and the peaks were selected from the training set alone. Experiments with the samples randomly assigned to the two classes confirmed that misclassification rates were significantly higher in such cases than with the true data, indicating that our findings are indeed significant. For three of the four proteins we used to classify lung cancer, we found closely matching masses in a database of protein expression in lung cancer. Data from additional samples, increased experience with the performance of various preprocessing techniques, and confirmation of the biological roles of the proteins that help in classification will strengthen our conclusions in the future.
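
The safeguard the authors describe, choosing peaks from the training fold only, can be sketched as a cross-validation loop in which peak ranking is repeated inside every fold. The ranking score (difference in class means over pooled spread), the nearest-centroid classifier, and the fold assignment by index are illustrative assumptions; the study itself compared several discrimination methods, including a linear SVM.

```python
import statistics


def rank_peaks(X, y, n_keep):
    """Rank peaks by a t-like score computed on the given (training) samples only."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = (statistics.pvariance(a) + statistics.pvariance(b)) ** 0.5 or 1e-12
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n_keep]


def nearest_centroid_predict(X_train, y_train, X_test, peaks):
    """Assign each test sample to the class whose centroid (on `peaks`) is closest."""
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        cents[lab] = [statistics.mean(r[j] for r in rows) for j in peaks]
    preds = []
    for r in X_test:
        d = {lab: sum((r[j] - c) ** 2 for j, c in zip(peaks, cents[lab])) for lab in cents}
        preds.append(min(d, key=d.get))
    return preds


def cv_error(X, y, n_keep=4, k=5):
    """k-fold CV error with peak selection repeated inside every fold."""
    errors = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        peaks = rank_peaks(Xtr, ytr, n_keep)  # the test fold never touches this step
        preds = nearest_centroid_predict(Xtr, ytr, [X[i] for i in test_idx], peaks)
        errors += sum(p != y[i] for p, i in zip(preds, test_idx))
    return errors / len(X)
```

Moving `rank_peaks` outside the loop and running it on all samples would reproduce the optimistic bias the abstract warns about.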

13.
MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the Human Genome Project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer. Although various statistical methods have been used to identify biomarkers from MS data, there has been no systematic comparison of their relative ability to analyze MS data. RESULTS: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include linear discriminant analysis, quadratic discriminant analysis, the k-nearest neighbor classifier, bagging and boosting classification trees, the support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms the other methods in the analysis of MS data.

14.
Plasma biomarkers of exposure to environmental contaminants play an important role in the early detection of disease. The emerging field of proteomics presents an attractive opportunity for candidate biomarker discovery, as it simultaneously measures and analyzes a large number of proteins. This article presents a case study of arsenic exposure in a population residing in an As-endemic region of Bangladesh, using plasma protein expression measured by SELDI-TOF mass spectrometry. We analyze the data using a unified statistical method based on functional learning to preprocess mass spectra, extract mass spectrometry (MS) features, and associate the selected MS features with arsenic exposure measurements. The task is challenging due to several factors: the high dimensionality of mass spectrometry data, complicated error structures, and a multiple comparison problem. We use nonparametric functional regression techniques for MS modeling, peak detection based on the significant zero-downcrossing method, and peak alignment using a warping algorithm. Our results show significant associations of arsenic exposure with under- or overexpression of 20 proteins.
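
Peak alignment by warping can be illustrated with classic dynamic time warping (DTW) between a reference spectrum and a query spectrum. The abstract does not say which warping algorithm was used, so DTW with an absolute-difference cost is a stand-in, and `map_peak` is a hypothetical helper that reads off which reference position a query peak is aligned to.

```python
def dtw_path(ref, query):
    """Dynamic time warping with an absolute-difference cost.

    Returns (total cost, alignment path as (ref_index, query_index) pairs).
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # backtrack from the end along the cheapest predecessor
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return cost[n][m], path[::-1]


def map_peak(path, query_index):
    """Reference positions aligned to a given query position."""
    return [i for i, j in path if j == query_index]
```

The cost table grows with the product of the two spectrum lengths, so full-resolution spectra are usually aligned on detected peaks or a coarsened grid rather than raw points.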

15.
Biomarker discovery by mass spectrometry may support timely diagnosis and help characterize new diseases. Our previous work produced a preprocessing method called HHTMass that is capable of removing noise; however, HHTMass was only a proof of principle for peak detection, and it was tested neither for peak reappearance rate nor on medical data. We developed a modified biomarker discovery method, Enhance HHTMass (E-HHTMass), for MALDI-TOF and SELDI-TOF mass spectrometry data, which improves on the old HHTMass method through removal of the interpolation step and through its biomarker discovery process. E-HHTMass integrates the preprocessing and classification functions to identify significant peaks. The results show that most known biomarkers can be found and that a high peak appearance rate is achieved compared with MSCAP and the old HHTMass2. E-HHTMass is also able to adapt to spectra with a small increasing interval. In addition, new peaks are detected that may be potential biomarkers after further validation.

16.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis. We developed a novel automatic peak detection method for prOTOF MS data that does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform, and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected, and those with the highest intensity within their peak clusters are recorded. The peaks common to all spectra are identified by choosing an appropriate cut-off threshold in complete-linkage hierarchical clustering. To correct 1-Da shifts of the peaks, the peak corresponding to a given protein is taken to be the detected peak with the largest count in its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after chemical noise removal. We then successfully applied this method to prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease, compared with vascular disease-free patients, detecting peaks with S/N ≥ 2. Our method is easily implemented and highly effective at defining peaks that will be used for disease classification or to highlight potential biomarkers.
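
Finding common peaks by complete-linkage hierarchical clustering of the pooled peak masses, with a cut-off threshold, can be sketched with a naive agglomerative loop. The 2 Da cut-off in the test is a hypothetical value, and the cubic-time loop is only meant for illustration. Complete linkage is monotone, so stopping at the first merge above the cut-off is the same as cutting the dendrogram there.

```python
def complete_linkage_1d(masses, cutoff):
    """Agglomerative complete-linkage clustering of 1-D masses, cut at `cutoff`.

    The complete-linkage distance between clusters A and B is the largest |a - b|.
    Returns clusters as sorted lists of masses.
    """
    clusters = [[m] for m in sorted(masses)]

    def dist(a, b):
        return max(max(b) - min(a), max(a) - min(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break  # every remaining merge would exceed the cut-off
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters
```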

17.
This paper presents computational methods for analyzing MALDI-TOF mass spectrometry data for the quantitative comparison of peptides and glycans in serum. The methods are applied to identify candidate biomarkers in serum samples of 203 participants from Egypt: 73 hepatocellular carcinoma (HCC) cases, 52 patients with chronic liver disease (CLD) consisting of cirrhosis and fibrosis cases, and 78 population controls. Two complementary sample preparation methods were applied prior to generating mass spectra: (1) low molecular weight (LMW) enrichment of each serum sample for MALDI-TOF quantification of peptides, and (2) enzymatic release of glycans from proteins in each serum sample, followed by permethylation, for MALDI-TOF quantification of glycans. A peak selection algorithm was applied to identify the most useful peptide and glycan peaks for accurate detection of HCC cases within a high-risk population of patients with CLD. In addition to global peaks selected by the whole-population approach, in which identically labeled patients are treated as a single group, subgroup-specific peaks were identified by searching for peaks that are differentially abundant only in a subgroup of patients. Peak selection was preceded by peak screening, in which we eliminated peaks with significant association with covariates such as age, gender, and viral infection, based on the peptide and glycan spectra from population controls. The performance of the selected peptide and glycan peaks was evaluated in terms of their ability to detect HCC cases among patients with CLD, both in a blinded validation set and by cross-validation. Finally, we investigated the possibility of combining peptides and glycans in a panel to enhance the diagnostic capability of these candidate markers. Further evaluation is needed to examine the potential clinical utility of the candidate peptide and glycan markers identified in this study.

18.
A common animal model of chemical hepatocarcinogenesis was used to demonstrate the potential identification of carcinogenicity-related protein signatures/biomarkers. To this end, an animal study in which rats were treated with the known liver carcinogen N-nitrosomorpholine (NNM) or the corresponding vehicle was evaluated. Histopathological investigation as well as SELDI-TOF-MS analysis was performed. SELDI-TOF-MS is an affinity-based mass spectrometry method in which subsets of proteins from biological samples are selectively adsorbed to a chemically modified surface. The proteins are subsequently analyzed with respect to their mass-to-charge ratios (m/z) by a time-of-flight (TOF) mass spectrometry (MS) approach. Because data preprocessing of SELDI-TOF-MS spectra is essential, baseline correction, normalization, peak detection, and alignment of raw spectra were performed using either the Ciphergen ProteinChip Software 3.1 or functions implemented in the library PROcess of the Bioconductor Project. The baseline correction and normalization algorithms of both tools led to comparable results, whereas results after the peak detection and alignment steps differed. Variability between technical and biological replicates was investigated. A linear mixed model with the factors experimental group and time point was applied to each protein peak, taking into account the different correlation structures of technical and biological replicates. Alternatively, only the median intensity values of technical replicates were used. Results of both models were similar and correlated well with those of the histopathological evaluation of the study. In conclusion, the statistical analyses led to comparable results, whereas parameter settings for preprocessing proved to be crucial.
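
Two of the steps above, normalization and summarizing technical replicates by their median, can be sketched as follows. Total-ion-current (TIC) scaling is assumed as the normalization; the abstract does not say which normalization either tool applied. Baseline correction, peak detection, alignment, and the linear mixed model are not shown.

```python
import statistics


def tic_normalize(spectrum, target=1.0):
    """Scale a baseline-corrected spectrum so its intensities sum to `target`."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have positive total intensity")
    return [target * v / total for v in spectrum]


def median_of_replicates(replicates):
    """Per-peak median across technical replicates (lists of equal length)."""
    return [statistics.median(vals) for vals in zip(*replicates)]
```

The median is robust to a single aberrant replicate, which is one reason it is a reasonable summary when only a few technical replicates are available.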

19.
The detection of biomarkers in biological fluids has been advanced by the introduction of mass spectrometry screening methods such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOFMS), which can detect proteins and determine their molecular masses in unfractionated mixtures. Generating reproducible mass spectra over the course of an experiment is vital for obtaining data in which differences in protein profiles between diseased and healthy states can be assessed correctly. We have developed a protocol to automate the collection of protein profiling data from a large number of samples using MALDI-TOFMS, and we used these samples to characterize the technical reproducibility of the method. This protocol has been used to analyze proteins found in bronchoalveolar lavage fluid samples from mice, with the ultimate goal of discovering differential expression patterns predictive of the development of chronic obstructive pulmonary disease. Samples were purified using magnetic bead-based technology and analyzed on an AnchorChip target plate. Our results demonstrate that the number of reproducibly detected peaks decreases significantly as sample size increases, which motivates explicitly including technical replicates in the analysis of MALDI-TOF-based protein profiling studies.

20.
We have developed an algorithm called Q5 for probabilistic classification of healthy versus diseased whole serum samples using mass spectrometry. The algorithm applies principal components analysis (PCA) followed by linear discriminant analysis (LDA) to whole-spectrum surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) data, and is demonstrated on four real datasets of complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classifying complete mass spectra of a complex protein mixture, built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient: it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum, and replicate experiments with different training/testing splits of each dataset are used to verify the robustness of the algorithm. The probabilistic classification method achieves sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum classification techniques for complex samples and can provide clues to the molecular identities of differentially expressed proteins and peptides.
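
The closed-form LDA step can be illustrated in two dimensions, assuming PCA has already reduced each spectrum to its first two component scores. This is a sketch of textbook Gaussian LDA with equal priors, not the Q5 implementation: the discriminant is w = S⁻¹(m₁ - m₀), with S the pooled within-class covariance, and the class-1 posterior is the logistic function of w·x + b.

```python
import math


def fit_lda_2d(X0, X1):
    """Closed-form two-class LDA for 2-D feature vectors.

    Under a shared-covariance Gaussian model with equal priors,
    log P(1|x)/P(0|x) = w.x + b with w = S^-1 (m1 - m0), b = -w.(m0 + m1)/2.
    """
    def mean(X):
        return [sum(p[k] for p in X) / len(X) for k in range(2)]

    m0, m1 = mean(X0), mean(X1)
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class covariance
    for X, m in ((X0, m0), (X1, m1)):
        for p in X:
            d = (p[0] - m[0], p[1] - m[1])
            for a in range(2):
                for c in range(2):
                    s[a][c] += d[a] * d[c]
    dof = len(X0) + len(X1) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # explicit 2x2 inverse applied to the mean difference
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    mid = [(m0[k] + m1[k]) / 2 for k in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def prob_class1(w, b, x):
    """Posterior probability of class 1 via a numerically stable logistic."""
    z = w[0] * x[0] + w[1] * x[1] + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Nothing here is iterative: the discriminant comes from means, one covariance matrix, and its inverse, which is the property the abstract emphasizes.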
