Similar articles
1.
Surface-enhanced laser desorption/ionization (SELDI) time of flight (TOF) is a mass spectrometry technology for measuring the composition of a sampled protein mixture. A mass spectrum contains peaks corresponding to proteins in the sample. The peak areas are proportional to the measured concentrations of the corresponding proteins. Quantifying peak areas is difficult for existing methods because peak shapes are not constant across a spectrum and because peaks often overlap. We present a new method for quantifying peak areas. Our method decomposes a spectrum into peaks and a baseline using so-called statistical finite mixture models. We illustrate our method in detail on 8 samples from culture media of adipose tissue and globally on 64 samples from serum to compare our method to the standard Ciphergen method. Both methods give similar estimates for singleton peaks, but not for overlapping peaks. The Ciphergen method overestimates the heights of such peaks while our method still gives appropriate estimates. Peak quantification is an important step in pre-processing SELDI-TOF data and improvements therein will pay off in the later biomarker discovery phase.

2.
Zou J, Hong G, Guo X, Zhang L, Yao C, Wang J, Guo Z. PLoS ONE, 2011, 6(10): e26294

Background

There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached.

Results

In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased.

Conclusions

Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
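
The stratified FDR idea mentioned in the Results above can be illustrated with a minimal Python sketch. It assumes that the Benjamini-Hochberg step-up rule is applied separately within each stratum; the abstract does not say how strata are defined or which FDR procedure the authors used, so the function names, the strata labels, and the p-values below are all hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the BH line.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply BH within each stratum and pool the rejected indices."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = []
    for members in groups.values():
        local = benjamini_hochberg([pvalues[i] for i in members], alpha)
        rejected.extend(members[j] for j in local)
    return sorted(rejected)


# Hypothetical p-values for eight peaks; stratum "a" holds the peaks
# assumed (for illustration only) to be more likely differentially expressed.
pvalues = [0.001, 0.02, 0.03, 0.5, 0.6, 0.7, 0.8, 0.9]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]

pooled = benjamini_hochberg(pvalues)
stratified = stratified_fdr(pvalues, strata)
```

In this toy input the pooled test rejects only the first peak, while the stratified test rejects the first three, which is the kind of power gain the abstract describes for large peak profiles.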

3.
In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by their co-eluting peptides, which results in increased quantification variance. In this paper, we propose to apply accurate LC peak boundary detection to remove the corrupted part of LC peaks. Accurate LC peak boundary detection is achieved by checking the consistency of intensity patterns within peptide elution time ranges. In addition, we remove peptides with erroneous mass assignment through model fitness check, which compares observed intensity patterns to theoretically constructed ones. The proposed algorithm can significantly improve the accuracy and precision of peptide ratio measurements.

4.
This paper presents computational methods to analyze MALDI-TOF mass spectrometry data for quantitative comparison of peptides and glycans in serum. The methods are applied to identify candidate biomarkers in serum samples of 203 participants from Egypt: 73 hepatocellular carcinoma (HCC) cases, 52 patients with chronic liver disease (CLD) consisting of cirrhosis and fibrosis cases, and 78 population controls. Two complementary sample preparation methods were applied prior to generating mass spectra: (1) low molecular weight (LMW) enrichment of each serum sample was carried out for MALDI-TOF quantification of peptides, and (2) glycans were enzymatically released from proteins in each serum sample and permethylated for MALDI-TOF quantification of glycans. A peak selection algorithm was applied to identify the most useful peptide and glycan peaks for accurate detection of HCC cases from a high-risk population of patients with CLD. In addition to global peaks selected by the whole-population-based approach, where identically labeled patients are treated as a single group, subgroup-specific peaks were identified by searching for peaks that are differentially abundant in a subgroup of patients only. The peak selection process was preceded by peak screening, where we eliminated peaks that have significant association with covariates such as age, gender, and viral infection based on the peptide and glycan spectra from population controls. The performance of the selected peptide and glycan peaks was evaluated in terms of their ability to detect HCC cases among patients with CLD in a blinded validation set and through the cross-validation method. Finally, we investigated the possibility of using both peptides and glycans in a panel to enhance the diagnostic capability of these candidate markers. Further evaluation is needed to examine the potential clinical utility of the candidate peptide and glycan markers identified in this study.

5.
A method is described for the quantitative determination of peptides using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Known limitations imposed by crystal heterogeneity, peptide ionization differences, data handling, and protein quantification with MALDI-TOF mass spectrometry are addressed in this method with a "seed crystal" protocol for analyte-matrix formation, the use of internal protein standards, and a software package called maldi_quant. The seed crystal protocol, a new variation of the fast-evaporation method, minimizes crystal heterogeneity and allows for consistent collection of protein spectra. The software maldi_quant permits rapid and automated analysis of peak intensity data, normalization of peak intensities to internal standards, and peak intensity deconvolution and estimation for vicinal peaks. Using insulin proteins in a background of other unrelated peptides, this method shows an overall coefficient of variation of 4.4%, and a quantitative working range of 0.58-37.5 ng bovine insulin per spot. Coupling of this methodology to powerful analytical procedures such as immunoprecipitation is likely to lead to the rapid and reliable quantification of biologically relevant proteins and their closely related variants.

6.
We developed an automated quantification workflow for PRM‐enabled detection of D3‐Leu labeled apoA‐I in high‐density lipoprotein (HDL) isolated from humans. Subjects received a bolus injection of D3‐Leu and blood was drawn at eight time points over three days. HDL was isolated and separated into six size fractions for subsequent proteolysis and PRM analysis for the detection of D3‐Leu signal from ~0.03 to 0.6% enrichment. We implemented an intensity‐based quantification approach that takes advantage of high‐resolution/accurate mass PRM scans to identify the D3‐Leu 2HM3 ion from non‐specific peaks. Our workflow includes five modules for extracting the targeted PRM peak intensities (XPIs): peak centroiding, noise removal, fragment ion matching using Δm/z windows, nine intensity quantification options, and validation and visualization outputs. We optimized the XPI workflow using in vitro synthesized and clinical samples of D0/D3‐Leu labeled apoA‐I. Three subjects’ apoA‐I enrichment curves in six HDL size fractions, and LCAT, apoA‐II and apoE from two size fractions were generated within a few hours. Our PRM strategy and automated quantification workflow will expedite the turnaround of HDL apoA‐I metabolism data in clinical studies that aim to understand and treat the mechanisms behind dyslipidemia.

7.
MOTIVATION: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and ultimately higher cure rates with less treatment-related morbidities. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the 'peak probability contrast' technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers. RESULTS: Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well as or better than several methods that require the full spectra, rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data.

8.
Kwon D, Vannucci M, Song JJ, Jeong J, Pfeiffer RM. Proteomics, 2008, 8(15): 3019-3029
In recent years there has been an increased interest in using protein mass spectroscopy to discriminate diseased from healthy individuals with the aim of discovering molecular markers for disease. A crucial step before any statistical analysis is the pre-processing of the mass spectrometry data. Statistical results are typically strongly affected by the specific pre-processing techniques used. One important pre-processing step is the removal of chemical and instrumental noise from the mass spectra. Wavelet denoising techniques are a standard method for denoising. Existing techniques, however, do not accommodate errors that vary across the mass spectrum, but instead assume a homogeneous error structure. In this paper we propose a novel wavelet denoising approach that deals with heterogeneous errors by incorporating a variance change point detection method in the thresholding procedure. We study our method on real and simulated mass spectrometry data and show that it improves the performance of peak detection methods.

9.
10.
MOTIVATION: There is a well-recognized potential of protein expression profiling using the surface-enhanced laser desorption and ionization technology for discovering biomarkers that can be applied in clinical diagnosis, prognosis and therapy prediction. The pre-processing of the raw data, however, is still problematic. METHODS: We focus on the peak detection step, where the standard method is marked by poor specificity. Currently, scientists need to inspect individual spectra visually and laboriously in order to verify that spectral peaks identified by the standard method are real. Motivated by this multi-spectral process, we investigate an analytical approach (called RS, for 'regions of significance') that reduces the data to a single spectrum of F-statistics capturing significant variability between spectra. To account for multiple testing, we use a false discovery rate criterion for identifying potentially interesting proteins. RESULTS: We show that RS has better operating characteristics than several existing methods and demonstrate routine applications on a number of large datasets.

11.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
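
The wavelet-space matching described in the abstract above can be sketched in a few lines of Python. This is a deliberately simplified, single-scale illustration using a Mexican-hat (Ricker) wavelet; the published algorithm works across multiple scales and is implemented in R, so the function names, the fixed scale, and the relative threshold here are assumptions made for the example. Because the discrete wavelet is symmetric and shifted to sum to zero, a constant or linear baseline contributes nothing to the response, which is why no baseline removal step is needed in this sketch.

```python
import math


def ricker(scale, half_width):
    """Discrete Mexican-hat wavelet, shifted so it sums to zero (up to rounding)."""
    values = []
    for k in range(-half_width, half_width + 1):
        t = k / scale
        values.append((1 - t * t) * math.exp(-t * t / 2))
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def wavelet_response(signal, scale):
    """Correlate the signal with the wavelet; edge positions are left at zero."""
    half = 5 * scale
    kernel = ricker(scale, half)
    response = [0.0] * len(signal)
    for i in range(half, len(signal) - half):
        response[i] = sum(w * signal[i + k - half] for k, w in enumerate(kernel))
    return response


def detect_peaks(signal, scale, rel_threshold=0.2):
    """Local maxima of the wavelet response above a fraction of its maximum."""
    response = wavelet_response(signal, scale)
    floor = rel_threshold * max(response)
    return [
        i
        for i in range(1, len(response) - 1)
        if response[i] > floor
        and response[i] > response[i - 1]
        and response[i] >= response[i + 1]
    ]


# Synthetic spectrum: sloping baseline plus a strong peak at index 40
# and a weaker peak at index 80 (all values are made up for illustration).
spectrum = [
    5 + 0.05 * i
    + 10 * math.exp(-((i - 40) ** 2) / 18)
    + 4 * math.exp(-((i - 80) ** 2) / 18)
    for i in range(120)
]
peaks = detect_peaks(spectrum, scale=3)
```

A single scale only matches peaks of roughly one width; handling peaks of different widths would require repeating the response over several scales and linking the maxima, which this sketch does not attempt.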

12.
Gay S, Binz PA, Hochstrasser DF, Appel RD. Proteomics, 2002, 2(10): 1374-1391
Matrix-assisted laser desorption/ionization-time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses matching with theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, as there is currently no model predicting the intensities based on the physicochemical properties of peptides. In this work, we used standard datamining methods such as classification and regression methods to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied on a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (prediction of the presence or absence of a peptide in a spectrum) and M5' for the regression task (prediction of the normalized intensity of a peptide peak). The C4.5 result correctly classified 88% of the theoretical peaks, whereas the M5' peak intensities had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. The work performed lays the foundations of a method to analyze factors influencing the peak intensity of PMF spectra. A simple extension of this analysis, using a larger dataset, could improve the accuracy of the results. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the datamining process to analyze their influence on the peak intensity. Furthermore, this datamining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry derived methods.

13.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and its source code are available under the Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
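
The multivariate hypergeometric distribution that the abstract above names can be written out directly. The sketch below computes only the probability mass function; how MyriMatch defines its intensity classes, or turns this probability into a final score, is not stated in the abstract, so the class layout and the counts in the example are purely hypothetical.

```python
from math import comb


def multivariate_hypergeometric_pmf(class_sizes, draws):
    """Probability of drawing draws[i] items from each class of size class_sizes[i].

    Sampling is without replacement from a pool of sum(class_sizes) items.
    """
    total = sum(class_sizes)
    n = sum(draws)
    numerator = 1
    for size, k in zip(class_sizes, draws):
        numerator *= comb(size, k)  # comb returns 0 when k > size
    return numerator / comb(total, n)


# Hypothetical layout: 2 intense peaks, 3 medium peaks and 5 empty m/z bins;
# three predicted fragment ions land one in each class.
p = multivariate_hypergeometric_pmf([2, 3, 5], [1, 1, 1])
```

A match that hits many intense peaks is less likely to arise by chance than one that hits the same number of weak peaks or empty bins, which is the intuition behind weighting intense peaks more heavily.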

14.
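Protein quantification is an important means of investigating the onset and progression of disease and of finding new drug targets. The techniques most commonly used in this field either compare the optical density of protein spots on stained two-dimensional gels or combine mass spectral peak intensities after isotope labeling. Sample preparation for both, however, is rather cumbersome, which hinders large-scale quantitative proteomics. In recent years, label-free quantification methods based on mass spectrometry data have emerged. By data type, these methods fall into two classes: methods based on the number of peptides identifying a protein, and methods based on mass spectral peak intensity. Both offer great advantages in high-throughput, large-scale quantitative proteomics. This review introduces the models, strengths and weaknesses of the two classes of label-free quantification methods and compares their sensitivity and accuracy. Peptide counting is more sensitive in detecting changes in protein abundance, whereas peak-area intensity is more accurate in estimating protein ratios.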

15.
A variety of quantitative proteomics methods have been developed, including label-free, metabolic labeling, and isobaric chemical labeling using iTRAQ or TMT. Here, these methods were compared in terms of the depth of proteome coverage, quantification accuracy, precision, and reproducibility using a high-performance hybrid mass spectrometer, LTQ Orbitrap Velos. Our results show that (1) the spectral counting method provides the deepest proteome coverage for identification, but its quantification performance is worse than labeling-based approaches, especially the quantification reproducibility; (2) metabolic labeling and isobaric chemical labeling are capable of accurate, precise, and reproducible quantification and provide deep proteome coverage for quantification; isobaric chemical labeling surpasses metabolic labeling in terms of quantification precision and reproducibility; and (3) iTRAQ and TMT perform similarly in all aspects compared in the current study using a CID-HCD dual scan configuration. On the basis of the unique advantages of each method, we provide guidance for selection of the appropriate method for a quantitative proteomics study.

16.
With the recent rapid expansion of DNA and protein sequence databases, intensive efforts are underway to interpret the linear genetic information of DNA in terms of function, structure, and control of biological processes. The systematic identification and quantification of expressed proteins has proven particularly powerful in this regard. Large-scale protein identification is usually achieved by automated liquid chromatography-tandem mass spectrometry of complex peptide mixtures and sequence database searching of the resulting spectra [Aebersold and Goodlett, Chem. Rev. 2001, 101, 269-295]. As generating large numbers of sequence-specific mass spectra (collision-induced dissociation, or CID, spectra) has become a routine operation, research has shifted from the generation of sequence database search results to their validation. Here we describe in detail a novel probabilistic model and score function that ranks the quality of the match between tandem mass spectral data and a peptide sequence in a database. We document the performance of the algorithm on a reference data set and in comparison with another sequence database search tool. The software is publicly available for use and evaluation at http://www.systemsbiology.org/research/software/proteomics/ProbID.

17.
The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.

18.
The identification of peptides and proteins from fragmentation mass spectra is a very common approach in the field of proteomics. Contemporary high-throughput peptide identification pipelines can quickly produce large quantities of MS/MS data that contain valuable knowledge about the actual physicochemical processes involved in the peptide fragmentation process, which can be extracted through extensive data mining studies. As these studies attempt to exploit the intensity information contained in the MS/MS spectra, a critical step required for a meaningful comparison of this information between MS/MS spectra is peak intensity normalization. We here describe a procedure for quantifying the efficiency of different published normalization methods in terms of the quartile coefficient of dispersion (qcod) statistic. The quartile coefficient of dispersion is applied to measure the dispersion of the peak intensities between redundant MS/MS spectra, allowing the quantification of the differences in computed peak intensity reproducibility between the different normalization methods. We demonstrate that our results are independent of the data set used in the evaluation procedure, allowing us to provide generic guidance on the choice of normalization method to apply in a certain MS/MS pipeline application.
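
For reference, the quartile coefficient of dispersion is (Q3 - Q1) / (Q3 + Q1), where Q1 and Q3 are the first and third quartiles. The Python sketch below applies it to the intensities of one peak across redundant spectra. The quartile estimator (the default "exclusive" method of `statistics.quantiles`) and the intensity values are assumptions for illustration; the abstract does not specify which quartile definition the authors used.

```python
from statistics import quantiles


def qcod(values):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    q1, _median, q3 = quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)


# Hypothetical normalized intensities of the same fragment peak
# observed in seven redundant MS/MS spectra.
intensities = [2, 4, 6, 8, 10, 12, 14]
dispersion = qcod(intensities)
```

A lower value means the normalized intensities of the same peak agree more closely across redundant spectra. Because the statistic is a ratio of quartiles, it is unitless and less sensitive to a few extreme intensities than a standard-deviation-based measure.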

19.
Recent development of proteomic array technology, including protein profiling coupling ProteinChip array with surface-enhanced laser desorption ionization time-of-flight mass spectrometry (SELDI-TOF/MS), provides a potentially powerful tool for discovery of new biomarkers by comparison of its profiles according to patient phenotypes. We used this approach to identify the host factors associated with treatment response in patients with chronic hepatitis C (CHC) receiving a 48-wk course of pegylated interferon (PEG-IFN) alpha 2b plus ribavirin (RBV). Protein profiles of pretreatment serum samples from 32 patients with genotype 1b and high viral load were obtained by SELDI-TOF/MS using three different ProteinChip arrays (CM10, Q10, IMAC30). Proteins showing significantly different peak intensities between sustained virological responders (SVRs) and non-SVRs were identified by chromatography, SDS-PAGE, TOF/MS and tandem mass spectrometry (MS/MS) assay. Eleven peak intensities were significantly different between SVRs and non-SVRs. The three SVR-increased peaks could be identified as two apolipoprotein (Apo) fragments and albumin and, among the eight non-SVR-increased proteins, four peaks were identified as two iron-related and two fibrogenesis-related protein fragments, respectively. Multivariate analysis showed that the serum ferritin and three peak intensity values (Apo A1, hemopexin and transferrin) were independent variables associated with SVRs, and the areas under the receiver operating characteristic (ROC) curves for SVR prediction using the Apo A1/hemopexin and hemopexin/transferrin ratios were 0.964 and 0.936. In conclusion, pretreatment serum protein profiling by SELDI-TOF/MS is valuable for identification of response-related host factors, which are useful for treatment efficacy prediction in CHC patients receiving PEG-IFN plus RBV. Our data also may help us understand the mechanism for treatment resistance and development of more effective antiviral therapy targeted toward the modulation of lipogenesis or iron homeostasis in CHC patients.

20.
Lysophosphatidic acid (LPA) is a lipid mediator that may play an important role in wound healing, embryonic development, and progression of cancer. Here, we report a procedure for the quantification of LPA by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The method is based on a characteristic mass shift with total charge change (from -2 to +1) of the phosphate species due to 1:1 complexation of LPA(2-) with a dinuclear zinc (II) complex [1,3-bis[bis(pyridin-2-ylmethyl)amino]propan-2-olato dizinc(II) complex; Zn(2)L(3+)] at physiological pH. The monocationic complex [LPA(2-)-Zn(2)L(3+)](+) was detected in the positive mode, in which no other signal of cation adducts of LPA(2-) was observed. The detection limit of 18:1 LPA by this method was 0.1 pmol on a sample plate. The intensity ratio of [LPA(2-)-Zn(2)L(3+)](+) against an internal standard [17:0 LPA(2-)-Zn(2)L(3+)](+) increased linearly with their molar ratio. Based on the relative intensities of complex ions, we determined the amounts of LPA homologs in an egg white by this method; the results obtained were in good agreement with those by gas liquid chromatography. This sensitive and convenient procedure for LPA-specific detection is useful for the quantification of LPA homologs occurring in biological materials.
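
The linear relation between intensity ratio and molar ratio reported above is what makes internal-standard quantification possible: a calibration line is fitted once and then inverted for unknown samples. The Python sketch below shows that arithmetic with ordinary least squares. All calibration points, the measured ratio, and the amount of internal standard are invented for illustration and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(intensity_ratio, slope, intercept, standard_amount):
    """Invert the calibration line and scale by the internal standard amount."""
    molar_ratio = (intensity_ratio - intercept) / slope
    return molar_ratio * standard_amount


# Hypothetical calibration: analyte/standard molar ratios and the
# intensity ratios measured for them.
molar_ratios = [0.5, 1.0, 2.0, 4.0]
intensity_ratios = [0.5, 0.9, 1.7, 3.3]
slope, intercept = fit_line(molar_ratios, intensity_ratios)

# Unknown sample spiked with 10 pmol of internal standard (made-up numbers).
amount_pmol = quantify(2.5, slope, intercept, standard_amount=10.0)
```

With these invented numbers the fitted line is 0.8x + 0.1, so a measured intensity ratio of 2.5 corresponds to a molar ratio of 3 and an analyte amount of 30 pmol.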
