Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
The SELDI-TOF mass spectrometer's compact size and automated, high-throughput design have been attractive to clinical researchers, and the platform has seen steady use in biomarker studies. Despite new algorithms and preprocessing pipelines that have been developed to address reproducibility issues, visual inspection of the results of SELDI spectra preprocessing by the best algorithms still shows miscalled peaks and systematic sources of error. This suggests that there continue to be problems with SELDI preprocessing. In this work, we study the preprocessing of SELDI in detail and introduce improvements. While many algorithms, including the vendor-supplied software, can identify peak clusters of specific mass (or m/z) in groups of spectra with high specificity and low false discovery rate (FDR), the algorithms tend to underperform when estimating the exact prevalence and intensity of peaks in those clusters. Thus group differences that at first appear very strong are shown, after careful and laborious hand inspection of the spectra, to be less than significant. Here we introduce a wavelet/neural network based algorithm which mimics what a team of expert human users would call for peaks in each of several hundred spectra in a typical SELDI clinical study. The wavelet denoising part of the algorithm optimally smooths the signal in each spectrum according to an improved suite of signal processing algorithms previously reported (the LibSELDI toolbox under development). The neural network part of the algorithm combines those results with the raw signal and a training dataset of expertly called peaks, to call peaks in a test set of spectra with approximately 95% accuracy. The new method was applied to data collected from a study of cervical mucus for the early detection of cervical cancer in HPV-infected women. The method shows promise in addressing the ongoing SELDI reproducibility issues.

2.
Summary: A novel algorithm for removing baseline distortions in NMR spectra is presented. The algorithm approximates the baseline as the median of the noise extrema. Consequently, the method does not require that NMR peaks be discriminated from noise peaks. In addition, no assumptions regarding the source or functional form of the distortion are made. The algorithm is shown to remove the baseline artifacts present in a particularly distorted NOESY spectrum and to reveal peaks which had been obscured by the artifacts. The parameters and spectral characteristics (signal-to-noise ratio, NMR peak density, peak linewidths) governing the resolution of the calculated baselines are also explored.
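
As a rough illustration of the "median of the noise extrema" idea in this abstract, the sketch below estimates a 1D baseline as the running median of all local extrema near each point. It is not the authors' implementation: the function names, the sliding-window form, and the `half_window` parameter are assumptions made for this example.

```python
from statistics import median


def local_extrema(y):
    """Indices where y has a strict local maximum or minimum."""
    return [i for i in range(1, len(y) - 1)
            if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0]


def median_extrema_baseline(y, half_window=20):
    """Baseline estimate: median of the extrema within half_window points.

    Assumption: a fixed sliding window; the abstract does not say how the
    neighbourhood of each point is chosen.
    """
    ext = local_extrema(y)
    baseline = []
    for i in range(len(y)):
        nearby = [y[j] for j in ext if abs(j - i) <= half_window]
        # Fall back to the raw value if no extremum lies in the window.
        baseline.append(median(nearby) if nearby else y[i])
    return baseline


# Toy spectrum: offset of 5, alternating +/-1 "noise", one tall peak at index 50.
spectrum = [5 + (1 if i % 2 else -1) for i in range(100)]
spectrum[50] += 100
baseline = median_extrema_baseline(spectrum)
```

Because noise extrema vastly outnumber the extrema of real peaks in any window, the median stays close to the offset even under the tall peak, which is why no peak/noise discrimination is needed.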

3.
Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks in general do not repeat themselves well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra together, those false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra in order to reduce false-positive peaks. The new method improves the detection of peaks over the traditional single-spectrum-based peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/~eeyu/mspeak.htm.
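
The cross-spectrum consistency constraint described here can be sketched as a simple vote: a candidate peak is kept only if a peak at nearly the same m/z appears in enough of the other spectra. This is a minimal illustration, not the published method; `consistent_peaks`, the tolerance `tol`, and the threshold `min_fraction` are hypothetical names and values.

```python
def consistent_peaks(peak_lists, tol=0.5, min_fraction=0.5):
    """Keep candidate m/z values that recur in at least min_fraction of spectra.

    peak_lists: one list of candidate peak m/z values per spectrum.
    tol: assumed matching tolerance in m/z units.
    """
    n = len(peak_lists)
    kept = []
    for spectrum in peak_lists:
        for mz in spectrum:
            # Count the spectra that contain a peak within tol of this m/z.
            support = sum(
                any(abs(mz - other) <= tol for other in peaks)
                for peaks in peak_lists
            )
            already_kept = any(abs(mz - k) <= tol for k in kept)
            if support / n >= min_fraction and not already_kept:
                kept.append(mz)
    return sorted(kept)


candidates = [
    [1000.1, 1500.0, 1723.4],   # 1723.4 appears only here
    [1000.3, 1500.2],
    [999.9, 1500.1, 1310.7],    # 1310.7 appears only here
]
stable = consistent_peaks(candidates, tol=0.5, min_fraction=2 / 3)
```

In this toy input the two isolated candidates are dropped, while the peaks near 1000 and 1500 survive because all three spectra support them.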

4.
Mass spectrometry (MS) has shown great potential in detecting disease-related biomarkers for early diagnosis of stroke. To discover potential biomarkers from a large volume of noisy MS data, peak detection must be performed first. This article proposes a novel automatic peak detection method for stroke MS data. In this method, a mixture model is proposed to model the spectrum. A Bayesian approach is used to estimate the parameters of the mixture model, and a Markov chain Monte Carlo method is employed to perform Bayesian inference. By introducing a reversible jump method, we can automatically estimate the number of peaks in the model. Instead of separating peak detection into substeps, the proposed peak detection method can perform baseline correction, denoising and peak identification simultaneously. Therefore, it minimizes the risk of introducing irrecoverable bias and errors from each substep. In addition, this peak detection method does not require a manually selected denoising threshold. Experimental results on both a simulated dataset and a stroke MS dataset show that the proposed peak detection method not only has the ability to detect peaks with a small signal-to-noise ratio, but also greatly reduces the false detection rate while maintaining the same sensitivity.

5.
Mass spectrometry is being used to find disease-related patterns in mixtures of proteins derived from biological fluids. Questions have been raised about the reproducibility and reliability of peak quantifications using this technology. We collected nipple aspirate fluid from breast cancer patients and healthy women, pooled them into a quality control sample, and produced 24 replicate SELDI spectra. We developed a novel algorithm to process the spectra, denoising with the undecimated discrete wavelet transform (UDWT), and evaluated it for consistency and reproducibility. UDWT efficiently decomposes spectra into noise and signal. The noise is consistent and uncorrelated. Baseline correction produces isolated peak clusters separated by flat regions. Our method reproducibly detects more peaks than the method implemented in Ciphergen software. After normalization and log transformation, the mean coefficient of variation of peak heights is 10.6%. Our method to process spectra provides improvements over existing methods. Denoising using the UDWT appears to be an important step toward obtaining results that are more accurate. It improves the reproducibility of quantifications and supplies tools for investigation of the variations in the technology more carefully. Further study will be required, because we do not have a gold standard providing an objective assessment of which peaks are present in the samples.
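
The reproducibility figure quoted above is a mean coefficient of variation (CV) of log-transformed peak heights across replicate spectra. The sketch below shows one way such a number could be computed; it assumes the heights are already normalized (the abstract does not say how), and the function names and sample data are invented for illustration.

```python
from math import exp, log
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def mean_peak_cv(peak_heights):
    """Mean CV of log-transformed heights.

    peak_heights maps a peak label to its (already normalized) heights
    across replicate spectra; the natural log is an assumption.
    """
    cvs = [coefficient_of_variation([log(h) for h in heights])
           for heights in peak_heights.values()]
    return mean(cvs)


# Hypothetical replicate heights, chosen so the logs are 9, 10 and 11.
replicates = {"peak_A": [exp(9.0), exp(10.0), exp(11.0)]}
cv = mean_peak_cv(replicates)  # about 0.10, i.e. a CV of roughly 10%
```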

6.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected and those with the highest intensity among their peak clusters are recorded. The common peaks among all the spectra are identified by choosing an appropriate cut-off threshold in the complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak corresponding to the same protein is determined as the detected peak with the largest number among its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after the chemical noise removal. We then successfully applied this method to a data set from prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N≥2. Our method is easily implemented and is highly effective for defining peaks that will be used for disease classification or to highlight potential biomarkers.

7.
A major time-consuming step of protein NMR structure determination is the generation of reliable NOESY cross peak lists, which usually requires a significant amount of manual interaction. Here we present a new algorithm for automated peak picking involving wavelet de-noised NOESY spectra in a process where the identification of peaks is coupled to automated structure determination. The core of this method is the generation of incremental peak lists by applying different wavelet de-noising procedures which yield peak lists of a different noise content. In combination with additional filters which probe the consistency of the peak lists, good convergence of the NOESY-based automated structure determination could be achieved. These algorithms were implemented in the context of the ARIA software for automated NOE assignment and structure determination and were validated for a polysulfide-sulfur transferase protein of known structure. The procedures presented here should be commonly applicable for efficient protein NMR structure determination and automated NMR peak picking. Electronic supplementary material is available for this article and accessible for authorised users.

8.
MOTIVATION: Surface-enhanced laser desorption and ionization (SELDI) time of flight (TOF) is a mass spectrometry technology. The key features in a mass spectrum are its peaks. In order to locate the peaks and quantify their intensities, several pre-processing steps are required. Though different approaches to perform pre-processing have been proposed, there is no systematic study that compares their performance. RESULTS: In this article, we present the results of a systematic comparison of various popular packages for pre-processing of SELDI-TOF data. We evaluate their performance in terms of two of their primary functions: peak detection and peak quantification. Regarding peak quantification, the performance of the algorithms is measured in terms of reproducibility. For peak detection, the comparison is based on sensitivity and false discovery rate. Our results show that for spectra generated with low laser intensity, the software developed by Ciphergen Biosystems (ProteinChip Software 3.1 with the additional tool Biomarker Wizard) produces relatively good results for both peak quantification and detection. On the other hand, for the data produced with either medium or high laser intensity, none of the methods show uniformly better performances under both criteria. Our analysis suggests that an advantageous combination is the use of the packages MassSpecWavelet and PROcess, the former for peak detection and the latter for peak quantification.
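
The two detection criteria named in this abstract, sensitivity and false discovery rate, can be made concrete with a small sketch. It assumes a list of known true peak positions and a fixed m/z matching tolerance; both the function name and the tolerance are illustrative choices, not taken from the study.

```python
def detection_metrics(detected, truth, tol=0.3):
    """Return (sensitivity, false discovery rate) for a detected peak list.

    detected, truth: lists of m/z values; tol is an assumed match tolerance.
    """
    # True peaks that at least one detection falls close to.
    found = [t for t in truth if any(abs(t - d) <= tol for d in detected)]
    # Detections that do not correspond to any true peak.
    false_hits = [d for d in detected
                  if not any(abs(d - t) <= tol for t in truth)]
    sensitivity = len(found) / len(truth)
    fdr = len(false_hits) / len(detected) if detected else 0.0
    return sensitivity, fdr


truth = [1000.0, 2000.0, 3000.0, 4000.0]
detected = [1000.1, 2000.2, 2500.0, 3999.9, 5000.0]
sensitivity, fdr = detection_metrics(detected, truth)
```

Here three of the four true peaks are recovered (sensitivity 0.75) and two of the five detections are spurious (FDR 0.4), which shows why a method can look good on one criterion and poor on the other.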

9.
The dominant ions in MS/MS spectra of peptides, which have been fragmented by low-energy CID, are often b-, y-ions and their derivatives resulting from the cleavage of the peptide bonds. However, MS/MS spectra typically contain many more peaks. These can result not only from isotope variants and multiply charged replicates of the peptide fragmentation products but also from unknown fragmentation pathways, sample-specific or systematic chemical contaminations or from noise generated by the electronic detection system. The presence of this background complicates spectrum interpretation. Besides dramatically prolonged computation time, it can lead to incorrect protein identification, especially in the case of de novo sequencing algorithms. Here, we present an algorithm for detection and transformation of multiply charged peaks into singly charged monoisotopic peaks, removal of heavy isotope replicates, and random noise. A quantitative criterion for the recognition of some noninterpretable spectra has been derived as a byproduct. The approach is based on numerical spectral analysis and signal detection methods. The algorithm has been implemented in a stand-alone computer program called MS Cleaner that can be obtained from the authors upon request.

10.

Background  

Mass spectrometry protein profiling is a promising tool for biomarker discovery in clinical proteomics. However, the development of a reliable approach for the separation of protein signals from noise is required. In this paper, LIMPIC, a computational method for the detection of protein peaks from linear-mode MALDI-TOF data is proposed. LIMPIC is based on novel techniques for background noise reduction and baseline removal. Peak detection is performed considering the presence of a non-homogeneous noise level in the mass spectrum. A comparison of the peaks collected from multiple spectra is used to classify them on the basis of a detection rate parameter, and hence to separate the protein signals from other disturbances.

11.
MOTIVATION: This article presents a method to identify the isotopic distributions within a mass spectrum using a probabilistic classifier supplemented with dynamic programming. Such a system is needed for a variety of purposes, including generating robust and meaningful features from mass spectra to be used in classification. RESULTS: The primary result of this article is that the dynamic programming approach significantly improves sensitivity, without harming specificity, of a probabilistic classifier for identifying the isotopic distributions. When annotating isotopic distributions where an expert has performed the initial 'peak-picking' (removal of noise peaks), the dynamic programming approach gives a true positive rate of 96% and a false positive rate of 0.0%, whereas the classifier alone has a true positive rate of only 47% when the false positive rate is 0.0%. When annotating isotopic distributions in machine peak-picked spectra, which may contain many noise peaks, the dynamic programming approach gives a true positive rate of only 22.0%, but it still keeps a low false positive rate of 1.0% and still outperforms the classifier alone. It is important to note that all these rates are when we require exact matches with the distributions in annotated spectra; in our evaluation a distribution is considered 'entirely incorrect' if it is missing even one peak or contains even one extraneous peak. We compared to the THRASH and AID-MS systems using a looser requirement: correctly identifying the distribution that contains the mono-isotopic mass. Under this measure, our dynamic programming approach achieves a true positive rate of 82% and a false positive rate of 1%, which again outperforms the classifier alone. The dynamic programming approach ends up being more conservative than THRASH and AID-MS, yielding both fewer true and false peaks, but the F-score of the dynamic programming approach is significantly better than those of THRASH and AID-MS. 
All results were obtained with 10-fold cross-validation of 99 sections of mass spectra with a total of 214 hand-annotated isotopic distributions. AVAILABILITY: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM.

12.
Novel algorithms are presented for automated NOESY peak picking and NOE signal identification in homonuclear 2D and heteronuclear-resolved 3D [1H,1H]-NOESY spectra during de novo protein structure determination by NMR, which have been implemented in the new software ATNOS (automated NOESY peak picking). The input for ATNOS consists of the amino acid sequence of the protein, chemical shift lists from the sequence-specific resonance assignment, and one or several 2D or 3D NOESY spectra. In the present implementation, ATNOS performs multiple cycles of NOE peak identification in concert with automated NOE assignment with the software CANDID and protein structure calculation with the program DYANA. In the second and subsequent cycles, the intermediate protein structures are used as an additional guide for the interpretation of the NOESY spectra. By incorporating the analysis of the raw NMR data into the process of automated de novo protein NMR structure determination, ATNOS enables direct feedback between the protein structure, the NOE assignments and the experimental NOESY spectra. The main elements of the algorithms for NOESY spectral analysis are techniques for local baseline correction and evaluation of local noise level amplitudes, automated determination of spectrum-specific threshold parameters, the use of symmetry relations, and the inclusion of the chemical shift information and the intermediate protein structures in the process of distinguishing between NOE peaks and artifacts. The ATNOS procedure has been validated with experimental NMR data sets of three proteins, for which high-quality NMR structures had previously been obtained by interactive interpretation of the NOESY spectra. The ATNOS-based structures coincide closely with those obtained with interactive peak picking. Overall, we present the algorithms used in this paper as a further important step towards objective and efficient de novo protein structure determination by NMR.

13.
Zou J, Hong G, Guo X, Zhang L, Yao C, Wang J, Guo Z. PLoS ONE 2011, 6(10): e26294

Background

There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached.

Results

In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased.

Conclusions

Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
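
The stratified FDR control mentioned in the Results can be illustrated by running a standard FDR procedure separately within each stratum of peaks instead of once over the pooled tests. The sketch below assumes the Benjamini-Hochberg step-up procedure and an externally supplied stratum label per peak; the abstract specifies neither, so both are assumptions, as are the example p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; True marks a rejected test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


def stratified_bh(p_values, strata, alpha=0.05):
    """Apply Benjamini-Hochberg independently inside each stratum."""
    rejected = [False] * len(p_values)
    for label in set(strata):
        idx = [i for i, s in enumerate(strata) if s == label]
        flags = benjamini_hochberg([p_values[i] for i in idx], alpha)
        for i, flag in zip(idx, flags):
            rejected[i] = flag
    return rejected


# Hypothetical p-values: three peaks in a signal-rich stratum, five elsewhere.
p = [0.001, 0.03, 0.04, 0.2, 0.5, 0.8, 0.9, 0.95]
strata = ["rich", "rich", "rich", "poor", "poor", "poor", "poor", "poor"]
pooled = benjamini_hochberg(p)
stratified = stratified_bh(p, strata)
```

With these numbers the pooled analysis declares one peak significant, while the stratified analysis declares three, which is the kind of power gain in large profiles that the abstract describes.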

14.

Background  

Quantitative proteomics technologies have been developed to comprehensively identify and quantify proteins in two or more complex samples. Quantitative proteomics based on differential stable isotope labeling is one of the proteomics quantification technologies. Mass spectrometric data generated for peptide quantification are often noisy, and peak detection and definition require various smoothing filters to remove noise in order to achieve accurate peptide quantification. Many traditional smoothing filters, such as the moving average filter, Savitzky-Golay filter and Gaussian filter, have been used to reduce noise in MS peaks. However, limitations of these filtering approaches often result in inaccurate peptide quantification. Here we present the WaveletQuant program, based on wavelet theory, for better or alternative MS-based proteomic quantification.

15.
Abstract: The inorganic phosphate (Pi) NMR peak in brain has an irregular shape, which suggests that it represents more than a single homogeneous pool of Pi. To test the ability of the Marquardt-Levenberg (M-L) nonlinear curve fit algorithm software (Peak-Fit) to separate multiple peaks, locate peak centers, and estimate peak heights, we studied simulated Pi spectra with defined peak centers, areas, and signal-to-noise (S/N) ratios ranging from ∞ to 5.8. As the S/N ratio decreased below 15, the M-L algorithm located peak centers accurately when they were detected; however, small peaks tended to grow smaller and disappear, whereas the amplitudes of larger peaks increased. We developed an in vitro three-compartment model containing a mixture of Pi buffer, phosphocreatine, phosphate diester, and phosphate monoester (PME), portions of which were adjusted to three different pHs before addition of agar. Weighed samples of each buffered gel together with phospholipid extract and bone chips were placed in an NMR tube and covered with mineral oil. Following baseline correction, it was possible to separate the Pi peaks arising from the three compartments with different pH values if each peak made up 10–35% of total Pi area. In vivo, we identified the plasma compartment by intraarterial infusion of Pi. It was assumed that intracellular compartments contained high-energy phosphates and took up glucose. Based on these assumptions we subjected the brains to complete ischemia and observed that Pi compartments at pH 6.82, 6.92, 7.03, and 7.13 increased markedly in amplitude. If the brain cells took up and phosphorylated 2-deoxyglucose (2-DG), 2-DG-6-phosphate (2-DG-6-P) would appear in the PME portion of the spectrum ionized according to pHi. Four 2-DG-6-P peaks with calculated pH values of 6.86, 6.94, 7.04, and 7.15 did appear in the spectrum, thereby confirming that the four larger Pi peaks represented intracellular spaces.

16.
Peak detection is a key step in the analysis of SELDI-TOF-MS spectra, but the current default method has low specificity and poor peak annotation. To improve data quality, scientists still have to validate the identified peaks visually, a tedious and time-consuming process, especially for large data sets. Hence, there is a genuine need for methods that minimize manual validation. We have previously reported a multi-spectral signal detection method, called RS for 'region of significance', with improved specificity. Here we extend it to include a peak quantification algorithm based on annotated regions of significance (ARS). For each spectral region flagged as significant by RS, we first identify a dominant spectrum for determining the number of peaks and the m/z region of these peaks. From each m/z region of peaks, a peak template is extracted from all spectra via principal component analysis. Finally, with the template, we estimate the amplitude and location of the peak in each spectrum with the least-squares method and refine the estimation of the amplitude via the mixture model. We have evaluated the ARS algorithm on patient samples from a clinical study. Comparison with the standard method shows that ARS (i) inherits the superior specificity of RS, and (ii) gives more accurate peak annotations than the standard method. In conclusion, we find that ARS alleviates the main problems in the preprocessing of SELDI-TOF spectra. The R-package ProSpect that implements ARS is freely available for academic use at http://www.meb.ki.se/~yudpaw.

17.
The random forest classification method was applied to classify samples from 76 breast cancer patients and 77 controls whose proteomic profile had been obtained using mass spectrometry. The analysis consisted of two stages, the detection of peaks from the profiles and the construction of a classification rule using random forests. Using a peak detection method based on finding common local maxima in the smoothed sample spectra, 444 peaks were detected, reducing to 365 robust peaks found in at least 7 out of 10 random subsets of samples. Subjects were classified as cases or controls using the random forest algorithm applied to the 365 peaks. Based on the prediction of the status of out-of-bag samples, the total error rate was 16.3%, with a sensitivity of 81.6% and a specificity of 85.7%. Measures of importance of each of the peaks were calculated to identify regions of the spectrum influencing the classification, and the four most important peaks were identified as mz3863_13, mz2943_12, mz3193_44 and mz8925_94. Combining initial peak detection with the random forest algorithm provides a high-performance classification system for proteomic data, with unbiased estimates of future performance.
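
The three reported rates are mutually consistent, which a short calculation makes explicit. The abstract gives only the group sizes (76 cases, 77 controls) and the percentages; the counts of 62 correctly classified cases and 66 correctly classified controls used below are back-calculated from those percentages and are an assumption, not figures stated in the source.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and total error rate from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    error_rate = (fn + fp) / (tp + fn + tn + fp)
    return sensitivity, specificity, error_rate


# Assumed counts: 62 of 76 cases and 66 of 77 controls classified correctly.
sens, spec, err = classification_rates(tp=62, fn=14, tn=66, fp=11)
percentages = [round(100 * value, 1) for value in (sens, spec, err)]
```

Under these assumed counts the rounded values reproduce the reported 81.6% sensitivity, 85.7% specificity and 16.3% total error.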

18.
The three simplest Projection-Reconstruction algorithms, namely the Lowest-Value, Additive Back-Projection and Hybrid Back-Projection/Lowest-Value algorithms, are analyzed. A new, also simple, algorithm that reconstructs the spectrum by utilizing the amplitude histogram at each reconstruction point is explored. The algorithms are tested using simulated spectra. While all the algorithms considered can potentially result in a substantial reduction of the amount of data needed for reconstruction, they can suffer from a number of drawbacks. In particular, they often fail when the spectra are noisy and/or contain overlapping peaks. When compared to the existing algorithms, the new, histogram-based algorithm has the potential advantage of being able to deal with spectra containing peaks of opposite phase.
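
To make the difference between two of the named algorithms concrete, the sketch below reconstructs a small 2D plane from 1D projections of a single point peak. At each grid point it collects the back-projected intensity from every projection and combines them with `min` (Lowest-Value) or `sum` (Additive Back-Projection). The integer grid, the nearest-bin rounding, and all function names are simplifying assumptions; the hybrid and histogram-based variants are not shown.

```python
from math import ceil, cos, radians, sin, sqrt


def project(peaks, angle_deg, half):
    """Forward-project point peaks {(x, y): intensity} onto a 1D profile."""
    radius = ceil(half * sqrt(2))  # largest possible projected coordinate
    profile = [0.0] * (2 * radius + 1)
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    for (x, y), intensity in peaks.items():
        profile[round(x * c + y * s) + radius] += intensity
    return profile


def reconstruct(projections, half, combine):
    """Rebuild the plane; combine=min is Lowest-Value, combine=sum is additive."""
    radius = ceil(half * sqrt(2))
    plane = {}
    for x in range(-half, half + 1):
        for y in range(-half, half + 1):
            values = [
                profile[round(x * cos(radians(a)) + y * sin(radians(a))) + radius]
                for a, profile in projections
            ]
            plane[(x, y)] = combine(values)
    return plane


half = 4
peaks = {(2, 1): 1.0}
projections = [(a, project(peaks, a, half)) for a in (0, 45, 90)]
lowest = reconstruct(projections, half, min)
additive = reconstruct(projections, half, sum)
```

In this noise-free toy case the lowest-value plane is nonzero only at the true peak, while the additive plane also contains the back-projection ridges running through it.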

19.
The purpose of this experiment was to test the stability of the heart rate (HR) power spectrum over time in conscious dogs. HR was recorded for 1 h for each of six animals on 2 days. A Fast Fourier transform was used to derive the HR power spectrum for the 12 contiguous 5-min epochs comprising the 1-h recordings. Changes in frequency and amplitude of the various spectral peaks were quantitatively examined. We confirm the presence of two major concentrations of power centered around 0.02 (low frequency peak) and 0.32 Hz (high frequency peak). However, we observed variations in these spectral peaks, especially their amplitudes, both within each hour and from day 1 to day 2. The amplitudes of these two spectral peaks tended to vary reciprocally. HR power spectra based on 5 min of recorded data were also derived from an additional eight animals in both the lying and standing positions; the power spectra from these short recordings were sufficiently sensitive to detect redistributions in power due to changes in posture in all eight dogs. We conclude that: 1) data should be recorded for relatively long periods (e.g., 1 h) to characterize the HR power spectrum; 2) some variability in frequency and amplitude will persist across spectra even when based on longer data bases; 3) care should be taken to ensure that the subject's behavioral state is stable within the recording period; 4) shorter (e.g., 5 min) data bases are not suitable except for detecting relatively robust changes in the HR power spectrum.
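
The epoch-wise analysis described here (a 1-h recording split into twelve 5-min epochs, each with its own power spectrum) can be sketched as follows. For self-containment the example uses a plain discrete Fourier transform rather than an FFT, a 1 Hz sampling rate, and a synthetic signal with components at 0.02 and 0.32 Hz; the sampling rate, signal amplitudes, and function names are all assumptions for illustration.

```python
import cmath
from math import pi, sin


def synthetic_hr(n_samples, rate_hz):
    """Hypothetical HR trace: mean 80 with 0.02 Hz and 0.32 Hz oscillations."""
    return [80 + 5 * sin(2 * pi * 0.02 * t / rate_hz)
            + 2 * sin(2 * pi * 0.32 * t / rate_hz)
            for t in range(n_samples)]


def epochs(samples, epoch_len):
    """Split a recording into contiguous, non-overlapping epochs."""
    return [samples[i:i + epoch_len]
            for i in range(0, len(samples) - epoch_len + 1, epoch_len)]


def power_spectrum(samples, rate_hz):
    """(frequency, power) pairs from a direct DFT of the mean-removed epoch."""
    n = len(samples)
    mean_value = sum(samples) / n
    centered = [s - mean_value for s in samples]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k * rate_hz / n, abs(coeff) ** 2 / n))
    return spectrum


def dominant_frequency(spectrum, low_hz, high_hz):
    """Frequency of the largest power value inside [low_hz, high_hz)."""
    band = [(power, freq) for freq, power in spectrum if low_hz <= freq < high_hz]
    return max(band)[1]
```

Calling `power_spectrum` on each element of `epochs(synthetic_hr(3600, 1.0), 300)` gives twelve spectra whose low- and high-frequency peaks can then be compared across epochs, as the study does for peak frequency and amplitude.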
