Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short‐time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected and those with the highest intensity among their peak clusters are recorded. The common peaks among all the spectra are identified by choosing an appropriate cut‐off threshold in the complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak corresponding to the same protein is determined as the detected peak with the largest number among its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peak after the chemical noise removal. We then successfully applied this method to a data set from prOTOF MS spectra of albumin and albumin‐bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease‐free patients to detect peaks with S/N≥2. Our method is easily implemented and is highly effective at defining peaks that will be used for disease classification or to highlight potential biomarkers.
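
The step that identifies common peaks by cutting a complete-linkage hierarchical clustering at a threshold can be sketched roughly as follows. This is only an illustration in plain Python, not the authors' implementation: the function name `complete_linkage_clusters`, the example m/z values, and the 1.5 Da cut-off are all assumptions made for the example.

```python
def complete_linkage_clusters(values, cutoff):
    """Agglomerative clustering of 1-D peak positions with complete linkage.

    Two clusters are merged only while the largest pairwise distance
    between their members stays at or below `cutoff` (hypothetical value).
    """
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest complete-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break  # no remaining pair is close enough to merge
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Peak positions (m/z) pooled from several spectra; values are made up.
pooled = [1000.0, 1000.4, 1001.1, 1500.2, 1500.9, 2000.0]
groups = complete_linkage_clusters(pooled, cutoff=1.5)
```

Each resulting group collects detections that plausibly belong to the same peak; a representative position (for instance the most frequent one) could then be chosen per group.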

2.
Peak detection is a key step in the analysis of SELDI-TOF-MS spectra, but the current default method has low specificity and poor peak annotation. To improve data quality, scientists still have to validate the identified peaks visually, a tedious and time-consuming process, especially for large data sets. Hence, there is a genuine need for methods that minimize manual validation. We have previously reported a multi-spectral signal detection method, called RS for 'region of significance', with improved specificity. Here we extend it to include a peak quantification algorithm based on annotated regions of significance (ARS). For each spectral region flagged as significant by RS, we first identify a dominant spectrum for determining the number of peaks and the m/z region of these peaks. From each m/z region of peaks, a peak template is extracted from all spectra via principal component analysis. Finally, with the template, we estimate the amplitude and location of the peak in each spectrum with the least-squares method and refine the estimation of the amplitude via the mixture model. We have evaluated the ARS algorithm on patient samples from a clinical study. Comparison with the standard method shows that ARS (i) inherits the superior specificity of RS, and (ii) gives more accurate peak annotations than the standard method. In conclusion, we find that ARS alleviates the main problems in the preprocessing of SELDI-TOF spectra. The R-package ProSpect that implements ARS is freely available for academic use at http://www.meb.ki.se/ yudpaw.

3.
We have developed an automated procedure for aligning peaks in multiple TOF spectra that eliminates common timing errors and small variations in spectrometer output. Our method incorporates high-resolution peak detection, re-binning, and robust linear data fitting in the time domain. This procedure aligns label-free (uncalibrated) peaks to minimize the variation in each peak's location from one spectrum to the next, while maintaining a high number of degrees of freedom. We apply our method to replicate pooled-serum spectra from multiple laboratories and increase peak precision (t/sigma(t)) to values limited only by small random errors (with sigma(t) less than one time count in 89 out of 91 instances, 13 peaks in seven datasets). The resulting high precision allowed for an order of magnitude improvement in peak m/z reproducibility. We show that the CV for m/z is 0.01% (100 ppm) for 12 out of the 13 peaks that were observed in all datasets between 2995 and 9297 Da.
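
To make the idea of robust linear fitting in the time domain concrete, here is a small sketch that maps peak times from one spectrum onto a reference spectrum. The abstract does not say which robust estimator is used; the Theil-Sen estimator below (median of pairwise slopes) is a stand-in chosen for the example, and the function name and the peak times are hypothetical.

```python
from statistics import median


def theil_sen(x, y):
    """Robust straight-line fit y ~ intercept + slope * x (Theil-Sen).

    The slope is the median of all pairwise slopes, so a single
    mismatched peak has little influence on the fit.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Peak times (in time counts) in one spectrum and in the reference.
# The fourth pair is a deliberate mismatch (made-up data).
times = [100, 200, 300, 400, 500]
reference = [102, 202, 302, 450, 502]

slope, intercept = theil_sen(times, reference)
aligned = [intercept + slope * t for t in times]
```

An ordinary least-squares fit on the same data would be pulled toward the mismatched fourth pair, which is the kind of sensitivity a robust fit is meant to avoid.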

4.
Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks in general do not repeat themselves well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra together, those false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra in order to reduce false-positive peaks. The new method improves the detection of peaks over the traditional single-spectrum-based peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/~eeyu/mspeak.htm.
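
The consistency constraint described in this abstract can be illustrated with a simple filter that keeps a candidate peak only if a peak at nearly the same position recurs in enough of the spectra. This is a simplified sketch, not the published algorithm; `consistent_peaks`, the tolerance `tol`, and the threshold `min_fraction` are hypothetical names and values.

```python
def consistent_peaks(spectra_peaks, tol, min_fraction):
    """Keep peaks that recur across spectra.

    spectra_peaks: list of peak-position lists, one per (aligned) spectrum.
    tol:           maximum position difference for two peaks to match.
    min_fraction:  fraction of spectra that must contain a matching peak.
    """
    n = len(spectra_peaks)
    kept = []
    for peaks in spectra_peaks:
        keep = []
        for mz in peaks:
            # Count the spectra (including this one) with a matching peak.
            support = sum(
                any(abs(mz - other) <= tol for other in others)
                for others in spectra_peaks
            )
            if support / n >= min_fraction:
                keep.append(mz)
        kept.append(keep)
    return kept


# Three spectra with two recurring peaks and two isolated detections.
candidates = [
    [1000.1, 1500.0, 1777.0],
    [999.9, 1500.2],
    [1000.0, 1499.9, 1210.5],
]
filtered = consistent_peaks(candidates, tol=0.5, min_fraction=0.67)
```

The isolated detections at 1777.0 and 1210.5 appear in only one spectrum each and are dropped, while the recurring peaks near 1000 and 1500 are retained.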

5.
We present an evaluation of the accuracy and precision of relaxation rates calculated using a variety of methods, applied to data sets obtained for several very different protein systems. We show that common methods of data evaluation, such as the determination of peak heights and peak volumes, may be subject to bias, giving incorrect values for quantities such as R1 and R2. For example, one common method of peak-height determination, using a search routine to obtain the peak-height maximum in successive spectra, may be a source of significant systematic error in the relaxation rate. The alternative use of peak volumes or of a fixed coordinate position for the peak height in successive spectra gives more accurate results, particularly in cases where the signal/noise is low, but these methods have inherent problems of their own. For example, volumes are difficult to quantitate for overlapped peaks. We show that with any method of sampling the peak intensity, the choice of a 2- or 3-parameter equation to fit the exponential relaxation decay curves can dramatically affect both the accuracy and precision of the calculated relaxation rates. In general, a 2-parameter fit of relaxation decay curves is preferable. However, for very low-intensity peaks a 3-parameter fit may be more appropriate.
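
As a rough illustration of a 2-parameter fit, the sketch below assumes the conventional decay model I(t) = I0 * exp(-R * t); a 3-parameter variant would add a constant offset C. Neither equation is given in the abstract, so both forms are assumptions. The fit is done by linear least squares on log intensities, which is a simplification of the nonlinear fitting normally used and changes how noise is weighted. The function name and data are hypothetical.

```python
import math


def fit_two_parameter_decay(times, intensities):
    """Fit I(t) = I0 * exp(-R * t) by least squares on ln(I).

    Returns (I0, R). Requires strictly positive intensities.
    """
    logs = [math.log(v) for v in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = sxy / sxx                      # slope of ln(I) versus t is -R
    rate = -slope
    i0 = math.exp(mean_l - slope * mean_t)  # intercept of the line gives ln(I0)
    return i0, rate


# Noise-free synthetic decay with I0 = 100 and R = 2 (made-up values).
delays = [0.0, 0.1, 0.2, 0.4, 0.8]
heights = [100.0 * math.exp(-2.0 * t) for t in delays]
i0, rate = fit_two_parameter_decay(delays, heights)
```

With real data, a nonzero baseline offset in the intensities would bias this 2-parameter estimate, which is one situation in which the extra parameter of a 3-parameter model becomes relevant.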

6.
Biomarker discovery by mass spectrometry may support timely patient diagnosis and help characterize new diseases. Our previous work introduced a preprocessing method called HHTmass that is capable of removing noise, but HHTmass was only a proof of principle for peak detection: it was neither tested for peak reappearance rate nor applied to medical data. We developed a modified biomarker discovery method called Enhance HHTMass (E-HHTMass) for MALDI-TOF and SELDI-TOF mass spectrometry data, which improves on the old HHTMass method by removing the interpolation and the biomarker discovery process. E-HHTMass integrates the preprocessing and classification functions to identify significant peaks. The results show that most known biomarkers can be found and that a high peak appearance rate is achieved compared to MSCAP and the old HHTMass2. E-HHTMass is able to adapt to spectra with a small increasing interval. In addition, new peaks are detected that may be potential biomarkers after further validation.

7.
We present an algorithmic method allowing automatic tracking of NMR peaks in a series of spectra. It consists of a two-phase analysis. The first phase is a local modeling of the peak displacement between two consecutive experiments using distance matrices. Then, from the coefficients of these matrices, a value graph containing the a priori set of possible paths used by these peaks is generated. On this set, the minimization under constraint of the target function by a heuristic approach provides a solution to the peak-tracking problem. This approach has been named GAPT, standing for General Algorithm for NMR Peak Tracking. It has been validated in numerous simulations resembling those encountered in NMR spectroscopy. We show the robustness and limits of the method for situations with many peak-picking errors and with a high local density of peaks. It is then applied to the case of a temperature study of the NMR spectrum of the Lipid Transfer Protein (LTP).

8.
Motivation: Mass spectrometry (MS), such as the surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) MS, provides a potentially promising proteomic technology for biomarker discovery. An important matter for such a technology to be used routinely is its reproducibility. It is of significant interest to develop quantitative measures to evaluate the quality and reliability of different experimental methods. Results: We compare the quality of SELDI-TOF MS data using unfractionated, fractionated plasma samples and abundant protein depletion methods in terms of the numbers of detected peaks and reliability. Several statistical quality-control and quality-assessment techniques are proposed, including the Graeco–Latin square design for the sample allocation on a Protein chip, the use of the pairwise Pearson correlation coefficient as the similarity measure between the spectra in conjunction with multi-dimensional scaling (MDS) for graphically evaluating similarity of replicates and assessing outlier samples; and the use of the reliability ratio for evaluating reproducibility. Our results show that the number of peaks detected is similar among the three sample preparation technologies, and the use of the Sigma multi-removal kit does not improve peak detection. Fractionation of plasma samples introduces more experimental variability. The peaks detected using the unfractionated plasma samples have the highest reproducibility as determined by the reliability ratio. Availability: Our algorithm for assessment of SELDI-TOF experiment quality is available at http://www.biostat.harvard.edu/~xlin Contact: harezlak{at}post.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Thomas Lengauer
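
The pairwise Pearson correlation used here as a similarity measure between spectra can be computed as in the following sketch. It covers only the correlation matrix (the MDS step is not shown), it assumes all spectra are sampled on the same m/z grid, and the function names and toy intensity vectors are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return cov / math.sqrt(var_a * var_b)


def pairwise_correlations(spectra):
    """Symmetric matrix of Pearson correlations between all spectra."""
    n = len(spectra)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(spectra[i], spectra[j])
            matrix[i][j] = r
            matrix[j][i] = r
    return matrix


# Toy intensity vectors: two proportional replicates and one reversed profile.
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
corr = pairwise_correlations(replicates)
```

One common way to pass such a matrix to MDS is to convert it to dissimilarities such as 1 - r, so that replicates with low correlation to all others stand out as candidate outliers.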

9.
MOTIVATION: Surface-enhanced laser desorption and ionization (SELDI) time of flight (TOF) is a mass spectrometry technology. The key features in a mass spectrum are its peaks. In order to locate the peaks and quantify their intensities, several pre-processing steps are required. Though different approaches to perform pre-processing have been proposed, there is no systematic study that compares their performance. RESULTS: In this article, we present the results of a systematic comparison of various popular packages for pre-processing of SELDI-TOF data. We evaluate their performance in terms of two of their primary functions: peak detection and peak quantification. Regarding peak quantification, the performance of the algorithms is measured in terms of reproducibility. For peak detection, the comparison is based on sensitivity and false discovery rate. Our results show that for spectra generated with low laser intensity, the software developed by Ciphergen Biosystems (ProteinChip Software 3.1 with the additional tool Biomarker Wizard) produces relatively good results for both peak quantification and detection. On the other hand, for the data produced with either medium or high laser intensity, none of the methods show uniformly better performances under both criteria. Our analysis suggests that an advantageous combination is the use of the packages MassSpecWavelet and PROcess, the former for peak detection and the latter for peak quantification.

10.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
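
The claim that wavelet-space detection needs no prior baseline removal can be illustrated with a much-reduced sketch: a single-scale transform with a Mexican hat (Ricker) wavelet followed by a local-maximum search. The published algorithm works across many scales, and this sketch does not reproduce that; the function names, the scale, the threshold, and the synthetic signal are all assumptions.

```python
import math


def ricker_kernel(scale, half_width):
    """Sampled Mexican hat wavelet, shifted so its samples sum to zero."""
    kernel = [
        (1 - (k / scale) ** 2) * math.exp(-0.5 * (k / scale) ** 2)
        for k in range(-half_width, half_width + 1)
    ]
    mean = sum(kernel) / len(kernel)
    # A zero-sum kernel maps any constant baseline to (numerically) zero.
    return [w - mean for w in kernel]


def cwt_at_scale(signal, scale):
    """Wavelet coefficients of `signal` at one scale (edges are clamped)."""
    half = int(5 * scale)
    kernel = ricker_kernel(scale, half)
    n = len(signal)
    coeffs = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # repeat edge samples
            acc += w * signal[j]
        coeffs.append(acc)
    return coeffs


def local_maxima(coeffs, threshold):
    """Indices where the coefficients peak above `threshold`."""
    return [
        i
        for i in range(1, len(coeffs) - 1)
        if coeffs[i] > threshold
        and coeffs[i] >= coeffs[i - 1]
        and coeffs[i] > coeffs[i + 1]
    ]


# Synthetic spectrum: one Gaussian peak at index 50 on a constant baseline.
peak_only = [100.0 * math.exp(-0.5 * ((i - 50) / 3.0) ** 2) for i in range(101)]
with_baseline = [v + 10.0 for v in peak_only]

coeffs = cwt_at_scale(with_baseline, scale=3.0)
detected = local_maxima(coeffs, threshold=50.0)
```

Because the kernel sums to zero, the coefficients for `with_baseline` and `peak_only` agree to within rounding error, so in this toy case the constant offset does not need to be subtracted before the peak is located.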

11.
Surface-enhanced laser desorption/ionization (SELDI) time of flight (TOF) is a mass spectrometry technology for measuring the composition of a sampled protein mixture. A mass spectrum contains peaks corresponding to proteins in the sample. The peak areas are proportional to the measured concentrations of the corresponding proteins. Quantifying peak areas is difficult for existing methods because peak shapes are not constant across a spectrum and because peaks often overlap. We present a new method for quantifying peak areas. Our method decomposes a spectrum into peaks and a baseline using so-called statistical finite mixture models. We illustrate our method in detail on 8 samples from culture media of adipose tissue and globally on 64 samples from serum to compare our method to the standard Ciphergen method. Both methods give similar estimates for singleton peaks, but not for overlapping peaks. The Ciphergen method overestimates the heights of such peaks while our method still gives appropriate estimates. Peak quantification is an important step in pre-processing SELDI-TOF data and improvements therein will pay off in the later biomarker discovery phase.

12.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other by utilizing extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments, revealed similarities between mass spectra, and was a useful tool to highlight peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification using peptide mass fingerprinting. Complex mass spectra consisting of more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically found to be larger than the number of peaks that matched to calculated peak masses in databases, thus modified peaks are probably among the shared peptides. Clustering increased the number of peaks associated with a given protein.

13.
Magnetic anomaly detection (MAD) is a passive approach for detecting a ferromagnetic target, and its performance is often limited by external noise. One major noise source is fractal noise (also called 1/f noise), which has a power spectral density of 1/f^α (0<α<2) and is non-stationary, self-similar, and long-range correlated. Because the orthonormal wavelet decomposition can play the role of a Karhunen-Loève-type expansion for 1/f-type signals through its decorrelation abilities, an effective energy detection method based on the undecimated discrete wavelet transform (UDWT) is proposed in this paper. First, the foundations of magnetic anomaly detection and the UDWT are introduced briefly, and a possible detection system based on a giant magneto-impedance (GMI) magnetic sensor is also presented. Then our proposed UDWT-based energy detection is described in detail, and the theoretical probabilities of false alarm and detection for a given detection threshold are presented. Notably, our method requires no a priori assumptions regarding the ferromagnetic target or the magnetic noise probability, and, unlike the discrete wavelet transform (DWT), the UDWT is shift invariant. Finally, some simulations are performed, and the results show that the detection performance of our proposed detector is better than that of the conventional energy detector even in Gaussian white noise, especially when the spectral parameter α is less than 1.0. In addition, a real-world experiment was carried out to demonstrate the advantages of the proposed method.

14.
15.
Assignment of physical meaning to mass spectrometry (MS) data peaks is an important scientific challenge for metabolomics investigators. Improvements in instrumental mass accuracy reduce the number of spurious database matches; however, this alone is insufficient for accurate, unique high-throughput assignment. We present a method for clustering MS instrumental artifacts and a stochastic local search algorithm for the automated assignment of large, complex MS-based metabolomic datasets. Artifact peaks and their associated source peaks are grouped into “instrumental clusters.” Instrumental clusters, peaks grouped together by shared peak shape in the temporal domain, serve as a guide for the number of assignments necessary to completely explain a given dataset. We refine mass-only assignments through the intersection of peak correlation pairs with a database of biochemically relevant interaction pairs. Further refinement is achieved through a stochastic local search optimization algorithm that selects individual assignments for each instrumental cluster. The algorithm works by choosing the peak assignment that maximally explains the connectivity of a given cluster. We demonstrate that this methodology provides a significant advantage over standard methods for the assignment of metabolites in a UPLC-MS diabetes dataset. Electronic supplementary material The online version of this article (doi:) contains supplementary material, which is available to authorized users.

16.
The SELDI-TOF mass spectrometer's compact size and automated, high-throughput design have been attractive to clinical researchers, and the platform has seen steady use in biomarker studies. Despite new algorithms and preprocessing pipelines that have been developed to address reproducibility issues, visual inspection of the results of SELDI spectra preprocessing by the best algorithms still shows miscalled peaks and systematic sources of error. This suggests that there continue to be problems with SELDI preprocessing. In this work, we study the preprocessing of SELDI in detail and introduce improvements. While many algorithms, including the vendor-supplied software, can identify peak clusters of specific mass (or m/z) in groups of spectra with high specificity and low false discovery rate (FDR), the algorithms tend to underperform in estimating the exact prevalence and intensity of peaks in those clusters. Thus group differences that at first appear very strong are shown, after careful and laborious hand inspection of the spectra, to be less than significant. Here we introduce a wavelet/neural network based algorithm which mimics what a team of expert, human users would call for peaks in each of several hundred spectra in a typical SELDI clinical study. The wavelet denoising part of the algorithm optimally smoothes the signal in each spectrum according to an improved suite of signal processing algorithms previously reported (the LibSELDI toolbox under development). The neural network part of the algorithm combines those results with the raw signal and a training dataset of expertly called peaks, to call peaks in a test set of spectra with approximately 95% accuracy. The new method was applied to data collected from a study of cervical mucus for the early detection of cervical cancer in HPV-infected women. The method shows promise in addressing the ongoing SELDI reproducibility issues.

17.
Summary: A novel algorithm for removing baseline distortions in NMR spectra is presented. The algorithm approximates the baseline as the median of the noise extrema. Consequently, the method does not require that NMR peaks be discriminated from noise peaks. In addition, no assumptions regarding the source or functional form of the distortion are made. The algorithm is shown to remove the baseline artifacts present in a particularly distorted NOESY spectrum and to reveal peaks which had been obscured by the artifacts. The parameters and spectral characteristics (signal-to-noise ratio, NMR peak density, peak linewidths) governing the resolution of the calculated baselines are also explored.

18.
Mass spectrometry data from high-resolution time-of-flight instruments often contain a vast number of noninformative background-ion peaks whose signal is similar to that of peptide peaks. Consequently, seeking peptide signal in these spectra based on a signal-to-noise ratio will remove signal peaks as well as noise. This work characterizes the background as a precursor to seeking peptide-related features. Robust-regression methods are used to estimate distributions for null (background) peak intensities and locations. Defining signal peaks as outliers with respect to these distributions leads to more precision in detecting the isotopic envelope of peaks from low-abundance peptides in high-resolution spectra.

19.
This paper addresses the possibility of mathematically partitioning and processing urine 1H-NMR spectra to enhance the efficiency of the subsequent multivariate data analysis in the context of metabolic profiling of a toxicity study. We show that processing the NMR data with the peak alignment using reduced set mapping (PARS) algorithm and using a sparse representation of the data preserves the information contained in the original NMR data, with retained resolution but free of the problem of peak shifts. We can now describe a method for differential expression analysis of NMR spectra by using prior knowledge, i.e., the onset of dosing, a partitioning not possible to achieve using raw or bucketed data. In addition, we also outline a scheme for soft removal of “biological noise” from the aligned data: exhaustive bio-noise subtraction (EBS). The result is a straightforward protocol for detection of peaks that appear as a consequence of the drug response. In other words, it is possible to elucidate peak origin, either from endogenous substances or from the administered drug/biomarkers. The partition of data originating from the normally regulating metabolome can, furthermore, be analyzed free of the superimposed biological noise. The proposed protocol results in enhanced interpretability of the processed data, i.e., a more refined metabolic trace, simplification of detection of consistent biomarkers, and a simplified search for metabolic end products of the administered drug.

20.

Motivation

Mass spectrometry is a high-throughput, fast, and accurate method of protein analysis. Using the peaks detected in spectra, we can compare a normal group with a disease group. However, the spectrum is complicated by scale shifting and is also full of noise. Such shifting makes the spectra non-stationary, so they need to be aligned before comparison. Consequently, the preprocessing of the mass data plays an important role during the analysis process. Noise in mass spectrometry data arises from many sources and at many frequencies. A powerful data preprocessing method is needed to remove the large amount of noise in mass spectrometry data.

Results

The Hilbert-Huang Transformation (HHT) is a transformation for non-stationary signals used in signal processing. We provide a novel preprocessing algorithm that can deal with MALDI and SELDI spectra. We use the Hilbert-Huang Transformation to decompose the spectrum and filter out the very high-frequency and very low-frequency signal components. We think the noise in mass spectrometry comes from many sources and that some of it can be removed by analysis of the signal in the frequency domain. Since a protein in the spectrum is expected to appear as a unique peak, its frequency content should lie in the middle part of the frequency domain and will not be removed. The results show that HHT, when used for preprocessing, is generally better than other preprocessing methods. The approach is not only able to detect peaks successfully but also has the advantage of denoising spectra efficiently, especially when the data are complex. The drawback of HHT is that it takes much longer to process than the wavelet and traditional methods. However, the processing time is still manageable and is worth the wait to obtain high-quality data.
