Similar literature
1.
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) was used to determine elemental and biomolecular ions from isolated protein samples. We identified a set of 23 mass-to-charge ratio (m/z) peaks that represent signatures for distinguishing biological samples. The 23 peaks were identified by Singular Value Decomposition (SVD) and Canonical Analysis (CA) to find the underlying structure in the complex mass-spectra data sets. From this modified data, SVD was used to identify sets of m/z peaks, and we used these patterns from the TOF-SIMS data to predict the biological source from which individual mass spectra were generated. The signatures were validated using an additional data set different from the initial training set used to identify the signatures. We present a simple method to identify multiple variables required for sample classification based on mass spectra that avoids overfitting. This is important in a variety of studies using mass spectrometry, including the ability to identify proteins in complex mixtures and for the identification of new biomarkers.

2.
Hexadecadien-1-ol and its derivatives (acetate and aldehyde) with a conjugated diene system have recently been identified from a pheromone gland extract of the persimmon fruit moth (Stathmopoda masinissa), a pest insect of persimmon fruits distributed in East Asia. The alcohol and acetate showed their base peaks at m/z 79 in a GC-MS analysis by electron impact ionization, but the aldehyde produced a unique base peak at m/z 84, suggesting a 4,6-diene structure. To confirm this inference, four geometrical isomers of each 4,6-hexadecadienyl compound were synthesized by two different routes in which one of two double bonds was furnished in a highly stereoselective manner. Separation of the two isomers synthesized together by each route was readily accomplished by preparative HPLC. Their mass spectra coincided well with those of natural components, indicating that they were available for use as authentic standards for determining the configuration of the natural pheromone. Furthermore, other hexadecadienyl compounds, including the conjugated diene system between the 3- and 10-positions, were synthesized to accumulate the spectral data of pheromone candidates. 5,7-Hexadecadienal interestingly showed the base peak at m/z 80; meanwhile, the base peaks of its alcohol and acetate were detected at m/z 79 like the corresponding 4,6-dienes. The base peaks of all 6,8-, 7,9-, and 8,10-dienes universally appeared at m/z 67 like 9,11-, 10,12-, and 13,15-dienes, the spectra of which have already been published. Although 3,5-hexadecadienal was not prepared, base peaks at m/z 67 and 79 were recorded for the alcohol and acetate, respectively.

3.
A high-throughput software pipeline for analyzing high-performance mass spectral data sets has been developed to facilitate rapid and accurate biomarker determination. The software exploits the mass precision and resolution of high-performance instrumentation, bypasses peak-finding steps, and instead uses discrete m/z data points to identify putative biomarkers. The technique is insensitive to peak shape, and works on overlapping and non-Gaussian peaks which can confound peak-finding algorithms. Methods are presented to assess data set quality and the suitability of groups of m/z values that map to peaks as potential biomarkers. The algorithm is demonstrated with serum mass spectra from patients with and without ovarian cancer. Biomarker candidates are identified and ranked by their ability to discriminate between cancer and noncancer conditions. Their discriminating power is tested by classifying unknowns using a simple distance calculation, and a sensitivity of 95.6% and a specificity of 97.1% are obtained. In contrast, the sensitivity of the ovarian cancer blood marker CA125 is approximately 50% for stage I/II and approximately 80% for stage III/IV cancers. While the generalizability of these markers is currently unknown, we have demonstrated the ability of our analytical package to extract biomarker candidates from high-performance mass spectral data.
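As an illustration of "classifying unknowns using a simple distance calculation", the sketch below assigns a sample to the nearer of two class centroids and then computes sensitivity and specificity. The abstract does not say which distance or reference points the authors used, so the Euclidean nearest-centroid rule, the function names, and the labels "cancer"/"control" are all assumptions made for this example.

```python
from math import dist

def centroid(rows):
    """Mean intensity vector of a group of spectra (one row per spectrum)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def classify(sample, cancer_centroid, control_centroid):
    """Assumed rule: label the sample by the nearer centroid (Euclidean)."""
    if dist(sample, cancer_centroid) < dist(sample, control_centroid):
        return "cancer"
    return "control"

def sensitivity_specificity(truth, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == "cancer" and p == "cancer" for t, p in pairs)
    fn = sum(t == "cancer" and p == "control" for t, p in pairs)
    tn = sum(t == "control" and p == "control" for t, p in pairs)
    fp = sum(t == "control" and p == "cancer" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

Each row here would hold the intensities at the selected candidate m/z values; how those values are chosen and ranked is the part of the pipeline this sketch does not cover.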

4.
Two chemical ionization mass spectrometric methods were developed for direct determination of deuterium in water in the range of 0.0-0.6% 2H2O. One of them utilizes the batch inlet system, methane as the reagent gas, and the peak matching device of a magnetic sector mass spectrometer. The second one utilizes the directly-coupled gas chromatograph of a quadrupole mass spectrometer and computer control for ion selection and data processing. In this method the water itself serves as the reagent gas. The deuterium concentration is calculated from the ratio of ion intensities at m/z 20 (2HH2O+) and m/z 19 (H3O+). We have used these methods to determine total body water in 350 human subjects, which entailed making 900 measurements over a period of four years. Comparisons were made in 200 subjects of our results with those obtained by the creatinine method. No significant differences were found.
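One plausible way to turn the m/z 20 to m/z 19 intensity ratio into a concentration is a linear calibration against standards of known 2H2O content. The abstract does not describe the calibration actually used, so the straight-line model, the function names, and any standard values passed in are assumptions of this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def deuterium_percent(i20, i19, slope, intercept):
    """Invert the calibration line: percent 2H2O from the ion intensity ratio."""
    ratio = i20 / i19
    return (ratio - intercept) / slope
```

With hypothetical standards at 0.0, 0.2, 0.4 and 0.6% 2H2O, `fit_line(percents, ratios)` gives the slope and intercept that `deuterium_percent` then applies to an unknown sample.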

5.
Peak detection is a key step in the analysis of SELDI-TOF-MS spectra, but the current default method has low specificity and poor peak annotation. To improve data quality, scientists still have to validate the identified peaks visually, a tedious and time-consuming process, especially for large data sets. Hence, there is a genuine need for methods that minimize manual validation. We have previously reported a multi-spectral signal detection method, called RS for 'region of significance', with improved specificity. Here we extend it to include a peak quantification algorithm based on annotated regions of significance (ARS). For each spectral region flagged as significant by RS, we first identify a dominant spectrum for determining the number of peaks and the m/z region of these peaks. From each m/z region of peaks, a peak template is extracted from all spectra via the principal component analysis. Finally, with the template, we estimate the amplitude and location of the peak in each spectrum with the least-squares method and refine the estimation of the amplitude via the mixture model. We have evaluated the ARS algorithm on patient samples from a clinical study. Comparison with the standard method shows that ARS (i) inherits the superior specificity of RS, and (ii) gives more accurate peak annotations than the standard method. In conclusion, we find that ARS alleviates the main problems in the preprocessing of SELDI-TOF spectra. The R-package ProSpect that implements ARS is freely available for academic use at http://www.meb.ki.se/ yudpaw.
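The least-squares step can be sketched as follows: given a fixed peak template, the best-fitting amplitude within a window has a closed form, and the location can be found by sliding the template along the spectrum. This is a simplified illustration, not the ProSpect implementation. The template is taken as given (the PCA extraction and the mixture-model refinement are omitted), and the function names are hypothetical.

```python
def fit_amplitude(template, window):
    """Least-squares amplitude a minimizing sum((window - a * template)^2)."""
    numerator = sum(t * s for t, s in zip(template, window))
    denominator = sum(t * t for t in template)
    return numerator / denominator

def fit_peak(template, spectrum):
    """Slide the template along the spectrum and return (offset, amplitude).

    The offset is chosen where the template correlates most strongly with
    the spectrum (a matched-filter criterion, assumed for this sketch).
    """
    width = len(template)
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(spectrum) - width + 1):
        window = spectrum[offset:offset + width]
        score = sum(t * s for t, s in zip(template, window))
        if score > best_score:
            best_offset, best_score = offset, score
    window = spectrum[best_offset:best_offset + width]
    return best_offset, fit_amplitude(template, window)
```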

6.
Serum protein profiling by mass spectrometry has attracted attention as a promising technology in oncoproteomics. We performed a systematic review of published reports on protein profiling as a diagnostic tool for breast cancer. The MEDLINE, EMBASE, and COCHRANE databases were searched for original studies reporting discriminatory protein peaks for breast cancer as either protein identity or as m/z values in the period from January 1995 to October 2006. To address the important aspect of reproducibility of mass spectrometry data across different clinical studies, we compared the published lists of potential discriminatory peaks with those peaks detected in an original MALDI MS protein profiling study performed by our own research group. A total of 20 protein/peptide profiling studies were eligible for inclusion in the systematic review. Only 3 reports included information on protein identity. Although the studies revealed a considerable heterogeneity in relation to experimental design, biological variation, preanalytical conditions, methods of computational data analysis, and analytical reproducibility of profiles, we found that 45% of peaks previously reported to correlate with breast cancer were also detected in our experimental study. Furthermore, 25% of these redetected peaks also showed a significant difference between cases and controls in our study. Thus, despite known problems related to reproducibility, we were able to demonstrate overlap in peaks between clinical studies indicating some convergence toward a set of common discriminating, reproducible peaks for breast cancer. These peaks should be further characterized for identification of the protein identity and validated as biomarkers for breast cancer.

7.
For our analysis of the data from the First Annual Proteomics Data Mining Conference, we attempted to discriminate between 24 disease spectra (group A) and 17 normal spectra (group B). First, we processed the raw spectra by (i) correcting for additive sinusoidal noise (periodic on the time scale) affecting most spectra, (ii) correcting for the overall baseline level, (iii) normalizing, (iv) recombining fractions, and (v) using variable-width windows for data reduction. Also, we identified a set of polymeric peaks (at multiples of 180.6 Da) that is present in several normal spectra (B1-B8). After data processing, we found the intensities at the following mass to charge (m/z) values to be useful discriminators: 3077, 12 886 and 74 263. Using these values, we were able to achieve an overall classification accuracy of 38/41 (92.6%). Perfect classification could be achieved by adding two additional peaks, at 2476 and 6955. We identified these values by applying a genetic algorithm to a filtered list of m/z values using Mahalanobis distance between the group means as a fitness function.
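The fitness function named here, the Mahalanobis distance between the two group means, can be sketched in plain Python. The abstract does not state which covariance estimate was used, so the pooled within-group covariance below is an assumption, as are the function names. The genetic algorithm itself is not shown; `fitness` only scores one candidate subset of m/z columns.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance (assumed estimator for this sketch)."""
    p = len(group_a[0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for row in rows:
            d = [x - mi for x, mi in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s / dof for s in row] for row in scatter]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def mahalanobis_between_means(group_a, group_b):
    """sqrt(d^T S^-1 d), with d the difference of the group mean vectors."""
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    weights = solve(pooled_covariance(group_a, group_b), diff)
    return sum(d * w for d, w in zip(diff, weights)) ** 0.5

def fitness(group_a, group_b, columns):
    """Score a candidate subset of m/z columns; larger means better separation."""
    pick = lambda rows: [[row[c] for c in columns] for row in rows]
    return mahalanobis_between_means(pick(group_a), pick(group_b))
```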

8.
We have developed an automated procedure for aligning peaks in multiple TOF spectra that eliminates common timing errors and small variations in spectrometer output. Our method incorporates high-resolution peak detection, re-binning, and robust linear data fitting in the time domain. This procedure aligns label-free (uncalibrated) peaks to minimize the variation in each peak's location from one spectrum to the next, while maintaining a high number of degrees of freedom. We apply our method to replicate pooled-serum spectra from multiple laboratories and increase peak precision (t/sigma(t)) to values limited only by small random errors (with sigma(t) less than one time count in 89 out of 91 instances, 13 peaks in seven datasets). The resulting high precision allowed for an order of magnitude improvement in peak m/z reproducibility. We show that the CV for m/z is 0.01% (100 ppm) for 12 out of the 13 peaks that were observed in all datasets between 2995 and 9297 Da.
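The reproducibility figure quoted above is a coefficient of variation (standard deviation divided by mean) expressed in parts per million, where 0.01% equals 100 ppm. A minimal sketch of that arithmetic, with hypothetical function names and the sample standard deviation assumed:

```python
from statistics import mean, stdev

def cv_ppm(mz_values):
    """Coefficient of variation of replicate m/z readings, in ppm."""
    return stdev(mz_values) / mean(mz_values) * 1e6

def percent_to_ppm(percent):
    """1% is 10,000 ppm, so 0.01% corresponds to 100 ppm."""
    return percent * 1e4
```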

9.
Peptide mass fingerprinting, regardless of becoming complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy including the participation of matched peak intensity redistribution to handle shared peak intensities and 440 parameters were generated to digitalize each feature-matching pattern. A high performance for an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features through cross training and validating a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification. 
Firstly, inherent attributes showed comparable or even greater robustness than other explicit. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.
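The F1-measure reported in this abstract is the harmonic mean of precision and sensitivity (recall). The short sketch below, with a hypothetical function name, shows that the quoted 97% precision and 82% sensitivity are consistent with the quoted 89%.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract: precision 97%, sensitivity 82%.
reported_f1 = f1_measure(0.97, 0.82)
```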

10.
11.
The plasma peptide component (PPC) from ten melanoma (Mel), breast cancer (BC) and healthy individuals was examined by a combination of RP-HPLC, surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) and tandem mass spectrometry. A three peak pattern (2023, 2039, 2053.5 m/z) was primarily observed in melanoma. Two peaks (2236.1 and 2356.3 m/z) were found only in BC samples. Fibrinogen alpha and inter-alpha-trypsin inhibitor heavy chain H4 fragments were absent in both tumor samples.

12.
MOTIVATION: Independent component analysis (ICA) is a signal processing technique that can be utilized to recover independent signals from a set of their linear mixtures. We propose ICA for the analysis of signals obtained from large proteomics investigations such as clinical multi-subject studies based on MALDI-TOF MS profiling. The method is validated on simulated and experimental data to demonstrate its capability of correctly extracting protein profiles from MALDI-TOF mass spectra. RESULTS: The comparison on peak detection with an open-source and two commercial methods shows its superior reliability in reducing the false discovery rate of protein peak masses. Moreover, the integration of ICA and statistical tests for detecting the differences in peak intensities between experimental groups makes it possible to identify protein peaks that could be indicators of a diseased state. This data-driven approach proves to be a promising tool for biomarker-discovery studies based on MALDI-TOF MS technology. AVAILABILITY: The MATLAB implementation of the method described in the article and both simulated and experimental data are freely available at http://www.unich.it/proteomica/bioinf/.

13.
A peak is a pair of real values (x,y), where x is the time when a peak of height y is registered. In the peak alignment problem, we are given two sequences of peaks, and our task is to align the sequences allowing some basic edit operations on the peaks. We study an instance of the peak alignment problem that arises in the analysis of Mass Spectrometry data in Systems Biology. There the measurement technique guarantees that two peaks (x,y), (x',y') can only be considered the same if x is close enough to x', and y is close enough to y'. We review some methods to do alignment under such restrictions on matches.
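One simple way to align two peak sequences under these closeness restrictions is a dynamic program in the style of longest common subsequence: two peaks may be matched only if both coordinates agree within a tolerance, and unmatched peaks are skipped (insert/delete). The abstract does not name the methods it reviews, so this formulation, the tolerances `dx` and `dy`, and the function names are assumptions. Both sequences are assumed to be sorted by time.

```python
def compatible(p, q, dx, dy):
    """Peaks (x, y) and (x', y') may match only if both coordinates are close."""
    return abs(p[0] - q[0]) <= dx and abs(p[1] - q[1]) <= dy

def align_peaks(a, b, dx, dy):
    """Return index pairs (i, j) of a maximum-size order-preserving matching."""
    n, m = len(a), len(b)
    # best[i][j] = maximum number of matches between a[i:] and b[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            if compatible(a[i], b[j], dx, dy):
                best[i][j] = max(best[i][j], 1 + best[i + 1][j + 1])
    # Trace back through the table to recover the matched pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if compatible(a[i], b[j], dx, dy) and best[i][j] == 1 + best[i + 1][j + 1]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

Maximizing the number of matches is only one possible objective; a cost-based variant could instead penalize the time and height differences of each matched pair.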

14.
Scherl A, Tsai YS, Shaffer SA, Goodlett DR. Proteomics 2008, 8(14): 2791-2797
Although mass spectrometers are capable of providing high mass accuracy data, assignment of true monoisotopic precursor ion mass is complicated during data-dependent ion selection for LC-MS/MS analysis of complex mixtures. The complication arises when chromatographic peak widths for a given analyte exceed the time required to acquire a precursor ion mass spectrum. The result is that many measured monoisotopic masses are misassigned due to calculation from a single mass spectrum with poor ion statistics based on only a fraction of the total available ions for a given analyte. Such data in turn produce errors in automated database searches, where precursor m/z value is one search parameter. We propose here a postacquisition approach to correct misassigned monoisotopic m/z values that involves peak detection over the entire elution profile and correction of the precursor ion monoisotopic mass. As a result of using this approach to reprocess shotgun proteomic data we increased peptide sequence assignments by 10% while reducing the estimated false positive ratio from 1 to 0.2%. We also show that 4% of the salvaged identifications may be accounted for by correction of mixed tandem mass spectra resulting from fragmentation of multiple peptides simultaneously, a situation which we refer to as accidental CID.

15.
We derive the optimal number of peaks (defined as the minimum number that provides the required efficiency of spectra identification) in the theoretical spectra as a function of (i) the experimental accuracy, sigma, of the measured ratio m/z; (ii) experimental spectrum density; (iii) size of the database; (iv) number of peaks in the theoretical spectra; and (v) types of ions that the peaks represent. We show that if theoretical spectra are constructed including b and y ions alone, then for sigma = 0.5, which is typical for high-throughput data, peptide chains of eight amino acids or longer can be identified based on the positions of peaks alone, at a rate of false identification below 1%. To discriminate between shorter peptides, additional (e.g., intensity-inferred) information is necessary. We derive the dependence of the probability of false identification on the number of peaks in the theoretical spectra and on the types of ions that the peaks represent. Our results suggest that the class of mass spectrum identification problems, for which more elaborate development of fragmentation rules (such as intensity model) is required, can be reduced to the problems that involve homologous peptides.
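A theoretical spectrum built from b and y ions alone can be sketched as below, together with a count of theoretical peaks that fall within sigma of an observed peak. The residue masses are standard monoisotopic values. Treating sigma as an absolute m/z tolerance, restricting to singly charged ions, and the function names are assumptions of this sketch, which does not reproduce the paper's probability model.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def by_ions(peptide):
    """Singly charged b and y fragment m/z values for a peptide string."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, running = [], 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), N-terminal fragments
        running += m
        b_ions.append(running + PROTON)
    y_ions, running = [], WATER
    for m in reversed(masses[1:]):   # y1 .. y(n-1), C-terminal fragments
        running += m
        y_ions.append(running + PROTON)
    return b_ions, y_ions

def count_matches(theoretical, observed, sigma=0.5):
    """Number of theoretical peaks with an observed peak within sigma."""
    return sum(any(abs(t - o) <= sigma for o in observed) for t in theoretical)
```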

16.
AIM: Application of MALDI-TOF MS for characterization of strains of Salmonella enterica subsp. enterica. METHODS AND RESULTS: Whole cells were analysed by MALDI-TOF MS. Spectra with a maximum of 500 mass peaks between (m/z) 0 and 25000 were examined for consensus peaks manually and by a computer software algorithm. Consensus peaks were observed by both methods for spectra of Salmonella enterica serovars Derby, Hadar, Virchow, Anatum, Typhimurium and Enteritidis. CONCLUSIONS: Differences in numbers of consensus peaks in spectra obtained by manual and computer comparison indicated that development of the software involving statistical analysis of peak accuracy is necessary. SIGNIFICANCE AND IMPACT OF THE STUDY: Development of an analysis system for peak profiles in whole cell MALDI-TOF MS spectra to enable intra and interlaboratory comparison.

17.
Novel multi-hydroxylated primary fatty amides produced by direct amidation of 7,10-dihydroxy-8(E)-octadecenoic acid and 7,10,12-trihydroxy-8(E)-octadecenoic acid were characterized by GC-MS and NMR. The amidation reactions were catalyzed by immobilized Pseudozyma (Candida) antarctica lipase B (Novozym 435) in organic solvent with ammonium carbamate. The mass spectra of the underivatized products exhibited characteristic primary amide peaks at m/z 59 and m/z 72 that differed in peak intensities. Other peaks present were consistent with cleavage next to the hydroxyl groups. The mass spectra of the silylated amidation products showed the correct molecular weight and the typical fragmentation pattern of silylated hydroxy compounds. The mass spectra, together with proton and 13C NMR data, suggest that the products of lipase-catalyzed direct amidation of 7,10-dihydroxy-8(E)-octadecenoic acid and 7,10,12-trihydroxy-8(E)-octadecenoic acid are 7,10-dihydroxy-8(E)-octadecenamide and 7,10,12-trihydroxy-8(E)-octadecenamide, respectively. Amidation of multi-hydroxylated fatty acids increased the melting point, but reduced the surface active property of the resulting primary amides.

18.
Protein profiling in blood serum by fractionation and MS analysis has been applied in mice to assess its applicability as a fast, economical alternative to current DNA and RNA analyses for diagnosis of neuromuscular disorders. Mass spectra of peptides and proteins were generated using serum from dystrophin-deficient mdx and control mice by WCX ClinProt bead fractionation, followed by MALDI-MS. Double cross-validatory linear discriminant and logistic regression data analysis methods were compared with a new Bayesian logistic regression method. These were evaluated on their ability to discriminate between healthy and dystrophic samples, and to identify the discriminatory peaks in the mass spectra. All three approaches classified the spectra with comparable misclassification rates (between 18.4 and 20.6%), with much overlap between the differential peaks identified between the methods. The differential peak pattern from the Bayesian method was sparser and easier to interpret than from the other two methods, without compromising classifying strength. One of the two main differentiating peaks at m/z 3908 was identified as an N-terminal peptide of coagulation Factor XIIIa, previously identified in human serum. This work underlines the translational aspect of serum protein profiling in mice and supports a further study with serum from patients with neuromuscular disorders.

19.
A ganglioside of unknown structure (ganglioside X) was purified from chicken brain at embryonic day 12 (E12) and characterized for its structure. Ganglioside X was reactive with a monoclonal antibody A2B5 and migrated below GH1c on thin-layer chromatography (TLC). Extensive treatment of ganglioside X with Clostridium perfringens sialidase produced a single ganglioside product. This ganglioside was identified as GM1 based upon its chromatographic mobility and reactivity to cholera toxin B subunit and anti-GM1 antibody. Partial hydrolysis of ganglioside X by sialidase generated several degradation products including GH1c, GP1c, and GQ1c. Electrospray ionization (ESI)-mass spectrometry (MS) of the permethylated derivative of ganglioside X produced a triply charged parent ion peak at m/z 1355, which corresponded with the gangliotetraose oligosaccharide structure having seven sialic acids and ceramide with the molecular mass of 566 (as non-methylated form). Collision-induced dissociation (CID)-MS(2) showed fragment ions including those at m/z 1066 and 1931; these two ions matched the structures of (NeuAc)(3)-Gal-Glc-Cer and (NeuAc)(4)-Gal-GalNAc, respectively. These structures were confirmed by CID-MS(3) of the corresponding peaks. Based upon these findings, the structure of ganglioside X was identified as NeuAc-NeuAc-NeuAc-NeuAc-Galbeta1-3GalNAcbeta1-4(NeuAc-NeuAc-NeuAcalpha2-3)Galbeta1-4Glcbeta1-1'Cer. This ganglioside was designated as GS1c. A developmental study demonstrated that GS1c was expressed in chicken brain during a period from E6 to E13 and thereafter decreased rapidly in its concentration. The present study suggests that GS1c may play a specific role in early development of chicken brain.

20.
Markey MK, Tourassi GD, Floyd CE. Proteomics 2003, 3(9): 1678-1679
A classification and regression tree (CART) model was trained to classify 41 clinical specimens as disease/nondisease based on 26 variables computed from the mass-to-charge ratio (m/z) and peak heights of proteins identified by mass spectroscopy. The CART model built on all of the specimens (no cross-validation) had an error rate of 4/41 = 10%. The CART model suggests that mass spectra peaks in the 8000-10,000, 20,000-30,000, 45,000-60,000, and >125,000 m/z ranges may be valuable in distinguishing between the disease/nondisease specimens. The area under the receiver operating characteristics curve was 0.80 +/- 0.07 for leave-one-out cross-validation.
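The two summary statistics in this abstract can be computed as sketched below. The area under the ROC curve is obtained from its rank interpretation: the probability that a disease specimen receives a higher score than a nondisease specimen, with ties counted as one half. The scores would come from the held-out predictions of leave-one-out cross-validation; the CART model itself is not shown, and the function names are hypothetical.

```python
def roc_auc(disease_scores, nondisease_scores):
    """AUC as the fraction of (disease, nondisease) pairs ranked correctly."""
    wins = 0.0
    for d in disease_scores:
        for n in nondisease_scores:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(disease_scores) * len(nondisease_scores))

def error_rate_percent(misclassified, total):
    """Misclassification rate as a percentage."""
    return misclassified / total * 100
```

For the figures quoted above, `error_rate_percent(4, 41)` is about 9.8, which rounds to the reported 10%.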
