Similar articles
 20 similar articles found (search time: 562 ms)
1.
Assignment of physical meaning to mass spectrometry (MS) data peaks is an important scientific challenge for metabolomics investigators. Improvements in instrumental mass accuracy reduce the number of spurious database matches; however, this alone is insufficient for accurate, unique high-throughput assignment. We present a method for clustering MS instrumental artifacts and a stochastic local search algorithm for the automated assignment of large, complex MS-based metabolomic datasets. Artifact peaks and their associated source peaks are grouped into “instrumental clusters.” Instrumental clusters, peaks grouped together by shared peak shape in the temporal domain, serve as a guide for the number of assignments necessary to completely explain a given dataset. We refine mass-only assignments through the intersection of peak correlation pairs with a database of biochemically relevant interaction pairs. Further refinement is achieved through a stochastic local search optimization algorithm that selects individual assignments for each instrumental cluster. The algorithm works by choosing the peak assignment that maximally explains the connectivity of a given cluster. We demonstrate that this methodology provides a significant advantage over standard methods for the assignment of metabolites in a UPLC-MS diabetes dataset.

2.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other by utilizing extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments, revealed similarities between mass spectra, and was a useful tool to highlight peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification using peptide mass fingerprinting. Complex mass spectra consisting of more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically found to be larger than the number of peaks that matched to calculated peak masses in databases, thus modified peaks are probably among the shared peptides. Clustering increased the number of peaks associated with a given protein.

3.
The random forest classification method was applied to classify samples from 76 breast cancer patients and 77 controls whose proteomic profile had been obtained using mass spectrometry. The analysis consisted of two stages, the detection of peaks from the profiles and the construction of a classification rule using random forests. Using a peak detection method based on finding common local maxima in the smoothed sample spectra, 444 peaks were detected, reducing to 365 robust peaks found in at least 7 out of 10 random subsets of samples. Subjects were classified as cases or controls using the random forest algorithm applied to the 365 peaks. Based on the prediction of the status of out-of-bag samples, the total error rate was 16.3%, with a sensitivity of 81.6% and a specificity of 85.7%. Measures of importance of each of the peaks were calculated to identify regions of the spectrum influencing the classification, and the four most important peaks were identified as mz3863_13, mz2943_12, mz3193_44 and mz8925_94. Combining initial peak detection with the random forest algorithm provides a high-performance classification system for proteomic data, with unbiased estimates of future performance.
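The three rates reported above can be cross-checked with a short calculation. The abstract does not give the confusion-matrix counts; the values used below (62 of 76 cases and 66 of 77 controls classified correctly) are an assumption, back-calculated as the counts that reproduce the reported percentages.

```python
def classification_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, total error rate) from confusion counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    total = true_pos + false_neg + true_neg + false_pos
    error_rate = (false_neg + false_pos) / total
    return sensitivity, specificity, error_rate


# Assumed counts (not stated in the abstract): 76 cases, 77 controls.
sens, spec, err = classification_rates(true_pos=62, false_neg=14,
                                       true_neg=66, false_pos=11)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} error={err:.1%}")
# prints: sensitivity=81.6% specificity=85.7% error=16.3%
```

Under these assumed counts, 25 of the 153 subjects are misclassified, which matches the reported total error rate.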

4.
Preanalytical variables play a key role in discovery of biomarkers. Although the effect of several preanalytical variables on the mass spectral profiles has been studied extensively, little is known about long-term storage of serum samples. This is important because samples used in case-control or epidemiological studies are usually stored for a long time before analysis. Here we evaluated long-term storage effects on mass spectral peak patterns of serum peptides extracted using weak cation exchange magnetic beads. For this, 20 serum samples stored at −80 °C were divided equally into two groups based on their storage time. We found that intensities of 26 mass spectral peaks significantly varied between these two groups. Intensities of these peaks significantly correlated with storage time. Genetic algorithm-based models generated using these 26 peaks could classify 63 additional samples into these two groups with 100% and 96% accuracy, respectively. We also show that storing samples for 10 months at −80 and −20 °C results in the appearance/disappearance or intensity variation of peaks, some of which were previously reported as disease biomarkers.

5.
Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventional database search approach to protein identification has inherent difficulties in comprehensive detection of protein modifications. In recent years, increasing efforts have been devoted to developing unrestrictive approaches to modification identification, but they often suffer from their lack of speed. This paper presents a statistical algorithm named DeltAMT (Delta Accurate Mass and Time) for fast detection of abundant protein modifications from tandem mass spectra with high-accuracy precursor masses. The algorithm is based on the fact that the modified and unmodified versions of a peptide are usually present simultaneously in a sample and their spectra are correlated with each other in precursor masses and retention times. By representing each pair of spectra as a delta mass and time vector, bivariate Gaussian mixture models are used to detect modification-related spectral pairs. Unlike previous approaches to unrestrictive modification identification that mainly rely upon the fragment information and the mass dimension in liquid chromatography-tandem mass spectrometry, the proposed algorithm makes the most of precursor information. Thus, it is highly efficient while being accurate and sensitive. On two published data sets, the algorithm effectively detected various modifications and other interesting events, yielding deep insights into the data. Based on these discoveries, the spectral identification rates were significantly increased and many modified peptides were identified.
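The pairwise representation described above can be illustrated with a minimal sketch. It is not the DeltAMT implementation: the bivariate Gaussian mixture step is replaced here by simple binning of the mass differences, and the function names, the bin width, and the toy spectra are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def delta_vectors(spectra):
    """Return (delta mass, delta retention time) for every pair of spectra.

    `spectra` is a list of (precursor_mass, retention_time) tuples.
    """
    ordered = sorted(spectra)
    return [(b[0] - a[0], b[1] - a[1]) for a, b in combinations(ordered, 2)]


def mass_shift_counts(spectra, bin_width=0.01):
    """Count how often each binned mass difference occurs (bin index -> count)."""
    return Counter(round(dm / bin_width) for dm, _ in delta_vectors(spectra))


# Toy data (hypothetical): two peptides, each with a copy shifted by 15.9949 Da,
# plus one unrelated spectrum.
spectra = [(1000.0, 20.0), (1015.9949, 18.5),
           (1200.0, 30.0), (1215.9949, 28.4),
           (1500.0, 40.0)]
counts = mass_shift_counts(spectra)
recurring = {idx * 0.01: n for idx, n in counts.items() if n >= 2}
```

Mass differences that recur across many pairs are candidates for a modification; a mixture model over both the mass and the retention-time dimension, as in the abstract, would separate such pairs from random coincidences far more rigorously than this count does.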

6.
Three swamp buffalo bulls aged 1.5, 1.10 and 2 years were subjected to frequent blood sampling every 15 min over a period of 25 h using an indwelling infusion set. Plasma LH and testosterone were quantified by radioimmunoassay. The levels of the two hormones in each individual exhibited episodic and nonrhythmic patterns. The number of LH peaks varied between individuals, ranging from no peak in one bull to 2 in the other two bulls. The mean LH concentrations during the period of study for each bull were 0.74, 0.33 and 1.17 ng/ml. The number of testosterone peaks varied between 1 and 10, and the average testosterone concentrations were 0.1, 0.33 and 0.55 ng/ml from the youngest to the oldest bull, respectively. The testosterone peaks were related to the LH peaks in each individual bull.

7.
Flow cytometry analysis was applied to swine chromosomes prepared from phytohemagglutinin (PHA) stimulated peripheral blood lymphocytes. Flow karyotypes from both sexes and from t(3;7) translocation carrier females were obtained. A certain number of chromosome pairs could be assigned to various peaks. In fact, 13 peaks were observed for 18 autosomal pairs plus X and Y. Moreover, abnormalities owing to the t(3;7) translocation were readily observable. The number of base pairs for chromosomes associated with the various peaks was estimated by comparison with human flow karyotypes. The following four peaks were thus sorted: the peak assumed to represent the translocated chromosome 7 plus the normals associated with it; the corresponding peak from a normal swine; the peak assumed to contain among others the normal chromosome 7; and finally the peak corresponding to swine chromosome 1. Chromosomes of each peak were collected on Pall Biodyne membrane. Following appropriate denaturation and prehybridization, the four samples were hybridized with a human leucocyte antigen (HLA) class I 32P-labelled cDNA probe, representing most of the coding sequence of the HLA B7 gene. The results confirmed previous data from other techniques that assigned the swine MHC(SLA) to chromosome 7. Subsequently, sorted samples were hybridized with a porcine genomic Interferon alpha probe in order to confirm the mapping of this gene family on porcine chromosome 1.

8.
MOTIVATION: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and ultimately higher cure rates with less treatment-related morbidities. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the 'peak probability contrast' technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers. RESULTS: Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well as or better than several methods that require the full spectra, rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data.

9.
Zou J, Hong G, Guo X, Zhang L, Yao C, Wang J, Guo Z. PLoS ONE 2011, 6(10): e26294

Background

There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached.

Results

In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased.

Conclusions

Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
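The stratified FDR idea mentioned in the Results can be sketched as follows. The abstract does not say how strata were defined or which FDR procedure was used; this sketch assumes the Benjamini-Hochberg step-up procedure applied separately within each stratum, and the stratum names and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value passes its threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def stratified_bh(strata, alpha=0.05):
    """Apply Benjamini-Hochberg independently within each stratum."""
    return {name: benjamini_hochberg(p, alpha) for name, p in strata.items()}


# Hypothetical p-values: a small stratum enriched for signal and a larger noisy one.
strata = {
    "shared_peaks": [0.012, 0.02, 0.03],
    "other_peaks": [0.2, 0.5, 0.7, 0.9, 0.95],
}
pooled = strata["shared_peaks"] + strata["other_peaks"]
pooled_hits = benjamini_hochberg(pooled)
stratified_hits = stratified_bh(strata)
```

With these numbers the pooled analysis rejects nothing, while the stratified analysis rejects all three hypotheses in the small stratum, which illustrates how a large number of tests can dilute the power to detect DE peaks.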

10.
One of the challenges of using mass spectrometry for metabolomic analyses of samples consisting of thousands of compounds is that of peak identification and alignment. This paper addresses the issue of aligning mass spectral data from different samples in order to determine average component m/z peak values. The alignment scheme developed takes the instrument m/z measurement error into consideration in order to heuristically align two or more samples using a technique comparable to automated visual inspection and alignment. The results obtained using mass spectral profiles of replicate human urine samples suggest that this heuristic alignment approach is more efficient than other approaches using hierarchical clustering algorithms. The output consists of an average m/z and intensity value for the spectral components together with the number of matches from the different samples. One of the major advantages of this alignment strategy is that it eliminates the boundary problem that occurs when predetermined fixed bins are used to identify and combine peaks for averaging; in addition, its efficient runtime allows large datasets to be processed quickly.
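A tolerance-based alignment of this kind might look like the sketch below. It is a simplified greedy grouping, not the heuristic from the paper: peaks from all samples are sorted by m/z and a peak joins the current group when it lies within an assumed tolerance of the previous peak. The function name, the tolerance value, and the toy peak lists are illustrative assumptions.

```python
def align_peaks(samples, tol=0.01):
    """Group peaks from several samples by m/z proximity.

    `samples` is a list of peak lists, each peak an (mz, intensity) tuple.
    Returns a list of (mean m/z, mean intensity, number of matched peaks).
    """
    peaks = sorted(peak for sample in samples for peak in sample)
    groups = []
    for mz, intensity in peaks:
        # Extend the current group if this peak is within tol of the last one.
        if groups and mz - groups[-1][-1][0] <= tol:
            groups[-1].append((mz, intensity))
        else:
            groups.append([(mz, intensity)])
    return [
        (sum(m for m, _ in g) / len(g), sum(i for _, i in g) / len(g), len(g))
        for g in groups
    ]


# Two hypothetical replicate samples.
sample_a = [(100.000, 10.0), (200.010, 4.0)]
sample_b = [(100.004, 14.0), (200.500, 6.0)]
aligned = align_peaks([sample_a, sample_b], tol=0.01)
```

Because groups are formed from the distances between neighbouring peaks rather than from predetermined bins, two peaks that straddle a fixed bin edge are still merged, which is the boundary problem the abstract refers to. A fuller version would also prevent two peaks from the same sample from sharing a group.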

11.
Systemic-onset juvenile idiopathic arthritis (SJIA) is a disease of unknown etiology with an unpredictable response to treatment. We examined two groups of patients to determine whether there are serum protein profiles reflective of active disease and predictive of response to therapy. The first group (n = 8) responded to conventional therapy. The second group (n = 15) responded to an experimental antibody to the IL-6 receptor (MRA). Paired sera from each patient were analyzed before and after treatment, using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). Despite the small number of patients, highly significant and consistent differences were observed before and after response to therapy in all patients. Of 282 spectral peaks identified, 23 had mean signal intensities significantly different (P < 0.001) before treatment and after response to treatment. The majority of these differences were observed regardless of whether patients responded to conventional therapy or to MRA. These peaks represent potential biomarkers of active disease. One such peak was identified as serum amyloid A, a known acute-phase reactant in SJIA, validating the SELDI-TOF MS platform as a useful technology in this context. Finally, profiles from serum samples obtained at the time of active disease were compared between the two patient groups. Nine peaks had mean signal intensities significantly different (P < 0.001) between active disease in patients who responded to conventional therapy and in patients who failed to respond, suggesting a possible profile predictive of response. Collectively, these data demonstrate the presence of serum proteomic profiles in SJIA that are reflective of active disease and suggest the feasibility of using the SELDI-TOF MS platform as a tool for proteomic profiling and discovery of novel biomarkers in autoimmune diseases.

12.
Peptide mass fingerprinting, despite having become complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy including the participation of matched peak intensity redistribution to handle shared peak intensities and 440 parameters were generated to digitalize each feature-matching pattern. A high performance for an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features through cross training and validating a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification. Firstly, inherent attributes showed robustness comparable to or even greater than that of other explicit attributes. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.

13.
The two-dimensional data obtained from GC-MS has been used qualitatively and quantitatively to determine the components of the volatile fractions of Schisandra chinensis obtained by six different extraction methods. Sub-window factor analysis (SFA) was employed to confirm the identities of components determined in different samples. With the help of SFA, and other chemometric techniques, peak purity in the chromatograms was determined, and overlapping peaks were resolved to yield a pure chromatographic profile and mass spectrum for each component. It is demonstrated that the accuracy of qualitative and quantitative analysis may be greatly enhanced using chemometric resolution methods, such methods being particularly valuable with respect to the analysis of complex samples such as traditional Chinese medicines. It is further demonstrated that different extraction methods give rise to volatile fractions of S. chinensis which differ qualitatively and quantitatively in their composition.

14.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected and those with the highest intensity among their peak clusters are recorded. The common peaks among all the spectra are identified by choosing an appropriate cut-off threshold in the complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak corresponding to the same protein is determined as the detected peak with the largest number among its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after chemical noise removal. We then successfully applied this method to a data set of prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N≥2. Our method is easily implemented and is highly effective at defining peaks that will be used for disease classification or to highlight potential biomarkers.

15.
Probabilities for secondary cancer incidence have been estimated for a patient with Hodgkin's disease for whom treatment has been planned with different radiation modalities using photons and protons. The ICRP calculation scheme has been used to calculate cancer incidence from dose distributions. For this purpose, target volumes as well as critical structures have been outlined in the CT set of a patient with Hodgkin's disease. Dose distributions have been calculated using conventional as well as intensity-modulated treatment techniques using photon and proton radiation. The cancer incidence has been derived from the mean doses for each organ. The results of this work are: (a) Intensity-modulated treatment of Hodgkin's disease using nine photon fields (15 MV) results in nearly the same cancer incidence as treating with two opposed photon fields (6 MV). (b) Intensity-modulated treatment using nine proton fields (maximum energy 177.25 MeV) results in nearly the same cancer incidence as treating with one proton field (160 MeV). (c) Irradiation with protons using the spot scanning technique decreases the avoidable cancer incidence compared to photon treatment by a factor of about two. This result is independent of the number of beams used. Our work suggests that there are radiotherapy indications in which intensity-modulated treatments will result in little or no reduction of cancer incidence compared to conventional treatments. However, proton treatment can result in a lower cancer incidence than photon treatment.

16.
Cerebrospinal fluid (CSF) is a potential source of biomarkers for many disorders of the central nervous system, including Alzheimer disease (AD). Prior to comparing CSF samples between individuals to identify patterns of disease-associated proteins, it is important to examine variation within individuals over a short period of time so that one can better interpret potential changes in CSF between individuals as well as changes within a given individual over a longer time span. In this study, we analyzed 12 CSF samples, composed of pairs of samples from six individuals, obtained 2 weeks apart. Multiaffinity depletion, two-dimensional DIGE, and tandem mass spectrometry were used. A number of proteins whose abundance varied between the two time points were identified for each individual. Some of these proteins were commonly identified in multiple individuals. More importantly, despite the intraindividual variations, hierarchical clustering and multidimensional scaling analysis of the proteomic profiles revealed that two CSF samples from the same individual cluster the closest together and that the between-subject variability is much larger than the within-subject variability. Among the six subjects, comparison between the four cognitively normal and the two very mildly demented subjects also yielded some proteins that have been identified in previous AD biomarker studies. These results validate our method of identifying differences in proteomic profiles of CSF samples and have important implications for the design of CSF biomarker studies for AD and other central nervous system disorders.

17.
This paper presents computational methods to analyze MALDI-TOF mass spectrometry data for quantitative comparison of peptides and glycans in serum. The methods are applied to identify candidate biomarkers in serum samples of 203 participants from Egypt: 73 hepatocellular carcinoma (HCC) cases, 52 patients with chronic liver disease (CLD) consisting of cirrhosis and fibrosis cases, and 78 population controls. Two complementary sample preparation methods were applied prior to generating mass spectra: (1) low molecular weight (LMW) enrichment of each serum sample was carried out for MALDI-TOF quantification of peptides, and (2) glycans were enzymatically released from proteins in each serum sample and permethylated for MALDI-TOF quantification of glycans. A peak selection algorithm was applied to identify the most useful peptide and glycan peaks for accurate detection of HCC cases from a high-risk population of patients with CLD. In addition to global peaks selected by the whole-population-based approach, where identically labeled patients are treated as a single group, subgroup-specific peaks were identified by searching for peaks that are differentially abundant in a subgroup of patients only. The peak selection process was preceded by peak screening, where we eliminated peaks that have significant association with covariates such as age, gender, and viral infection based on the peptide and glycan spectra from population controls. The performance of the selected peptide and glycan peaks was evaluated in terms of their ability to detect HCC cases among patients with CLD in a blinded validation set and through the cross-validation method. Finally, we investigated the possibility of using both peptides and glycans in a panel to enhance the diagnostic capability of these candidate markers. Further evaluation is needed to examine the potential clinical utility of the candidate peptide and glycan markers identified in this study.

18.
Successful clinical development of cancer treatments is aided by the development of molecular markers that allow the identification of patients likely to respond. In the case of broadly cytotoxic drugs, such as the multinuclear series of platinum chemotherapeutic agents that we are evaluating for the treatment of glioma, one route to marker identification is proteomic profiling. We are using the two-dimensional chromatography system, the ProteomeLab PF2D, to compare proteomic profiles of glioma cells in culture before and after drug treatment. The existing software tools allowed the rapid identification of peaks increased by treatment with a given drug as compared with control untreated cells. To compare across these pairs, we developed new software, called the MetaComparison Tool (MCT). The MCT uses the chromatographic characteristics of peaks as identifiers, an approach that was validated by mass spectrometry of two independent isolations of a peak, from cells that were treated with two different platinum compounds. The MCT made it possible to rapidly query whether a given peak responded to more than one treatment and so allowed the identification of peaks that were specific to a given drug. As a result, this analysis greatly reduced the list of peaks whose isolation and downstream analysis by mass spectrometry is warranted, accelerating the search for protein markers of response.

19.
A method for identification and quantitation of insect juvenile hormones (JH) has been developed using capillary gas chromatography-chemical ionization (isobutane)-ion-trap mass spectroscopy. The method does not require derivatization of samples or use of selected ion monitoring. Analysis over a mass range of 60-350 u allowed for identification of as little as 0.01 pmol of individual JH homologs. Quantitative analysis was based on the ion intensities of six diagnostic ions and the summed intensities of these ions for each homolog. The ratio of diagnostic ions did not vary significantly over a range of concentrations from 2.7 to 200 pg. The technique was used to identify and quantify the amounts of JH homologs secreted by individual retrocerebral complexes from the moth Manduca sexta maintained in tissue culture and to identify JH III from hexane extracts of hemolymph of the Caribbean fruit fly. No discrimination due to disparate abundance ratios of the individual homologs was found when analyzing natural product samples differing in concentration by at least fivefold. The technique allows for facile, concrete identification and quantitation of biologically relevant amounts of JH. The ability to analyze samples without derivatization or fractionation by chromatographic methods, coupled with data acquisition over a broad mass range, provides levels of accuracy and confidence greater than those of other methods.

20.
A high-throughput software pipeline for analyzing high-performance mass spectral data sets has been developed to facilitate rapid and accurate biomarker determination. The software exploits the mass precision and resolution of high-performance instrumentation, bypasses peak-finding steps, and instead uses discrete m/z data points to identify putative biomarkers. The technique is insensitive to peak shape, and works on overlapping and non-Gaussian peaks which can confound peak-finding algorithms. Methods are presented to assess data set quality and the suitability of groups of m/z values that map to peaks as potential biomarkers. The algorithm is demonstrated with serum mass spectra from patients with and without ovarian cancer. Biomarker candidates are identified and ranked by their ability to discriminate between cancer and noncancer conditions. Their discriminating power is tested by classifying unknowns using a simple distance calculation, and a sensitivity of 95.6% and a specificity of 97.1% are obtained. In contrast, the sensitivity of the ovarian cancer blood marker CA125 is approximately 50% for stage I/II and approximately 80% for stage III/IV cancers. While the generalizability of these markers is currently unknown, we have demonstrated the ability of our analytical package to extract biomarker candidates from high-performance mass spectral data.
