Similar Articles
20 similar articles found (search time: 31 ms)
1.
Wagner C, Sefkow M, Kopka J. Phytochemistry, 2003, 62(6): 887-900
This study demonstrated the non-supervised construction of a mass spectral and retention time index database (MS/RI library) from a set of plant metabolic profiles covering major organs of potato (Solanum tuberosum), tobacco (Nicotiana tabacum), and Arabidopsis thaliana. Typically, 300-500 mass spectral components with a signal-to-noise ratio ≥ 75 were obtained from GC/EI-time-of-flight (TOF)-MS metabolite profiles of methoxyaminated and trimethylsilylated extracts. Profiles from non-sample controls contained approximately 100 mass spectral components. An MS/RI library of 6205 mass spectral components was accumulated. It was then applied to the automated identification of two model compounds: galactonic acid, a primary metabolite, and 3-caffeoylquinic acid, a secondary metabolite. Neither MS nor RI alone was sufficient for unequivocal identification of unknown mass spectral components. However, library searches with single bait mass spectra of the respective reference substances allowed clear identification by mass spectral match and RI window. Moreover, the hit lists of mass spectral searches were shown to comprise candidate components of highly similar chemical nature. The search for the model compound galactonic acid identified gluconic and gulonic acid among the top-scoring mass spectral components. The exemplary search for 3-caffeoylquinic acid was equally successful. It led to the identification of quinic acid and of the positional isomers 4-caffeoylquinic acid and 5-caffeoylquinic acid, among other still unidentified conjugates of caffeic and quinic acid. All identifications were verified by co-analysis of reference substances. Finally, we applied hierarchical clustering to a complete set of pairwise mass spectral comparisons of unknown components and reference substances with known chemical structure.
We demonstrated that the resulting clustering tree depicted the chemical nature of the reference substances. Most of the nearest neighbours represented either identical components, as judged by co-elution, or conformational isomers exhibiting differential retention behaviour. Unknown components could be classified automatically by grouping them with the respective branches and sub-branches of the clustering tree.
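The two-filter identification described above combines a spectral match with an RI window. It can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Plain cosine similarity on nominal-mass spectra stands in for whatever match score was actually used. The library entries, the 10-unit RI window, and the 0.8 score cutoff are all hypothetical values.

```python
import math

def cosine_match(query, reference):
    """Cosine similarity of two centroided spectra given as {nominal m/z: intensity}."""
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)

def search_library(bait, bait_ri, library, ri_window=10.0, min_score=0.8):
    """Return (name, score) pairs passing both the RI window and the spectral match, best first."""
    hits = []
    for name, (spectrum, ri) in library.items():
        if abs(ri - bait_ri) > ri_window:
            continue  # retention index too far from the bait: reject regardless of spectrum
        score = cosine_match(bait, spectrum)
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical library: name -> (spectrum, retention index)
library = {
    "component_A": ({73: 100, 147: 50, 292: 20}, 1900.0),
    "component_B": ({73: 100, 147: 50, 292: 20}, 2100.0),  # same spectrum, different RI
    "component_C": ({73: 10, 205: 100}, 1905.0),           # similar RI, different spectrum
}
hits = search_library({73: 100, 147: 50, 292: 20}, 1902.0, library)
print(hits)  # only component_A survives both filters
```

component_B has an identical spectrum but the wrong RI, which shows why MS alone is insufficient. component_C has a similar RI but a different spectrum, which shows why RI alone is insufficient.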

2.
Normal-phase HPLC coupled with electron ionisation (EI) and atmospheric pressure chemical ionisation (APCI) mass spectrometry has been applied to identify 17 known neutral limonoid aglycones from Citrus sources. For each known limonoid, the HPLC-MS data provided chromatographic characteristics, APCI-derived molecular weight data, and EI fragmentation data. EI fragmentation patterns for the limonoids were correlated with structural characteristics. The EI fragmentation patterns, coupled with APCI-derived molecular weights, were used as a potential means of discerning the structural character of unknown citrus limonoids.

3.
A rapid and sensitive strategy is outlined for the analysis of mucin-type oligosaccharides from the jelly coat of Xenopus laevis. The method relies primarily on mass spectrometric techniques, in this case matrix-assisted laser desorption/ionization Fourier-transform mass spectrometry (MALDI-FTMS) and collision-induced dissociation (CID). Separation and isolation of the oligosaccharides were streamlined to couple well with mass spectrometry. This allowed the rapid determination of all detectable components among both neutral and anionic species. Partial structures of anionic components, composed primarily of sulfate esters, were obtained with CID. For neutral species, a method allowing complete structural determination by mass spectrometry was used. The method builds on the structures of a small number of known compounds to determine unknown structures from the same biological source. In this example, a small number of oligosaccharides elucidated previously by NMR were used to develop a set of substructural motifs, which were then characterized by CID. The presence of these motifs in the CID spectra was used to determine the structures of unknown compounds present in abundances too small for NMR analysis.

4.
In this paper, a multiple-ionization-mode GC/MS approach was developed for simultaneous hair testing of drugs of abuse common in Asia. These include amphetamines (amphetamine, AP; methamphetamine, MA; methylenedioxyamphetamine, MDA; methylenedioxymethamphetamine, MDMA; methylenedioxyethylamphetamine, MDEA), ketamine (ketamine, K; norketamine, NK), and opiates (morphine, MOR; codeine, COD; 6-acetylmorphine, 6-AM). The strategy integrates gas chromatography-mass spectrometry (GC-MS) using electron impact ionization (EI) with GC-MS using negative chemical ionization (NCI). Hair samples (25 mg) were washed, cut, and incubated overnight at 25 °C in methanol-trifluoroacetic acid (methanol-TFA). The samples were extracted by a solid phase extraction (SPE) procedure and derivatized with heptafluorobutyric acid anhydride (HFBA) at 70 °C for 30 min. The derivatives were analyzed by GC-MS with EI and NCI. The limits of detection (LODs) obtained by GC/EI-MS were:

- 0.03 ng/mg for AP, MA, MDA, MDMA, and MDEA;
- 0.05 ng/mg for K, NK, MOR, and COD;
- 0.08 ng/mg for 6-AM.

The LODs of GC/NCI-MS were much lower than those of GC/EI-MS. For AP and MDA, for example, the LOD was 30 pg/mg by GC/EI-MS and 2 pg/mg by GC/NCI-MS, a 15-fold improvement in sensitivity. Overall, NCI improved sensitivity 15- to 60-fold over EI for AP, MA, MDA, MDMA, MDEA, MOR, and COD. In addition, sensitivity for 6-AM increased 8-fold when m/z 197 was selected as the quantitative ion. Sensitivity for K and NK improved dramatically, by 200- and 2000-fold, respectively. Integrating GC/EI-MS and GC/NCI-MS provides high sensitivity and complementary results for drugs of abuse in hair. Six hair samples from known drug abusers were examined with this new strategy.
These results show that integrating GC/EI-MS and GC/NCI-MS not only enhances sensitivity but also helps avoid false results and misinterpretation of correct results.
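The fold improvements quoted above are ratios of limits of detection. The short check below reproduces the AP/MDA figures. It assumes, as the abstract's arithmetic implies, that "sensitivity improvement" means the EI LOD divided by the NCI LOD. Note that 0.03 ng/mg is the same as 30 pg/mg.

```python
def fold_improvement(lod_reference, lod_new):
    """Sensitivity gain expressed as the ratio of two limits of detection (same units)."""
    if lod_new <= 0:
        raise ValueError("LOD must be positive")
    return lod_reference / lod_new

# LODs from the abstract, in pg/mg hair (0.03 ng/mg = 30 pg/mg)
lod_ei = {"AP": 30.0, "MDA": 30.0}
lod_nci = {"AP": 2.0, "MDA": 2.0}

gains = {drug: fold_improvement(lod_ei[drug], lod_nci[drug]) for drug in lod_ei}
print(gains)  # {'AP': 15.0, 'MDA': 15.0}
```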

5.
This work presents the application of Gas Chromatography (GC)–Atmospheric Pressure Chemical Ionization (APCI)–Time-of-Flight Mass Spectrometry (TOF-MS) to sterol analysis in human plasma. A commercial APCI interface was modified to ensure a well-defined humidity, which is essential for controlled ionization. First, the flow rates of the auxiliary gases were optimized using a mixture of model analytes. Second, sterols (including oxysterols, cholesterol precursors, and plant sterols) were successfully analyzed qualitatively and quantitatively as their trimethylsilyl derivatives. The characteristics of APCI, together with the very good mass accuracy of the TOF-MS data, enable reliable identification of relevant sterols in complex matrices. Linear calibration lines and plausible results for healthy volunteers and patients were obtained. All mass signals were extracted from the full mass data set using an extraction width of 20 ppm. One advantage of high mass accuracy is that any m/z can be searched for within a single recorded run.
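Extracting a mass signal with a fixed ppm width means the absolute window grows with m/z. The Python sketch below illustrates this with hypothetical scan data. It assumes that the 20 ppm "extraction width" is the full window centred on the target. Some software instead applies the value on each side.

```python
def ppm_window(target_mz, width_ppm=20.0):
    """Return (low, high) m/z bounds of a window of total width `width_ppm` centred on target_mz."""
    half_width = target_mz * width_ppm * 1e-6 / 2.0
    return target_mz - half_width, target_mz + half_width

def mass_error_ppm(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def extract_ion_chromatogram(scans, target_mz, width_ppm=20.0):
    """Sum intensities inside the ppm window for each (retention_time, [(mz, intensity), ...]) scan."""
    low, high = ppm_window(target_mz, width_ppm)
    return [(rt, sum(i for mz, i in peaks if low <= mz <= high)) for rt, peaks in scans]

# Hypothetical full-scan data: any target m/z can be extracted after acquisition
scans = [
    (10.0, [(500.004, 100.0), (500.020, 50.0)]),
    (10.1, [(499.996, 80.0)]),
]
print(ppm_window(500.0))  # a window 0.01 wide at m/z 500
print(extract_ion_chromatogram(scans, 500.0))
```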

6.
Eubacterial genomes have highly variable GC content (0.17-0.75), and the primary mechanism of this variability remains unknown. The place to look is the machinery that actually catalyzes DNA synthesis. There, DNA polymerase III takes center stage, particularly one of its 10 subunits, the alpha subunit. Based on the dimeric combination of alpha subunits, the GC contents of eubacterial genomes were partitioned into three groups with distinct GC content variation spectra: dnaE1 (full-spectrum), dnaE2/dnaE1 (high-GC), and polC/dnaE3 (low-GC). Genomic GC content variability is therefore believed to be governed primarily by the alpha subunit grouping of DNA polymerase III. Genome composition analysis should take full account of this grouping principle. Horizontal gene transfer is very frequent among bacterial genomes. Exceptions to the grouping scheme, a few percent of the total, are therefore readily identifiable and should be excluded from in-depth analyses of nucleotide composition.
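Genomic GC content, the quantity being partitioned here, is the fraction of G and C among unambiguous bases. The Python sketch below computes it. Ignoring ambiguous symbols such as N is our own assumption. The grouping into dnaE1, dnaE2/dnaE1, and polC/dnaE3 depends on gene annotation and is not reproduced.

```python
def gc_content(sequence):
    """Fraction of G + C among A, C, G, T (case-insensitive); other symbols are skipped."""
    counts = {base: 0 for base in "ACGT"}
    for base in sequence.upper():
        if base in counts:
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (counts["G"] + counts["C"]) / total

print(gc_content("ATGCGCNNAT"))  # 4 G/C out of 8 unambiguous bases -> 0.5
```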

7.

Background  

Metabolomic studies aim to identify and quantify all metabolites in a given biological context. Mass spectrometry is one of the most powerful tools used for metabolomic research. However, metabolomics by mass spectrometry invariably reveals a large number of unknown compounds, which complicates in-depth mechanistic or biochemical understanding. In principle, mass spectrometry can be used within strategies for de novo structure elucidation of small molecules. The first step is computing the elemental composition of an unknown metabolite from accurate masses with errors < 5 ppm (parts per million). However, even with very high mass accuracy (< 1 ppm), many chemically possible formulae are obtained in higher mass regions. Automatic routines therefore need an additional orthogonal filter to reduce the number of potential elemental compositions. This report demonstrates mathematically that isotope abundance information is necessary for this purpose.
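The combinatorial problem described above can be illustrated with a brute-force CHNO formula generator. This is a simplified sketch, not the report's algorithm. It enumerates neutral formulas within a ppm tolerance and keeps only integer double-bond equivalents. It also offers a first-order M+1 isotope ratio as the kind of orthogonal filter the report argues for. Element limits and the example mass are illustrative assumptions.

```python
from itertools import product

# Monoisotopic masses (u)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
# First-order M+1 contribution per atom: heavy/light natural abundance ratio (13C, 2H, 15N, 17O)
M1_PER_ATOM = {"C": 0.0107 / 0.9893, "H": 0.000115 / 0.999885,
               "N": 0.00364 / 0.99636, "O": 0.00038 / 0.99757}

def formula_mass(c, h, n, o):
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]

def m_plus_1_ratio(c, h, n, o):
    """Approximate M+1/M intensity ratio (first order, ignores multiple substitutions)."""
    return (c * M1_PER_ATOM["C"] + h * M1_PER_ATOM["H"]
            + n * M1_PER_ATOM["N"] + o * M1_PER_ATOM["O"])

def format_formula(c, h, n, o):
    return f"C{c}H{h}" + (f"N{n}" if n else "") + (f"O{o}" if o else "")

def candidate_formulas(neutral_mass, tol_ppm=5.0, max_c=30, max_n=5, max_o=15):
    """All CcHhNnOo formulas within tol_ppm of neutral_mass with an integer, non-negative DBE."""
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        for h in range(2 * c + n + 3):      # h <= 2c + n + 2 keeps DBE >= 0
            if (2 * c + 2 + n - h) % 2:     # 2*DBE must be even for a neutral molecule
                continue
            error = (formula_mass(c, h, n, o) - neutral_mass) / neutral_mass * 1e6
            if abs(error) <= tol_ppm:
                hits.append((format_formula(c, h, n, o), round(error, 2), (c, h, n, o)))
    return hits

def filter_by_m_plus_1(hits, observed_ratio, abs_tol=0.01):
    """Keep candidates whose predicted M+1/M ratio is within abs_tol of the observed one."""
    return [hit for hit in hits if abs(m_plus_1_ratio(*hit[2]) - observed_ratio) <= abs_tol]

# Example: the neutral monoisotopic mass of glucose, C6H12O6
candidates = candidate_formulas(180.0633881, tol_ppm=5.0)
print(len(candidates), [formula for formula, _, _ in candidates])
print(filter_by_m_plus_1(candidates, observed_ratio=0.069))
```

Each extra carbon shifts the predicted M+1 ratio by roughly 1.1 percentage points. A 0.01 tolerance on the measured ratio therefore roughly pins down the carbon count.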

8.

Background

In metabolomics research using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation. It is used to find elemental compositions with similar theoretical masses. However, the results of elemental composition searches will include incorrect hits arising from errors in mass analysis. To assess the quality of peak annotation information, this study presents a novel methodology for evaluating false discovery rates (FDR). Based on the FDR analyses, several aspects of an elemental composition search are discussed:

- setting a threshold;
- estimating the FDR;
- choosing the types of elemental composition database that are most reliable for searching.

Methodology/Principal Findings

The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. Searching time-of-flight (TOF)/MS data against the KNApSAcK and KEGG databases gave relatively high FDR values (30–50%). Searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR > 70%). The estimated FDRs suggest two ways to improve the quality of search results: performing more accurate mass analysis, and modifying the properties of the compound database. A theoretical analysis indicates that the FDR could be improved by using a smaller compound database whose entries have higher completeness.

Conclusions/Significance

High-accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR < 10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data.
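Why database size matters can be illustrated with a toy Monte Carlo estimate. It measures how often a random query mass finds some database entry within a ppm tolerance purely by chance. This sketch rests on strong simplifying assumptions: database and query masses are drawn uniformly from 100-1000 u. It is not the study's FDR formula, and the study's four simulated parameters are not reproduced here.

```python
import bisect
import random

def random_hit_rate(db_size, tol_ppm, n_queries=2000, mass_range=(100.0, 1000.0), seed=0):
    """Fraction of random query masses that hit a random database entry within +/- tol_ppm."""
    rng = random.Random(seed)
    low, high = mass_range
    database = sorted(rng.uniform(low, high) for _ in range(db_size))  # toy "compound" masses
    hits = 0
    for _ in range(n_queries):
        query = rng.uniform(low, high)
        tol = query * tol_ppm * 1e-6
        i = bisect.bisect_left(database, query - tol)  # first entry >= lower bound
        if i < len(database) and database[i] <= query + tol:
            hits += 1
    return hits / n_queries

for size in (1_000, 10_000, 100_000):
    print(size, random_hit_rate(size, tol_ppm=5.0))
```

The chance-hit rate grows with database size and shrinks with tighter tolerance. This matches the direction of the conclusions above: higher mass accuracy and smaller databases both help.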

9.
Several fragmentation techniques are possible in tandem mass spectrometry (MS/MS), including collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron-transfer dissociation (ETD). When pairs of spectra are used for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because these are complementary. Less attention has been paid to CID and HCD spectrum pairs. In this study, a new de novo peptide sequencing method is proposed for such pairs. The method includes a criterion for merging CID and HCD spectra and a parent mass correction step. It also improves our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to compare the proposed method with existing methods designed for single-spectrum (HCD or CID) sequencing. Experimental results showed that using spectrum pairs with the proposed method increased full-length peptide sequencing accuracy significantly, with the highest accuracy reaching 81.31%.

10.
Tetramethylene disulfotetramine (tetramine) is a rodenticide associated with numerous poisonings. It was extracted from human urine and quantified using both gas chromatography/mass spectrometry (GC/MS) and GC/tandem mass spectrometry (MS/MS). Samples (1200 μL) were prepared using a 13C4-labeled internal standard, a 96-well format, and a polydivinylbenzene solid phase extraction sorbent bed. Relative extraction recovery was greater than 80% at 100 ng/mL. Following extraction, samples were preconcentrated by evaporation at 60 °C and reconstituted in 50 μL acetonitrile. One microliter was injected in splitless mode on both instruments. Each was equipped with a 30 m × 0.25 mm × 0.25 μm, 5% phenyl-methylpolysiloxane gas chromatography column. For all reported results, a quantification ion and a confirmation ion were integrated (GC/MS), or analogous selected reaction monitoring transitions (GC/MS/MS). The method was characterized for precision (5.92–13.4%) and accuracy (96.4–111%) using tetramine-enriched human urine pools between 5 and 250 ng/mL. The method limits of detection were calculated to be 2.34 and 3.87 ng/mL for GC/MS and GC/MS/MS, respectively. A reference range of 100 unexposed human urine samples was analyzed on both instruments for potential endogenous interferences, and none were detected. Based on previous literature values for tetramine poisonings, this urinary method should be suitable for measuring low, moderate, and severe tetramine exposures.
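Quantification against a stable-isotope-labeled internal standard usually works in two steps. First, the analyte/internal-standard area ratio is regressed on the spiked concentration. Second, the fitted line is inverted to back-calculate unknowns. The Python sketch below uses an ordinary least-squares fit. The calibration ratios and peak areas are hypothetical, and the abstract does not state which regression or weighting the authors used.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantify(analyte_area, internal_standard_area, slope, intercept):
    """Back-calculate concentration from the analyte / internal-standard area ratio."""
    ratio = analyte_area / internal_standard_area
    return (ratio - intercept) / slope

# Hypothetical calibrators spanning the 5-250 ng/mL range used in the study
conc_ng_ml = [5.0, 25.0, 50.0, 100.0, 250.0]
area_ratio = [0.10, 0.50, 1.00, 2.00, 5.00]
slope, intercept = fit_line(conc_ng_ml, area_ratio)
print(round(quantify(3000.0, 2000.0, slope, intercept), 1))  # ratio 1.5 -> 75.0 ng/mL
```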

11.
High-resolution FT-ICR MS detected more than 150 molecular species in a single glycoconjugate fraction. The fraction was obtained from the urine of a patient with a congenital disorder of glycosylation (CDG). Owing to its high mass accuracy and resolving power, FT-ICR MS is an ideal tool for analyzing single components in complex glycoconjugate mixtures obtained from body fluids. Fragmentation data from several precursor ions, obtained by nanoESI Q-TOF CID, suggested that the CDG patient's urinary glycoconjugate mixtures contained overlapping, nearly isobaric ionic species. High-resolution, high-mass-accuracy FT-ICR MS detection confirmed their existence. High-resolution FT-ICR mass spectra can therefore generally be considered for single-stage glycoscreening of complex mixture samples. The composition of glycoconjugate species can be identified from accurate molecular ion mass determinations. Identification is further enhanced by computer-assisted calculations combined with monosaccharide building block analysis. This can be extended to non-carbohydrate modifications such as amino acids, phosphates, and sulfates. With this strategy, significantly more compositions could be assigned to mass peaks in a urine fraction obtained by size exclusion and anion exchange chromatography.
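Assigning compositions from accurate masses, as described above, amounts to a search. It looks for combinations of monosaccharide building blocks, plus optional non-carbohydrate modifications, whose summed mass matches the measurement. The Python sketch below rests on several assumptions:

- glycans are free reducing glycans observed as neutral monoisotopic masses (adduct and charge handling omitted);
- the per-residue limits are hypothetical;
- only sulfate and phosphate are considered as modifications.

```python
from itertools import product

# Monoisotopic residue masses (u): anhydro monosaccharides and modifications
RESIDUES = {
    "Hex": 162.0528234,        # C6H10O5
    "HexNAc": 203.0793725,     # C8H13NO5
    "dHex": 146.0579088,       # C6H10O4
    "NeuAc": 291.0954165,      # C11H17NO8
    "Sulfate": 79.9568149,     # SO3
    "Phosphate": 79.9663305,   # HPO3
}
WATER = 18.0105647  # added once for the free reducing end
MAX_COUNTS = {"Hex": 10, "HexNAc": 8, "dHex": 4, "NeuAc": 4, "Sulfate": 2, "Phosphate": 2}

def compositions(neutral_mass, tol_ppm=5.0):
    """Return (composition dict, ppm error) for every combination within tol_ppm."""
    names = list(RESIDUES)
    hits = []
    for counts in product(*(range(MAX_COUNTS[name] + 1) for name in names)):
        mass = WATER + sum(k * RESIDUES[name] for k, name in zip(counts, names))
        error = (mass - neutral_mass) / neutral_mass * 1e6
        if abs(error) <= tol_ppm:
            hits.append(({name: k for k, name in zip(counts, names) if k}, round(error, 2)))
    return hits

print(compositions(1234.4334))  # includes {'Hex': 5, 'HexNAc': 2}
```

Near m/z 1300, sulfate and phosphate differ by about 0.0095 u, roughly 7 ppm. A 5 ppm tolerance therefore separates them, which is the kind of near-isobaric distinction that motivates high-resolution FT-ICR MS.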

12.
An overview is presented of gas chromatography/mass spectrometry (GC/MS) and liquid chromatography/mass spectrometry (LC/MS), the two major hyphenated techniques employed in metabolic profiling. They complement direct 'fingerprinting' methods such as atmospheric pressure ionization (API) quadrupole time-of-flight MS, API Fourier transform MS, and NMR. In GC/MS, analytes are normally derivatized prior to analysis to reduce their polarity and facilitate chromatographic separation. The resulting electron ionization mass spectra are reproducible and suitable for library matching, and mass spectral collections are readily available. In LC/MS, derivatization and library matching are at an early stage of development, and mini-reviews of both are provided. Chemical derivatization can dramatically increase the sensitivity and specificity of LC/MS methods for less polar compounds and provides additional structural information. Enhanced analysis of plant extracts demonstrates the potential of derivatization for metabolic profiling by LC/MS. This includes measuring volatile acids such as formic acid, which is difficult to achieve by GC/MS. The important role of creating and using mass spectral libraries in these techniques is discussed and illustrated with examples.

13.
Essential oils and hydrosols were extracted from rosemary harvested in different seasons. The chemical compositions of the volatile components in the two fractions were analyzed by gas chromatography–mass spectrometry (GC–MS). The enantiomers of some volatile components were also analyzed by enantioselective GC–MS. Classifying the aroma components by chemical group revealed that the essential oils contained high levels of monoterpene hydrocarbons, whereas the hydrosols did not. Enantiomeric ratios also differed between volatile components. For example, only the (S)-form was observed for limonene, and the (R)-form was dominant for verbenone. These results show why the enantiomeric composition of volatile components should be determined when investigating their physiological and psychological effects on humans. Overall, enantiomeric ratios depended on the individual volatile component, with no difference between essential oils and hydrosols or between seasons.
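Enantiomeric composition from an enantioselective GC-MS run is typically reported from the two enantiomer peak areas. The Python sketch below computes the usual fractions and the enantiomeric excess. It assumes baseline-separated peaks with equal response factors, and the peak areas are hypothetical.

```python
def enantiomer_fractions(area_r, area_s):
    """Return (fraction R, fraction S) from the two enantiomer peak areas."""
    total = area_r + area_s
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_r / total, area_s / total

def enantiomeric_excess(area_r, area_s):
    """|R - S| / (R + S); 1.0 means a single enantiomer was detected."""
    fraction_r, fraction_s = enantiomer_fractions(area_r, area_s)
    return abs(fraction_r - fraction_s)

# Hypothetical areas: only the (S)-form detected (as reported for limonene)
# and an (R)-dominant mixture (as reported for verbenone)
print(enantiomeric_excess(0.0, 5200.0))    # 1.0
print(enantiomer_fractions(900.0, 100.0))  # (0.9, 0.1)
```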

14.
Fecal water is a complex mixture of metabolites with a wide range of physicochemical properties and boiling points. The method developed here combines qualitative and quantitative gas chromatography/mass spectrometry (GC/MS) analysis, with high sensitivity and efficiency, with ethyl chloroformate derivatization in aqueous medium. The water/ethanol/pyridine ratio was optimized to 12:6:1. A two-step derivatization was developed, with initial pH adjustment using 0.1 M sodium bicarbonate. Deionized water gave better extraction efficiency for fecal water compounds than acidified or alkalized water. Multivariate statistical analysis of the GC/MS data, with univariate statistical validation, showed that more amino acids were extracted from frozen fecal samples than from fresh samples. Method validation used 34 reference standards and fecal water samples. The correlation coefficient exceeded 0.99 for each standard, and the limit of detection (LOD) ranged from 10 to 500 pg on-column for most standards. The analytical equipment showed excellent repeatability, with a relative standard deviation (RSD) below 4% for standards and below 7% for fecal water. The derivatization method also showed good repeatability. Its RSD was below 6.4% for standards (except 3,4-dihydroxyphenylacetic acid) and below 10% for fecal water (except dicarboxylic acids). Qualitative analysis identified and structurally confirmed a total of 73 compounds. It combined electron impact (EI) mass spectral database searching, chemical ionization (CI) mass spectral validation, and comparison with reference standards. Fecal water compounds of healthy humans were also quantified. This protocol shows promise for metabolome analysis of human fecal water samples.
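The repeatability figures above are relative standard deviations. The Python sketch below computes one using the sample standard deviation (n - 1 denominator). This is the common convention but an assumption here. The replicate peak areas are hypothetical.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation (%) = 100 * sample standard deviation / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean

replicate_areas = [9.0, 10.0, 11.0]  # hypothetical replicate injections
rsd = rsd_percent(replicate_areas)
print(f"RSD = {rsd:.1f}%", "passes" if rsd < 4.0 else "fails", "a 4% acceptance limit")
```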

15.
Fatty acid methyl ester (FAME) analysis by gas chromatography coupled to mass spectrometry (GC-MS) is widely used in biodiesel/bioproduct research (e.g. on polyunsaturated fatty acids, PUFAs). However, it typically does not distinguish between bound and free fatty acids. Yet to understand and optimize biosynthetic pathways, the origin of a fatty acid is important information. Furthermore, the annotation of PUFAs is compromised in classical GC-EI-MS because the molecular ion is missing. The present protocol combines an alkaline methyl esterification step with TMS derivatization. This enables the simultaneous analysis of bound and free fatty acids, as well as other lipids such as sterols, in one GC-MS chromatogram. The protocol is applied to lipid extracts from organisms ranging from single-cell algae to higher plants: Chlorella vulgaris, Chlamydomonas reinhardtii, Coffea arabica, Pisum sativum, and Cuscuta japonica. Field ionization (GC-FI-MS) is also introduced for better annotation of fatty acids and exact determination of the number of double bonds in PUFAs. The proposed workflow offers a convenient strategy for analyzing algae and other plant crop systems for their capacity to yield third-generation biodiesel and high-quality nutritional bioproducts such as PUFAs.
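Because field ionization preserves the molecular ion, the number of double bonds can be read from its mass once the derivative type is fixed. The Python sketch below rests on these assumptions:

- saturated straight-chain acids have the formula CnH2nO2 and lose two hydrogens per C=C;
- bound fatty acids appear as methyl esters and free ones as TMS esters, as in the protocol above;
- the electron mass of the radical cation is ignored.

The sketch does not locate double bonds or distinguish rings from C=C.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Si": 27.97692653}
# Ester group replacing the acidic H of the fatty acid: (extra C, extra H, extra Si)
DERIVATIVES = {"FAME": (1, 3, 0), "TMS": (3, 9, 1)}

def derivative_mass(chain_carbons, double_bonds, derivative):
    """Monoisotopic mass of the derivatized fatty acid Cn:d (acid formula CnH(2n-2d)O2)."""
    extra_c, extra_h, extra_si = DERIVATIVES[derivative]
    carbons = chain_carbons + extra_c
    hydrogens = 2 * chain_carbons - 2 * double_bonds - 1 + extra_h
    return (carbons * MONO["C"] + hydrogens * MONO["H"]
            + 2 * MONO["O"] + extra_si * MONO["Si"])

def annotate_molecular_ion(mz, tol=0.01, carbons=range(4, 27), max_double_bonds=6):
    """List (derivative, n, d) whose molecular-ion mass lies within tol (u) of the observed m/z."""
    return [
        (derivative, n, d)
        for derivative in DERIVATIVES
        for n in carbons
        for d in range(max_double_bonds + 1)
        if abs(derivative_mass(n, d, derivative) - mz) <= tol
    ]

print(annotate_molecular_ion(294.2559))  # [('FAME', 18, 2)]: a C18:2 methyl ester
```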

16.
Thymus caespititius Brot. is an important aromatic species because it produces essential oils for the pharmaceutical and food industries. In the present study, essential oils from two chemotypes, carvacrol/thymol (CT) and sabinene/carvacrol (SC), were evaluated in proliferating shoot cultures (6–12 subcultures after establishment) and compared with those from field-grown plants. The essential oils were isolated by hydrodistillation and analysed by gas chromatography (GC) and GC–mass spectrometry (GC–MS). Cultures grown in vitro and evaluated over six subcultures maintained a stable essential oil composition. For the CT chemotype, carvacrol (42 %) and thymol (23 %) were the main essential oil components in field-grown plants. In proliferating shoot cultures, carvacrol reached 17–25 % and thymol 18–23 %, closely followed by carvacryl acetate (15–23 %) and thymyl acetate (11–15 %). For the SC chemotype, carvacrol (13–28 %), sabinene (18–45 %), and thymol (9–12 %) were the main essential oil components in both field-grown plants and proliferating shoot cultures. Our experiments showed that the essential oil composition of proliferating shoot cultures was stable. It was also qualitatively similar to that of field-grown plants, notwithstanding minor quantitative differences.

17.
In this paper, an optimized protocol was established and validated for metabonomic profiling of rat urine by GC/MS. Urine samples were treated with urease to remove excess urea and then extracted with methanol. The resulting supernatant was dried, methoximated, trimethylsilylated, and analyzed by GC/MS. Forty-nine endogenous metabolites were separated and identified in the GC/MS chromatogram. Of these, 26 were selected for quantitative analysis to evaluate the linearity, precision, and sensitivity of the method. Mass spectrometric responses of the 26 endogenous compounds were linear with relative concentration over the range 0.063 to 1.000 (v/v, urine/(urine + water)). Reproducibility was satisfactory, with intra-day and inter-day precision values all below 15%. The GC/MS-based metabonomic profiling method was successfully applied to urine samples from hyperlipidemia model rats. Principal component analysis (PCA) clearly separated model rats from control rats, and time-dependent metabonomic changes were also detected. These results suggest that GC/MS-based metabonomic profiling is a robust method for urine samples.
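Group separation by PCA, as reported above, can be sketched with NumPy's SVD. This is a generic illustration, not the authors' pipeline. It assumes mean-centring plus unit-variance scaling. The data are synthetic: 20 "control" and 20 "model" animals, 10 metabolites, with the first five shifted in the model group.

```python
import numpy as np

def pca_scores(X, n_components=2, autoscale=True):
    """Return sample scores and explained-variance ratios for the first principal components."""
    Xc = X - X.mean(axis=0)
    if autoscale:
        sd = Xc.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # leave constant variables unscaled
        Xc = Xc / sd
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
control = rng.normal(1.0, 0.1, size=(20, 10))
model = rng.normal(1.0, 0.1, size=(20, 10))
model[:, :5] += 1.0  # synthetic effect: five metabolites elevated in the model group
X = np.vstack([control, model])

scores, explained = pca_scores(X)
print("PC1 explains", round(float(explained[0]) * 100, 1), "% of variance")
print("mean PC1 score, control vs model:", scores[:20, 0].mean(), scores[20:, 0].mean())
```

The sign of a principal component is arbitrary, so only the separation between groups, not which side each group falls on, is meaningful.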

18.
Using rhizomes from the genus Atractylodes has been challenging because of their closely related origins. In this study, we developed an analytical strategy to differentiate Atractylodes lancea (A. lancea), Atractylodes chinensis (A. chinensis), Atractylodes japonica (A. japonica), and Atractylodes macrocephala (A. macrocephala), and compared their volatile compositions. Gas chromatography-mass spectrometry (GC/MS) was used to analyze the volatile profiles of essential oils extracted from 59 batches of samples. Chemometric methods clarified the differences in volatile oils among the four species. They also identified the components that most strongly affect classification and quality. A total of 50 volatile components were identified in the essential oils by GC/MS. Unsupervised and supervised chemometric analyses accurately distinguished A. lancea, A. chinensis, A. japonica, and A. macrocephala. Five characteristic chemical markers were obtained: hinesol, β-eudesmol, atractylon, atractylodin, and atractylenolide I. Their percentage contents in individual species and samples were determined. This study provides a valuable reference for the quality evaluation of medicinal plants containing essential oils. It is also relevant to species differentiation and the rational clinical use of Atractylodes herbs.

19.
Isobaric tagging, via TMT or iTRAQ, is widely used in quantitative proteomics. To date, tandem mass spectrometric analysis of isobarically labeled peptides on hybrid ion trap–orbitrap (LTQ-OT) instruments has mainly used higher-energy C-trap dissociation (HCD) or pulsed q dissociation (PQD). HCD fragments the reporter ions well, but peptide sequence-ion recovery is generally poor compared with collision-induced dissociation (CID). Herein, we describe an approach that combines CID and HCD spectra, efficiently ensuring both identification and relative quantification of proteins. Tandem mass tags (TMTs) were used to label digests of human plasma, and LC-MS/MS was performed on an LTQ-OT instrument. Different HCD collision energies were tested. Using CID and HCD together rather than HCD alone improved the number of identifications, the resulting number of quantifiable proteins, and quantification accuracy. A program was developed to merge the peptide sequence-ion m/z range from CID spectra with the reporter-ion m/z range from HCD spectra. Alternatively, it can separate the two spectral data types into different files. Parallel CID in the LTQ has almost no effect on the analysis duty cycle. The procedure should therefore become a standard for quantitative protein analyses with isobaric tagging on LTQ-OT instruments.
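The CID/HCD merging step described above can be sketched as splicing two peak lists by m/z range: reporter ions come from the HCD spectrum and sequence ions from the CID spectrum. This Python sketch is not the authors' program. The reporter window 125.5-131.5 is an assumption suited to TMT sixplex reporters (about m/z 126-131), and the peak lists are hypothetical.

```python
def merge_cid_hcd(cid_peaks, hcd_peaks, reporter_window=(125.5, 131.5)):
    """Combine HCD reporter-ion peaks with CID peaks above the reporter window.

    Peaks are (m/z, intensity) tuples; the result is sorted by m/z.
    """
    low, high = reporter_window
    reporters = [(mz, i) for mz, i in hcd_peaks if low <= mz <= high]
    sequence_ions = [(mz, i) for mz, i in cid_peaks if mz > high]
    return sorted(reporters + sequence_ions)

cid = [(126.13, 5.0), (300.20, 50.0), (450.30, 80.0)]        # hypothetical ion-trap CID peaks
hcd = [(126.1277, 100.0), (127.1248, 90.0), (300.20, 10.0)]  # hypothetical HCD peaks
print(merge_cid_hcd(cid, hcd))
# [(126.1277, 100.0), (127.1248, 90.0), (300.2, 50.0), (450.3, 80.0)]
```

Intensities from the two activation types are not rescaled here. A production merger may need to normalise them.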

20.
New mass spectrometry techniques (e.g. electron transfer dissociation, ETD) have recently emerged, and additional proteases (e.g. Lys-N) are now more readily available for protein digestion in high-throughput experiments. Together, these raise the challenge of designing new algorithms to interpret the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision-induced dissociation (CID) of tryptic peptides. Their CID-specific scoring functions rely largely on expert knowledge about the fragmentation of tryptic peptides rather than on machine learning techniques. As a result, these algorithms perform suboptimally for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.). We also present a new database search tool, MS-GFDB, based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, for ETD spectra, MS-GFDB identified 2.7 and 2.6 times as many tryptic and Lys-N peptides, respectively, as Mascot. Mascot has had a decade of development for analyzing CID spectra of tryptic peptides. Even so, MS-GFDB, which is not specifically tailored for CID spectra or tryptic peptides, identified 28% more peptides than Mascot. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.

Since the introduction of electron capture dissociation (ECD) in 1998 (1), electron-based peptide dissociation technologies have played an important role in analyzing intact proteins and post-translational modifications (2). However, until recently, only a small number of laboratories had access to this research-grade technology. It was not commercially available, required experience to operate, and could be implemented only on expensive FT-ICR instruments. The discovery of electron-transfer dissociation (ETD) (3) allowed an ECD-like technology to be implemented on relatively cheap ion-trap instruments. Nowadays, many researchers use ETD to generate tandem mass spectra (4-9).

The hardware technologies for generating ETD spectra are maturing rapidly, but the software technologies for analyzing them are still in their infancy. There are two major approaches to analyzing tandem mass spectra: de novo sequencing and database search. Both approaches find the best-scoring peptide, either among all possible peptides (de novo sequencing) or among all peptides in a protein database (database search). De novo sequencing is emerging as an alternative, but database search remains the more accurate, and thus preferred, method of spectral interpretation. Here we therefore focus on the database search approach.

Numerous database search engines are currently available, including SEQUEST (10), Mascot (11), OMSSA (12), X!Tandem (13), and InsPecT (14). However, most of them are inadequate for analyzing ETD spectra. They are optimized for collision-induced dissociation (CID) spectra, which show different fragmentation propensities from ETD spectra.
Additionally, existing tandem mass spectrometry (MS/MS) tools are biased toward the analysis of tryptic peptides, because trypsin is usually used for CID. They are thus not suitable for analyzing the nontryptic peptides that are common in ETD. Some database search engines do support ETD spectra (e.g. SEQUEST, Mascot, and OMSSA), but their performance on ETD spectra remains suboptimal. Recently, an ETD-specific database search tool (Z-Core) was developed; however, it does not significantly improve over OMSSA (15).

We present a new database search tool (MS-GFDB) that significantly outperforms existing database search engines in the analysis of ETD spectra and performs equally well on nontryptic peptides. MS-GFDB employs the generating function approach (MS-GF). MS-GF computes rigorous p values of peptide-spectrum matches (PSMs) based on the spectrum-specific score histogram of all peptides (16). MS-GF p values depend only on the PSM, not on the database. They can therefore be used as an alternative scoring function for database search.

Computing p values requires a scoring model that evaluates the quality of PSMs. MS-GF adopts the probabilistic MS-Dictionary scoring model described in Kim et al., 2009 (17). This model considers multiple features, including product ion types, peak intensities, and mass errors. To define the parameters of this scoring model, MS-GF needs only a set of training PSMs. These can be obtained in a variety of ways. For example, one can generate CID/ETD pairs and use peptides identified by CID to form PSMs for ETD. Alternatively, one can generate spectra from a purified protein, where PSMs can be inferred from the accurate parent mass alone. Another option is to use a previously developed (not necessarily optimal) tool to generate training PSMs.
From these training PSMs, MS-GF automatically derives scoring parameters. It assumes no prior knowledge about the specifics of a particular peptide fragmentation method (e.g. ETD, CID, etc.) or the proteolytic origin of the peptides. MS-GF was originally designed for the analysis of CID spectra. It has now been extended to other types of spectra generated by various fragmentation techniques and/or enzymes. We show that MS-GF can be successfully applied to novel types of spectra, such as ETD of Lys-N peptides (18, 19), simply by retraining the scoring parameters without any other modification. Note that although the same scoring model is used for different types of spectra, the parameters derived for each spectrum type differ.

We compared the performance of MS-GFDB with Mascot on a large ETD data set and found that MS-GFDB generated many more peptide identifications at the same false discovery rate (FDR). For example, at a 1% peptide-level FDR, MS-GFDB identified 9450 unique peptides from 81,864 ETD spectra of Lys-N peptides, whereas Mascot identified only 3672. This is an increase of ≈160% in the number of peptide identifications, and a similar improvement is observed for ETD spectra of tryptic peptides. MS-GFDB also identified 28% more peptides than Mascot from CID spectra of tryptic peptides (16,203 versus 12,658 peptides).

ETD complements rather than replaces CID, because each technology has its own advantages. CID works better for smaller peptides with low charge states, and ETD for larger and multiply charged peptides (20, 21). ETD can also be used in conjunction with CID, because the two generate complementary sequence information (20, 22, 23). ETD-enabled instruments often support generating both CID and ETD spectra (CID/ETD pairs) for the same peptide.
Although CID/ETD pairs promise a great improvement in peptide identification, their full potential has not yet been realized. For de novo sequencing, tools that use CID/ETD pairs do produce more accurate peptide sequences than traditional CID-based algorithms (23, 24, 25). For database search, however, the claim that CID/ETD pairs improve peptide identification remains poorly substantiated. A few tools have been developed to use CID/ETD (or CID/ECD) pairs for database search. However, they are limited to preprocessing or postprocessing the spectral data before or after running a traditional database search tool (26, 27). Nielsen et al., 2005 (22) pioneered the combined use of CID and ECD for database search. Given a CID/ECD pair, they generated a combined spectrum composed only of complementary pairs of peaks and searched it with Mascot. This approach is hard to generalize to the less accurate CID/ETD pairs generated by ion-trap instruments, because there is a higher chance that the identified complementary pairs of peaks are spurious. More importantly, using traditional MS/MS tools (such as Mascot) to search the combined spectrum is inappropriate, because these tools are not optimized for such combined spectra. A better approach would be to develop a new database search tool tailored to the combined spectrum. Recently, Molina et al., 2008 (26) studied database search of CID/ETD pairs using Spectrum Mill (Agilent Technologies, Santa Clara, CA). They reached the counterintuitive conclusion that using only CID spectra identifies 12% more unique peptides than using CID/ETD pairs.
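The complementary-pair filter, and the reason it degrades at ion-trap accuracy, can be sketched as follows. The text above says only that the combined spectrum keeps complementary pairs of peaks, so the specific rule here is an assumption. It relies on the standard ion-mass relations c = b + NH3 (+17.027 Da) and z• = y − NH2 (−16.019 Da), applied to singly charged peaks within a tolerance `tol`. The function name and example peak lists are hypothetical.

```python
# Standard mass offsets between complementary ion types (monoisotopic, Da).
C_MINUS_B = 17.02655   # c ion = b ion + NH3
Y_MINUS_Z = 16.01872   # z-dot ion = y ion - NH2

def complementary_peaks(cid_peaks, exd_peaks, tol):
    """Return the CID peaks that have a partner in the ECD/ETD spectrum:
    a c ion at +17.027 Da (b/c pair) or a z-dot ion at -16.019 Da (y/z pair),
    within +/- tol Da. All peaks are assumed to be singly charged m/z values."""
    kept = []
    for mz in cid_peaks:
        for other in exd_peaks:
            if (abs(other - (mz + C_MINUS_B)) <= tol
                    or abs(other - (mz - Y_MINUS_Z)) <= tol):
                kept.append(mz)
                break  # one partner is enough to keep this CID peak
    return kept

cid = [300.0, 450.0, 600.0]
etd = [317.0266, 433.98, 617.3]   # 617.3 is not a true partner of 600.0
```

With a high-accuracy tolerance of 0.01 Da, only the true pairs survive (`[300.0, 450.0]`). With an ion-trap-like tolerance of 0.5 Da, the unrelated peak at 617.3 also pairs with 600.0. This is the spurious-pair problem described above.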
We believe that this result reflects the limitations of traditional MS/MS database search tools in analyzing multiple spectra generated from a single peptide. In this paper, we modify the generating function approach to interpret CID/ETD pairs and apply it to improve database search with CID/ETD pairs. In contrast to previous approaches, our scoring is specifically designed to interpret CID/ETD pairs, and it can be generalized to any type of multiple spectra generated from a single peptide. With CID/ETD pairs from trypsin digests, MS-GFDB identified 13% more peptides than with only CID spectra and 27% more than with only ETD spectra. The difference was even more pronounced for CID/ETD pairs from Lys-N digests: 41% more than CID only and 33% more than ETD only. Assigning a p value to a PSM has greatly helped researchers evaluate the quality of peptide identifications. We now turn to the problem of assigning a p value to a peptide-spectrum-spectrum match (PS2M) when its two spectra are generated by different fragmentation technologies (e.g. ETD and CID). We argue that assigning statistical significance to a PS2M (or, more generally, a PSnM) is a prerequisite for rigorous CID/ETD analyses. To our knowledge, MS-GFDB is the first tool to generate statistically rigorous p values for PSnMs. The MS-GFDB executable and source code are available at the website of the Center for Computational Mass Spectrometry at UCSD (http://proteomics.ucsd.edu). MS-GFDB takes a set of spectra (CID, ETD, or CID/ETD pairs) and a protein database as input and outputs peptide matches. If the input is a set of CID/ETD pairs, it outputs the best-scoring peptide matches and their p values (1) using only CID spectra, (2) using only ETD spectra, and (3) using the combined spectra of the CID/ETD pairs.
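Extending the dynamic program to a pair of spectra shows why a PS2M needs its own p value. The sketch assumes, hypothetically and not necessarily as MS-GFDB does it, that both spectra have been converted into integer scores on the same nominal prefix-mass grid. A peptide's PS2M score is then the sum of its CID and ETD scores. Summing the two score vectors lets the same recursion compute the spectral probability of the pair. A hypothetical three-residue alphabet keeps the numbers checkable by hand; all function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy alphabet: three residues with nominal masses, equally likely.
TOY_AA = {"G": 57, "A": 71, "S": 87}

def spec_prob(node_scores, parent_mass, threshold, aa_mass=TOY_AA):
    """Total probability of peptides of mass parent_mass scoring >= threshold,
    where a score is the sum of node_scores over the peptide's prefix masses."""
    p_aa = 1.0 / len(aa_mass)
    dist = [defaultdict(float) for _ in range(parent_mass + 1)]
    dist[0][0] = 1.0
    for m in range(1, parent_mass + 1):
        s = node_scores.get(m, 0)
        for mass in aa_mass.values():
            if m - mass >= 0:
                for t, p in dist[m - mass].items():
                    dist[m][t + s] += p * p_aa
    return sum(p for t, p in dist[parent_mass].items() if t >= threshold)

def pair_spec_prob(cid_scores, etd_scores, parent_mass, threshold):
    """Spectral probability of a CID/ETD pair: the same random peptide is
    scored against both spectra, so per-node scores are summed first."""
    combined = defaultdict(int)
    for scores in (cid_scores, etd_scores):
        for m, s in scores.items():
            combined[m] += s
    return spec_prob(combined, parent_mass, threshold)
```

With this alphabet, the only peptides of residue mass 215 are the six orderings of G, A, and S, each with probability 1/27. Suppose both spectra give a score of 1 at prefix mass 57. Then the single-spectrum and pair p values are both 2/27 (the peptides GAS and GSA). Multiplying the two single-spectrum p values would instead give (2/27)^2, which overstates the significance. Both scores are computed for the same random peptide, so they are not independent.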

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417