20 similar records found (search time: 31 ms).
Despite a recent surge of interest in database-independent peptide identification, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on whether spectra from overlapping peptides happen to be present. On the other hand, while multistage mass spectrometry (collecting multiple MS³ spectra from each MS² spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it to multistage mass spectrometry. Our software achieves both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS³ spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with an accurate parent mass (but inaccurate fragment masses), a protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped to solve this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves sequencing accuracy on both tandem and multistage mass spectrometry.
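To make the "track the accurate parent mass" point concrete, here is a toy spectrum-graph sketch in which the parent mass is enforced by requiring every candidate path to end exactly at the parent residue mass. It is not the authors' algorithm: nominal integer masses, single-residue edges, and additive peak scores are simplifying assumptions, and `de_novo` is a hypothetical helper.

```python
# Nominal residue masses (Da). I/L and K/Q are isobaric at this resolution,
# so only one letter of each pair is kept.
RESIDUE = {'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
           'L': 113, 'N': 114, 'D': 115, 'K': 128, 'E': 129, 'M': 131,
           'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186}


def de_novo(prefix_peaks, parent_residue_mass):
    """Best-scoring path from mass 0 to the parent residue mass.

    prefix_peaks: dict of integer prefix mass -> peak score.
    parent_residue_mass: sum of residue masses implied by the accurate
        precursor mass; any returned sequence adds up to exactly this value.
    Returns (score, sequence), or None if no path reaches the parent mass.
    """
    # Nodes: mass 0, the parent residue mass, and every interior peak.
    nodes = sorted({0, parent_residue_mass} |
                   {m for m in prefix_peaks if 0 < m < parent_residue_mass})
    best = {0: (0.0, "")}  # node -> best (score, sequence) reaching it
    for node in nodes[1:]:
        options = []
        for aa, mass in RESIDUE.items():
            prev = node - mass  # edge = one amino acid mass
            if prev in best:
                score, seq = best[prev]
                options.append((score + prefix_peaks.get(node, 0.0), seq + aa))
        if options:
            best[node] = max(options)
    return best.get(parent_residue_mass)
```

For example, `de_novo({57: 1.0, 128: 1.0}, 215)` prefers the path G, A, S over the shortcut through K because the G-A route explains both peaks, and it still lands exactly on 215.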

We present a wrapper-based approach to estimate and control the false discovery rate (FDR) for peptide identifications using the outputs of multiple commercially available MS/MS search engines. The approach combines output from multiple search engines with sequence- and spectrum-derived features in a flexible classification model that produces a score associated with correct peptide identifications. The classification-model scores from a reversed-database search serve as the null distribution for estimating p-values and false discovery rates using a simple, established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications, while correctly identifying more peptides than rule-based methods using one search engine alone.
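The abstract does not name the statistical procedure, so the following is only a sketch under an assumption: empirical p-values are taken against the reversed-database (decoy) score distribution, and the Benjamini-Hochberg step-up rule then controls the FDR. Function names are illustrative.

```python
from bisect import bisect_left


def empirical_pvalues(target_scores, decoy_scores):
    """p = (number of decoy scores >= s, plus 1) / (number of decoys + 1)."""
    null = sorted(decoy_scores)
    n = len(null)
    # bisect_left gives the index of the first decoy score >= s.
    return [(n - bisect_left(null, s) + 1) / (n + 1) for s in target_scores]


def benjamini_hochberg(pvalues, alpha):
    """Indices of identifications accepted at FDR level alpha (step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # largest rank passing the threshold
    return sorted(order[:k])
```

The +1 correction keeps p-values away from zero when a target scores above every decoy, which matters when the decoy set is small.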

We describe the application of a previously reported reversed-phase liquid chromatography (RPLC) peptide retention time prediction model (Petritis et al. Anal. Chem. 2003, 75, 1039) to improve peptide identification. The model uses peptide sequence information to generate a theoretical (predicted) elution time that can be compared with the observed elution time. Using data from a set of known proteins, the retention time parameter was incorporated into a discriminant function for use with tandem mass spectrometry (MS/MS) data analyzed with the peptide/protein identification program SEQUEST. For singly charged ions searched against a Drosophila melanogaster database, the number of confident identifications increased by 12% when the elution time metric was included, compared with using mass spectral data as the sole source of information. A 3-4% improvement was obtained for doubly and triply charged ions for the same biological system. Application to the larger Rattus norvegicus (rat) and human proteome databases resulted in an 8-9% overall increase in the number of confident identifications when both the discriminant function and elution time were used. The effect of adding "runner-up" hits from SEQUEST (peptide matches that are not the highest scoring for a spectrum) was also explored, and the number of confident identifications increased by a further 1% when these hits were considered. Finally, applying the discriminant functions derived in this work to approximately 2.2 million spectra from over three hundred LC-MS/MS analyses of peptides from human plasma proteins resulted in a 16% increase in confident peptide identifications (9022 vs 7779) when elution time information was used. Further improvements from elution time information can be expected as both the experimental reproducibility of elution times and the predictive capability improve.
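As an illustration of how an elution-time term can enter a discriminant function, the sketch below adds a penalty for disagreement between observed and predicted normalized elution time (NET) to two SEQUEST scores. The weights are hypothetical placeholders, not the published coefficients; in practice they would be fitted on PSMs of known correctness.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    xcorr: float          # SEQUEST cross-correlation score
    delta_cn: float       # SEQUEST normalized score difference to the next hit
    net_observed: float   # observed normalized elution time (0..1)
    net_predicted: float  # NET predicted from the peptide sequence


# Hypothetical weights for illustration only.
W_XCORR, W_DELTA_CN, W_NET = 1.0, 4.0, 10.0


def discriminant(psm: PSM, use_elution_time: bool = True) -> float:
    """Linear discriminant score; higher means more likely correct."""
    score = W_XCORR * psm.xcorr + W_DELTA_CN * psm.delta_cn
    if use_elution_time:
        # Penalize PSMs whose peptide elutes far from where it is predicted to.
        score -= W_NET * abs(psm.net_observed - psm.net_predicted)
    return score
```

Two PSMs with identical SEQUEST scores become separable once the elution-time term is switched on, which is the mechanism behind the gains reported above.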

With great biological interest in post-translational modifications (PTMs), various approaches have been introduced to identify PTMs using MS/MS. Recent developments in PTM identification have focused on unrestrictive approaches that search MS/MS spectra for all known, and possibly even unknown, types of PTMs at once. However, the resulting expanded search space requires much longer search times and also increases the number of false positives (incorrect identifications) and false negatives (missed true identifications), creating a bottleneck in high-throughput analysis. Here we introduce MODa, a novel "multi-blind" spectral alignment algorithm that allows fast unrestrictive PTM searches with no limit on the number of modifications per peptide, while running over an order of magnitude faster than existing approaches. We demonstrate the sensitivity of MODa on human shotgun proteomics data, where it reveals multiple mutations, a wide range of modifications (including glycosylation), and evidence for several putative novel modifications. Based on these findings, we argue that the efficiency and sensitivity of MODa make it the first unrestrictive search tool with the potential to fully replace conventional restrictive identification of proteomics mass spectrometry data.
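The core idea of a blind (unrestrictive) search can be shown in its simplest form: infer one unknown mass shift from the precursor and try placing it at every residue, keeping the placement that explains the most prefix (b-ion) peaks. This is a deliberately simplified sketch, not MODa, which aligns multiple unknown shifts; nominal masses and exact peak matching are assumptions.

```python
from itertools import accumulate

# Nominal residue masses (Da); one letter kept for each isobaric pair.
RESIDUE = {'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
           'L': 113, 'N': 114, 'D': 115, 'K': 128, 'E': 129, 'M': 131,
           'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186}


def localize_blind_shift(seq, observed_residue_mass, peaks):
    """Return (delta, best_site, matched_peaks) for a single unknown shift.

    delta is whatever mass the precursor says is unexplained; it is not
    restricted to a list of known modifications.
    """
    delta = observed_residue_mass - sum(RESIDUE[aa] for aa in seq)
    best = None
    for site in range(len(seq)):
        masses = [RESIDUE[aa] + (delta if i == site else 0)
                  for i, aa in enumerate(seq)]
        prefixes = list(accumulate(masses))[:-1]  # b1 .. b(n-1)
        matched = sum(1 for m in prefixes if m in peaks)
        if best is None or matched > best[1]:
            best = (site, matched)
    return delta, best[0], best[1]
```

For instance, `localize_blind_shift("GASK", 423, {57, 128, 295})` finds an unexplained +80 Da and places it on the serine (index 2), where all three prefix peaks match.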

Recent advances in instrument control and enrichment procedures have enabled us to quantify large numbers of phosphoproteins and record site-specific phosphorylation events. An intriguing problem that has arisen with these advances is how to accurately validate where phosphorylation events occur, ideally in an automated manner. The problem is difficult because MS/MS spectra of phosphopeptides are generally more complicated than those of unmodified peptides. In large-scale studies the problem is even more evident, because phosphorylation sites rest on single peptide identifications, in contrast to protein identifications, for which at least two peptides from the same protein are required. To address this problem, we have developed an integrated strategy that increases the reliability and ease of phosphopeptide validation. We have developed an off-line titanium dioxide (TiO₂) selective phosphopeptide enrichment procedure for crude cell lysates. Following enrichment, half of the enriched phosphopeptide sample is enzymatically dephosphorylated, after which both samples are subjected to LC-MS/MS. In the resulting MS/MS analyses, the spectrum of the dephosphorylated peptide is used as a reference against the original phosphopeptide spectrum, in effect generating two peptide spectra for the same amino acid sequence and thereby enhancing the probability of a correct identification. The integrated procedure is summarized as follows: 1) enrichment of phosphopeptides by TiO₂ chromatography, 2) dephosphorylation of half the sample, 3) LC-MS/MS-based analysis of phosphopeptides and the corresponding dephosphorylated peptides, 4) comparison of peptide elution profiles before and after dephosphorylation to confirm phosphorylation, and 5) comparison of MS/MS spectra before and after dephosphorylation to validate the phosphopeptide and its phosphorylation site.
This approach to phosphopeptide identification represents a major improvement over identifications based only on single MS/MS spectra and probability-based database searches. We investigated the applicability of this method to crude cell lysates and demonstrate its application in the large-scale analysis of phosphorylation sites in differentiating mouse myoblast cells.
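The pairing of each phosphopeptide with its dephosphorylated counterpart can be sketched at the precursor level: the two neutral masses should differ by a whole number of HPO₃ units (79.966331 Da each). The tolerance and the maximum number of sites below are hypothetical parameters, and the elution-profile and MS/MS comparisons (steps 4 and 5) are not shown.

```python
HPO3 = 79.966331  # monoisotopic mass added by one phosphorylation (Da)


def match_dephospho_pairs(phospho_masses, dephospho_masses,
                          max_sites=3, tol_ppm=10.0):
    """Return (phospho_index, dephospho_index, n_sites) for matching pairs.

    Masses are neutral monoisotopic peptide masses from the two LC-MS/MS runs.
    """
    pairs = []
    for i, mp in enumerate(phospho_masses):
        for j, md in enumerate(dephospho_masses):
            for n in range(1, max_sites + 1):
                expected = md + n * HPO3
                if abs(mp - expected) <= expected * tol_ppm * 1e-6:
                    pairs.append((i, j, n))
    return pairs
```

A pair found this way tells you how many phosphate groups were removed, which then constrains where the fragment-ion series of the two spectra should diverge.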

Spectral library searching is an emerging approach to peptide identification from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that have been conclusively assigned to their corresponding amino acid sequences. An unknown spectrum is then identified by comparing it with all candidates in the spectral library to find the most similar match. This review discusses the basic principles of spectral library building and searching, describes the advantages and limitations of the approach, and provides a primer for researchers interested in adopting it in their data analysis. It also discusses the future outlook for the evolution and utility of spectral libraries in proteomics.
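The "most similar match" step is often scored with a normalized dot product between binned spectra. The sketch below assumes 1-Da bins and square-root intensity scaling, both common choices that are assumptions here rather than anything the review prescribes.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum square-root intensities into m/z bins: {bin_index: value}."""
    binned = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        binned[b] = binned.get(b, 0.0) + math.sqrt(intensity)
    return binned


def dot_product(a, b):
    """Normalized dot product (cosine) of two binned spectra, in [0, 1]."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return num / norm if norm else 0.0
```

Square-root scaling dampens a few dominant peaks so that the many medium-intensity fragments contribute to the match.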

High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS²) spectra per day and increasingly sophisticated software for their automated identification. Despite the growing collections of identified spectra and the routine generation of MS² data from related peptides, the mainstream approach to peptide identification is still the nearly two-decade-old approach of matching one MS² spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data, in order to avoid substantial false positive and false negative rates. The spectral networks paradigm for analysis of MS² spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids, and are now defining a novel approach for mapping the entire molecular output of biological systems that is amenable to tandem mass spectrometry.
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
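The network-and-propagation idea can be sketched as a graph over spectra: an edge joins two spectra whose pairwise similarity passes a threshold, and annotations spread breadth-first from identified spectra. This is a minimal sketch under stated assumptions: pair scores are given, the threshold is hypothetical, and the code only records the source annotation and hop count, whereas real propagation also rewrites the sequence to account for the mass shift on each edge.

```python
from collections import deque


def build_network(n_spectra, pair_scores, threshold):
    """Adjacency lists; pair_scores maps (i, j) -> similarity score."""
    adj = {i: [] for i in range(n_spectra)}
    for (i, j), score in pair_scores.items():
        if score >= threshold:
            adj[i].append(j)
            adj[j].append(i)
    return adj


def propagate(adj, annotations):
    """Breadth-first propagation from annotated spectra.

    Returns {spectrum: (source_annotation, hops)}; spectra that cannot be
    reached from any annotated spectrum are left out.
    """
    result = {s: (ann, 0) for s, ann in annotations.items()}
    queue = deque(annotations)
    while queue:
        node = queue.popleft()
        ann, hops = result[node]
        for nb in adj[node]:
            if nb not in result:
                result[nb] = (ann, hops + 1)
                queue.append(nb)
    return result
```

Tracking hop counts is useful because each propagation step adds uncertainty, so identifications several edges away from an annotated spectrum deserve more scrutiny.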

Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass (MS/MS) spectra. Search engines can then identify peptides by comparing experimental MS/MS scans with those in the library. Many of these algorithms use the dot product score to measure the quality of a spectrum-spectrum match (SSM). This score does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies. We developed a new spectral library search engine, Pepitome, which uses statistical methods to score SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications with RNA-Seq data. Applying spectral library and database searches to the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
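One statistical alternative to a raw dot product, shown here as an assumption and not as Pepitome's exact scoring, is a hypergeometric test: if the m/z range is divided into `total_bins` tolerance-sized bins, how likely is it that at least `matched` of the query's peaks fall into the library spectrum's bins by chance?

```python
from math import comb


def hypergeometric_tail(matched, query_peaks, library_peaks, total_bins):
    """P(X >= matched) for X ~ Hypergeometric(total_bins, library_peaks, query_peaks)."""
    upper = min(query_peaks, library_peaks)
    denom = comb(total_bins, query_peaks)
    tail = sum(comb(library_peaks, x)
               * comb(total_bins - library_peaks, query_peaks - x)
               for x in range(matched, upper + 1))
    return tail / denom
```

Because the bin count depends on the fragment tolerance, this kind of score reacts to m/z accuracy, which a plain dot product over fixed bins ignores.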

We describe an integrated suite of algorithms and software for general accurate mass and time (AMT) tag analysis of mass spectrometry data. The AMT approach combines identifications from liquid chromatography (LC) tandem mass spectrometry (MS/MS) data with peptide accurate masses and retention times from high-resolution LC-MS data. Our workflow includes the traditional AMT approach, in which MS/MS identifications are stored in external databases, as well as methods based on more recent hybrid instruments such as the LTQ-FT or Orbitrap, where MS/MS identifications are embedded in the MS data. We demonstrate the utility of our AMT workflow for general data synthesis by combining data from two dissimilar biospecimens. Specifically, we demonstrate its relevance to serum biomarker discovery by identifying which peptides sequenced by MS/MS analysis of tumor tissue may also be present in the plasma of tumor-bearing and control mice. The analysis workflow, referred to as msInspect/AMT, extends and combines existing open-source platforms for LC-MS/MS (CPAS) and LC-MS (msInspect) data analysis and is available in an unrestricted open-source distribution.
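At its simplest, AMT matching assigns an LC-MS feature to every database tag whose mass agrees within a ppm tolerance and whose normalized elution time agrees within a NET window. The sketch below assumes hypothetical tolerances and made-up tag values; it does not reproduce msInspect/AMT's actual matching model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AMTTag:
    peptide: str
    mass: float  # monoisotopic neutral mass (Da)
    net: float   # normalized elution time, 0..1


def match_feature(mass, net, tags, ppm_tol=10.0, net_tol=0.02):
    """Return (peptide, ppm_error) for every tag within both tolerances."""
    hits = []
    for tag in tags:
        ppm = (mass - tag.mass) / tag.mass * 1e6
        if abs(ppm) <= ppm_tol and abs(net - tag.net) <= net_tol:
            hits.append((tag.peptide, round(ppm, 2)))
    return hits


# Made-up tags: pepB fails on elution time, pepC fails on mass.
tags = [AMTTag("pepA", 1000.000, 0.41),
        AMTTag("pepB", 1000.000, 0.60),
        AMTTag("pepC", 1000.050, 0.40)]
hits = match_feature(1000.005, 0.40, tags)
```

Requiring agreement in both dimensions is what lets high-resolution LC-MS features inherit identifications without new MS/MS spectra.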

The accurate mass and time (AMT) tag strategy has been recognized as a powerful tool for high-throughput analysis in liquid chromatography–mass spectrometry (LC–MS)-based proteomics. Because of the complexity of the human proteome, this strategy requires highly accurate mass measurements for confident identifications. We have developed a method of building a reference map that allows relaxed mass-error criteria yet still delivers high confidence in peptide identifications. The samples used to generate the peptide database were produced by collecting cysteine-containing peptides from T47D cells and then fractionating the peptides by strong cation exchange (SCX) chromatography. LC–tandem mass spectrometry (MS/MS) data from the SCX fractions were combined to create a comprehensive reference map. Once the reference map was built, the SCX step could be skipped in further proteomic analyses. We found that reference-driven identification increases overall throughput and proteomic coverage by identifying peptides with low intensity or complex interference. The reference map also facilitates quantitation by allowing extraction of the intensities of peptides of interest and by incorporating models of theoretical isotope distribution.
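A theoretical isotope distribution can be approximated, for illustration, by treating ¹³C as the only heavy isotope: the chance that k of n carbons are ¹³C is binomial. This carbon-only model is an assumption for the sketch; fuller models also account for H, N, O, and S isotopes.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural 13C fraction


def carbon_isotope_pattern(n_carbons, n_peaks=4):
    """Relative abundances of the M, M+1, M+2, ... peaks (carbon only)."""
    p = C13_ABUNDANCE
    return [comb(n_carbons, k) * p ** k * (1 - p) ** (n_carbons - k)
            for k in range(n_peaks)]


pattern = carbon_isotope_pattern(50)
```

Summing or fitting the first few isotope peaks against such a pattern is one way to extract a peptide intensity that is robust to interference on any single peak.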

A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which are often time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries so that new identifications can be made by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched with four sequence search engines. The resulting library contains over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in speed and in its ability to discriminate good from bad hits. A unique advantage of SpectraST is its full integration into the popular Trans-Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionality such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited to targeted proteomics applications, offering superior performance to traditional sequence searching.
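Library search tools commonly report not only the best-scoring library entry but also how far it stands above the runner-up, since a large gap helps "discriminate good and bad hits". The sketch assumes cosine similarity over pre-binned spectra and a hypothetical `search_library` helper; it is not SpectraST's exact scoring function.

```python
import math


def cosine(a, b):
    """Cosine similarity of two binned spectra given as {bin: intensity}."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return num / norm if norm else 0.0


def search_library(query, library):
    """Return (best_peptide, best_score, delta_to_second_best)."""
    scored = sorted(((cosine(query, spec), pep) for pep, spec in library.items()),
                    reverse=True)
    best_score, best_pep = scored[0]
    second = scored[1][0] if len(scored) > 1 else 0.0
    return best_pep, best_score, best_score - second
```

Searching only a library of previously identified spectra, rather than every theoretical peptide in a sequence database, is what keeps the candidate list short and the search fast.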

At present, mass spectrometry provides a rapid and sensitive means of making conclusive protein identifications from complex mixtures. Sequencing tryptic peptides derived from proteolyzed protein samples, also known as the "Bottom Up" approach, is the mass spectrometric gold standard for identifying unknowns. An alternative technology, "Top Down" characterization, which involves analyzing the intact unknowns for accurate mass and amino acid sequence tags, is emerging as a viable option for protein identification. In this paper, both characterization methods were employed to differentiate more comprehensively two early-eluting peaks in a process-scale size-exclusion chromatography (SEC) step for a recombinant immunoglobulin gamma-1 (IgG-1) fusion protein. The contents of each SEC peak were enzymatically digested, and the resulting peptides were mapped using reversed-phase (RP) HPLC-ion trap MS. Many low-level UV signals were observed among the fusion protein-related peptide peaks. These unknowns were collected, concentrated, and analyzed by nanoelectrospray (nanoES) collision-induced dissociation (CID) tandem mass spectrometry (MS/MS) for identification. The peptide sequencing experiments identified twenty host cell-related proteins. Following peptide mapping, the contents of the two SEC peaks were protein mass profiled using on-line RP HPLC coupled to a high-resolution quadrupole time-of-flight (Qq/TOF) MS. Unknown proteins were also collected, concentrated, and dissociated using nanoES CID MS/MS. Intact protein CID experiments and accurate molecular weight information allowed the identification of three full-length host cell-derived proteins and numerous clips from these and additional proteins. The accurate molecular weight values allowed the assignment of N- and C-terminal processing, which is difficult to assess conclusively from peptide mapping data.
The peptide-mapping experiments proved far more effective for making protein identifications from complex mixtures, whereas protein mass profiling was useful for assessing modifications and distinguishing protein clips from full-length species.
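How an accurate intact mass can reveal terminal processing or a clip can be sketched by computing the monoisotopic mass of every contiguous subsequence and reporting those that match an observed mass. The tolerance is a hypothetical parameter; real intact-protein work often uses average masses and must also account for modifications, which this sketch ignores.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da) and water.
MONO = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
        'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
        'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
        'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
        'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931}
WATER = 18.010565


def mono_mass(seq):
    """Monoisotopic neutral mass of a linear peptide or protein."""
    return sum(MONO[aa] for aa in seq) + WATER


def find_clips(seq, observed_mass, tol_da=0.02):
    """(start, end) slices of seq whose mass matches observed_mass."""
    prefix = [0.0] + list(accumulate(MONO[aa] for aa in seq))
    n = len(seq)
    return [(start, end)
            for start in range(n)
            for end in range(start + 1, n + 1)
            if abs(prefix[end] - prefix[start] + WATER - observed_mass) <= tol_da]
```

A match at `(1, len(seq))`, for example, is consistent with loss of the N-terminal residue, a form of processing that peptide mapping alone can miss.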

Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made from peptide peak lists obtained via mass spectrometry (MS) using bioinformatics tools. These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, at most one of these "missed cleavages" is considered by the search tools, usually after in silico digestion of the proteome with trypsin. Using two distinct, nonredundant data sets of peptides identified via PMF and tandem MS, we present a simple predictive method based on information theory that identifies experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confidently predicted missed cleavage sites are not cut during in silico digestion. We show that this improves database searching with two different search engines, using the PMF data set as a test set. In addition, the improved approach is demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
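In silico digestion with a missed-cleavage allowance, plus a mask of sites that should not be cut, can be sketched as follows. The trypsin rule (after K or R, not before P) is the classic approximation; the `masked` set stands in for the output of the missed-cleavage predictor, which is not reproduced here.

```python
def tryptic_sites(seq):
    """Cleavage positions (slice indices) after K/R, except before P."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, max_missed=1, masked=frozenset()):
    """Peptides with up to max_missed missed cleavages.

    masked: hypothetical set of positions predicted to be missed; they are
    never cut, so they do not count against max_missed.
    """
    sites = [s for s in tryptic_sites(seq) if s not in masked]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # bounds[j] with j - i - 1 skipped sites = that many missed cleavages
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides
```

Masking a site merges its two neighbouring peptides without paying for a general missed-cleavage allowance, which keeps the candidate list short.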

A very popular approach in proteomics is the so-called "shotgun LC-MS/MS" strategy. In its most common form, a total protein digest is separated by ion exchange fractionation in the first dimension, followed by off- or on-line RP LC-MS/MS. We replaced the first dimension with liquid-phase isoelectric focusing using the Off-Gel device, producing 15 fractions. Because peptides are separated by isoelectric point in the first dimension and by hydrophobicity in the second, these experimentally derived parameters (pI and retention time, RT) can be used to validate potential peptide identifications. We applied this strategy to a cellular extract of Drosophila Kc167 cells and identified peptides with two different database search engines, PHENYX and SEQUEST, with PeptideProphet validation of the SEQUEST results. PHENYX returned 7582 potential peptide identifications and SEQUEST 7629. Validation with PeptideProphet reduced the SEQUEST results to 2006 identifications. Validation of the PeptideProphet, SEQUEST and PHENYX results by pI and RT confirmed 1837 PeptideProphet identifications, while another 1130 likely hits were found in the remainder of the SEQUEST results. Validation of the PHENYX results established a p-value threshold of < 1 × 10⁻⁴ that by itself sets the confidence of correct identification to > 95%, and a final count of 2034 highly confident peptide identifications was achieved after pI and RT validation. Although the PeptideProphet and PHENYX data sets both have very high confidence, the overlap of common identifications was only 79.4%, which can be explained by the fact that the data were interpreted by searching different protein databases with two search engines based on different algorithms. The approach used in this study allowed an automated and improved data validation process for shotgun proteomics projects, producing MS/MS peptide identifications of very high confidence.
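Validating identifications by pI requires a predicted pI for each candidate peptide. A standard way is to find, by bisection, the pH at which the Henderson-Hasselbalch net charge is zero. The pKa values below are illustrative assumptions; different published pKa tables give noticeably different pIs, and the paper's own predictor is not specified here.

```python
# Illustrative pKa values (an assumption; tables differ between tools).
POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A candidate whose predicted pI falls well outside the pH range of the Off-Gel fraction it was found in can then be flagged as a likely false hit.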

We present a method for peptide and protein identification based on LC-MS profiling. The method identifies peptides at high throughput without spending the sequencing time needed for identification from CID spectra. The measurable peptide properties of mass and liquid chromatographic elution conditions are used to characterize and differentiate peptide features, and these features are matched to a reference database built from previously acquired and archived LC-MS/MS experiments to generate sequence assignments. The matches are scored according to the probability of overlap between the peptide feature and the database peptides, yielding a ranked list of possible peptide sequences for each submitted peptide feature. This method produced 6 times more peptide sequence identifications from a single LC-MS analysis of yeast than shotgun peptide sequencing using LC-MS/MS.
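A ranked list of candidate sequences per feature can be sketched by assuming independent Gaussian errors in mass (ppm) and elution time and normalizing the resulting likelihoods over all database entries. The error widths are hypothetical, and this is a stand-in for, not a reproduction of, the paper's overlap-probability model.

```python
import math


def rank_candidates(feature_mass, feature_time, database,
                    sigma_ppm=3.0, sigma_time=0.5):
    """Rank (peptide, mass, time) entries by normalized match probability."""
    weights = []
    for pep, mass, time in database:
        z_mass = (feature_mass - mass) / mass * 1e6 / sigma_ppm
        z_time = (feature_time - time) / sigma_time
        weights.append((math.exp(-0.5 * (z_mass ** 2 + z_time ** 2)), pep))
    total = sum(w for w, _ in weights)
    if total == 0:
        return []
    return sorted(((pep, w / total) for w, pep in weights),
                  key=lambda item: item[1], reverse=True)


db = [("pepA", 1000.0, 20.0), ("pepB", 1000.002, 20.1), ("pepC", 1500.0, 20.0)]
ranking = rank_candidates(1000.0, 20.0, db)
```

Normalizing over all candidates turns raw likelihoods into scores that are comparable between features with dense and sparse neighbourhoods in the database.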

Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective, as they lack the corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides with highly divergent sequences from networks containing dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected many novel putative peptides even in genomically characterized species, suggesting possible gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software.

Microorganisms have evolved their cellular metabolism to generate energy for life in unusual environments (1), and their capabilities are of great interest for the production of renewable bioenergy and could contribute to managing the world's current energy and climate crisis (2). Genomics studies have increased the number of sequenced bioenergy-related microbial genomes and revealed possible biological reactions involved in bioenergy production (3).
Studies of photosynthetic microorganisms, for example, have yielded insights into how they harvest solar energy and use it to produce bioenergy products (4). Despite this importance of microorganisms, the characterization of diverse microbial phenotypes by proteomics tandem mass spectrometry (MS/MS) has been limited. The dominant approaches to MS/MS analysis rely heavily on the availability of completely annotated genomes (i.e. accurate protein databases) (5-7), yet most microorganisms populating the planet have unsequenced or poorly annotated genomes. Thus it remains challenging to identify proteins from environmental and unculturable organisms.

One solution to protein identification in a species with no sequenced genome is to use the genomes of closely related species (8). This requires matching MS/MS data to peptides with slightly different amino acid sequences (polymorphic, orthologous peptides), but matching the shifted masses of peptides and their fragment ions is computationally expensive and challenging. Moreover, species-specific post-translational modifications (PTMs) can make cross-species identification more complex. The common computational approach is to match de novo sequences derived from MS/MS data tolerantly against the database while allowing for amino acid mutations and modifications (9-11). However, this approach critically depends on good de novo interpretations, which are nearly always partially incorrect and yield high-quality subsequences for only a small fraction of all spectra. The blind database search approach, developed to identify peptides with unexpected modifications, can also be used to match MS/MS data from unknown species directly to a database of closely related species, but its utility is limited by its exceptionally large search space (12-18).
These spectrum-database matching approaches to cross-species identification face significant challenges in speed and sensitivity with a huge database, which leads to much longer search times and more false positive identifications (19, 20).

As a complement to spectrum-database matching, spectral library searching is an emerging and promising approach (21). A spectral library is a large collection of identified MS/MS spectra, and an unknown query spectrum can be identified by direct spectral matching to the library. The great advantages of this approach are the reduced search space and the use of peptide fragmentation patterns. The spectral networks approach extends this concept to the identification of modified peptides in MS/MS data sets (22, 23). Spectral networks do not search a database directly; instead, they group MS/MS spectra by computing the pairwise similarity between spectra of peptide variants and then construct networks in which each spectrum defines a node and each significant spectral pair, highly correlated in fragmentation pattern, defines an edge (Fig. 1). Spectra belonging to the same subnetwork should have related identifications, so the peptide sequence of an identified spectrum can be propagated to neighboring unidentified spectra.

Fig. 1. Overview of multi-species spectral networks. Nodes represent individual spectra, and edges between nodes represent significant pairwise alignments between spectra; edges are labeled with amino acid mutations (dotted edges) or parent mass differences (solid edges). In spectral networks, a peptide and its related variants are ideally grouped into a single subnetwork. If at least one spectrum in a subnetwork is annotated (filled node), all the neighboring spectra (unfilled nodes) can potentially be identified by propagating the annotation over network edges.
For example, all spectra in the subnetwork of “peptide A” (top left, blue network) can be annotated via up to three iterative propagations: first from A to {A1, A2, A3}, second from {A2, A3} to {A4, A5}, and third from {A4, A5} to A6. This paradigm applies equally to cross-species data analysis: “peptide L” identified in species 1 (top middle, olive-colored network) is propagated to an unidentified node in species 2, identifying its orthologous “peptide l”, which carries a serine-to-alanine polymorphism. Thus, spectral networks enable the detection of orthologous peptide pairs between different species.

We recently reported that a vast number of polymorphic, orthologous peptides across species are present in MS/MS data sets (24). We propose a new approach to cross-species proteomics that aggregates MS/MS data from multiple related species and then applies spectral networks analysis to the pooled data to capitalize on pairs of spectra from orthologous peptides, as shown in Fig. 1. This approach does not require advance knowledge of the genomes of all species, and it enables the identification of novel, polymorphic peptides across species via interspecies propagation. Compared with previous approaches, cross-species spectral network analysis has two major advantages. First, by matching spectra to spectra instead of spectra to database sequences, spectral networks consider only the sequence variability of peptides present in the samples instead of all possible variability across the whole database of related species; thus the performance of spectral networks is independent of database size. Second, analyzing a set of highly related spectra increases the reliability of identifying polymorphic peptides, because multiple different spectra can support the same novel identification. The utility of spectral networks can also be extended to the proteomic analysis of microbial communities, which often contain hundreds of distinct organisms (25, 26).
But despite the success of spectral networks on low-complexity data sets (22, 23), the analysis of large multi-species proteomics data requires significantly higher reliability in spectral similarity scores, because the number of pairwise spectral comparisons grows quadratically with the number of spectra.

In this work, we present algorithmic and statistical advances that improve the utility of spectral networks with large and diverse spectral data sets. To assess the statistical significance of spectral alignments when pairing millions of spectra, we propose Align-GF (generating function for spectral alignment), which computes rigorous p values for a spectral pair based on the complete score histogram of all possible alignments between two spectra. We show that Align-GF successfully addresses the reliability challenge in a large data set analysis and demonstrate its utility through a 4-fold increase in the sensitivity of spectral pairs. Even with this dramatically improved accuracy, a very small number of incorrect pairs in a network can still complicate the propagation of annotations. To move further toward the ideal scenario in which each subnetwork consists only of spectra from a single peptide family, we introduce new procedures to split mixed networks from different peptide families and show that these effectively eliminate many false spectral pairs. Finally, we propose the first approach to calculating the false discovery rate (FDR) for spectral networks propagation of identifications from unmodified to progressively more modified peptides. The proposed FDR estimation was conservative and more rigorous for highly modified peptides, and thus makes propagation results comparable to those of other peptide identification approaches.

The cross-species spectral networks techniques proposed here enabled the proteomic analysis of three different Cyanothece species, including a strain whose genome sequence is not known.
Cyanobacteria are among the most diverse and widely distributed microorganisms and have received significant attention for their potential to meet various demands of bioenergy generation (27). We show that spectral networks can improve peptide identification by up to 38% compared with mainstream approaches, including many polymorphic and modified peptides. Spectral networks identified peptides with highly divergent sequences (with 7 amino acid mutations) by leveraging networks of variant peptides, and one example subnetwork of species-specific variants of phycobilisome proteins reflects the diversity of photosynthetic light-harvesting strategies (28). Our approach thus demonstrates the potential gains in multi-species proteomics and sets the stage for related developments in higher-complexity metaproteomics samples. Finally, spectral networks revealed many subnetworks containing only unidentified spectra, strongly suggesting the presence of novel peptides that are missing from current protein databases. Although we illustrate the potential of our approach on a specific set of bioenergy-related species, the proposed approach is generic and should be applicable to any other set of related species. The diversity of biologically important protein families could be studied by comparing closely and more remotely related species.
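Align-GF computes a score histogram over all possible alignments of two spectra. The same generating-function idea is easier to see in the simpler single-spectrum setting sketched below: dynamic programming over nominal masses yields the score distribution of all peptides with a given total mass, and the p-value is the probability mass at or above the observed score. This is an illustration of the counting idea only, not Align-GF itself; uniform amino acid frequencies, nominal masses, and additive prefix scores are assumptions.

```python
from collections import defaultdict

# Nominal residue masses; one letter per isobaric pair, all equally likely.
RESIDUE = {'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
           'L': 113, 'N': 114, 'D': 115, 'K': 128, 'E': 129, 'M': 131,
           'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186}


def score_distribution(prefix_scores, total_mass):
    """Score distribution over random peptides of exactly total_mass.

    A peptide's score is the sum of prefix_scores at its internal prefix
    masses (missing masses score 0).
    """
    p_aa = 1.0 / len(RESIDUE)
    table = [defaultdict(float) for _ in range(total_mass + 1)]
    table[0][0] = 1.0  # the empty prefix has score 0
    for m in range(1, total_mass + 1):
        gain = prefix_scores.get(m, 0) if m < total_mass else 0
        for mass in RESIDUE.values():
            if mass <= m:
                for s, p in table[m - mass].items():
                    table[m][s + gain] += p * p_aa
    total = sum(table[total_mass].values())
    return {s: p / total for s, p in table[total_mass].items()} if total else {}


def spectral_pvalue(prefix_scores, total_mass, observed_score):
    """Probability that a random peptide of this mass scores >= observed."""
    dist = score_distribution(prefix_scores, total_mass)
    return sum(p for s, p in dist.items() if s >= observed_score)
```

With `prefix_scores={57: 5}` and a total mass of 114, only GG and N fit the mass, so the p-value of a score of 5 is the relative weight of GG, 1/19. Because the whole histogram is computed rather than a few sampled decoys, even very small p-values come out exactly.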

Mass spectrometers that provide high mass accuracy, such as FT-ICR instruments, are increasingly used in proteomic studies. Although the importance of accurately determined molecular masses for the identification of biomolecules is generally accepted, its role in the analysis of shotgun proteomic data has not been thoroughly studied. To gain insight into this role, we used a hybrid linear quadrupole ion trap/FT-ICR (LTQ FT) mass spectrometer for LC-MS/MS analysis of a highly complex peptide mixture derived from a fraction of the yeast proteome. We applied three data-dependent MS/MS acquisition methods, in which the FT-ICR part of the hybrid mass spectrometer was either not used, used only for survey MS scans, or also used to acquire selected ion monitoring scans to optimize mass accuracy. MS/MS data were assigned with the SEQUEST algorithm, and peptide identifications were validated by estimating the number of incorrect assignments with the composite target/decoy database search strategy. We developed a simple mass calibration strategy that uses polydimethylcyclosiloxane background ions as calibrants. This strategy substantially improved mass accuracy without reducing the number of MS/MS spectra acquired in an LC-MS/MS run. The benefits of high mass accuracy were greatest for assigning MS/MS spectra with low signal-to-noise ratios and for assigning phosphopeptides. Confident peptide identification rates from these data sets could be doubled by using mass accuracy information. It was also shown that improving mass accuracy at the cost of MS/MS acquisition rate substantially lowered the sensitivity of LC-MS/MS analyses: using FT-ICR selected ion monitoring scans to maximize mass accuracy reduced the number of protein identifications by 40%.
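Recalibration against a background ion can be sketched as a single-point, multiplicative (constant-ppm) correction: measure the calibrant's ppm error in a scan and rescale every m/z in that scan by the same factor. The constant-ppm model is an assumption for the sketch, and 445.120025 is used as an example calibrant value (a commonly used polysiloxane lock mass), not a figure taken from this abstract.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(mz_values, calibrant_observed, calibrant_theoretical):
    """Rescale all m/z values so the calibrant lands on its theoretical value."""
    factor = calibrant_theoretical / calibrant_observed
    return [mz * factor for mz in mz_values]


CALIBRANT = 445.120025  # example polysiloxane background ion m/z (assumed)
# A scan in which everything, calibrant included, reads 5 ppm high:
observed_calibrant = CALIBRANT * (1 + 5e-6)
corrected = recalibrate([800.0 * (1 + 5e-6)], observed_calibrant, CALIBRANT)
```

Because the calibrant is always present as background, the correction costs no extra scans, which fits the abstract's point that mass accuracy should not be bought at the expense of MS/MS acquisition rate.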


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417