首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
One of the challenges associated with large-scale proteome analysis using tandem mass spectrometry (MS/MS) and automated database searching is to reduce the number of false positive identifications without sacrificing the number of true positives found. In this work, a systematic investigation of the effect of 2MEGA labeling (N-terminal dimethylation after lysine guanidination) on the proteome analysis of a membrane fraction of an Escherichia coli cell extract by 2-dimensional liquid chromatography MS/MS is presented. By a large-scale comparison of MS/MS spectra of native peptides with those from the 2MEGA-labeled peptides, the labeled peptides were found to undergo facile fragmentation with enhanced a1 or a1-related (a(1)-17 and a(1)-45) ions derived from all N-terminal amino acids in the MS/MS spectra; these ions are usually difficult to detect in the MS/MS spectra of nonderivatized peptides. The 2MEGA labeling alleviated the biased detection of arginine-terminated peptides that is often observed in MALDI and ESI MS experiments. 2MEGA labeling was found not only to increase the number of peptides and proteins identified but also to generate enhanced a1 or a1-related ions as a constraint to reduce the number of false positive identifications. In total, 640 proteins were identified from the E. coli membrane fraction, with each protein identified based on peptide mass and sequence match of one or more peptides using MASCOT database search algorithm from the MS/MS spectra generated by a quadrupole time-of-flight mass spectrometer. Among them, the subcellular locations of 336 proteins are presently known, including 258 membrane and membrane-associated proteins (76.8%). Among the classified proteins, there was a dramatic increase in the total number of integral membrane proteins identified in the 2MEGA-labeled sample (153 proteins) versus the unlabeled sample (77 proteins).  相似文献   

2.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which often are time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.  相似文献   

3.
Hu Y  Li Y  Lam H 《Proteomics》2011,11(24):4702-4711
Spectral library searching is a promising alternative to sequence database searching in peptide identification from MS/MS spectra. The key advantage of spectral library searching is the utilization of more spectral features to improve score discrimination between good and bad matches, and hence sensitivity. However, the coverage of reference spectral library is limited by current experimental and computational methods. We developed a computational approach to expand the coverage of spectral libraries with semi-empirical spectra predicted from perturbing known spectra of similar sequences, such as those with single amino acid substitutions. We hypothesized that the peptide of similar sequences should produce similar fragmentation patterns, at least in most cases. Our results confirm our hypothesis and specify when this approach can be applied. In actual spectral searching of real data sets, the sensitivity advantage of spectral library searching over sequence database searching can be mostly retained even when all real spectra are replaced by semi-empirical ones. We demonstrated the applicability of this approach by detecting several known non-synonymous single-nucleotide polymorphisms in three large human data sets by spectral searching.  相似文献   

4.
The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank based scoring methods. The proteome-wide coverage of the simulated library leads to 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% increased MS/MS assignments compared with Mascot searches. Our study demonstrates the advantages of using probabilistic and rank based scores to improve performance of spectrum-to-spectrum search strategies.  相似文献   

5.
Comprehensive proteome profiling of breast cancer tissue samples is challenging, as the tissue samples contain many proteins with varying concentrations and modifications. We report an effective sample preparation strategy combined with liquid chromatography (LC) electrospray ionization (ESI) quadrupole time-of-flight (QTOF) MS/MS for proteome analysis of human breast cancer tissue. The complexity of the breast cancer tissue proteome was reduced by using protein precipitation from a tissue extract, followed by sequential protein solubilization in solvents of different solubilizing strength. The individual fractions of protein mixtures or subproteomes were subjected to trypsin digestion and the resultant peptides were separated by strong cation exchange (SCX) chromatography, followed by reversed-phase capillary LC combined with high resolution and high accuracy ESI-QTOF MS/MS. This approach identified 14407 unique peptides from 3749 different proteins based on peptide matches with scores above the threshold scores at the 95% confidence level in MASCOT database search of the acquired MS/MS spectra. The false positive rate of peptide matches was determined to be 0.95% by using the target-decoy sequence search strategy. On the basis of gene ontology categorization, the identified proteins represented a wide variety of biological functions, cellular processes, and cellular locations.  相似文献   

6.
We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification to carboxylic groups and dimethylation to amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples.  相似文献   

7.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in a MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can, not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that are not possible to identify by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.  相似文献   

8.
Spectral library searching is an emerging approach in peptide identifications from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that are conclusively assigned to their corresponding amino acid sequence. An unknown spectrum is then identified by comparing it to all the candidates in the spectral library for the most similar match. This review discusses the basic principles of spectral library building and searching, describes its advantages and limitations, and provides a primer for researchers interested in adopting this new approach in their data analysis. It will also discuss the future outlook on the evolution and utility of spectral libraries in the field of proteomics.  相似文献   

9.
Wenguang Shao  Kan Zhu  Henry Lam 《Proteomics》2013,13(22):3273-3283
Spectral library searching is a maturing approach for peptide identification from MS/MS, offering an alternative to traditional sequence database searching. Spectral library searching relies on direct spectrum‐to‐spectrum matching between the query data and the spectral library, which affords better discrimination of true and false matches, leading to improved sensitivity. However, due to the inherent diversity of the peak location and intensity profiles of real spectra, the resulting similarity score distributions often take on unpredictable shapes. This makes it difficult to model the scores of the false matches accurately, necessitating the use of decoy searching to sample the score distribution of the false matches. Here, we refined the similarity scoring in spectral library searching to enable the validation of spectral search results without the use of decoys. We rank‐transformed the peak intensities to standardize all spectra, making it possible to fit a parametric distribution to the scores of the nontop‐scoring spectral matches. The statistical significance of the top‐scoring match can then be estimated in a rigorous manner according to Extreme Value Theory. The overall result is a more robust and interpretable measure of the quality of the spectral match, which can be obtained without decoys. We tested this refined similarity scoring function on real datasets and demonstrated its effectiveness. This approach reduces search time, increases sensitivity, and extends spectral library searching to situations where decoy spectra cannot be readily generated, such as in searching unidentified and nonpeptide spectral libraries.  相似文献   

10.
A strategy based on isotope labeling of peptides and liquid chromatography matrix-assisted laser desorption ionization mass spectrometry (LC-MALDI MS) has been employed to accurately quantify and confidently identify differentially expressed proteins between an E-cadherin-deficient human carcinoma cell line (SCC9) and its transfectants expressing E-cadherin (SCC9-E). Proteins extracted from each cell line were tryptically digested and the resultant peptides were labeled individually with either d(0)- or d(2)-formaldehyde. The labeled peptides were combined and the peptide mixture was separated and fractionated by a strong cation exchange (SCX) column. Peptides from each SCX fraction were further separated by a microbore reversed-phase (RP) LC column. The effluents were then directly spotted onto a MALDI target using a heated droplet LC-MALDI interface. After mixing with a MALDI matrix, individual sample spots were analyzed by MALDI quadrupole time-of-flight MS, using an initial MS scan to quantify the dimethyl labeled peptide pairs. MS/MS analysis was then carried out on the peptide pairs having relative peak intensity changes of greater than 2-fold. The MS/MS spectra were subjected to database searching for protein identification. The search results were further confirmed by comparing the MS/MS spectra of the peptide pairs. Using this strategy, we detected and compared relative peak intensity changes of 5480 peptide pairs. Among them, 320 peptide pairs showed changes of greater than 2-fold. MS/MS analysis of these changing pairs led to the identification of 49 differentially expressed proteins between the parental SCC9 cells and SCC9-E transfectants. These proteins were determined to be involved in different pathways regulating cytoskeletal organization, cell adhesion, epithelial polarity, and cell proliferation. The changes in protein expression were consistent with increased cell-cell and cell-matrix adhesion and decreased proliferation in SCC9-E cells, in line with E-cadherin tumor suppressor activity. Finally, the accuracy of the MS quantification and subcellular localization for 6 differentially expressed proteins were validated by immunoblotting and immunofluorescence assays.  相似文献   

11.
Ahrné E  Ohta Y  Nikitin F  Scherl A  Lisacek F  Müller M 《Proteomics》2011,11(20):4085-4095
The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high-throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypotheses testing, the control of the number of false-positive identifications expected in the final result list by means of the calculation of the false discovery rate (FDR). In a classical sequence search where experimental MS/MS spectra are compared with the theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized or decoy sequence databases. Despite on-going discussion on how exactly the FDR has to be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to control the FDR of spectrum library searches were discussed. We present in this paper a detailed analysis of the similarity between spectra of distinct peptides to set the basis of our own solution for decoy library creation (DeLiberator). It differs from the previously published results in some key points, mainly in implementing new methods that prevent decoy spectra from being too similar to the original library spectra while keeping important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.  相似文献   

12.
Searching a spectral library for the identification of protein MS/MS data has proven to be a fast and accurate method, while yielding a high identification rate. We investigated the potential to increase peptide discovery rate, with little increase in computational time, by constructing a workflow based on a sequence search with Phenyx followed by a library search with SpectraST. Searching a consensus library compiled from the search results of the prior Phenyx search increased the number of confidently matched spectra by up to 156%. Additionally matched spectra by SpectraST included noisy spectra, spectra representing missed cleaved peptides as well as spectra from post‐translationally modified peptides.  相似文献   

13.
In a typical shotgun proteomics experiment, a significant number of high‐quality MS/MS spectra remain “unassigned.” The main focus of this work is to improve our understanding of various sources of unassigned high‐quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set.  相似文献   

14.
Zhang X  Li Y  Shao W  Lam H 《Proteomics》2011,11(6):1075-1085
Spectral library searching has been recently proposed as an alternative to sequence database searching for peptide identification from MS/MS. We performed a systematic comparison between spectral library searching and sequence database searching using a wide variety of data to better demonstrate, and understand, the superior sensitivity of the former observed in preliminary studies. By decoupling the effect of search space, we demonstrated that the success of spectral library searching is primarily attributable to the use of real library spectra for matching, without which the sensitivity advantage largely disappears. We further determined the extent to which the use of real peak intensities and non-canonical fragments, both under-utilized information in sequence database searching, contributes to the sensitivity advantage. Lastly, we showed that spectral library searching is disproportionately more successful in identifying low-quality spectra, and complex spectra of higher- charged precursors, both important frontiers in peptide sequencing. Our results answered important outstanding questions about this promising yet unproven method using well-controlled computational experiments and sound statistical approaches.  相似文献   

15.
基于质谱的蛋白质组学快速发展,蛋白质质谱数据也呈指数式增长。寻找速度快、准确度高以及重复性好的鉴定方法是该领域的一项重要任务。谱图库搜索策略直接比较实验谱图与谱图库中的真实谱图,充分利用了谱图中的丰度、非常规碎裂模式和其他的一些特征,使得搜索更加快速和准确,成为蛋白质组学的主流鉴定方法之一。文中介绍基于谱图库的蛋白质组质谱数据鉴定策略,并针对其中两个关键步骤——谱图库构建方法和谱图库搜索方法进行深入介绍,探讨了谱图库策略的进展和挑战。  相似文献   

16.
Advances in liquid chromatography‐mass spectrometry have facilitated the incorporation of proteomic studies to many biology experimental workflows. Data‐independent acquisition platforms, such as sequential window acquisition of all theoretical mass spectra (SWATH‐MS), offer several advantages for label‐free quantitative assessment of complex proteomes over data‐dependent acquisition (DDA) approaches. However, SWATH data interpretation requires spectral libraries as a detailed reference resource. The guinea pig (Cavia porcellus) is an excellent experimental model for translation to many aspects of human physiology and disease, yet there is limited experimental information regarding its proteome. To overcome this knowledge gap, a comprehensive spectral library of the guinea pig proteome is generated. Homogenates and tryptic digests are prepared from 16 tissues and subjected to >200 DDA runs. Analysis of >250 000 peptide‐spectrum matches resulted in a library of 73 594 peptides from 7666 proteins. Library validation is provided by i) analyzing externally derived SWATH files ( https://doi.org/10.1016/j.jprot.2018.03.023 ) and comparing peptide intensity quantifications; ii) merging of externally derived data to the base library. This furnishes the research community with a comprehensive proteomic resource that will facilitate future molecular‐phenotypic studies using (re‐engaging) the guinea pig as an experimental model of relevance to human biology. The spectral library and raw data are freely accessible in the MassIVE repository (MSV000083199).  相似文献   

17.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.  相似文献   

18.
High‐resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2‐Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2‐Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2‐Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non‐SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2‐Deisotoper can be used via a user interface or as a command‐line tool.  相似文献   

19.
The discovery of unanticipated protein modifications is one of the most challenging problems in proteomics. Whereas widely used algorithms such as Sequest and Mascot enable mapping of modifications when the mass and amino acid specificity are known, unexpected modifications cannot be identified with these tools. We have developed an algorithm and software called P-Mod, which enables discovery and sequence mapping of modifications to target proteins known to be represented in the analysis or identified by Sequest. P-Mod matches MS/MS spectra to peptide sequences in a search list. For spectra of modified peptides, P-Mod calculates mass differences between search peptide sequences and MS/MS precursors and localizes the mass shift to a sequence position in the peptide. Because modifications are detected as mass shifts, P-Mod does not require the user to guess at masses or sequence locations of modifications. P-Mod uses extreme value statistics to assign p value estimates to sequence-to-spectrum matches. The reported p values are scaled to account for the number of comparisons, so that error rates do not increase with the expanded search lists that result from incorporating potential peptide modifications. Combination of P-Mod searches from multiple LC-MS/MS analyses and multiple samples revealed previously unreported BSA modifications, including a novel decarboxymethylation or D-->G substitution at position 579 of the protein. P-Mod can serve a unique role in the identification of protein modifications both from exogenous and endogenous sources and may be useful for identifying modified protein forms as biomarkers for toxicity and disease processes.  相似文献   

20.
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号