Similar articles
 20 similar articles found (search time: 93 ms)
1.
De novo peptide sequencing via tandem mass spectrometry.   Total citations: 10, self-citations: 0, citations by others: 10
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
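The graph representation mentioned in this abstract can be illustrated with a toy sketch. This is not SHERENGA: it assumes the peaks are already known to be prefix masses, uses a small hand-picked table of monoisotopic residue masses, and scores a path simply by its number of edges instead of by learned ion types and intensity thresholds. The names `RESIDUE_MASS` and `best_path` are hypothetical.

```python
# Toy spectrum graph: nodes are prefix masses, edges join two nodes whose
# mass difference matches one amino acid residue within a tolerance.
# Assumption: a reduced residue table, chosen only for illustration.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def best_path(prefix_masses, tol=0.01):
    """Return the longest residue string linking the lightest node to the
    heaviest node, or None if no such path exists."""
    nodes = sorted(prefix_masses)
    seqs = [None] * len(nodes)  # best sequence ending at each node
    seqs[0] = ""                # the lightest node is the start of every path
    for j in range(1, len(nodes)):
        for i in range(j):
            if seqs[i] is None:
                continue        # node i is not reachable from the start
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    candidate = seqs[i] + aa
                    if seqs[j] is None or len(candidate) > len(seqs[j]):
                        seqs[j] = candidate
    return seqs[-1]


# Prefix masses of the peptide PEAK, starting from the empty prefix (0.0).
peaks = [0.0, 97.05276, 226.09535, 297.13246, 425.22742]
sequence = best_path(peaks)
```

A real implementation would have to cope with missing peaks, noise peaks, and suffix ions, which is where learned scoring of the kind the abstract describes becomes necessary.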

2.
Clustering millions of tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.

3.
Current efforts aimed at developing high-throughput proteomics focus on increasing the speed of protein identification. Although improvements in sample separation, enrichment, automated handling, mass spectrometric analysis, as well as data reduction and database interrogation strategies have done much to increase the quality, quantity and efficiency of data collection, significant bottlenecks still exist. Various separation techniques have been coupled with tandem mass spectrometric (MS/MS) approaches to allow a quicker analysis of complex mixtures of proteins, especially where a high number of unambiguous protein identifications are the exception, rather than the rule. MS/MS is required to provide structural / amino acid sequence information on a peptide and thus allow protein identity to be inferred from individual peptides. Currently these spectra need to be manually validated because (a) false positive matches are possible (i.e., the protein is not in the database), and (b) observed fragmentation trends may not be incorporated into current MS/MS search algorithms. This validation represents a significant bottleneck associated with high-throughput proteomic strategies. We have developed CHOMPER, a software program which reduces the time required to both visualize and confirm MS/MS search results and generate post-analysis reports and protein summary tables. CHOMPER extracts the identification information from SEQUEST MS/MS search result files, reproduces both the peptide and protein identification summaries, provides a more interactive visualization of the MS/MS spectra and facilitates the direct submission of manually validated identifications to a database.

4.
Jens Allmer 《Amino acids》2010,38(4):1075-1087
Determining the differential expression of proteins under different conditions is of major importance in proteomics. Since mass spectrometry-based proteomics is often used to quantify proteins, several labelling strategies have been developed. While these are generally more precise than label-free quantitation approaches, they imply specifically designed experiments which also require knowledge about peptides that are expected to be measured and need to be modified. We recently designed the 2DB database which aids storage, analysis, and publication of data from mass spectrometric experiments to identify proteins. This database can aid in identifying peptides which can be used for quantitation. Here an extension to the database application, named MSMAG, is presented which allows for more detailed analysis of the distribution of peptides and their associated proteins over the fractions of an experiment. Furthermore, given several biological samples in the database, label-free quantitation can be performed. Thus, interesting proteins, which may warrant further investigation, can be identified en passant while performing high-throughput proteomics studies.

5.
Cell culture is a fundamental tool in proteomics where mammalian cells are cultured in vitro using a growth medium often supplemented with 5–15% FBS. Contamination by bovine proteins is difficult to avoid because of adherence to the plastic vessel and the cultured cells. We have generated peptides from bovine serum using four sample preparation methods and analyzed the peptides by high mass accuracy LC‐MS/MS. Distinguishing between bovine and human peptides is difficult because of a considerable overlap of identical tryptic peptide sequences. Pitfalls in interpretation, different database search strategies to minimize erroneous identifications and an augmented contaminant database are presented.

6.
Sequence determination of peptides is a crucial step in mass spectrometry–based proteomics. Peptide sequences are determined either by database search or by de novo sequencing using tandem mass spectrometry. Determination of all the theoretical expected peptide fragments and eliminating false discoveries remains a challenge in proteomics. Developing standards for evaluating the performance of mass spectrometers and algorithms used for identification of proteins is important for proteomics studies. The current study is focused on these aspects by using synthetic peptides. A total of 599 peptides were designed from in silico tryptic digest with 1 or 2 missed cleavages from 199 human proteins, and synthetic peptides corresponding to these sequences were obtained. The peptides were mixed together, and analysis was carried out using liquid chromatography–electrospray ionization tandem mass spectrometry on a Q-Exactive HF mass spectrometer. The peptides and proteins were identified with the SEQUEST program. The analysis was carried out using the proteomics workflows. A total of 573 peptides representing 196 proteins could be identified, and a spectral library was created for these peptides. Analysis parameters such as “no enzyme selection” gave the maximum number of detected peptides as compared with trypsin in the selection. False discoveries could be identified. This study highlights the limitations of peptide detection and the need for developing powerful algorithms along with tools to evaluate mass spectrometers and algorithms. It also shows the limitations of peptide detection even with high-end mass spectrometers. The mass spectral data are available in ProteomeXchange with accession no. PXD017992.
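An in silico tryptic digest with missed cleavages, as used in this abstract to design the peptides, can be sketched as follows. The sketch assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P; the study's actual digestion settings are not given in the abstract, and the function names are hypothetical.

```python
def cleavage_sites(protein):
    """Indices at which the sequence is cut, including both ends.
    Assumed rule: cleave after K or R, but not when followed by P."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_digest(protein, max_missed=2):
    """Return (peptide, missed_cleavages) pairs for 0..max_missed
    missed cleavages."""
    sites = cleavage_sites(protein)
    peptides = []
    for start in range(len(sites) - 1):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end >= len(sites):
                break  # ran past the end of the protein
            peptides.append((protein[sites[start]:sites[end]], missed))
    return peptides


# The R in "RP" is not a cleavage site under the assumed rule.
fully_cleaved = [p for p, m in tryptic_digest("AKRPGKC", max_missed=0)]
one_missed = [p for p, m in tryptic_digest("AKRPGKC", max_missed=1) if m == 1]
```

Filtering the result on the `missed_cleavages` field, as in the last line, selects peptides with exactly 1 or 2 missed cleavages.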

7.
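Mouse liver plasma membrane proteins, isolated and enriched by sucrose density gradient centrifugation, were separated and identified by automated online nanoflow multidimensional liquid chromatography coupled with tandem mass spectrometry. A strong cation exchange column served as the first dimension and a reversed-phase column as the second, with a precolumn connected between the two to desalt and concentrate the peptides. Proteins were extracted from the plasma membrane with a detergent-containing solvent. The resulting plasma membrane proteins were enzymatically digested, suitably acidified, and adsorbed onto the ion exchange column, then eluted stepwise with ammonium acetate solutions at 10 different concentrations. The eluates were desalted and concentrated on the precolumn and then separated on a capillary reversed-phase column, and the separated peptides entered the ion source of the mass spectrometer directly for MS and MS/MS analysis. The acquired data were processed by computer and searched against a protein database with the Mascot software. A total of 126 proteins were identified, 41 of which were membrane proteins, including membrane-associated proteins and integral membrane proteins with multiple transmembrane regions. These results provide valuable basic data for establishing suitable methods for plasma membrane proteomics and for building a plasma membrane protein database.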

8.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
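The dot product score that this abstract criticizes can be sketched as a normalized dot product over binned spectra. This is a generic illustration, not the scoring used by SpectraST or Pepitome; the 1.0 m/z bin width and the function names are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.
    `peaks` is a list of (mz, intensity) pairs."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def normalized_dot_product(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity of two binned spectra, in the range 0..1."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Two peaks 0.8 m/z apart fall into the same bin and score as a perfect match.
same_bin_score = normalized_dot_product([(100.1, 1.0)], [(100.9, 1.0)])
```

The last line shows the weakness the abstract points out: once peaks share a bin, the size of the fragment ion m/z discrepancy no longer affects the score.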

9.
We report on the analysis of endogenous peptides in cerebrospinal fluid (CSF) by mass spectrometry. A method was developed for preparation of peptide extracts from CSF. Analysis of the extracts by offline LC-MALDI MS resulted in the detection of 3,000-4,000 peptide-like features. Out of these, 730 peptides were identified by MS/MS. The majority of these peptides have not been previously reported in CSF. The identified peptides were found to originate from 104 proteins, of which several have been reported to be involved in different disorders of the central nervous system. These results support the notion that CSF peptidomics may be a viable complement to proteomics in the search for biomarkers of CNS disorders.

10.
MS/MS combined with database search methods can identify the proteins present in complex mixtures. High throughput methods that infer probable peptide sequences from enzymatically digested protein samples create a challenge in how best to aggregate the evidence for candidate proteins. Typically the results of multiple technical and/or biological replicate experiments must be combined to maximize sensitivity. We present a statistical method for estimating probabilities of protein expression that integrates peptide sequence identifications from multiple search algorithms and replicate experimental runs. The method was applied to create a repository of 797 non-homologous zebrafish (Danio rerio) proteins, at an empirically validated false identification rate under 1%, as a resource for the development of targeted quantitative proteomics assays. We have implemented this statistical method as an analytic module that can be integrated with an existing suite of open-source proteomics software.

11.
High resolution proteomics approaches have been successfully utilized for the comprehensive characterization of the cell proteome. However, in the case of quantitative proteomics an open question still remains: which quantification strategy is best suited for identification of biologically relevant changes, especially in clinical specimens? In this study, a thorough comparison of a label-free approach (intensity-based) and 8-plex iTRAQ was conducted as applied to the analysis of tumor tissue samples from non-muscle invasive and muscle-invasive bladder cancer. For the latter, two acquisition strategies were tested including analysis of unfractionated and fractionated iTRAQ-labeled peptides. To reduce variability, aliquots of the same protein extract were used as starting material, whereas to obtain representative results per method further sample processing and MS analysis were conducted according to routinely applied protocols. Considering only multiple-peptide identifications, LC-MS/MS analysis resulted in the identification of 910, 1092 and 332 proteins by label-free, fractionated and unfractionated iTRAQ, respectively. The label-free strategy provided higher protein sequence coverage compared to both iTRAQ experiments. Even though pre-fractionation of the iTRAQ-labeled peptides allowed for a higher number of identifications, this was not accompanied by a respective increase in the number of differential expression changes detected. Validity of the proteomics output related to protein identification and differential expression was determined by comparison to existing data in the field (Protein Atlas and published data on the disease). All methods predicted changes which to a large extent agreed with published data, with label-free providing a higher number of significant changes than iTRAQ. 
In conclusion, both label-free and iTRAQ (when combined with peptide fractionation) provide high proteome coverage and apparently valid predictions in terms of differential expression; nevertheless, label-free provides higher sequence coverage and ultimately detects a higher number of differentially expressed proteins. The risk of obtaining false associations still exists, particularly when analyzing highly heterogeneous biological samples, raising the need for the analysis of higher sample numbers and/or application of adjustment for multiple testing.

12.
Several approaches exist for the quantification of proteins in complex samples processed by liquid chromatography-mass spectrometry followed by fragmentation analysis (MS2). One of these approaches is label-free MS2-based quantification, which takes advantage of the information computed from MS2 spectrum observations to estimate the abundance of a protein in a sample. As a first step in this approach, fragmentation spectra are typically matched to the peptides that generated them by a search algorithm. Because different search algorithms identify overlapping but non-identical sets of peptides, here we investigate whether these differences in peptide identification have an impact on the quantification of the proteins in the sample. We therefore evaluated the effect of using different search algorithms by examining the reproducibility of protein quantification in technical repeat measurements of the same sample. From our results, it is clear that a search engine effect does exist for MS2-based label-free protein quantification methods. As a general conclusion, it is recommended to address the overall possibility of search engine-induced bias in the protein quantification results of label-free MS2-based methods by performing the analysis with two or more distinct search engines.

13.
14.
Selected reaction monitoring (SRM) is a mass spectrometry method with documented ability to quantify proteins accurately and reproducibly using labeled reference peptides. However, the use of labeled reference peptides becomes impractical if large numbers of peptides are targeted and when high flexibility is desired when selecting peptides. We have developed a label-free quantitative SRM workflow that relies on a new automated algorithm, Anubis, for accurate peak detection. Anubis efficiently removes interfering signals from contaminating peptides to estimate the true signal of the targeted peptides. We evaluated the algorithm on a published multisite data set and achieved results in line with manual data analysis. In complex peptide mixtures from whole proteome digests of Streptococcus pyogenes we achieved a technical variability across the entire proteome abundance range of 6.5-19.2%, which was considerably below the total variation across biological samples. Our results show that the label-free SRM workflow with automated data analysis is feasible for large-scale biological studies, opening up new possibilities for quantitative proteomics and systems biology.

15.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

16.
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.

17.
Chromatographed peptide signals form the basis of further data processing that eventually results in functional information derived from data‐dependent bottom‐up proteomics assays. We seek to rank LC/MS parent ions by the quality of their extracted ion chromatograms. Ranked extracted ion chromatograms act as an intuitive physical/chemical preselection filter to improve the quality of MS/MS fragment scans submitted for database search. We identify more than 4900 proteins when considering detector shifts of less than 7 ppm. High quality parent ions for which the database search yields no hits become candidates for subsequent unrestricted analysis for PTMs. Following this rational approach, we prioritize identification of more than 5000 spectrum matches from modified peptides and confirm the presence of acetylaldehyde‐modified His/Lys. We present a logical workflow that scores data‐dependent selected ion chromatograms and leverages information about semianalytical LC/LC dimension prior to MS. Our method can be successfully used to identify unexpected modifications in peptides with excellent chromatography characteristics, independent of fragmentation pattern and activation methods. We illustrate analysis of ion chromatograms detected in two different modes by RF linear ion trap and electrostatic field orbitrap.

18.
The discovery of unanticipated protein modifications is one of the most challenging problems in proteomics. Whereas widely used algorithms such as Sequest and Mascot enable mapping of modifications when the mass and amino acid specificity are known, unexpected modifications cannot be identified with these tools. We have developed an algorithm and software called P-Mod, which enables discovery and sequence mapping of modifications to target proteins known to be represented in the analysis or identified by Sequest. P-Mod matches MS/MS spectra to peptide sequences in a search list. For spectra of modified peptides, P-Mod calculates mass differences between search peptide sequences and MS/MS precursors and localizes the mass shift to a sequence position in the peptide. Because modifications are detected as mass shifts, P-Mod does not require the user to guess at masses or sequence locations of modifications. P-Mod uses extreme value statistics to assign p value estimates to sequence-to-spectrum matches. The reported p values are scaled to account for the number of comparisons, so that error rates do not increase with the expanded search lists that result from incorporating potential peptide modifications. Combination of P-Mod searches from multiple LC-MS/MS analyses and multiple samples revealed previously unreported BSA modifications, including a novel decarboxymethylation or D→G substitution at position 579 of the protein. P-Mod can serve a unique role in the identification of protein modifications both from exogenous and endogenous sources and may be useful for identifying modified protein forms as biomarkers for toxicity and disease processes.
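The first step described in this abstract, computing the mass difference between a candidate peptide sequence and the MS/MS precursor, can be sketched as below. This is only that arithmetic, not P-Mod itself: localizing the shift to a sequence position and assigning p values are not attempted. The reduced residue table and the function names are assumptions made for illustration.

```python
# Monoisotopic masses in Da. The residue table is deliberately partial.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
WATER = 18.01056    # added once per peptide for the terminal H and OH
PROTON = 1.007276   # mass of the charge carrier


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mass_shift(sequence, precursor_mz, charge):
    """Observed neutral precursor mass minus the unmodified peptide mass.
    A value near zero suggests no modification."""
    observed = (precursor_mz - PROTON) * charge
    return observed - peptide_mass(sequence)


# A singly charged precursor about 15.995 Da heavier than unmodified PEAK.
shift = precursor_mass_shift("PEAK", 460.240156, 1)
```

A shift of about +15.995 Da is consistent with the addition of one oxygen atom, but the mass difference alone does not say which residue carries it.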

19.
Post-translational modifications (PTMs) play key roles in the regulation of biological functions of proteins. Although some progress has been made in identifying several PTMs using existing approaches involving a combination of affinity-based enrichment and mass spectrometric analysis, comprehensive identification of PTMs remains a challenging problem in proteomics because of the dynamic complexities of PTMs in vivo and their low abundance. We describe here a strategy for rapid, efficient, and comprehensive identification of PTMs occurring in biological processes in vivo. It involves a selectively excluded mass screening analysis (SEMSA) of unmodified peptides during liquid chromatography-electrospray ionization-quadrupole-time-of-flight tandem mass spectrometry (LC-ESI-q-TOF MS/MS) through replicated runs of a purified protein on two-dimensional gel. A precursor ion list of unmodified peptides with high mass intensities was obtained during the initial run followed by exclusion of these unmodified peptides in subsequent runs. The exclusion list can grow as long as replicate runs are iteratively performed. This enables the identifications of modified peptides with precursor ions of low intensities by MS/MS sequencing. Application of this approach in combination with the PTM search algorithm MODi to GAPDH protein in vivo modified by oxidative stress provides information on multiple protein modifications (19 types of modification on 42 sites) with >92% peptide coverage and the additional potential for finding novel modifications, such as transformation of Cys to Ser. On the basis of the information of precursor ion m/z, quantitative analysis of PTM was performed for identifying molecular changes in heterogeneous protein populations. Our results show that PTMs in mammalian systems in vivo are more complicated and heterogeneous than previously reported. 
We believe that this strategy has significant potential because it permits systematic characterization of multiple PTMs in functional proteomics.

20.
Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.
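The conventional target/decoy FDR estimate that this abstract takes as its starting point can be sketched in a few lines. The sketch assumes a search against separate or concatenated target and decoy databases and uses the simple ratio of decoy to target hits above a score threshold; other estimators exist, and the directed decoy database proposed in the abstract is not implemented here. The function name and the example scores are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.
    `psms` is a list of (score, is_decoy) pairs; higher scores are better.
    Assumed estimator: number of decoy hits / number of target hits."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return decoys / targets


# Made-up scores, used only to exercise the function.
example_psms = [(90, False), (80, False), (70, True), (60, False), (50, True)]
fdr_at_60 = estimate_fdr(example_psms, 60)  # 1 decoy / 3 targets
```

The estimate rests on the assumption that decoy matches behave like incorrect target matches, which is the assumption the directed decoy approach is designed to probe.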

