
1.

Triplex protein quantification based on stable isotope labeling by peptide dimethylation applied to cell and tissue lysates

Boersema PJ Aye TT van Veen TA Heck AJ Mohammed S 《Proteomics》2008,8(22):4624-4632

Stable isotope labeling is at present one of the most powerful methods in quantitative proteomics. Stable isotope labeling has been performed at both the protein and the peptide level using either metabolic or chemical labeling. Here, we present a straightforward and cost-effective triplex quantification method that is based on stable isotope dimethyl labeling at the peptide level. In this method, all proteolytic peptides are chemically labeled at their α- and ε-amino groups. We use three different isotopomers of formaldehyde to enable the parallel analysis of three different samples. These labels provide a minimum of 4 Da mass difference between peaks in the generated peptide triplets. The method was evaluated based on the quantitative analysis of a cell lysate, using a typical "shotgun" proteomics experiment. While peptide complexity was increased by introducing three labels, still more than 1300 proteins could be identified using 60 µg of starting material, whereby more than 600 proteins could be quantified using at least four peptides per protein. The triplex labeling was further utilized to distinguish specific from nonspecific cAMP binding proteins in a chemical proteomics experiment using immobilized cAMP. Thereby, differences in abundance ratio of more than two orders of magnitude could be quantified.
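
The 4 Da spacing mentioned in this abstract can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the commonly used reagent pairings (unlabeled, deuterated, and 13C-deuterated formaldehyde, with the heavy channel reduced by a deuterated reagent), which the abstract does not spell out, and it ignores special cases such as N-terminal proline. The function names are hypothetical.

```python
# Monoisotopic masses (Da) of the isotopes involved in dimethyl labeling.
H, D, C12, C13 = 1.007825, 2.014102, 12.0, 13.003355

# Net mass added per labeled amine: two methyl groups replace two hydrogens.
# ASSUMPTION: these isotope compositions follow the commonly used triplex
# scheme; the abstract only says three formaldehyde isotopomers are used.
LABEL_SHIFT = {
    "light": 2 * C12 + 4 * H,            # two CH3 groups, minus two H
    "intermediate": 2 * C12 + 4 * D,     # two CHD2 groups, minus two H
    "heavy": 2 * C13 + 6 * D - 2 * H,    # two 13CD3 groups, minus two H
}


def labeled_sites(peptide: str) -> int:
    """Count primary amines: the N-terminus plus one per lysine side chain."""
    return 1 + peptide.count("K")


def triplet_shifts(peptide: str) -> dict:
    """Return the total mass added to a peptide in each labeling channel."""
    n = labeled_sites(peptide)
    return {name: round(n * shift, 4) for name, shift in LABEL_SHIFT.items()}


if __name__ == "__main__":
    for pep in ("AGFAGDDAPR", "LVNELTEFAK"):
        print(pep, triplet_shifts(pep))
```

Under these assumptions a peptide without lysine carries one label, so neighboring channels sit about 4 Da apart; each lysine adds another labeled site and widens the spacing by roughly 4 Da more.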

2.

Re-fraction: a machine learning approach for deterministic identification of protein homologues and splice variants in large-scale MS-based proteomics

Yang P Humphrey SJ Fazakerley DJ Prior MJ Yang G James DE Yang JY 《Journal of proteome research》2012,11(5):3035-3045

A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.

3.

Development of LysargiNase, a mirror-image protease of trypsin, and its application in proteomics research

张俊令 彭雪辉 王富强 徐平 《生物工程学报》2019,35(5):741-748

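Proteomics is the discipline that systematically identifies and quantifies proteins and their post-translationally modified forms, and studies the biological functions of these proteins. At present, mass spectrometry-based shotgun proteomics is one of the main approaches in proteomics research. In its workflow, the proteome sample is first digested by a site-specific protease into a peptide mixture, which is then separated by high-performance liquid chromatography and detected by mass spectrometry. Digestion of the protein sample by a site-specific protease is therefore the prerequisite and foundation of mass spectrometric detection. As proteomics research has deepened, a variety of site-specific proteases have been developed and put to use. In particular, the identification, development, characterization, and widespread use of proteases that cleave on the N-terminal side of the target amino acid, mirroring the conventional C-terminal proteases, have provided new tools for proteomics research. This article reviews the characteristics and applications of the recently discovered mirror-image enzyme of trypsin, the lysine/arginine N-terminal protease LysargiNase, to help researchers use it more widely.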
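
The "mirror-image" relationship between the two proteases is easiest to see on a short sequence. The following is a minimal sketch under simplifying assumptions: both enzymes are taken to cut at every lysine (K) and arginine (R), missed cleavages and the proline effect on trypsin are ignored, and the function name `digest` is purely illustrative.

```python
def digest(sequence: str, n_terminal: bool = False) -> list:
    """Split a protein sequence at every K or R.

    n_terminal=False cuts after the residue (trypsin-like, C-terminal side).
    n_terminal=True cuts before the residue (LysargiNase-like, N-terminal side).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            cut = i if n_terminal else i + 1
            if cut > start:                 # skip empty fragments
                peptides.append(sequence[start:cut])
                start = cut
    if start < len(sequence):               # trailing fragment, if any
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    protein = "MAGKLSERTPK"
    print("C-terminal cleavage:", digest(protein))
    print("N-terminal cleavage:", digest(protein, n_terminal=True))
```

With C-terminal cleavage the basic residue ends each peptide; with N-terminal cleavage it begins each peptide, which is the property the review describes as mirroring trypsin.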

4.

Analysis of the tryptic search space in UniProt databases

Emanuele Alpi Johannes Griss Alan Wilter Sousa da Silva Benoit Bely Ricardo Antunes Hermann Zellner Daniel Ríos Claire O'Donovan Juan Antonio Vizcaíno Maria J. Martin 《Proteomics》2015,15(1):48-57

In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
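
As a rough illustration of what "tryptic search space" and "peptide unicity" mean in practice, the sketch below digests a toy set of sequences and reports the fraction of peptides that map to exactly one entry. It is not the authors' pipeline: the cleavage rule (after K or R, but not before P), the minimum length, the function names, and the accession labels are all assumptions chosen for the example.

```python
from collections import defaultdict


def tryptic_peptides(sequence: str, min_len: int = 6) -> list:
    """Fully tryptic peptides: cut after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            if i + 1 - start >= min_len:    # drop peptides that are too short
                peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_unicity(proteins: dict):
    """Return (fraction of peptides unique to one entry, peptide -> entries map)."""
    owners = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            owners[peptide].add(accession)
    unique = sum(1 for entries in owners.values() if len(entries) == 1)
    fraction = unique / len(owners) if owners else 0.0
    return fraction, dict(owners)


if __name__ == "__main__":
    toy_db = {  # hypothetical accessions and sequences
        "P1": "LVNELTEFAKYLYEIAR",
        "P2": "LVNELTEFAKAEFVEVTK",
    }
    fraction, owners = peptide_unicity(toy_db)
    print(f"unique fraction: {fraction:.2f}")
    for peptide, entries in owners.items():
        print(peptide, sorted(entries))
```

Adding isoforms or variant sequences to such a database enlarges the search space while typically lowering the unique fraction, which is the trade-off the study quantifies for each data set.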

5.

Analysis of the Arabidopsis cytosolic ribosome proteome provides detailed insights into its components and their post-translational modification

Carroll AJ Heazlewood JL Ito J Millar AH 《Molecular & cellular proteomics : MCP》2008,7(2):347-369

Finding gene-specific peptides by mass spectrometry analysis to pinpoint gene loci responsible for particular protein products is a major challenge in proteomics, especially in highly conserved gene families in higher eukaryotes. We used a combination of in silico approaches coupled to mass spectrometry analysis to advance the proteomics insight into Arabidopsis cytosolic ribosomal composition and its post-translational modifications. In silico digestion of all 409 ribosomal protein sequences in Arabidopsis defined the proportion of theoretical gene-specific peptides for each gene family and highlighted the need for low m/z cutoffs of MS ion selection for MS/MS to characterize low molecular weight, highly basic ribosomal proteins. We undertook an extensive MS/MS survey of the cytosolic ribosome using trypsin and, when required, chymotrypsin and pepsin. We then used custom software to extract and filter peptide match information from Mascot result files and implement high confidence criteria for calling gene-specific identifications based on the highest quality unambiguous spectra matching exclusively to certain in silico predicted gene- or gene family-specific peptides. This provided an in-depth analysis of the protein composition based on 1446 high quality MS/MS spectra matching to 795 peptide sequences from ribosomal proteins. These identified peptides from five gene families of ribosomal proteins not identified previously, providing experimental data on 79 of the 80 different types of ribosomal subunits. We provide strong evidence for gene-specific identification of 87 different ribosomal proteins from these 79 families. We also provide new information on 30 specific sites of co- and post-translational modification of ribosomal proteins in Arabidopsis by initiator methionine removal, N-terminal acetylation, N-terminal methylation, lysine N-methylation, and phosphorylation. These site-specific modification data provide a wealth of resources for further assessment of the role of ribosome modification in influencing translation in Arabidopsis.

6.

Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage

Van PT Schmid AK King NL Kaur A Pan M Whitehead K Koide T Facciotti MT Goo YA Deutsch EW Reiss DJ Mallick P Baliga NS 《Journal of proteome research》2008,7(9):3755-3764

The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.

7.

A high-quality catalog of the Drosophila melanogaster proteome

Brunner E Ahrens CH Mohanty S Baetschmann H Loevenich S Potthast F Deutsch EW Panse C de Lichtenberg U Rinner O Lee H Pedrioli PG Malmstrom J Koehler K Schrimpf S Krijgsveld J Kregenow F Heck AJ Hafen E Schlapbach R Aebersold R 《Nature biotechnology》2007,25(5):576-583

Understanding how proteins and their complex interaction networks convert the genomic information into a dynamic living organism is a fundamental challenge in biological sciences. As an important step towards understanding the systems biology of a complex eukaryote, we cataloged 63% of the predicted Drosophila melanogaster proteome by detecting 9,124 proteins from 498,000 redundant and 72,281 distinct peptide identifications. This unprecedented high proteome coverage for a complex eukaryote was achieved by combining sample diversity, multidimensional biochemical fractionation and analysis-driven experimentation feedback loops, whereby data collection is guided by statistical analysis of prior data. We show that high-quality proteomics data provide crucial information to amend genome annotation and to confirm many predicted gene models. We also present experimentally identified proteotypic peptides matching approximately 50% of D. melanogaster gene models. This library of proteotypic peptides should enable fast, targeted and quantitative proteomic studies to elucidate the systems biology of this model organism.

8.

Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations

Gloria M Sheynkman James E Johnson Pratik D Jagtap Michael R Shortreed Getiria Onsongo Brian L Frey Timothy J Griffin Lloyd M Smith 《BMC genomics》2014,15(1)

9.

A proteome catalog of Drosophila melanogaster: an essential resource for targeted quantitative proteomics

Ahrens CH Brunner E Hafen E Aebersold R Basler K 《Fly》2007,1(3):182-186

Proteomic analyses are critically important for systems biology because important aspects related to the structure, function and control of biological systems are only amenable by direct protein measurements. It has become apparent that the current proteomics technologies are unlikely to allow routine, quantitative measurements of whole proteomes. We have therefore suggested and largely implemented a two-step strategy for quantitative proteome analysis. In a first step, the discovery phase, the proteome observable by mass spectrometry is extensively analyzed. The resulting proteome catalog can then be used to select peptides specific to only one protein, so-called proteotypic peptides (PTPs). It represents the basis to realize sensitive, robust and reproducible measurements based on targeted mass spectrometry of these PTPs in a subsequent scoring phase. In this Extra View we describe the need for such proteome catalogs and their multiple benefits for catalyzing the shift towards targeted quantitative proteomic analysis and beyond. We use the Insulin signaling cascade as a representative example to illustrate the limitations of currently used proteomics approaches for the specific analysis of individual pathway components, and describe how the recently published Drosophila proteome catalog already helped to overcome many of these limitations.

10.

PhoPepMass: A database and search tool assisting human phosphorylation peptide identification from mass spectrometry data

Menghuan Zhang Hui Cui Lanming Chen Ying Yu Michael O. Glocker Lu Xie 《遗传学报》2018,45(7):381-388

Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylation peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search certain proteins of which phosphosites have been reported, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses with defined threshold to the corresponding phosphopeptide, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, a database search software is created and a “two-stage search strategy” is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.

11.

Enhanced information output from shotgun proteomics data by protein quantification and peptide quality control (PQPQ)

Forshed J Johansson HJ Pernemalm M Branca RM Sandberg A Lehtiö J 《Molecular & cellular proteomics : MCP》2011,10(10):M111.010264

We present a tool to improve quantitative accuracy and precision in mass spectrometry based on shotgun proteomics: protein quantification by peptide quality control, PQPQ. The method is based on the assumption that the quantitative pattern of peptides derived from one protein will correlate over several samples. Dissonant patterns arise either from outlier peptides or because of the presence of different protein species. By correlation analysis, protein quantification by peptide quality control identifies and excludes outliers and detects the existence of different protein species. Alternative protein species are then quantified separately. By validating the algorithm on seven data sets related to different cancer studies we show that data processing by protein quantification by peptide quality control improves the information output from shotgun proteomics. Data from two labeling procedures and three different instrumental platforms were included in the evaluation. With this unique method using both peptide sequence data and quantitative data we can improve the quantitative accuracy and precision on the protein level and detect different protein species.
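
The core assumption here, that peptides from one protein should show correlated quantitative patterns across samples, can be illustrated with a small leave-one-out check. This is not the published PQPQ algorithm, whose details the abstract does not give; the consensus profile (column-wise median of the other peptides), the Pearson threshold `min_r`, and all names below are assumptions made for the sketch.

```python
from statistics import mean, median


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def flag_discordant(profiles: dict, min_r: float = 0.5) -> list:
    """Flag peptides whose pattern disagrees with the other peptides' median.

    `profiles` maps peptide name -> list of intensities, one per sample.
    """
    flagged = []
    for name, profile in profiles.items():
        others = [p for other, p in profiles.items() if other != name]
        consensus = [median(column) for column in zip(*others)]
        if pearson(profile, consensus) < min_r:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    protein_peptides = {  # hypothetical intensities over four samples
        "pepA": [1.0, 2.0, 3.0, 4.0],
        "pepB": [2.0, 4.0, 6.0, 8.5],
        "pepC": [1.1, 2.1, 2.9, 4.2],
        "pepD": [4.0, 3.0, 2.0, 1.0],   # runs against the common trend
    }
    print(flag_discordant(protein_peptides))
```

A flagged peptide may be a simple outlier or may belong to a different protein species; telling those apart, as the abstract describes, would need a further grouping step that this sketch does not attempt.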

12.

A framework for intelligent data acquisition and real-time database searching for shotgun proteomics

Graumann J Scheltema RA Zhang Y Cox J Mann M 《Molecular & cellular proteomics : MCP》2012,11(3):M111.013185

In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.

13.

The complete peptide dictionary--a meta-proteomics resource

Askenazi M Marto JA Linial M 《Proteomics》2010,10(23):4306-4310

Recent developments in MS-based proteomics have increased the emphasis on peptides as a primary observable. While peptides are identified by tandem mass spectra, the link between peptide and protein remains implicit given the bottom-up nature of the experiment in which proteins are enzymatically digested prior to sequencing. It is therefore useful to provide a fast lookup from peptide to protein in order to systematically establish the broadest possible protein basis for the observed peptides. Here, we describe Pep2Pro, a fast web-service providing protein lookup by peptides covering the entire protein space comprising ~10 million UniRef100 sequences. We demonstrate the usefulness of the service by reanalyzing peptides from two recent meta-proteomic data sets and identifying taxon-specific peptides, thereby implicating individual species as being present in these complex samples. The Pep2Pro web service can be accessed at http://www.pep2pro.org.

14.

A bioinformatics workflow for variant peptide detection in shotgun proteomics

Li J Su Z Ma ZQ Slebos RJ Halvey P Tabb DL Liebler DC Pao W Zhang B 《Molecular & cellular proteomics : MCP》2011,10(5):M110.006536

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.

15.

Research progress on strategies and methods for mass spectrometry-based selected reaction monitoring

常乘 吴松锋 马洁 张伟 朱云平 《生物化学与生物物理进展》2012,39(11):1118-1127

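As proteomics research has deepened, mass spectrometry-based selected reaction monitoring (SRM) has become an important tool for targeted proteomics, of which biomarker discovery is a representative application. Guided by prior hypotheses, SRM selectively acquires the mass spectrometric signals that match the hypothesized conditions and removes interference from ion signals that do not, thereby yielding quantitative information on specific proteins. SRM offers higher sensitivity and accuracy and a wider dynamic range. The technique can be divided into three steps: experimental design, data acquisition, and data analysis. Among these steps, the most important is to use bioinformatics to summarize the results of existing experimental data and to predict precursor and product ion pairs for SRM experiments with machine learning methods and empirically derived rules. Bioinformatics methods for data quality control and quantification play an important role in improving the reliability of SRM data. In addition, to facilitate SRM research, this article collects and summarizes software, tools, and database resources related to SRM. With the continuing development of mass spectrometers, new SRM experimental strategies, analysis methods, and computational tools have emerged. Combined with better-optimized experimental strategies and methods and more accurate bioinformatics algorithms and tools, SRM will play an even more important role in the future development of proteomics.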
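
A precursor and product ion pair of the kind mentioned above is, at its simplest, two m/z values computed from the peptide sequence. The sketch below only does that arithmetic for unmodified peptides, using standard monoisotopic residue masses; it does not attempt the machine-learning prediction of which pairs perform best, and the function names are illustrative.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.007276


def precursor_mz(peptide: str, charge: int = 2) -> float:
    """m/z of the intact peptide carrying `charge` protons."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide: str, n: int) -> float:
    """m/z of the singly charged y_n ion (last n residues + water + proton)."""
    return sum(RESIDUE[aa] for aa in peptide[-n:]) + WATER + PROTON


def candidate_transitions(peptide: str, charge: int = 2) -> list:
    """All (precursor m/z, y-ion m/z) pairs for y1 .. y(len-1)."""
    q1 = round(precursor_mz(peptide, charge), 4)
    return [(q1, round(y_ion_mz(peptide, n), 4)) for n in range(1, len(peptide))]


if __name__ == "__main__":
    for q1, q3 in candidate_transitions("PEPTIDE"):
        print(f"Q1 {q1:.4f} -> Q3 {q3:.4f}")
```

In a real assay only a few of these candidate pairs would be kept, chosen by the empirical rules or learned models the review discusses.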

16.

A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas

Farrah T Deutsch EW Omenn GS Campbell DS Sun Z Bletz JA Mallick P Katz JE Malmström J Ossola R Watts JD Lin B Zhang H Moritz RL Aebersold R 《Molecular & cellular proteomics : MCP》2011,10(9):M110.006353

Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments. Applying advanced computational methods developed for the analysis and integration of very large and diverse data sets generated by tandem MS measurements of tryptic peptides, we have now compiled a high-confidence human plasma proteome reference set with well over twice the identified proteins of previous high-confidence sets. It includes a hierarchy of protein identifications at different levels of redundancy following a clearly defined scheme, which we propose as a standard that can be applied to any proteomics data set to facilitate cross-proteome analyses. Further, to aid in development of blood-based diagnostics using techniques such as selected reaction monitoring, we provide a rough estimate of protein concentrations using spectral counting. We identified 20,433 distinct peptides, from which we inferred a highly nonredundant set of 1929 protein sequences at a false discovery rate of 1%. We have made this resource available via PeptideAtlas, a large, multiorganism, publicly accessible compendium of peptides identified in tandem MS experiments conducted by laboratories around the world.

17.

Improvements in proteomic metrics of low abundance proteins through proteome equalization using ProteoMiner prior to MudPIT

Fonslow BR Carvalho PC Academia K Freeby S Xu T Nakorchevsky A Paulus A Yates JR 《Journal of proteome research》2011,10(8):3690-3700

Ideally, shotgun proteomics would facilitate the identification of an entire proteome with 100% protein sequence coverage. In reality, the large dynamic range and complexity of cellular proteomes result in oversampling of abundant proteins, while peptides from low abundance proteins are undersampled or remain undetected. We tested the proteome equalization technology, ProteoMiner, in conjunction with Multidimensional Protein Identification Technology (MudPIT) to determine how the equalization of protein dynamic range could improve shotgun proteomics methods for the analysis of cellular proteomes. Our results suggest low abundance protein identifications were improved by two mechanisms: (1) depletion of high abundance proteins freed ion trap sampling space usually occupied by high abundance peptides and (2) enrichment of low abundance proteins increased the probability of sampling their corresponding more abundant peptides. Both mechanisms also contributed to dramatic increases in the quantity of peptides identified and the quality of MS/MS spectra acquired due to increases in precursor intensity of peptides from low abundance proteins. From our large data set of identified proteins, we categorized the dominant physicochemical factors that facilitate proteome equalization with a hexapeptide library. These results illustrate that equalization of the dynamic range of the cellular proteome is a promising methodology to improve low abundance protein identification confidence, reproducibility, and sequence coverage in shotgun proteomics experiments, opening a new avenue of research for improving proteome coverage.

18.

Experiments in searching small proteins in unannotated large eukaryotic genomes

Colinge J Cusin I Reffas S Mahé E Niknejad A Rey PA Mattou H Moniatte M Bougueleret L 《Journal of proteome research》2005,4(1):167-174

There is growing interest in using mass spectrometry data to search genome sequences directly. Previous work by other authors demonstrated that this approach is able to correct and complement available genome annotations. We discuss the practical difficulty of searching large eukaryotic genomes with peptide ion trap tandem mass spectra of small proteins (<40 kDa). The challenging problem of automatically identifying peptides that span exon/intron boundaries is explored for the first time by using experimental data. In a human genome search, we find that roughly 30% of the peptides are missed, due to various reasons, compared to a Swiss-Prot search. We show that this percentage is significantly reduced with improved parent mass accuracy. We finally provide several examples of predicted gene structures that could be improved by proteomics data, in particular by peptides spanning exon/intron boundaries.

19.

Fast and accurate identification of semi-tryptic peptides in shotgun proteomics

Alves P Arnold RJ Clemmer DE Li Y Reilly JP Sheng Q Tang H Xun Z Zeng R Radivojac P 《Bioinformatics (Oxford, England)》2008,24(1):102-109

MOTIVATION: One of the major problems in shotgun proteomics is the low peptide coverage when analyzing complex protein samples. Identifying more peptides, e.g. non-tryptic peptides, may increase the peptide coverage and improve protein identification and/or quantification that are based on the peptide identification results. Searching for all potential non-tryptic peptides is, however, time consuming for shotgun proteomics data from complex samples, and poses a challenge for a routine data analysis. RESULTS: We hypothesize that non-tryptic peptides are mainly created from the truncation of regular tryptic peptides before separation. We introduce the notion of truncatability of a tryptic peptide, i.e. the probability of the peptide to be identified in its truncated form, and build a predictor to estimate a peptide's truncatability from its sequence. We show that our predictions achieve useful accuracy, with the area under the ROC curve from 76% to 87%, and can be used to filter the sequence database for identifying truncated peptides. After filtering, only a limited number of tryptic peptides with the highest truncatability are retained for non-tryptic peptide searching. By applying this method to identification of semi-tryptic peptides, we show that a significant number of such peptides can be identified within a searching time comparable to that of tryptic peptide identification.
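
To see why an unrestricted semi-tryptic search is costly and why filtering helps, the sketch below enumerates the truncated forms of a single tryptic peptide and then keeps only the highest-scoring peptides for expansion. The truncatability scores are made-up placeholders standing in for the output of the authors' predictor, which is not reproduced here; the function names and the minimum length are likewise assumptions.

```python
def semi_tryptic_forms(peptide: str, min_len: int = 5) -> list:
    """Truncated forms of a tryptic peptide with one tryptic end preserved."""
    # Remove residues from the N-terminus (the tryptic C-terminus is kept).
    n_truncated = [peptide[i:] for i in range(1, len(peptide) - min_len + 1)]
    # Remove residues from the C-terminus (the tryptic N-terminus is kept).
    c_truncated = [peptide[:j] for j in range(len(peptide) - 1, min_len - 1, -1)]
    return n_truncated + c_truncated


def select_for_search(scores: dict, keep: int = 2) -> list:
    """Keep the `keep` tryptic peptides with the highest truncatability score.

    `scores` maps peptide -> score in [0, 1]; the values are hypothetical.
    """
    return sorted(scores, key=scores.get, reverse=True)[:keep]


if __name__ == "__main__":
    print(semi_tryptic_forms("SAMPLEK"))
    scores = {"SAMPLEK": 0.9, "TESTPEPK": 0.2, "ANOTHERK": 0.6}
    for peptide in select_for_search(scores):
        print(peptide, len(semi_tryptic_forms(peptide)), "truncated forms")
```

Each tryptic peptide of length L contributes about 2 × (L − min_len) extra candidates, so expanding only a filtered subset keeps the search space close to that of a fully tryptic search.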

20.

Improving large-scale proteomics by clustering of mass spectrometry data

Beer I Barnea E Ziv T Admon A 《Proteomics》2004,4(4):950-960

Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.
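
The idea of collapsing many similar spectra into a smaller number of clusters can be sketched with a simple greedy scheme. This is a generic illustration, not Pep-Miner's method, which the abstract does not describe: spectra are binned at unit m/z width, compared by cosine similarity, and assigned to the first cluster whose representative is similar enough. The bin width, the threshold, and all names are assumptions.

```python
from math import sqrt


def binned(peaks, width: float = 1.0) -> dict:
    """Sum peak intensities into m/z bins of the given width."""
    vector = {}
    for mz, intensity in peaks:
        key = int(mz / width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra, threshold: float = 0.8) -> list:
    """Greedy clustering: join the first cluster whose seed spectrum matches."""
    clusters = []  # each item: (seed vector, list of member indices)
    for index, peaks in enumerate(spectra):
        vector = binned(peaks)
        for seed, members in clusters:
            if cosine(vector, seed) >= threshold:
                members.append(index)
                break
        else:  # no existing cluster was similar enough
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    spectra = [  # hypothetical (m/z, intensity) peak lists
        [(100.1, 10.0), (200.2, 50.0), (300.3, 20.0)],
        [(100.2, 12.0), (200.3, 48.0), (300.4, 22.0)],
        [(150.0, 30.0), (250.0, 40.0)],
    ]
    print(cluster_spectra(spectra))
```

A production tool would also take precursor mass and retention time into account and would build a consensus spectrum per cluster; the sketch only shows how redundancy shrinks the number of spectra that need to be searched.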