1.

Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database

Omenn GS States DJ Adamski M Blackwell TW Menon R Hermjakob H Apweiler R Haab BB Simpson RJ Eddes JS Kapp EA Moritz RL Chan DW Rai AJ Admon A Aebersold R Eng J Hancock WS Hefta SA Meyer H Paik YK Yoo JS Ping P Pounds J Adkins J Qian X Wang R Wasinger V Wu CY Zhao X Zeng R Archakov A Tsugita A Beer I Pandey A Pisano M Andrews P Tammen H Speicher DW Hanash SM 《Proteomics》2005,5(13):3226-3245

HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anti-coagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS-MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS-MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. 
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.
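
The peptide-count thresholds mentioned in this abstract (one or more, two or more, three or more peptides) can be illustrated with a minimal Python sketch. It only counts distinct peptides per protein ID and does not reproduce the project's integration algorithm for multiple matches; the function name and the protein IDs and peptide strings in the usage note are hypothetical.

```python
from collections import defaultdict


def proteins_by_peptide_support(matches, min_peptides=2):
    """Return protein IDs supported by at least `min_peptides` distinct peptides.

    matches: iterable of (protein_id, peptide_sequence) pairs.
    Repeated observations of the same peptide count only once.
    """
    peptides = defaultdict(set)
    for protein_id, peptide in matches:
        peptides[protein_id].add(peptide)
    return sorted(p for p, seqs in peptides.items() if len(seqs) >= min_peptides)
```

For example, with the hypothetical matches `[("IPI_A", "AAK"), ("IPI_A", "CCR"), ("IPI_B", "DDK")]`, only `IPI_A` passes the two-peptide threshold, while both IDs pass with `min_peptides=1`.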

2.

Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences (total citations: 1; self-citations: 0; citations by others: 1)

Padliya ND Garrett WM Campbell KB Tabb DL Cooper B 《Proteomics》2007,7(21):3932-3942

LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens. 相似文献

3.

Automated comparative proteomics based on multiplex tandem mass spectrometry and stable isotope labeling

Zhang G Neubert TA 《Molecular & cellular proteomics : MCP》2006,5(2):401-411

Comparative proteomic approaches using isotopic labeling and MS have become increasingly popular. Conventionally quantification is based on MS or extracted ion chromatogram (XIC) signals of differentially labeled peptides. However, in these MS-based experiments, the accuracy and dynamic range of quantification are limited by the high noise levels of MS/XIC data. Here we report a quantitative strategy based on multiplex (derived from multiple precursor ions) MS/MS data. One set of proteins was metabolically labeled with [13C6]lysine and [15N4]arginine; the other set was unlabeled. For peptide analysis after tryptic digestion of the labeled proteins, a wide precursor window was used to include both the light and heavy versions of each peptide for fragmentation. The multiplex MS/MS data were used for both protein identification and quantification. The use of the wide precursor window increased sensitivity, and the y ion pairs in the multiplex MS/MS spectra from peptides containing labeled and unlabeled lysine or arginine offered more information for, and thus the potential for improving, protein identification. Protein ratios were obtained by comparing intensities of y ions derived from the light and heavy peptides. Our results indicated that this method offers several advantages over the conventional XIC-based approach, including increased sensitivity for protein identification and more accurate quantification with more than a 10-fold increase in dynamic range. In addition, the quantification calculation process was fast, fully automated, and independent of instrument and data type. This method was further validated by quantitative analysis of signaling proteins in the EphB2 pathway in NG108 cells. 相似文献
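
The ratio step described in this abstract, comparing intensities of y ions from the light and heavy peptides, might be sketched in Python as follows. The abstract does not say how the individual y-ion ratios are combined, so the median used here is an assumption, as are the function name and the input layout.

```python
from statistics import median


def protein_ratio(y_ion_pairs):
    """Estimate a heavy/light ratio from paired y-ion intensities.

    y_ion_pairs: list of (light_intensity, heavy_intensity) tuples, one per
    y-ion pair observed in the multiplex MS/MS spectra.
    Pairs with a non-positive intensity are skipped.
    The median is an assumed summary statistic, not the authors' stated choice.
    """
    ratios = [heavy / light
              for light, heavy in y_ion_pairs
              if light > 0 and heavy > 0]
    if not ratios:
        raise ValueError("no usable y-ion pairs")
    return median(ratios)
```

A median is used in this sketch because a single noisy fragment ion then has limited influence on the result.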

4.

Algorithms and tools for analysis and management of mass spectrometry data

Veltri P 《Briefings in bioinformatics》2008,9(2):144-155

Mass spectrometry (MS) is a technique used in biological studies. It associates a spectrum with a biological sample. A spectrum consists of pairs of values (intensity, m/z), where intensity measures the abundance of biomolecules (such as proteins) with a given mass-to-charge ratio (m/z) present in the originating sample. In proteomics experiments, MS spectra are used to identify expression patterns in clinical samples that may be responsible for diseases. Recently, to improve the identification of peptides/proteins related to these patterns, MS/MS has been used, which performs a cascade of mass spectrometric analyses on selected peaks. This technique has been demonstrated to improve the identification and quantification of proteins/peptides in samples. Nevertheless, MS analysis deals with a huge amount of data, often affected by noise, and thus requires automatic data management systems. Tools have been developed, and are most often supplied with the instruments, that allow: (i) spectra analysis and visualization, (ii) pattern recognition, (iii) protein database querying, and (iv) peptide/protein quantification and identification. Currently, most of the tools supporting these phases need to be optimized to improve the identification of proteins and their functions. In this article we survey applications that support spectrometrists and biologists in obtaining information from biological samples, analyzing the available software for the different phases. We consider different mass spectrometry techniques, and thus different requirements. We focus on tools for (i) data preprocessing, which prepares results obtained from spectrometers for analysis; (ii) spectra analysis, representation, and mining, aimed at identifying common and/or hidden patterns in sets of spectra or at classifying data; (iii) database querying to identify peptides; and (iv) improving and boosting the identification and quantification of selected peaks.
We outline some open problems and report on requirements that represent new challenges for bioinformatics.

5.

Computational approach for identification and characterization of GPI-anchored peptides in proteomics experiments

Omaetxebarria MJ Elortza F Rodríguez-Suárez E Aloria K Arizmendi JM Jensen ON Matthiesen R 《Proteomics》2007,7(12):1951-1960

Genes that encode glycosylphosphatidylinositol anchored proteins (GPI-APs) constitute an estimated 1-2% of eukaryote genomes. Current computational methods for the prediction of GPI-APs are sensitive and specific; however, the analysis of the processing site (ω-site or omega-site) of GPI-APs is still challenging. Only 10% of the proteins that are annotated as GPI-APs have the omega-site experimentally verified. We describe an integrated computational and experimental proteomics approach for the identification and characterization of GPI-APs that provides the means to identify GPI-APs and the derived GPI-anchored peptides in LC-MS/MS data sets. The method takes advantage of sequence features of GPI-APs and the known core structure of the GPI-anchor. The first stage of the analysis encompasses LC-MS/MS based protein identification. The second stage involves prediction of the processing sites of the identified GPI-APs and prediction of the corresponding terminal tryptic peptides. The third stage calculates possible GPI structures on the peptides from stage two. The fourth stage calculates the scores by comparing the theoretical spectra of the predicted GPI-peptides against the observed MS/MS spectra. Automated identification of C-terminal GPI-peptides from porcine membrane dipeptidase, folate receptor and CD59 in complex LC-MS/MS data sets demonstrates the sensitivity and specificity of this integrated computational and experimental approach.

6.

Evaluation of prefractionation methods as a preparatory step for multidimensional based chromatography of serum proteins

Barnea E Sorkin R Ziv T Beer I Admon A 《Proteomics》2005,5(13):3367-3375

Prefractionations of proteins prior to their proteolysis, chromatography, and MS/MS analyses help reduce complexity and increase the yield of protein identifications. A number of methods were evaluated here for prefractionating serum samples distributed to the participating laboratories as part of the human Plasma Proteome Project. These methods include strong cation exchange (SCX) chromatography, slicing of SDS-PAGE gel bands, and liquid-phase IEF of the proteins. The fractionated proteins were trypsinized and the resulting peptides were resolved and analyzed by multidimensional protein identification technology coupled to IT MS/MS. The MS/MS spectra were clustered, combined, and searched against the IPI protein databank using Pep-Miner. The identification results were evaluated for the efficacy of the different prefractionation methodologies to identify larger numbers of proteins at higher confidence and to achieve the best coverage of the proteins with the identified peptides. Prefractionation based on SCX resulted in the largest number of identified proteins, followed by gel slices and then the liquid-phase IEF. An important observation was that each of the methods revealed a set of unique proteins, some identified with high confidence. Therefore, for comprehensive identification of the serum proteins, several different prefractionation approaches should be used in parallel. 相似文献

7.

Isotopically labeled crosslinking reagents: resolution of mass degeneracy in the identification of crosslinked peptides

Collins CJ Schilling B Young M Dollinger G Guy RK 《Bioorganic & medicinal chemistry letters》2003,13(22):4023-4026

Mass spectrometry in three dimensions (MS3D) is a newly developed method for the determination of protein structures involving intramolecular chemical crosslinking of proteins, proteolytic digestion of the resulting adducts, identification of crosslinks by mass spectrometry (MS), peak assignment using theoretical mass lists, and computational reduction of crosslinks to a structure by distance geometry methods. To facilitate the unambiguous identification of crosslinked peptides from proteolytic digestion mixtures of crosslinked proteins by MS, we introduced double 18O isotopic labels into the crosslinking reagent to provide the crosslinked peptides with a characteristic isotope pattern. The presence of doublets separated by 4 Da in the mass spectra of these materials allowed ready discrimination between crosslinked and modified peptides, and uncrosslinked peptides using automated intelligent data acquisition (IDA) of MS/MS data. This should allow ready automation of the method for application to whole expressible proteomes. 相似文献
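
The doublet criterion in this abstract (peaks separated by 4 Da) can be illustrated with a small Python sketch that scans a peak list for such pairs. The function name, the default tolerance, and the division of the spacing by charge state are assumptions made for illustration; this is not the intelligent data acquisition logic the authors used.

```python
def find_doublets(mz_values, spacing=4.0, charge=1, tol=0.05):
    """Return (low, high) m/z pairs whose separation matches spacing / charge.

    mz_values: iterable of peak m/z values (any order).
    spacing:   expected mass difference of the isotope label pair, in Da.
    charge:    assumed charge state; the m/z spacing shrinks as spacing / charge.
    tol:       absolute tolerance on the m/z difference (assumed value).
    """
    peaks = sorted(mz_values)
    target = spacing / charge
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            delta = high - low
            if delta > target + tol:
                break  # peaks are sorted, so later ones are even farther away
            if abs(delta - target) <= tol:
                pairs.append((low, high))
    return pairs
```

Called on the hypothetical peak list `[1000.50, 1004.51, 1200.00, 1202.00]`, the sketch reports only the first two peaks as a doublet, since the last two are 2 Da apart.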

8.

Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study (total citations: 10; self-citations: 0; citations by others: 10)

States DJ Omenn GS Blackwell TW Fermin D Eng J Speicher DW Hanash SM 《Nature biotechnology》2006,24(3):333-338

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously nonannotated gene sequences. 相似文献

9.

Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project

Adamski M Blackwell T Menon R Martens L Hermjakob H Taylor C Omenn GS States DJ 《Proteomics》2005,5(13):3246-3261

The pilot phase of the HUPO Plasma Proteome Project (PPP) is an international collaboration to catalog the protein composition of human blood plasma and serum by analyzing standardized aliquots of reference serum and plasma specimens using a variety of experimental techniques. Data management for this project included collection, integration, analysis, and dissemination of findings from participating organizations world-wide. Accomplishing this task required a communication and coordination infrastructure specific enough to support meaningful integration of results from all participants, but flexible enough to react to changing requirements and new insights gained during the course of the project and to allow participants with varying informatics capabilities to contribute. Challenges included integrating heterogeneous data, reducing redundant information to minimal identification sets, and data annotation. Our data integration workflow assembles a minimal and representative set of protein identifications, which account for the contributed data. It accommodates incomplete concordance of results from different laboratories, ambiguity and redundancy in contributed identifications, and redundancy in the protein sequence databases. Recommendations of the PPP for future large-scale proteomics endeavors are described. 相似文献

10.

The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools

Klimek J Eddes JS Hohmann L Jackson J Peterson A Letarte S Gafken PR Katz JE Mallick P Lee H Schmidt A Ossola R Eng JK Aebersold R Martin DB 《Journal of proteome research》2008,7(1):96-103

Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/. 相似文献
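
The decoy-based estimation mentioned in this abstract is often illustrated with a simple counting rule: among matches above a score threshold, the number of decoy hits approximates the number of false target hits. The Python sketch below assumes that rule (decoys divided by targets); the function name and input layout are hypothetical, and it is not the procedure used by PeptideProphet or by the authors.

```python
def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among target matches at a score cutoff.

    psms:      list of (score, is_decoy) tuples, one per spectrum-to-sequence match.
    threshold: matches with score >= threshold are accepted.
    Uses the assumed estimator decoys / targets; returns 0.0 if no target passes.
    """
    accepted = [(score, is_decoy) for score, is_decoy in psms if score >= threshold]
    decoys = sum(1 for _, is_decoy in accepted if is_decoy)
    targets = len(accepted) - decoys
    if targets == 0:
        return 0.0
    return decoys / targets
```

Raising the threshold usually lowers the estimate at the cost of fewer accepted matches, which is the trade-off such an estimate is meant to expose.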

11.

YPED: An Integrated Bioinformatics Suite and Database for Mass Spectrometry-based Proteomics Research

Christopher M. Colangelo Mark Shifman Kei-Hoi Cheung Kathryn L. Stone Nicholas J. Carriero Erol E. Gulcicek TuKiet T. Lam Terence Wu Robert D. Bjornson Can Bruce Angus C. Nairn Jesse Rinehart Perry L. Miller Kenneth R. Williams 《Genomics, Proteomics & Bioinformatics》2015,13(1):25-35

We report a significantly enhanced bioinformatics suite and database for proteomics research called the Yale Protein Expression Database (YPED), which is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to a group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.

12.

In silico analysis of accurate proteomics, complemented by selective isolation of peptides

Perez-Riverol Y Sánchez A Ramos Y Schmidt A Müller M Betancourt L González LJ Vera R Padron G Besada V 《Journal of Proteomics》2011,74(10):2071-2082

Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy of molecular mass determination. However, the high complexity and dynamic range of proteomic samples from any species surpass the separation capacity and detection power of the most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny portion of signals is selected for MS/MS experiments, and a considerable number of these still do not provide reliable peptide identification. In this article, an in silico analysis of a novel methodology for peptide and protein identification is described. The approach is based on mass accuracy, isoelectric point (pI), retention time (t(R)), and N-terminal amino acid determination as protein identification criteria, independently of high-quality MS/MS spectra. When the methodology was combined with selective isolation methods, the number of unique peptides and identified proteins increased. Finally, to demonstrate the feasibility of the methodology, an OFFGEL-LC-MS/MS experiment was also implemented. We compared the more reliable peptides identified with MS/MS information with the peptides identified with three experimental features (pI, t(R), molecular mass). Also, two theoretical assumptions from MS/MS identification (selective isolation of peptides and N-terminal amino acid) were analyzed. Our results show that, using the information provided by these features and selective isolation methods, we could find 93% of the high-confidence proteins identified by MS/MS with a false-positive rate lower than 5%.

13.

Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations

Gloria M Sheynkman James E Johnson Pratik D Jagtap Michael R Shortreed Getiria Onsongo Brian L Frey Timothy J Griffin Lloyd M Smith 《BMC genomics》2014,15(1)


14.

Quantitative proteome analysis using differential stable isotopic labeling and microbore LC-MALDI MS and MS/MS (total citations: 2; self-citations: 0; citations by others: 2)

Ji C Li L 《Journal of proteome research》2005,4(3):734-742

We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification to carboxylic groups and dimethylation to amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples. 相似文献

15.

Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins

Bandeira N Clauser KR Pevzner PA 《Molecular & cellular proteomics : MCP》2007,6(7):1123-1134

Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins. 相似文献

16.

Proteomic parsimony through bipartite graph analysis improves accuracy and transparency (total citations: 1; self-citations: 0; citations by others: 1)

Zhang B Chambers MC Tabb DL 《Journal of proteome research》2007,6(9):3549-3557

Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
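
The parsimony idea in this abstract, finding a small set of proteins that accounts for every identified peptide, can be sketched in Python with a greedy set-cover heuristic. This is a swapped-in approximation for illustration only: it is not the IDPicker algorithm, it does not guarantee a truly minimal list, and the function name and input layout are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick proteins greedily until every identified peptide is explained.

    protein_to_peptides: dict mapping protein ID -> set of peptide sequences,
    i.e. one side of the peptide-protein bipartite graph.
    Ties are broken alphabetically by protein ID so the result is deterministic.
    """
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # Choose the protein that explains the most still-unexplained peptides.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & remaining))
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen
```

With the hypothetical mapping `{"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"d"}, "P4": {"c", "d"}}`, the sketch selects `P1` and `P3`, dropping `P2`, whose peptides are all shared with `P1`.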

17.

Large-scale identification of proteins in human salivary proteome by liquid chromatography/mass spectrometry and two-dimensional gel electrophoresis-mass spectrometry (total citations: 12; self-citations: 0; citations by others: 12)

Hu S Xie Y Ramachandran P Ogorzalek Loo RR Li Y Loo JA Wong DT 《Proteomics》2005,5(6):1714-1728

Human saliva contains a large number of proteins and peptides (salivary proteome) that help maintain homeostasis in the oral cavity. Global analysis of the human salivary proteome is important for understanding oral health and disease pathogenesis. In this study, large-scale identification of salivary proteins was demonstrated by using shotgun proteomics and two-dimensional gel electrophoresis-mass spectrometry (2-DE-MS). For the shotgun approach, whole saliva proteins were prefractionated according to molecular weight. The smallest fraction, presumably containing salivary peptides, was directly separated by capillary liquid chromatography (LC). The large protein fractions, however, were digested into peptides for subsequent LC separation. Separated peptides were analyzed by on-line electrospray tandem mass spectrometry (MS/MS) using a quadrupole-time of flight mass spectrometer, and the obtained spectra were automatically processed to search a human protein sequence database for protein identification. Additionally, 2-DE was used to map out the proteins in whole saliva. A total of 105 protein spots were excised and in-gel digested, and the resulting peptide fragments were measured by matrix-assisted laser desorption/ionization-mass spectrometry and sequenced by LC-MS/MS for protein identification. In total, we cataloged 309 proteins from human whole saliva by using these two proteomic approaches.

18.

Extending the coverage of spectral libraries: A neighbor‐based approach to predicting intensities of peptide fragmentation spectra

Chao Ji Randy J. Arnold Kevin J. Sokoloski Richard W. Hardy Haixu Tang Predrag Radivojac 《Proteomics》2013,13(5):756-765

Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra. 相似文献
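
The weighted K-nearest neighbor step described in this abstract can be illustrated with a minimal Python sketch: keep the K most similar library peptides and average their fragment ion intensities, weighting each by its similarity. How similarity is computed and how fragment ions are aligned between peptides are not given in the abstract, so the precomputed similarity scores and equal-length intensity vectors used here are assumptions, as is the function name.

```python
def knn_predict_intensities(candidates, k=3):
    """Predict fragment ion intensities as a similarity-weighted average.

    candidates: list of (similarity, intensities) tuples, where `intensities`
    is a list of fragment ion intensities already aligned to the target
    peptide's fragment ions (all lists must have the same length).
    k: number of most similar neighbors to combine.
    """
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total = sum(weight for weight, _ in neighbors)
    if total <= 0:
        raise ValueError("neighbor similarities must sum to a positive value")
    length = len(neighbors[0][1])
    return [sum(weight * spectrum[i] for weight, spectrum in neighbors) / total
            for i in range(length)]
```

With `k=2` and hypothetical candidates `[(3.0, [1.0, 0.0]), (1.0, [0.0, 1.0]), (0.5, [1.0, 1.0])]`, the least similar candidate is ignored and the two remaining spectra are blended three to one.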

19.

Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

Martens L Nesvizhskii AI Hermjakob H Adamski M Omenn GS Vandekerckhove J Gevaert K 《Proteomics》2005,5(13):3501-3505

With the human Plasma Proteome Project (PPP) pilot phase completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly impressive amount of data that came from this pilot project emphasized the need for a centralized dissemination mechanism and led to the development of a detailed, PPP specific data gathering infrastructure at the University of Michigan, Ann Arbor as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that crept up while discussing which data to store for the PPP concerns whether the raw, binary data coming from the mass spectrometers should be stored, or rather the more compact and already significantly processed peak lists. As this debate is not restricted to the PPP but relates to the proteomics community in general, we will attempt to detail the relative merits and caveats associated with centralized storage and dissemination of raw data and/or peak lists, building on the extensive experience gained during the PPP pilot phase. Finally, some suggestions are made for both immediate and future storage of MS data in public repositories. 相似文献

20.

Rapid validation of protein identifications with the borderline statistical confidence via de novo sequencing and MS BLAST searches

Wielsch N Thomas H Surendranath V Waridel P Frank A Pevzner P Shevchenko A 《Journal of proteome research》2006,5(9):2448-2456

Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for manual inspection of raw MS/MS spectra, which is often biased. The approach takes advantage of the independent interpretation of corresponding MS/MS spectra by PepNovo de novo sequencing software followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.