Similar Articles (20 results)
1.
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate (FDR) estimation method. Analysis of the colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, of which five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting the potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
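The target-decoy idea underlying the FDR estimate in this workflow can be sketched as follows. This is a minimal illustration of the standard calculation restricted to variant PSMs; the function name and PSM representation are mine, and the paper's specific modification to the estimator is not reproduced here:

```python
def variant_fdr(psms, threshold):
    """Estimate the FDR among variant PSMs at a given score threshold with
    the standard target-decoy approach: FDR ~ (# decoy hits) / (# target hits).
    `psms` is a list of (score, is_decoy) pairs for variant peptides only."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

Computing the FDR on the variant subset alone, rather than on all PSMs, is what motivates a modified estimator: variant hits are rare, so a global FDR understates their error rate.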

2.
As proteomic data sets increase in size and complexity, the necessity grows for database-centric software systems able to organize, compare, and visualize all the proteomic experiments in a lab. We recently developed an integrated platform called the high-throughput autonomous proteomic pipeline (HTAPP) for the automated acquisition and processing of quantitative proteomic data and the integration of proteomic results with existing external protein information resources within a lab-based relational database called PeptideDepot. Here, we introduce the peptide validation software component of this system, which combines relational-database-integrated electronic manual spectral annotation in Java with a new software tool in the R programming language for the generation of logistic regression spectral models from user-supplied validated data sets and the flexible application of these user-generated models in automated proteomic workflows. This logistic regression spectral model combines variables computed directly from SEQUEST output with deterministic variables based on expert manual validation criteria of spectral quality. For linear quadrupole ion trap (LTQ) or LTQ-FTICR LC/MS data, our logistic spectral model outperformed both XCorr (242% more peptides identified on average) and the X!Tandem E-value (87% more peptides identified on average) at a 1% false discovery rate estimated by the decoy database approach.
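As a rough illustration of such a logistic regression spectral model, the sketch below scores a PSM from SEQUEST-style features. The coefficients are invented placeholders; in the system described, a model of this form is fit to user-validated spectra rather than fixed by hand:

```python
import math

def psm_probability(xcorr, delta_cn, frac_ions_matched, coef=(-4.0, 1.5, 8.0, 3.0)):
    """Logistic regression score for a peptide-spectrum match.
    Combines search-engine features into P(correct) via the sigmoid function.
    The default coefficients are hypothetical, for illustration only."""
    b0, b1, b2, b3 = coef
    z = b0 + b1 * xcorr + b2 * delta_cn + b3 * frac_ions_matched
    return 1.0 / (1.0 + math.exp(-z))
```

Because the output is a probability, a single cutoff on it plays the role that separate XCorr and E-value thresholds would otherwise play.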

3.
4.
Kebing Yu and Arthur R. Salomon. Proteomics 2010, 10(11): 2113-2122
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases for an array of species and of protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic data sets is critically important. The high-throughput autonomous proteomic pipeline described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is composed of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of the high-throughput autonomous proteomic pipeline focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples.

5.
6.
Genomic and proteomic data were integrated into a proteogenomic workflow to identify coding genomic variants of the Human Embryonic Kidney 293 (HEK-293) cell line at the proteome level. Shotgun proteome data published by Geiger et al. (2012), Chick et al. (2015), and obtained in this work for HEK-293 were searched against a customized genomic database generated using exome data published by Lin et al. (2014). Overall, 112 unique variants were identified at the proteome level out of ~1200 coding variants annotated in the exome. Seven identified variants were shared among all three proteomic datasets considered, and 27 variants were found in any two datasets. Some of the variants found belonged to widely known genomic polymorphisms originating from the germline, while others more likely resulted from somatic mutations. At least eight of the proteins bearing amino acid variants were annotated as cancer-related, including the p53 tumor suppressor. In all the shotgun datasets considered, variant peptides were 2.5 times less likely to be identified than wild-type ones, relative to the corresponding theoretical peptides. This can be explained by the presence of so-called "passenger" mutations in genes that were never expressed in HEK-293 cells. All MS data have been deposited in ProteomeXchange with the dataset identifier PXD002613 ( http://proteomecentral.proteomexchange.org/dataset/PXD002613 ).
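The reported 1:2.5 under-identification of variant peptides can be expressed as a ratio of identification rates, each normalized to its theoretical peptide count. A minimal sketch (the function name and the example numbers are illustrative, not taken from the paper):

```python
def identification_bias(obs_var, theo_var, obs_wt, theo_wt):
    """Ratio of identification rates: (observed variant / theoretical variant)
    divided by (observed wild-type / theoretical wild-type).
    A value of 0.4 corresponds to the 1:2.5 bias reported in the abstract."""
    return (obs_var / theo_var) / (obs_wt / theo_wt)
```

Normalizing by the theoretical counts matters because variant and wild-type peptides differ greatly in how many are even searchable.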

7.
The main goal of many proteomics experiments is the accurate and rapid quantification and identification of regulated proteins in complex biological samples. The bottleneck in quantitative proteomics remains the availability of efficient software to evaluate and quantify the tremendous amount of mass spectral data acquired during a proteomics project. A new software suite, ICPLQuant, has been developed to accurately quantify isotope-coded protein label (ICPL)-labeled peptides at the MS level during LC-MALDI and peptide mass fingerprint experiments. The tool is able to generate a list of differentially regulated peptide precursors for subsequent MS/MS experiments, minimizing time-consuming acquisition and interpretation of MS/MS data. ICPLQuant is based on two independent units. Unit 1 performs ICPL multiplex detection and quantification and proposes peptides to be identified by MS/MS. Unit 2 combines MASCOT MS/MS protein identification with the quantitative data and produces a protein/peptide list with all the relevant information accessible for further data mining. The accuracy of quantification, the selection of peptides for MS/MS identification, and the automated output of a list of regulated proteins are demonstrated by the comparative analysis of four different mixtures of three proteins (ovalbumin, horseradish peroxidase, and rabbit albumin) spiked into the complex protein background of the DGPF Proteome Marker.
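Conceptually, the precursor-selection step described for Unit 1 amounts to flagging isotope pairs whose light/heavy ratio deviates from 1. The sketch below is my own simplification under assumed inputs (a flat fold-change cutoff and a simple tuple layout), not ICPLQuant's actual algorithm:

```python
def regulated_precursors(pairs, fold_cutoff=2.0):
    """pairs: list of (mz, light_intensity, heavy_intensity) for detected
    isotope-labeled peptide pairs. Returns (mz, ratio) for precursors whose
    light/heavy ratio exceeds the fold-change cutoff in either direction,
    i.e. candidates worth targeting with MS/MS."""
    selected = []
    for mz, light, heavy in pairs:
        ratio = light / heavy
        if ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff:
            selected.append((mz, ratio))
    return selected
```

Filtering at the MS level before fragmentation is what saves the MS/MS acquisition time the abstract emphasizes.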

8.
MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low-quality peptide identifications, to control the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although contamination with low-quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high-resolution and quantitative data. Here we describe a novel database, termed MaxQB, that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project-specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. The Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins showed negative correlation. This information can be used to pinpoint false protein identifications independently of peptide database scores. The information contained in MaxQB, including high-resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
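The Spearman rank correlation used here is simply the Pearson correlation computed on rank vectors. A dependency-free sketch (ignoring the tie correction that a full implementation would apply):

```python
def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences:
    Pearson correlation of their rank vectors (no tie handling)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it depends only on ranks, the statistic is insensitive to the strongly nonlinear relationship between raw peptide intensity and detection probability, which is why it suits this comparison.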

9.
10.
The amount of sample available for clinical and biological proteomic research is often limited and thus significantly restricts clinical and translational research. Recently, we integrated pressure cycling technology (PCT)-assisted sample preparation and SWATH-MS to perform reproducible proteomic quantification of biopsy-level tissue samples. Here, we further evaluated the minimal sample requirement of the PCT-SWATH method using various types of samples, including cultured cells (HeLa, K562, and U251; 500 000 down to 50 000 cells) and tissue samples (mouse liver, heart, and brain, and human kidney; 3 to 0.2 mg). The data show that as few as 50 000 human cells and 0.2-0.5 mg of wet mouse and human tissue produced peptide samples sufficient for multiple SWATH-MS analyses at the optimal sample load for the system. Generally, the reproducibility of the method increased with decreasing tissue sample amounts. The SWATH maps acquired from peptides derived from samples of varying sizes were essentially identical based on the number, type, and quantity of identified peptides. In conclusion, we determined the minimal sample required for optimal PCT-SWATH analyses and found that smaller sample sizes achieved higher quantitative accuracy.

11.
Post-translational modifications (PTMs) are critical regulators of protein function, and nearly 200 different types of PTM have been identified. Advances in high-resolution mass spectrometry have led to the identification of an unprecedented number of PTM sites in numerous organisms, potentially facilitating a more complete understanding of how PTMs regulate cellular behavior. While databases have been created to house the resulting data, most of these resources focus on individual types of PTM, do not consider quantitative PTM analyses, or do not provide tools for the visualization and analysis of PTM data. Here, we describe the Functional Analysis Tools for Post-Translational Modifications (FAT-PTM) database ( https://bioinformatics.cse.unr.edu/fat-ptm/ ), which currently supports eight different types of PTM and over 49 000 PTM sites identified in large-scale proteomic surveys of the model organism Arabidopsis thaliana. The FAT-PTM database currently supports tools to visualize protein-centric PTM networks, quantitative phosphorylation site data from over 10 quantitative phosphoproteomic studies, PTM information displayed in protein-centric metabolic pathways, and groups of proteins that are co-modified by multiple PTMs. Overall, the FAT-PTM database provides users with a robust platform to share and visualize experimentally supported PTM data, develop hypotheses related to target proteins, or identify emergent patterns in PTM data for signaling and metabolic pathways.

12.
13.
14.
Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Although such analyses typically assume that a protein's peptide fragments are observed with equal likelihood, only a few so-called 'proteotypic' peptides are repeatedly and consistently identified for any given protein present in a mixture. Using >600,000 peptide identifications generated by four proteomic platforms, we empirically identified >16,000 proteotypic peptides for 4,030 distinct yeast proteins. Characteristic physicochemical properties of these peptides were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism, for a given platform, with >85% cumulative accuracy. Possible applications of proteotypic peptides include validation of protein identifications, absolute quantification of proteins, annotation of coding sequences in genomes, and characterization of the physical principles governing key elements of mass spectrometric workflows (e.g., digestion, chromatography, ionization and fragmentation).
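Physicochemical properties of the kind used to train such predictors are straightforward to compute from sequence alone. The sketch below derives a few representative features using the standard Kyte-Doolittle hydropathy scale; the particular feature set is illustrative and is not the paper's actual predictor:

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def peptide_features(seq):
    """Compute simple physicochemical features of a tryptic peptide:
    length, mean hydropathy, and count of basic residues (K, R, H).
    Features like these feed a classifier that predicts proteotypicity."""
    hydropathy = sum(KD[aa] for aa in seq) / len(seq)
    basic = sum(seq.count(aa) for aa in 'KRH')
    return {'length': len(seq), 'mean_hydropathy': hydropathy, 'n_basic': basic}
```

Hydrophobicity and charge influence chromatographic retention and ionization efficiency, which is why features of this kind carry signal about whether a peptide is routinely observed.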

15.
N-succinimidyloxycarbonylmethyl tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP-Ac-OSu) reacts rapidly, mildly, and specifically with the N-termini of proteins and peptides. Thus, it can be developed into an ideal isotope-coded tag for quantitative proteomics. Here, we present a strategy for light- and heavy-TMPP-based quantitative proteomic analysis, in which peptides in a mixture are quantified using an on-tip TMPP derivatization approach. To demonstrate the accuracy of this strategy, light and heavy TMPP-labeled peptides were combined at different ratios and subsequently analyzed by LC-MS/MS. The MS spectra and scatter plots show that peptide and protein ratios were both consistent with the mixed ratios, and we observed a linear correlation between measured protein ratios and the predicted ratios. In comparison with the SILAC method, the TMPP labeling method produced similarly accurate quantitative results with low CVs. In conclusion, our results suggest that this isotope-coded TMPP method achieves accurate quantification and is compatible with IEF-based separation. Given the inherent advantages of TMPP derivatization, we believe it holds great promise for future applications in quantitative proteomic analysis.

16.
We report a global proteomic approach for analyzing brain tissue and, for the first time, a comprehensive characterization of the whole mouse brain proteome. Preparation of the whole brain sample incorporated a highly efficient cysteinyl-peptide enrichment (CPE) technique to complement a global enzymatic digestion method. Both the global and the cysteinyl-enriched peptide samples were analyzed by SCX fractionation coupled with reversed-phase LC-MS/MS analysis. A total of 48,328 different peptides were confidently identified (>98% confidence level), covering 7792 nonredundant proteins (approximately 34% of the predicted mouse proteome). A total of 1564 and 1859 proteins were identified exclusively from the cysteinyl-peptide and the global peptide samples, respectively, corresponding to 25% and 31% improvements in proteome coverage compared to analysis of only the global peptide or cysteinyl-peptide samples. The identified proteins provide a broad representation of the mouse proteome, with little bias evident due to protein pI, molecular weight, or cellular localization. Approximately 26% of the identified proteins with gene ontology (GO) annotations were membrane proteins, with 1447 proteins predicted to have transmembrane domains, and many of the membrane proteins were found to be involved in transport and cell signaling. The MS/MS spectrum count information for the identified proteins was used to provide a measure of relative protein abundances. The mouse brain peptide/protein database generated from this study represents the most comprehensive proteome coverage for the mammalian brain to date and forms the basis for future quantitative brain proteomic studies using mouse models. The proteomic approach presented here may have broad applications for rapid proteomic analyses of various mouse models of human brain diseases.
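Spectrum-count-based relative abundance is commonly computed as a normalized spectral abundance factor (NSAF), which corrects for the fact that longer proteins yield more peptides. The sketch below shows that standard calculation; the abstract does not specify the exact normalization this study used, so this is an assumption:

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor.
    spectral_counts: {protein: MS/MS spectrum count}
    lengths: {protein: sequence length}
    Each protein's count is divided by its length, then normalized so the
    values sum to 1, giving a relative abundance estimate."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}
```

Dividing by length first prevents large proteins from dominating the abundance ranking purely because they produce more tryptic peptides.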

17.
The quantification of changes in protein abundance in complex biological specimens is essential for proteomic studies in basic and applied research. Here we report on the development and validation of the DeepQuanTR software for the identification and quantification of differentially expressed proteins using LC-MALDI-MS. Following enzymatic digestion, HPLC peptide separation, and normalization of MALDI-MS signal intensities to those of internal standards, the software extracts peptide features, adjusts for differences in HPLC retention times, and performs a relative quantification of features. The annotation of multiple peptides to the corresponding parent protein allows the definition of a Protein Quant Value, which is related to protein abundance and allows inter-sample comparisons. The performance of DeepQuanTR was evaluated by analyzing 24 samples derived from human serum spiked with different amounts of four proteins and eight complex samples of vascular proteins, derived from surgically resected human kidneys with cancer following ex vivo perfusion with a reactive ester biotin derivative. The identification and experimental validation of proteins that were differentially regulated in cancerous lesions compared with normal kidney demonstrate the power of DeepQuanTR. This software, which can easily be used with established proteomic methodologies, facilitates the relative quantification of proteins derived from a wide variety of samples.
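A per-protein quantity of this kind is typically obtained by normalizing peptide-level signals and aggregating them. The sketch below is a hedged illustration only: the abstract does not give the Protein Quant Value formula, so both the normalization shown and the choice of median aggregation are my assumptions:

```python
def protein_quant_value(peptide_intensities, internal_standard):
    """Illustrative per-protein quantity: normalize each peptide intensity
    to an internal-standard intensity, then take the median across the
    peptides annotated to the protein (median resists outlier peptides)."""
    normalized = sorted(i / internal_standard for i in peptide_intensities)
    n = len(normalized)
    mid = n // 2
    return normalized[mid] if n % 2 else 0.5 * (normalized[mid - 1] + normalized[mid])
```

Normalizing against a spiked internal standard is what makes the resulting value comparable across LC-MALDI runs.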

18.
19.
Progress in MS-based methods for veterinary research and diagnostics lags behind that in human research, and proteome data of domestic animals are still not well represented in open-source data repositories. This is particularly true for the equine species. Here we present a first Equine PeptideAtlas encompassing high-resolution tandem MS analyses of 51 samples representing a selection of equine tissues and body fluids from healthy and diseased animals. The raw data were processed through the Trans-Proteomic Pipeline to yield high-quality identifications of proteins and peptides. The current release comprises 24 131 distinct peptides representing 2636 canonical proteins, observed at false discovery rates of 0.2% at the peptide level and 1.4% at the protein level. Data from the Equine PeptideAtlas are available for experimental planning, validation of new datasets, and proteomic data mining. The advantages of the Equine PeptideAtlas are demonstrated by examples of mining its contents for information on potential and well-known equine acute phase proteins, which are of broad interest in the veterinary clinic. The extracted information will support further analyses and emphasizes the value of the Equine PeptideAtlas as a resource for the design of targeted quantitative proteomic studies.

20.