Similar articles
20 similar articles found.
1.
When analyzing proteins in complex samples by tandem mass spectrometry of peptides generated by proteolysis, protein inference can be ambiguous, even with well-validated peptides. Unresolved questions include whether to show all possible proteins or a minimal list, what to do when proteins are inferred ambiguously, and how to quantify peptides that bridge multiple proteins, each with distinguishing evidence. Here we describe IsoformResolver, a peptide-centric protein inference algorithm that clusters proteins in two ways: one based on peptides experimentally identified from MS/MS spectra, and the other based on peptides derived from an in silico digest of the protein database. MS/MS-derived protein groups report the minimal list of proteins in the context of all possible proteins, without listing peptides redundantly. In silico-derived protein groups pull together functionally related proteins, providing stable identifiers. The peptide-centric grouping strategy used by IsoformResolver allows proteins to be displayed together when they share peptides, providing a comprehensive yet concise way to organize protein profiles. It also summarizes spectral-count information and is especially useful for comparing results from multiple LC-MS/MS experiments. Finally, we examine the relatedness of proteins within IsoformResolver groups and compare its performance with that of other protein inference software.

2.
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Because of degenerate peptides and 'one-hit wonders', it is very difficult to determine which proteins are present in a sample. In this paper, we review existing protein inference methods and classify them according to the source of peptide identifications and the principle of the algorithms. We hope that this review gives readers a good understanding of current developments in the field and helps them devise new protein inference algorithms.

3.
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.

4.
Information about peptides and proteins in urine can be used to search for biomarkers of early stages of various diseases. The main technology currently used to identify peptides and proteins is tandem mass spectrometry, in which peptides are identified from the mass spectra of their fragmentation products. However, the fragmentation stage decreases the sensitivity of the analysis and increases its duration. We have developed a method for identifying human urinary proteins and peptides that is based on the accurate mass and time (AMT) tag method and does not use tandem mass spectrometry. A database containing more than 1381 AMT tags of peptides has been constructed. Software has been developed for populating the database with AMT tags, normalizing chromatograms, applying the database to identify proteins and peptides, and estimating their quantities. New procedures are proposed for identifying peptides from tandem mass spectra and the AMT tag database. The paper also lists proteins that have been identified in human urine for the first time.

5.
We have developed a new approach for analyzing interacting interfaces in protein complexes and protein quaternary structure based on cross-linking in the solid state. Protein complexes are freeze-dried under vacuum, and cross-links are introduced in the solid phase by dehydrating the protein in a nonaqueous solvent, creating peptide bonds between amino and carboxyl groups of the interacting peptides. Cross-linked proteins are digested into peptides with trypsin in both H₂¹⁶O and H₂¹⁸O and are then readily distinguished in mass spectra by characteristic 8 atomic mass unit (amu) shifts, reflecting the incorporation of two ¹⁸O atoms into each C terminus of the proteolytic peptides. Computer analysis of mass spectrometry (MS) and MS/MS data is used to identify the cross-linked peptides. We demonstrated the specificity and reproducibility of our method by cross-linking homo-oligomeric protein complexes of glutathione S-transferase (GST) from Schistosoma japonicum, alone or in a mixture with many other proteins. Identified cross-links were predominantly of amide origin, but six esters and thioesters were also found. The cross-linked peptides were validated against the X-ray structures of the GST monomer and dimer and by experimental (MS/MS) analyses. Some of the identified cross-links matched interacting peptides in the native 3D structure of GST, indicating that the structure of GST and its oligomeric complex remained primarily intact after freeze-drying. The pattern of oligomeric GST obtained in the solid state was the same as that obtained in solution by Ru(II)(bpy)₃²⁺-catalyzed oxidative "zero-length" cross-linking, confirming that our strategy is feasible for analyzing the molecular interfaces of interacting proteins or peptides.
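The 8 amu figure follows from simple isotope arithmetic. This sketch assumes monoisotopic masses of about 15.9949 Da for ¹⁶O and 17.9992 Da for ¹⁸O, full exchange of both C-terminal carboxyl oxygens, and that each of the two peptides in a cross-linked pair keeps its own free tryptic C terminus. Under those assumptions a linear peptide pair (¹⁶O vs ¹⁸O digest) differs by about 4 Da and a cross-linked pair by about 8 Da. The function names are illustrative only.

```python
# Monoisotopic masses of the two oxygen isotopes (Da).
O16 = 15.9949146
O18 = 17.9991610


def c_terminal_shift(n_o18_per_terminus: int = 2) -> float:
    """Mass increase at one C terminus when n 16O atoms are replaced by 18O."""
    return n_o18_per_terminus * (O18 - O16)


def labeled_shift(n_c_termini: int) -> float:
    """Total 16O/18O mass difference for a species carrying n labeled C termini."""
    return n_c_termini * c_terminal_shift()


linear = labeled_shift(1)       # ordinary tryptic peptide: one C terminus
crosslinked = labeled_shift(2)  # two peptides joined by a cross-link: two C termini (assumed)
print(f"linear: {linear:.3f} Da, cross-linked: {crosslinked:.3f} Da")
```

The doubling is what lets a cross-linked species stand out from ordinary peptides in the paired spectra.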

6.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
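The weighted K-nearest-neighbor idea can be illustrated with a toy predictor. This is a minimal sketch, not the authors' method: the similarity function (fraction of identical residues at aligned positions, equal-length peptides only), the representation of a spectrum as a dictionary keyed by fragment-ion label, and the rescaling to a base peak of 1.0 are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# A spectrum is represented as fragment-ion label -> relative intensity (assumption).
Spectrum = Dict[str, float]


def sequence_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: fraction of identical residues at aligned positions.

    Only equal-length peptides are compared; other pairs score 0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_spectrum(target: str, library: Dict[str, Spectrum], k: int = 3) -> Spectrum:
    """Predict fragment intensities for `target` as a similarity-weighted
    average of the spectra of its k most similar library peptides."""
    scored: List[Tuple[float, str]] = sorted(
        ((sequence_similarity(target, pep), pep) for pep in library if pep != target),
        reverse=True,
    )
    neighbours = [(w, pep) for w, pep in scored[:k] if w > 0]
    total = sum(w for w, _ in neighbours)
    predicted: Spectrum = {}
    for w, pep in neighbours:
        for ion, intensity in library[pep].items():
            predicted[ion] = predicted.get(ion, 0.0) + (w / total) * intensity
    if not predicted:
        return {}
    top = max(predicted.values())
    # Rescale so the most intense predicted fragment has intensity 1.0.
    return {ion: v / top for ion, v in predicted.items()} if top > 0 else predicted


# Made-up library for illustration.
library = {
    "PEPTIDEK": {"y3": 1.0, "y4": 0.5, "b2": 0.2},
    "PEPTIDER": {"y3": 0.8, "y4": 1.0},
    "AAAAAAAK": {"b2": 1.0},
}
predicted = predict_spectrum("PEPTIDEA", library)
```

Here PEPTIDEK and PEPTIDER each share 7 of 8 residues with the query and receive equal weight, while AAAAAAAK shares none and is ignored. A real predictor would also need position-aware ion matching, since identical ion labels need not correspond across peptides of different composition.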

7.
8.
Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy of molecular mass determination. However, the high complexity and dynamic range of proteomic samples from any species surpass the separation capacity and detection power of the most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny portion of signals is selected for MS/MS experiments, and a considerable number of these still do not provide reliable peptide identifications. In this article, we describe an in silico analysis of a novel methodology for peptide and protein identification. The approach uses mass accuracy, isoelectric point (pI), retention time (tR), and N-terminal amino acid determination as protein identification criteria, without requiring high-quality MS/MS spectra. When the methodology was combined with selective isolation methods, the numbers of unique peptides and identified proteins increased. Finally, to demonstrate the feasibility of the methodology, an OFFGEL-LC-MS/MS experiment was also performed. We compared the more reliable peptides identified with MS/MS information against peptides identified with three experimental features (pI, tR, molecular mass). Two theoretical assumptions derived from MS/MS identification (selective isolation of peptides and N-terminal amino acid) were also analyzed. Our results show that, using the information provided by these features and selective isolation methods, we could find 93% of the high-confidence proteins identified by MS/MS with a false-positive rate below 5%.
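Identification without MS/MS spectra amounts to checking each observed feature against theoretical peptide properties within tolerances. The sketch below is a hypothetical filter: the tolerance values (5 ppm, 0.2 pH units, 1 min), the `Peptide` record, and the optional N-terminal residue check are illustrative assumptions, not parameters reported in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peptide:
    sequence: str
    mass: float  # theoretical monoisotopic mass (Da)
    pi: float    # predicted isoelectric point
    rt: float    # predicted retention time (min)


def match_features(
    obs_mass: float,
    obs_pi: float,
    obs_rt: float,
    candidates: List[Peptide],
    ppm_tol: float = 5.0,          # hypothetical mass tolerance (ppm)
    pi_tol: float = 0.2,           # hypothetical pI tolerance (pH units)
    rt_tol: float = 1.0,           # hypothetical retention-time tolerance (min)
    n_term: Optional[str] = None,  # optional observed N-terminal residue
) -> List[Peptide]:
    """Return the candidate peptides consistent with every observed feature."""
    hits = []
    for pep in candidates:
        ppm_error = abs(obs_mass - pep.mass) / pep.mass * 1e6
        if ppm_error > ppm_tol:
            continue
        if abs(obs_pi - pep.pi) > pi_tol or abs(obs_rt - pep.rt) > rt_tol:
            continue
        if n_term is not None and not pep.sequence.startswith(n_term):
            continue
        hits.append(pep)
    return hits


# Made-up candidates; the masses are not real masses of these sequences.
candidates = [
    Peptide("ACDK", 1500.000, 5.0, 20.0),
    Peptide("GCDK", 1500.006, 5.1, 20.4),
    Peptide("ACEK", 1500.000, 7.0, 20.0),
]
```

Each extra orthogonal feature (pI, tR, N-terminal residue) shrinks the candidate list, which is the intuition behind combining them.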

9.
Though many software packages have been developed to perform label-free quantification of proteins in complex biological samples using peptide intensities generated by LC-MS/MS, two critical issues are generally ignored in this field: (i) peptides have multiple elution patterns across runs in an experiment, and (ii) many peptides cannot be used for protein quantification. To address these two key issues, we have developed a novel alignment method to enable accurate peptide peak retention time determination and multiple filters to eliminate unqualified peptides for protein quantification. Repeatability and linearity have been tested using six very different samples, i.e., standard peptides, kidney tissue lysates, HT29-MTX cell lysates, depleted human serum, human serum albumin-bound proteins, and standard proteins spiked in kidney tissue lysates. At least 90.8% of the proteins (up to 1,390) had CVs ≤ 30% across 10 technical replicates, and at least 93.6% (up to 2,013) had R² ≥ 0.9500 across 7 concentrations. Identical amounts of standard protein spiked in complex biological samples achieved a CV of 8.6% across eight injections of two groups. Further assessment was made by comparing mass spectrometric results to immunodetection, and consistent results were obtained. The new approach has novel and specific features enabling accurate label-free quantification.
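The two acceptance criteria in this abstract, CV ≤ 30% across technical replicates and R² ≥ 0.95 across a concentration series, are straightforward to compute. A minimal sketch, assuming the CV uses the sample standard deviation and R² comes from an ordinary least-squares line; the study may define either quantity differently, and `fraction_passing_cv` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R² of an ordinary least-squares line fitted to (x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy * sxy / (sxx * syy)


def fraction_passing_cv(replicates: Dict[str, List[float]], max_cv: float = 30.0) -> float:
    """Fraction of proteins whose replicate intensities have CV <= max_cv percent."""
    passing = sum(1 for v in replicates.values() if coefficient_of_variation(v) <= max_cv)
    return passing / len(replicates)
```

For example, intensities of 90, 100 and 110 give a CV of 10%, while 10, 50 and 90 give 80% and would fail the 30% cut-off.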

10.
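Quantitative proteomics is both a focus and a challenge of proteomics research, and liquid chromatography-mass spectrometry has been widely applied to qualitative and quantitative protein studies. This study established and optimized a label-free quantitative proteomics method based on liquid chromatography-mass spectrometry and compared two normalization algorithms for peptide spectral counts, finding that the ASC method outperforms the Rsc method. Finally, the method was applied to a differential proteomic expression study of the hepatocellular carcinoma cell model HepG2 and the HepG2-HBx cell line. Mass spectrometric identification results were analyzed with the clustering software Cluster 3.0, and 107 overlapping proteins were ultimately identified, of which 9 were up-regulated (Ratio > 1.75) and 6 were down-regulated (Ratio < 0.5); all of these proteins are closely associated with the development and progression of liver cancer. The results show that the technique is simple and convenient to perform and offers high sensitivity and dynamic range, and that applying it to differential proteomics and biomarker discovery is of considerable theoretical and clinical importance.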
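The up- and down-regulation cut-offs quoted here (Ratio > 1.75 and Ratio < 0.5) can be applied as follows. This is a hedged sketch: it uses simple total-count normalization with a pseudocount of 1 to avoid division by zero, which is an assumption for illustration and is neither the ASC nor the Rsc method compared in the study. "Sample B relative to sample A" is likewise a naming choice, not the authors'.

```python
def normalized_ratio(count_b: int, total_b: int, count_a: int, total_a: int,
                     pseudocount: float = 1.0) -> float:
    """Ratio of total-normalized spectral counts, sample B relative to sample A.

    The pseudocount (an assumption) keeps the ratio finite when a protein
    has zero spectra in sample A.
    """
    return ((count_b + pseudocount) / total_b) / ((count_a + pseudocount) / total_a)


def classify(ratio: float, up: float = 1.75, down: float = 0.5) -> str:
    """Label a protein using the up/down cut-offs quoted in the abstract."""
    if ratio > up:
        return "up"
    if ratio < down:
        return "down"
    return "unchanged"
```

With 40 versus 10 spectra out of 10,000 in each sample, the ratio is 41/11 ≈ 3.73 and the protein is labeled "up".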

11.
Assessment of differential protein abundance from the observed properties of detected peptides is an essential part of protein profiling based on shotgun proteomics. However, the abundance observed for shared peptides may be due to contributions from multiple proteins that are affected differently by a given treatment. Excluding shared peptides eliminates this ambiguity but may significantly decrease the number of proteins for which abundance estimates can be obtained. Peptide sharing within a family of biologically related proteins does not cause ambiguity if family members have a common response to treatment. On the basis of this concept, we have developed an approach for including shared peptides in the analysis of differential protein abundance in protein profiling. Data from a recent proteomics study of lung tissue from mice exposed to lipopolysaccharide, cigarette smoke, and a combination of these agents are used to illustrate our method. Starting from data in which about half of the implicated database proteins involved shared peptides, 82% of the affected proteins were grouped into families, based on FASTA annotation, with closure on peptide sharing. In many cases, a common abundance relative to control was sufficient to explain ion-current peak areas for peptides, both unique and shared, that identified biologically related proteins in a peptide-sharing closure group. On the basis of these results, we propose that peptide-sharing closure groups provide a way to include abundance data for shared peptides in quantitative protein profiling by high-throughput mass spectrometry.
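A peptide-sharing closure group can be read as a connected component of the graph in which two proteins are linked whenever they share a peptide, so that proteins connected through a chain of shared peptides end up in the same group. A minimal sketch using a union-find structure; it groups by peptide sharing only and does not use the FASTA annotation the authors also relied on.

```python
from collections import defaultdict
from typing import Dict, List, Set


def closure_groups(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins linked, directly or transitively, by shared peptides."""
    parent = {p: p for p in protein_peptides}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner: Dict[str, str] = {}
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            if pep in first_owner:
                union(first_owner[pep], protein)  # shared peptide links two proteins
            else:
                first_owner[pep] = protein

    groups: Dict[str, Set[str]] = defaultdict(set)
    for protein in protein_peptides:
        groups[find(protein)].add(protein)
    return sorted(sorted(g) for g in groups.values())
```

Proteins A and C below share no peptide, yet they fall into one closure group because each shares a peptide with B.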

12.
So far, drugs that inhibit important protein–protein interactions have been hard to find, either by screening or by rational design. Most drugs on the market that target proteins are therefore aimed at well-defined binding pockets. While computer-aided design is widely used to facilitate drug discovery for binding pockets, its application to designing inhibitors that target the protein surface at first seems limited because of the greater complexity of the task. We previously began developing a computational combinatorial design approach based on the well-known 'multiple copy simultaneous search' (MCSS) procedure to tackle this problem. To identify sequence patterns of potential inhibitor peptides, a three-step procedure is employed: first, MCSS is used to identify the locations of specific functional groups on the protein surface; second, after the peptide main chain is constructed from the favored locations of N-methylacetamide groups, functional groups corresponding to amino acid side chains are selected and connected to the main-chain Cα atoms; finally, the peptides generated in the second step are aligned, and the probabilities of amino acids at each position are calculated from the alignment. Sequence patterns of potential inhibitors are determined from the propensities of amino acids at each Cα position. Here we report the optimization of inhibitor peptides using the sequence patterns determined by our method. Several short peptides derived from our prediction inhibit the Ras–Raf association in vitro in ELISA competition assays, radioassays and biosensor-based assays, demonstrating the feasibility of our approach. Our method therefore provides an important step towards the development of novel anti-Ras agents and the structure-based design of inhibitors of protein–protein interactions.
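The final step, turning aligned candidate peptides into position-specific amino acid probabilities, amounts to computing a frequency profile. The sketch assumes the peptides are already aligned to equal length and takes the most probable residue at each position as a simple consensus; the consensus rule and the peptide strings in the example are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter
from typing import Dict, List


def position_probabilities(aligned: List[str]) -> List[Dict[str, float]]:
    """Amino-acid probability at each position of equal-length aligned peptides."""
    if not aligned:
        raise ValueError("need at least one peptide")
    length = len(aligned[0])
    if any(len(p) != length for p in aligned):
        raise ValueError("peptides must already be aligned to the same length")
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in aligned)
        profile.append({aa: n / len(aligned) for aa, n in counts.items()})
    return profile


def consensus(profile: List[Dict[str, float]]) -> str:
    """Most probable residue at each position (ties go to the first residue seen)."""
    return "".join(max(col, key=col.get) for col in profile)


profile = position_probabilities(["RKLE", "RKIE", "RQLE", "HKLE"])  # made-up peptides
```

Here position 1 has R with probability 0.75 and H with 0.25, and the consensus across all four positions is RKLE.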

13.
Site‐specific chemical cross‐linking in combination with mass spectrometry analysis has emerged as a powerful proteomic approach for studying the three‐dimensional structure of protein complexes and for mapping protein–protein interactions (PPIs). Building on the success of MS analysis of in vitro cross‐linked proteins, which has been widely used to investigate specific interactions of bait proteins and their targets in various organisms, we report a workflow for in vivo chemical cross‐linking and MS analysis in a multicellular eukaryote. This approach optimizes the in vivo protein cross‐linking conditions in Arabidopsis thaliana, establishes a MudPIT procedure for enriching cross‐linked peptides, and provides an integrated software program, the exhaustive cross‐linked peptides identification tool (ECL), to identify the MS spectra of in planta chemically cross‐linked peptides. In total, two pairs of high-confidence in vivo cross‐linked peptides were identified from two independent biological replicates. This work marks the beginning of an alternative proteomic approach for studying in vivo protein tertiary structure and PPIs in multicellular eukaryotes.

14.
Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
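Parsimony on the bipartite peptide-protein graph can be approximated in two steps: merge proteins with identical peptide sets, since the data cannot distinguish them, and then repeatedly pick the protein group that explains the most still-unexplained peptides. This greedy set cover is a common heuristic and is not guaranteed to find the true minimum; whether IDPicker uses exactly this procedure is not stated here, so treat it as an illustrative sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def parsimonious_proteins(protein_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation of the minimal protein list explaining all peptides."""
    # Step 1: proteins with identical peptide sets are indistinguishable; merge them.
    by_peptides: Dict[frozenset, List[str]] = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_peptides[frozenset(peptides)].append(protein)
    groups = {"/".join(sorted(names)): set(peps) for peps, names in by_peptides.items()}

    # Step 2: greedy set cover; ties are broken alphabetically by group name.
    uncovered = set().union(*groups.values())
    selected: List[str] = []
    while uncovered:
        best = max(sorted(groups), key=lambda g: len(groups[g] & uncovered))
        selected.append(best)
        uncovered -= groups[best]
    return selected
```

In the example below, B's peptides are all explained by A, so B is dropped, while C and D are reported together as one indistinguishable group.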

15.
We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification to carboxylic groups and dimethylation to amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples.

16.
Peptide detectability is defined as the probability that a peptide is identified in an LC-MS/MS experiment and has been useful in providing solutions to protein inference and label-free quantification. Previously, predictors for peptide detectability trained on standard or complex samples were proposed. Although the models trained on complex samples may benefit from the large training data sets, it is unclear to what extent they are affected by the unequal abundances of identified proteins. To address this challenge and improve detectability prediction, we present a new algorithm for the iterative learning of peptide detectability from complex mixtures. We provide evidence that the new method approximates detectability with useful accuracy and, based on its design, can be used to interpret the outcome of other learning strategies. We studied the properties of peptides from the bacterium Deinococcus radiodurans and found that at standard quantities, its tryptic peptides can be roughly classified as either detectable or undetectable, with a relatively small fraction having medium detectability. We extend the concept of detectability from peptides to proteins and apply the model to predict the behavior of a replicate LC-MS/MS experiment from a single analysis. Finally, our study summarizes a theoretical framework for peptide/protein identification and label-free quantification.
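One simple way to carry detectability from peptides to proteins is to treat each peptide's detection as an independent event, so that a protein counts as detected if at least one of its peptides is, giving P(protein) = 1 − ∏(1 − dᵢ). The independence assumption and the "at least one peptide" criterion are ours, chosen for illustration; they are not necessarily the model used in the study.

```python
import math
from typing import Iterable


def protein_detection_probability(peptide_detectabilities: Iterable[float]) -> float:
    """P(at least one peptide detected), assuming independent peptide detections."""
    return 1.0 - math.prod(1.0 - d for d in peptide_detectabilities)


def expected_detected_peptides(peptide_detectabilities: Iterable[float]) -> float:
    """Expected number of detected peptides: the sum of detection probabilities."""
    return sum(peptide_detectabilities)
```

Two peptides of detectability 0.5 each give a protein detection probability of 0.75 under this model, and a single fully detectable peptide makes the protein certain to be detected.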

17.
A concept of unique peptides (CUP) was proposed and implemented to identify whole-cell proteins from tandem mass spectrometry (MS/MS) ion spectra. A unique peptide is defined as a peptide, irrespective of its length, that exists in only one protein of the proteome of interest, even if it appears more than once in that protein. Integrating CUP, a two-step whole-cell protein identification strategy was developed to further increase the confidence of identified proteins. A dataset containing 40,243 MS/MS ion spectra of Saccharomyces cerevisiae and the protein identification tools Mascot and SEQUEST were used to illustrate the proposed concept and strategy. Without CUP, SEQUEST identified 2.26-fold as many proteins as Mascot. When CUP was applied, SEQUEST identified 3.89-fold as many proteins bearing unique peptides as Mascot. Cross-comparing the two sets of identified proteins revealed only 89 common proteins derived from CUP. The key discrepancy between the identified proteins resulted from the filtering criteria employed by each protein identification tool. Based on the origin of peptides as classified by CUP and the commonality of proteins recognized by the protein identification tools, all identified proteins were cross-compared, yielding four groups of proteins with different levels of assigned confidence.
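The CUP definition translates directly into a lookup: a peptide is unique if the set of proteins containing it has exactly one member, with repeated occurrences inside the same protein collapsing into that single entry. A minimal sketch, assuming protein-to-peptide lists (for example from an in silico digest) are already available; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def unique_peptides(protein_peptides: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each CUP-unique peptide to the single protein that contains it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            # A set ignores repeats of the same peptide within one protein.
            owners[peptide].add(protein)
    return {pep: next(iter(prots)) for pep, prots in owners.items() if len(prots) == 1}


def proteins_with_unique_peptides(protein_peptides: Dict[str, List[str]]) -> Set[str]:
    """Proteins that carry at least one CUP-unique peptide."""
    return set(unique_peptides(protein_peptides).values())
```

A peptide listed twice for the same protein still counts as unique, whereas a peptide found in two different proteins does not.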

18.
Among differential proteomic methods based on stable isotopic labeling, isotope‐coded protein labeling (ICPL) is a recent non‐isobaric technique devised to label primary amines found in proteins. ICPL overcomes some of the disadvantages found in other chemical‐labeling techniques, such as iTRAQ or ICAT. However, previous analyses revealed that more than 30% of the proteins identified in regular ICPL generally remain unquantified. In this study, we describe a modified version of ICPL, named Post‐digest ICPL, that makes it possible to label and thus to quantify all the peptides in a sample (bottom-up approach). Optimization and validation of this Post‐digest ICPL approach were performed using a standard protein mixture and complex protein samples. Using this strategy, the number of proteins that were identified and quantified was greatly increased in comparison with regular ICPL and cICAT approaches. The pros and cons of this improvement are discussed. This complementary approach to traditional ICPL was applied to analyze changes in protein abundance in the model bacterium Cupriavidus metallidurans CH34 after cultivation under simulated microgravity. In this context, two different systems – a 2‐D clinorotation device and a 3‐D random positioning device – were used, and the results were compared and discussed.

19.
Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventional database search approach to protein identification has inherent difficulties in comprehensive detection of protein modifications. In recent years, increasing efforts have been devoted to developing unrestrictive approaches to modification identification, but these approaches are often slow. This paper presents a statistical algorithm named DeltAMT (Delta Accurate Mass and Time) for fast detection of abundant protein modifications from tandem mass spectra with high-accuracy precursor masses. The algorithm is based on the fact that the modified and unmodified versions of a peptide are usually present simultaneously in a sample and their spectra are correlated with each other in precursor masses and retention times. By representing each pair of spectra as a delta mass and time vector, bivariate Gaussian mixture models are used to detect modification-related spectral pairs. Unlike previous approaches to unrestrictive modification identification that mainly rely upon the fragment information and the mass dimension in liquid chromatography-tandem mass spectrometry, the proposed algorithm makes the most of precursor information. Thus, it is highly efficient while being accurate and sensitive. On two published data sets, the algorithm effectively detected various modifications and other interesting events, yielding deep insights into the data. Based on these discoveries, the spectral identification rates were significantly increased and many modified peptides were identified.

20.
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
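The shotgun analogy can be illustrated at the sequence level with greedy overlap assembly: repeatedly merge the two fragments with the longest suffix-prefix overlap. This is a simplification, since the study assembles information from MS/MS spectra of overlapping, possibly modified peptides rather than from clean sequence strings; the minimum overlap of 3 residues and the example fragments are assumptions.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge overlapping peptide sequences into longer contigs."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments fully contained in another fragment.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]
```

Fragments such as MKTAYIA, AYIAKQR and KQRQISF, which might come from digests with proteases of different specificities, chain together into a single contig, whereas fragments without a sufficient overlap are returned unchanged.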


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417