Similar literature
 20 similar documents found (search time: 31 ms)
1.
When analyzing proteins in complex samples using tandem mass spectrometry of peptides generated by proteolysis, the inference of proteins can be ambiguous, even with well-validated peptides. Unresolved questions include whether to show all possible proteins vs a minimal list, what to do when proteins are inferred ambiguously, and how to quantify peptides that bridge multiple proteins, each with distinguishing evidence. Here we describe IsoformResolver, a peptide-centric protein inference algorithm that clusters proteins in two ways, one based on peptides experimentally identified from MS/MS spectra, and the other based on peptides derived from an in silico digest of the protein database. MS/MS-derived protein groups report minimal list proteins in the context of all possible proteins, without redundantly listing peptides. In silico-derived protein groups pull together functionally related proteins, providing stable identifiers. The peptide-centric grouping strategy used by IsoformResolver allows proteins to be displayed together when they share peptides in common, providing a comprehensive yet concise way to organize protein profiles. It also summarizes information on spectral counts and is especially useful for comparing results from multiple LC-MS/MS experiments. Finally, we examine the relatedness of proteins within IsoformResolver groups and compare its performance to other protein inference software.

2.
Currently, the bottom-up approach is the most popular for characterizing protein samples by mass spectrometry. This is mainly attributed to the fact that the bottom-up approach has been successfully optimized for high-throughput studies. However, the bottom-up approach is associated with a number of challenges, such as loss of linkage information between peptides. Previous publications have addressed some of these problems, which are commonly referred to as protein inference. Nevertheless, all previous publications on the subject are oversimplified and do not represent the full complexity of the proteins identified. To this end, we present here SIR (spectra-based isoform resolver), which uses a novel, transparent, and systematic approach for organizing and presenting identified proteins based on peptide spectra assignments. The algorithm groups peptides and proteins into five evidence groups and calculates sixteen parameters for each identified protein that are useful for cases where deterministic protein inference is the goal. The novel approach has been incorporated into SIR, a user-friendly tool concerned only with protein inference based on imports of Mascot search results. In addition, SIR has two visualization tools that facilitate further exploration of the protein inference problem.

3.
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.

4.
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido.
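
The observation that a few highly connected subgraphs dominate inference cost can be illustrated by splitting a peptide-protein graph into its connected components and ranking them by size. The following is a minimal sketch, not the code from the paper: the function name `connected_components`, the input layout (a dict mapping each peptide to the proteins that contain it), and the example identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(peptide_to_proteins):
    """Split a peptide-protein bipartite graph into connected subgraphs.

    `peptide_to_proteins` is a hypothetical input layout: a dict that maps
    each peptide to an iterable of proteins containing it.
    """
    # Build an undirected adjacency map. Nodes are tagged tuples so that a
    # peptide and a protein with the same name cannot collide.
    adjacency = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        pep_node = ("peptide", peptide)
        adjacency[pep_node]  # register peptides even if they map to nothing
        for protein in proteins:
            prot_node = ("protein", protein)
            adjacency[pep_node].add(prot_node)
            adjacency[prot_node].add(pep_node)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)

    # Largest subgraphs first, since those are the expensive ones.
    return sorted(components, key=len, reverse=True)


# Made-up example: PEPA and PEPB share protein P2, PEPC stands alone.
example = {"PEPA": ["P1", "P2"], "PEPB": ["P2"], "PEPC": ["P3"]}
components = connected_components(example)
```

Each component can then be handed to an inference method independently, so the sizes of the first few components give a rough idea of where the computational effort will go.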

5.
While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few cleavable tryptic sites, often resulting in only one or two tryptic peptides that may not provide sufficient information for identification. Additionally, there are fewer than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin as well as trypsin were used to digest the proteins, and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database, and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that as database size increased above 30,000 sequences there was a decrease in the number of proteins identified. Also, the type of decoy database influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and distinguish proteins within each gluten protein group.

6.
M Blein-Nicolas  H Xu  D de Vienne  C Giraud  S Huet  M Zivy 《Proteomics》2012,12(18):2797-2801
Inferring protein abundances from peptide intensities is the key step in quantitative proteomics. The inference is necessarily more accurate when many peptides are taken into account for a given protein. Yet, the information brought by the peptides shared by different proteins is commonly discarded. We propose a statistical framework based on hierarchical modeling to include that information. Our methodology, based on a simultaneous analysis of all the quantified peptides, handles the biological and technical errors as well as the peptide effect. In addition, we propose a practical implementation suitable for analyzing large data sets. Compared to a method based on the analysis of one protein at a time (that does not include shared peptides), our methodology proved to be far more reliable for estimating protein abundances and testing abundance changes. The source code is available at http://pappso.inra.fr/bioinfo/all_p/.

7.
In complex systems with many degrees of freedom such as peptides and proteins, there exists a huge number of local-minimum-energy states. Conventional simulations in the canonical ensemble are of little use, because they tend to get trapped in these local energy minima. A simulation in a generalized ensemble performs a random walk in potential energy space and can overcome this difficulty. From only one simulation run, one can obtain canonical-ensemble averages of physical quantities as functions of temperature by the single-histogram and/or multiple-histogram reweighting techniques. In this article we review uses of the generalized-ensemble algorithms in biomolecular systems. Three well-known methods, namely, multicanonical algorithm, simulated tempering, and replica-exchange method, are described first. Both Monte Carlo and molecular dynamics versions of the algorithms are given. We then present three new generalized-ensemble algorithms that combine the merits of the above methods. The effectiveness of the methods for molecular simulations in the protein folding problem is tested with short peptide systems.
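
Of the three methods named above, the replica-exchange method has the most compact core step: two replicas at different temperatures occasionally try to swap configurations, and the swap is accepted with the standard Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]). The sketch below shows only that acceptance test, assuming inverse temperatures β = 1/(k_B T) and potential energies in matching units. The function names are illustrative and do not come from the review.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging two replicas.

    beta_i, beta_j: inverse temperatures 1/(k_B T) of the two replicas.
    energy_i, energy_j: current potential energies of the configurations
    held at beta_i and beta_j (same energy units as 1/beta).
    """
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted; returning
    # early also avoids overflow in math.exp for large exponents.
    if exponent >= 0:
        return 1.0
    return math.exp(exponent)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_acceptance(beta_i, beta_j, energy_i, energy_j)


# The cold replica (beta = 1.0) holds the higher-energy configuration,
# so handing it to the hot replica is always accepted.
p_downhill = swap_acceptance(1.0, 0.5, 2.0, 1.0)
# The reverse situation is accepted only with probability exp(-0.5).
p_uphill = swap_acceptance(1.0, 0.5, 1.0, 2.0)
```

In a full simulation this test would be applied periodically to neighbouring temperature pairs, between stretches of ordinary Monte Carlo or molecular dynamics sampling within each replica.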

8.
HLA-A2 is the most frequent HLA molecule in Caucasians with HLA-A*0201 representing the most frequent allele; it was also the first human HLA allele for which peptide binding prediction was developed. The Bioinformatics and Molecular Analysis Section of the National Institutes of Health (BIMAS) and the University of Tübingen (Syfpeithi) provide the most popular prediction algorithms of peptide/MHC interaction on the World Wide Web. To test these predictions, HLA-A*0201-binding nine-amino acid peptides were searched by both algorithms in 19 structural CMV proteins. According to Syfpeithi, the top 2% of predicted peptides should contain the naturally presented epitopes in 80% of predictions (www.syfpeithi.de). Because of the high number of predicted peptides, the analysis was limited to 10 randomly chosen proteins. The top 2% of peptides predicted by both algorithms were synthesized corresponding to 261 peptides in total. PBMC from 10 HLA-A*0201-positive and CMV-seropositive healthy blood donors were tested by ex vivo stimulation with all 261 peptides using crossover peptide pools. IFN-gamma production in T cells measured by CFC was used as readout. However, only one peptide was found to be stimulating in one single donor. As a result of this work, we report a potential new T cell target protein, one previously unknown CD8-T cell-stimulating peptide, and an extensive list of CMV-derived potentially strong HLA-A*0201-binding peptides that are not recognized by T cells of HLA-A*0201-positive CMV-seropositive donors. We conclude that MHC/peptide binding predictions are helpful for locating epitopes in known target proteins but not necessarily for screening epitopes in proteins not known to be T cell targets.

9.
Several approaches exist for the quantification of proteins in complex samples processed by liquid chromatography-mass spectrometry followed by fragmentation analysis (MS2). One of these approaches is label-free MS2-based quantification, which takes advantage of the information computed from MS2 spectrum observations to estimate the abundance of a protein in a sample. As a first step in this approach, fragmentation spectra are typically matched to the peptides that generated them by a search algorithm. Because different search algorithms identify overlapping but non-identical sets of peptides, here we investigate whether these differences in peptide identification have an impact on the quantification of the proteins in the sample. We therefore evaluated the effect of using different search algorithms by examining the reproducibility of protein quantification in technical repeat measurements of the same sample. From our results, it is clear that a search engine effect does exist for MS2-based label-free protein quantification methods. As a general conclusion, it is recommended to address the overall possibility of search engine-induced bias in the protein quantification results of label-free MS2-based methods by performing the analysis with two or more distinct search engines.

10.
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. AVAILABILITY: http://cytoprophet.cse.nd.edu.

11.
Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
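
The idea of deriving a minimal protein list from shared peptides can be sketched with a greedy set-cover heuristic. This is an illustrative approximation swapped in for clarity, not the algorithm implemented in IDPicker: the function name, the input layout (a dict mapping each protein to its identified peptides), and the tie-breaking rule are all assumptions, and a greedy cover is not guaranteed to be the smallest possible one.

```python
def greedy_minimal_protein_list(protein_to_peptides):
    """Greedy set-cover approximation of a parsimonious protein list.

    `protein_to_peptides` is a hypothetical input layout: a dict mapping
    each protein to the peptides identified for it. Returns a list of
    protein groups; proteins inside one group share exactly the same
    peptides and therefore cannot be told apart.
    """
    # Collapse proteins with identical peptide sets into one group.
    groups = {}
    for protein, peptides in protein_to_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)

    # Fix the candidate order by protein names so that ties are broken
    # deterministically (max() keeps the first best candidate).
    candidates = sorted(groups.items(), key=lambda item: sorted(item[1]))

    uncovered = set().union(*groups)
    chosen = []
    while uncovered:
        # Take the group that explains the most still-unexplained peptides.
        peptides, proteins = max(
            candidates, key=lambda item: len(item[0] & uncovered)
        )
        chosen.append(sorted(proteins))
        uncovered -= peptides
    return chosen


# Made-up example: P2 is subsumed by P1, while P3 and P4 are
# indistinguishable because they share exactly the same peptides.
example = {
    "P1": ["a", "b", "c"],
    "P2": ["b", "c"],
    "P3": ["c", "d"],
    "P4": ["c", "d"],
}
minimal = greedy_minimal_protein_list(example)
```

Here `minimal` contains two groups, `["P1"]` and `["P3", "P4"]`, which together explain all four peptides, whereas a naïve assembly would report four proteins.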

12.
Single-cell microinjection has been successfully used to deliver exogenous proteins, cDNA constructs, peptides, drugs and particles into transfection-challenged cells. With precisely controlled delivery dosage and timing, microinjection has been used in many studies of primary cultured cells, transgenic animal production, in vitro fertilization and RNA interference. This review discusses the advantages and limits of microinjection as a mechanical delivery method and its applications to attached and suspended cells.

13.
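Research progress in quality control methods for shotgun protein identification   Total citations: 1, self-citations: 0, citations by others: 1
The shotgun tandem mass spectrometry strategy for protein identification is widely used in proteomics research because of its high reliability and efficiency. In this approach, the protein mixture is digested directly, peptides serve as the unit of identification, and the proteins actually present in the sample are then inferred. Because inferring peptides from mass spectra carries a certain false-positive rate, and because direct digestion of the protein mixture loses the linkage information between peptides and proteins, some of the identified proteins are inevitably unreliable. Quality control of protein identification is therefore extremely important in proteomics research. It comprises two main classes of methods. The first is protein assembly from peptides: the most commonly used approach, and the one shown to be most effective, applies the parsimony principle, that is, explaining all identified peptides with the fewest proteins; existing methods can be divided into Boolean and probabilistic types. The second is reliability assessment of the identified proteins, which includes the confidence of individual protein identifications and calculation of the false-positive rate of protein identification as a whole. Integrating the various kinds of prior information that can assist protein identification into a generally applicable probabilistic statistical model is the current trend in quality control methods for protein identification.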

14.
Peptide detectability is defined as the probability that a peptide is identified in an LC-MS/MS experiment and has been useful in providing solutions to protein inference and label-free quantification. Previously, predictors for peptide detectability trained on standard or complex samples were proposed. Although the models trained on complex samples may benefit from the large training data sets, it is unclear to what extent they are affected by the unequal abundances of identified proteins. To address this challenge and improve detectability prediction, we present a new algorithm for the iterative learning of peptide detectability from complex mixtures. We provide evidence that the new method approximates detectability with useful accuracy and, based on its design, can be used to interpret the outcome of other learning strategies. We studied the properties of peptides from the bacterium Deinococcus radiodurans and found that at standard quantities, its tryptic peptides can be roughly classified as either detectable or undetectable, with a relatively small fraction having medium detectability. We extend the concept of detectability from peptides to proteins and apply the model to predict the behavior of a replicate LC-MS/MS experiment from a single analysis. Finally, our study summarizes a theoretical framework for peptide/protein identification and label-free quantification.

15.
Cell surface display of proteins/peptides has been established based on mechanisms of localizing proteins to the cell surface. In contrast to conventional intracellular and extracellular (secretion) expression systems, this method, generally called an arming technology, is particularly effective when using yeasts as a host, because the control of protein folding that is often required for the preparation of proteins can be natural. This technology can be employed for basic and applied research purposes. In this review, I describe various strategies for the construction of engineered yeasts and provide an outline of the diverse applications of this technology to industrial processes such as the production of biofuels and chemicals, as well as bioremediation and health-related processes. Furthermore, this technology is suitable for novel protein engineering and directed evolution through high-throughput screening, because proteins/peptides displayed on the cell surface can be directly analyzed using intact cells without concentration and purification. Functional proteins/peptides with improved or novel functions can be created using this beneficial, powerful, and promising technique.

16.
Seed storage proteins are a major component of mature seeds. They are utilized as protein sources in foods. We designed seed storage proteins containing bioactive peptides based on their three-dimensional structures. Furthermore, to create crops with enhanced food qualities, we developed transgenic crops producing seed storage proteins with bioactive peptides. This strategy promises to prevent lifestyle-related diseases by simple daily food consumption. In this review, we discuss a strategy to develop transgenic crops to improve human health by advanced utilization of seed storage proteins.

17.
Generation and propagation of radical reactions on proteins   Total citations: 7, self-citations: 0, citations by others: 7
The oxidation of proteins by free radicals is thought to play a major role in many oxidative processes within cells and is implicated in a number of human diseases as well as ageing. This review summarises information on the formation of radicals on peptides and proteins and how radical damage may be propagated and transferred within protein structures. The emphasis of this article is primarily on the deleterious actions of radicals generated on proteins, and their mechanisms of action, rather than on enzymatic systems where radicals are deliberately formed as transient intermediates. The final section of this review examines the control of protein oxidation and how such damage might be limited by antioxidants.

18.
Seed storage proteins are a major component of mature seeds. They are utilized as protein sources in foods. We designed seed storage proteins containing bioactive peptides based on their three-dimensional structures. Furthermore, to create crops with enhanced food qualities, we developed transgenic crops producing seed storage proteins with bioactive peptides. This strategy promises to prevent lifestyle-related diseases by simple daily food consumption. In this review, we discuss a strategy to develop transgenic crops to improve human health by advanced utilization of seed storage proteins.

19.
Convergent evolution with combinatorial peptides   Total citations: 1, self-citations: 0, citations by others: 1
Once the sequence of a genome is in hand, understanding the function of its encoded proteins becomes a task of paramount importance. Much like the biochemists who first outlined different biochemical pathways, many genomic scientists are engaged in determining which proteins interact with which proteins, thereby establishing a protein interaction network. While these interactions have evolved in regard to their specificity, affinity and cellular function over billions of years, it is possible in the laboratory to isolate peptides from combinatorial libraries that bind to the same proteins with similar specificity, affinity and primary structures, which resemble those of the natural interacting proteins. We have termed this phenomenon 'convergent evolution'. In this review, we highlight various examples of convergent evolution that have been uncovered in experiments dissecting protein-protein interactions with combinatorial peptides. Thus, a fruitful approach for mapping protein-protein interactions is to isolate peptide ligands to a target protein and identify candidate interacting proteins in a sequenced genome by computer analysis.

20.
Choi H 《Proteomics》2012,12(10):1663-1668
Protein complex identification is an important goal of protein-protein interaction analysis. To date, development of computational methods for detecting protein complexes has been largely motivated by genome-scale interaction data sets from high-throughput assays such as yeast two-hybrid or tandem affinity purification coupled with mass spectrometry (TAP-MS). However, due to the popularity of small to intermediate-scale affinity purification-mass spectrometry (AP-MS) experiments, protein complex detection is increasingly discussed in local network analysis. In such data sets, protein complexes cannot be detected using binary interaction data alone because the data contain interactions with tagged proteins only and, as a result, interactions between all other proteins remain unobserved, limiting the scope of existing algorithms. In this article, we provide a pragmatic review of network graph-based computational algorithms for protein complex analysis in global interactome data, without requiring any computational background. We discuss the practical gap in applying these algorithms to recently surging small to intermediate-scale AP-MS data sets, and review alternative clustering algorithms using quantitative proteomics data and their limitations.
