Similar articles
20 similar articles found (search time: 281 ms)
1.
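Amino acid mutations can alter the structure and function of proteins and thereby affect the life processes of an organism. Shotgun proteomics based on tandem mass spectrometry is currently the main approach for large-scale proteome studies, but existing pipelines for identifying mass spectrometry data often deliberately reduce the amino acid mutation information in the database in order to improve the sensitivity of identification. How to mine amino acid mutation information from the data has therefore become an important part of mass spectrometry data identification. The tandem mass spectrometry methods currently applied to amino acid mutation identification fall into roughly three categories: methods based on sequence database searching, algorithms based on sequence tag searching, and algorithms based on spectral library searching. This article first describes these three types of algorithm in detail and analyzes the characteristics and shortcomings of each, and then reviews the current state of research and future directions in amino acid mutation identification. As tandem mass spectrometry-based proteomics continues to develop, amino acid mutation information in protein sequences will be resolved more effectively, enabling in-depth study of the structural and functional changes caused by amino acid mutations and laying a foundation for revealing their biological significance.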

2.
Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.

3.
Delahunty CM  Yates JR 《BioTechniques》2007,43(5):563, 565, 567 passim
Large-scale biology emerged out of the efforts to sequence genomes of important organisms. Based on resources created by whole genome sequencing, large-scale analyses of messenger RNA (mRNA) and protein expression are now possible. With the availability of large amounts of genomic sequence information, a convenient method for the identification and analysis of proteins based on proteolytic digestion into peptides emerged. Processes to fragment peptides using collision-activated dissociation (CAD) in tandem mass spectrometers and computer algorithms to match the tandem mass spectra of peptides to sequences in databases enable rapid identification of amino acid sequences, and hence proteins, present in mixtures. The inherent complexity of the peptide mixtures has necessitated improvements in methodology for mass spectrometry (MS) analysis of peptides.  相似文献   

4.
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
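The in silico tryptic digestion with missed cleavages that this abstract refers to can be illustrated with a minimal sketch. The function name `tryptic_digest` and the cleavage rule (cut after K or R unless the next residue is P) are assumptions made for illustration; they are not taken from the authors' program, and the predictive masking step is not reproduced here.

```python
import re


def tryptic_digest(protein: str, max_missed: int = 1) -> list[str]:
    """Return tryptic peptides, allowing up to max_missed missed cleavages."""
    # Assumed rule: cleave C-terminal to K or R, except when followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end > len(fragments):
                break
            # Joining adjacent fragments models uncut (missed) cleavage sites.
            peptides.append("".join(fragments[start:end]))
    return peptides
```

With `max_missed=0` only fully cleaved peptides are produced. Raising the value enlarges the search space, which is the cost the masking approach described above aims to reduce.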

5.
DBToolkit: processing protein databases for peptide-centric proteomics   Total citations: 2, self-citations: 0, citations by others: 2
SUMMARY: DBToolkit is a user-friendly, easily extensible tool that allows the processing of protein sequence databases to peptide-centric sequence databases. This processing is primarily aimed at enhancing the useful information content of these databases for use as optimized search spaces for efficient identification of peptide fragmentation spectra obtained by mass spectrometry. In addition, DBToolkit can be used to reliably solve a range of other typical tasks in processing sequence databases. AVAILABILITY: DBToolkit is open source under the GNU GPL license. The source code, full user and developer documentation and cross-platform binaries are freely downloadable from the project website at http://genesis.UGent.be/dbtoolkit/ CONTACT: lennart.martens@UGent.be  相似文献   

6.
Exploring the proteome of Plasmodium   Total citations: 2, self-citations: 0, citations by others: 2
With the entire genomic sequence of several species of Plasmodium soon to be available, researchers are now focusing on methods to study gene and protein expression at the whole organism level. Traditional methods of characterising and identifying large numbers of proteins from a complex protein mixture have relied predominantly on two-dimensional gel electrophoresis combined with N-terminal sequencing or mass spectrometry of individually prepared proteins. New proteomics methods are now available that are based on resolving small peptides derived from complex protein mixtures by high-resolution liquid chromatography and directly identifying them by tandem mass spectrometry (LC/LC/MS/MS) and sophisticated computer search algorithms against whole genome sequence databases. These newer proteomic methods have the potential to accelerate the reproducible identification of large numbers of proteins from various life cycle stages of Plasmodium and may help to better understand parasite biology and lead to the identification of new targets of vaccines and drugs.  相似文献   

7.
Identification of proteins from the mass spectra of peptide fragments generated by proteolytic cleavage using database searching has become one of the most powerful techniques in proteome science, capable of rapid and efficient protein identification. Using computer simulation, we have studied how the application of chemical derivatisation techniques may improve the efficiency of protein identification from mass spectrometric data. These approaches enhance ion yield and lead to the promotion of specific ions and fragments, yielding additional database search information. The impact of three alternative techniques has been assessed by searching representative proteome databases for both single proteins and simple protein mixtures. For example, by reliably promoting fragmentation of singly-charged peptide ions at aspartic acid residues after homoarginine derivatisation, 82% of yeast proteins can be unambiguously identified from a single typical peptide-mass datum, with a measured mass accuracy of 50 ppm, by using the associated secondary ion data. The extra search information also provides a means to confidently identify proteins in protein mixtures where only limited data are available. Furthermore, the inclusion of limited sequence information for the peptides can compensate and exceed the search efficiency available via high accuracy searches of around 5 ppm, suggesting that this is a potentially useful approach for simple protein mixtures routinely obtained from two-dimensional gels.  相似文献   

8.
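Identification of 2D gel protein spots by MALDI-TOF mass spectrometry with post-source decay   Total citations: 1, self-citations: 0, citations by others: 1
Because of its high sensitivity, high throughput, and ease of automation, the PMF method occupies an important place in proteomic identification. However, many samples (for example, small proteins and mixtures) cannot be identified unambiguously by PMF alone. In such cases, a proteolytic fragment peak is selected from the same sample used for the PMF measurement and sequenced by PSD, and the resulting sequence information is submitted to the MS-Tag software for searching. Combined with the PMF results and parameters such as apparent molecular weight and isoelectric point, this allows spots on the gel to be identified unambiguously. In this work, the PSD method was first used to identify three standard proteins on a gel, all with very accurate results, and several unknown spots on the gel were also identified.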

9.
UniProt archive     
UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available. Its protein sequences are retrieved from predominant, publicly accessible resources. All new and updated protein sequences are collected and loaded daily into UniParc for full coverage. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later in UniParc to identify the same protein in all source databases. When proteins are loaded into the database, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.  相似文献   

10.
Babnigg G  Giometti CS 《Proteomics》2006,6(16):4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.  相似文献   
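A sequence-derived identifier of this kind can be sketched as follows. The abstract does not spell out the construction, so the scheme used here (SHA-1 of the upper-cased sequence, Base64-encoded, with the trailing padding removed) is an assumption based on how SEGUID is commonly described, and the function name is illustrative.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Compute a checksum-style identifier from a primary protein sequence."""
    # Assumed scheme: SHA-1 digest of the upper-cased sequence,
    # Base64-encoded, with '=' padding stripped.
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the identifier depends only on the residues, the same sequence yields the same key in every database, regardless of database-specific accession numbers or annotation changes.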

11.
Proteomic identifications hinge on the measurement of both parent and fragment masses and matching these to amino acid sequences via database search engines. The correctness of the identifications is assessed by statistical means. Here we present an experimental approach to test identifications. Chemical modification of all peptides in a sample leads to shifts in masses depending on the chemical properties of each peptide. The identification of a native peptide sequence and its perturbed version with a different parent mass and fragment ion masses provides valuable information. Labeling all peptides using reductive alkylation with formaldehyde is one such perturbation where the ensemble of peptides shifts mass depending on the number of reactive amine groups. Matching covalently perturbed fragmentation patterns from the same underlying peptide sequence increases confidence in the assignments and can salvage low scoring post‐translationally modified peptides. Applying this strategy to bovine alpha‐crystallin, we identify 9 lysine acetylation sites, 4 O‐GlcNAc sites and 13 phosphorylation sites.  相似文献   
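The dependence of the mass shift on the number of reactive amines can be shown with a small calculation. This is a sketch under stated assumptions: complete labeling, a free non-proline N-terminus, a light dimethyl label adding C2H4 (about 28.0313 Da) per primary amine, and lysine as the only amine-bearing side chain. The constant and function name are illustrative and are not taken from the paper.

```python
# Assumed monoisotopic shift per dimethylated primary amine (adds C2H4), in Da.
DIMETHYL_SHIFT = 28.0313


def dimethyl_mass_shift(peptide: str) -> float:
    """Expected parent-mass shift after reductive dimethylation of a peptide."""
    # One reactive amine at the N-terminus plus one per lysine side chain.
    amines = 1 + peptide.upper().count("K")
    return amines * DIMETHYL_SHIFT
```

Under these assumptions a peptide with one lysine shifts by about 56.06 Da and a peptide with none by about 28.03 Da, so the native and labeled masses together constrain the lysine count of the candidate sequence.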

12.
We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identifications. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions can be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a computational approach to narrow charge states of precursor ions from nominal resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and fragment ions contained in the tandem mass spectrum. For each possible charge state the program calculates the expected fragment ions that account for precursor ion m/z values. If any one of the numbers is less than an empirically determined threshold value then the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value then +2 and +3 copies of the spectrum are kept. We present the comparison of results from protein identification experiments with and without using 2to3. It is shown that by determining the charge state and eliminating poor quality spectra 2to3 decreases the number of spectral files to be searched without affecting the search results. The decrease reduces computer requirements and researcher efforts for analysis of the results.
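The charge-state decision rule described for 2to3 can be sketched as below. The abstract does not say exactly which fragment ions are counted, so this sketch assumes that the count is the number of complementary singly charged fragment pairs whose m/z values sum to the neutral peptide mass plus two protons. The function names, tolerance, and threshold are hypothetical and are not taken from the 2to3 code.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_pairs(peaks, precursor_mz, charge, tol=0.5):
    """Count fragment pairs consistent with the given precursor charge."""
    # Neutral mass M = z * (m/z) - z * proton; complementary singly charged
    # fragments (e.g. b and y ions) sum to M + 2 * proton.
    target = precursor_mz * charge - charge * PROTON + 2 * PROTON
    peaks = sorted(peaks)
    count, lo, hi = 0, 0, len(peaks) - 1
    while lo < hi:
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return count


def plausible_charges(peaks, precursor_mz, threshold=2, tol=0.5):
    """Keep only the +2/+3 charge states supported by enough fragment pairs."""
    return [z for z in (2, 3)
            if complementary_pairs(peaks, precursor_mz, z, tol) >= threshold]
```

An empty result corresponds to a spectrum discarded as poor quality, and a result of `[2, 3]` corresponds to keeping both copies, as in the rule above.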

13.
De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.  相似文献   

14.
Peptide mass fingerprinting (PMF) is a valuable method for rapid and high-throughput protein identification using the proteomics approach. Automated search engines, such as MS-Fit, Mascot, ProFound, and PeptIdent, have facilitated protein identification through PMF. The potential to obtain a true MS protein identification result depends on the choice of algorithm as well as experimental factors that influence the information content in MS data. When mass spectral data are incomplete and/or have low mass accuracy, the "number of matches" approach may be inadequate for a useful identification. Several studies have evaluated factors influencing the quality of mass spectrometry (MS) experiments. Missed cleavages, posttranslational modifications of peptides and contaminants (e.g., keratin) are important factors that can affect the results of MS analyses by influencing the identification process as well as the quality of the MS spectra. We compared search engines frequently used to identify proteins from Homo sapiens and Halobacterium salinarum by evaluating factors, including data-based and mass tolerance to develop an improved search engine for PMF. This study may provide information to help develop a more effective algorithm for protein identification in each species through PMF.

15.
In mass spectrometry‐based proteomics, most conventional search engines match spectral data to sequence databases. These search databases thus play a crucial role in the identification process. While search engines can derive peptides in silico from protein sequences, this is usually limited to standard digestion algorithms. Customized search databases that provide detailed control over the search space can vastly outperform such standard strategies, especially in gel‐free proteomics experiments. Here we present Database on Demand, an easy‐to‐use web tool that can quickly produce a wide variety of customized search databases.  相似文献   

16.
Informatics for protein identification by mass spectrometry   Total citations: 3, self-citations: 0, citations by others: 3
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. All of the search programs will produce a top ranked answer; however, only the credulous are willing to accept them carte blanche.  相似文献   

17.
Pei J  Grishin NV 《Proteins》2004,56(4):782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.

18.
Secondary Ion Mass Spectrometry (SIMS) is a well-established method for sensitive surface atomic and molecular analysis. Protein analysis with conventional SIMS has been attempted numerous times; however, it delivers exclusively fragment peaks assigned to α-amino acids or immonium ions. In this paper we report experiments where direct sequence information could be measured thanks to a combination of HPLC separation with matrix enhanced SIMS (ME-SIMS) on tryptic digests of intact proteins. We employ peptide mass fingerprinting (PMF) and protein identification through the detection of HPLC-separated digests of Savinase (Sav.) and bovine serum albumin (BSA), followed by MASCOT search. This is the first time that the possibility of full protein identification using LC-ME-SIMS is demonstrated in a classic proteomics workflow and that a 69 kDa protein is identified with SIMS. These results demonstrate both the relevance and the potential of LC-ME-SIMS in future high resolution proteomics studies.

19.
Post‐translational modifications (PTMs) of proteins are central in any kind of cellular signaling. Modern mass spectrometry technologies enable comprehensive identification and quantification of various PTMs. Given the increased numbers and types of mapped protein modifications, a database is necessary that simultaneously integrates and compares site‐specific information for different PTMs, especially in plants for which the available PTM data are poorly catalogued. Here, we present the Plant PTM Viewer (http://www.psb.ugent.be/PlantPTMViewer), an integrative PTM resource that comprises approximately 370 000 PTM sites for 19 types of protein modifications in plant proteins from five different species. The Plant PTM Viewer provides the user with a protein sequence overview in which the experimentally evidenced PTMs are highlighted together with an estimate of the confidence by which the modified peptides and, if possible, the actual modification sites were identified and with functional protein domains or active site residues. The PTM sequence search tool can query PTM combinations in specific protein sequences, whereas the PTM BLAST tool searches for modified protein sequences to detect conserved PTMs in homologous sequences. Taken together, these tools help to assume the role and potential interplay of PTMs in specific proteins or within a broader systems biology context. The Plant PTM Viewer is an open repository that allows the submission of mass spectrometry‐based PTM data to remain at pace with future PTM plant studies.  相似文献   

20.