Similar literature
1.
Due to the limited applicability of conventional protein identification methods to the proteomes of organisms with unsequenced genomes, researchers have developed approaches to identify proteins using mass spectrometry and sequence similarity database searches. Both the integration of mass spectrometry with bioinformatics and genomic sequencing drive the expanding organismal scope of proteomics.

2.
Identification of proteins from the mass spectra of peptide fragments generated by proteolytic cleavage using database searching has become one of the most powerful techniques in proteome science, capable of rapid and efficient protein identification. Using computer simulation, we have studied how the application of chemical derivatisation techniques may improve the efficiency of protein identification from mass spectrometric data. These approaches enhance ion yield and lead to the promotion of specific ions and fragments, yielding additional database search information. The impact of three alternative techniques has been assessed by searching representative proteome databases for both single proteins and simple protein mixtures. For example, by reliably promoting fragmentation of singly-charged peptide ions at aspartic acid residues after homoarginine derivatisation, 82% of yeast proteins can be unambiguously identified from a single typical peptide-mass datum, with a measured mass accuracy of 50 ppm, by using the associated secondary ion data. The extra search information also provides a means to confidently identify proteins in protein mixtures where only limited data are available. Furthermore, the inclusion of limited sequence information for the peptides can compensate for, and even exceed, the search efficiency available via high accuracy searches of around 5 ppm, suggesting that this is a potentially useful approach for simple protein mixtures routinely obtained from two-dimensional gels.
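The 50 ppm and 5 ppm mass accuracies quoted in this abstract correspond to absolute search windows that scale with the mass being matched. The following is a minimal Python sketch of that arithmetic, not the authors' simulation; the function names and the example value of m/z 1500 are assumptions made for illustration.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed peak, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed peak lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def mass_window(theoretical_mz: float, tol_ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds accepted at a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


# Hypothetical peptide at m/z 1500: the window is about +/- 0.075 at 50 ppm
# and about +/- 0.0075 at 5 ppm.
low_50, high_50 = mass_window(1500.0, 50)
low_5, high_5 = mass_window(1500.0, 5)
```

A peak observed at m/z 1500.05 (about 33 ppm away) would be accepted at 50 ppm and rejected at 5 ppm, which is why a tighter tolerance leaves fewer candidate peptides per search.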

3.
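Amino acid mutations can alter protein structure and function and thereby affect the life processes of an organism. Shotgun proteomics based on tandem mass spectrometry is currently the main method for large-scale proteome studies, but existing pipelines for identifying mass spectrometry data often deliberately reduce the amino acid mutation information in the database in order to improve the sensitivity of identification. How to mine amino acid mutation information from the data has therefore become an important part of current mass spectrometry data identification. Tandem mass spectrometry identification methods currently applied to amino acid mutations fall into three broad classes: methods based on sequence database searching, algorithms based on sequence tag searching, and algorithms based on spectral library searching. This article first describes these three classes of amino acid mutation identification algorithms in detail and analyzes the characteristics and shortcomings of each, and then reviews the current state of research and the direction of development of amino acid mutation identification. As tandem mass spectrometry-based proteomics continues to develop, amino acid mutation information in protein sequences will be resolved more fully, making it possible to explore in depth the changes in protein structure and function caused by amino acid mutations and laying a foundation for revealing their biological significance.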

4.
Lack of genomic sequence data and the relatively high cost of tandem mass spectrometry have hampered proteomic investigations into helminths, such as resolving the mechanism underpinning globally reported anthelmintic resistance. Whilst detailed mechanisms of resistance remain unknown for the majority of drug-parasite interactions, gene mutations and changes in gene and protein expression are proposed key aspects of resistance. Comparative proteomic analysis of drug-resistant and -susceptible nematodes may reveal protein profiles reflecting drug-related phenotypes. Using the gastrointestinal nematode Haemonchus contortus as a case study, we report the application of freely available expressed sequence tag (EST) datasets to support proteomic studies in unsequenced nematodes. EST datasets were translated to theoretical protein sequences to generate a searchable database. In conjunction with matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF-MS), Peptide Mass Fingerprint (PMF) searching of databases enabled a cost-effective protein identification strategy. The effectiveness of this approach was verified in comparison with MS/MS de novo sequencing with searching of the same EST protein database and subsequent searches of the NCBInr protein database using the Basic Local Alignment Search Tool (BLAST) to provide protein annotation. Of 100 proteins from 2-DE gel spots, 62 were identified by MALDI-TOF-MS and PMF searching of the EST database. Twenty randomly selected spots were analysed by electrospray MS/MS and MASCOT Ion Searches of the same database. The resulting sequences were subjected to BLAST searches of the NCBI protein database to provide annotation of the proteins and confirm concordance in protein identity from both approaches. Further confirmation of protein identifications from the MS/MS data was obtained by de novo sequencing of peptides, followed by FASTS algorithm searches of the EST putative protein database. This study demonstrates the cost-effective use of available EST databases and inexpensive, accessible MALDI-TOF MS in conjunction with PMF for reliable protein identification in unsequenced organisms.
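The step in which EST datasets are translated to theoretical protein sequences can be sketched as a six-frame translation. The abstract does not say how the translation was done, so the six-frame strategy, the use of the standard genetic code, and all function names below are assumptions for illustration only.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an ambiguous codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(est: str) -> dict[str, str]:
    """Translate an EST in three forward and three reverse frames."""
    est = est.upper()
    frames = {}
    for strand, seq in (("+", est), ("-", reverse_complement(est))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")  # toy sequence, not real EST data
```

In a searchable database built this way, each frame would typically be split at stop codons and short fragments discarded before peptide mass fingerprint searching; those filtering choices are left out of this sketch.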

5.
Strategic proteome analysis of Candida magnoliae with an unsequenced genome (total citations: 2; self-citations: 0; citations by others: 2)
Kim HJ, Lee DY, Lee DH, Park YC, Kweon DH, Ryu YW, Seo JH. Proteomics 2004, 4(11): 3588-3599
Erythritol is a noncariogenic, low calorie sweetener. It is safe for people with diabetes and obese people. Candida magnoliae is an industrially important organism because of its ability to produce erythritol as a major product. The genome of C. magnoliae has not been sequenced yet, limiting the available proteome database. Therefore, systematic approaches were employed to construct the proteome map of C. magnoliae. Proteomic analysis with systematic approaches is based on two-dimensional electrophoresis, matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS), tandem mass spectrometry (MS/MS) and database interrogation. First, 24 spots were analyzed using peptide mass fingerprinting along with MALDI-TOF MS with high mass accuracy. Only four spots were reliably identified as carbonyl reductase and its isoforms. The reason for low sequence coverage seemed to be that these identification strategies were based on the presence of the protein database obtained from the publicly accessible genome database and the availability of cross-species protein identification. MS/MS (MS/MS ion search and de novo sequencing) in combination with similarity searches allowed successful identification of 39 spots. Several proteins including transaldolase identified by MS/MS ion searches were further confirmed by partial sequences from the expressed sequence tag database. In this study, 51 protein spots were analyzed and then potentially identified. The identified proteins were involved in glycolysis, stress response, other essential metabolisms and cell structures.

6.
Analysing proteomic data (total citations: 5; self-citations: 0; citations by others: 5)
The rapid growth of proteomics has been made possible by the development of reproducible 2D gels and biological mass spectrometry. However, despite technical improvements, 2D gels are still less than perfectly reproducible and gels have to be aligned so that spots for identical proteins appear in the same place. Gels can be warped by a variety of techniques to make them concordant. When gels are manipulated to improve registration, information is lost, so direct methods for gel registration which make use of all available data for spot matching are preferable to indirect ones. In order to identify proteins from gel spots, a property or combination of properties that is unique to that protein is required. These can then be used to search databases for possible matches. Molecular mass, pI, amino acid composition and short sequence tags can all be used in database searches. Currently the method of choice for protein identification is mass spectrometry. Proteins are eluted from the gels and cleaved with specific endoproteases to produce a series of peptides of different molecular mass. In peptide mass fingerprinting, the peptide profile of the unknown protein is compared with theoretical peptide libraries generated from sequences in the different databases. Tandem mass spectrometry (MS/MS) generates short amino acid sequence tags for the individual peptides. These partial sequences combined with the original peptide masses are then used for database searching, greatly improving specificity. Increasingly, protein identification from MS/MS data is being fully or partially automated. When working with organisms that do not have sequenced genomes (the case with most helminths), protein identification by database searching becomes problematic. A number of approaches to cross-species protein identification have been suggested, but if the organism being studied is only distantly related to any organism with a sequenced genome then the likelihood of protein identification remains small. The dynamic nature of the proteome means that there really is no such thing as a single representative proteome, and a complete set of metadata (data about the data) is going to be required if the full potential of database mining is to be realised in the future.

7.
Proteomic approaches to biological research that will prove the most useful and productive require robust, sensitive, and reproducible technologies for both the qualitative and quantitative analysis of complex protein mixtures. Here we applied the isotope-coded affinity tag (ICAT) approach to quantitative protein profiling, in this case proteins that copurified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. With the ICAT approach, cysteine residues of the two related protein isolates were covalently labeled with isotopically normal and heavy versions of the same reagent, respectively. Following proteolytic cleavage of combined labeled proteins, peptides were fractionated by multidimensional chromatography and subsequently analyzed via automated tandem mass spectrometry. Individual tandem mass spectrometry spectra were searched against a human sequence database, and a variety of recently developed, publicly available software applications were used to sort, filter, analyze, and compare the results of two repetitions of the same experiment. In particular, robust statistical modeling algorithms were used to assign measures of confidence to both peptide sequences and the proteins from which they were likely derived, identified via the database searches. We show that by applying such statistical tools to the identification of T cell lipid raft-associated proteins, we were able to estimate the accuracy of peptide and protein identifications made. These tools also allow for determination of the false positive rate as a function of user-defined data filtering parameters, thus giving the user significant control over and information about the final output of large-scale proteomic experiments. With the ability to assign probabilities to all identifications, the need for manual verification of results is substantially reduced, thus making the rapid evaluation of large proteomic datasets possible. 
Finally, by repeating the experiment, information relating to the general reproducibility and validity of this approach to large-scale proteomic analyses was also obtained.

8.
Proteomics: a technology-driven and technology-limited discovery science (total citations: 9; self-citations: 0; citations by others: 9)
An emerging field for the analysis of biological systems is the study of the complete protein complement of the genome, the 'proteome'. There are several complementary tools available for proteome analysis including 2D protein electrophoresis and mass spectrometry. Emerging technologies for proteome analysis include spotted-array-based methods and microfluidic devices. Taken together, these technologies provide a wealth of information that is useful in discovery-based science. However, there are some key limitations of these approaches and new technology is required to be able to fully integrate proteomic information with information obtained about DNA sequence, mRNA profiles and metabolite concentrations into effective models of biological systems.

9.
Protein identification using 2D-LC-MS/MS (total citations: 3; self-citations: 0; citations by others: 3)
Multidimensional liquid chromatography techniques have been coupled to tandem mass spectrometry to provide a robust method to identify proteins in complex mixtures. Data acquisition is interfaced directly with search algorithms for identification through cross-correlation with databases. This review describes the most recent advances in methodologies for protein identification by mass spectrometry and describes the limitations of the application of the technologies.

10.

Background  

Many algorithms have been developed for deciphering tandem mass spectrometry (MS) data sets. They can be essentially clustered into two classes. The first performs searches on a theoretical mass spectrum database, while the second is based on de novo sequencing from raw mass spectrometry data. It was noted that the quality of mass spectra significantly affects the protein identification process in both instances. This prompted the authors to explore ways to measure the quality of MS data sets before subjecting them to the protein identification algorithms, thus allowing for more meaningful searches and an increased confidence level for the proteins identified.

11.
Distance constraints in proteins and protein complexes provide invaluable information for calculation of 3D structures, identification of protein binding partners and localization of protein-protein contact sites. We have developed an integrative approach to identify and characterize such sites through the analysis of proteolytic products derived from proteins chemically cross-linked by isotopically coded cross-linkers using LC-MALDI tandem mass spectrometry and computer software. This method is specifically tailored toward the rapid analysis of low microgram amounts of proteins or multimeric protein complexes cross-linked with nonlabeled and deuterium-labeled bis-NHS ester cross-linking reagents (both commercially available and readily synthesized). Through labeling with [18O]water solvent and LC-MALDI analysis, the method further allows the possible distinction between Type 0 and Type 1 or Type 2 modified peptides (monolinks and looplinks or cross-links), although such a distinction is more readily made from analysis of tandem mass spectrometry data. When applied to the bacterial Colicin E7 DNAse/Im7 heterodimeric protein complex, 23 cross-links were identified including six intersubunit cross-links, all between residues that are close in space when examined in the context of the X-ray structure of the heterodimer. In addition, cross-links were successfully identified in five single subunit proteins, beta-lactoglobulin, cytochrome c, lysozyme, myoglobin, and ribonuclease A, establishing the generality of the approach.

12.
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
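The "missed cleavages" search parameter described in this abstract can be illustrated with a small in silico tryptic digest. This is a baseline enumeration only, not the authors' information-theoretic predictor or their masking program; the cleavage rule used (cut after K or R unless the next residue is P) is a common simplification, and the function name and toy sequence are assumptions.

```python
import re


def tryptic_digest(protein: str, max_missed: int = 1) -> list[tuple[str, int]]:
    """Enumerate tryptic peptides with up to max_missed missed cleavages.

    Returns (peptide, number_of_missed_cleavages) pairs in sequence order.
    """
    # Cleavage positions: directly after K or R, unless followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        # Joining j - i adjacent fragments implies j - i - 1 missed cleavages.
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append((protein[bounds[i]:bounds[j]], j - i - 1))
    return peptides


# Toy sequence (hypothetical): the R in "RP" is not a cleavage site.
fully_cleaved = tryptic_digest("AKRPGKC", max_missed=0)
one_missed = tryptic_digest("AKRPGKC", max_missed=1)
```

With `max_missed=1` the toy sequence yields AK, AKRPGK, RPGK, RPGKC and C. Each extra allowed missed cleavage enlarges the candidate peptide list, which is the search-space cost that predicting missed cleavage sites aims to reduce.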

13.
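Metaproteomics is an emerging discipline that uses mass spectrometry to collect, on a large scale, protein information from microbial communities in nature and combines it with data from other omics to study the genetic characteristics and biological functions of those communities. Data analysis in metaproteomics differs considerably from traditional proteomics methods, and new analytical approaches are urgently needed. Because metaproteomics studies microbial samples of extremely high complexity, a species database must be built that covers, as far as possible, the genomic information of the microorganisms in the sample. With such a large database, the computational resources consumed during analysis and the quality-control standards for identification results must be taken into account, so parameters such as database size, database searching and false-positive control need to be highly optimized. Because complex homologous protein sequences are widespread in metaproteomic data, the taxonomic information in the NCBI database must be fully used for matching, and filtering with the LCA algorithm is needed before proteins can be effectively assigned to species. Focusing on metaproteomic data analysis, this article reviews the current state of the field, the challenges it faces and future research directions from several aspects, including database construction, protein grouping and the discovery of biological significance.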
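The LCA filtering step mentioned in this abstract can be sketched as follows, reading LCA as "lowest common ancestor", which is the usual meaning in taxonomic assignment but is not spelled out in the abstract. The child-to-parent dictionary is a toy, simplified taxonomy invented for illustration (it is not NCBI data and skips several ranks), and the function names are assumptions.

```python
def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Path from the root down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa: list[str], parent: dict[str, str]):
    """Deepest node shared by the lineages of all taxa (None if no taxa)."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# Toy taxonomy (hypothetical, heavily simplified).
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "Bacteria",
}

# A peptide matching proteins from two enteric species is assigned to
# their shared family rather than to either species.
family = lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], PARENT)
```

A peptide shared by homologous proteins from distant organisms would be pushed up to a high rank such as "Bacteria" in this toy tree, so only peptides unique to a lineage contribute species-level evidence.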

14.
Mass spectrometry is a technique widely employed for the identification and characterization of proteins. The role of bioinformatics is fundamental for the elaboration of mass spectrometry data due to the amount of data that this technique can produce. To process data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of high-throughput and statistical accuracy. However, many limitations exist concerning bioinformatics spectral data elaboration. This review aims to critically cover the recent and future developments of new bioinformatics approaches in mass spectrometry data analysis for proteomics studies.

16.
It is now possible to obtain sequence information from gel-separated proteins by mass spectrometry at levels too low for conventional approaches. Usually these tandem mass spectrometric data are used for database searches with the aim of identifying the corresponding gene. Recently it has been shown that long and accurate amino acid sequences can be obtained which are sufficient for PCR-based strategies to clone the corresponding gene [Wilm et al. (1996), Nature 379, 466–469]. More than eight proteins have now been cloned based on that method. In many more cases the sequence information identified homologous proteins. Issues involved in cloning by mass spectrometric sequence information are discussed, as are two case studies. These results clearly establish mass spectrometry as a viable tool not only for the database identification of proteins, but also for the de novo sequencing of gel-separated proteins at the low-picomole to femtomole level.

17.
We demonstrate a new approach to the determination of amino acid composition from tandem mass spectrometrically fragmented peptides using both experimental and simulated data. The approach has been developed to be used as a search-space filter in a protein identification pipeline with the aim of increased performance above that which could be attained by using immonium ion information. Three automated methods have been developed and tested: one based upon a simple peak traversal, in which all intense ion peaks are treated as being either a b- or y-ion using a wide mass tolerance; a second which uses a much narrower tolerance and does not perform transformations of ion peaks to the complementary type; and the unique fragments method which allows for b- or y-ion type to be inferred and corroborated using a scan of the other ions present in each peptide spectrum. The combination of these methods is shown to provide a high-accuracy set of amino acid predictions using both experimental and simulated data sets. These high quality predictions, with an accuracy of over 85%, may be used to identify peptide fragments that are hard to identify using other methods. The data simulation algorithm is also shown a posteriori to be a good model of noiseless tandem mass spectrometric peptide data.

18.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

19.
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.

20.
Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.
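The two principles stated in this abstract (a candidate sequence should explain the observed fragment ions, and neighboring fragments should show similar mass errors) can be made concrete with a short sketch. It computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses and compares ppm errors between adjacent fragments. It is not the authors' validation procedure; the function names and toy inputs are assumptions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str) -> dict[str, list[float]]:
    """Singly charged b- and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {"b": b, "y": y}


def ppm_errors(theoretical: list[float], observed: list[float]) -> list[float]:
    """Signed ppm error for each matched theoretical/observed pair."""
    return [(o - t) / t * 1e6 for t, o in zip(theoretical, observed)]


def neighbor_error_jumps(errors: list[float]) -> list[float]:
    """Absolute change in ppm error between consecutive fragments."""
    return [abs(b - a) for a, b in zip(errors, errors[1:])]


ions = fragment_ions("GAK")  # toy tripeptide, for illustration only
```

For a correct assignment measured on a well-calibrated instrument, the values returned by `neighbor_error_jumps` would be expected to stay small along an ion series, whereas a chance match tends to produce erratic jumps. What threshold counts as "small" is instrument-dependent and is not specified here.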
