
1.

Expanding the organismal scope of proteomics: cross-species protein identification by mass spectrometry and its implications

Liska AJ Shevchenko A 《Proteomics》2003,3(1):19-28

Due to the limited applicability of conventional protein identification methods to the proteomes of organisms with unsequenced genomes, researchers have developed approaches to identify proteins using mass spectrometry and sequence similarity database searches. Both the integration of mass spectrometry with bioinformatics and genomic sequencing drive the expanding organismal scope of proteomics.

2.

Bioinformatic assessment of mass spectrometric chemical derivatisation techniques for proteome database searching

Sidhu KS Sangvanich P Brancia FL Sullivan AG Gaskell SJ Wolkenhauer O Oliver SG Hubbard SJ 《Proteomics》2001,1(11):1368-1377

Identification of proteins from the mass spectra of peptide fragments generated by proteolytic cleavage using database searching has become one of the most powerful techniques in proteome science, capable of rapid and efficient protein identification. Using computer simulation, we have studied how the application of chemical derivatisation techniques may improve the efficiency of protein identification from mass spectrometric data. These approaches enhance ion yield and lead to the promotion of specific ions and fragments, yielding additional database search information. The impact of three alternative techniques has been assessed by searching representative proteome databases for both single proteins and simple protein mixtures. For example, by reliably promoting fragmentation of singly-charged peptide ions at aspartic acid residues after homoarginine derivatisation, 82% of yeast proteins can be unambiguously identified from a single typical peptide-mass datum, with a measured mass accuracy of 50 ppm, by using the associated secondary ion data. The extra search information also provides a means to confidently identify proteins in protein mixtures where only limited data are available. Furthermore, the inclusion of limited sequence information for the peptides can compensate for and exceed the search efficiency available via high accuracy searches of around 5 ppm, suggesting that this is a potentially useful approach for simple protein mixtures routinely obtained from two-dimensional gels.
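
To make the 50 ppm and 5 ppm figures in this abstract concrete, the following minimal sketch shows how a mass tolerance in parts per million narrows the set of candidate peptides. The function names and the peptide masses are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def candidates_within_tolerance(observed_mass, theoretical_masses, tol_ppm):
    """Return the theoretical masses within tol_ppm of the observed mass."""
    return [m for m in theoretical_masses
            if abs(ppm_error(observed_mass, m)) <= tol_ppm]


# Hypothetical peptide masses in Da (illustrative values only).
database = [1500.000, 1500.030, 1500.060, 1500.100]
observed = 1500.002

loose = candidates_within_tolerance(observed, database, 50)
tight = candidates_within_tolerance(observed, database, 5)
print(len(loose), "candidates at 50 ppm;", len(tight), "at 5 ppm")
```

At 1500 Da, 50 ppm corresponds to 0.075 Da and 5 ppm to 0.0075 Da, which is why the tighter tolerance leaves fewer candidates to disambiguate.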

3.

Bioinformatics algorithms for identifying amino acid mutations using tandem mass spectrometry

Yu Q Wu SF Ma J Zhu YP Shu KX 《Scientia Sinica Vitae》2014,(11):1113-1124

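Amino acid mutations can alter protein structure and function and thereby affect the life processes of an organism. Shotgun proteomics based on tandem mass spectrometry is currently the main approach for large-scale proteome studies, but existing identification pipelines for mass spectrometry data often deliberately reduce the amino acid mutation information in the database in order to improve identification sensitivity. How to mine amino acid mutation information from the data has therefore become an important part of mass spectrometry data identification. Tandem mass spectrometry methods currently applied to amino acid mutation identification fall roughly into three categories: methods based on sequence database searching, algorithms based on sequence tag searching, and algorithms based on spectral library searching. This article first describes these three types of algorithm in detail and analyzes the characteristics and shortcomings of each, then reviews the current state of research and future directions in amino acid mutation identification. As tandem mass spectrometry-based proteomics continues to develop, amino acid mutation information in protein sequences will be resolved more fully, enabling in-depth study of the changes in protein structure and function caused by amino acid mutations and laying a foundation for revealing their biological significance.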

4.

Proteomic profiling and protein identification by MALDI-TOF mass spectrometry in unsequenced parasitic nematodes

Millares P Lacourse EJ Perally S Ward DA Prescott MC Hodgkinson JE Brophy PM Rees HH 《PloS one》2012,7(3):e33590

Lack of genomic sequence data and the relatively high cost of tandem mass spectrometry have hampered proteomic investigations into helminths, such as resolving the mechanism underpinning globally reported anthelmintic resistance. Whilst detailed mechanisms of resistance remain unknown for the majority of drug-parasite interactions, gene mutations and changes in gene and protein expression are proposed key aspects of resistance. Comparative proteomic analysis of drug-resistant and -susceptible nematodes may reveal protein profiles reflecting drug-related phenotypes. Using the gastro-intestinal nematode, Haemonchus contortus as case study, we report the application of freely available expressed sequence tag (EST) datasets to support proteomic studies in unsequenced nematodes. EST datasets were translated to theoretical protein sequences to generate a searchable database. In conjunction with matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF-MS), Peptide Mass Fingerprint (PMF) searching of databases enabled a cost-effective protein identification strategy. The effectiveness of this approach was verified in comparison with MS/MS de novo sequencing with searching of the same EST protein database and subsequent searches of the NCBInr protein database using the Basic Local Alignment Search Tool (BLAST) to provide protein annotation. Of 100 proteins from 2-DE gel spots, 62 were identified by MALDI-TOF-MS and PMF searching of the EST database. Twenty randomly selected spots were analysed by electrospray MS/MS and MASCOT Ion Searches of the same database. The resulting sequences were subjected to BLAST searches of the NCBI protein database to provide annotation of the proteins and confirm concordance in protein identity from both approaches. Further confirmation of protein identifications from the MS/MS data were obtained by de novo sequencing of peptides, followed by FASTS algorithm searches of the EST putative protein database. 
This study demonstrates the cost-effective use of available EST databases and inexpensive, accessible MALDI-TOF MS in conjunction with PMF for reliable protein identification in unsequenced organisms.

5.

Strategic proteome analysis of Candida magnoliae with an unsequenced genome (total citations: 2; self-citations: 0; citations by others: 2)

Kim HJ Lee DY Lee DH Park YC Kweon DH Ryu YW Seo JH 《Proteomics》2004,4(11):3588-3599

Erythritol is a noncariogenic, low calorie sweetener. It is safe for people with diabetes and obese people. Candida magnoliae is an industrially important organism because of its ability to produce erythritol as a major product. The genome of C. magnoliae has not been sequenced yet, limiting the available proteome database. Therefore, systematic approaches were employed to construct the proteome map of C. magnoliae. Proteomic analysis with systematic approaches is based on two-dimensional electrophoresis, matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS), tandem mass spectrometry (MS/MS) and database interrogation. First, 24 spots were analyzed using peptide mass fingerprinting along with MALDI-TOF MS with high mass accuracy. Only four spots were reliably identified as carbonyl reductase and its isoforms. The reason for low sequence coverage seemed to be that these identification strategies were based on the presence of the protein database obtained from the publicly accessible genome database and the availability of cross-species protein identification. MS/MS (MS/MS ion search and de novo sequencing) in combination with similarity searches allowed successful identification of 39 spots. Several proteins including transaldolase identified by MS/MS ion searches were further confirmed by partial sequences from the expressed sequence tag database. In this study, 51 protein spots were analyzed and then potentially identified. The identified proteins were involved in glycolysis, stress response, other essential metabolisms and cell structures.

6.

Analysing proteomic data (total citations: 5; self-citations: 0; citations by others: 5)

Barrett J Brophy PM Hamilton JV 《International journal for parasitology》2005,35(5):543-553

The rapid growth of proteomics has been made possible by the development of reproducible 2D gels and biological mass spectrometry. However, despite technical improvements, 2D gels are still less than perfectly reproducible and gels have to be aligned so spots for identical proteins appear in the same place. Gels can be warped by a variety of techniques to make them concordant. When gels are manipulated to improve registration, information is lost, so direct methods for gel registration which make use of all available data for spot matching are preferable to indirect ones. In order to identify proteins from gel spots, a property or combination of properties that is unique to that protein is required. These can then be used to search databases for possible matches. Molecular mass, pI, amino acid composition and short sequence tags can all be used in database searches. Currently the method of choice for protein identification is mass spectrometry. Proteins are eluted from the gels and cleaved with specific endoproteases to produce a series of peptides of different molecular mass. In peptide mass fingerprinting, the peptide profile of the unknown protein is compared with theoretical peptide libraries generated from sequences in the different databases. Tandem mass spectrometry (MS/MS) generates short amino acid sequence tags for the individual peptides. These partial sequences combined with the original peptide masses are then used for database searching, greatly improving specificity. Increasingly protein identification from MS/MS data is being fully or partially automated. When working with organisms that do not have sequenced genomes (the case with most helminths), protein identification by database searching becomes problematic.
A number of approaches to cross species protein identification have been suggested, but if the organism being studied is only distantly related to any organism with a sequenced genome then the likelihood of protein identification remains small. The dynamic nature of the proteome means that there really is no such thing as a single representative proteome and a complete set of metadata (data about the data) is going to be required if the full potential of database mining is to be realised in the future.

7.

The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis,and the application of statistical tools for data analysis and interpretation

von Haller PD Yi E Donohoe S Vaughn K Keller A Nesvizhskii AI Eng J Li XJ Goodlett DR Aebersold R Watts JD 《Molecular & cellular proteomics : MCP》2003,2(7):428-442

Proteomic approaches to biological research that will prove the most useful and productive require robust, sensitive, and reproducible technologies for both the qualitative and quantitative analysis of complex protein mixtures. Here we applied the isotope-coded affinity tag (ICAT) approach to quantitative protein profiling, in this case proteins that copurified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. With the ICAT approach, cysteine residues of the two related protein isolates were covalently labeled with isotopically normal and heavy versions of the same reagent, respectively. Following proteolytic cleavage of combined labeled proteins, peptides were fractionated by multidimensional chromatography and subsequently analyzed via automated tandem mass spectrometry. Individual tandem mass spectrometry spectra were searched against a human sequence database, and a variety of recently developed, publicly available software applications were used to sort, filter, analyze, and compare the results of two repetitions of the same experiment. In particular, robust statistical modeling algorithms were used to assign measures of confidence to both peptide sequences and the proteins from which they were likely derived, identified via the database searches. We show that by applying such statistical tools to the identification of T cell lipid raft-associated proteins, we were able to estimate the accuracy of peptide and protein identifications made. These tools also allow for determination of the false positive rate as a function of user-defined data filtering parameters, thus giving the user significant control over and information about the final output of large-scale proteomic experiments. With the ability to assign probabilities to all identifications, the need for manual verification of results is substantially reduced, thus making the rapid evaluation of large proteomic datasets possible. 
Finally, by repeating the experiment, information relating to the general reproducibility and validity of this approach to large-scale proteomic analyses was also obtained.
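
The link this abstract draws between per-identification probabilities and the false positive rate at a chosen filter can be sketched as follows. This is a generic illustration that assumes each identification carries a probability of being correct; it is not the statistical model used by the authors, and the function name and scores are hypothetical.

```python
def expected_false_positive_rate(probabilities, min_probability):
    """Estimate the fraction of incorrect identifications among those accepted.

    Each accepted identification with probability p of being correct
    contributes (1 - p) expected errors.
    """
    accepted = [p for p in probabilities if p >= min_probability]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Hypothetical probabilities assigned to five peptide identifications.
scores = [0.99, 0.95, 0.90, 0.60, 0.30]
for threshold in (0.9, 0.5):
    rate = expected_false_positive_rate(scores, threshold)
    print(f"threshold {threshold}: estimated false positive rate {rate:.3f}")
```

Lowering the threshold accepts more identifications at the cost of a higher estimated error rate, which is the trade-off a user-defined filter controls.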

8.

Proteomics: a technology-driven and technology-limited discovery science (total citations: 9; self-citations: 0; citations by others: 9)

Lee KH 《Trends in biotechnology》2001,19(6):217-222

An emerging field for the analysis of biological systems is the study of the complete protein complement of the genome, the 'proteome'. There are several complementary tools available for proteome analysis including 2D protein electrophoresis and mass spectrometry. Emerging technologies for proteome analysis include spotted-array-based methods and microfluidic devices. Taken together, these technologies provide a wealth of information that is useful in discovery-based science. However, there are some key limitations of these approaches and new technology is required to be able to fully integrate proteomic information with information obtained about DNA sequence, mRNA profiles and metabolite concentrations into effective models of biological systems.

9.

Protein identification using 2D-LC-MS/MS (total citations: 3; self-citations: 0; citations by others: 3)

Delahunty C Yates JR 《Methods (San Diego, Calif.)》2005,35(3):248-255

Multidimensional liquid chromatography techniques have been coupled to tandem mass spectrometry to provide a robust method to identify proteins in complex mixtures. Data acquisition is interfaced directly with search algorithms for identification through cross-correlation with databases. This review describes the most recent advances in methodologies for protein identification by mass spectrometry and describes the limitations of the application of the technologies.

10.

Tandem mass spectrometry data quality assessment by self-convolution

Keng Wah Choo Wai Mun Tham 《BMC bioinformatics》2007,8(1):352

Background

Many algorithms have been developed for deciphering tandem mass spectrometry (MS) data sets. They fall essentially into two classes. The first searches a database of theoretical mass spectra, while the second relies on de novo sequencing from raw mass spectrometry data. It was noted that the quality of the mass spectra significantly affects the protein identification process in both instances. This prompted the authors to explore ways to measure the quality of MS data sets before subjecting them to the protein identification algorithms, thus allowing for more meaningful searches and increased confidence in the proteins identified.

11.

Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing

Seebacher J Mallick P Zhang N Eddes JS Aebersold R Gelb MH 《Journal of proteome research》2006,5(9):2270-2282

Distance constraints in proteins and protein complexes provide invaluable information for calculation of 3D structures, identification of protein binding partners and localization of protein-protein contact sites. We have developed an integrative approach to identify and characterize such sites through the analysis of proteolytic products derived from proteins chemically cross-linked by isotopically coded cross-linkers using LC-MALDI tandem mass spectrometry and computer software. This method is specifically tailored toward the rapid analysis of low microgram amounts of proteins or multimeric protein complexes cross-linked with nonlabeled and deuterium-labeled bis-NHS ester cross-linking reagents (both commercially available and readily synthesized). Through labeling with [18O]water solvent and LC-MALDI analysis, the method further allows the possible distinction between Type 0 and Type 1 or Type 2 modified peptides (monolinks and looplinks or cross-links), although such a distinction is more readily made from analysis of tandem mass spectrometry data. When applied to the bacterial Colicin E7 DNAse/Im7 heterodimeric protein complex, 23 cross-links were identified including six intersubunit cross-links, all between residues that are close in space when examined in the context of the X-ray structure of the heterodimer. In addition, cross-links were successfully identified in five single subunit proteins, beta-lactoglobulin, cytochrome c, lysozyme, myoglobin, and ribonuclease A, establishing the generality of the approach.

12.

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics

Siepen JA Keevil EJ Knight D Hubbard SJ 《Journal of proteome research》2007,6(1):399-408

Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
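
The "missed cleavages" search parameter described in this abstract can be illustrated with a small in silico digestion. The sketch below assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) and uses a hypothetical function name and test sequence; it does not implement the authors' information-theoretic predictor or their database masking.

```python
import re


def tryptic_peptides(sequence: str, max_missed: int = 1) -> list:
    """Enumerate tryptic peptides allowing up to max_missed missed cleavages.

    Assumed cleavage rule: after K or R, except when followed by P.
    """
    # Split at zero-width cleavage sites (requires Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join fragment i with up to max_missed following fragments.
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical sequence; the R before P is not a cleavage site under this rule.
print(tryptic_peptides("AAKGGRPLKTTR", max_missed=0))
print(tryptic_peptides("AAKGGRPLKTTR", max_missed=1))
```

Each additional allowed missed cleavage enlarges the list of theoretical peptides, which is why removing sites that are confidently predicted not to be missed can shrink the search space.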

13.

Basic strategies and challenges of metaproteomic informatics analysis

Xu HK Yan KQ He YB Wen B Yang HM Liu SQ 《Progress in Biochemistry and Biophysics》2018,45(1):23-35

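Metaproteomics is an emerging discipline that uses mass spectrometry to collect protein information from natural microbial communities at scale and combines it with other omics data to study the genetic characteristics and biological functions of those communities. Informatics analysis in metaproteomics differs considerably from conventional proteomics methods, and new analytical approaches are urgently needed. Because metaproteomics studies highly complex microbial samples, a species database must be built that covers, as far as possible, the genomic information of the microorganisms in the sample. With such a large database, the computational resources consumed by the analysis and the quality-control criteria for identifications must be taken into account, so parameters such as database size, database searching, and false-positive control need to be highly optimized. Because complex homologous protein sequences are widespread in metaproteomic data, the taxonomic information in the NCBI database must be fully exploited for matching, and filtering with the lowest common ancestor (LCA) algorithm is required before proteins can be effectively assigned to species. Focusing on metaproteomic informatics analysis, this article reviews the current state of the field, the challenges it faces, and future research directions, covering metaproteomic database construction, protein grouping, and the discovery of biological significance.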
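
The lowest common ancestor (LCA) step mentioned in this abstract can be sketched as follows: when a peptide matches proteins from several taxa, it is assigned to the deepest taxonomic node shared by all of them. The tiny taxonomy, the dictionary layout, and the function names are assumptions made for illustration; a real pipeline would load the NCBI taxonomy instead.

```python
# Hypothetical, heavily simplified taxonomy: child -> parent.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
}


def lineage(taxon, parent):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node present in the lineage of every taxon."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon, parent))
    # Walk upward from the first taxon; the first shared node is the lowest.
    for node in lineage(taxa[0], parent):
        if node in shared:
            return node
    return None


print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], PARENT))
print(lowest_common_ancestor(
    ["Escherichia coli", "Salmonella enterica", "Bacillus subtilis"], PARENT))
```

A peptide shared by homologous proteins from distant taxa is therefore assigned to a higher rank, which avoids over-claiming species-level evidence.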

14.

Bioinformatics in mass spectrometry data analysis for proteomics studies

Cristoni S Bernardi LR 《Expert review of proteomics》2004,1(4):469-483

Mass spectrometry is a technique widely employed for the identification and characterization of proteins. The role of bioinformatics is fundamental for the elaboration of mass spectrometry data due to the amount of data that this technique can produce. To process data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of high-throughput and statistical accuracy. However, many limitations exist concerning bioinformatics spectral data elaboration. This review aims to critically cover the recent and future developments of new bioinformatics approaches in mass spectrometry data analysis for proteomics studies.

15.

Bioinformatics in mass spectrometry data analysis for proteomics studies

《Expert review of proteomics》2013,10(4):469-483

Mass spectrometry is a technique widely employed for the identification and characterization of proteins. The role of bioinformatics is fundamental for the elaboration of mass spectrometry data due to the amount of data that this technique can produce. To process data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of high-throughput and statistical accuracy. However, many limitations exist concerning bioinformatics spectral data elaboration. This review aims to critically cover the recent and future developments of new bioinformatics approaches in mass spectrometry data analysis for proteomics studies.

16.

Peptide Sequencing by Mass Spectrometry for Homology Searches and Cloning of Genes

Andrej Shevchenko Matthias Wilm Matthias Mann 《Journal of Protein Chemistry》1997,16(5):481-490

It is now possible to obtain sequence information from gel-separated proteins by mass spectrometry at levels too low for conventional approaches. Usually these tandem mass spectrometric data are used for database searches with the aim of identifying the corresponding gene. Recently it has been shown that long and accurate amino acid sequences can be obtained which are sufficient for PCR-based strategies to clone the corresponding gene [Wilm et al. (1996), Nature 379, 466–469]. More than eight proteins have now been cloned based on that method. In many more cases the sequence information identified homologous proteins. Issues involved in cloning by mass spectrometric sequence information are discussed, as are two case studies. These results clearly establish mass spectrometry as a viable tool not only for the database identification of proteins, but also for the de novo sequencing of gel-separated proteins at the low-picomole to femtomole level.

17.

Determination of partial amino acid composition from tandem mass spectra for use in peptide identification strategies

Shadforth I Todd K Crowther D Bessant C 《Proteomics》2005,5(7):1787-1796

We demonstrate a new approach to the determination of amino acid composition from tandem mass spectrometrically fragmented peptides using both experimental and simulated data. The approach has been developed to be used as a search-space filter in a protein identification pipeline with the aim of increased performance above that which could be attained by using immonium ion information. Three automated methods have been developed and tested: one based upon a simple peak traversal, in which all intense ion peaks are treated as being either a b- or y-ion using a wide mass tolerance; a second which uses a much narrower tolerance and does not perform transformations of ion peaks to the complementary type; and the unique fragments method which allows for b- or y-ion type to be inferred and corroborated using a scan of the other ions present in each peptide spectrum. The combination of these methods is shown to provide a high-accuracy set of amino acid predictions using both experimental and simulated data sets. These high quality predictions, with an accuracy of over 85%, may be used to identify peptide fragments that are hard to identify using other methods. The data simulation algorithm is also shown a posteriori to be a good model of noiseless tandem mass spectrometric peptide data.
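
For readers unfamiliar with the b- and y-ion terminology in this abstract, the sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses and shows the transformation of a fragment to its complementary type. The function names and the example peptide are hypothetical, and this is not the authors' peak-traversal algorithm.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide):
    """Singly charged b1..b(n-1) m/z values (N-terminal fragments)."""
    total, ions = 0.0, []
    for residue in peptide[:-1]:
        total += MONO[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """Singly charged y1..y(n-1) m/z values (C-terminal fragments)."""
    total, ions = 0.0, []
    for residue in reversed(peptide[1:]):
        total += MONO[residue]
        ions.append(total + WATER + PROTON)
    return ions


def complement(fragment_mz, peptide):
    """Convert a singly charged b-ion m/z to its complementary y-ion, or vice versa."""
    neutral_mass = sum(MONO[r] for r in peptide) + WATER
    return neutral_mass + 2 * PROTON - fragment_mz


peptide = "PEPTIDE"  # hypothetical example sequence
print([round(mz, 4) for mz in b_ions(peptide)])
print([round(mz, 4) for mz in y_ions(peptide)])
```

Differences between consecutive ions of the same series equal residue masses, which is what allows amino acid composition to be inferred from a fragment spectrum.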

18.

ScanRanker: Quality assessment of tandem mass spectra via sequence tagging

Ma ZQ Chambers MC Ham AJ Cheek KL Whitwell CW Aerni HR Schilling B Miller AW Caprioli RM Tabb DL 《Journal of proteome research》2011,10(7):2896-2904

In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

19.

A face in the crowd: recognizing peptides through database search

Eng JK Searle BC Clauser KR Tabb DL 《Molecular & cellular proteomics : MCP》2011,10(11):R111.009522

Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.

20.

Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra

Chen Y Kwon SW Kim SC Zhao Y 《Journal of proteome research》2005,4(3):998-1005

Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.