Similar articles
 Found 20 similar articles; search took 140 ms
1.
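Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) can be used for de novo sequencing of peptides, followed by database searching to identify proteins. Using graph theory together with alignment of experimental and theoretical spectra, peptide spectra obtained by tandem mass spectrometry were interpreted de novo, yielding reliable peptide sequences that were then used in database searches to identify the corresponding proteins. In addition, statistical methods were also applied to SwissP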

2.
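De novo interpretation of tandem mass spectrometry data and protein identification by database searching   Total citations: 3, self-citations: 0, citations by others: 3
Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) can be used for de novo sequencing of peptides, followed by database searching to identify proteins. Using graph theory together with alignment of experimental and theoretical spectra, peptide spectra obtained by tandem mass spectrometry were interpreted de novo, yielding reliable peptide sequences that were then used in database searches to identify the corresponding proteins. In addition, the SwissProt and TrEMBL protein databases were analyzed in detail with statistical methods. The results show that three tetrapeptides, two pentapeptides, or one octapeptide are generally sufficient to uniquely identify a protein.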
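
The database statistic in this abstract (how short a peptide can be and still point to a single protein) can be illustrated with a small sketch. It is not the authors' procedure: it only checks single k-mers rather than combinations of several short peptides, and the function names and the toy database `toy_db` are hypothetical.

```python
from collections import defaultdict


def kmer_to_proteins(proteins, k):
    """Map every length-k peptide to the set of protein IDs that contain it."""
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].add(protein_id)
    return index


def fraction_unique(proteins, k):
    """Fraction of distinct k-mers that occur in exactly one protein."""
    index = kmer_to_proteins(proteins, k)
    if not index:
        return 0.0
    unique = sum(1 for ids in index.values() if len(ids) == 1)
    return unique / len(index)


# Hypothetical three-protein database, used only to exercise the functions.
toy_db = {"P1": "MKTAYIAK", "P2": "MKTAWIAK", "P3": "GGSMKTAY"}
```

On a real database one would expect `fraction_unique` to rise with `k`, which is the trend the abstract reports for tetra-, penta-, and octapeptides.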

3.
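Post-translational modifications (PTMs) of proteins are widespread in eukaryotic cells and strongly influence protein structure and function. The rapid development of tandem mass spectrometry has provided a high-throughput, high-sensitivity, high-resolution platform for PTM identification, but the way conventional search engines identify modifications cannot meet the demands of data analysis, and unrestricted PTM identification has become one of the important approaches in current proteome modification analysis. Unrestricted PTM identification does not require modification types to be specified before the analysis and can directly find large numbers of known or unknown modifications in a sample, which is highly significant for raising the interpretation rate of mass spectra and for revealing the biological functions of proteins. This review first introduces the definition and history of unrestricted PTM identification, then surveys in detail the current mainstream algorithms from two perspectives, sequence matching and spectrum matching, analyzes the quality-control issues of unrestricted PTM identification, and finally discusses the shortcomings and future directions of modification identification algorithms in light of practical applications.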

4.
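Peptide identification in proteomics has long been dominated by methods based on mass spectrometry and database searching. As mass spectrometers have advanced, massive amounts of mass spectrometry data have been acquired, providing a powerful data repository for large-scale protein identification and making mass-spectrometry-based proteomics the mainstream. Conventional tandem mass spectrum database searching has many limitations when identifying post-translational modifications of peptides, and spectral network methods can offset these limitations to some extent. This paper systematically reviews the history, theoretical research, and applications of spectral networks based on spectrum clustering and of spectral library searching, discusses the advantages of spectral network library methods for identifying post-translational modifications of peptides, and offers analysis and an outlook.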

5.
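Amino acid mutations can alter protein structure and function and affect the life processes of an organism. Shotgun proteomics based on tandem mass spectrometry is currently the main method for large-scale proteomics, but existing identification pipelines often deliberately reduce the amino acid mutation information in the database in order to improve the sensitivity of the identification results. Mining the amino acid mutation information in the data has therefore become an important part of current mass spectrometry data identification. The tandem mass spectrometry methods currently applied to amino acid mutation identification fall roughly into three categories: methods based on sequence database searching, algorithms based on sequence tag searching, and algorithms based on spectral library searching. This review first describes these three kinds of mutation identification algorithms in detail and analyzes the characteristics and shortcomings of each, then presents the state of research and the directions of development in amino acid mutation identification. As tandem-mass-spectrometry-based proteomics continues to develop, the amino acid mutation information in protein sequences will be resolved more fully, allowing deeper study of the changes in protein structure and function caused by amino acid mutations and laying a foundation for revealing their biological significance.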

6.
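Protein glycosylation, as the most common and most important protein modification, has long been a focus of omics research. Over the past decade or more, N-linked glycoproteomics has generally analyzed glycans separately from the peptides they modify. This strategy lowers the analytical difficulty but loses the important correspondence between glycans and protein glycosylation sites. In recent years, mass spectrometry strategies and methods for analyzing intact glycopeptides have gradually been established. In general, direct mass spectrometric analysis of intact glycopeptides requires three things: first, enriching intact glycopeptides from complex samples to eliminate the influence of non-glycosylated peptides; second, adjusting the mass spectrometry parameters to the properties of glycopeptides; and third, developing dedicated software for downstream data analysis that identifies both the peptide sequence and the glycan composition or structure of each intact glycopeptide. This review systematically describes, from these three main aspects, the mass spectrometry and data analysis strategies and methods commonly used for intact N-glycopeptide analysis, and further discusses in turn glycopeptide spectrum recognition, correction of the precursor ion monoisotopic mass, database selection, and the evaluation and control of the false positive rate. Direct mass spectrometric analysis of intact glycopeptides helps recover the correspondence between glycans and glycosylation sites and can provide a more powerful glycoproteomics tool for research such as biomarker discovery and disease pathogenesis.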

7.
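To investigate overall differences in the chemical constituents of different commercial grades of Houpo (Magnolia officinalis bark), liquid chromatography coupled with tandem quadrupole time-of-flight high-resolution mass spectrometry (LC-Triple TOF MS/MS) and gas chromatography-tandem mass spectrometry (GC-MS/MS), combined with multivariate statistical analysis, were used to compare the chemical constituents of "Chuanpo" and "Wenpo". After tandem mass spectrometric analysis, characteristic peaks were extracted from the mass spectrometry data through peak matching, peak alignment, and noise filtering, and the data were processed by principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). For the LC-Triple TOF MS/MS analysis of non-volatile constituents, compounds were identified from the accurate mass-to-charge ratio in the MS1 spectra and the fragment information in the MS2 spectra, combined with software database searching, comparison with reference standards, and the relevant literature. For the GC-MS/MS analysis of volatile constituents, compounds were identified by matching mass spectra against the NIST05 mass spectral database and by reference to the relevant literature. The results show that the chemical compositions of the "Chuanpo" and "Wenpo" samples were clearly distinguished; 21 non-volatile and 9 volatile differential constituents were preliminarily screened and identified. From a chemical perspective, these results provide basic data for establishing a new method to distinguish "Chuanpo" from "Wenpo" and for the comprehensive quality evaluation of commercial Houpo.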

8.
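Advances in de novo sequencing algorithms for tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet important step in its data processing. Because it does not rely on a protein sequence database, de novo sequencing can analyze tandem mass spectrometry data from new species or from species whose genomes have not been sequenced, an advantage that database searching cannot replace. This review briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and the current state of research, identifies several typical computational strategies and analyzes the strengths and weaknesses of each, summarizes commonly used de novo sequencing algorithms and software, describes the metrics and benchmark data sets used for algorithm evaluation, outlines the characteristics of the various algorithms, and looks ahead to possible directions for future research.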

9.
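The sequence search algorithm consists of three parts: the search procedure, the scoring of each amino acid residue of the peptides found by the search, and the merging of the peptides found by searching from the two termini (N-terminus and C-terminus). Interpretation of a number of real peptide mass spectra shows that the algorithm gives fairly satisfactory results for mass spectra of unknown peptides in which several kinds of sequence-specific ions coexist. In particular, its scoring method and criteria suit the actual characteristics of peptide mass spectra and allow the accuracy of the interpretation to be judged as fully as possible. It offers analysts who determine peptide primary structure by mass spectrometry a relatively simple and reliable tool, and it also provides a convenient route toward rapid mass spectrometric sequencing of proteins or peptides and its wider adoption in biology.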

10.
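Identifying protein glycosylation is one of the most challenging tasks in the analysis of post-translational modifications and has attracted particular attention in recent years. Rapidly developing mass spectrometry provides an effective means for large-scale studies of protein glycosylation. Compared with the identification of other post-translational modifications by mass spectrometry, the difficulty of glycosylation identification is that glycans are large molecules that show microheterogeneity; in addition, glycans themselves fragment in tandem mass spectrometry, following rules that differ from those of peptide fragmentation, so that the spectrum interpretation methods and software of proteomics struggle to identify both the peptide sequence and the glycan structure completely. Identification of intact N-glycopeptides is one of the hot topics in glycosylation analysis. In recent years a wide variety of mass spectrum interpretation methods have been developed for N-glycopeptides, including identification of N-glycosylation sites after removal of the glycans with N-glycanase, identification of the peptide part of glycopeptides based on electron transfer dissociation, and identification of intact N-glycopeptides by combining higher-energy collisional dissociation with electron transfer dissociation or collision-induced dissociation with MS3 spectra. This review organizes and surveys these interpretation methods, briefly points out some shortcomings of current software for intact glycopeptide identification, and looks ahead to future directions.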

11.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code are available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
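
The idea of scoring matches with a multivariate hypergeometric distribution over intensity classes can be sketched as follows. This is a generic illustration of the distribution, not MyriMatch's actual implementation; the class layout (three intensity classes plus a class of unoccupied m/z bins) and all counts are assumptions made for the example.

```python
from math import comb, log


def mvh_probability(class_sizes, matched):
    """Multivariate hypergeometric probability of drawing matched[i] items
    from each class i, given class_sizes[i] items per class in the population."""
    total_items = sum(class_sizes)
    total_draws = sum(matched)
    numerator = 1
    for size, hits in zip(class_sizes, matched):
        numerator *= comb(size, hits)  # comb returns 0 when hits > size
    return numerator / comb(total_items, total_draws)


def mvh_score(class_sizes, matched):
    """Negative log probability: rarer match patterns get higher scores."""
    return -log(mvh_probability(class_sizes, matched))


# Assumed layout: [intense, medium, weak, unoccupied] m/z bins.
classes = [10, 20, 40, 930]
mostly_intense = [4, 2, 1, 3]  # predicted fragments landing mostly on intense peaks
mostly_weak = [1, 2, 4, 3]     # same number of matched peaks, mostly weak ones
```

Both patterns match seven peaks, yet `mvh_score(classes, mostly_intense)` is larger than `mvh_score(classes, mostly_weak)`, because hitting scarce intense peaks by chance is less likely. A scorer that only counts matched peaks would rank the two patterns equally.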

12.
Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances in tandem mass spectrometry have made it the method of choice for high-throughput identification of proteins. Unfortunately, the software for unambiguously identifying peptide sequences has not kept pace with the recent hardware improvements in mass spectrometry instruments. Critical for reliable high-throughput protein identification, scoring functions evaluate the quality of a match between experimental spectra and a database peptide. Current scoring function technology relies heavily on ad-hoc parameterization and manual curation by experienced mass spectrometrists. In this work, we propose a two-stage stochastic model for the observed MS/MS spectrum, given a peptide. Our model explicitly incorporates fragment ion probabilities, noisy spectra, and instrument measurement error. We describe how to compute this probability based score efficiently, using a dynamic programming technique. A prototype implementation demonstrates the effectiveness of the model.

13.
De novo peptide sequencing via tandem mass spectrometry.   Total citations: 10, self-citations: 0, citations by others: 10
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
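
The graph representation mentioned in this abstract can be sketched in a few lines. This is a heavily simplified illustration and not SHERENGA: it assumes the input is already a clean ladder of prefix masses, it enumerates paths without the learned ion types or path scoring the abstract describes, and the reduced residue table, tolerance, and function name are choices made only for the example.

```python
# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}


def spell_paths(prefix_masses, tol=0.02):
    """Return every residue string spelled by a path from the smallest to the
    largest mass, where an edge joins two masses that differ by one residue."""
    masses = sorted(prefix_masses)

    def extend(i):
        if i == len(masses) - 1:
            return [""]  # reached the final node: one complete path
        sequences = []
        for j in range(i + 1, len(masses)):
            gap = masses[j] - masses[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    sequences.extend(residue + tail for tail in extend(j))
        return sequences

    return extend(0)


# Prefix masses of the hypothetical peptide G-A-V, starting from 0.
ladder = [0.0, 57.02146, 128.05857, 227.12698]
```

Real spectra contain noise peaks and missing fragments, which is why a scoring scheme over paths, as described in the abstract, is needed to rank the many candidate sequences.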

14.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
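
The dot product score that this abstract criticizes can be sketched as a normalized dot product between binned spectra. This is a generic textbook form, not the scoring of SpectraST or Pepitome; the 1.0 m/z bin width and the function names are assumptions for illustration.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)
        binned[bin_index] = binned.get(bin_index, 0.0) + intensity
    return binned


def normalized_dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, in the range 0 to 1."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two single-peak spectra whose peaks are 0.8 m/z apart but share a bin.
query = [(100.1, 5.0)]
library = [(100.9, 5.0)]
```

Here `normalized_dot_product(query, library)` returns 1.0 even though the peaks differ by 0.8 m/z, which illustrates the abstract's point that the score ignores fragment ion m/z discrepancies within a bin.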

15.
MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org

16.
A system for creating a library of tandem mass spectra annotated with corresponding peptide sequences was described. This system was based on the annotated spectra currently available in the Global Proteome Machine Database (GPMDB). The library spectra were created by averaging together spectra that were annotated with the same peptide sequence, sequence modifications, and parent ion charge. The library was constructed so that experimental peptide tandem mass spectra could be compared with those in the library, resulting in a peptide sequence identification based on scoring the similarity of the experimental spectrum with the contents of the library. A software implementation that performs this type of library search was constructed and successfully used to obtain sequence identifications. The annotated tandem mass spectrum libraries for the Homo sapiens, Mus musculus, and Saccharomyces cerevisiae proteomes and search software were made available for download and use by other groups.

17.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, or when the peptide is generated by some unknown or non-specific enzyme, or even when the target peptides have post-translational modifications (PTM). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides of non-specific digestions by unknown enzymes or amino acid modifications. In another case, scientists may choose to use some non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic study, in that not all proteins are amenable to be digested by some site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds will cost a lot of computational time of database search engines. OVERVIEW: The present study was designed to speed up the database searching process for both cases. More specifically, we employed an approach combining suffix tree data structure and spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold with some statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. Then we rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.

18.
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry are widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra were searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.

19.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.

20.
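Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. It acquires MS2 spectra by fragmenting, without bias, all precursor ions within each isolation window; in theory it can achieve deep coverage of a protein sample while offering high throughput, high reproducibility, and high sensitivity. Existing DIA acquisition methods fall into three categories: full-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition methods (4D-DIA). Matching the different characteristics of DIA data, the main data interpretation approaches fall into four categories: spectral library searching, direct searching of protein sequence databases, identification from pseudo-MS2 spectra, and de novo sequencing. The resulting peptide identifications must then be assessed for confidence in two steps, rescoring with machine learning methods and estimating the false discovery rate of the reported result set, which together provide quality control of the interpretation results. This review organizes and surveys DIA acquisition methods, data interpretation methods and software, and methods for assessing the confidence of identifications, and looks ahead to future directions.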
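
The false discovery rate estimation step mentioned in this abstract is often carried out with a target-decoy strategy. The abstract does not say which estimator the reviewed tools use, so the sketch below is an assumption: it uses the simple ratio of decoy to target identifications above a score threshold, and the function name and the example scores are hypothetical.

```python
def fdr_at_threshold(identifications, threshold):
    """Estimate the FDR among identifications scoring at or above threshold.

    identifications: iterable of (score, is_decoy) pairs.
    Returns decoys / targets, or 0.0 when no target passes the threshold.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in identifications:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    return decoys / targets if targets else 0.0


# Hypothetical rescored peptide identifications: (score, is_decoy).
results = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
```

Lowering the threshold admits more identifications but raises the estimate: for `results`, the value is 0.0 at a threshold of 8.0 and 1/3 at 6.0. A pipeline would typically pick the lowest threshold that keeps the estimate under a chosen limit.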
