Similar Literature
19 similar records found.
1.
Advances in algorithms for de novo sequencing of tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
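In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet important step in its data processing. Because it does not require a protein sequence database, de novo sequencing can analyze tandem mass spectra from new species or from species whose genomes have not been sequenced, an advantage that database search methods cannot offer. This review briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and the current state of research, identifies several typical computational strategies and analyzes the strengths and weaknesses of each, surveys commonly used de novo sequencing algorithms and software, describes evaluation metrics and commonly used benchmark datasets, summarizes the characteristics of each algorithm, and outlines possible directions for future research.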
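
To make the de novo idea concrete, here is a minimal Python sketch of its simplest form: reading residues off the mass gaps between consecutive peaks of a complete b-ion ladder. It assumes a noise-free ladder with no missing peaks; the function name `read_ladder` and the 0.02 Da tolerance are illustrative choices, and practical de novo tools (spectrum graphs, dynamic programming, scoring of several ion types) are far more involved.

```python
# Monoisotopic residue masses in daltons. I and L are isobaric and cannot be
# told apart from mass alone, so they share the label "I/L".
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I/L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # proton mass, used here as the "zero" rung of a b-ion ladder


def read_ladder(ladder, tol=0.02):
    """Call one residue per gap between consecutive rungs of an ion ladder.

    Each gap is assigned the residue whose mass is closest to it; gaps with no
    residue within `tol` Da (e.g. a missing peak spanning two residues) get "?".
    """
    rungs = sorted(ladder)
    calls = []
    for lower, upper in zip(rungs, rungs[1:]):
        gap = upper - lower
        best = min(RESIDUE_MASSES, key=lambda aa: abs(RESIDUE_MASSES[aa] - gap))
        calls.append(best if abs(RESIDUE_MASSES[best] - gap) <= tol else "?")
    return calls


# Usage: build a noise-free b-ion ladder for a known sequence and read it back.
sequence = ["G", "A", "S", "P", "E", "K"]
ladder = [PROTON]
for residue in sequence:
    ladder.append(ladder[-1] + RESIDUE_MASSES[residue])
print(read_ladder(ladder))  # ['G', 'A', 'S', 'P', 'E', 'K']
```

Note that K and Q differ by only about 0.036 Da, so the tolerance must be tighter than that to separate them.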

2.
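Matching protein mass spectrometry data against databases for identification is an important step in proteomics. Because public databases contain incomplete protein information, some protein mass spectra cannot be identified effectively. Building a dedicated search database from EST sequences of related species can increase the chance of identifying unknown proteins. This paper describes the specific methods and steps for building a local Mascot database from EST sequences. This extends the range of protein mass spectrometry data that the Mascot search engine can identify, raises the chance of identifying unknown proteins at the database level, and provides a practical bioinformatics technique for proteomics research.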

3.
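Protein identification by biological mass spectrometry has become one of the core technologies of proteomics. The resulting data are mainly processed by database searching, whose major drawback is that it cannot identify proteins absent from the database. Making full use of mass spectrometry data is therefore important for proteomics, and identifying novel proteins is a key part of this. Novel protein identification is one aspect of protein identification, and novel proteins can be defined at three levels according to how much is known about their sequence and function. Building on existing protein identification methods, current approaches to novel protein identification fall into two broad classes: de novo sequencing combined with similarity search, and searching nucleic acid databases such as EST and genome databases. Each has advantages and disadvantages, along with its own problems and corresponding strategies for handling them. Researchers can apply and develop different identification methods according to their specific aims, and novel protein identification will mature further as proteomics research develops.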

4.
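Peptide identification in proteomics has long relied mainly on mass spectrometry combined with database searching. With advances in mass spectrometer technology, massive amounts of mass spectrometry data are being acquired; this provides a powerful data repository for large-scale protein identification and has made MS-based proteomics the mainstream. Traditional database searching of tandem mass spectra has many limitations when identifying post-translational modifications (PTMs) of peptides, and spectral network methods can partly overcome them. This article systematically reviews the development, theory, and applications of spectral networks built on spectral clustering and of spectral library searching, discusses the advantages of spectral network and library methods for identifying peptide PTMs, and offers analysis and an outlook.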

5.
A scoring method for identifying peptides from tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
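Several high-throughput scoring algorithms that assess the agreement between a tandem mass spectrum and the theoretical spectra of database peptides are already used in shotgun proteomics. However, these methods produce a large number of incorrect peptide identifications. Here we present a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm takes into account the probabilities of different ion types appearing in a tandem mass spectrum, the number of enzymatic cleavage sites in the peptide, and the degree and pattern of matching between theoretical and experimental ions. Tests on a large tandem mass spectrometry dataset show that PepSearch, the software developed from this algorithm, identifies peptides more accurately than SEQUEST, currently the most widely used software. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.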

6.
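Mutagenesis is an important method for studying protein structure and function. In point mutation experiments, mutation sites are chosen largely at random; predicting whether a mutation will change protein function would greatly reduce trial and error. To this end, the authors designed a signal-processing-based prediction model for single-point substitution mutations that estimates the effect of every possible amino acid substitution at each position in the sequence. The model was validated on a dataset of more than 2,600 point mutations in 11 proteins from the Protein Mutant Database (PMD). The results show an accuracy of 81.2%, while the recommended substitutions account for only 3.1% of all possible substitution mutations. In in vitro site-directed mutagenesis, using the high-probability functional mutation sites recommended by this model should help raise the success rate of experiments. Because the model uses only the amino acid sequence, it also applies to proteins of unknown structure. However, owing to the lack of sufficient experimental mutation data, the model still needs further refinement and validation.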

7.
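Mass spectrometry-based proteomics is developing rapidly, and protein mass spectrometry data are growing exponentially. Finding identification methods that are fast, accurate, and reproducible is an important task in the field. Spectral library searching compares experimental spectra directly with real, previously observed spectra in a library, making full use of peak intensities, unusual fragmentation patterns, and other spectral features. This makes searches faster and more accurate, and it has become one of the mainstream identification methods in proteomics. This article introduces library-based strategies for identifying proteomics mass spectrometry data, describes in depth the two key steps, spectral library construction and spectral library searching, and discusses the progress and challenges of the library strategy.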
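
A minimal sketch of the library-search step: bin each spectrum, then rank library entries by the cosine (normalized dot product) of their intensity vectors. The 1 Da bin width, the function names, and the toy library entries are assumptions for illustration, not the scoring of any particular library-search tool.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins -> {bin_index: intensity}."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Normalized dot product of two binned spectra (0 = no shared bins, 1 = same shape)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query, library, bin_width=1.0):
    """Rank library entries {peptide: peak list} by cosine similarity to the query."""
    q = bin_spectrum(query, bin_width)
    scored = [(name, cosine(q, bin_spectrum(peaks, bin_width)))
              for name, peaks in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy library of two hypothetical entries and one query spectrum.
library = {
    "PEPTIDEA": [(100.2, 10.0), (200.4, 5.0)],
    "PEPTIDEB": [(150.1, 8.0), (300.3, 2.0)],
}
query = [(100.3, 9.0), (200.1, 6.0)]
print(search_library(query, library)[0][0])  # PEPTIDEA
```

Because the whole intensity pattern enters the score, this kind of comparison uses the peak-abundance information that theoretical spectra lack.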

8.
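Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) enables de novo sequencing of peptides, whose sequences can then be searched against databases to identify proteins. Using graph theory and alignment of experimental against theoretical spectra, we interpreted peptide tandem mass spectra de novo, obtained reliable peptide sequences, and applied them in database searches to identify the corresponding proteins. In addition, statistical methods were applied to SwissP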

9.
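De novo interpretation of tandem mass spectrometry data and database-search identification of proteins   Total citations: 3, self-citations: 0, citations by others: 3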
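Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) enables de novo sequencing of peptides, whose sequences can then be searched against databases to identify proteins. Using graph theory and alignment of experimental against theoretical spectra, we interpreted peptide tandem mass spectra de novo, obtained reliable peptide sequences, and applied them in database searches to identify the corresponding proteins. In addition, we used statistical methods to analyze the SwissProt and TrEMBL protein databases in detail. The results show that three tetrapeptides, two pentapeptides, or one octapeptide can generally identify a protein uniquely.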
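
The uniqueness question can be explored with a k-mer index over a protein database. The sketch below uses hypothetical function names and toy sequences rather than SwissProt or TrEMBL; it computes the fraction of k-mers that occur in only one protein and finds the proteins that contain every query peptide.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Map every length-k substring to the set of protein IDs that contain it."""
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].add(protein_id)
    return index


def unique_fraction(index):
    """Fraction of distinct k-mers that occur in exactly one protein."""
    if not index:
        return 0.0
    return sum(1 for ids in index.values() if len(ids) == 1) / len(index)


def proteins_containing_all(peptides, proteins):
    """IDs of proteins that contain every query peptide as an exact substring."""
    return {pid for pid, seq in proteins.items() if all(p in seq for p in peptides)}


proteins = {"P1": "MKTAYIAKQR", "P2": "MKTAYLLNQR", "P3": "GGSAYIAKQW"}
print(unique_fraction(build_kmer_index(proteins, 4)))       # 0.6875
print(proteins_containing_all(["MKTA", "YIAK"], proteins))  # {'P1'}
```

Each tetrapeptide alone is shared by two toy proteins, but requiring both narrows the hit to one, which mirrors the combinatorial argument in the abstract.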

10.
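Proteomic analysis of species with unsequenced genomes and limited protein sequence databases is currently one of the bottlenecks in proteomics research on non-model organisms. MS BLAST, a homology-based BLAST search method, is a recently developed search tool for identifying proteins from species with unsequenced genomes and has been applied successfully to protein identification in many such species. The SPITC chemically assisted method is an improved de novo mass spectrometric sequencing method established in our laboratory. We used MS BLAST to identify 19 goldfish embryo proteins that Mascot database searching had failed to identify: 12 were identified by MS BLAST searches of directly obtained sequences, and the other 7 by combining MS BLAST with SPITC derivatization. The results demonstrate that cross-species protein identification with MS BLAST is feasible and reliable, offering a new route for cross-species protein identification.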

11.
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, bioinformatics search tools consider up to one of these "missed cleavages", usually in an in silico tryptic digest of the proteome. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF dataset of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
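
For reference, an in silico tryptic digest with a missed-cleavage limit can be sketched as follows. It assumes the common simplified trypsin rule (cleave after K or R, not before P); the `never_cut` argument is a hypothetical stand-in for the "masking" described above, treating predicted missed sites as uncleavable, and does not reproduce the authors' program.

```python
def trypsin_sites(sequence):
    """0-based cut positions: after K or R, unless the next residue is P."""
    return [i + 1 for i in range(len(sequence) - 1)
            if sequence[i] in "KR" and sequence[i + 1] != "P"]


def digest(sequence, max_missed=1, never_cut=frozenset()):
    """In silico tryptic digest allowing up to `max_missed` missed cleavages.

    `never_cut` lists cut positions to drop before digestion, e.g. sites that a
    predictor flags as confidently missed (hypothetical interface).
    """
    if not sequence:
        return []
    sites = [s for s in trypsin_sites(sequence) if s not in never_cut]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Spanning from bounds[i] to bounds[j] skips j - i - 1 internal sites,
        # i.e. that many missed cleavages.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


print(digest("AKPGRMKLR", max_missed=0))
print(digest("AKPGRMKLR", max_missed=1))
```

Masking a site removes it from the cut list before the combinatorics start, so every extra masked site shrinks the set of candidate peptides the search engine must score.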

12.
Two-dimensional liquid chromatography coupled on-line with electrospray ionization tandem mass spectrometry (2D-LC-ESI-MS/MS) is a new platform for proteome analysis and identification. Peptides are separated by 2D-LC and then analyzed by tandem mass spectrometry, and the MS/MS data are searched against a database for protein identification. A single 2D-LC-ESI-MS/MS run yields not only structural information on peptides, directly from MS/MS, but also the retention times of peptides eluted from the LC. Information on the chromatographic behavior of peptides can assist protein identification on this new proteomics platform. The retention times of the matching peptides of each identified protein were predicted from the hydrophobic contribution of each amino acid in reversed-phase liquid chromatography (RPLC). With this strategy, proteins were identified using four types of information: peptide mass fingerprinting (PMF), sequence query, MS/MS ion search, and predicted retention time. This additional information from LC can assist protein identification at no extra experimental cost.
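
A minimal sketch of additive retention-time prediction used as a filter on candidate matches. Two assumptions are made here: Kyte-Doolittle hydropathy values stand in for the per-residue RPLC contributions (which the abstract does not list), and a linear calibration to observed retention times is fitted by least squares. The function names and the 2-minute window are also illustrative.

```python
# Kyte-Doolittle hydropathy values, used here only as an illustrative stand-in
# for fitted per-residue RPLC retention contributions.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(peptide):
    """Additive model: sum of per-residue contributions."""
    return sum(HYDROPATHY[aa] for aa in peptide)


def fit_calibration(peptides, observed_rt):
    """Ordinary least squares for RT = slope * hydrophobicity + intercept."""
    xs = [hydrophobicity(p) for p in peptides]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(observed_rt) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, observed_rt))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_consistent(peptide, observed_rt, slope, intercept, window=2.0):
    """Keep a candidate match only if its predicted RT lies within `window` minutes."""
    predicted = slope * hydrophobicity(peptide) + intercept
    return abs(predicted - observed_rt) <= window


# Usage with made-up calibration data (retention times in minutes).
calibration = ["AAA", "LLL", "DDD"]
slope, intercept = fit_calibration(calibration, [30.8, 42.8, -1.0])
print(rt_consistent("GG", 19.0, slope, intercept, window=1.0))
```

A candidate peptide whose sequence fits the MS/MS data but whose predicted retention time is far from the observed one can then be flagged, which is the kind of extra evidence the abstract describes.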

13.
To improve the utility of increasingly large numbers of available unannotated and initially poorly annotated genomic sequences for proteome analysis, we demonstrate that effective protein identification can be made on a large and unannotated genome. The strategy developed is to translate the unannotated genome sequence into amino acid sequence encoding putative proteins in all six reading frames, to identify peptides by tandem mass spectrometry (MS/MS), to localize them on the genome sequence, and to preliminarily annotate the protein via a similarity search by BLAST. These tasks have been optimized and automated. Optimization to obtain multiple peptide matches in effect extends the searchable region and results in more robust protein identification. The viability of this strategy is demonstrated with the identification of 223 cilia proteins in the unicellular eukaryotic model organism Tetrahymena thermophila, whose initial genomic sequence draft was released in November 2003. To the best of our knowledge, this is the first demonstration of large-scale protein identification based on such a large, unannotated genome. Of the 223 cilia proteins, 84 have no similarity to proteins in NCBI's nonredundant (nr) database. This methodology allows identifying the locations of the genes encoding these novel proteins, which is a necessary first step to downstream functional genomic experimentation.
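
The translate-and-localize steps can be sketched in Python as below, assuming the standard genetic code (NCBI translation table 1), exact peptide matches, and 0-based, end-exclusive coordinates on the forward strand. Function names and the frame labels (one common convention) are illustrative; a real pipeline would also split frames at stop codons and handle ambiguity codes, which are simply mapped to `X` here.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def locate_peptide(dna, peptide):
    """All exact hits as (frame, start, end) in 0-based forward-strand coordinates."""
    n = len(dna)
    hits = []
    for frame, protein in six_frames(dna).items():
        offset = int(frame[1]) - 1
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if frame[0] == "-":
                # Map reverse-complement coordinates back onto the forward strand.
                start, end = n - end, n - start
            hits.append((frame, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


genome = "ATGGCCAAATAA"
print(six_frames(genome)["+1"])       # MAK*
print(locate_peptide(genome, "FGH"))  # [('-1', 0, 9)]
```

Reporting forward-strand coordinates for every frame lets hits from both strands be placed on the same genome map before the BLAST-based annotation step.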

14.
Exploring the proteome of Plasmodium   Total citations: 2, self-citations: 0, citations by others: 2
With the entire genomic sequence of several species of Plasmodium soon to be available, researchers are now focusing on methods to study gene and protein expression at the whole organism level. Traditional methods of characterising and identifying large numbers of proteins from a complex protein mixture have relied predominantly on two-dimensional gel electrophoresis combined with N-terminal sequencing or mass spectrometry of individually prepared proteins. New proteomics methods are now available that are based on resolving small peptides derived from complex protein mixtures by high-resolution liquid chromatography and directly identifying them by tandem mass spectrometry (LC/LC/MS/MS) and sophisticated computer search algorithms against whole genome sequence databases. These newer proteomic methods have the potential to accelerate the reproducible identification of large numbers of proteins from various life cycle stages of Plasmodium and may help to better understand parasite biology and lead to the identification of new targets of vaccines and drugs.

15.
As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available databases for protein functional identification. When either no exact match is found or when the possible matches only partially account for molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the basic additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined using mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry are able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the database output relative to that achieved for independent searches using single sequence fragments.
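
One simple way to pool several short fragments from the same protein is to count, per candidate sequence, how many fragments fit with at most a few substitutions, and rank candidates by that count. This is a sketch under that assumption (ungapped matches, Hamming distance, hypothetical names and toy sequences), not the statistical method the authors developed.

```python
def best_hamming(fragment, protein):
    """Fewest substitutions needed to place `fragment` anywhere in `protein` (no gaps)."""
    k = len(fragment)
    if k > len(protein):
        return k  # cannot be placed; treat every position as a mismatch
    return min(
        sum(a != b for a, b in zip(fragment, protein[i:i + k]))
        for i in range(len(protein) - k + 1)
    )


def rank_by_congruence(fragments, proteins, max_mismatches=1):
    """Rank proteins by how many fragments fit with <= max_mismatches substitutions."""
    scores = {
        pid: sum(1 for f in fragments if best_hamming(f, seq) <= max_mismatches)
        for pid, seq in proteins.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: fragments sequenced from one protein of a related species.
proteins = {"homolog": "MSTEVLKAGHWQRPL", "unrelated": "MSTQQQQQQQQQQQQ"}
fragments = ["TEILK", "GHWQR", "RPL"]
print(rank_by_congruence(fragments, proteins))  # [('homolog', 3), ('unrelated', 0)]
```

No single short fragment is strong evidence on its own, but agreement of several fragments on the same database entry is, which is the congruence idea in the abstract.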

16.
Mutation-tolerant protein identification by mass spectrometry.   Total citations: 8, self-citations: 0, citations by others: 8
Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related spectra in large collections of uncharacterized spectra (e.g., from normal and diseased individuals) would be very valuable in functional proteomics. This problem is far from simple, since very similar peptides may have very different spectra. We introduce a new notion of spectral similarity that allows one to identify related spectra even if the corresponding peptides have multiple modifications/mutations. Based on this notion, we developed a new algorithm for mutation-tolerant database search as well as a method for cross-correlating related uncharacterized spectra.
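
One way to compare spectra of peptides that differ by a modification or point mutation is the spectral convolution: the multiset of pairwise peak-mass differences. Shared peaks pile up at a difference of 0, and peaks displaced by a modification pile up at its mass shift. The sketch below assumes centroided peak lists and rounds differences to 0.1 Da; the function names and toy spectra are illustrative.

```python
from collections import Counter


def spectral_convolution(spectrum1, spectrum2, precision=1):
    """Multiset of differences s2 - s1 over all peak pairs, rounded to `precision` decimals."""
    return Counter(round(b - a, precision) for a in spectrum1 for b in spectrum2)


def candidate_shifts(convolution, min_count=2):
    """Non-zero mass differences shared by at least `min_count` peak pairs."""
    return sorted(d for d, count in convolution.items() if d != 0 and count >= min_count)


# Toy peak lists: the first two peaks are shared, the last two peaks of
# spectrum2 are shifted by +16 Da (roughly the mass of an oxidation).
spectrum1 = [100.0, 180.0, 250.0, 420.0]
spectrum2 = [100.0, 180.0, 266.0, 436.0]
conv = spectral_convolution(spectrum1, spectrum2)
print(conv[0.0], candidate_shifts(conv))  # 2 [16.0]
```

A plain shared-peak count would see only the two unshifted peaks; the convolution also recovers the two displaced ones and the size of the shift.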

17.
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.
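
A core element shared by sequence database search tools is a precursor-mass filter that selects candidate peptides before fragment ions are scored. A minimal sketch, assuming monoisotopic masses, protonated precursors, and a 10 ppm tolerance (all parameters and names illustrative):

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(peptide):
    """Monoisotopic neutral mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]^z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def precursor_candidates(observed_mz, charge, peptides, tol_ppm=10.0):
    """Peptides whose neutral mass is within `tol_ppm` of the observed precursor."""
    neutral = observed_mz * charge - charge * PROTON
    return [p for p in peptides
            if abs(peptide_mass(p) - neutral) / neutral * 1e6 <= tol_ppm]


mz = precursor_mz("PEPTIDE", 2)
print(precursor_candidates(mz, 2, ["PEPTIDE", "GASP", "EPTPEDI"]))
```

The permuted sequence EPTPEDI passes the same filter as PEPTIDE, which is why fragment-ion scoring is still needed after this step; the tolerance setting directly controls how many such candidates reach the scorer.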

18.
A main objective of proteomics research is to systematically identify and quantify proteins in a given proteome (cells, subcellular fractions, protein complexes, tissues or body fluids). Protein labeling with isotope-coded affinity tags (ICAT) followed by tandem mass spectrometry allows sequence identification and accurate quantification of proteins in complex mixtures, and has been applied to the analysis of global protein expression changes, protein changes in subcellular fractions, components of protein complexes, protein secretion and body fluids. This protocol describes protein-sample labeling with ICAT reagents, chromatographic fractionation of the ICAT-labeled tryptic peptides, and protein identification and quantification using tandem mass spectrometry. The method is suitable for both large-scale analysis of complex samples including whole proteomes and small-scale analysis of subproteomes, and allows quantitative analysis of proteins, including those that are difficult to analyze by gel-based proteomics technology.
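
Quantification with isotope-coded tags rests on pairing each light-labeled peptide peak with its heavy partner and taking the intensity ratio. A minimal sketch: the light-to-heavy mass difference per label is passed in as a parameter rather than assumed for any specific reagent, and the function name and example values are made up.

```python
def icat_ratios(peaks, label_delta, charge, n_labels=1, tol=0.01):
    """Heavy/light intensity ratios for isotope-labeled peak pairs.

    peaks: list of (mz, intensity). A light peak at m/z x pairs with a heavy
    peak at x + n_labels * label_delta / charge (label_delta in Da).
    Returns a list of (light_mz, heavy_intensity / light_intensity).
    """
    shift = n_labels * label_delta / charge
    ratios = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if light_int > 0 and abs(heavy_mz - light_mz - shift) <= tol:
                ratios.append((light_mz, heavy_int / light_int))
    return ratios


# Made-up spectrum: a doubly charged pair 4 m/z apart (8 Da / 2) plus an unpaired peak.
spectrum = [(500.0, 100.0), (504.0, 50.0), (700.0, 30.0)]
print(icat_ratios(spectrum, label_delta=8.0, charge=2))  # [(500.0, 0.5)]
```

The spacing scales with the number of labeled residues and inversely with charge, so both must be known (or enumerated) before pairs can be assigned.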

19.
Identification of proteins by mass spectrometry (MS) is an essential step in proteomic studies and is typically accomplished by either peptide mass fingerprinting (PMF) or amino acid sequencing of the peptide. Although sequence information from MS/MS analysis can be used to validate PMF-based protein identification, it may not be practical when analyzing a large number of proteins and when high-throughput MS/MS instrumentation is not readily available. At present, a vast majority of proteomic studies employ PMF. However, there are huge disparities in criteria used to identify proteins using PMF. Therefore, to reduce incorrect protein identification using PMF, and also to increase confidence in PMF-based protein identification without accompanying MS/MS analysis, definitive guiding principles are essential. To this end, we propose a value-based scoring system that provides guidance on evaluating when PMF-based protein identification can be deemed sufficient without accompanying amino acid sequence data from MS/MS analysis.
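
For orientation, the basic matching step behind PMF can be sketched as: digest the candidate protein in silico, compute [M+H]+ masses, and count observed peaks matched within a tolerance, along with sequence coverage. This does not reproduce the authors' value-based scoring system; the 0.2 Da tolerance, the trypsin rule with no missed cleavages, and the function names are assumptions.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence):
    """(start, peptide) pairs for a full tryptic digest: cut after K/R, not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def pmf_match(sequence, peak_list, tol=0.2):
    """Count digest peptides matched by any observed peak, plus sequence coverage."""
    if not sequence:
        return 0, 0.0
    matched = [pep for _, pep in tryptic_peptides(sequence)
               if any(abs(mh_mass(pep) - peak) <= tol for peak in peak_list)]
    coverage = sum(len(pep) for pep in matched) / len(sequence)
    return len(matched), coverage


protein = "GASPKVVRDE"
peaks = [mh_mass("GASPK"), mh_mass("VVR") + 0.05, 1234.5]
print(pmf_match(protein, peaks))  # (2, 0.8)
```

Raw match counts like these are exactly where identification criteria diverge between studies, which is the gap a principled scoring system is meant to close.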
