Found 20 similar documents (search time: 328 ms)
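Database searching and identification of protein mass spectrometry data is an important step in proteomics research. Because the protein information in public databases is incomplete, some protein mass spectrometry data cannot be identified effectively. Building a dedicated mass spectrometry database from the EST sequences of related species can increase the probability of identifying unknown proteins. This paper describes the specific methods and steps for building a local Mascot database from EST sequences, which extends the range of protein mass spectrometry data that the Mascot search engine can identify, raises the probability of identifying unknown proteins at the database level, and provides a practical bioinformatics analysis technique for proteomics research.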

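Amino acid mutations can alter protein structure and function and affect the life processes of organisms. Shotgun proteomics based on tandem mass spectrometry is currently the main approach for large-scale proteomics research, but existing mass spectrometry identification pipelines often deliberately reduce the amino acid mutation information in the database in order to improve identification sensitivity. How to mine amino acid mutation information from the data has therefore become an important part of mass spectrometry data identification. Current tandem mass spectrometry methods for identifying amino acid mutations fall into three broad categories: sequence database searching, sequence tag searching, and spectral library searching. This paper first describes these three types of algorithm in detail and analyzes the characteristics and shortcomings of each, then reviews the current state of research and future directions for amino acid mutation identification. As tandem mass spectrometry-based proteomics continues to develop, amino acid mutation information in protein sequences will be resolved more fully, enabling deeper study of the structural and functional changes caused by amino acid mutations and laying a foundation for revealing their biological significance.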

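Proteomic analysis of species with unsequenced genomes and limited protein sequence databases is currently one of the bottlenecks in proteomics research on non-model organisms. The homology-based BLAST method (MS BLAST) is a recently developed search tool for identifying proteins from unsequenced genomes and has been applied successfully to protein identification in many such species. The SPITC chemically assisted method is an improved de novo mass spectrometric sequencing method established in our laboratory. MS BLAST was used to identify 19 goldfish embryo proteins that could not be identified by Mascot database searching: 12 were identified by MS BLAST searching after direct sequencing, and the other 7 were identified by combining MS BLAST with SPITC derivatization. The results show that cross-species protein identification with MS BLAST is feasible and reliable, and that it offers a new route to cross-species protein identification.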

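Peptide identification in proteomics has long relied mainly on mass spectrometry and database searching. With advances in mass spectrometer technology, massive amounts of mass spectrometry data have been acquired, providing a powerful data repository for large-scale protein identification and making mass spectrometry-based proteomics the mainstream. Traditional tandem mass spectrum database searching has many limitations when identifying post-translational modifications of peptides, and spectral network methods can compensate for these limitations to some extent. This paper systematically reviews the development history, theoretical research, and applications of spectral networks based on spectrum clustering and of spectral library searching, discusses the advantages of spectral network library methods for identifying post-translational modifications of peptides, and offers analysis and an outlook.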

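Research progress in de novo sequencing algorithms for tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet important part of its data processing. Because it does not require a protein sequence database, de novo sequencing can analyze tandem mass spectrometry data from new species or species with unsequenced genomes, an advantage that database searching cannot replace. This paper briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and its current research status, summarizes several typical computational strategies and analyzes the strengths and weaknesses of each, surveys commonly used de novo sequencing algorithms and software, describes the metrics and datasets commonly used for algorithm evaluation, outlines the characteristics of the various algorithms, and looks ahead to possible future research directions.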
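
The core idea behind de novo sequencing, reading residues from the mass differences between fragment peaks, can be illustrated with a minimal sketch. It assumes an idealized, noise-free ladder of prefix masses starting at 0.0 and a deliberately partial residue-mass table; the function name `read_ladder` and the tolerance are illustrative choices, not taken from any algorithm surveyed above. Real spectra require scoring, handling of missing peaks, and both ion series.

```python
# Monoisotopic residue masses (Da) for a subset of amino acids.
# "L" stands for both leucine and isoleucine, which have identical mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def read_ladder(prefix_masses, tol=0.01):
    """Read a sequence from an ideal ladder of prefix masses.

    Each gap between consecutive masses is matched against the residue
    table; gaps that match no residue within `tol` are reported as '?'.
    """
    sequence = []
    for low, high in zip(prefix_masses, prefix_masses[1:]):
        gap = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        sequence.append(matches[0] if matches else "?")
    return "".join(sequence)


if __name__ == "__main__":
    # Prefix masses built from the residues G, A, S.
    print(read_ladder([0.0, 57.02146, 128.05857, 215.09060]))
```

A gap that corresponds to no single residue is left as `?` rather than guessed, which mirrors why practical tools fall back on scoring many candidate paths.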

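De novo interpretation of tandem mass spectrometry data and protein identification by database searching   Total citations: 3, self-citations: 0, citations by others: 3
Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) can be used for de novo sequencing of peptides, and the resulting sequences can be searched against databases to identify proteins. Peptide spectra obtained by tandem mass spectrometry were interpreted de novo using graph theory and alignment of experimental spectra with theoretical spectra, yielding reliable peptide sequences that were then used in database searches to identify the corresponding proteins. In addition, the SwissProt and TrEMBL protein databases were analyzed in detail using statistical methods. The results show that three tetrapeptides, two pentapeptides, or one octapeptide are generally sufficient to identify a protein uniquely.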
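
The uniqueness claim above can be checked on any sequence collection by counting how many entries contain every short peptide in a set. The sketch below is only an illustration: the three sequences in `toy_db` are made up, and `proteins_matching_tags` is a hypothetical helper name, not the statistical procedure the authors applied to SwissProt and TrEMBL.

```python
def proteins_matching_tags(database, tags):
    """Return the IDs of proteins whose sequence contains every tag."""
    return [pid for pid, seq in database.items()
            if all(tag in seq for tag in tags)]


# Hypothetical sequences, for illustration only.
toy_db = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYLLKQRGGSFVKSHWWRQ",
    "P3": "MADEEKLPPGWEKRMSRSSGRV",
}

if __name__ == "__main__":
    # Adding tetrapeptides narrows the candidate list step by step.
    for tags in (["MKTA"], ["MKTA", "SFVK"], ["MKTA", "SFVK", "HFSR"]):
        print(tags, proteins_matching_tags(toy_db, tags))
```

In this toy case one or two tetrapeptides still leave two candidates, and the third resolves the ambiguity.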

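The rapid development of high-resolution mass spectrometry has allowed top-down proteomics to mature gradually. Studying the proteome at the intact-protein level can provide more precise and richer biological information, especially when multiple correlated post-translational modifications occur on a protein. Moreover, because of gene mutations, alternative RNA splicing, and the many post-translational modifications of proteins, a single gene often gives rise to multiple proteoforms, and accurate identification of these proteoforms also depends on top-down proteomics. Protein-level separation, mass spectrometry, and bioinformatics are the three most critical technologies for intact-protein identification. Efficient separation is the prerequisite for large-scale proteoform identification, effective mass spectrometric fragmentation is the core of reliable identification, and fast, accurate identification algorithms ensure the efficiency of data analysis. This paper summarizes these three technologies in detail, focusing on bioinformatics, including the computational problems of preprocessing intact-protein mass spectrometry data, database search identification, and localization of post-translational modifications.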

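Metaproteomics is a new discipline that uses mass spectrometry to collect protein information from natural microbial communities on a large scale and combines it with other omics data to study the genetic characteristics and biological functions of those communities. Data analysis in metaproteomics differs considerably from traditional proteomics methods, and new analytical approaches are urgently needed. Because metaproteomics studies highly complex microbial samples, a species database must be built that covers, as far as possible, the genomic information of the microorganisms in the sample. With such a large database, the computational resources consumed during analysis and the quality-control criteria for identification results must be considered, so parameters such as database size, database searching, and false-positive control need to be highly optimized. Because complex homologous protein sequences are widespread in metaproteomic data, the taxonomic information in the NCBI database must be fully exploited for matching, and filtering with the LCA algorithm is required before proteins can be assigned effectively to species. Focusing on metaproteomic data analysis, this paper reviews the current state of the field, the challenges it faces, and future research directions from the perspectives of metaproteomic database construction, protein grouping, and the discovery of biological meaning.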
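
The LCA (lowest common ancestor) step mentioned above can be sketched as follows: when a peptide matches proteins from several taxa, it is assigned to the deepest taxonomic node shared by all of them. The child-to-parent map below is a heavily simplified toy taxonomy that skips most ranks, and the function names are hypothetical; a real pipeline would load the NCBI taxonomy instead.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, following a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all given taxa."""
    paths = [lineage(t, parent) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking one lineage from leaf to root, the first shared node is the deepest.
    return next(node for node in paths[0] if node in shared)


# Toy taxonomy (assumption: simplified, most intermediate ranks omitted).
PARENT = {
    "E. coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "K. pneumoniae": "Klebsiella",
    "Klebsiella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "S. oneidensis": "Shewanella",
    "Shewanella": "Shewanellaceae",
    "Shewanellaceae": "Bacteria",
}

if __name__ == "__main__":
    print(lowest_common_ancestor(["E. coli", "K. pneumoniae"], PARENT))
    print(lowest_common_ancestor(["E. coli", "K. pneumoniae", "S. oneidensis"], PARENT))
```

The more distantly related the matched organisms, the higher the assigned node, so a peptide shared across families carries little species-level information.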

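Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) can be used for de novo sequencing of peptides, and the resulting sequences can be searched against databases to identify proteins. Peptide spectra obtained by tandem mass spectrometry were interpreted de novo using graph theory and alignment of experimental spectra with theoretical spectra, yielding reliable peptide sequences that were then used in database searches to identify the corresponding proteins. In addition, statistical methods were applied to SwissP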

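Proteomics systematically studies the proteome of an organism, especially proteins that are differentially expressed under particular physiological or pathological conditions; qualitative identification of protein sequences, post-translational modifications, and their positions helps us understand protein structure and function systematically. With the development and growing availability of soft ionization techniques (such as electrospray ionization) and mass spectrometers with high mass accuracy and high mass resolution (such as Orbitrap instruments), mass spectrometric characterization of intact proteins (so-called top-down proteomics) has become possible and increasingly popular; the corresponding database search engines and bioinformatics tools for protein identification have also made some progress. This paper gives an overview of the authors' in situ interpretation algorithm for protein electrospray mass spectra, "isotopic mass-to-charge ratio and envelope fingerprinting", and the intact-protein database search engine "Protein Goggle2.0" (http://proteingoggle.tongji.edu.cn/).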



Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.

Babnigg G, Giometti CS. Proteomics, 2006, 6(16): 4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
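
An identifier derived only from the primary sequence can be sketched as a checksum of the residue string. The abstract does not state how the identifier is computed; the sketch below assumes the construction commonly described for SEGUID (a SHA-1 digest of the upper-cased sequence, Base64-encoded with the trailing `=` padding removed), and the function name `seguid` is chosen here for illustration.

```python
import base64
import hashlib


def seguid(sequence):
    """Sequence-derived identifier (assumed construction, see lead-in).

    SHA-1 digest of the upper-cased sequence, Base64-encoded,
    with the trailing '=' padding stripped.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # The same sequence yields the same identifier regardless of letter case
    # or of the database annotation attached to it.
    print(seguid("MKTAYIAKQR") == seguid("mktayiakqr"))
```

Because the identifier depends on nothing but the residues, two databases that annotate the same sequence differently still map it to the same key.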

Analysing proteomic data   Total citations: 5, self-citations: 0, citations by others: 5
The rapid growth of proteomics has been made possible by the development of reproducible 2D gels and biological mass spectrometry. However, despite technical improvements, 2D gels are still less than perfectly reproducible, and gels have to be aligned so that spots for identical proteins appear in the same place. Gels can be warped by a variety of techniques to make them concordant. When gels are manipulated to improve registration, information is lost, so direct methods for gel registration that make use of all available data for spot matching are preferable to indirect ones. In order to identify proteins from gel spots, a property or combination of properties unique to that protein is required. These can then be used to search databases for possible matches. Molecular mass, pI, amino acid composition and short sequence tags can all be used in database searches. Currently the method of choice for protein identification is mass spectrometry. Proteins are eluted from the gels and cleaved with specific endoproteases to produce a series of peptides of different molecular mass. In peptide mass fingerprinting, the peptide profile of the unknown protein is compared with theoretical peptide libraries generated from sequences in the different databases. Tandem mass spectrometry (MS/MS) generates short amino acid sequence tags for the individual peptides. These partial sequences combined with the original peptide masses are then used for database searching, greatly improving specificity. Increasingly, protein identification from MS/MS data is being fully or partially automated. When working with organisms that do not have sequenced genomes (the case with most helminths), protein identification by database searching becomes problematic. A number of approaches to cross-species protein identification have been suggested, but if the organism being studied is only distantly related to any organism with a sequenced genome, then the likelihood of protein identification remains small. The dynamic nature of the proteome means that there really is no such thing as a single representative proteome, and a complete set of metadata (data about the data) is going to be required if the full potential of database mining is to be realised in the future.
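
The theoretical peptide library used in peptide mass fingerprinting can be sketched as an in silico digest followed by a mass calculation. The example assumes trypsin as the endoprotease (cleavage after K or R, except before P), no missed cleavages, and unmodified monoisotopic masses; the function names are illustrative and not taken from any search engine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed, protein, tol=0.05):
    """Peptides of `protein` whose theoretical mass lies within `tol` Da of a peak."""
    return [pep for pep in tryptic_digest(protein)
            if any(abs(peptide_mass(pep) - peak) <= tol for peak in observed)]


if __name__ == "__main__":
    for pep in tryptic_digest("MKWVTFRPSK"):
        print(pep, round(peptide_mass(pep), 4))
```

A database entry is then ranked by how many observed peaks its theoretical peptides explain, which is why the method fails when the protein or a close homologue is absent from the database.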

Hernandez P, Gras R, Frey J, Appel RD. Proteomics, 2003, 3(6): 870-878
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they perform poorly when run on large databases (such as genomic databases), or with low-quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences drawn from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called the Ant Colony Optimization algorithm.

Separation of proteins by two-dimensional gel electrophoresis (2-DE) coupled with identification of proteins through peptide mass fingerprinting (PMF) by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a widely used technique for proteomic analysis. This approach relies, however, on the presence of the proteins studied in publicly accessible protein databases or the availability of annotated genome sequences of an organism. In this work, we investigated the reliability of using raw genome sequences for identifying proteins by PMF without the need for additional information such as amino acid sequences. The method is demonstrated for proteomic analysis of Klebsiella pneumoniae grown anaerobically on glycerol. Of 197 spots excised from 2-DE gels and submitted for mass spectrometric analysis, 164 spots were clearly identified as 122 individual proteins. 95% of the 164 spots can be successfully identified merely by using peptide mass fingerprints and a strain-specific protein database (ProtKpn) constructed from the raw genome sequences of K. pneumoniae. Cross-species protein searching in the public databases mainly resulted in the identification of 57% of the 66 highly expressed protein spots in comparison to 97% by using the ProtKpn database. 10 dha regulon related proteins that are essential for the initial enzymatic steps of anaerobic glycerol metabolism were successfully identified using the ProtKpn database, whereas none of them could be identified by cross-species searching. In conclusion, the use of a strain-specific protein database constructed from raw genome sequences makes it possible to reliably identify most of the proteins from 2-DE analysis simply through peptide mass fingerprinting.

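Proteome research based on tandem mass spectrometry generates massive amounts of mass spectrometry data, which are usually identified with database search engines; the proteins actually present in the sample are then inferred from peptide-spectrum matches (PSMs). For high-throughput proteomic data processing, credible identification results are a prerequisite for downstream analysis and application, so quality control of identification results is particularly important, and quality control based on the target-decoy search strategy is currently the most widely used approach. This paper first describes the workflow for database searching and quality control with the target-decoy strategy, then reviews quality-control tools based on it, points out its shortcomings and ways to improve it, and finally summarizes the strategy and offers an outlook.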
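
The quality-control step of a target-decoy search can be sketched as a false discovery rate (FDR) estimate over scored PSMs. The sketch assumes a concatenated target-decoy search and uses one common estimator, the number of decoy hits divided by the number of target hits above a score threshold; other estimators exist, and the function names and toy scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among PSMs scoring at least `threshold`.

    psms: list of (score, is_decoy) pairs from a concatenated search.
    Estimator (assumption): decoy hits / target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(psms, max_fdr=0.01):
    """Smallest observed score whose estimated FDR does not exceed `max_fdr`."""
    for score in sorted({s for s, _ in psms}):
        if fdr_at_threshold(psms, score) <= max_fdr:
            return score
    return None


# Toy PSM list (hypothetical scores): True marks a decoy match.
toy_psms = [(90, False), (85, False), (80, False), (70, True),
            (65, False), (60, True), (55, False)]

if __name__ == "__main__":
    print(fdr_at_threshold(toy_psms, 65))
    print(lowest_threshold_for_fdr(toy_psms, max_fdr=0.01))
```

Scanning thresholds from the lowest score upward keeps as many identifications as the chosen error rate allows.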

Lack of genomic sequence data and the relatively high cost of tandem mass spectrometry have hampered proteomic investigations into helminths, such as resolving the mechanism underpinning globally reported anthelmintic resistance. Whilst detailed mechanisms of resistance remain unknown for the majority of drug-parasite interactions, gene mutations and changes in gene and protein expression are proposed key aspects of resistance. Comparative proteomic analysis of drug-resistant and -susceptible nematodes may reveal protein profiles reflecting drug-related phenotypes. Using the gastrointestinal nematode Haemonchus contortus as a case study, we report the application of freely available expressed sequence tag (EST) datasets to support proteomic studies in unsequenced nematodes. EST datasets were translated to theoretical protein sequences to generate a searchable database. In conjunction with matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF-MS), Peptide Mass Fingerprint (PMF) searching of databases enabled a cost-effective protein identification strategy. The effectiveness of this approach was verified in comparison with MS/MS de novo sequencing with searching of the same EST protein database and subsequent searches of the NCBInr protein database using the Basic Local Alignment Search Tool (BLAST) to provide protein annotation. Of 100 proteins from 2-DE gel spots, 62 were identified by MALDI-TOF-MS and PMF searching of the EST database. Twenty randomly selected spots were analysed by electrospray MS/MS and MASCOT Ion Searches of the same database. The resulting sequences were subjected to BLAST searches of the NCBI protein database to provide annotation of the proteins and confirm concordance in protein identity from both approaches. Further confirmation of protein identifications from the MS/MS data was obtained by de novo sequencing of peptides, followed by FASTS algorithm searches of the EST putative protein database. This study demonstrates the cost-effective use of available EST databases and inexpensive, accessible MALDI-TOF MS in conjunction with PMF for reliable protein identification in unsequenced organisms.
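
Translating EST datasets into theoretical protein sequences can be sketched as a six-frame translation. The abstract does not say which frames or which genetic code were used; the sketch assumes all six reading frames and the standard genetic code, and the function and label names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(est):
    """Return {frame label: protein string} for frames +1..+3 and -1..-3."""
    est = est.upper()
    reverse = est.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(est[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def to_fasta(est_id, est):
    """FASTA text with one entry per frame; '*' marks stop codons."""
    return "\n".join(f">{est_id}_frame{label}\n{protein}"
                     for label, protein in six_frame_translations(est).items())


if __name__ == "__main__":
    print(to_fasta("EST0001", "ATGGCCTAA"))
```

In practice the translated frames are usually split at stop codons and short fragments are discarded before the file is handed to a search engine; that filtering is left out here.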

Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
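
The recommended database layout, a reshuffled database appended to the forward one, can be sketched as follows. The `DECOY_` prefix, the fixed random seed, and the function names are assumptions made for the example; the abstract does not specify how decoy entries are labeled or generated in detail.

```python
import random


def make_decoy(sequence, mode="shuffle", rng=None):
    """Return a reversed or reshuffled copy of a protein sequence."""
    if mode == "reverse":
        return sequence[::-1]
    residues = list(sequence)
    (rng or random).shuffle(residues)
    return "".join(residues)


def append_decoys(targets, mode="shuffle", seed=0, prefix="DECOY_"):
    """Combined database: every forward entry plus one decoy per entry."""
    rng = random.Random(seed)  # fixed seed so the decoy database is reproducible
    combined = dict(targets)
    for protein_id, sequence in targets.items():
        combined[prefix + protein_id] = make_decoy(sequence, mode, rng)
    return combined


if __name__ == "__main__":
    forward = {"P1": "MKTAYIAKQR", "P2": "MADEEKLPPGWEK"}  # hypothetical entries
    for protein_id, sequence in append_decoys(forward).items():
        print(protein_id, sequence)
```

Shuffling preserves each entry's length and amino acid composition, so decoy hits approximate how often a random sequence of the same kind would match by chance.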

Ishino Y, Okada H, Ikeuchi M, Taniguchi H. Proteomics, 2007, 7(22): 4053-4065
MS combined with database searching has become the preferred method for identifying proteins present in cell or tissue samples. The technique enables us to execute large-scale proteome analyses of species whose genomes have already been sequenced. Searching mass spectrometric data against protein databases composed of annotated genes has been widely conducted. However, there are some issues with this technique: wrong annotations in protein databases reduce the accuracy of protein identification, and only proteins that have already been annotated can be identified. We propose a new framework that can detect correct ORFs by integrating MS/MS proteomic data mapping with a knowledge-based system for translation initiation sites. This technique can correct predicted coding sequences and makes it possible to identify novel genes. We have developed a computational system that first conducts probabilistic peptide matching against all possible translational frames using MS/MS data, then searches for discriminative DNA patterns around the detected peptides, and lastly integrates the facts using empirical knowledge stored in knowledge bases to obtain correct ORFs. We used the photosynthetic bacterium Synechocystis sp. PCC6803 as a sample prokaryote, resulting in the finding of 14 N-terminus annotation errors and several new candidate genes.
