Similar Literature
20 similar documents found.
1.
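De novo interpretation of tandem mass spectrometry data and protein identification by database searching   Total citations: 3; self-citations: 0; citations by others: 3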
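Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) enables de novo sequencing of peptides, and the resulting sequences can be searched against databases to identify proteins. Using graph theory and alignment between experimental and theoretical spectra, we interpreted de novo the peptide spectra obtained by tandem mass spectrometry, obtained reliable peptide sequences, and used them in database searches to identify the corresponding proteins. We also performed a detailed statistical analysis of the SwissProt and TrEMBL protein databases. The results show that three tetrapeptides, two pentapeptides, or one octapeptide are generally sufficient to uniquely determine a protein.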
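The basic idea of reading a peptide from mass differences between fragment peaks can be illustrated with a minimal Python sketch. It assumes an idealized, noise-free spectrum containing a complete ladder of singly charged b ions plus the precursor [M+H]+; the names `b_ions`, `precursor_mh` and `read_ladder` are hypothetical, and this is not the graph-theoretic algorithm of the paper, which must also cope with missing and noise peaks.

```python
# Monoisotopic residue masses in Da. I is omitted because it is isobaric with L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide):
    """Singly charged b1..b(n-1) m/z values (used here to simulate a spectrum)."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def precursor_mh(peptide):
    """Singly protonated precursor mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def read_ladder(b_peaks, precursor, tol=0.02):
    """Infer residues from gaps between consecutive ladder points.

    The ladder starts at a bare proton (b0) and ends at [M+H]+ - H2O (b_n),
    so the first and last residues are recovered as well. A gap that matches
    no single residue (e.g. caused by a missing peak) is reported as '?'.
    """
    points = [PROTON] + sorted(b_peaks) + [precursor - WATER]
    residues = []
    for lo, hi in zip(points, points[1:]):
        gap = hi - lo
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        residues.append(hits[0] if hits else "?")
    return "".join(residues)


peaks = b_ions("PEPTLDE")
print(read_ladder(peaks, precursor_mh("PEPTLDE")))                  # PEPTLDE
print(read_ladder(peaks[:2] + peaks[3:], precursor_mh("PEPTLDE")))  # PE?LDE (b3 missing)
```

The second call shows why a plain ladder reading is fragile: one missing peak merges two residues into an unexplained gap, which is the kind of ambiguity a spectrum graph resolves by also considering complementary ions and alternative paths.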

2.
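Protein identification is an indispensable step in proteomics research. Tandem mass spectrometry (MS/MS) can be used for de novo sequencing of peptides, and the resulting sequences can be searched against databases to identify proteins. Using graph theory and experimental-theoretical spectrum alignment, the peptide spectra obtained by tandem mass spectrometry were interpreted de novo, yielding reliable peptide sequences that were then used in database searches to identify the corresponding proteins. At the same time, statistical methods were also applied to SwissP…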

3.
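Tandem mass spectrometry-based proteomics produces massive amounts of mass spectrometry data, which are usually identified and analyzed with database search engines; the proteins actually present in the sample are then inferred from the peptide-spectrum matches (PSMs). In processing high-throughput proteomics data, reliable identification results are a prerequisite for subsequent analysis and application, which makes quality control of the identifications especially important. Quality control based on the target-decoy search strategy is currently the most widely used approach. This review first describes the workflow of database searching and quality control under the target-decoy strategy, then surveys quality control tools based on this strategy, discusses its shortcomings and ways to improve it, and finally summarizes the strategy and offers an outlook.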
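The basic target-decoy calculation can be sketched in a few lines of Python. This minimal version assumes a concatenated target-decoy search that keeps one best-scoring PSM per spectrum, and estimates the FDR above a score cutoff as (#decoy PSMs) / (#target PSMs), which is one common estimator; real tools differ in the estimator, tie handling and q-value computation. The function name `fdr_cutoff` is hypothetical.

```python
def fdr_cutoff(psms, max_fdr=0.01):
    """Return (score_cutoff, n_targets_accepted) for the largest PSM set whose
    estimated FDR = decoys / targets does not exceed max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    Returns (None, 0) if no cutoff satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best = (None, 0)
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Walking down the ranked list, keep the deepest point that still meets
        # the FDR limit (for distinct scores this equals accepting q <= max_fdr).
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


psms = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, True)]
print(fdr_cutoff(psms, max_fdr=0.5))  # (7, 3)
print(fdr_cutoff(psms, max_fdr=0.0))  # (9, 2)
```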

4.
A scoring method for identifying peptides from tandem mass spectra   Total citations: 1; self-citations: 0; citations by others: 1
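Several high-throughput scoring algorithms that assess the agreement between a tandem mass spectrum and the theoretical spectra of database peptides are currently used in shotgun proteomics. However, these methods produce a large number of false peptide identifications. Here we propose a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm jointly considers the probabilities of observing different ion types in a tandem mass spectrum, the number of enzymatic cleavage sites in the peptide, and the degree and pattern of matching between theoretical and experimental ions. Tests on a large tandem mass spectrometry dataset show that PepSearch, the software developed from this algorithm, achieves better identification accuracy than SEQUEST, currently the most widely used software. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.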
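To make "matching theoretical and experimental ions" concrete, here is a minimal shared-peak-count score in Python. It is not the PepSearch scoring function, which additionally models ion probabilities and cleavage sites; it assumes singly charged b and y ions only, monoisotopic masses and a fixed 0.02 Da tolerance, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (I omitted: isobaric with L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)          # b_i
        ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)  # y_(n-i)
    return ions


def shared_peak_score(peptide, peaks, tol=0.02):
    """Number of experimental peaks explained by at least one theoretical ion."""
    theoretical = fragment_ions(peptide)
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


spectrum = fragment_ions("PEPTLDE")  # a simulated, noise-free spectrum
for candidate in ("PEPTLDE", "PEPTLED"):
    print(candidate, shared_peak_score(candidate, spectrum))
# PEPTLDE 12
# PEPTLED 10
```

The swapped candidate still explains most peaks, which illustrates why simple peak counting yields many near-ties and false identifications, the problem the abstract sets out to address.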

5.
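Amino acid mutations can alter protein structure and function and affect biological processes. Shotgun proteomics based on tandem mass spectrometry is currently the main approach for large-scale proteomics research, but to increase sensitivity, existing identification pipelines for mass spectrometry data often deliberately minimize the amino acid mutation information in the search database. Mining amino acid mutations from the data has therefore become an important part of mass spectrometry data identification. Current tandem mass spectrometry methods for identifying amino acid mutations fall roughly into three categories: sequence database search methods, sequence tag search algorithms, and spectral library search algorithms. This review first describes these three kinds of algorithms in detail and analyzes the characteristics and limitations of each, and then presents the current state and future directions of amino acid mutation identification. As tandem mass spectrometry-based proteomics continues to develop, amino acid mutations in protein sequences will be resolved more fully, enabling deeper study of the changes in protein structure and function they cause and laying a foundation for revealing their biological significance.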
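A small illustration of why mutation-tolerant identification is ambiguous: a precursor mass shift can often be explained by several single amino acid substitutions at different positions. The Python sketch below (a hypothetical helper, assuming monoisotopic residue masses and exactly one substitution) lists every substitution consistent with an observed mass difference; the database, sequence-tag and spectral-library methods reviewed above then rely on fragment ions to decide among such candidates.

```python
# Monoisotopic residue masses in Da (I omitted: isobaric with L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(peptide):
    """Sum of residue masses (the constant H2O term cancels in differences)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def candidate_substitutions(peptide, delta, tol=0.01):
    """All (position, original, substitute) triples whose mass change matches delta."""
    hits = []
    for pos, original in enumerate(peptide):
        for substitute, mass in RESIDUE_MASS.items():
            if substitute != original and abs(mass - RESIDUE_MASS[original] - delta) <= tol:
                hits.append((pos, original, substitute))
    return hits


# Simulate a D->N variant of PEPTLDE (shift of about -0.984 Da).
observed_delta = peptide_mass("PEPTLNE") - peptide_mass("PEPTLDE")
print(candidate_substitutions("PEPTLDE", observed_delta))
# [(1, 'E', 'Q'), (5, 'D', 'N'), (6, 'E', 'Q')]
```

D->N and E->Q produce the same mass shift, so the precursor mass alone cannot place the mutation.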

6.
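Tandem mass spectrometry-based proteomics produces massive amounts of mass spectrometry data, which are usually identified and analyzed with database search engines; the proteins actually present in the sample are then inferred from the peptide-spectrum matches (PSMs). In processing high-throughput proteomics data, reliable identification results are a prerequisite for subsequent analysis and application, which makes quality control of the identifications especially important. Quality control based on the target-decoy search strategy is currently the most widely used approach. This review first describes the workflow of database searching and quality control under the target-decoy strategy, then surveys quality control tools based on this strategy, discusses its shortcomings and ways to improve it, and finally summarizes the strategy and offers an outlook.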

7.
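With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structures are basic tasks of genome annotation, yet past annotation methods relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated for annotation. In recent years, tandem mass spectrometry-based proteomics has matured and achieved high proteome coverage, making it possible to annotate genomes with tandem mass spectrometry data. Such data can, on the one hand, verify the expression of annotated genes and, on the other, correct existing gene annotations and reveal new genes, thereby re-annotating the genome sequence. This is the subject of proteogenomics, a rapidly advancing field. Using this approach to systematically annotate sequenced genomes has become an important complement to genome interpretation. This review summarizes the main research topics and methods of proteogenomics and discusses its future development.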
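A routine proteogenomics step is turning genome sequence into a searchable protein database, for example by six-frame translation, so that spectra can match peptides from unannotated genes. The Python sketch below is a minimal version assuming the standard genetic code; codons containing non-ACGT characters become 'X', and the function names are hypothetical. Real pipelines also handle alternative genetic codes, minimum ORF lengths and splice junctions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate complete codons starting at position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    dna = dna.upper()
    strands = (("+", dna), ("-", reverse_complement(dna)))
    return {f"{sign}{offset + 1}": translate(seq[offset:])
            for sign, seq in strands for offset in range(3)}


frames = six_frame("ATGGCCTAA")
print(frames["+1"], frames["-1"])  # MA* LGH
```

Splitting each frame on '*' yields stop-to-stop stretches that can then be digested in silico and added to the search database.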

8.
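Research in glycomics is an important driver of progress in the life sciences and biomedicine. Elucidating oligosaccharide structures is one of the key problems in glycomics. Owing to its high specificity and sensitivity, tandem mass spectrometry has become a widely used method for oligosaccharide structure elucidation. This review first outlines the background of oligosaccharide structure elucidation by tandem mass spectrometry; it then presents existing elucidation strategies and the classic methods based on each strategy, analyzing and discussing the principles and algorithms of each; finally, it summarizes the strengths and weaknesses of existing methods and offers an outlook for the field.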
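Before topology can be resolved from MS/MS fragments, the monosaccharide composition is often first constrained from the precursor mass. The Python sketch below enumerates compositions of four common residue types that match a neutral free-glycan mass. The residue masses are standard monoisotopic values, but the restriction to four residue types, the 0.02 Da tolerance and the helper name are illustrative assumptions, and a composition says nothing about linkage or branching.

```python
from itertools import product

# Monoisotopic masses of monosaccharide residues as incorporated in a chain (minus H2O).
RESIDUES = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791, "NeuAc": 291.09542}
WATER = 18.010565  # added once for the free (reducing-end) glycan


def compositions(neutral_mass, tol=0.02, max_each=10):
    """Every composition (up to max_each of each residue) within tol of neutral_mass."""
    names = list(RESIDUES)
    matches = []
    for counts in product(range(max_each + 1), repeat=len(names)):
        mass = WATER + sum(n * RESIDUES[name] for n, name in zip(counts, names))
        if abs(mass - neutral_mass) <= tol:
            matches.append(dict(zip(names, counts)))
    return matches


# Hex5HexNAc2 (the composition of Man5GlcNAc2) has a neutral mass near 1234.433 Da.
print(compositions(1234.433))
```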

9.
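Post-translational modifications of proteins are widespread in cells and play important regulatory roles in biological processes. Rapid advances in tandem mass spectrometry have provided a high-throughput, high-sensitivity, high-resolution platform for identifying proteins and their modifications. Because protein identification software often cannot accurately localize the site at which a modification occurs, dedicated data analysis algorithms are needed to relocalize modification sites. This article first introduces the origin and challenges of the protein modification localization problem, then analyzes the principles and characteristics of existing modification relocalization algorithms, and finally discusses their limitations and future directions.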
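The core of site relocalization can be illustrated by scoring each candidate site by how many observed fragment ions the corresponding isoform explains. This Python sketch assumes a single phosphorylation (+79.96633 Da) on S, T or Y, singly charged b/y ions and a simple matched-peak count; published relocalization algorithms use probabilistic scores built on site-determining ions, and the helper names here are hypothetical.

```python
# Monoisotopic residue masses in Da (I omitted: isobaric with L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
PHOSPHO = 79.96633  # monoisotopic mass of HPO3


def isoform_ions(peptide, site, delta):
    """Singly charged b/y ions with the modification mass placed on `site` (0-based)."""
    masses = [RESIDUE_MASS[aa] + (delta if i == site else 0.0) for i, aa in enumerate(peptide)]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return ions


def site_scores(peptide, peaks, delta=PHOSPHO, allowed="STY", tol=0.02):
    """Matched-peak count for every candidate site of the modification."""
    scores = {}
    for site, aa in enumerate(peptide):
        if aa in allowed:
            theoretical = isoform_ions(peptide, site, delta)
            scores[site] = sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)
    return scores


spectrum = isoform_ions("ASPTLSK", 3, PHOSPHO)  # simulate phosphorylation on T (position 3)
scores = site_scores("ASPTLSK", spectrum)
print(scores, "best site:", max(scores, key=scores.get))
```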

10.
Advances in quality control methods for shotgun protein identification   Total citations: 1; self-citations: 0; citations by others: 1
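Owing to its high reliability and efficiency, the shotgun tandem mass spectrometry strategy for protein identification is widely used in proteomics. This approach digests the protein mixture directly, uses peptides as the unit of identification, and then infers the proteins actually present in the sample. Because inferring peptides from mass spectra carries a certain false positive rate, and because digesting the protein mixture directly loses the information linking peptides to their proteins, some of the identified proteins are inevitably unreliable. Quality control of protein identification is therefore extremely important in proteomics. There are two main classes of quality control methods. The first is protein assembly from peptides: the most common approach, and the one shown to be most effective, applies the parsimony principle, which explains all identified peptides with the fewest proteins, and existing methods can be divided into Boolean and probabilistic types. The second is assessing the reliability of identified proteins, including the confidence of individual protein identifications and the false positive rate of the protein identification set as a whole. Building general probabilistic models that integrate the various kinds of prior information that can aid protein identification is the current trend in protein identification quality control.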
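The parsimony principle (explain all identified peptides with the fewest proteins) is a minimum set cover problem, which is NP-hard in general, so a greedy approximation is a common starting point. The Python sketch below is one such Boolean, greedy version with alphabetical tie-breaking as an arbitrary assumption; it ignores shared-peptide weighting and protein groups, and the function name is hypothetical.

```python
def parsimonious_proteins(protein_peptides):
    """Greedy, approximately minimal protein list explaining every identified peptide.

    protein_peptides: dict mapping protein name -> set of identified peptides.
    Ties are broken alphabetically by protein name.
    """
    unexplained = set().union(*protein_peptides.values())
    chosen = []
    while unexplained:
        # Protein covering the most still-unexplained peptides; max() keeps the
        # first maximum, so iterating in sorted order breaks ties alphabetically.
        best = max(sorted(protein_peptides), key=lambda p: len(protein_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_peptides[best]
    return chosen


evidence = {"P1": {"a", "b", "c"}, "P2": {"a", "b"}, "P3": {"c", "d"}, "P4": {"d"}}
print(parsimonious_proteins(evidence))  # ['P1', 'P3']
```

P2 is never reported because all of its peptides are already explained by P1; P3 and P4 tie for the last peptide, which is exactly where Boolean parsimony becomes arbitrary and probabilistic methods add value.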

11.
Mass spectrometry‐based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph‐based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.

12.
Proteomics is the study of proteins, their time- and location-dependent expression profiles, and their modifications and interactions. Mass spectrometry is useful for investigating many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are often unavailable or, even when available, do not readily contain some sequences. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing, and a selection of them is detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is then assessed and, finally, examples are used to outline possible future perspectives of the field.

13.
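Biological macromolecules are substances such as DNA, proteins, and polysaccharides present in living organisms, and they are essential to normal life processes. De novo synthesis and design techniques offer a high degree of freedom and simple precursors for the synthesis and structural design of biomacromolecules, allowing entirely new designs and efficient synthesis tailored to specific research goals. In recent years, de novo synthesis and design have begun to attract attention in areas such as synthetic genome construction, the design of novel protein drugs, and the synthesis of glycoconjugates. These techniques make it possible to produce newly designed DNA or novel gene expression products in a targeted way, as well as glycans or glycoconjugates with recognition functions; this will greatly advance the development of bioactive substances such as cytokine mimetics and gene therapy delivery vectors, and provide new solutions for building artificial biological systems and treating rare diseases. This article reviews the de novo synthesis and design of DNA, proteins, and polysaccharides, describes the related methods and applications, and finally analyzes the relationships among the three.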

14.
There are many computer programs that can match tandem mass spectra of peptides to database-derived sequences; however, situations can arise where mass spectral data cannot be correlated with any database sequence. In such cases, sequences can be automatically deduced de novo, without recourse to sequence databases, and the resulting peptide sequences can be used to perform homologous nonexact searches of sequence databases. This article describes in detail how to implement both a de novo sequencing program called “Lutefisk” and a version of FASTA that has been modified to account for sequence ambiguities inherent in tandem mass spectrometry data.
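The sequence ambiguities mentioned above can be made concrete: Leu and Ile have identical masses, and Lys and Gln differ by only about 0.036 Da, so a de novo tag should not be penalized for confusing them. The Python sketch below handles this with a simple, assumed normalization followed by exact substring search; it is not Lutefisk or the modified FASTA, which score inexact alignments, and the function names are hypothetical.

```python
def normalize(seq, merge_k_q=True):
    """Collapse residues MS/MS often cannot distinguish: I->L, and optionally Q->K."""
    seq = seq.upper().replace("I", "L")
    return seq.replace("Q", "K") if merge_k_q else seq


def find_tag(tag, proteins, merge_k_q=True):
    """Return (protein_name, 0-based start) for every ambiguity-tolerant exact hit."""
    query = normalize(tag, merge_k_q)
    hits = []
    for name, sequence in proteins.items():
        target = normalize(sequence, merge_k_q)
        start = target.find(query)
        while start != -1:
            hits.append((name, start))
            start = target.find(query, start + 1)
    return hits


database = {"prot1": "MKTAYIAKQR", "prot2": "GGLAKK"}
print(find_tag("YLAQ", database))                   # [('prot1', 4)]
print(find_tag("YLAQ", database, merge_k_q=False))  # []
```

Merging K and Q is only safe at low mass accuracy; with high-resolution data the 0.036 Da difference is usually resolvable, which is why the sketch makes it optional.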

15.
Sequence determination of peptides is a crucial step in mass spectrometry–based proteomics. Peptide sequences are determined either by database search or by de novo sequencing using tandem mass spectrometry. Determining all the theoretically expected peptide fragments and eliminating false discoveries remain challenges in proteomics. Developing standards for evaluating the performance of mass spectrometers and of the algorithms used for protein identification is important for proteomics studies. The current study focuses on these aspects using synthetic peptides. A total of 599 peptides with 1 or 2 missed cleavages were designed from an in silico tryptic digest of 199 human proteins, and synthetic peptides corresponding to these sequences were obtained. The peptides were mixed together, and the mixture was analyzed by liquid chromatography–electrospray ionization tandem mass spectrometry on a Q-Exactive HF mass spectrometer. The peptides and proteins were identified with the SEQUEST program, using the proteomics workflows. A total of 573 peptides representing 196 proteins could be identified, and a spectral library was created for these peptides. Among the analysis parameters, selecting “no enzyme” gave the maximum number of detected peptides compared with selecting trypsin. False discoveries could be identified. This study highlights the limitations of peptide detection, even with high-end mass spectrometers, and the need for more powerful algorithms along with tools to evaluate mass spectrometers and algorithms. The mass spectral data are available in ProteomeXchange with accession no. PXD017992.
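An in silico tryptic digest with missed cleavages, as used to design the peptides above, can be sketched in Python as follows. The sketch assumes the classic cleavage rule (after K or R, but not before P) and returns each peptide with its number of missed cleavages; the function name and `max_missed` parameter are illustrative, splitting on a zero-width pattern requires Python 3.7 or later, and real digestion tools add length and mass filters.

```python
import re


def tryptic_peptides(protein, max_missed=2):
    """Set of (peptide, missed_cleavages) pairs from an in silico trypsin digest."""
    # Zero-width split after K/R unless the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join up to max_missed + 1 consecutive fragments starting at fragment i.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.add(("".join(fragments[i:j + 1]), j - i))
    return peptides


for peptide, missed in sorted(tryptic_peptides("MAKPEPKLLRGG", max_missed=1)):
    print(peptide, missed)
```

The K followed by P in "MAKPEPK" is not cleaved, so that stretch stays in one fully tryptic peptide.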

16.
In proteomic studies, assigning protein identity from organisms whose genomes are yet to be completely sequenced remains a challenging task. For these organisms, protein identification is typically based on cross-species matching of amino acid sequence obtained from collision induced dissociation (CID) of peptides using mass spectrometry. The most direct approach, de novo sequencing, is slow and often difficult due to the complexity of the resultant CID spectra. For MALDI-MS, this problem has been addressed by using chemical derivatisation to direct peptide fragmentation, thereby simplifying CID spectra and facilitating de novo interpretation. In this study, milk whey proteins from the tammar wallaby (Macropus eugenii) were used to evaluate three chemical derivatisation methods compatible with MALDI MS/MS. These methods included (i) guanidination and sulfonation using chemically-assisted fragmentation (CAF), (ii) guanidination and sulfonation using 4-sulfophenyl isothiocyanate (SPITC) and (iii) derivatising the epsilon-amino group of lysine residues with Lys Tag 4H. Derivatisation with CAF and SPITC resulted in more protein identifications than Lys Tag 4H. Sulfonation using SPITC was the preferred method due to the low cost per experiment, the reactivity with both lysine- and arginine-terminated peptides, and the resultant simplified MS/MS spectra.
*Australian Peptide Conference Issue.
**This project was funded by an ARC Linkage grant to Deane supported by TGR Biosciences and facilitated by access to the Australian Proteome Analysis Facility established under the Australian Government’s Major National Research Facilities program.

17.

Background

Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequence of a peptide from this measurement data. Past de novo sequencing algorithms consider only the mass spectrum of the fragments when reconstructing a sequence.

Results

We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is not only in accordance with the experimental mass spectrum, but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model.

Conclusions

Based on an evaluation for two prediction models on experimental data from synthesized peptides we conclude that the identification rates are improved by exploiting the chromatographic information. In our evaluation, we compare our algorithms using the retention time information with algorithms using the same scoring model, but not the retention time.

18.

Background  

Often, high-quality MS/MS spectra of tryptic peptides do not match any database entry because genomes are only partially sequenced; protein identification therefore requires de novo peptide sequencing. To achieve protein identification for the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to protein digests of standard proteins using a quadrupole TOF (QStar Pulsar i).

19.
Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith–Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. 
Finally, we employed PepExplorer to perform a comprehensive proteomic assessment of Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.

Very often, groundbreaking discoveries with a significant impact on the biotechnological and biomedical fields have emerged from studying “non-canonical” organisms. For example, the study of Thermus aquaticus ultimately paved the way to modern molecular biology through the characterization of that organism's thermostable DNA polymerase (1). The characterization of the green fluorescent protein in Aequorea victoria led to a revolution in cellular biology and to a Nobel Prize being awarded to Osamu Shimomura, Martin Chalfie, and Roger Tsien. In Brazil, Sergio Ferreira's work on the venom of the Brazilian venomous snake Bothrops jararaca enabled the development of the first angiotensin-converting enzyme inhibitor drug (Captopril) for the treatment of hypertension (2).

In scenarios such as these, proteomics has the potential to provide a better understanding of the complexity of biological systems and of the process of evolution than the study of the genetic code alone. It enables the characterization of molecular processes according to their protein content, facilitating new discoveries. In proteomics, the most frequently used strategy for protein identification is so-called peptide spectrum matching (PSM), that is, the comparison of experimental mass spectra obtained by fragmenting peptides in a mass spectrometer with theoretical spectra generated from a sequence database. In general, each spectrum is assigned to the sequence whose theoretical spectrum yields the highest matching score according to some empirical or probabilistic function.
Examples of search engines adopting this strategy are SEQUEST (3), X!Tandem (4), and Mascot (5).

Back in the 1990s, establishing a cutoff score for confident identification relied mostly on user experience; for example, for a given charge state, Washburn et al. established cross-correlation and deltaCn cutoff values for SEQUEST in order to select a subset of confident identifications from LCQ data. This has since been termed “the Washburn criterion.” Target-decoy databases were subsequently implemented to allow more sophisticated refinements in filtering the data (6). In 2007, Elias and Gygi published a seminal paper on the target-decoy approach to shotgun proteomics (7) that ultimately established this approach as a standard and motivated the development of several statistical filters capable of converging to a list of confident identifications satisfying a user-specified false-discovery rate (FDR) with significantly more sensitivity than the conservative Washburn criterion. Such statistical filters include mixtures of probabilities (8), quadratic discriminant analysis (9), semi-supervised learning with support vector machines (10), and Bayesian logic (11) using a semi-labeled decoy analysis to account for overfitting (12). With so many advances, the PSM workflow has become the gold standard, as it is very sensitive and the least error-prone method when a database containing the corresponding proteins is available. This last requirement limits the application of PSM to organisms for which accurate sequence databases have been established: if a peptide's sequence is not contained within the sequence database, it cannot be identified via the PSM method. However, efforts in developing error-tolerant PSM approaches, such as the one implemented in Mascot, have made it possible to handle minor sequence modifications constrained by a simple set of rules.
Nevertheless, increasing the search space in the PSM approach decreases sensitivity (13).

Even though the concept of computer-aided de novo sequencing predates that of PSM (14), advances in the quality of mass spectrometry data and in the power of computer hardware have allowed it to reemerge at the heart of a highly active field. De novo sequencing is unbiased insofar as it is not constrained by a sequence database, and it is therefore complementary to PSM. However, it has remained the more error-prone of the two methods (15). The challenges of de novo sequencing notwithstanding, a few recent and notable improvements in computer-aided de novo analysis are PepNovo (16), which combines graph theory with machine learning; pNovo+ (17), which is optimized for high-resolution HCD data; NovoHMM (18), which relies on hidden Markov models for increased sensitivity; and PEAKS (19), which creates a spectrum graph model by performing dynamic programming on the mass values regardless of the presence of an observed fragment ion. By exploiting the complementarity of different fragmentation strategies (e.g. collision induced dissociation, electron transfer dissociation (20), and electron capture dissociation (21)), computational proteomics scientists have also demonstrated significant advances in de novo accuracy (22). In particular, the Bandeira group has continually pushed the limits and redefined what de novo sequencing can do by introducing the spectral networks paradigm (23–25). Briefly, this strategy assembles mass spectra into spectral pairs by joining overlapping spectra obtained from sample aliquots digested by different enzymes. As a consequence, it reduces noise and significantly improves protein coverage. Its latest version also combines data from different fragmentation techniques.

These algorithmic developments have improved de novo sequencing, shifting the bottleneck to post-sequencing data processing.
This is because the output of de novo software is a long list of highly similar full and partial peptide sequences and scores. An initial attempt to overcome these limitations was a tag approach that hybridized de novo sequencing and database searching: short sequence tags were derived from tandem mass spectra and used to search a sequence database (26). Subsequently, a modified version of the FASTA homology search tool was proposed for homology-driven proteomics (27). This strategy was implemented as part of the CIDentify tool, whose novelty was to account, in the alignment score, for limitations of mass spectrometry sequencing such as confusion between leucine and isoleucine or other combinations of amino acids having the same mass. The next steps were taken mainly by the Shevchenko group through the introduction of the MS-Blast algorithm, which relies on a different set of scores and uses the PAM30MS substitution matrix, itself tailored for mass-spectrometry-based proteomics (28, 29). For a complete review of de novo sequencing and homology searching, we suggest Ref. 30.

The current de novo post-processing paradigm presents several limitations similar to those of the early PSM workflow. Output files generally consist of a peptide list with corresponding scores, requiring an experienced user to assess which identifications are trustworthy. If the same peptide is analyzed on different mass spectrometers, different scores might be generated, which makes comparing data between groups challenging. In this sense, the problems resemble those encountered with the early Washburn criterion. Assembling protein information from a list of peptides is not a simple task, and it is usually not performed by state-of-the-art de novo tools.
Although there are excellent tools for doing this at the PSM level, similar tools for de novo sequencing are still lacking.

To tackle the aforementioned shortcomings, and in line with our strong interest in diversity-driven proteomics (29), we present a methodology for post-processing de novo sequencing data that infers protein identifications by statistically mapping de novo sequencing results to a protein sequence database. Our approach begins by aligning de novo sequences against those in a target-decoy database using Gotoh's version of the Smith–Waterman algorithm, based on affine gap scoring (31), for increased scalability. A radial basis function neural network (RBF-NN) then ranks the results according to alignment score, de novo score, precursor charge state, and peptide length. Finally, a heuristic method presents the protein identification results in a user-friendly, interactive report. The resulting algorithm was implemented as the software PepExplorer. In essence, its goal is somewhat similar to that of post-processing tools such as DTASelect (9), Percolator (10), and SEPro (11), but with an extra layer of complexity inherent in de novo sequencing. PepExplorer can handle the output of several widely adopted de novo tools, such as PepNovo, pNovo+, and PEAKS, and accepts a generic format so that results from a broader range of tools can be analyzed once they are run through simple parsers. Similarly, the software accepts a series of database formats as input. These features are not found in other tools. PepExplorer is freely available to the scientific community and is provided with the necessary documentation.

The effectiveness of our methodology has been verified in two distinct scenarios: the first a real but controlled experiment, and the second a comprehensive profiling of the plasma components of Bothrops jararaca, a venomous viper endemic to Brazil, southern Paraguay, and northern Argentina.
The purpose of the first scenario was to validate the effectiveness of the tool on a published Pyrococcus furiosus dataset (11). This organism is recognized by the proteomics community as well suited for benchmarking, because it allows rigorous testing of identification algorithms at the peptide and protein levels (32, 33). We modified the P. furiosus sequence database so that no peptides could be identified via the PSM approach or via Mod-A (34), another widely adopted error-tolerant search tool. We then found that we could recover protein identifications using our tool. The B. jararaca scenario allowed us to explore uncharted territory: this organism has an incomplete sequence database, so we had to rely on those of orthologous organisms. B. jararaca plasma was chosen because it is a main research model studied at the Laboratory of Toxinology (FIOCRUZ, Brazil), and several natural inhibitors of snake toxins have already been identified and characterized from this biological matrix (35–37).
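The alignment step described above can be illustrated with a minimal Smith–Waterman implementation using Gotoh's affine-gap recurrences. This Python sketch returns only the best local score; the match/mismatch scores and gap penalties are illustrative assumptions rather than PepExplorer's settings or the PAM30MS matrix, and the only mass-spectrometry-specific tweak is treating I and L as identical.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    a, b = a.upper().replace("I", "L"), b.upper().replace("I", "L")  # I/L are isobaric
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score of an alignment ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a residue of b opposite a gap
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a residue of a opposite a gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one from H.
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 = start a new local alignment
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("PEPTIDE", "PEPTLDE"))   # 14: I/L treated as identical
print(smith_waterman_affine("ACDEFG", "ACDWWEFG"))   # 8: ACD, a 2-residue gap (-4), EFG
```

Keeping three matrices is what makes the gap cost affine while the run time stays proportional to the product of the sequence lengths.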

20.
Motivation: The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Since the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues.
Results: We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications.
Availability: Programs implementing the methods described are available from the authors on request.
Contact: yyu{at}ncbi.nlm.nih.gov
Supplementary information: ftp://ftp.ncbi.nih.gov/pub/yyu/Proteomics/MSMS/RAId/MSMS_bioinfo_supp.pdf


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417