Similar literature
20 similar articles found (search time: 31 ms)
1.
The conventional approach in modern proteomics to identify proteins from limited information provided by molecular and fragment masses of their enzymatic degradation products carries an inherent risk of both false positive and false negative identifications. For reliable identification of even known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach is presented, where the above problems are alleviated by the use of the complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65-80%) in a typical trypsin-based proteomics experiment. A new linear de novo algorithm is developed that combines efficiency and speed, processing 1000 MS/MS data sets in 60 s on a conventional 3 GHz PC. More than 6% of all MS/MS data for doubly charged peptides yielded complete sequences, and another 13% gave nearly complete sequences with a maximum gap of two amino acid residues. These figures are comparable with the typical success rates (5-15%) of database identification. For peptides reliably found in the database (Mowse score ≥ 34), the agreement with de novo-derived full sequences was >95%. Full sequences were derived in 67% of the cases when full sequence information was present in MS/MS spectra. Thus the new de novo sequencing approach reached the same level of efficiency and reliability as conventional database-identification strategies.
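
To make the idea of complementary fragmentation concrete, the following minimal sketch computes singly charged b/y ions (typical of CAD) and c/z• ions (typical of ECD) for a peptide. It is not the linear de novo algorithm of the abstract; the function name `fragment_ions` and the mass constants (standard monoisotopic values, rounded) are assumptions made for this illustration.

```python
# Standard monoisotopic residue masses in Da (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276    # mass of a proton
WATER = 18.010565    # H2O
AMMONIA = 17.026549  # NH3
HYDROGEN = 1.007825  # H atom


def fragment_ions(peptide):
    """Return singly charged fragment m/z values for every backbone cleavage.

    Index i of each list corresponds to the cleavage after residue i + 1,
    so "b"[i] and "c"[i] hold the N-terminal fragment and "y"[i] and "z"[i]
    hold the complementary C-terminal fragment of the same cleavage.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {"b": [], "y": [], "c": [], "z": []}
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON
        y = sum(masses[i:]) + WATER + PROTON
        ions["b"].append(b)
        ions["y"].append(y)
        ions["c"].append(b + AMMONIA)             # c = b + NH3
        ions["z"].append(y - AMMONIA + HYDROGEN)  # z-dot = y - NH3 + H
    return ions
```

Because c ions sit a fixed 17.0265 Da above the corresponding b ions, a peak pair with that spacing across the CAD and ECD spectra of the same precursor points to an N-terminal fragment, which is one way the two techniques can disambiguate overlapping fragment masses.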

2.
Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search. PEAKS DB achieves significantly improved accuracy and sensitivity over two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to solve the issue of overconfidence that exists in the conventional target decoy method for certain types of peptide identification software.

3.
4.
5.
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (a spectral dictionary) are generated and then quickly matched against the database. We present a new MS-Dictionary algorithm for efficiently generating spectral dictionaries and demonstrate that MS-Dictionary can identify spectra that are missed in the database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translations of genomic sequences that may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
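
As an illustration of what searching a six-frame translation involves, here is a small sketch that translates a DNA string in all six reading frames using the standard genetic code and reports the frames that contain a given de novo peptide. It is not MS-Dictionary itself; the names `six_frame_translation` and `find_peptide` are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return six translations: forward offsets 0-2, then reverse-strand offsets 0-2."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


def find_peptide(dna, peptide):
    """Return the indices (0-5) of the frames whose translation contains the peptide."""
    return [
        frame
        for frame, protein in enumerate(six_frame_translation(dna))
        if peptide in protein
    ]
```

For example, `six_frame_translation("ATGGCC")` gives `["MA", "W", "G", "GH", "A", "P"]`, so a peptide read as "GH" would be located on the reverse strand (frame index 3).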

6.
Manual checking is commonly employed to validate the phosphopeptide identifications from database searching of tandem mass spectra. It is very time-consuming and labor-intensive as the number of phosphopeptide identifications increases greatly. In this study, a simple automatic validation approach was developed for phosphopeptide identification by combining consecutive stage mass spectrometry data and the target-decoy database searching strategy. Only phosphopeptides identified from both MS2 and its corresponding MS3 were accepted for further filtering, which greatly improved the reliability of phosphopeptide identification. Before database searching, the spectra were validated for charge state and neutral loss peak intensity, and the invalid MS2/MS3 spectra were then removed, which greatly reduced the database searching time. The sensitivity was significantly improved in the MS2/MS3 strategy, as the number of identified phosphopeptides was 2.5 times that obtained by the conventional filter-based MS2 approach. Because of the use of the target-decoy database, the false-discovery rate (FDR) of the identified phosphopeptides could be easily determined, and it was demonstrated that the determined FDR can precisely reflect the actual FDR without any manual validation stage.
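
The two filtering ideas in this abstract can be sketched as follows. The abstract does not give its FDR formula, so the commonly used estimate (decoy hits divided by target hits above a score threshold) is assumed here; the function names and the dictionary layout (precursor scan identifier mapped to peptide) are likewise hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate the FDR of peptide-spectrum matches scoring at or above a threshold.

    psms is a list of (score, is_decoy) tuples. The estimate is the assumed
    decoy/target ratio; it returns 0.0 when no target hit passes.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def consensus_peptides(ms2_ids, ms3_ids):
    """Keep only identifications where MS2 and its MS3 agree on the peptide.

    Both arguments map a precursor scan identifier to the identified peptide.
    """
    return {scan: pep for scan, pep in ms2_ids.items() if ms3_ids.get(scan) == pep}
```

In use, one would first reduce the identifications with `consensus_peptides` and then choose the lowest score threshold at which `estimate_fdr` stays below the desired level.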

7.
Many genomes of nonmodel organisms are yet to be annotated. Peptidomics research on those organisms therefore cannot adopt the commonly used database-driven identification strategy, leaving the more difficult de novo sequencing approach as the only alternative. The reported tool uses the growing resources of publicly or in-house available fragmentation spectra and sequences of (model) organisms to elucidate the identity of peptides in experimental spectra of nonannotated species. Clustering algorithms are implemented to infer the identity of unknown peak lists based on their publicly or in-house available counterparts. The reported tool, which we call the HomClus-tool, can cope with post-translational modifications and amino acid substitutions. We applied this tool to LC-MALDI-TOF/TOF datasets from two locusts (Schistocerca gregaria and Locusta migratoria). Compared to a Mascot database search (using the available UniProt-KB proteins of these species), we were able to double the number of peptide identifications for both spectral sets. Known bioactive peptides from Drosophila melanogaster (i.e., fragmentation spectra generated in silico from them) were used as a starting point for clustering, in an attempt to reveal their experimental homologous counterparts.

8.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, accounting for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
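
The "segment replaced by another segment with approximately the same mass" error can be illustrated with a short sketch. This is not the SPIDER matching algorithm; it only enumerates short segments whose total residue mass is within a tolerance of a given segment. The function names, the default tolerance of 0.02 Da and the maximum segment length are assumptions chosen for the example.

```python
from itertools import product

# Standard monoisotopic residue masses in Da (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment):
    """Sum of residue masses of a segment (no terminal groups)."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def mass_equivalent(seg_a, seg_b, tol=0.02):
    """True if the two segments differ in mass by at most tol Da."""
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol


def equivalent_segments(segment, max_len=2, tol=0.02):
    """List all other segments of up to max_len residues with nearly the same mass."""
    alphabet = sorted(RESIDUE_MASS)
    hits = []
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if candidate != segment and mass_equivalent(candidate, segment, tol):
                hits.append(candidate)
    return hits
```

For instance, `equivalent_segments("N")` returns `["GG"]`, and with a looser tolerance of 0.05 Da, typical of low-resolution instruments, K and Q also become indistinguishable. A tag-to-database matcher therefore has to treat such swaps as de novo errors rather than as true mutations.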

9.
Large-scale characterization of phosphoproteins requires highly specific methods for purification of phosphopeptides because of the low abundance of phosphoproteins and the substoichiometric nature of phosphorylation. Enrichment of phosphopeptides from complex peptide mixtures by IMAC is a popular way to perform phosphoproteome analysis. However, conventional IMAC adsorbents with iminodiacetic acid as the chelating group to immobilize Fe³⁺ lack sufficient specificity for efficient phosphoproteome analysis. Here we report a novel IMAC adsorbent prepared through Zr⁴⁺ chelation to phosphonate-modified poly(glycidyl methacrylate-co-ethylene dimethacrylate) polymer beads. The high specificity of the Zr⁴⁺-IMAC adsorbent was demonstrated by effectively enriching phosphopeptides from the digest mixture of a phosphoprotein (alpha- or beta-casein) and bovine serum albumin at a molar ratio of 1:100. The Zr⁴⁺-IMAC adsorbent was also successfully applied to the analysis of the mouse liver phosphoproteome, resulting in the identification of 153 phosphopeptides (163 phosphorylation sites) from 133 proteins in mouse liver lysate. Significantly more phosphopeptides were identified than by the conventional Fe³⁺-IMAC approach, indicating the excellent performance of the Zr⁴⁺-IMAC approach. The high specificity of the Zr⁴⁺-IMAC adsorbent was found to result mainly from the strong interaction between chelated Zr⁴⁺ and the phosphate group on phosphopeptides. Enrichment of phosphopeptides by Zr⁴⁺-IMAC provides a powerful approach for large-scale phosphoproteome analysis.

10.
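Proteomic analysis of species with unsequenced genomes and limited protein sequence databases is currently one of the bottlenecks in proteomics research on non-model organisms. MS BLAST, a homology-based BLAST search method, is a recently developed search tool for identifying proteins from organisms with unsequenced genomes and has been applied successfully to protein identification in many such species. The SPITC chemically assisted method is an improved de novo mass spectrometric sequencing method established in our laboratory. MS BLAST was used to identify 19 goldfish embryo proteins that could not be identified by a Mascot database search: 12 of these proteins were identified by MS BLAST searching after direct sequencing, and the other 7 were identified by combining MS BLAST with SPITC derivatization. The results show that cross-species protein identification with MS BLAST is feasible and reliable, and that it offers a new route to cross-species protein identification.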

11.
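Reversed-phase high-performance liquid chromatography (RP-HPLC) coupled with electrospray ionization tandem mass spectrometry (ESI-MS/MS) was used to directly separate and measure the tryptic digest of a model protein, bovine serum albumin (BSA). The resulting MS and MS/MS data for a series of BSA tryptic fragments were processed with analysis software and then used, under different processing conditions and parameter settings, to search online protein databases by three different methods. All three search methods identified the protein correctly. Peptide mass fingerprinting (PMF), which uses MS data, was the quickest and most convenient, but its results were easily affected by data processing and by the parameter settings used in the database search. Searching with uninterpreted MS/MS data (raw MS/MS data) gave unambiguous identifications over a fairly wide range of search parameters. Sequence query searching, which relies on de novo sequencing results, showed higher specificity: it gave stable and unambiguous identifications from fewer digest fragments and was only slightly affected by changes in search parameters. The effects of digestion conditions, data processing, and search parameter settings on protein identification are discussed in detail, providing reference material for data processing and database searching in proteomics research.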

12.
A novel hybrid methodology for the automated identification of peptides via de novo integer linear optimization, local database search, and tandem mass spectrometry is presented in this article. A modified version of the de novo identification algorithm PILOT is utilized to construct accurate de novo peptide sequences. A modified version of the local database search tool FASTA is used to query these de novo predictions against the nonredundant protein database to resolve any low-confidence amino acids in the candidate sequences. The computational burden associated with performing several alignments is alleviated with the use of distributive computing. Extensive computational studies are presented for this new hybrid methodology, as well as comparisons with MASCOT for a set of 38 quadrupole time-of-flight (QTOF) and 380 OrbiTrap tandem mass spectra. The results for our proposed hybrid method for the OrbiTrap spectra are also compared with a modified version of PepNovo, which was trained for use on high-precision tandem mass spectra, and the tag-based method InsPecT. The de novo sequences of PILOT and PepNovo are also searched against the nonredundant protein database using CIDentify to compare with the alignments achieved by our modifications of FASTA. The comparative studies demonstrate the excellent peptide identification accuracy gained from combining the strengths of our de novo method, which is based on integer linear optimization, and database-driven search methods.

13.
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a boost of two orders of magnitude in mass resolution compared to low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.

14.
LC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, combined with data processing and with stringent and sequence-similarity database searching tools, was employed in a layered manner to identify proteins in organisms with unsequenced genomes. Highly specific stringent searches (MASCOT) were applied as a first-layer screen to identify either known proteins (i.e., those present in a database) or unknown proteins sharing identical peptides with related database sequences. Once the confidently matched spectra were removed, the remainder was filtered against a nonannotated library of background spectra, which cleared the dataset of spectra of common protein and chemical contaminants. The cleaned spectral dataset was further subjected to rapid batch de novo interpretation by PepNovo software, followed by an MS BLAST sequence-similarity search that used multiple redundant and partially accurate candidate peptide sequences. Importantly, a single dataset was acquired at uncompromised sensitivity, with no need for manual selection of MS/MS spectra for subsequent de novo interpretation. This approach enabled completely automated identification of novel proteins that were otherwise missed by conventional database searches.

15.
Neuropeptidomics is used to characterize endogenous peptides in the brain of tree shrews (Tupaia belangeri). Tree shrews are small animals similar to rodents in size but close relatives of primates, and are excellent models for brain research. Currently, no complete proteome information is available for tree shrews that would allow a direct database search for neuropeptide identification. To improve the identification of neuropeptides in tree shrews, we developed an integrated mass spectrometry (MS)-based approach that combines data-dependent, directed, and targeted liquid chromatography (LC)-Fourier transform (FT)-tandem MS (MS/MS) analysis, database construction, de novo sequencing, precursor protein search, and homology analysis. Using this integrated approach, we identified 107 endogenous peptides that have sequences identical or similar to those from other mammalian species. High-accuracy MS and tandem MS information, together with BLAST analysis and chromatographic characteristics, was used to confirm the sequences of all the identified peptides. Interestingly, further sequence homology analysis demonstrated that tree shrew peptides have a significantly higher degree of homology to equivalent sequences in humans than to those in mice or rats, consistent with the close phylogenetic relationship between tree shrews and primates. Our results provide the first extensive characterization of the peptidome in tree shrews, which now permits characterization of the function of these peptides in the nervous and endocrine systems. Because the approach makes full use of the evolutionary conservation of neuropeptides and of the advantages of high-accuracy MS, it can be ported to the identification of neuropeptides in other species for which fully sequenced genomes or proteomes are not available.

16.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in an MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of ¹⁵N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in ¹⁵N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that cannot be identified by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low-abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in ¹⁵N-enriched media.
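
The nitrogen-number calculation described above can be sketched in a few lines. The constant is the standard mass difference between ¹⁵N and ¹⁴N; complete labeling and unmodified residues are assumed, and the function names are hypothetical rather than taken from the paper.

```python
N15_SHIFT = 0.997035  # mass difference between 15N and 14N in Da

# Nitrogen atoms in the side chain; every residue adds one backbone nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_count_from_shift(mz_light, mz_heavy, charge=1):
    """Infer the number of nitrogen atoms from an unlabeled/labeled peak pair."""
    return round((mz_heavy - mz_light) * charge / N15_SHIFT)


def nitrogen_count_from_sequence(peptide):
    """Count nitrogen atoms in an unmodified peptide or fragment sequence."""
    return sum(1 + SIDE_CHAIN_N.get(aa, 0) for aa in peptide)


def filter_candidates(candidates, observed_n):
    """Keep candidate sequences whose nitrogen count matches the observed one."""
    return [pep for pep in candidates if nitrogen_count_from_sequence(pep) == observed_n]
```

A doubly charged peak pair separated by about 1.994 m/z units therefore indicates four nitrogen atoms, which would retain a candidate such as GAK (4 N) and reject GAR (6 N) even when the m/z alone cannot separate them.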

17.
De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.

18.
Tandem mass spectrometry is a method of choice for rapid analysis in proteomics. Identification and characterization of proteins from organisms with sequenced genomes is today a routine procedure, and, with newly developed tools, identification of proteins from organisms with unsequenced genomes will become routine as well. Here, we report the use of isotopic labeling with electrospray ionisation (ESI)-tandem mass spectrometry for de novo sequencing in combination with database searches, taking advantage of different programs for the identification of fungal proteins. Using this approach we could identify the proteins of interest. Nevertheless, the identification of a novel protein responsible for the conversion of testosterone into androstenedione was still a difficult task, mostly due to the low homology of steroid-transforming enzymes, especially those from microorganisms. Protein p27 was identified as the vanillate O-demethylase oxidoreductase, p33 and p36 as two isoenzymes of malate dehydrogenase, and p45 as citrate synthase. Rechecking the sequences with additional programs showed that protein p36 has a higher local homology to the steroid-transforming enzyme than to malate dehydrogenase. Therefore, we assume that p36 is a pluripotent enzyme most probably responsible for the 17beta-hydroxysteroid dehydrogenase activity.

19.
MOTIVATION: Peptide sequencing from mass spectra uses two approaches: database searching and de novo sequencing. The database-searching approach is convenient; however, when the corresponding sequences are not included in the databases, exact identification is difficult. De novo sequencing, on the other hand, requires no preliminary information; however, it requires a continuous series of amino acid sequence peaks and the ability to differentiate these peaks. It is very difficult to obtain and differentiate the peaks of all amino acids in an actual spectrum. We propose a novel de novo sequencing approach that uses not only the mass-to-charge ratio but also the ion peak intensity and the amino acid cleavage intensity ratio (CIR). RESULTS: Our method compensates for any undetectable amino acid peak intervals by estimating the amino acid set and the probability of peak expression based on the amino acid CIR. It identifies sequences more accurately than existing methods, with which such sequences are usually difficult to determine.

20.
We present and evaluate a strategy for the mass spectrometric identification of proteins from organisms for which no genome sequence information is available that incorporates cross-species information from sequenced organisms. The presented method combines spectrum quality scoring, de novo sequencing and error tolerant BLAST searches and is designed to decrease input data complexity. Spectral quality scoring reduces the number of investigated mass spectra without a loss of information. Stringent quality-based selection and the combination of different de novo sequencing methods substantially increase the catalog of significant peptide alignments. The de novo sequences passing a reliability filter are subsequently submitted to error tolerant BLAST searches and MS-BLAST hits are validated by a sampling technique. With the described workflow, we identified up to 20% more groups of homologous proteins in proteome analyses with organisms whose genome is not sequenced than by state-of-the-art database searches in an Arabidopsis thaliana database. We consider the novel data analysis workflow an excellent screening method to identify those proteins that evade detection in proteomics experiments as a result of database constraints.

