Similar Articles
20 similar articles found.
1.
Proteomics is the study of proteins, their time- and location-dependent expression profiles, and their modifications and interactions. Mass spectrometry is useful for investigating many of the questions asked in proteomics. Database search methods are typically employed to identify proteins in complex mixtures. However, databases are often unavailable, or, even when they are available, some sequences are not readily found in them. To overcome this problem, de novo sequencing can be used to assign a peptide sequence directly to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing, and a selection of them is detailed in this article. Although the field has not agreed on a standard accuracy measure, relative algorithm performance is discussed. The current state of de novo sequencing is then assessed, and, finally, examples are used to outline possible future perspectives for the field.

3.
De novo sequencing is an important task in proteomics for identifying novel peptide sequences. Traditionally, only one MS/MS spectrum is used to sequence a peptide; however, using multiple spectra of the same peptide acquired with different types of fragmentation has the potential to significantly increase the accuracy and practicality of de novo sequencing. Research into the use of multiple spectra is at a nascent stage. We propose a general framework for combining two different types of MS/MS data. Experiments demonstrate that our method significantly improves the de novo sequencing results of existing software.

4.
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and Orbitrap instruments marks a transition into the era of precision mass spectrometry, providing a two-orders-of-magnitude boost in mass resolution compared with low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences from the analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups rather than on comparisons of spectra to a database. With direct sequence lookup, it is possible not only to search a database very efficiently but also to use the database in novel ways, such as searching for products of alternative splicing or of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.
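Direct sequence lookup can be sketched as follows, assuming the de novo tag is a plain amino acid string and the protein database fits in memory. The k-mer index, the function names `build_kmer_index` and `lookup_tag`, and the collapsing of isoleucine to leucine are illustrative choices, not the authors' software.

```python
from collections import defaultdict


def normalize(seq: str) -> str:
    # Leucine and isoleucine have identical masses, so de novo tags cannot
    # distinguish them; collapse both to "L" before indexing or lookup.
    return seq.upper().replace("I", "L")


def build_kmer_index(proteins: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (protein id, offset) pairs where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for pid, seq in proteins.items():
        s = normalize(seq)
        for i in range(len(s) - k + 1):
            index[s[i:i + k]].append((pid, i))
    return index


def lookup_tag(tag: str, proteins: dict[str, str], index, k: int) -> list[tuple[str, int]]:
    """Return (protein id, offset) pairs where the whole tag occurs."""
    t = normalize(tag)
    if len(t) < k:
        raise ValueError("tag must be at least k residues long")
    hits = []
    # Seed with the tag's first k-mer, then verify the rest of the tag.
    for pid, i in index.get(t[:k], []):
        if normalize(proteins[pid])[i:i + len(t)] == t:
            hits.append((pid, i))
    return hits
```

For example, with `proteins = {"P1": "MKTAYIAKQR"}` and `k = 3`, `lookup_tag("TAYL", ...)` finds the tag at offset 2 of P1 because I and L are treated as identical. The index is built once and reused for every tag.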

5.
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (a spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that MS-Dictionary can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomic searches of six-frame translations of genomic sequences, which may be prohibitively time-consuming for existing database search approaches. We show that such searches make it possible to correct sequencing errors and to find programmed frameshifts.
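Proteogenomic searches of this kind start from a six-frame translation of the genomic sequence. A minimal sketch of that step, using the standard genetic code and a naive substring check; the helper names are hypothetical, and MS-Dictionary itself is not reproduced here:

```python
# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + m]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for m, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from position 0; '*' marks stop codons, 'X' unknown codons."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> list[str]:
    """Frames 0-2 on the forward strand, then frames 0-2 on the reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rev) for f in range(3)]


def find_peptide(peptide: str, dna: str) -> list[int]:
    """Indices (0-5) of the frames whose translation contains the peptide."""
    return [n for n, frame in enumerate(six_frame_translation(dna)) if peptide in frame]
```

For `"ATGGCCTAA"` the six frames are `["MA*", "WP", "GL", "LGH", "*A", "RP"]`, so `find_peptide("LGH", ...)` reports frame 3, the first reverse-strand frame.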

6.
Although the advent of large-scale genomic sequencing has greatly simplified the task of determining the primary structures of peptides and proteins, the genomic sequences of many organisms are still unknown. Even for those that are known, modifications such as post-translational events may prevent the identification of all or part of the protein sequence. Thus, complete characterization of the protein primary structure often requires determination of the protein sequence by mass spectrometry with minimal assistance from genomic data, that is, de novo protein sequencing. This task has been facilitated by technical developments during the past few years: 'soft' ionization techniques, new forms of chemical modification (derivatization), new types of mass spectrometers, and improved software.

7.
To identify novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. These tags are then matched against the sequences in a protein database, allowing for amino acid mutations. If de novo sequencing gives correct tags, homologs of the proteins can be identified in this way, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment of approximately the same mass. We developed a new, efficient algorithm for matching sequence tags that contain errors to database sequences for protein and peptide identification. A software package, SPIDER, was developed and made freely available on the Internet for public use. This paper describes the algorithms and features of the SPIDER software.
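The dominant error type described here, one segment swapped for another of nearly equal mass, is easy to see numerically. A minimal sketch, assuming standard monoisotopic residue masses and a hypothetical 0.02 Da tolerance; it only tests mass equivalence and is not SPIDER's matching algorithm:

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment: str) -> float:
    """Sum of residue masses of an amino acid segment."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def mass_equivalent(seg_a: str, seg_b: str, tol: float = 0.02) -> bool:
    """True if two segments differ in mass by at most tol Da."""
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol
```

GG and N both weigh about 114.0429 Da, and GA and Q about 128.0586 Da, so a de novo tool can easily report one in place of the other. Q and K differ by about 0.036 Da and are separated at the default tolerance but not at 0.05 Da.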

8.
Tandem mass spectrometry has emerged as one of the most powerful high-throughput techniques for protein identification. Tandem mass spectrometry selects peptides of interest, fragments them into N-terminal and C-terminal ions, and measures the mass-to-charge ratios of these ions. The de novo peptide sequencing problem is to derive peptide sequences from tandem mass spectral data with k ion peaks without searching protein databases. By transforming the spectral data into a matrix spectrum graph G = (V, E), where |V| = O(k^2) and |E| = O(k^3), we give the first polynomial-time suboptimal algorithm, which finds all suboptimal solutions (peptides) in O(p|E|) time, where p is the number of solutions. The algorithm has been implemented and tested on experimental data. The program is available at http://hto-c.usc.edu:8000/msms/menu/denovo.htm.
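The spectrum-graph idea can be illustrated with a much simpler variant than the matrix spectrum graph described here. This sketch assumes the peaks have already been converted to candidate prefix residue masses (b-ion interpretations only), joins nodes that differ by one residue mass, and returns a single longest path by dynamic programming; it does not enumerate suboptimal solutions. Function names and the 0.02 Da tolerance are hypothetical.

```python
from typing import Optional

# Monoisotopic residue masses (Da). Isoleucine is omitted because it has the
# same mass as leucine; an "L" in the output stands for L or I.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta: float, tol: float) -> Optional[str]:
    """Return the residue whose mass is closest to delta, if within tol."""
    best_aa, best_err = None, tol
    for aa, mass in RESIDUE_MASS.items():
        err = abs(delta - mass)
        if err <= best_err:
            best_aa, best_err = aa, err
    return best_aa


def spectrum_graph_sequence(prefix_masses: list[float], total_mass: float,
                            tol: float = 0.02) -> Optional[str]:
    """Longest path from mass 0 to total_mass in a simple spectrum graph.

    Nodes are candidate prefix residue masses; an edge joins two nodes whose
    mass difference matches one amino acid residue within tol.
    """
    nodes = sorted({0.0, total_mass, *(m for m in prefix_masses if 0 < m < total_mass)})
    # best[j] = (edges on the best path to node j, predecessor index, residue)
    best: list[Optional[tuple[int, int, str]]] = [None] * len(nodes)
    best[0] = (0, -1, "")
    for j in range(1, len(nodes)):
        for i in range(j):
            if best[i] is None:
                continue
            aa = match_residue(nodes[j] - nodes[i], tol)
            if aa is not None and (best[j] is None or best[i][0] + 1 > best[j][0]):
                best[j] = (best[i][0] + 1, i, aa)
    if best[-1] is None:
        return None  # no complete path: a peak is missing or the masses are off
    residues, j = [], len(nodes) - 1
    while j > 0:
        _, i, aa = best[j]
        residues.append(aa)
        j = i
    return "".join(reversed(residues))
```

With prefix masses for GASP (57.02146, 128.05857, 215.0906) and a total residue mass of 312.14336, the function returns "GASP". If the 57.02 peak is missing, it returns "QSP", because G + A and Q have nearly the same mass.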

9.
10.
Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of database search identification software are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating de novo sequencing results into the database search. PEAKS DB achieves significantly better accuracy and sensitivity than two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to address the overconfidence that the conventional target-decoy method exhibits for certain types of peptide identification software.
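For reference, the conventional target-decoy estimate mentioned here can be sketched as follows: above a score threshold, the number of decoy matches estimates the number of false target matches. The function name and input layout are hypothetical, ties in score are not handled specially, and decoy fusion itself is not shown.

```python
def target_decoy_threshold(psms: list[tuple[float, bool]], fdr: float = 0.01):
    """Lowest score threshold whose estimated FDR (decoys / targets) is <= fdr.

    psms: (score, is_decoy) pairs from a combined target + decoy search.
    Returns None if no threshold satisfies the FDR limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all matches scoring at or above this one.
        if targets and decoys / targets <= fdr:
            threshold = score
    return threshold
```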

11.
Because of the intrinsic physical properties of singly or doubly charged ions, MALDI-based CID of these peptide precursor ions tends to be incomplete, leaving a large number of MS/MS spectra unassigned or ambiguously identified. Consequently, the high-throughput capability of TOF/TOF may not be fully exploited. Here, we describe a novel method for de novo sequence assignment of MALDI TOF/TOF MS/MS spectra with incomplete or weak fragment ion series. In this approach, deuterium-labeled lysine and leucine precursors were used in parallel to mass-tag the proteome of a metastatic human hepatocellular carcinoma (HCC) cell line during in vivo cell culturing. These stable-isotope markers appear not only in terminal but also in internal MS/MS fragment ions, with a characteristic isotope pattern induced by multiple mass tagging in parallel. This enhanced signal specificity resolved ambiguities in sparse, poor-quality TOF/TOF spectra by providing critical sequential links among MS/MS fragment ions. Our data-dependent approach reduced many false positives in current genome-sequence-based peptide sequencing. With the development of corresponding algorithms, our approach is amenable to automation, which will lead to more comprehensive and reliable proteome identification.
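The mass-tag logic can be sketched as a per-residue mass shift: a fragment containing labeled residues is heavier by the sum of their label increments, so the shifts between fragment peaks indicate which residues they contain. The abstract does not state how many deuterium atoms each label carries; the d3-leucine and d4-lysine values below are assumptions, and the function names are hypothetical.

```python
# Assumed labels: d3-leucine and d4-lysine (not specified in the abstract).
D_MINUS_H = 2.014101778 - 1.007825032  # mass of 2H minus mass of 1H, Da
LABEL_SHIFT = {"L": 3 * D_MINUS_H, "K": 4 * D_MINUS_H}


def fragment_shift(fragment: str) -> float:
    """Mass shift of a fragment when all of its Leu and Lys residues are labeled."""
    return sum(LABEL_SHIFT.get(aa, 0.0) for aa in fragment)


def b_ion_shifts(peptide: str) -> list[float]:
    """Label-induced shift of each b ion b_1 .. b_(n-1)."""
    return [fragment_shift(peptide[:i]) for i in range(1, len(peptide))]
```

For ALGK, b1 is unshifted while b2 and b3 both carry the d3-leucine shift of about 3.019 Da, so consecutive b ions that differ by a labeled residue show a characteristic offset.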

12.
Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based on fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed freely, with source code, to noncommercial users.
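Two of the ideas described here, retrieving candidates from a mass-sorted peptide index and generating shuffled decoys, can be sketched in a few lines. This is not Crux's implementation: the in-memory sorted list, the function names, and the choice to keep the C-terminal residue fixed when shuffling (a common convention for tryptic peptides) are assumptions.

```python
import bisect
import random

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O, Da


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_mass_index(peptides: list[str]) -> tuple[list[float], list[str]]:
    """Sort peptides by mass so a precursor window can be found by binary search."""
    entries = sorted((peptide_mass(p), p) for p in peptides)
    return [m for m, _ in entries], [p for _, p in entries]


def candidates(masses: list[float], peptides: list[str],
               precursor_mass: float, tol: float) -> list[str]:
    """Peptides whose mass lies within +/- tol Da of the precursor mass."""
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return peptides[lo:hi]


def shuffled_decoy(peptide: str, rng: random.Random) -> str:
    """Shuffle every residue except the C-terminal one, which stays fixed."""
    body = list(peptide[:-1])
    rng.shuffle(body)
    return "".join(body) + peptide[-1]
```

Because a shuffled decoy has the same composition as its target, it has the same precursor mass and competes within the same mass window.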

13.
Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST is slow. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with an index running on the same hardware.
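A back-of-the-envelope check of the throughput figures quoted in this abstract, assuming each rate holds across the whole 10,000-spectrum run:

```latex
\begin{aligned}
t_{\text{Tide}} &= \frac{10\,000\ \text{spectra}}{1550\ \text{spectra/s}} \approx 6.5\ \text{s},\\
t_{\text{SEQUEST}} &= \frac{10\,000\ \text{spectra}}{8.8\ \text{spectra/s}} \approx 1136\ \text{s} \approx 19\ \text{min},\\
\frac{t_{\text{SEQUEST}}}{t_{\text{Tide}}} &= \frac{1550}{8.8} \approx 176.
\end{aligned}
```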

14.
Peptide sequencing using tandem mass spectrometry data is an important and challenging problem in proteomics. We address the problem of peptide sequencing for multi-charge spectra. Most current peptide sequencing algorithms consider only singly or doubly charged ions, even for higher-charge spectra. We characterize multi-charge spectra by generalizing existing models. Using our models, we analyzed spectra from the Global Proteome Machine (GPM) [Craig R, Cortens JP, Beavis RC, J Proteome Res 3:1234-1242, 2004.] (charges 1-5), as well as spectra from the Institute for Systems Biology (ISB) [Keller A, Purvine S, Nesvizhskii AI, Stolyar S, Goodlett DR, Kolker E, OMICS 6:207-212, 2002.] and from an Orbitrap (both with charges 1-3). Our analysis of the GPM dataset shows that higher-charge peaks contribute significantly to prediction of the complete peptide. These peaks also help explain why existing algorithms do not perform well on multi-charge spectra. Based on these analyses, we claim that peptide sequencing algorithms can achieve higher sensitivity if they also consider higher-charge ions. We verify this claim by proposing a de novo sequencing algorithm, the greedy best strong tag (GBST) algorithm, which is simple but considers higher-charge ions based on our new model. Evaluation on multi-charge spectra shows that our simple GBST algorithm outperforms Lutefisk and PepNovo, especially on GPM spectra of charge three or more.
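Why higher charge states matter follows from how fragment m/z depends on charge: an ion of neutral mass M carrying z protons appears at (M + z × proton mass)/z, so each b or y ion produces a separate peak for every charge it can carry. A minimal sketch, assuming standard monoisotopic masses; it generates the theoretical ladder only and is not the GBST algorithm, and the function name is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_mz(peptide: str, max_charge: int) -> dict[tuple[str, int, int], float]:
    """m/z of b and y ions for charges 1..max_charge, keyed by (type, index, z)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    n = len(peptide)
    ions: dict[tuple[str, int, int], float] = {}
    prefix = 0.0
    for i in range(1, n):
        prefix += masses[i - 1]   # residues 1..i form b_i
        suffix = total - prefix   # residues i+1..n form y_(n-i)
        for z in range(1, max_charge + 1):
            # An ion carrying z protons appears at (M + z * proton) / z.
            ions[("b", i, z)] = (prefix + z * PROTON) / z
            ions[("y", n - i, z)] = (suffix + WATER + z * PROTON) / z
    return ions
```

For GASP, `fragment_mz("GASP", 3)` gives 18 theoretical peaks; y3 appears at m/z ≈ 274.14 at charge 1 and ≈ 137.57 at charge 2, so a model limited to low charges would leave the charge-2 and charge-3 peaks unexplained.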

15.
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing unknown proteins in the 1980s, low-throughput Edman degradation followed by cloning remains the main method for sequencing unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed, powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun sequencing of protein mixtures by using MS/MS spectra of overlapping, possibly modified, peptides generated by multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus enable applications of proteomics methodologies to studies of unknown proteins that are otherwise impractical.
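The analogy to shotgun DNA sequencing can be illustrated at the string level with greedy overlap assembly. The published approach aligns MS/MS spectra of overlapping peptides; this sketch instead assumes each peptide has already been sequenced correctly and repeatedly merges the pair with the longest suffix-prefix overlap. The function names and the minimum overlap of three residues are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides: list[str], min_len: int = 3) -> list[str]:
    """Merge overlapping peptide sequences into contigs, best overlap first."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop peptides fully contained in another peptide.
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

Three overlapping peptides such as MKTAYIAK, YIAKQRQ, and QRQISFVK assemble into the single contig MKTAYIAKQRQISFVK; peptides with no overlap of at least `min_len` residues are returned as separate contigs.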

16.
We report on a new de novo peptide sequencing algorithm that uses spectral graph partitioning. In this approach, relationships between m/z peaks are represented by attractive and repulsive springs, and the vibrational modes of the spring system are used to infer information about the peaks (such as "likely b-ion" or "likely y-ion"). We demonstrate the effectiveness of this approach by comparison with other de novo sequencers on test sets of ion-trap and QTOF spectra, including spectra of peptide mixtures. On all datasets, our method outperforms the other sequencers. Along with spectral graph theory techniques, the new de novo sequencer EigenMS incorporates another improvement of independent interest: robust statistical methods for recalibrating time-of-flight mass measurements. Robust recalibration greatly outperforms simple least-squares recalibration, achieving about three times the accuracy on one QTOF dataset.
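The abstract does not specify which robust method EigenMS uses. As a stand-in, the sketch below contrasts ordinary least squares with the Theil-Sen estimator (median of pairwise slopes), a standard robust line fit, for a linear recalibration of measured m/z against reference m/z values of calibrant peaks; the function names are hypothetical.

```python
from statistics import median


def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen(x: list[float], y: list[float]) -> tuple[float, float]:
    """Robust fit: median of pairwise slopes, then median of implied intercepts."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[j] != x[i]]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def recalibrate(measured: list[float], fit: tuple[float, float]) -> list[float]:
    """Map measured m/z values onto the reference scale with a fitted line."""
    slope, intercept = fit
    return [slope * m + intercept for m in measured]
```

With calibrants measured at 100 to 500 and one bad reference value (520.0 where the trend predicts about 500.51), Theil-Sen recovers a slope of 1.001, while least squares is pulled to about 1.04.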

17.
Peptide mass fingerprinting (PMF) is widely used for protein identification in proteome studies using time-of-flight mass spectrometry or 1D or 2D electrophoresis. The peptide mass tolerance, which specifies how closely a theoretical peptide mass must fit an experimental one, significantly influences protein identification. The role of peptide mass tolerance can be estimated by counting the number of correctly identified proteins for a reference set of mass spectra. A reference set of 400 Ultraflex (Bruker Daltonics, Germany) protein mass spectra was obtained from hydrolyzed 1D gel electrophoresis slices of liver microsomes. Using a Mascot server for protein identification, the peptide mass tolerance was varied from 0.02 to 0.40 Da in steps of 0.01 Da. The number of identified proteins varied up to 10-fold depending on the tolerance. The maximal number of identified proteins was obtained at a tolerance of 0.15 Da (120 ppm), which is 1.5 to 2 times higher than the recommended values for this type of mass spectrometer. The software program PMFScan was developed to compute the dependence of the number of identified proteins on the tolerance value.
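The conversion between the two tolerance units and the shape of a tolerance sweep can be sketched as follows; for instance, 0.15 Da equals 120 ppm at a mass of 1250 Da. The sweep counts observed masses that match any theoretical peptide mass within the tolerance, a crude proxy only: the study counted Mascot protein identifications, which this does not reproduce, and PMFScan's implementation is not described. Function names are hypothetical.

```python
import bisect


def da_to_ppm(delta_da: float, mass: float) -> float:
    """Express an absolute tolerance in Da as parts per million of mass."""
    return delta_da / mass * 1e6


def count_matches(observed: list[float], theoretical: list[float], tol: float) -> int:
    """Number of observed masses within tol Da of at least one theoretical mass."""
    theo = sorted(theoretical)
    n = 0
    for m in observed:
        # First theoretical mass >= m - tol; it matches if it is also <= m + tol.
        i = bisect.bisect_left(theo, m - tol)
        if i < len(theo) and theo[i] <= m + tol:
            n += 1
    return n


def tolerance_sweep(observed: list[float], theoretical: list[float],
                    start: float = 0.02, stop: float = 0.40,
                    step: float = 0.01) -> dict[float, int]:
    """Match counts for each tolerance from start to stop (inclusive) in Da."""
    steps = round((stop - start) / step)
    return {
        round(start + k * step, 2): count_matches(observed, theoretical, start + k * step)
        for k in range(steps + 1)
    }
```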

18.
Peptide mass fingerprinting is widely used for protein identification in proteome studies using 1D or 2D electrophoresis. The peptide mass tolerance indicates how well a theoretical peptide mass fits the experimental measurement, and the choice of this parameter significantly influences protein identification. The role of peptide mass tolerance was estimated by counting the number of identified proteins for a reference set of mass spectra. The reference set of 400 Ultraflex (Bruker Daltonics, Germany) mass spectra was obtained from slices of a 1D gel of liver microsomes. Using a Mascot server for protein identification, the peptide mass tolerance was varied from 0.02 to 0.40 Da in steps of 0.01 Da. Depending on the tolerance, the number of identified proteins varied up to 10-fold. The maximal number of identified proteins was obtained at a tolerance of 0.15 Da (120 ppm), which is 1.5 to 2 times higher than the recommended values for this type of mass spectrometer. The software program PMFScan was developed to compute the dependence of the number of identified proteins on the tolerance value.

19.
Lee YH, Kim MS, Choie WS, Min HK, Lee SW. Proteomics, 2004, 4(6):1684-1694
Recently, various chemical modifications of peptides have been incorporated into mass spectrometric analyses of proteome samples, predominantly in conjunction with matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS), to facilitate de novo sequencing of peptides. In this work, we systematically investigate the utility of N-terminal sulfonation of tryptic peptides by 4-sulfophenyl isothiocyanate (SPITC) for proteome analysis by capillary reverse-phase liquid chromatography/tandem mass spectrometry (cRPLC/MS/MS). The experimental conditions for sulfonation were carefully adjusted so that SPITC reacts selectively with N-terminal amino groups, even in the presence of the epsilon-amino groups of lysine residues. cRPLC/MS/MS analyses of the modified peptides indicated that SPITC derivatization proceeded nearly to completion under the experimental conditions employed. The SPITC-derivatized peptides underwent facile fragmentation, yielding predominantly y-series ions in the MS/MS spectra. Combining SPITC derivatization with cRPLC/MS/MS analysis facilitated the acquisition of sequence information for lysine-terminated as well as arginine-terminated tryptic peptides without additional peptide pretreatment, such as guanidination of lysine amino groups. This process alleviated the biased detection of arginine-terminated peptides that is often observed in MALDI MS experiments. We discuss the utility of the technique as a viable method for proteome analysis and present examples of its application to samples of different levels of complexity.
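Why an N-terminal tag leaves the y series unchanged follows from which residues each ion contains. A minimal sketch with standard monoisotopic masses; the tag mass is a free parameter because the abstract does not give the SPITC mass shift, and the function name is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ladders(peptide: str, n_term_tag: float = 0.0) -> tuple[list[float], list[float]]:
    """Singly charged b_1..b_(n-1) and y_1..y_(n-1) m/z values.

    n_term_tag is the mass added to the peptide N-terminus by a derivatizing
    reagent (a parameter here; no value for SPITC is assumed).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i holds the first i residues and therefore carries the N-terminal tag.
        b_ions.append(sum(masses[:i]) + n_term_tag + PROTON)
        # y_i holds the last i residues plus water; the N-terminal tag is absent.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

For GASPK, every b ion moves by exactly `n_term_tag`, while the y ions (starting with y1 of the C-terminal lysine at m/z ≈ 147.11) are identical with or without the tag.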

20.