Similar documents
20 similar documents found (search time: 15 ms)
1.
Phosphorylation site assignment of high-throughput tandem mass spectrometry (LC-MS/MS) data is one of the most common and critical aspects of phosphoproteomics. Correctly assigning phosphorylated residues helps us understand their biological significance. Common search algorithms (such as Sequest and Mascot) are not designed to perform site assignment; therefore, additional algorithms are essential to assign phosphorylation sites in mass spectrometry data. The main contribution of this study is the design and implementation of a linear time and space dynamic programming strategy for phosphorylation site assignment, referred to as PhosSA. The proposed algorithm uses the summation of peak intensities associated with theoretical spectra as an objective function. Quality control of the assigned sites is achieved using a post-processing redundancy criterion that reflects the signal-to-noise properties of the fragmentation spectra. The algorithm was assessed on experimentally generated data sets from synthetic peptides with known phosphorylation sites. We report that PhosSA achieved high accuracy and sensitivity on all the experimentally generated mass spectrometry data sets. The implemented algorithm is shown to be extremely fast and scalable with an increasing number of spectra (we report up to 0.5 million spectra/hour on a moderate workstation). The algorithm is designed to accept results from both the Sequest and Mascot search engines. An executable is freely available at http://helixweb.nih.gov/ESBL/PhosSA/ for academic research purposes.
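The abstract does not give PhosSA's recursion. As a hedged illustration of how a dynamic program can place phosphates under the stated objective (summed intensity of matched theoretical peaks), the sketch below fills a table indexed by (residue position, phosphates placed so far); each b/y cleavage contributes the intensity it matches given how many phosphates lie on each side of the cut. The residue mass table, the 0.02 m/z tolerance, the restriction to singly charged b/y ions, and the names `assign_sites` and `theoretical_peaks` are assumptions, not PhosSA's code. The table has O(n·k) cells, so the work grows linearly with peptide length for a fixed number of phosphates k (times the cost of each peak lookup).

```python
# Hypothetical sketch of DP-based phosphosite assignment; not PhosSA's implementation.
MONO = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO, WATER, PROTON = 79.96633, 18.01056, 1.00728


def matched(peaks, mz, tol):
    """Summed intensity of observed (m/z, intensity) peaks within tol of mz."""
    return sum(inten for p, inten in peaks if abs(p - mz) <= tol)


def assign_sites(seq, k, peaks, tol=0.02):
    """Place k phosphates on S/T/Y to maximise matched singly charged b/y intensity."""
    n = len(seq)
    prefix = [0.0]
    for aa in seq:
        prefix.append(prefix[-1] + MONO[aa])
    total, neg = prefix[-1], float("-inf")
    # best[i][j]: best score using residues 0..i-1 that carry j phosphates
    best = [[neg] * (k + 1) for _ in range(n + 1)]
    took = [[False] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        aa = seq[i - 1]
        for j in range(k + 1):
            skip = best[i - 1][j]
            take = best[i - 1][j - 1] if j > 0 and aa in "STY" else neg
            if take > skip:
                best[i][j], took[i][j] = take, True
            else:
                best[i][j] = skip
            if best[i][j] > neg and i < n:
                # Cleavage after residue i-1: b ion holds j phosphates, y ion the rest
                b = prefix[i] + j * PHOSPHO + PROTON
                y = total - prefix[i] + (k - j) * PHOSPHO + WATER + PROTON
                best[i][j] += matched(peaks, b, tol) + matched(peaks, y, tol)
    if best[n][k] == neg:
        return None  # fewer than k S/T/Y residues
    sites, j = [], k
    for i in range(n, 0, -1):  # backtrack through the recorded choices
        if took[i][j]:
            sites.append(i - 1)
            j -= 1
    return sorted(sites)


def theoretical_peaks(seq, sites):
    """Unit-intensity singly charged b/y ions for a given site assignment."""
    masses = [MONO[aa] + (PHOSPHO if i in sites else 0.0) for i, aa in enumerate(seq)]
    full = sum(masses)
    peaks, acc = [], 0.0
    for m in masses[:-1]:
        acc += m
        peaks.append((acc + PROTON, 1.0))
        peaks.append((full - acc + WATER + PROTON, 1.0))
    return peaks


print(assign_sites("ASGTLSK", 1, theoretical_peaks("ASGTLSK", {5})))
```

On this synthetic spectrum the true site (0-based index 5) matches all twelve b/y ions, whereas the alternatives S1 and T3 leave the ions between the candidate and the true site unmatched.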

2.

Background

Sequence database searching has been the dominant method for peptide identification: a large number of peptide spectra generated in LC/MS/MS experiments are searched with a search engine against theoretical fragmentation spectra derived from a protein sequence database, or against a spectral library. Selecting trustworthy peptide-spectrum matches (PSMs) remains a challenge.

Results

A novel scoring method named FC-Ranker is developed to assign a nonnegative weight to each target PSM according to the likelihood that it is correct. Specifically, the scores of PSMs are updated iteratively using a fuzzy SVM classification model and a fuzzy silhouette index. When the algorithm stops, trustworthy PSMs are assigned high scores.

Conclusions

Our experimental studies show that FC-Ranker outperforms other post-database search algorithms over a variety of datasets, and it can be extended to solve a general classification problem with uncertain labels.

3.
The Virtual Expert Mass Spectrometrist (VEMS) program package was developed for flexible, automated, and manual de novo tandem mass spectrometry (MS/MS) protein sequencing, and includes accessory programs for matrix-assisted laser desorption/ionization-mass spectrometry (MS) interpretation and for generating protein and peptide databases. VEMS V2.0 has been developed into a fast tool for combining database-independent and -dependent protein assignments in an extended analysis of MS/MS peptide data. MS or MS/MS data can be directly recalibrated after the first search by fitting the data to the best search result using polynomial equations. The score function improves on known scoring algorithms and can be adapted to any MS instrument type. In addition, VEMS offers a novel statistical model for evaluating the significance of the protein assignment. The novel features are illustrated by the analysis of fragmentation spectra obtained by liquid chromatography-MS/MS analysis of peptides from an anionic peroxidase-enriched protein fraction from potato root tissue. The extended analysis mode resulted in the additional assignment of spectra for nine modified tryptic peptides and nine miscleaved peptides, in addition to the 45 spectra from regular tryptic peptides. Of the nine modified peptides, three were glycosylated.
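The abstract does not specify VEMS's polynomial equations. As an assumed illustration of polynomial recalibration, the sketch fits the mass error (observed minus theoretical m/z) of peaks assigned in a first search as a low-degree polynomial of observed m/z, then subtracts the fitted error from any m/z value. The function names, the default degree, and the centring/scaling of m/z are hypothetical choices; the fit uses plain normal equations to stay dependency-free.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, m))) / a[r][r]
    return coef


def recalibrate(observed, theoretical, degree=2):
    """Return a function that removes the fitted mass error from an observed m/z."""
    mean = sum(observed) / len(observed)
    scale = max(abs(x - mean) for x in observed) or 1.0  # normalise m/z for stability
    ts = [(x - mean) / scale for x in observed]
    errors = [o - t for o, t in zip(observed, theoretical)]
    coef = polyfit(ts, errors, degree)

    def correct(mz):
        t = (mz - mean) / scale
        return mz - sum(c * t ** i for i, c in enumerate(coef))

    return correct
```

In use, `observed` and `theoretical` would be the m/z values of fragments matched by the best first-pass result, and the returned `correct` function would be applied to every peak before a second search.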

4.
We present a new approach capable of assigning charge states to peptides based on both their intact mass spectrum and their fragmentation mass spectrum. More specifically, our approach aims to fully exploit the available information to improve the rate of correct charge assignment. This is achieved by making extensive use of the information provided by the fragmentation spectrum. For low-resolution spectra, charge assignment based on the fragmentation mass spectrum is better than charge assignment based on the intact peptide signal alone. We introduce two methods that integrate the information contributing to successful peptide charge state assignment. We demonstrate the performance of our algorithms on large ion trap data sets. Applying these algorithms to large-scale proteomics projects can save significant computation time and lower false positive identification rates.
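The abstract does not spell out its two methods. One simple way a fragment spectrum can inform charge assignment, assumed here purely for illustration, uses complementary ions: a singly charged b ion and its complementary y ion sum to M + 2H+, where M is the neutral peptide mass. Each candidate charge z implies a different M from the precursor m/z, so the charge supported by the most fragment pairs is chosen. The 0.5 m/z tolerance (a low-resolution, ion-trap-like setting) and the function names are hypothetical.

```python
PROTON = 1.00728  # proton mass (Da)


def pair_support(precursor_mz, z, fragments, tol=0.5):
    """Count fragment pairs consistent with charge z as complementary b/y ions.

    For singly charged complementary ions, b + y = M + 2 * PROTON, where M is
    the neutral mass implied by precursor_mz at charge z.
    """
    neutral = (precursor_mz - PROTON) * z
    target = neutral + 2 * PROTON
    return sum(1 for i, a in enumerate(fragments) for b in fragments[i + 1:]
               if abs(a + b - target) <= tol)


def assign_charge(precursor_mz, fragments, charges=(1, 2, 3, 4), tol=0.5):
    """Pick the charge whose implied mass is supported by the most fragment pairs."""
    return max(charges, key=lambda z: pair_support(precursor_mz, z, fragments, tol))
```

Ties resolve to the first charge in `charges`; a practical scorer would also weigh pair intensities and the intact-precursor evidence the abstract mentions.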

5.
Summary: A new program package, XEASY, was written for interactive computer support of the analysis of NMR spectra for three-dimensional structure determination of biological macromolecules. XEASY was developed for work with 2D, 3D and 4D NMR data sets. It includes all the functions performed by the precursor program EASY, which was designed for the analysis of 2D NMR spectra, i.e., peak picking and support of sequence-specific resonance assignments, cross-peak assignments, cross-peak integration and rate constant determination for dynamic processes. Since the program uses the X Window System and the Motif widget set, it is portable to a wide range of UNIX workstations. The design objective was to provide maximal computer support for the analysis of spectra, while giving the user complete control over the final resonance assignments. Technically important features of XEASY are the use and flexible visual display of strips, i.e., two-dimensional spectral regions that contain the relevant parts of 3D or 4D NMR spectra; automated sorting routines to narrow down the selection of strips that need to be interactively considered in a particular assignment step; a protocol of resonance assignments that can be used for reliable bookkeeping, independent of the assignment strategy used; and capabilities for proper treatment of spectral folding and efficient transfer of resonance assignments between spectra of different types and different dimensionality, including projected, reduced-dimensionality triple-resonance experiments.

Abbreviations: 1D, 2D, 3D, 4D, one-, two-, three-, four-dimensional; NOE, nuclear Overhauser enhancement; NOESY, nuclear Overhauser enhancement spectroscopy; TOCSY, total correlation spectroscopy; COSY, correlation spectroscopy; TPPI, time-proportional phase incrementation

6.
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
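Because a cyclic peptide has no termini, backbone fragmentation first opens the ring at one of its n amide bonds, and every contiguous stretch of residues around the ring becomes a possible fragment. The sketch below enumerates these candidate b-type fragments, assuming singly charged ions whose m/z equals the summed residue masses plus a proton, and ignoring neutral losses, other ion types and higher charge states. It illustrates the combinatorics only and is not mMass code; residue masses are passed in directly, so non-proteinogenic monomers need no special handling.

```python
PROTON = 1.00728  # Da


def cyclic_b_ions(residue_masses):
    """Singly charged b-type fragments for every ring opening of a cyclic peptide.

    Returns (opening_index, length, m/z) for each contiguous segment of 1..n-1
    residues read around the ring; segment m/z = summed residue masses + H+.
    """
    n = len(residue_masses)
    ions = []
    for start in range(n):          # ring opened just before residue `start`
        total = 0.0
        for length in range(1, n):  # segments shorter than the intact ring
            total += residue_masses[(start + length - 1) % n]
            ions.append((start, length, total + PROTON))
    return ions


for start, length, mz in cyclic_b_ions([71.03711, 57.02146, 97.05276]):
    print(start, length, round(mz, 4))
```

A ring of n residues yields n·(n-1) candidate segments, which is one reason cyclic-peptide spectra are harder to annotate than linear ones.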

7.
A new program, Mapper, for semiautomatic sequence-specific NMR assignment in proteins is introduced. The program takes as input short fragments of sequentially neighboring residues, which have been assembled based on sequential NMR connectivities and for which either the 13Cα and 13Cβ chemical shifts or data on the amino acid type from other sources are known. Mapper then performs an exhaustive search for self-consistent simultaneous mappings of all these fragments onto the protein sequence. Compared to using only the individual mappings of the spectroscopically connected fragments, the global mapping adds a powerful new constraint, which resolves many otherwise intractable ambiguities. In an initial application, virtually complete sequence-specific assignments were obtained for a 110 kDa homooctameric protein, 7,8-dihydroneopterin aldolase from Staphylococcus aureus.
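The gain from a global mapping can be illustrated with a minimal backtracking search, assuming each fragment is represented as a list of sets of allowed residue types (as might be inferred from 13Cα/13Cβ shifts). A fragment that fits several places on its own may have only one placement that is compatible with every other fragment once no residue may be used twice. The representation and the function names are hypothetical, and Mapper's own scoring of imperfect matches is not reproduced.

```python
def placements(sequence, fragment):
    """Start positions where every fragment position allows the sequence residue."""
    size = len(fragment)
    return [s for s in range(len(sequence) - size + 1)
            if all(sequence[s + i] in allowed for i, allowed in enumerate(fragment))]


def global_mappings(sequence, fragments):
    """All simultaneous, non-overlapping mappings of the fragments (exhaustive search)."""
    options = [placements(sequence, f) for f in fragments]
    results = []

    def search(k, used, chosen):
        if k == len(fragments):
            results.append(tuple(chosen))
            return
        for s in options[k]:
            span = set(range(s, s + len(fragments[k])))
            if span & used:  # a residue cannot be assigned to two fragments
                continue
            search(k + 1, used | span, chosen + [s])

    search(0, set(), [])
    return results


frags = [[{"A"}, {"G"}], [{"G"}, {"T"}]]
print(placements("AGSAGT", frags[0]), global_mappings("AGSAGT", frags))
```

Here the first fragment alone fits at positions 0 and 3, but only position 0 leaves room for the second fragment, so the global search returns a single self-consistent mapping.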

8.
Unambiguous identification of tandem mass spectra is a cornerstone in mass-spectrometry-based proteomics. As the study of post-translational modifications (PTMs) by means of shotgun proteomics progresses in depth and coverage, the ability to correctly identify PTM-bearing peptides is essential, increasing the demand for advanced data interpretation. Several PTMs are known to generate unique fragment ions during tandem mass spectrometry, the so-called diagnostic ions, which unequivocally identify a given mass spectrum as related to a specific PTM. Although such ions offer tremendous analytical advantages, algorithms to decipher MS/MS spectra for the presence of diagnostic ions in an unbiased manner are currently lacking. Here, we present a systematic spectral-pattern-based approach for the discovery of diagnostic ions and new fragmentation mechanisms in shotgun proteomics datasets. The developed software tool is designed to analyze large sets of high-resolution peptide fragmentation spectra independent of the fragmentation method, instrument type, or protease employed. To benchmark the software tool, we analyzed large higher-energy collisional activation dissociation datasets of samples containing phosphorylation, ubiquitylation, SUMOylation, formylation, and lysine acetylation. Using the developed software tool, we were able to identify known diagnostic ions by comparing histograms of modified and unmodified peptide spectra. Because the investigated tandem mass spectra data were acquired with high mass accuracy, unambiguous interpretation and determination of the chemical composition for the majority of detected fragment ions was feasible. 
Collectively we present a freely available software tool that allows for comprehensive and automatic analysis of analogous product ions in tandem mass spectra and systematic mapping of fragmentation mechanisms related to common amino acids.

In mass spectrometry (MS)-based proteomics, protein mixtures are digested into peptides using standard proteases such as trypsin or Lys-C (1). The complex peptide mixture is separated via liquid chromatography (LC) directly coupled to MS, and the eluting peptide ions are electrosprayed into the vacuum of the mass spectrometer, where a peptide mass spectrum is recorded (2). In the mass spectrometer, selected peptide ions are fragmented, most commonly through the collision of peptide molecular ions with inert gas molecules in a technique referred to as either collision-induced dissociation (CID) or collisionally activated dissociation (3, 4). During this energetic collision, some of the deposited kinetic energy is converted into internal energy, which results in peptide bond breakage and fragmentation of the molecular peptide ion into sequence-specific ions (5). Identification of the analyzed peptide is then performed by scanning the measured peptide mass and list of fragment masses against a protein sequence database (6). Overall this approach provides a rapid and sensitive means of determining the primary sequence of peptides.

During the fragmentation step, various types of fragment ions can be observed in the MS/MS spectrum. Their occurrence depends on the primary sequence of the investigated peptide, the amount of internal energy deposited, how the energy was introduced, the charge state, and other factors (7). Low-energy dissociation conditions as observed in ion trap CID mainly generate fragment ions containing sequence-specific amino acid information about the investigated peptides (8).
This occurs because the energy deposited by this fragmentation method primarily induces single peptide-bond cleavages between individual amino acids of the precursor ion (9).

With faster activation methods, such as beam-type/quadrupole CID (10), the generated fragments can undergo further collisions. Multiple bonds can thereby be broken, giving rise to internal sequence ions, which in combination with regular b- and y-type cleavage produce specific amino acid immonium ions (11). These immonium ions appear in the very low m/z range of the MS/MS spectrum, and for the majority of naturally occurring amino acids they are unique to the particular residue (12, 13). The exceptions are the leucine/isoleucine pair, whose immonium ions have identical elemental composition, and the lysine/glutamine pair, whose immonium ions share the same nominal mass. Overall, immonium ions can confirm the presence of certain amino acid residues in a peptide, but they reveal neither the position nor the stoichiometry of those residues. Because tryptic peptides contain 9 to 12 amino acids on average, they frequently contain many different residues; as a result, the analytical information carried by regular amino acid immonium ions may be limited. However, immonium ions can be used to support peptide sequence assignment during proteomic database searching (14).

Beyond the 20 naturally occurring residues, many amino acids can be modified by various post-translational modifications (PTMs), and these PTM-bearing residues can themselves generate unique immonium ions, the so-called diagnostic ions. The two most prominent examples are phosphorylation of tyrosine and acetylation of lysine residues (15), which generate diagnostic ions at m/z = 216.0424 and m/z = 126.0917, respectively. Thus, the presence of these unique ions in an MS/MS spectrum can unequivocally identify the sequenced peptide as harboring a given PTM.
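The diagnostic m/z values quoted above can be checked with the common approximation that an immonium ion has the residue mass minus CO plus a proton; for acetyl-lysine, the quoted ion is usually attributed to the immonium ion after loss of NH3. The sketch below uses standard monoisotopic constants and reproduces both quoted values to within about 0.0005 m/z (the last digits depend on the constants chosen and on electron-mass handling). The helper name is illustrative.

```python
# Standard monoisotopic masses (Da), listed for illustration.
CO, PROTON, NH3 = 27.99491, 1.00728, 17.02655
PHOSPHO, ACETYL = 79.96633, 42.01057
RESIDUE = {"Y": 163.06333, "K": 128.09496}


def immonium_mz(residue_mass):
    """Approximate immonium ion m/z: residue mass - CO + H+."""
    return residue_mass - CO + PROTON


p_tyr = immonium_mz(RESIDUE["Y"] + PHOSPHO)
ac_lys = immonium_mz(RESIDUE["K"] + ACETYL) - NH3  # after loss of ammonia
print(f"pTyr immonium: {p_tyr:.4f}, acetyl-Lys immonium - NH3: {ac_lys:.4f}")
```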
Evidently, knowledge regarding modification-specific diagnostic ions is of great importance for the identification and validation of modified peptides in MS-based proteomics (16, 17). Additionally, such PTM-specific information can be exploited in targeted proteomics approaches using MS/MS precursor ion scanning (18) and in post-acquisition analysis involving extracted ion chromatograms for specific m/z values. Moreover, information regarding diagnostic ions can be a powerful addition to analytical approaches such as selected reaction monitoring, a targeted technique that relies on ion-filtering capabilities to comprehensively study peptides and PTMs (19).

Currently only a minor subset of modified amino acids has been investigated for diagnostic ions, primarily because of the lack of unbiased methods for mapping such ions in large-scale proteomics experiments. The identification of diagnostic ions is a labor-intensive endeavor, requiring manual interpretation of large numbers of MS/MS spectra for proper validation of low-mass fragment ions. As a result, most studies on diagnostic ions have been performed on a few selected synthetic peptides, as the interrogation of larger biological datasets has not been feasible (15, 20).

Here we describe a proteomic approach utilizing a novel algorithm based upon binning of tandem mass spectra for fast and automated mapping of analogously occurring product ions. The developed algorithm is completely independent of the instrument type and fragmentation technique employed, but it performs better under experimental conditions that enhance the generation of immonium ions. Accordingly, the performance of the algorithm is benchmarked on data derived from LTQ Orbitrap Velos and Q Exactive mass spectrometers, which exhibit improved HCD performance (21–23).
HCD has proven to be a powerful fragmentation technique, particularly for PTM analysis (24, 25), because, unlike fragmentation in ion trap mass spectrometers, it has no low-mass detection cutoff (26). Moreover, the beam-type energy deposited during HCD fragmentation allows for improved generation of both immonium and other sequence-related ions relative to CID (27, 28). Additionally, HCD spectra are acquired at very high resolution, yielding high mass accuracy (<10 ppm) on all detected fragment ions, which allows the algorithm to use very narrow mass bins and hence readily determine the exact chemical composition of any newly detected ions.

Briefly, the algorithm takes all significantly identified MS/MS spectra and bins them together in discrete mass bins. Because commonly occurring ions, such as immonium and diagnostic ions, have the same chemical composition and consequently the same m/z, they cluster in the same mass bins, whereas sequence-specific fragment ions scatter across the binned mass range. For validation of the presented approach, we mapped known and novel diagnostic ions from a variety of PTM-bearing amino acids, demonstrating the sensitivity and specificity of the method. Moreover, we demonstrate that mass spectral binning can also be employed for automated mapping of composition-specific neutral losses from large-scale proteomic experiments.
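The binning idea can be illustrated with a minimal sketch, assuming fixed-width bins (0.001 m/z, a hypothetical width suited to the <10 ppm accuracy mentioned) and counting, for each bin, the fraction of spectra that contain at least one peak there. Comparing these fractions between PTM-bearing and unmodified spectra mirrors the histogram comparison described in the abstract. The function names and the `min_diff` cutoff are assumptions, and fixed bins can split an ion whose m/z falls on a bin edge.

```python
from collections import Counter


def bin_presence(spectra, width=0.001, max_mz=300.0):
    """Fraction of spectra with at least one peak in each low-mass bin."""
    counts = Counter()
    for peaks in spectra:
        bins = {int(mz / width) for mz in peaks if mz <= max_mz}
        counts.update(bins)  # each spectrum counts at most once per bin
    return {b: c / len(spectra) for b, c in counts.items()}


def enriched_bins(modified, unmodified, width=0.001, min_diff=0.5):
    """Bin centres far more frequent in modified spectra (candidate diagnostic ions)."""
    mod = bin_presence(modified, width)
    unmod = bin_presence(unmodified, width)
    hits = [((b + 0.5) * width, f - unmod.get(b, 0.0))
            for b, f in mod.items() if f - unmod.get(b, 0.0) >= min_diff]
    return sorted(hits, key=lambda h: -h[1])


modified = [[216.0424, 110.0712], [216.0425, 136.0757], [216.0423]]
unmodified = [[110.0712, 136.0757], [136.0757]]
print(enriched_bins(modified, unmodified))
```

Ions common to both groups (here the histidine and tyrosine immonium ions) cancel out, while the phosphotyrosine ion near m/z 216.042 stands out.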

9.
The effect of time and spatial averaging on 15N chemical shift/1H-15N dipolar correlation spectra, i.e., PISEMA spectra, of α-helical membrane peptides and proteins is investigated. Three types of motion are considered: (a) librational motion of the peptide planes in the α-helix; (b) rotation of the helix about its long axis; and (c) wobble of the helix about a nominal tilt angle. A 2 ns molecular dynamics simulation of helix D of bacteriorhodopsin is used to determine the effect of librational motion on the spectral parameters. For the time averaging, the rotation and wobble of this same helix are modelled by assuming either Gaussian motion about the respective angles or a uniform distribution of a given width. For the spatial averaging, regions of possible 15N chemical shift/1H-15N dipolar splittings are computed for a distribution of rotations and/or tilt angles of the helix. The computed spectra show that under certain motional modes the 15N chemical shift/1H-15N dipolar pairs of the individual residues do not form patterns that mimic helical wheel patterns. As a result, unambiguously determining helix tilt and helix rotation without any resonance assignments, or on the basis of a single assignment, may be difficult.

10.
Subtilin, a 32-amino acid peptide with potent antimicrobial activity, has been isolated from Bacillus subtilis ATCC6633. Its chemical structure has been confirmed by the unambiguous sequence-specific assignment of its 1H NMR spectrum. Detailed NMR analysis revealed that subtilin is a rather flexible molecule; the only observed conformational constraints were those imposed by the cyclic structures created by the lanthionine and 3-methyllanthionine residues. These results suggest that in aqueous solution subtilin and the homologous peptide nisin have similar conformations.

11.
M. Sheinblatt, Y. Rahamim. Biopolymers, 1976, 15(9): 1643–1653
Sequential determination of glycyl residues (and in several cases other amino acid residues) in tetrapeptides and branched peptides using the NMR technique is reported. The method is based on changes in the NMR spectra of (1) the peptide hydrogens of the different residues and (2) the methylene groups of the glycyl residues, as a result of increasing the rate of the base-catalyzed exchange reaction of the peptide hydrogens. Hence, the spectral changes are pH dependent. The exact pH dependence, however, is a function of the location of the residue in the peptide molecule. Thus, it is possible to determine the sequence of the amino acid residues by studying the changes in the spectra with pH. For peptide molecules of known sequence, the method can be used for unequivocal assignment of the peptide hydrogen signals.

12.
A database of peptide chemical shifts, computed at the density functional level, has been used to develop an algorithm for predicting 15N and 13C shifts in proteins from their structure; the method is incorporated into a program called SHIFTS (version 4.0). The database was built from the calculated chemical shift patterns of 1335 peptides whose backbone torsion angles are limited to areas of the Ramachandran map around helical and sheet configurations. For each tripeptide in these regions of regular secondary structure (which constitute about 40% of residues in globular proteins), SHIFTS also consults the database for information about side-chain torsion angle effects for the residue of interest and for the preceding residue, and estimates hydrogen bonding effects through an empirical formula that is also based on density functional calculations on peptides. The program optionally searches for alternate side-chain torsion angles that could significantly improve agreement between calculated and observed shifts. Applied to 20 proteins, the program shows good consistency with experimental data, with correlation coefficients of 0.92, 0.98, 0.99 and 0.90 and r.m.s. deviations of 1.94, 0.97, 1.05, and 1.08 ppm for 15N, 13Cα, 13Cβ and 13C′, respectively. Reference shifts fitted to protein data are in good agreement with 'random-coil' values derived from experimental measurements on peptides. This prediction algorithm should be helpful in NMR assignment, crystal and solution structure comparison, and structure refinement.

13.
Summary: A new computer-based approach is described for efficient sequence-specific assignment of uniformly 15N-labeled proteins. For this purpose, three-dimensional 15N-correlated [1H, 1H]-NOESY spectra are divided into two-dimensional 1H-1H strips, which extend over the entire spectral width along one dimension and, along the other dimension, have a width of ca. 100 Hz centered about the amide proton chemical shifts. A spectral correlation function enables sorting of these strips according to the proximity of the corresponding residues in the amino acid sequence. Starting from a given strip in the spectrum, the probability that another strip corresponds to the C-terminal neighboring residue is calculated for all other strips from the similarity of their peak patterns with a pattern predicted for the sequentially adjoining residue, as manifested in the scalar product of the vectors representing the predicted and measured peak patterns. Tests with five different proteins containing both α-helices and β-sheets, and ranging in size from 58 to 165 amino acid residues, show that the discrimination achieved between the sequentially neighboring residue and all other residues compares well with that obtained with an unguided interactive search of pairs of sequentially neighboring strips, with important savings in the time needed for complete analysis of 3D 15N-correlated [1H, 1H]-NOESY spectra. The integration of this routine into the program package XEASY ensures that remaining ambiguities can be resolved by visual inspection of the strips, combined with reference to the amino acid sequence and information on spin-system types obtained from additional NMR spectra.

Abbreviations: 1D, 2D, 3D, 4D, one-, two-, three-, four-dimensional; NOE, nuclear Overhauser enhancement; NOESY, nuclear Overhauser enhancement spectroscopy; COSY, correlation spectroscopy; TOCSY, total correlation spectroscopy
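The strip-sorting idea can be sketched under stated assumptions: each strip's peaks along the full-width 1H dimension are binned into a vector, and candidate strips are ranked by the normalised scalar product (cosine) between their vector and a vector for the predicted pattern of the adjoining residue. How that pattern is predicted is not described in the abstract, so it is simply passed in; the 0.05 ppm bin width, the 0 to 11 ppm range, the ranking (rather than a probability), and the function names are hypothetical.

```python
import math


def to_vector(peaks, lo=0.0, hi=11.0, width=0.05):
    """Bin (shift_ppm, intensity) peaks along the full-width dimension of a strip."""
    n = int(round((hi - lo) / width))
    vec = [0.0] * n
    for ppm, inten in peaks:
        if lo <= ppm < hi:
            vec[min(int((ppm - lo) / width), n - 1)] += inten
    return vec


def similarity(u, v):
    """Normalised scalar product (cosine) of two peak-pattern vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_neighbours(predicted, strips):
    """Rank strip names by similarity to the pattern predicted for the next residue."""
    p = to_vector(predicted)
    scores = {name: similarity(p, to_vector(peaks)) for name, peaks in strips.items()}
    return sorted(scores, key=scores.get, reverse=True)


predicted = [(8.2, 1.0), (4.3, 0.8)]
strips = {"s1": [(8.2, 1.0), (4.3, 0.7)], "s2": [(7.5, 1.0), (3.9, 0.5)],
          "s3": [(8.2, 0.2), (1.2, 1.0)]}
print(rank_neighbours(predicted, strips))
```

Normalising the scalar product keeps strong strips from outranking well-matched weak ones; the top-ranked candidates would then be inspected visually, as the abstract describes for XEASY.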

14.
Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from “mixture spectra” to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs.

Peptide identification by tandem mass spectra is a critical step in mass spectrometry (MS)-based proteomics (1). Numerous computational algorithms and software tools have been developed for this purpose (2–6). These algorithms can be classified into three categories: (i) pattern-based database search, (ii) de novo sequencing, and (iii) hybrid search that combines database search and de novo sequencing. With the continuous development of high-performance liquid chromatography and high-resolution mass spectrometers, it is now possible to analyze almost all protein components in mammalian cells (7).
Despite rapid data collection, it remains a challenge to extract accurate information from the raw data to identify peptides with low false positive rates (specificity) and minimal false negatives (sensitivity) (8).

Database search methods usually assign peptide sequences by comparing MS/MS spectra to theoretical peptide spectra predicted from a protein database, as exemplified in SEQUEST (9), Mascot (10), OMSSA (11), X!Tandem (12), Spectrum Mill (13), ProteinProspector (14), MyriMatch (15), Crux (16), MS-GFDB (17), Andromeda (18), BaMS2 (19), and Morpheus (20). Some other programs, such as SpectraST (21) and Pepitome (22), use a spectral library composed of experimentally identified and validated MS/MS spectra. These methods use a variety of scoring algorithms to rank potential peptide spectrum matches (PSMs) and select the top hit as a putative PSM. However, not all PSMs are correctly assigned. For example, false peptides may be assigned to MS/MS spectra with numerous noisy peaks and poor fragmentation patterns. If the samples contain unknown protein modifications, mutations, or contaminants, the related MS/MS spectra also produce false positives, because their corresponding peptides are not in the database. Other false positives may arise simply from random matches. Therefore, it is important to remove these false PSMs to improve dataset quality. One common approach is to filter putative PSMs to achieve a final list with a predefined false discovery rate (FDR) via a target-decoy strategy, in which decoy proteins are merged with target proteins in the same database for estimating false PSMs (23–26). However, true and false PSMs are not always distinguishable based on matching scores.
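The target-decoy filtering step can be sketched as follows, assuming each PSM is a (score, is_decoy) pair from a concatenated target-decoy search and using one common estimator, FDR = decoys / targets above the cutoff (other variants, such as 2 × decoys / (targets + decoys), are also used). The function returns the lowest score cutoff whose estimated FDR stays at or below the requested level, evaluating only at the end of groups of tied scores so that a cutoff always includes every PSM with that score. The function name is hypothetical.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Lowest score cutoff whose target-decoy FDR estimate is <= max_fdr.

    psms: iterable of (score, is_decoy). FDR is estimated as decoys / targets
    among PSMs scoring at or above the cutoff. Returns None if none qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for idx, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        last_of_tie = idx + 1 == len(ranked) or ranked[idx + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            best = score
    return best


psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True), (4, False)]
cutoff = fdr_threshold(psms, max_fdr=0.3)
accepted = [s for s, decoy in psms if not decoy and s >= cutoff]
```

Because score distributions of true and false PSMs overlap, any single cutoff trades sensitivity against specificity; the sketch only makes that trade explicit.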
Setting an appropriate score threshold that achieves both maximal sensitivity and high specificity is therefore difficult (13, 27, 28).

De novo methods, including Lutefisk (29), PEAKS (30), NovoHMM (31), PepNovo (32), pNovo (33), Vonovo (34), and UniNovo (35), identify peptide sequences directly from MS/MS spectra. These methods can be used to derive novel peptides and post-translational modifications without a database, which is especially useful when the relevant genome has not been sequenced. High-resolution MS/MS spectra greatly facilitate the generation of peptide sequences in these de novo methods. However, because MS/MS fragmentation cannot always produce all predicted product ions, only a portion of the collected MS/MS spectra have sufficient quality to extract partial or full peptide sequences, leading to lower sensitivity than that achieved with database search methods.

To improve the sensitivity of de novo methods, a hybrid approach has been proposed that integrates peptide sequence tags into PSM scoring during database searches (36). Numerous software packages have been developed, such as GutenTag (37), InsPecT (38), Byonic (39), DirecTag (40), and PEAKS DB (41). These methods use peptide tag sequences to filter a protein database, followed by error-tolerant database searching. One restriction in most of these algorithms is the requirement of a minimum tag length of three amino acids for matching protein sequences in the database. This restriction reduces the sensitivity of the database search, because it filters out some high-quality spectra from which consecutive tags cannot be generated.

In this paper, we describe JUMP, a novel tag-based hybrid algorithm for peptide identification. The program is optimized to balance sensitivity and specificity during tag derivation and MS/MS pattern matching. JUMP can use all potential sequence tags, including tags consisting of only one amino acid.
When we compared its performance to that of two widely used search algorithms, SEQUEST and Mascot, JUMP identified ∼30% more PSMs at the same FDR threshold. In addition, the program provides two additional features: (i) using tag sequences to improve modification site assignment, and (ii) analyzing co-fragmented peptides from mixture MS/MS spectra.
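Sequence-tag derivation, including one-residue tags, can be sketched as follows, assuming a plain list of fragment m/z values from one ion series: two peaks are linked when their m/z difference matches a residue mass within a tolerance, and each maximal chain of links, read from its first peak, is one tag. This is a hypothetical illustration, not JUMP's tag generator; the 0.01 m/z tolerance and the function name are assumptions, "L" stands for the isobaric Leu/Ile pair, and JUMP's tag scoring is not reproduced.

```python
MONO = {  # monoisotopic residue masses (Da); "L" also stands for isobaric Ile
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def derive_tags(peaks, tol=0.01):
    """All maximal sequence tags as (start_mz, residues), including length-1 tags."""
    mz = sorted(peaks)
    edges = {}  # peak index -> [(later peak index, residue)]
    for i, a in enumerate(mz):
        for j in range(i + 1, len(mz)):
            for aa, m in MONO.items():
                if abs(mz[j] - a - m) <= tol:
                    edges.setdefault(i, []).append((j, aa))
    has_incoming = {j for outs in edges.values() for j, _ in outs}
    tags = []

    def extend(start, i, seq):
        if i not in edges:  # chain cannot be extended further
            tags.append((mz[start], seq))
            return
        for j, aa in edges[i]:
            extend(start, j, seq + aa)

    for i in edges:
        if i not in has_incoming:  # start chains only at peaks with no predecessor
            extend(i, i, "")
    return sorted(tags)


print(derive_tags([100.0, 187.03203, 286.10044, 415.14303, 500.0]))
```

The noise peak at m/z 500.0 links to nothing and is ignored, while the ladder yields the tag "SVE"; a spectrum with only two usable adjacent peaks would still contribute a one-residue tag.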

15.
The NMR structure of peptide deformylase (PDF) (1–150) from Escherichia coli, an essential enzyme that removes the formyl group from nascent polypeptides and a potential target for drug discovery, was determined using 15N/13C doubly labeled protein. Nearly completely automated assignment routines were employed to assign three-dimensional triple-resonance, 15N-resolved and 13C-resolved NOESY spectra using the program GARANT. This assignment strategy, demonstrated on a 17 kDa protein, is a significant advance in the automation of NMR data assignment and structure determination that will accelerate future work. A total of 2302 conformational constraints were collected as input for the distance geometry program DYANA. After restrained energy minimization with the program X-PLOR, the 20 best conformers characterize a high-quality structure with an average root-mean-square deviation of 0.43 Å for the backbone atoms N, Cα and C′, and 0.81 Å for all heavy atoms of the individual conformers relative to the mean coordinates for residues 1 to 150. The globular fold of PDF contains two α-helices comprising residues 25–40 and 125–138, six β-strands (57–60, 70–77, 85–88, 98–101, 105–111, 117–123) and one 3₁₀ helix comprising residues 49–51. The C-terminal helix contains the HEXXH motif positioning a zinc ligand in a similar fashion to other metalloproteases, with the third ligand being cysteine and the fourth presumably a water. The three-dimensional structure of PDF affords insight into the substrate recognition and specificity for N-formylated over N-acetylated substrates and is compared to other PDF structures.

16.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. Because a mass spectrometer generates thousands of spectra per hour, the speed of database searching is critical, especially when searching a large sequence database, when peptides are generated by an unknown or non-specific enzyme, or when the target peptides carry post-translational modifications (PTMs). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of these arise from peptides produced by non-specific digestion with unknown enzymes or from amino acid modifications. In other cases, scientists may deliberately use non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic studies, because not all proteins are amenable to digestion by site-specific enzymes, and many digested peptides may not fall within the molecular weight range suitable for mass spectrometry analysis. Interpreting such spectra costs database search engines a great deal of computational time. OVERVIEW: The present study was designed to speed up the database searching process in both cases. Specifically, we combined a suffix tree data structure with a spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We designed an efficient algorithm to compute, for each spectrum, a matching threshold at a given statistical significance level (e.g. p = 0.01), and use it to select candidate peptides. We then rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification on a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
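The database side of this idea can be illustrated with a simplified sketch. A naive (uncompressed) suffix trie stands in for the suffix tree. Candidate peptides from non-specific digestion are found by a depth-first walk that accumulates monoisotopic residue masses and abandons a branch once it exceeds the precursor mass. This is an assumption-laden illustration, not the authors' algorithm: it has no spectrum graph, no significance threshold, no scoring and no PTM handling. The values of `max_len` and `tol`, the use of a neutral precursor mass, and the function names are all hypothetical choices.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of the terminal H and OH added to the residue sum


def build_suffix_trie(proteins, max_len=30):
    """Insert every suffix of every protein, truncated to max_len residues
    (a hypothetical cap on peptide length), into a trie of nested dicts."""
    root = {}
    for seq in proteins:
        for start in range(len(seq)):
            node = root
            for aa in seq[start:start + max_len]:
                node = node.setdefault(aa, {})
    return root


def candidate_peptides(root, precursor_mass, tol=0.02):
    """Depth-first walk of the trie accumulating residue masses.

    precursor_mass is the neutral monoisotopic peptide mass (an assumption).
    Every trie path is a substring of some protein, i.e. a peptide from a
    non-specific cleavage. A branch is pruned as soon as its mass exceeds
    precursor_mass + tol, since extending it can only add mass.
    """
    hits = set()
    stack = [(root, "", 0.0)]
    while stack:
        node, prefix, mass = stack.pop()
        for aa, child in node.items():
            residues = mass + RESIDUE_MASS[aa]
            if residues + WATER > precursor_mass + tol:
                continue  # prune this branch
            peptide = prefix + aa
            if abs(residues + WATER - precursor_mass) <= tol:
                hits.add(peptide)
            stack.append((child, peptide, residues))
    return sorted(hits)


trie = build_suffix_trie(["PEPTIDEK"])
print(candidate_peptides(trie, 341.15867))  # ['PEP'] (neutral mass of PEP)
```

Suffixes that share a prefix share one path in the trie, so the mass of that prefix is accumulated only once. That is the saving a suffix structure offers over enumerating every substring separately.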

17.
The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Our approach capitalizes on the growing availability of large libraries of single-peptide spectra (spectral libraries). It is able to identify up to 98% of all mixture spectra from equally abundant peptides and to adjust automatically to abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides, achieving speedups of over five orders of magnitude. We also demonstrate that mixture spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra.
The success of tandem MS (MS/MS) approaches to peptide identification is partly due to advances in computational techniques that allow reliable interpretation of MS/MS spectra. Mainstream computational techniques fall into two categories. Database search approaches score each spectrum against peptides in a sequence database (1–4), and de novo techniques directly reconstruct the peptide sequence from each spectrum (5–8).
The combination of these methods with advances in high-throughput MS/MS has promoted the accelerated growth of spectral libraries. These are collections of peptide MS/MS spectra whose identifications were validated by accepted statistical methods (9, 10) and often also manually confirmed by mass spectrometry experts. The similar concept of spectral archives was recently proposed to denote spectral libraries that also include “interesting” unidentified spectra (11), i.e. recurring spectra with good de novo reconstructions but no database match. The growing availability of these large collections of MS/MS spectra has reignited the development of alternative peptide identification approaches based on spectral matching (12–14) and alignment (15–17) algorithms.
However, mainstream approaches were developed under the (often unstated) assumption that each MS/MS spectrum is generated from a single peptide. Chromatographic procedures go a long way toward making this a reasonable assumption, but in several situations it is difficult or even impossible to separate pairs of peptides. Examples include certain permutations of the peptide sequence or post-translational modifications (see (18) for examples of co-eluting histone modification variants). In addition, innovative experimental setups have demonstrated the potential for increased peptide identification throughput using mixture spectra. Examples include data-independent acquisition (19), ion-mobility MS (20) and MS^E strategies (21).
To alleviate the algorithmic bottleneck in such scenarios, we describe a computational approach called M-SPLIT (mixture-spectrum partitioning using library of identified tandem mass spectra). M-SPLIT reliably and efficiently identifies peptides from mixture spectra generated from a pair of peptides. In brief, a mixture spectrum is modeled as a linear combination of two single-peptide spectra, and peptides are identified by searching against a spectral library.
We show that efficient filtration and accurate branch-and-bound strategies can be used to avoid the huge computational cost of searching all possible pairs. Thus equipped, our approach is able to identify the correct matches by considering only a minuscule fraction of all possible matches. Beyond potentially enhancing the identification capabilities of current MS/MS acquisition setups, we argue that the availability of methods to reliably identify MS/MS spectra from mixtures of peptides could enable the collection of MS/MS data using accelerated chromatography setups to obtain the same or better peptide identification results in a fraction of the experimental time currently required for exhaustive peptide separation.
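The mixture model described above can be sketched in a few lines. The sketch assumes that spectra are already binned into dicts keyed by integer m/z bin, that library spectra are intensity-normalized, and that cosine similarity is the match score. The weight `alpha` plays the role of the relative abundance of the two peptides. The sketch deliberately scores every pair of library spectra over a fixed grid of weights. It therefore omits the filtration and branch-and-bound pruning that the abstract credits for M-SPLIT's speedups. All names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse spectra (dict: m/z bin -> intensity)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine(a, b, alpha):
    """Linear combination alpha * a + (1 - alpha) * b of two spectra."""
    return {k: alpha * a.get(k, 0.0) + (1 - alpha) * b.get(k, 0.0)
            for k in a.keys() | b.keys()}


def best_pair(mixture, library, steps=21):
    """Score every library pair at every mixing weight on a uniform grid.

    Returns (score, name_a, name_b, alpha). Exhaustive on purpose: no
    filtration or branch-and-bound pruning is applied.
    """
    alphas = [i / (steps - 1) for i in range(steps)]
    best = (-1.0, None, None, None)
    for (name_a, a), (name_b, b) in combinations(library.items(), 2):
        for alpha in alphas:
            score = cosine(mixture, combine(a, b, alpha))
            if score > best[0]:
                best = (score, name_a, name_b, alpha)
    return best


library = {"A": {100: 1.0}, "B": {200: 1.0}, "C": {300: 1.0}}
# An equal mixture of A and B: the best pair is A and B at alpha = 0.5.
print(best_pair({100: 0.5, 200: 0.5}, library))
```

For a library of n spectra, the exhaustive loop scores O(n²) pairs. Bounds on spectral similarity are meant to avoid exactly this cost at proteome scale.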

18.
The possibility of mass spectrometric sequencing of peptides without conventional MS/MS analysis has been demonstrated experimentally. The peptide hydrolysate was fractionated by reverse-phase chromatography on a microbore column. The eluate fraction was injected into the mass spectrometer via an electrospray ion source that directly coupled a liquid chromatograph to a time-of-flight mass spectrometer (HPLC-MS). Peptides eluted from the column were fragmented in the mass spectrometer interface by varying the voltage difference between the nozzle and the skimmer. This yielded a restricted set of intense y-ion peaks corresponding to the sequential cleavage of every amino acid from the peptide. The ratios of y-ion peak intensities to the background were (5–100):1. The presence of Lys and Arg in the peptides substantially increased the intensity of informative peaks in the mass spectra. The mass spectra of short peptides (up to 10 residues) were processed manually. The Proteos hardware and software system was used to process the fragmentation results for a long N-terminal peptide of the human hemoglobin α-chain.
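A sequence can be read from a y-ion series because consecutive singly charged y ions differ by exactly one residue mass. Below is a minimal sketch. It assumes monoisotopic masses, singly protonated y ions and no modifications. The function names are hypothetical, and this is not the Proteos system. Leucine and isoleucine have identical masses, so the decoder reports L for either.

```python
RESIDUE_MASS = {  # monoisotopic residue masses (Da); L listed before I
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ion_ladder(peptide):
    """m/z of singly charged y1..y(n-1), built from the C-terminus inward."""
    mass = WATER + PROTON
    ladder = []
    for aa in reversed(peptide[1:]):
        mass += RESIDUE_MASS[aa]
        ladder.append(round(mass, 5))
    return ladder


def decode_ladder(ladder, tol=0.02):
    """Recover residues from consecutive y-ion differences.

    Returns the sequence in N- to C-terminal order; '?' marks a gap that
    matches no single residue. L is reported for both L and I.
    """
    previous = WATER + PROTON
    residues = []
    for mz in ladder:
        delta = mz - previous
        match = next((aa for aa, m in RESIDUE_MASS.items()
                      if abs(m - delta) <= tol), "?")
        residues.append(match)
        previous = mz
    return "".join(reversed(residues))


ladder = y_ion_ladder("GAVLK")
print(ladder)                 # y1 (K) is about 147.113
print(decode_ladder(ladder))  # AVLK: y ions do not cover the N-terminal G
```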

19.

Background

High-resolution mass spectrometry has been employed to rapidly and accurately type and subtype influenza viruses. The detection of signature peptides with unique theoretical masses enables the unequivocal assignment of the type and subtype of a given strain. To date, this analysis has required manual inspection of the mass spectra of whole-virus and antigen digests.
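Detecting signature peptides amounts to matching observed peaks against theoretical masses within a mass tolerance. Below is a minimal sketch that uses a parts-per-million (ppm) window and binary search over the sorted peak list. The 10 ppm default, the function name and the example masses are hypothetical. They are not taken from any published signature peptide set.

```python
from bisect import bisect_left


def matched_signatures(peaks, signatures, tol_ppm=10.0):
    """Return {subtype: [matched signature masses]} for subtypes with any hit.

    peaks: observed m/z values; signatures: {subtype: [theoretical m/z]}.
    A signature matches when some peak lies within +/- tol_ppm of it.
    """
    peaks = sorted(peaks)
    hits = {}
    for subtype, masses in signatures.items():
        found = []
        for m in masses:
            window = m * tol_ppm * 1e-6  # absolute tolerance in m/z units
            i = bisect_left(peaks, m - window)  # first peak >= lower bound
            if i < len(peaks) and peaks[i] <= m + window:
                found.append(m)
        if found:
            hits[subtype] = found
    return hits


# Hypothetical signature masses, for illustration only.
signatures = {"H1": [1000.0, 1500.0], "H3": [1200.0]}
print(matched_signatures([1350.0, 1000.005], signatures))  # {'H1': [1000.0]}
```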

Results

A computer algorithm, FluTyper, has been designed and implemented to achieve the automated analysis of MALDI mass spectra recorded for proteolytic digests of the whole influenza virus and antigens. FluTyper incorporates the use of established signature peptides and newly developed naïve Bayes classifiers for four common influenza antigens, hemagglutinin, neuraminidase, nucleoprotein, and matrix protein 1, to type and subtype the influenza virus based on their detection within proteolytic peptide mass maps. Theoretical and experimental testing of the classifiers demonstrates their applicability at protein coverage rates normally achievable in mass mapping experiments. The application of FluTyper to whole virus and antigen digests of a range of different strains of the influenza virus is demonstrated.

Conclusions

The FluTyper algorithm enables rapid, automated typing and subtyping of the influenza virus from mass spectral data. The newly developed naïve Bayes classifiers increase the confidence of influenza virus subtyping, especially where signature peptides are not detected. FluTyper is expected to popularize the use of mass spectrometry to characterize influenza viruses.
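The abstract does not specify how the classifiers encode their input, so the following sketch illustrates only one common choice. It uses a Bernoulli naïve Bayes model over the presence or absence of peptides in a mass map, with Laplace smoothing. The peptide identifiers, subtype labels and training data are hypothetical placeholders.

```python
import math


def train_bernoulli_nb(examples, vocabulary, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    examples: list of (label, set of detected peptide IDs).
    Returns class priors and, per class, smoothed P(peptide detected).
    """
    labels = sorted({label for label, _ in examples})
    priors, p_detected = {}, {}
    for label in labels:
        maps = [found for lab, found in examples if lab == label]
        priors[label] = len(maps) / len(examples)
        p_detected[label] = {
            pep: (sum(pep in found for found in maps) + alpha) / (len(maps) + 2 * alpha)
            for pep in vocabulary
        }
    return priors, p_detected


def classify(model, detected):
    """Return the label with the highest log posterior for a detected set."""
    priors, p_detected = model
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for pep, p in p_detected[label].items():
            # In a Bernoulli model, absent peptides count as evidence too.
            score += math.log(p if pep in detected else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training maps: peptide IDs p1..p4, subtypes H1 and H3.
vocabulary = {"p1", "p2", "p3", "p4"}
training = [("H1", {"p1", "p2"}), ("H1", {"p1"}),
            ("H3", {"p3", "p4"}), ("H3", {"p4"})]
model = train_bernoulli_nb(training, vocabulary)
print(classify(model, {"p1"}))  # H1
```

Smoothing keeps an unseen peptide from driving a class probability to zero. This matters when protein coverage in a mass map is partial.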

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417