Similar literature
 20 similar articles found
1.
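A scoring method for identifying peptides from tandem mass spectra   Total citations: 1 (self-citations: 0; citations by others: 1)
Many high-throughput scoring algorithms that evaluate the agreement between a tandem mass spectrum and the theoretical spectra of peptides in a database are now used in shotgun proteomics. In practice, however, these methods produce a large number of incorrect peptide identifications. Here we propose a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm jointly considers the probabilities with which different ion types appear in a tandem mass spectrum, the number of enzymatic cleavage sites in the peptide, and the degree and pattern of matching between theoretical and experimental ions. Tests on a large tandem mass spectrometry data set show that PepSearch, the software developed from this algorithm, identifies peptides more accurately than SEQUEST, currently the most widely used program. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.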

2.
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow a similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives a 10-40% improvement in the number of distinct peptide identifications across a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as a similarity score between experimental spectra and peptide sequences.
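
The abstract above counts identifications at q-value thresholds under the target-decoy strategy. As a rough illustration of that bookkeeping, the Python sketch below ranks peptide-spectrum matches by score and derives q-values from decoy counts. The function names, the `(score, is_decoy)` tuple layout, and the simple decoys/targets FDR estimate are assumptions made for this example; they are not taken from the paper or from any of the tools it names.

```python
def q_values(psms):
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    psms: list of (score, is_decoy) pairs (hypothetical input layout).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple FDR estimate at this rank: decoys / targets accepted so far.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


def count_accepted_targets(psms, threshold):
    """Count target matches whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, qv in q_values(psms)
               if not is_decoy and qv <= threshold)
```

Comparing `count_accepted_targets` for two rankings of the same spectra at the same threshold is one way to express the kind of improvement the abstract reports.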

3.
High throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpret large scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra by using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which optimally takes into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes spectra classification according to parameters such as charge state unnecessary and allows a peptide identification performance, on the basis of false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by the construction of single-spectrum SEQUEST score distributions. These results make the robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm a very attractive alternative to determine peptide identification confidences and error rates in high throughput experiments.

4.
Analysis of a database containing over 20,000 high-resolution collision-activation mass spectra of tryptic peptide dications was employed to study the relative specificity of neutral losses from backbone fragments. The high resolution of the FTMS instrument allowed, for the first time, the first isotope of the water loss and the monoisotope of the ammonia loss to be distinguished. Contrary to popular belief, water losses from y' ions are not specific enough to rely upon for detecting the presence of amino acids with oxygen in the side chains. At the same time, ammonia loss from b ions is sufficiently specific (>95%) to detect the presence of amino acids Gln, Asn, His, Lys, and Arg. This feature will be useful for de novo algorithms for high-resolution MS data. Clear trends were observed when the effect of amino acids proximate to the cleavage site on the rate of loss formation was studied. These trends turned out to be different for losses from b and y ions.

5.
Proteomic techniques are fast becoming the main method for qualitative and quantitative determination of the protein content in biological systems. Despite notable advances, efficient and accurate analysis of high throughput proteomic data generated by mass spectrometers remains one of the major stumbling blocks in the protein identification problem. We present a model for the number of random matches between an experimental MS-MS spectrum and a theoretical spectrum of a peptide. The shape of the probability distribution is a function of the experimental accuracy, the number of peaks in the experimental spectrum, the length of the interval over which the peaks are distributed, and the number of theoretical spectral peaks in this interval. Based on this probability distribution, a goodness-of-fit tool can be used to yield fast and accurate scoring schemes for peptide identification through database search. In this paper, we describe one possible implementation of such a method and compare the performance of the resulting scoring function with that of SEQUEST. In terms of speed, our algorithm is roughly two orders of magnitude faster than the SEQUEST program, and its accuracy of peptide identification compares favorably to that of SEQUEST. Moreover, our algorithm does not use information related to the intensities of the peaks.
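
The abstract names four inputs to its random-match distribution but does not give the formula. Purely to make the idea concrete, the Python sketch below uses a simplified stand-in: experimental peaks are assumed to be uniformly and independently placed over the interval, so each theoretical peak is matched by chance with a fixed probability and the number of random matches is binomial. This binomial form, the function names, and the parameter names are assumptions for illustration, not the model from the paper.

```python
from math import comb


def random_match_probability(tolerance, n_experimental, interval_length):
    """Chance that at least one experimental peak lies within +/- tolerance
    of a given theoretical peak, assuming uniformly placed peaks."""
    window = min(1.0, 2.0 * tolerance / interval_length)
    return 1.0 - (1.0 - window) ** n_experimental


def tail_probability(k_observed, n_theoretical, p):
    """P(X >= k_observed) for X ~ Binomial(n_theoretical, p): the chance of
    seeing at least this many matches at random under the assumed model."""
    return sum(
        comb(n_theoretical, k) * p ** k * (1.0 - p) ** (n_theoretical - k)
        for k in range(k_observed, n_theoretical + 1)
    )
```

A small tail probability for the observed number of matches would suggest that the match is unlikely to be random. Like the method in the abstract, this sketch ignores peak intensities.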

6.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.

7.
Circular dichroism (CD) is a spectroscopic technique commonly used to investigate the structure of proteins. Major secondary structure types, alpha-helices and beta-strands, produce distinctive CD spectra. Thus, by comparing the CD spectrum of a protein of interest to a reference set consisting of CD spectra of proteins of known structure, predictive methods can estimate the secondary structure of the protein. Currently available methods, including K2D2, use such experimental CD reference sets, which are very small in size when compared to the number of tertiary structures available in the Protein Data Bank (PDB). Conversely, given a PDB structure, it is possible to predict a theoretical CD spectrum from it. The methodological framework for this calculation was established long ago, but only recently has a convenient implementation, called DichroCalc, been developed. In this study, we set out to determine whether theoretically derived spectra could be used as a reference set for accurate CD-based predictions of secondary structure. We used DichroCalc to calculate the theoretical CD spectra of a nonredundant set of structures representing most proteins in the PDB, and applied a straightforward approach for predicting protein secondary structure content using these theoretical CD spectra as the reference set. We show that this method improves the predictions, particularly for the wavelength interval between 200 and 240 nm and for beta-strand content. We have implemented this method, called K2D3, in a publicly accessible web server at http://www.ogic.ca/projects/k2d3. Proteins 2012. © 2011 Wiley Periodicals, Inc.

8.
A new multi-model approach (MMA) for sweat loss prediction is proposed to improve prediction accuracy. MMA was computed as the average of the sweat loss predicted by two existing thermoregulation models: the rational model SCENARIO and the empirical model Heat Strain Decision Aid (HSDA). Three independent physiological datasets, a total of 44 trials, were used to compare predictions by MMA, SCENARIO, and HSDA. The observed sweat losses were collected under different combinations of uniform ensembles, environmental conditions (15–40°C, RH 25–75%), and exercise intensities (250–600 W). Root mean square deviation (RMSD), residual plots, and paired t tests were used to compare predictions with observations. Overall, MMA reduced RMSD by 30–39% in comparison with either SCENARIO or HSDA, and increased the prediction accuracy to 66% from 34% or 55%. Of the MMA predictions, 70% fell within the range of mean observed value ± SD, while only 43% of SCENARIO and 50% of HSDA predictions fell within the same range. Paired t tests showed that differences between observations and MMA predictions were not significant, but differences between observations and SCENARIO or HSDA predictions were significant for two datasets. Thus, MMA predicted sweat loss more accurately than either of the two single models for the three datasets used. Future work will evaluate MMA using additional physiological data to expand the scope of populations and conditions.
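
The two calculations named in the abstract above, averaging two model predictions and scoring them by root mean square deviation, can be sketched in a few lines of Python. The function names are chosen for this example, and SCENARIO and HSDA themselves are not implemented here; the sketch only assumes that each model has already produced one prediction per trial.

```python
from math import sqrt


def mma_prediction(scenario_pred, hsda_pred):
    """Multi-model prediction: per-trial mean of the two model outputs."""
    return [(s + h) / 2.0 for s, h in zip(scenario_pred, hsda_pred)]


def rmsd(predicted, observed):
    """Root mean square deviation between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))
```

Averaging helps most when the two models err in opposite directions on the same trial, since those errors partly cancel; when both models err in the same direction, the average lands between them.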

9.
We describe the application of a previously reported reversed-phase liquid chromatography (RPLC) peptide retention time prediction model (Petritis et al. Anal. Chem. 2003, 75, 1039) for improved peptide identification. The model uses peptide sequence information to generate a theoretical (predicted) elution time that can be compared with the observed elution time. Using data from a set of known proteins, the retention time parameter was incorporated into a discriminant function for use with tandem mass spectrometry (MS/MS) data analyzed with the peptide/protein identification program SEQUEST. For singly charged ions, the number of confident identifications increased by 12% when the elution time metric was included, compared to when mass spectral data were the sole source of information, in the context of a Drosophila melanogaster database. A 3-4% improvement was obtained for doubly and triply charged ions for the same biological system. Application to the larger Rattus norvegicus (rat) and human proteome databases resulted in an 8-9% overall increase in the number of confident identifications when both the discriminant function and elution time were used. The effect of adding "runner-up" hits (peptide matches that are not the highest scoring for a spectrum) from SEQUEST is also explored, and we find that the number of confident identifications is further increased by 1% when these hits are also considered. Finally, application of the discriminant functions derived in this work to approximately 2.2 million spectra from over three hundred LC-MS/MS analyses of peptides from human plasma protein resulted in a 16% increase in confident peptide identifications (9022 vs 7779) using elution time information. Further improvements from the use of elution time information can be expected as both the experimental control of elution time reproducibility and the predictive capability are improved.

10.
We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identifications. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions could be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a computational approach to narrow charge states of precursor ions from nominal resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and fragment ions contained in the tandem mass spectrum. For each possible charge state the program calculates the expected fragment ions that account for precursor ion m/z values. If any one of the numbers is less than an empirically determined threshold value then the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value then +2 and +3 copies of the spectrum are kept. We present the comparison of results from protein identification experiments with and without using 2to3. It is shown that by determining the charge state and eliminating poor quality spectra, 2to3 decreases the number of spectral files to be searched without affecting the search results. The decrease reduces computer requirements and researcher efforts for analysis of the results.

11.
Phosphopeptide identification and phosphorylation site localization are crucial aspects of many biological studies. Furthermore, multiple phosphorylations of peptides make site localization even more difficult. We developed a probability-based method to unambiguously determine phosphorylation sites within phosphopeptides using MS2/3 pair information. A comparison test was performed with SEQUEST and MASCOT predictions using a spectral data set from a synthetic doubly phosphorylated peptide, and the results showed that PhosphoScan analysis yielded a 63% phosphopeptide localization improvement compared with SEQUEST and a 57% improvement compared with MASCOT.

12.
Experimental proof is given that the volume distribution spectrum of mammalian cells in suspension culture can be determined accurately with a Coulter spectrometer. Stable spectra corresponding to the predictions of a mathematical model are observed under favorable conditions of growth. Cell volume spectrometry appears to be a useful method for diagnosing the state of the culture with respect to past uniformity of growth rate and present population age distribution. In addition, it offers a method for quantitative study of the laws governing cell growth and division.

13.
Methyl esters of [5]-ladderanoic acid and [3]-ladderanoic acid were prepared by esterification of the acids isolated from biomass at a wastewater treatment plant. Optical rotations at six different wavelengths (633, 589, 546, 436, 405 and 365 nm) and vibrational circular dichroism (VCD) spectra in the 1800–900 cm−1 region were measured in CDCl3 solvent and compared with quantum chemical (QC) predictions using the B3LYP functional and the 6-311++G(2d,2p) basis set, with a polarizable continuum model representing the solvent. QC predictions gave negative optical rotations at all six wavelengths for (R)-methyl [5]-ladderanoate and positive optical rotations for (R)-methyl [3]-ladderanoate, the same signs as previously reported for the corresponding acids. The crystal structure of (−)-methyl [5]-ladderanoate independently confirmed the (R) configuration. The QC-predicted VCD spectra using Boltzmann population weighted spectra of individual conformers did not provide satisfactory quantitative agreement with the experimental VCD spectra. An improved quantitative agreement for VCD spectra could be obtained when conformer populations were optimized to maximize the similarity between experimental and predicted VCD spectra, but more improvements in VCD predictions are needed.

14.
MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass accuracy sensitive probabilistic score model to rank peptide matches. The MS/MS search software was evaluated by use of a high mass accuracy dataset and its results compared with those from MASCOT, SEQUEST, X!Tandem, and OMSSA. For the high mass accuracy data, MassMatrix provided better sensitivity than MASCOT, SEQUEST, X!Tandem, and OMSSA for a given specificity, and the percentage of false positives was 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequences and additional variable PTMs did not significantly affect the results from the high mass accuracy search. MassMatrix performs well when compared with MASCOT, SEQUEST, X!Tandem, and OMSSA with regard to search time. MassMatrix was also run on a distributed memory cluster and achieved search speeds of ~100 000 spectra per hour when searching against a complete human database with eight variable modifications. The algorithm is available for public searches at http://www.massmatrix.net.

15.
Time-consuming and experience-dependent manual validations of tandem mass spectra are usually applied to SEQUEST results. This inefficient method has become a significant bottleneck for MS/MS data processing. Here we introduce a program, AMASS (advanced mass spectrum screener), which can filter the tandem mass spectra of SEQUEST results by measuring the match percentage of high-abundance ions and the continuity of matched fragment ions in the b and y series. Compared with the Xcorr and DeltaCn filters, AMASS can increase the number of positives and reduce the number of negatives in 22 datasets generated from 18 known protein mixtures. It effectively removed most noisy spectra, false interpretations, and about half of the poor fragmentation spectra, and AMASS can work synergistically with the Rscore filter. We believe the use of AMASS and Rscore can result in a more accurate identification of peptide MS/MS spectra and reduce the time and energy for manual validation.
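
The abstract names two quantities, the match percentage of high-abundance ions and the continuity of matched b and y fragment ions, without defining them. The Python sketch below shows one plausible reading of each, only to make the ideas concrete. The function names, the 0.5 m/z tolerance, and the choice of "longest consecutive run" as the continuity measure are assumptions for this example, not AMASS's actual definitions.

```python
def match_percentage(top_peaks_mz, theoretical_mz, tolerance=0.5):
    """Percentage of the most abundant experimental peaks that lie within
    the tolerance of any theoretical fragment m/z."""
    if not top_peaks_mz:
        return 0.0
    matched = sum(
        1 for mz in top_peaks_mz
        if any(abs(mz - t) <= tolerance for t in theoretical_mz)
    )
    return 100.0 * matched / len(top_peaks_mz)


def longest_matched_run(matched_flags):
    """Length of the longest run of consecutive matched ions in one series
    (for example b1, b2, ..., in order along the peptide)."""
    best = run = 0
    for is_matched in matched_flags:
        run = run + 1 if is_matched else 0
        best = max(best, run)
    return best
```

A filter built on these two numbers would keep a SEQUEST assignment only when both exceed chosen cutoffs; the cutoffs would have to be tuned on spectra with known answers.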

16.
Phosphorylation has been the most studied of all the posttranslational modifications of proteins. Mass spectrometry has emerged as a powerful tool for phosphomapping on proteins/peptides. Collision-induced dissociation (CID) of phosphopeptides leads to the loss of phosphoric or metaphosphoric acid as a neutral molecule, giving an intense neutral loss product ion in the mass spectrum. Dissociation of the neutral loss product ion identifies the peptide sequence. This method of data-dependent constant neutral loss (DDNL) scanning analysis has been commonly used for mapping phosphopeptides. However, preferential losses of groups other than phosphate are frequently observed during CID of phosphopeptides. Ions that result from such losses are not identified during DDNL analysis because scanning is predetermined for phosphate loss. In this study, we describe an alternative approach for improved identification of phosphopeptides by sequential abundant ion fragmentation analysis (SAIFA). In this approach, there is no predetermined neutral loss molecule, so the abundant peak undergoes sequential fragmentation irrespective of the moiety lost during CID. In addition to improved phosphomapping, the method increases the sequence coverage of the proteins identified, thereby increasing the confidence of protein identification. To the best of our knowledge, this is the first report to use SAIFA for phosphopeptide identification.

17.
Palumbi et al. (2001) proposed a "three-times rule" that uses mitochondrial DNA (mtDNA) sequences to predict probabilities of monophyly for nuclear loci (i.e., whether the alleles within a taxon coalesce with one another before they coalesce with alleles from a sister taxon). They use neutral coalescent theory to infer these probabilities from the ratio of interspecific divergence to intraspecific variation of mtDNA. We show that the estimated probabilities have very wide confidence intervals because of the inherent stochasticity of the mtDNA coalescent process. Under neutrality, the true probability of monophyly can be much higher, or much lower, than predicted by the three-times rule. We also review recent empirical and theoretical studies that refute neutrality-based predictions concerning mtDNA variation and divergence. We conclude that the three-times rule is neither a useful test for neutral molecular evolution nor a reliable guide to genealogical species.

18.
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphics processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and their comparison to experimental spectra, is computed on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
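
The abstract says only that the "Accelerated Score" is based on an inexpensive dot product. To show why such a score is cheap, the Python sketch below bins two peak lists and takes the dot product of the binned intensities. It is a generic CPU illustration with assumed names and an assumed bin width, not Tempest's GPU implementation or its actual scoring formula.

```python
def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: list of (mz, intensity) pairs; returns {bin_index: intensity}.
    """
    binned = {}
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def dot_product_score(experimental, theoretical):
    """Dot product of two binned spectra over the bins they share."""
    return sum(
        intensity * theoretical.get(index, 0.0)
        for index, intensity in experimental.items()
    )
```

Each candidate peptide costs one pass over the occupied bins, and every candidate can be scored independently of the others, which is the kind of workload that maps well onto a GPU.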

19.
Ammonia losses during swine wastewater treatment were examined using single- and two-chambered microbial fuel cells (MFCs). Ammonia removal was 60% over 5 days for a single-chamber MFC with the cathode exposed to air (air-cathode), versus 69% over 13 days from the anode chamber in a two-chamber MFC with a ferricyanide catholyte. In both types of systems, ammonia losses were accelerated with electricity generation. For the air-cathode system, our results suggest that nitrogen losses during electricity generation were increased due to ammonia volatilization with conversion of ammonium ion to the more volatile ammonia species as a result of an elevated pH near the cathode (where protons are consumed). This loss mechanism was supported by abiotic tests (applied voltage of 1.1 V). In a two-chamber MFC, nitrogen losses were primarily due to ammonium ion diffusion through the membrane connecting the anode and cathode chambers. This loss was higher with electricity generation as the rate of ammonium transport was increased by charge transfer across the membrane. Ammonia was not found to be used as a substrate for electricity generation, as intermittent ammonia injections did not produce power. The ammonia-oxidizing bacterium Nitrosomonas europaea was found on the cathode electrode of the single-chamber system, supporting evidence of biological nitrification, but anaerobic ammonia-oxidizing bacteria were not detected by molecular analyses. It is concluded that ammonia losses from the anode chamber were driven primarily by physical-chemical factors that are increased with electricity generation, although some losses may occur through biological nitrification and denitrification.

20.
Two-dimensional proton nuclear magnetic resonance nuclear Overhauser effect experiments have been performed at a series of mixing times on proflavine and on a DNA octamer duplex [d-(GGAATTCC)]2 in solution. Using the complete matrix approach recently explored theoretically (Keepers and James, 1984), proton-proton internuclear distances were determined quantitatively for proflavine from the two-dimensional nuclear Overhauser effect results. Since proflavine is a rigid molecule with a known X-ray crystal structure, interproton distances obtained from the two-dimensional nuclear Overhauser effect experiments in solution can be compared with those for the crystalline compound; agreement is better than 10%. Experimental two-dimensional nuclear Overhauser effect spectral data for [d-(GGAATTCC)]2 were analyzed by comparison with theoretical two-dimensional nuclear Overhauser effect spectra at each mixing time calculated using the complete 70 × 70 relaxation matrix. The theoretical spectra were calculated using two structures: a standard B-form DNA structure and an energy-minimized structure based on similarity of the octamer's six internal residues with those of [d-(CGCGAATTCGCG)]2, for which the crystal structure has been determined. Neither the standard B-DNA nor the energy-minimized structure yields theoretical two-dimensional nuclear Overhauser effect spectra which accurately reproduce all experimental peak intensities, but many aspects of the experimental spectra can be represented by both the B-DNA and the energy-minimized structure. In general, the energy-minimized structure yields theoretical two-dimensional nuclear Overhauser effect spectra which mimic many, if not all, features of the experimental spectra, including structural characteristics at the purine-pyrimidine junction.
