Found 20 similar articles (search time: 15 ms)
1.
Manual checking is commonly employed to validate the phosphopeptide identifications from database searching of tandem mass spectra. It is very time-consuming and labor intensive as the number of phosphopeptide identifications increases greatly. In this study, a simple automatic validation approach was developed for phosphopeptide identification by combining consecutive stage mass spectrometry data and the target-decoy database searching strategy. Only phosphopeptides identified from both MS2 and its corresponding MS3 were accepted for further filtering, which greatly improved the reliability in phosphopeptide identification. Before database searching, the spectra were validated for charge state and neutral loss peak intensity, and then the invalid MS2/MS3 spectra were removed, which greatly reduced the database searching time. It was found that the sensitivity was significantly improved in MS2/MS3 strategy as the number of identified phosphopeptides was 2.5 times that obtained by the conventional filter-based MS2 approach. Because of the use of the target-decoy database, the false-discovery rate (FDR) of the identified phosphopeptides could be easily determined, and it was demonstrated that the determined FDR can precisely reflect the actual FDR without any manual validation stage.
2.
Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics (cited by: 1; self-citations: 0; citations by others: 1)
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.
3.
Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry (cited by: 1; self-citations: 0; citations by others: 1)
Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
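As an illustration of how decoy hit frequencies translate into an error estimate, the following is a minimal sketch of one commonly used estimate for concatenated target-decoy searches: the number of false positives among accepted matches is taken to be about twice the number of accepted decoy hits. The function name, the `(score, is_decoy)` tuple layout and the example scores are assumptions made for illustration; they are not taken from the abstract.

```python
from typing import List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR of matches scoring at or above `threshold`.

    `psms` is a hypothetical list of (score, is_decoy) pairs from a search
    against a concatenated target-decoy database. The estimate assumes that
    incorrect matches hit target and decoy sequences equally often, so the
    number of false positives is roughly 2 x (accepted decoy hits).
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    if not accepted:
        return 0.0
    n_decoy = sum(accepted)
    # Cap at 1.0 because the doubled decoy count can exceed the total.
    return min(1.0, 2.0 * n_decoy / len(accepted))


# Made-up example scores, for illustration only.
example_psms = [
    (10.0, False), (9.0, False), (8.0, True), (7.0, False),
    (6.0, False), (5.0, True), (4.0, False), (3.0, True),
]
print(estimate_fdr(example_psms, threshold=6.0))
```

Raising the threshold trades sensitivity for a lower estimated error rate; in practice the threshold is usually chosen as the lowest score that keeps the estimate under a target such as 1%.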
4.
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.
5.
Authentic biomarkers, distilling the essence of a complex, functionally significant process in a mammalian system into a precise, physicochemical measurement have been implicated as a tool of increasing importance for drug discovery and development. However, even in spite of recent technological advances, validating a new biomarker candidate, where generation of suitable antibodies is required, is still a long-lasting task. Methods to accelerate initial validation by MS approaches have been suggested, but all methods described so far are associated with serious drawbacks, finally leading to non-generic methods of detection and quantification. Moreover, when complex body fluids are used as samples, efficient debulking strategies are crucial to open a window of analytical sensitivity in the ng/mL range, where many diagnostically relevant analytes are present. Here we report the proof-of-principle of a multi-dimensional strategy for accelerated initial validation of biomarker candidates by MS, which promises to be generally applicable, sensitive and quantitative. The method presented employs a combination of electrophoretic and chromatographic steps on the peptide level, followed by MS quantification using isotopically labeled synthetic peptides as internal standards. Our proposed workflow includes up to four dimensions, finally resulting in a desired LOD sufficient to detect and quantify diagnostically relevant analytes from complex samples. Although the current state of the method only represents a starting point for further validation and development, it reveals great potential in biomarker validation.
6.
Validation of endogenous peptide identifications using a database of tandem mass spectra (cited by: 1; self-citations: 0; citations by others: 1)
Fälth M Svensson M Nilsson A Sköld K Fenyö D Andren PE 《Journal of proteome research》2008,7(7):3049-3053
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomic experiment practice, many peptide sequences contain multiple tandem mass spectra with different quality. The new tandem mass spectra database in SwePep enables validation of low quality spectra using high quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
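The abstract does not say which correlation algorithm is used, so the following is only a sketch of one plausible way to compare two fragmentation patterns: bin the peaks of each spectrum on a fixed m/z grid and compute the Pearson correlation coefficient of the binned intensities. The bin width, the m/z range and all function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0,
                 max_mz: float = 2000.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins (assumed 1.0 wide)."""
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if one is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectrum_correlation(a: List[Peak], b: List[Peak]) -> float:
    """Correlation between two spectra after binning on the same grid."""
    return pearson(bin_spectrum(a), bin_spectrum(b))


# Made-up peak lists: a low-quality spectrum and a scaled high-quality copy.
low_quality = [(100.2, 1.0), (200.4, 2.0), (300.6, 1.5)]
high_quality = [(100.2, 3.0), (200.4, 6.0), (300.6, 4.5)]
print(spectrum_correlation(low_quality, high_quality))
```

Because the correlation is insensitive to overall intensity scale, a weak spectrum with the same fragmentation pattern as a strong reference still scores close to 1.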
7.
Estimating the statistical significance of peptide identifications from shotgun proteomics experiments (cited by: 1; self-citations: 0; citations by others: 1)
Higgs RE Knierman MD Freeman AB Gelbert LM Patil ST Hale JE 《Journal of proteome research》2007,6(5):1758-1767
We present a wrapper-based approach to estimate and control the false discovery rate for peptide identifications using the outputs from multiple commercially available MS/MS search engines. Features of the approach include the flexibility to combine output from multiple search engines with sequence and spectral derived features in a flexible classification model to produce a score associated with correct peptide identifications. This classification model score from a reversed database search is taken as the null distribution for estimating p-values and false discovery rates using a simple and established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications while correctly identifying more peptides than rule-based methods using one search engine alone.
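The abstract does not name the statistical procedure, so the sketch below only illustrates the general idea of treating scores from a reversed-database search as a null distribution: each target score receives an empirical p-value equal to the fraction of null scores at least as large. The add-one correction and the function name are assumptions, not details from the paper.

```python
from bisect import bisect_left
from typing import List


def empirical_p_values(target_scores: List[float],
                       null_scores: List[float]) -> List[float]:
    """Empirical p-value of each target score against a null score sample.

    `null_scores` stands in for classifier scores from a reversed-database
    search. The +1 in numerator and denominator (an assumed convention)
    keeps p-values strictly above zero for a finite null sample.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for score in target_scores:
        # Number of null scores greater than or equal to this score.
        n_at_least = n - bisect_left(null_sorted, score)
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values


# Made-up scores, for illustration only.
print(empirical_p_values([5.0, 2.5, 0.0], [1.0, 2.0, 3.0, 4.0]))
```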
8.
James G Rohrbough Linda Breci Nirav Merchant Susan Miller Paul A Haynes 《Journal of biomolecular techniques》2006,17(5):327-332
Data produced from the MudPIT analysis of yeast (S. cerevisiae) and rice (O. sativa) were used to develop a technique to validate single-peptide protein identifications using complementary database search algorithms. This results in a considerable reduction of overall false-positive rates for protein identifications; the overall false discovery rates in yeast are reduced from near 25% to less than 1%, and the false discovery rate of yeast single-peptide protein identifications becomes negligible. This technique can be employed by laboratories utilizing a SEQUEST-based proteomic analysis platform, incorporating the XTandem algorithm as a complementary tool for verification of single-peptide protein identifications. We have achieved this using open-source software, including several data-manipulation software tools developed in our laboratory, which are freely available to download.
9.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
10.
Background
Analysis of complex samples with tandem mass spectrometry (MS/MS) has become routine in proteomic research. However, validation of database search results creates a bottleneck in MS/MS data processing. Recently, methods based on a randomized database have become popular for quality control of database search results. However, a consequent problem is the ignorance of how to combine different database search scores to improve the sensitivity of randomized database methods.
11.
Wymelenberg AV Sabat G Martinez D Rajangam AS Teeri TT Gaskell J Kersten PJ Cullen D 《Journal of biotechnology》2005,118(1):17-34
The white rot basidiomycete, Phanerochaete chrysosporium, employs an array of extracellular enzymes to completely degrade the major polymers of wood: cellulose, hemicellulose and lignin. Towards the identification of participating enzymes, 268 likely secreted proteins were predicted using SignalP and TargetP algorithms. To assess the reliability of secretome predictions and to evaluate the usefulness of the current database, we performed shotgun LC-MS/MS on cultures grown on standard cellulose-containing medium. A total of 182 unique peptide sequences were matched to 50 specific genes, of which 24 were among the secretome subset. Underscoring the rich genetic diversity of P. chrysosporium, identifications included 32 glycosyl hydrolases. Functionally interconnected enzyme groups were recognized. For example, the multiple endoglucanases and processive exocellobiohydrolases observed quite probably attack cellulose in a synergistic manner. In addition, a hemicellulolytic system included endoxylanases, alpha-galactosidase, acetyl xylan esterase, and alpha-L-arabinofuranosidase. Glucose and cellobiose metabolism likely involves cellobiose dehydrogenase, glucose oxidase, and various inverting glycoside hydrolases, all perhaps enhanced by an epimerase. To evaluate the completeness of the current database, mass spectroscopy analysis was performed on a larger and more inclusive dataset containing all possible ORFs. This allowed identification of a previously undetected hypothetical protein and a putative acid phosphatase. The expression of several genes was supported by RT-PCR amplification of their cDNAs.
12.
13.
14.
Viswanadham Sridhara Dina L Bai An Chi Jeffrey Shabanowitz Donald F Hunt Stephen H Bryant Lewis Y Geer 《Proteome science》2012,10(1):1-10
Background
Early diagnosis and treatment of Mycobacterium tuberculosis infection can prevent most deaths resulting from this pathogen; however, multidrug-resistant strains present serious threats to global tuberculosis control and prevention efforts. In this study, we identified antigens that could be used for the serodiagnosis of drug-resistant M. tuberculosis strains, using a proteomics-based analysis.
Results
Serum from patients infected with drug-resistant or drug-susceptible M. tuberculosis strains and healthy controls was subjected to two-dimensional gel electrophoresis using a western blot approach. This procedure identified nine immunoreactive proteins, which were subjected to MALDI-TOF-MS analysis. Six recombinant proteins, namely rRv2031c, rRv0444c, rRv2145c, rRv3692, rRv0859c, and rRv3040, were expressed and used to determine the immuno-reactivity of 100 serum samples. Antibody reactivity against rRv2031c, rRv3692, and rRv0444c was consistently observed. Among them, the best sensitivity and specificity of rRv3692 were 37% and 95% respectively. Furthermore, when rRv2031c and rRv3692 or rRv2031c, rRv3692, and rRv0444c were combined in 2:1 or equal amounts, the assay sensitivity and specificity were improved to 56.7% and 100% respectively.
Conclusions
These results suggest that Rv2031c, Rv3692, and Rv0444c are possible candidate biomarkers for effective use in the serodiagnosis of drug-resistant tuberculosis infections, and a combined formula of these antigens should be considered when designing a subunit assay kit.
15.
Shotgun proteomics commonly utilizes database search engines such as Mascot to identify proteins from tandem MS/MS spectra. False discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, a widely accepted FDR of 1% sacrifices the sensitivity of peptide identification while improving the accuracy. This article details a machine learning approach combining a retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. The use of confident peptide identifications as training examples and careful feature selection ensures high R values (>0.900) for all models. The application of the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identifications. The q value, as a function of deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that the peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, for a specified q value of 0.01 when applying the method to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species as well as a future investigation of accurate protein quantification.
16.
Charles Taylor and John Marshall explain the utility of mathematical modeling for evaluating the effectiveness of population replacement strategy. Insight is given into how computational models can provide information on the population dynamics of mosquitoes and the spread of transposable elements through A. gambiae subspecies. The ethical considerations of releasing genetically modified mosquitoes into the wild are discussed.
17.
Nico Pfeifer Andreas Leinenbach Christian G Huber Oliver Kohlbacher 《BMC bioinformatics》2007,8(1):468
Background
High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies within the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which are readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes but they usually require large amounts of training data.
18.
General framework for developing and evaluating database scoring algorithms using the TANDEM search engine (cited by: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org
19.
The individualization of radiotherapy treatment would be beneficial for cancer patients; however, there are no predictive biomarkers of radiotherapy resistance in routine clinical use. This article describes the body of work in this field where comparative proteomics methods have been used for the discovery of putative biomarkers associated with radiotherapy resistance. A large number of differentially expressed proteins have been reported, mostly from the study of novel radiotherapy-resistant cell lines. Here, we have assessed these putative biomarkers through the discovery, confirmation and validation phases of the biomarker pipeline, and inform the reader on the current status of proteomics-based findings. Suggested avenues for future work are discussed.
20.
MOTIVATION: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, recent growth of sequence database size makes homolog detection difficult, and rapid and accurate methods are required. RESULTS: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.
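For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of the plain Smith-Waterman local alignment score with a linear gap penalty. It does not implement the upper bound or the inequality described in the abstract, which are not given there; the scoring parameters and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` and `b` (linear gap penalty).

    Uses the standard dynamic programming recurrence, keeping only the
    previous row, so memory is O(len(b)) and time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                            # start a new local alignment
                prev[j - 1] + substitution,   # align a[i-1] with b[j-1]
                prev[j] + gap,                # gap in b
                curr[j - 1] + gap,            # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The quadratic cost of this computation per sequence pair is what makes skipping alignments worthwhile whenever a cheap upper bound already falls below the threshold.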