RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications |
| |
Authors: | Cao Weifeng; Ma Di; Kapur Arvinder; Patankar Manish S; Ma Yadi; Li Lingjun |
| |
Institution: | Department of Chemistry, University of Wisconsin-Madison, 777 Highland Ave., Madison, WI 53705, USA. wcao2@wisc.edu |
| |
Abstract: | Shotgun proteomics commonly relies on database search engines such as Mascot to identify proteins from tandem mass (MS/MS) spectra. The false discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, the widely accepted FDR threshold of 1% improves accuracy at the expense of sensitivity. This article details a machine learning approach that combines a retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. Using confident peptide identifications as training examples, together with careful feature selection, ensures high R values (>0.900) for all models. Applying the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identifications. The q value, as a function of the deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, at a specified q value of 0.01 when the method is applied to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species, as well as for future investigation of accurate protein quantification. |
| |
Keywords: | |
This article is indexed in PubMed and other databases. |
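
The abstract describes the q value as a function of the deviation between predicted and experimental retention times (ΔRT) but does not spell out the calculation. The following is a minimal sketch of one common way to do this, assuming a target-decoy setup in which each peptide-spectrum match carries an absolute ΔRT and a decoy flag. The function name `q_values_by_delta_rt`, the input layout, and the target-decoy FDR estimate itself are illustrative assumptions, not the authors' published procedure.

```python
from typing import List, Tuple


def q_values_by_delta_rt(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return one q value per PSM, in input order.

    psms: (abs_delta_rt, is_decoy) pairs. Hypothetical input layout;
    smaller |delta RT| is treated as a more trustworthy identification.
    """
    # Rank PSMs from smallest to largest |delta RT|.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0])

    # Estimated FDR at each rank: decoys accepted / targets accepted.
    fdrs = []
    targets = 0
    decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q value: the smallest FDR among this threshold and all looser ones,
    # which makes q monotone non-decreasing in |delta RT|.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [(0.2, False), (3.0, True), (0.5, False), (1.5, False), (1.0, True)]
    for (delta_rt, is_decoy), qv in zip(example, q_values_by_delta_rt(example)):
        print(f"|dRT|={delta_rt:.1f} decoy={is_decoy} q={qv:.3f}")
```

Under these assumptions, identifications would be accepted when their q value falls at or below the chosen cutoff, such as the 0.01 used in the abstract.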
|