Estimating false discovery rates for peptide and protein identification using randomized databases
Authors: Gregory Hather, Roger Higdon, Andrew Bauman, Priska D. von Haller, Eugene Kolker
Affiliation: 1. Bioinformatics & High‐throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA, USA; 2. High‐throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA; 3. Predictive Analytics, Seattle Children's Hospital, Seattle, WA, USA; 4. Department of Genome Sciences, University of Washington, Seattle, WA, USA; 5. Proteomics Resource, University of Washington, Seattle, WA, USA; 6. Division of Biomedical & Health Informatics, Department of Medical Education & Biomedical Informatics, School of Medicine, University of Washington, Seattle, WA, USA
Abstract: MS‐based proteomics characterizes the protein content of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching the protein sequence database together with a randomized version of it and comparing the score distributions of the randomized and nonrandomized matches. This work introduces a straightforward isotonic regression‐based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method performed as well as the other methods used for comparison and has the added advantages of being (i) monotonic in the score, (ii) computationally simple, and (iii) independent of assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification from summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two‐peptide rule. Finally, by estimating both FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
Keywords: Bioinformatics; False discovery rates; Peptide identification; Protein identification; Randomized databases
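The idea of estimating LFDRs and FDRs from a combined search of a database and its randomized version can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: incorrect matches are taken to be equally likely to hit the real or the randomized sequences, the LFDR of a real-database match is taken as p / (1 - p), where p is the isotonic estimate of the probability that a match at that score is randomized, and the cumulative FDR is taken as the running mean of the LFDR over accepted real-database matches. The function names `pava_nondecreasing` and `estimate_fdr_lfdr` are hypothetical, and tied scores are not given any special treatment.

```python
def pava_nondecreasing(values):
    """Least-squares non-decreasing fit via the pool-adjacent-violators algorithm."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def estimate_fdr_lfdr(scores, is_decoy):
    """Return (score, fdr, lfdr) for each real-database match, best score first.

    Assumption (not from the source): an incorrect match lands in the real or
    the randomized database with equal probability.
    """
    # Rank all matches from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    decoy_flags = [1.0 if is_decoy[i] else 0.0 for i in order]

    # Isotonic estimate of P(randomized match) that never decreases as score drops.
    p_decoy = pava_nondecreasing(decoy_flags)

    rows = []
    lfdr_sum = 0.0
    n_targets = 0
    for rank, i in enumerate(order):
        if is_decoy[i]:
            continue  # only real-database matches are reported
        p = p_decoy[rank]
        lfdr = 1.0 if p >= 0.5 else p / (1.0 - p)
        lfdr_sum += lfdr
        n_targets += 1
        # Cumulative FDR as the mean LFDR of all matches accepted so far.
        rows.append((scores[i], lfdr_sum / n_targets, lfdr))
    return rows


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_scores = [10, 9, 8, 7, 6, 5, 4, 3]
    toy_decoy = [False, False, False, True, False, False, True, True]
    for score, fdr, lfdr in estimate_fdr_lfdr(toy_scores, toy_decoy):
        print(f"score={score}  FDR={fdr:.3f}  LFDR={lfdr:.3f}")
```

Because the isotonic fit never decreases as the score drops, the LFDR in this sketch is monotonic in the score, and the running mean of a non-decreasing sequence is itself non-decreasing, so the cumulative FDR is monotonic as well. No score distribution is assumed at any point.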