
Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity versus area under the receiver operating characteristic curve (ROC AUC), in the evaluation of CT colonography for the detection of polyps, with or without computer-assisted detection.

Methods

In a multireader multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, based on radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are attributable to the statistical methods used.
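The two measures compared here can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis: the function names and the toy data are hypothetical, a case is assumed to be "reported positive" when its confidence score is above zero, and the AUC is the empirical (Mann-Whitney) estimate with ties counted as one half rather than a fitted ROC curve.

```python
def sensitivity_specificity(reported, truth):
    """Sensitivity and specificity from binary reports and a reference standard."""
    pairs = list(zip(reported, truth))
    tp = sum(1 for r, t in pairs if r and t)
    fn = sum(1 for r, t in pairs if not r and t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if r and not t)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, truth):
    """Empirical ROC AUC: the probability that a positive case scores higher
    than a negative case, with tied scores counted as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical reader: 4 cases with polyps, 4 without (1 = polyp present).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
# Confidence scores; 0 means the reader identified no polyp in that case.
scores = [90, 80, 0, 0, 0, 0, 0, 70]
# Assumed reporting rule: any non-zero score counts as a positive report.
reported = [s > 0 for s in scores]

sens, spec = sensitivity_specificity(reported, truth)
auc = empirical_auc(scores, truth)
print(sens, spec, auc)
```

In this toy data most of the positive-negative pairs involving a missed polyp are ties at zero, so the AUC depends heavily on how ties are handled and on the single high-scoring false positive.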

Results

Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were several problems with using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; in the bimodal, non-normal distribution of scores; in fitting ROC curves, owing to extrapolation beyond the study data; and in the undue influence of a few false positive results. Variation due to the use of different ROC methods exceeded the differences between test results for ROC AUC.

Conclusions

The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found that sensitivity and specificity were a more reliable and clinically appropriate method for comparing diagnostic tests.

15.

A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators

Laurence D Hurst Oxana Sachenkova Carsten Daub Alistair RR Forrest the FANTOM consortium Lukasz Huminiecki 《Genome biology》2014,15(7)


16.

Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions

Yohan Kim John Sidney Søren Buus Alessandro Sette Morten Nielsen Bjoern Peters 《BMC bioinformatics》2014,15(1)

Background

It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determining prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB), which served as a blind set.
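The two evaluation strategies described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark procedure used in the study: the "model" is a hypothetical stand-in that simply predicts the mean of its training values, the error measure is mean absolute error, folds are contiguous and unshuffled, and all names and numbers are invented for the example.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(train_ys):
    """Hypothetical stand-in model: always predict the training mean."""
    return mean(train_ys)


def mean_abs_error(prediction, ys):
    """Mean absolute error of a constant prediction against measured values."""
    return mean(abs(prediction - y) for y in ys)


def cross_validated_error(ys, k):
    """Train on k-1 folds, test on the held-out fold, average over folds."""
    errors = []
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        test = [ys[i] for i in fold]
        errors.append(mean_abs_error(fit_mean(train), test))
    return mean(errors)


def blind_error(train_ys, blind_ys):
    """Train on the full benchmark, test on separately generated data."""
    return mean_abs_error(fit_mean(train_ys), blind_ys)


# Hypothetical measurements: an older benchmark and later-added blind data.
benchmark = [1.0, 2.0, 3.0, 4.0]
blind = [6.0, 7.0]

print(cross_validated_error(benchmark, k=2))
print(blind_error(benchmark, blind))
```

With these invented numbers the blind set lies outside the range of the benchmark, so the blind error is larger than the cross-validated error; real datasets need not behave this way.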

Results

We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates.

Conclusion

It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-241) contains supplementary material, which is available to authorized users.

17.

Notch and MAML-1 complexation do not detectably alter the DNA binding specificity of the transcription factor CSL

Del Bianco C Vedenko A Choi SH Berger MF Shokri L Bulyk ML Blacklow SC 《PloS one》2010,5(11):e15034


18.

Quality versus accuracy: result of a reanalysis of protein-binding microarrays from the DREAM5 challenge by using BayesPI2 including dinucleotide interdependence

Junbai Wang 《BMC bioinformatics》2014,15(1)


19.

PPARG Binding Landscapes in Macrophages Suggest a Genome-Wide Contribution of PU.1 to Divergent PPARG Binding in Human and Mouse

Sebastian Pott Nima K. Kamrani Guillaume Bourque Sven Pettersson Edison T. Liu 《PloS one》2012,7(10)


20.

Morphogenesis-regulated localization of protein kinase A to genomic sites in Candida albicans

Alida Schaekel Prashant R Desai Joachim F Ernst 《BMC genomics》2013,14(1)
