Improved variance estimation of classification performance via reduction of bias caused by small sample size |
| |
Authors: | Ulrika Wickenberg-Bolin, Hanna Göransson, Mårten Fryknäs, Mats G Gustafsson, Anders Isaksson |
| |
Affiliation: | (1) Department of Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 85 Uppsala, Sweden; (2) Department of Engineering Sciences, Uppsala University, Box 528, SE-751 20 Uppsala, Sweden |
| |
Abstract: |
Background: Supervised learning for cancer classification uses a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust and generalizes well to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval for the error rate using repeated design and test sets selected from the available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, methods for small-sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RRS), are also expected to yield heavily biased estimates, which in turn translate into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). |
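The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' RIDT algorithm; it only mimics the ideal situation of repeated designs tested on completely novel samples. It assumes that each design set yields a classifier with its own true error rate and that the number of test errors is binomial. Under that assumption the law of total variance gives Var(observed) = Var(true) + E[e(1 - e)] / n_test, so a small test set inflates the apparent variance between design sets. The function names, the Gaussian spread of true error rates, and all numeric settings are hypothetical choices made for illustration.

```python
import random
import statistics


def observed_error_rates(true_error_rates, n_test, rng):
    """Simulate one test run per design set.

    Each classifier (one per design set) is tested on n_test novel
    examples; every example is misclassified with that classifier's
    true error rate, so the error count is binomial.
    """
    observed = []
    for e in true_error_rates:
        errors = sum(1 for _ in range(n_test) if rng.random() < e)
        observed.append(errors / n_test)
    return observed


def expected_inflation(true_error_rates, n_test):
    """Binomial term E[e(1 - e)] / n_test from the law of total variance."""
    return statistics.fmean(e * (1 - e) for e in true_error_rates) / n_test


def make_true_rates(n_design_sets, rng, mean=0.2, sd=0.03):
    """Hypothetical true error rates, clipped to the interval [0, 1]."""
    return [min(max(rng.gauss(mean, sd), 0.0), 1.0) for _ in range(n_design_sets)]


if __name__ == "__main__":
    rng = random.Random(0)
    true_rates = make_true_rates(2000, rng)
    var_true = statistics.pvariance(true_rates)
    for n_test in (10, 100, 1000):
        var_obs = statistics.pvariance(observed_error_rates(true_rates, n_test, rng))
        print(
            f"n_test={n_test:5d}  var_true={var_true:.5f}  "
            f"var_observed={var_obs:.5f}  "
            f"predicted={var_true + expected_inflation(true_rates, n_test):.5f}"
        )
```

Run as a script, the sketch prints the true variance between design sets, the variance of the observed error rates, and the value predicted by the formula for each test set size. The gap between the first two shrinks as n_test grows, which is the small-test-set bias the abstract refers to.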
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|