Improved variance estimation of classification performance via reduction of bias caused by small sample size

Authors:	Ulrika Wickenberg-Bolin, Hanna Göransson, Mårten Fryknäs, Mats G Gustafsson, Anders Isaksson

Affiliation:	(1) Department of Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 85 Uppsala, Sweden; (2) Department of Engineering Sciences, Uppsala University, Box 528, SE-751 20 Uppsala, Sweden

Abstract:	Background: Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust, with good generalization performance on new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval for the error rate using repeated design and test sets selected from the available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, methods for small-sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RSS), are also expected to yield heavily biased estimates, which in turn translate into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT).
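The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the RIDT algorithm; it only shows, under stated assumptions, why a small test set inflates the observed variance between design sets. It assumes that each design set yields a classifier whose true error rate is drawn from a Beta(8, 32) distribution (a hypothetical choice for illustration) and that each classifier is tested on a fresh set of `n_test` examples. The observed variance then contains an extra binomial term of roughly e(1 - e)/n_test, which can be estimated and subtracted. The function names `simulate` and `variance_estimates` are illustrative.

```python
import random
import statistics


def simulate(n_cycles=5000, n_test=10, seed=0):
    """Simulate repeated design/test cycles with novel samples in each cycle."""
    rng = random.Random(seed)
    true_rates, observed_rates = [], []
    for _ in range(n_cycles):
        # Assumption: the true error rate of the classifier built from
        # this design set follows a Beta(8, 32) distribution (mean 0.2).
        e = rng.betavariate(8, 32)
        # Each of the n_test test examples is misclassified with probability e.
        errors = sum(rng.random() < e for _ in range(n_test))
        true_rates.append(e)
        observed_rates.append(errors / n_test)
    return true_rates, observed_rates


def variance_estimates(observed_rates, n_test):
    """Return (naive, corrected) estimates of the variance between design sets."""
    # Naive estimate: sample variance of the observed test error rates.
    naive = statistics.variance(observed_rates)
    # For a binomial proportion p, p * (1 - p) / (n - 1) is an unbiased
    # estimate of the test-set sampling variance e * (1 - e) / n.
    noise = statistics.mean(p * (1 - p) / (n_test - 1) for p in observed_rates)
    return naive, naive - noise


if __name__ == "__main__":
    true_rates, observed = simulate(n_test=10)
    naive, corrected = variance_estimates(observed, n_test=10)
    print("true variance between design sets:", statistics.variance(true_rates))
    print("naive estimate from observed rates:", naive)
    print("estimate after subtracting test-set noise:", corrected)
```

With only ten test examples per cycle, the naive estimate in this toy setting is dominated by test-set noise rather than by differences between design sets, which is the kind of bias the abstract refers to.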

This article is indexed in SpringerLink and other databases.


Background