Good–Turing frequency estimation in a finite population |
| |
Authors: | Wen‐Han Hwang, Chih‐Wei Lin, Tsung‐Jen Shen |
| |
Affiliation: | 1. Institute of Statistics and Department of Applied Mathematics, National Chung Hsing University, Taichung 40227, Taiwan; 2. Department of Leisure Services Management, Chaoyang University of Technology, Taichung 41349, Taiwan |
| |
Abstract: | Good–Turing frequency estimation (Good, 1953) is a simple, effective method for predicting the detection probabilities of objects from both observed and unobserved classes, based on the observed frequencies of classes in a sample. The method has been widely used in several disciplines, such as information retrieval, computational linguistics, text recognition, and ecological diversity estimation. Nevertheless, existing studies assume sampling with replacement or sampling from an infinite population, which might be inappropriate for many practical applications. In light of this limitation, this article presents a modification of the Good–Turing estimation method to account for finite population sampling. We provide three practical extensions of the modified method, and we examine the performance of the modified method and its extensions in simulation experiments. |
| |
Keywords: | Frequency estimation; Finite population; Good–Turing; Number‐of‐classes estimation; Sample coverage; Shannon index |
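
For orientation, the classical (infinite-population) Good–Turing estimator that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook formulas only, not the finite-population modification proposed in the article. The function name `good_turing` and the fallback to the raw count when no class has frequency r + 1 are assumptions made for this sketch.

```python
from collections import Counter


def good_turing(counts):
    """Classical Good-Turing estimates from observed class frequencies.

    counts: mapping from class label to its observed frequency (positive int).
    Returns (p_unseen, adjusted), where p_unseen is the estimated total
    probability of all unobserved classes and adjusted maps each observed
    frequency r to its adjusted count r*.
    """
    n = sum(counts.values())                   # sample size
    freq_of_freq = Counter(counts.values())    # N_r: number of classes seen r times

    # Estimated probability mass of unseen classes: N_1 / n.
    p_unseen = freq_of_freq.get(1, 0) / n

    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        if n_next > 0:
            # Adjusted count: r* = (r + 1) * N_{r+1} / N_r.
            adjusted[r] = (r + 1) * n_next / n_r
        else:
            # Assumption for this sketch: keep the raw count when N_{r+1} = 0.
            adjusted[r] = float(r)
    return p_unseen, adjusted


if __name__ == "__main__":
    sample = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
    p0, r_star = good_turing(sample)
    print(p0, r_star)
```

Under this estimator, the probability assigned to a class observed r times is r*/n. The quantity 1 - N_1/n is the sample coverage estimate listed among the keywords.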
|
|