Abstract: | The intraclass version of kappa coefficient has been commonly applied as a measure of agreement for two ratings per subject with binary outcome in reliability studies. We present an efficient statistic for testing the strength of kappa agreement using likelihood scores, and derive asymptotic power and sample size formula. Exact evaluation shows that the score test is generally conservative and more powerful than a method based on a chi‐square goodness‐of‐fit statistic (Donner and Eliasziw , 1992, Statistics in Medicine 11 , 1511–1519). In particular, when the research question is one directional, the one‐sided score test is substantially more powerful and the reduction in sample size is appreciable. |