Similar articles
1.
Weighted kappa was defined as a measure of pairwise interobserver agreement for the case where the observers judging one subject are not necessarily the same as those judging another subject. In this paper improved formulas for the large sample variance of the weighted kappa statistic are derived, a new definition of interclass kappa coefficients is suggested, and the intraclass correlation coefficient is shown to be a special case of weighted kappa.  相似文献   

2.
S T Gross 《Biometrics》1986,42(4):883-893
Published results on the use of the kappa coefficient of agreement have traditionally been concerned with situations where a large number of subjects is classified by a small group of raters. The coefficient is then used to assess the degree of agreement among the raters through hypothesis testing or confidence intervals. A modified kappa coefficient of agreement for multiple categories is proposed and a parameter-free distribution for testing null agreement is provided, for use when the number of raters is large relative to the number of categories and subjects. The large-sample distribution of kappa is shown to be normal in the nonnull case, and confidence intervals for kappa are provided. The results are extended to allow for an unequal number of raters per subject.  相似文献   

3.
Basu S  Banerjee M  Sen A 《Biometrics》2000,56(2):577-582
Cohen's kappa coefficient is a widely used measure of chance-corrected nominal scale agreement between two raters. This article describes a Bayesian analysis for kappa that can be routinely implemented using Markov chain Monte Carlo (MCMC) methodology. We consider the case of m ≥ 2 independent samples of measured agreement, where in each sample a given subject is rated by two rating protocols on a binary scale. A major focus here is on testing the homogeneity of the kappa coefficient across the different samples. The existing frequentist tests for this case assume exchangeability of rating protocols, whereas our proposed Bayesian test does not make any such assumption. Extensive simulation is carried out to compare the performances of the Bayesian and the frequentist tests. The developed methodology is illustrated using data from a clinical trial in ophthalmology.

4.
Guo Y  Manatunga AK 《Biometrics》2009,65(1):125-134
Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established, and the performance of the estimators is examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

5.
The intraclass version of the kappa coefficient has been commonly applied as a measure of agreement for two ratings per subject with a binary outcome in reliability studies. We present an efficient statistic for testing the strength of kappa agreement using likelihood scores, and derive asymptotic power and sample size formulas. Exact evaluation shows that the score test is generally conservative and more powerful than a method based on a chi-square goodness-of-fit statistic (Donner and Eliasziw, 1992, Statistics in Medicine 11, 1511–1519). In particular, when the research question is one-directional, the one-sided score test is substantially more powerful and the reduction in sample size is appreciable.

6.
A method for analysing dependent agreement data with categorical responses is proposed. A generalized estimating equation approach is developed with two sets of equations. The first set models the marginal distribution of categorical ratings, and the second set models the pairwise association of ratings with the kappa coefficient (kappa) as a metric. Covariates can be incorporated into both sets of equations. This approach is compared with a latent variable model that assumes an underlying multivariate normal distribution in which the intraclass correlation coefficient is used as a measure of association. Examples are from a cervical ectopy study and the National Heart, Lung, and Blood Institute Veteran Twin Study.  相似文献   

7.
The quality of ordered categorical recordings is determined from repeated measurements on the same subject in order to assess the level of agreement between raters, scales or occasions. The presented rating-invariant method for ordered categorical data provides means of analysing the quality of single-item rating scales, irrespective of the number of possible response values and the marginal distributions. Marginal heterogeneity implies systematic disagreement, so-called bias. An augmented ranking approach is the basis for the separation of inter-rater disagreement into systematic and random components. Correlation between pairs of augmented rank values provides a measure of agreement to the best common ordering of paired classifications, given inter-rater bias. The essential differences in interpretation and applicability of the proposed coefficient of agreement and the Spearman rank-order correlation for ordered categorical data are discussed.  相似文献   

8.
Agreement between raters for binary outcome data is typically assessed using the kappa coefficient. There has been considerable recent work extending logistic regression to provide summary estimates of interrater agreement adjusted for covariates predictive of the marginal probability of classification by each rater. We propose an estimating equations approach which can also be used to identify covariates predictive of kappa. Models may include an arbitrary and variable number of raters per subject and yet do not require any stringent parametric assumptions. Examples used to illustrate this procedure include an investigation of factors affecting agreement between primary and proxy respondents from a case‐control study and a study of the effects of gender and zygosity on twin concordance for smoking history.  相似文献   

9.
Large-scale agreement studies are becoming increasingly common in medical settings to gain better insight into discrepancies often observed between experts' classifications. Ordered categorical scales are routinely used to classify subjects' disease and health conditions. Summary measures such as Cohen's weighted kappa are popular approaches for reporting levels of association for pairs of raters' ordinal classifications. However, in large-scale studies with many raters, assessing levels of association can be challenging due to dependencies between many raters each grading the same sample of subjects' results and the ordinal nature of the ratings. Further complexities arise when the focus of a study is to examine the impact of rater and subject characteristics on levels of association. In this paper, we describe a flexible approach based upon the class of generalized linear mixed models to assess the influence of rater and subject factors on association between many raters' ordinal classifications. We propose novel model-based measures for large-scale studies to provide simple summaries of association similar to Cohen's weighted kappa while avoiding the prevalence and marginal distribution issues to which Cohen's weighted kappa is susceptible. The proposed summary measures can be used to compare association between subgroups of subjects or raters. We demonstrate the use of hypothesis tests to formally determine if rater and subject factors have a significant influence on association, and describe approaches for evaluating the goodness-of-fit of the proposed model. The performance of the proposed approach is explored through extensive simulation studies and is applied to a recent large-scale breast cancer screening study.

10.
Content analysis involves classification of textual, visual, or audio data. Inter-coder agreement is estimated by having two or more coders classify the same data units and then comparing their results. The existing methods of agreement estimation, e.g., Cohen's kappa, require that coders place each unit of content into one and only one category (one-to-one coding) from a pre-established set of categories. However, in certain data domains (e.g., maps, photographs, databases of texts and images), this requirement seems overly restrictive. The restriction could be lifted, provided that there is a measure to calculate inter-coder agreement in the one-to-many protocol. Building on the existing approaches to one-to-many coding in geography and biomedicine, such a measure, fuzzy kappa, which is an extension of Cohen's kappa, is proposed. It is argued that the measure is especially compatible with data from certain domains, when holistic reasoning of human coders is utilized in order to describe the data and access the meaning of communication.

11.
Weighted least-squares approach for comparing correlated kappa   Total citations: 3, self-citations: 0, citations by others: 3
Barnhart HX  Williamson JM 《Biometrics》2002,58(4):1012-1019
In the medical sciences, studies are often designed to assess the agreement between different raters or different instruments. The kappa coefficient is a popular index of agreement for binary and categorical ratings. Here we focus on testing for the equality of two dependent kappa coefficients. We use the weighted least-squares (WLS) approach of Koch et al. (1977, Biometrics 33, 133-158) to take into account the correlation between the estimated kappa statistics. We demonstrate how the SAS PROC CATMOD can be used to test for the equality of dependent Cohen's kappa coefficients and dependent intraclass kappa coefficients with nominal categorical ratings. We also test for the equality of dependent Cohen's kappa and dependent weighted kappa with ordinal ratings. The major advantage of the WLS approach is that it allows the data analyst a way of testing dependent kappa with popular SAS software. The WLS approach can handle any number of categories. Analyses of three biomedical studies are used for illustration.  相似文献   

12.
Agreement coefficients quantify how well a set of instruments agree in measuring some response on a population of interest. Many standard agreement coefficients (e.g. kappa for nominal, weighted kappa for ordinal, and the concordance correlation coefficient (CCC) for continuous responses) may indicate increasing agreement as the marginal distributions of the two instruments become more different even as the true cost of disagreement stays the same or increases. This problem has been described for the kappa coefficients; here we describe it for the CCC. We propose a solution for all types of responses in the form of random marginal agreement coefficients (RMACs), which use a different adjustment for chance than the standard agreement coefficients. Standard agreement coefficients model chance agreement using expected agreement between two independent random variables each distributed according to the marginal distribution of one of the instruments. RMACs adjust for chance by modeling two independent readings both from the mixture distribution that averages the two marginal distributions. In other words, both independent readings represent first a random choice of instrument, then a random draw from the marginal distribution of the chosen instrument. The advantage of the resulting RMAC is that differences between the two marginal distributions will not induce greater apparent agreement. As with the standard agreement coefficients, the RMACs do not require any assumptions about the bivariate distribution of the random variables associated with the two instruments. We describe the RMAC for nominal, ordinal and continuous data, and show through the delta method how to approximate the variances of some important special cases.  相似文献   
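The two chance adjustments contrasted in this abstract can be sketched for the simplest case of two instruments with nominal ratings. The following is a minimal illustration, not the authors' implementation: the function name `chance_corrected_agreement` and the `chance` argument are hypothetical, and the nominal-case formula for the mixture adjustment is an assumption read directly from the abstract's description (both readings drawn from the average of the two marginal distributions).

```python
from collections import Counter


def chance_corrected_agreement(ratings_a, ratings_b, chance="kappa"):
    """Chance-corrected agreement for two paired lists of nominal ratings.

    chance="kappa":   expected agreement of two independent readings, each
                      drawn from its own instrument's marginal distribution.
    chance="mixture": expected agreement of two independent readings, both
                      drawn from the average of the two marginals
                      (the adjustment described for RMACs; an assumption here).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")

    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    marg_a = Counter(ratings_a)  # counts per category, instrument A
    marg_b = Counter(ratings_b)  # counts per category, instrument B
    categories = set(marg_a) | set(marg_b)

    if chance == "kappa":
        expected = sum((marg_a[c] / n) * (marg_b[c] / n) for c in categories)
    elif chance == "mixture":
        expected = sum(((marg_a[c] + marg_b[c]) / (2 * n)) ** 2 for c in categories)
    else:
        raise ValueError("chance must be 'kappa' or 'mixture'")

    if expected == 1.0:
        raise ValueError("coefficient undefined: expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy data (invented for illustration): the two marginals differ.
a = [0, 0, 1, 1]
b = [0, 1, 1, 1]
kappa = chance_corrected_agreement(a, b, chance="kappa")
mixture = chance_corrected_agreement(a, b, chance="mixture")
```

Because the squared average of two proportions is never smaller than their product, the mixture-based expected agreement is at least as large as the kappa-based one, so differing marginals cannot inflate the mixture-adjusted coefficient relative to kappa.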

13.
Guo Y  Manatunga AK 《Biometrics》2007,63(1):164-172
Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. Lin's (1989, Biometrics 45, 255-268) concordance correlation coefficient (CCC) has become a popular measure of agreement for correlated continuous outcomes. However, commonly used estimation methods for the CCC do not accommodate censored observations and are, therefore, not applicable for survival outcomes. In this article, we estimate the CCC nonparametrically through the bivariate survival function. The proposed estimator of the CCC is proven to be strongly consistent and asymptotically normal, with a consistent bootstrap variance estimator. Furthermore, we propose a time-dependent agreement coefficient as an extension of Lin's (1989) CCC for measuring the agreement between survival times among subjects who survive beyond a specified time point. A nonparametric estimator is developed for the time-dependent agreement coefficient as well. It has the same asymptotic properties as the estimator of the CCC. Simulation studies are conducted to evaluate the performance of the proposed estimators. A real data example from a prostate cancer study is used to illustrate the method.  相似文献   
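For orientation, the standard sample version of Lin's CCC for fully observed (uncensored) paired data can be sketched as below. This is not the nonparametric survival-function estimator proposed in the abstract; the function name `concordance_correlation` is hypothetical, and the use of 1/n moments is an assumption of this sketch.

```python
def concordance_correlation(x, y):
    """Sample concordance correlation coefficient for paired measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    computed here with 1/n (population-style) moments. Handles complete
    data only; censored observations are NOT accommodated.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired with at least two observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC undefined: both samples are constant and equal")
    return 2 * cov_xy / denom


# Toy data (invented): a constant shift lowers the CCC even though
# the Pearson correlation between the two series is exactly 1.
identical = concordance_correlation([1, 2, 3, 4], [1, 2, 3, 4])
shifted = concordance_correlation([1, 2, 3, 4], [2, 3, 4, 5])
```

The shifted example shows why the CCC is treated as an agreement measure rather than a correlation: the location difference enters the denominator and pulls the coefficient below 1.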

14.
Introduction. After the clinical diagnosis of leprosy, classification methods are necessary to define a treatment and prognosis for patients consistent with their bacterial load. Bacteria are detected in skin smears, and bacterial load is typically established with the internationally used Ridley's logarithmic scale. However, in Colombia an alternative semiquantitative scale is used. Objective. The interobserver reproducibility was established for the Ridley and Colombia scales, and the level of correlation and concordance between the bacillary indices obtained was identified in order to assess the degree of interchangeability. Materials and methods. Standardization was attained by having 2 readers read the smears, with subsequent blinded evaluation of interobserver agreement. Each reader quantified the bacterial load of each sample (n=325) using the Colombian and the Ridley scales. The degree of interobserver agreement was assessed with the weighted kappa coefficient. The level of correlation and agreement between the measurements of the bacillary index was established with Lin's coefficient. Results. The interobserver weighted kappa coefficient was 0.83 for the Colombia scale and 0.85 for the Ridley scale. Lin's coefficient was 0.96 for the correlation and concordance of the bacillary indices. Conclusions. Interobserver agreement obtained for both scales was excellent, as was the correlation and concordance of the bacillary indices determined with both methods. The cut-off points yielded a good level of agreement, ensuring interchangeability between the scales in defining high or low bacterial load.
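A weighted kappa of the kind reported in this abstract can be sketched as follows for two readers scoring the same samples on an ordinal scale. The abstract does not state which weighting scheme was used, so the linear (default) and quadratic disagreement weights here are assumptions, as are the function name `weighted_kappa` and the coding of categories as integers 0 to k-1.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, power=1):
    """Weighted kappa for two raters on an ordinal scale coded 0..k-1.

    Uses disagreement weights w[i][j] = |i - j| ** power
    (power=1: linear weights, power=2: quadratic weights):
        kappa_w = 1 - sum(w * observed) / sum(w * expected),
    where expected[i][j] is the product of the two raters' marginals.
    """
    n = len(ratings_a)
    if n != len(ratings_b) or n == 0:
        raise ValueError("ratings must be paired and non-empty")
    k = n_categories

    # Cross-tabulate the paired ratings.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[a][b] += 1

    row = [sum(counts[i]) / n for i in range(k)]                       # rater A marginal
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginal

    observed_cost = sum(
        abs(i - j) ** power * counts[i][j] / n for i in range(k) for j in range(k)
    )
    expected_cost = sum(
        abs(i - j) ** power * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_cost == 0:
        raise ValueError("weighted kappa undefined: no expected disagreement")
    return 1 - observed_cost / expected_cost


# Toy ratings (invented, not the study's data).
perfect = weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_categories=4)
partial = weighted_kappa([0, 1, 2], [0, 2, 1], n_categories=3)
```

With linear weights a disagreement of one category costs half as much as a disagreement of two, which is why near-misses on an ordinal scale are penalized less than under unweighted kappa.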

15.
Range of motion (ROM) measurements are essential for the evaluation and diagnosis of adhesive capsulitis of the shoulder (AC). However, taking these measurements using a goniometer is inconvenient and sometimes unreliable. The Kinect (Microsoft, Seattle, WA, USA) is gaining attention as a new motion-detecting device that is nonintrusive and easy to implement. This study aimed to apply Kinect to measure shoulder ROM in AC; we evaluated its validity by calculating the agreement of the measurements obtained using Kinect with those obtained using a goniometer and assessed its utility for the diagnosis of AC. Both shoulders of 15 healthy volunteers and the affected shoulders of 12 patients with AC were included in the study. The passive and active ROM of each were measured with a goniometer for flexion, abduction, and external rotation. Their active shoulder motions for each direction were again captured using Kinect and the ROM values were calculated. The agreement between the two measurements was tested with the intraclass correlation coefficient (ICC). Diagnostic performance using the Kinect ROM was evaluated with Cohen's kappa value. The cutoff values of the limited ROM were determined in the following ways: the same as passive ROM values, reflecting the mean difference, and based on receiver operating characteristic curves. The ICCs for flexion/abduction/external rotation between the goniometric passive ROM and the Kinect ROM were 0.906/0.942/0.911, while those between the active ROMs and the Kinect ROMs were 0.864/0.932/0.925. Cohen's kappa values were 0.88, 0.88, and 1.0 with the cutoff values in the order above. Measurements of the shoulder ROM using Kinect show excellent agreement with those taken using a goniometer. These results indicate that the Kinect can be used to measure shoulder ROM and to diagnose AC as an alternative to a goniometer.

16.
The reliability of binary assessments is often measured by the proportion of agreement above chance, as estimated by the kappa statistic. In this paper, we develop a model to estimate inter-rater and intra-rater reliability when each of the two observers has the opportunity to obtain a pair of replicate measurements on each subject. The model is analogous to the nested beta-binomial model proposed by Rosner (1989, 1992). We show that the gain in precision obtained from increasing the number of measurements per rater from one to two may allow fewer subjects to be included in the study with no net loss in efficiency for estimating the inter-rater reliability.  相似文献   

17.
In clinical research and in more general classification problems, a frequent concern is the reliability of a rating system. In the absence of a gold standard, agreement may be considered as an indication of reliability. When dealing with categorical data, the well-known kappa statistic is often used to measure agreement. The aim of this paper is to obtain a theoretical result about the asymptotic distribution of the kappa statistic with multiple items, multiple raters, multiple conditions, and multiple rating categories (more than two), based on recent work. The result settles a long-standing quest for the asymptotic variance of the kappa statistic in this situation and allows for the construction of asymptotic confidence intervals. A recent application to clinical endoscopy and to the diagnosis of inflammatory bowel diseases (IBDs) is briefly presented to complement the theoretical perspective.

18.
Nam JM 《Biometrics》2000,56(2):583-585
We derive a likelihood score method for interval estimation of the intraclass version of the kappa coefficient of agreement with binary classification using a general theory of Bartlett (1953, Biometrika 40, 306-317). By exact evaluation, we investigate statistical properties of the score method, the chi-square goodness-of-fit procedure (Donner and Eliasziw, 1992, Statistics in Medicine 11, 1511-1519; Hale and Fleiss, 1993, Biometrics 49, 523-534), and a crude confidence interval for small and medium sample sizes. Actual coverage percentages of the score and chi-square methods are satisfactorily close to the nominal confidence coefficient, while that of the crude method is quite unsatisfactory. The expected length of the score method is shorter than that of the chi-square procedure when the response rate is very small or very large.  相似文献   

19.
20.
Nam JM 《Biometrics》2003,59(4):1027-1035
When the intraclass correlation coefficient or the equivalent version of the kappa agreement coefficient has been estimated from several independent studies or from a stratified study, we have the problem of comparing the kappa statistics and combining the information regarding the kappa statistics in a common kappa when the assumption of homogeneity of kappa coefficients holds. In this article, using the likelihood score theory extended to nuisance parameters (Tarone, 1988, Communications in Statistics-Theory and Methods 17(5), 1549-1556), we present an efficient homogeneity test for comparing several independent kappa statistics and also give a modified homogeneity score method using a noniterative and consistent estimator as an alternative. We provide the sample size using the modified homogeneity score method and compare it with that using the goodness-of-fit (GOF) method (Donner, Eliasziw, and Klar, 1996, Biometrics 52, 176-183). A simulation study for small and moderate sample sizes showed that the actual level of the homogeneity score test using the maximum likelihood estimators (MLEs) of parameters is satisfactorily close to the nominal level and is smaller than those of the modified homogeneity score and the goodness-of-fit tests. We investigated statistical properties of several noniterative estimators of a common kappa. The estimator of Donner et al. (1996) is essentially efficient and can be used as an alternative to the iterative MLE. An efficient interval estimation of a common kappa using the likelihood score method is presented.
