Similar literature
20 similar articles found
1.
Weighted kappa was defined as a measure of pairwise interobserver agreement for the case where the observers judging one subject are not necessarily the same as those judging another subject. In this paper improved formulas for the large sample variance of the weighted kappa statistic are derived, a new definition of interclass kappa coefficients is suggested, and the intraclass correlation coefficient is shown to be a special case of weighted kappa.

2.
Barnhart HX  Haber M  Song J 《Biometrics》2002,58(4):1020-1027
Accurate and precise measurement is an important component of any proper study design. As elaborated by Lin (1989, Biometrics 45, 255-268), the concordance correlation coefficient (CCC) is more appropriate than other indices for measuring agreement when the variable of interest is continuous. However, this agreement index is defined in the context of comparing two fixed observers. In order to use multiple observers in a study involving large numbers of subjects, there is a need to assess agreement among these multiple observers. In this article, we present an overall CCC (OCCC) in terms of the interobserver variability for assessing agreement among multiple fixed observers. The OCCC turns out to be equivalent to the generalized CCC (King and Chinchilli, 2001, Statistics in Medicine 20, 2131-2147; Lin, 1989; Lin, 2000, Biometrics 56, 324-325) when the squared distance function is used. We evaluated the OCCC through generalized estimating equations (Barnhart and Williamson, 2001, Biometrics 57, 931-940) and U-statistics (King and Chinchilli, 2001) for inference. This article offers the following important points. First, it addresses the precision and accuracy indices as components of the OCCC. Second, it clarifies that the OCCC is the weighted average of all pairwise CCCs. Third, it is intuitively defined in terms of interobserver variability. Fourth, the inference approaches of GEE and the U-statistics are compared via simulations for small samples. Fifth, we illustrate the use of the OCCC by two medical examples with the GEE, U-statistics, and bootstrap approaches.
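
For orientation, the two-observer CCC that the OCCC generalizes can be sketched as follows. This is a minimal illustration of the usual Lin (1989) definition with 1/n moment estimates, not the OCCC estimator or the GEE/U-statistics inference described in the abstract; the function name `concordance_correlation` is a hypothetical choice.

```python
def concordance_correlation(x, y):
    """Two-observer concordance correlation coefficient (sketch).

    Uses 1/n (population) moments: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic shift between observers.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# A constant offset of 1 lowers the CCC even though Pearson's r would be 1.
ccc_shifted = concordance_correlation([1, 2, 3], [2, 3, 4])
ccc_identical = concordance_correlation([1, 2, 3], [1, 2, 3])
```

The shifted pair illustrates why the CCC is preferred over a plain correlation for agreement: perfectly correlated readings that differ by a fixed bias still score below 1.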

3.
Guo Y  Manatunga AK 《Biometrics》2009,65(1):125-134
Summary. Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established and the performance of the estimators is examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

4.
The kappa index was developed by Cohen and others for measuring nominal scale agreement between two raters. This statistic measures the distance from the null hypothesis of independent ratings of two observers. Here a modified kappa is introduced, which also takes into account the distance between the marginal distributions. This distance is interpreted as the so-called interobserver bias. Population analogues are defined for the modified kappa and a related conditional index. For these parameters asymptotic confidence intervals and tests are derived. The procedures are illustrated by fictitious and real examples.

5.
OBJECTIVE: To assess whether digital images can be useful in evaluating interobserver variability in cervical-vaginal cytology. STUDY DESIGN: In phase 1 of the study, to measure interobserver variability, a set of 160 digital images was submitted to 192 cytologists with experience ranging from 2 to > 30 years. The set was preceded by two days of immersion in lessons and practical exercises with digital images. In phase 2, to compare different procedures of interobserver variability, two different sets of slides and one set of digital images were used. RESULTS: In phase 1, kappa (k) and weighted kappa (kw) values computed against both the consensus and the target diagnosis showed good agreement, with few exceptions. In phase 2, the consensus and target diagnoses obtained on the slide sets and digital set were compared. Mean k and kw values obtained with the digital images in phase 2 were significantly lower than those in phase 1. CONCLUSION: A set of digital images can be a useful tool for evaluating and improving interobserver reproducibility. A two-day course on digital images could be an ideal modality for introducing this new technology.

6.
Large‐scale agreement studies are becoming increasingly common in medical settings to gain better insight into discrepancies often observed between experts' classifications. Ordered categorical scales are routinely used to classify subjects' disease and health conditions. Summary measures such as Cohen's weighted kappa are popular approaches for reporting levels of association for pairs of raters' ordinal classifications. However, in large‐scale studies with many raters, assessing levels of association can be challenging due to dependencies between many raters each grading the same sample of subjects' results and the ordinal nature of the ratings. Further complexities arise when the focus of a study is to examine the impact of rater and subject characteristics on levels of association. In this paper, we describe a flexible approach based upon the class of generalized linear mixed models to assess the influence of rater and subject factors on association between many raters' ordinal classifications. We propose novel model‐based measures for large‐scale studies to provide simple summaries of association similar to Cohen's weighted kappa while avoiding prevalence and marginal distribution issues that Cohen's weighted kappa is susceptible to. The proposed summary measures can be used to compare association between subgroups of subjects or raters. We demonstrate the use of hypothesis tests to formally determine if rater and subject factors have a significant influence on association, and describe approaches for evaluating the goodness‐of‐fit of the proposed model. The performance of the proposed approach is explored through extensive simulation studies and is applied to a recent large‐scale breast cancer screening study.
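
As a point of reference for the pairwise summary that this abstract contrasts its model-based measures with, Cohen's weighted kappa for two raters can be sketched from a square contingency table. This is an illustrative sketch only, not the mixed-model approach of the paper; the function name `weighted_kappa` and the `weights` option strings are hypothetical, and the disagreement-weight form (linear or quadratic in the category distance) is the common textbook convention.

```python
def weighted_kappa(table, weights="linear"):
    """Cohen's weighted kappa from a k x k table of counts (sketch).

    table[i][j] = number of subjects placed in category i by rater A
    and category j by rater B. Categories are assumed ordered.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    if weights not in ("unweighted", "linear", "quadratic"):
        raise ValueError("unknown weighting scheme")

    def disagreement(i, j):
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted disagreement actually observed vs. expected under independence.
    observed = sum(disagreement(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(disagreement(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    return 1 - observed / expected


kappa_2x2 = weighted_kappa([[20, 5], [10, 15]], weights="unweighted")
kappa_perfect = weighted_kappa([[5, 0, 0], [0, 5, 0], [0, 0, 5]], weights="quadratic")
```

With only two categories all three weighting schemes coincide; the schemes differ only once near-misses on an ordinal scale can be distinguished from distant disagreements.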

7.
Weighted kappa is defined as a measure of pairwise interobserver agreement. A weighted intraclass kappa coefficient is proposed to measure agreement on a particular response category. An interclass kappa coefficient is proposed for each pair of response categories. Simple estimation procedures are presented for the case where the observers judging one subject are not necessarily the same as those judging another subject. Large sample standard errors are derived and a numerical example is given.

8.

Background

Clear definitions of outcomes following trichiasis surgery are critical for planning program evaluations and for identifying ways to improve trichiasis surgery. Eyelid contour abnormality is an important adverse outcome of surgery; however, no standard method has been described to categorize eyelid contour abnormalities.

Methodology/Principal Findings

A classification system for eyelid contour abnormalities following surgery for trachomatous trichiasis was developed. To determine whether the grading was reproducible using the classification system, six-week postoperative photographs were reviewed by two senior graders to characterize severity of contour abnormalities. Sample photographs defining each contour abnormality category were compiled and used to train four new graders. All six graders independently graded a Standardization Set of 75 eyelids, which included a roughly equal distribution across the severity scale, and weighted kappa scores were calculated. Two hundred forty six-week postoperative photographs from an ongoing clinical trial were randomly selected for evaluating agreement across graders. Two months after initial grading, one grader regraded a subset of the 240 photographs to measure longer-term intra-observer agreement. The weighted kappa for agreement between the two senior graders was 0.80 (95% CI: 0.71–0.89). Among the Standardization Set, agreement between the senior graders and the 4 new graders showed weighted kappa scores ranging from 0.60–0.80. Among 240 eyes comprising the clinical trial dataset, agreement ranged from weighted kappa 0.70–0.71. Longer-term intra-observer agreement was weighted kappa 0.86 (95% CI: 0.80–0.92).

Conclusions/Significance

The standard eyelid contour grading system we developed reproducibly delineates differing levels of contour abnormality. This grading system could be useful both for helping to evaluate trichiasis surgery outcomes in clinical trials and for evaluating trichiasis surgery programs.

9.

Purpose

Chronic hand and wrist pain is a common clinical issue for orthopaedic surgeons and rheumatologists. The purpose of this study was (1) to analyze the interobserver agreement of SPECT/CT, MRI, CT, bone scan and plain radiographs in patients with non-specific pain of the hand and wrist, and (2) to assess the diagnostic accuracy of these imaging methods in this selected patient population.

Materials and Methods

Thirty-two consecutive patients with non-specific pain of the hand or wrist were evaluated retrospectively. All patients had been imaged by plain radiographs, planar early-phase imaging (bone scan), late-phase imaging (SPECT/CT including bone scan and CT), and MRI. Two experienced and two inexperienced readers analyzed the images with a standardized read-out protocol. Reading criteria were lesion detection and localisation, type and etiology of the underlying pathology. Diagnostic accuracy and interobserver agreement were determined for all readers and imaging modalities.

Results

The most accurate modality for experienced readers was SPECT/CT (accuracy 77%), followed by MRI (56%). The best-performing, though not very accurate, modality for inexperienced readers was also SPECT/CT (44%), followed by MRI and bone scan (38% each). The interobserver agreement of experienced readers was generally high in SPECT/CT concerning lesion detection (kappa 0.93, MRI 0.72), localisation (kappa 0.91, MRI 0.75) and etiology (kappa 0.85, MRI 0.74), while MRI yielded better results on typification of lesions (kappa 0.75, SPECT/CT 0.69). There was poor agreement between experienced and inexperienced readers in SPECT/CT and MRI.

Conclusions

SPECT/CT proved to be the most helpful imaging modality in patients with non-specific wrist pain. The method was found to be reliable, providing high interobserver agreement, and was outperformed by MRI only in the typification of lesions. We believe it is beneficial to integrate SPECT/CT into the diagnostic imaging algorithm of chronic wrist pain.

10.
Weighted least-squares approach for comparing correlated kappa
Barnhart HX  Williamson JM 《Biometrics》2002,58(4):1012-1019
In the medical sciences, studies are often designed to assess the agreement between different raters or different instruments. The kappa coefficient is a popular index of agreement for binary and categorical ratings. Here we focus on testing for the equality of two dependent kappa coefficients. We use the weighted least-squares (WLS) approach of Koch et al. (1977, Biometrics 33, 133-158) to take into account the correlation between the estimated kappa statistics. We demonstrate how the SAS PROC CATMOD can be used to test for the equality of dependent Cohen's kappa coefficients and dependent intraclass kappa coefficients with nominal categorical ratings. We also test for the equality of dependent Cohen's kappa and dependent weighted kappa with ordinal ratings. The major advantage of the WLS approach is that it allows the data analyst a way of testing dependent kappa with popular SAS software. The WLS approach can handle any number of categories. Analyses of three biomedical studies are used for illustration.

11.
Basu S  Banerjee M  Sen A 《Biometrics》2000,56(2):577-582
Cohen's kappa coefficient is a widely popular measure for chance-corrected nominal scale agreement between two raters. This article describes Bayesian analysis for kappa that can be routinely implemented using Markov chain Monte Carlo (MCMC) methodology. We consider the case of m ≥ 2 independent samples of measured agreement, where in each sample a given subject is rated by two rating protocols on a binary scale. A major focus here is on testing the homogeneity of the kappa coefficient across the different samples. The existing frequentist tests for this case assume exchangeability of rating protocols, whereas our proposed Bayesian test does not make any such assumption. Extensive simulation is carried out to compare the performances of the Bayesian and the frequentist tests. The developed methodology is illustrated using data from a clinical trial in ophthalmology.

12.

Background

Management of endometrial precancerous lesions has been much debated due to inconsistencies in their classification, natural history and histologic diagnosis. Endometrial hyperplasia constitutes a wide range of histomorphologic features associated with high intra- and interobserver diagnostic variability. Although traditional microscopic diagnosis is by far the most applicable method and the gold standard for histomorphologic diagnosis, digitized image analysis has been used as a powerful adjunct to maximize the histologic data retrieval and to add some detailed objective criteria for correct diagnosis in difficult cases.

Methods

A series of 100 endometrial curettage specimens with a diagnosis of endometrial hyperplasia or well differentiated adenocarcinoma were blindly reviewed by 5 pathologists; their intra- and interobserver reproducibility was determined and further compared with the objective morphometric data, i.e., D-score and volume percent of stroma (VPS).

Results

The results were assessed using weighted kappa statistics. Mean intraobserver kappa value was 0.8690 (99.44% agreement). Mean interobserver kappa values by diagnostic category were: simple hyperplasia without atypia: 0.7441; complex hyperplasia without atypia: 0.3379; atypical hyperplasia: 0.3473; and well-differentiated endometrioid carcinoma: 0.6428; with a kappa value of 0.5372 for all cases combined. Interobserver agreement was substantial for simple hyperplasia (SH) and well differentiated adenocarcinoma (WDA) but only fair for complex hyperplasia (CH) and atypical hyperplasia (AH). Intraobserver agreement was almost perfect. The specimens were divided into two groups according to the computerized morphometric analysis: Endometrial Hyperplasia (EH) (D-score ≥ 1 or VPS ≥ 55%) and Endometrial Intraepithelial Neoplasia (EIN) (D-score < 1 or VPS < 55%). Morphometric findings were closely compatible with the routine WHO classification made by one expert pathologist; however, diagnoses of CH and AH made by other pathologists were not concordant with the morphometric data.

Conclusion

It may be necessary to make some revisions to the WHO classification for endometrial hyperplasia and precancerous lesions.

13.
OBJECTIVE: The recently developed software (CONQUISTADOR), capable of computing all intralaboratory and interlaboratory quality control (QC) indicators, was used to evaluate the diagnostic agreement among 4 cytology laboratories participating in the LAMS Study. STUDY DESIGN: The study was an interlaboratory exchange of 5 specially designed slide sets, each comprising 20 (conventional cytology) slides. At the first step, 80 slides (with "clear-cut" cases) were divided into four sets (A, B, C, D) of 20 specimens, each including inadequate and negative cases as well as different proportions of all diagnostic TBS 2001 categories. In the second round, a fifth set (E) of 20 slides ("difficult cases") was designed, with all diagnostic categories, ASC and AGC included. Common measures of reproducibility (kappa and weighted kappa), accuracy (SE, SP, PPV, NPV) and 3 indices of diagnostic variability were calculated for sets A-D and set E, separately. RESULTS: For the 5 slide sets together, the weighted kappa was 0.8 (95% CI 0.76-0.85), which is the lower limit of the "almost perfect" ranking of kappa statistics, indicating an excellent interlaboratory agreement. The interlaboratory reproducibility was lower only for the difficult set (E). Similarly, the sensitivity for set E (70.0%) was lower than that (92.1%) for sets A-D. The diagnostic variability indices were not substantially different between the difficult (set E) and clear-cut (sets A-D) cases. CONCLUSION: High interlaboratory reproducibility was obtained for sets A-D ("clear-cut" cases), while more interlaboratory variation was evident in the difficult samples. The new CONQUISTADOR software is a valuable tool in calculating the indicators needed in this intralaboratory and interlaboratory.

14.
In spite of the large number of studies on technical problems affecting the interlaboratory reproducibility of IHC HER-2/neu determination, only little is known about factors limiting the intra- and interobserver reproducibility in the actual practice of HER-2/neu expression analysis. The aim of the present INQAT study was to evaluate the intra- and inter-observer reproducibility of IHC HER-2 analysis among pathologists routinely working in Italian laboratories. Twenty immunostained slides were distributed to 12 pathologists who had to report, for each slide, the semiquantitative analysis of the percentage of immunopositive cells and the qualitative evaluation of the intensity of membrane staining. The intra- and interobserver reproducibility as well as the reproducibility between each laboratory and the reference values were quantified adopting an approach based on computation of the weighted kappa statistic (Kw). Additionally, in order to evaluate the contribution of each category to the overall agreement, the kappa category-specific statistics (Kcs) were estimated for both classification criteria by jointly considering all the participating laboratories. The intraobserver analyses showed a satisfactory level of reproducibility for both the percentage of positive cells (median Kw, 0.94; range: 0.80-0.96) and membrane staining (median Kw, 0.86; range: 0.78-0.96). Similarly, a fairly good level of reproducibility for the percentage of cells (median Kw, 0.89; range, 0.73-0.96) and the intensity of membrane staining (median Kw, 0.84; range, 0.72-0.92) were observed from comparisons with reference values. When all possible pairwise comparisons were performed, a satisfactory level of interobserver reproducibility was found for most laboratories. Kw values varied between 0.51 and 0.98 (median Kw, 0.80) and between 0.61 and 0.94 (median Kw, 0.81) for semiquantitative and qualitative measurements, respectively. 
Analysis of the contribution of the extreme categories to the overall agreement showed a substantial or almost perfect agreement for both classification criteria. Conversely, the contribution of intermediate categories appeared to be scarce or slight for the percentage of immunostained cells and slight or fair for the intensity of membrane staining. We conclude that, overall, the interobserver reproducibility in IHC analysis of HER-2/neu expression is satisfactory, although classification of the intermediate categories is problematic, both with regard to the percentage of immunostained cells and the intensity of membrane staining.

15.
Objectives: This National Heart, Lung, and Blood Institute Growth and Health Study report assesses racial differences in fat patterning in black and white girls ages 9 to 19 years, comparing the sum of triceps and subscapular skinfolds (SSFs) and percentage of body fat (%BF) from impedance as two indices of adiposity. It is hypothesized that racial differences in fat patterning manifest during puberty. Research Methods and Procedures: SSF and %BF were measured annually. Racial differences in SSF and %BF were evaluated by age. Associations between %BF and SSF were evaluated using Pearson's correlation coefficient. Classification agreement was evaluated using the kappa‐statistic. Effects of pubertal stage and race on classification agreement were examined using multivariate models. Results: White girls had a greater mean %BF at 9 to 12 years of age; black girls had a greater %BF thereafter. Black girls had a greater mean SSF at every age. The correlation coefficient between SSF and %BF was 0.79, and there was good agreement between %BF and SSF in separating high (>85th percentile) from not high (kappa = 0.60 for whites and 0.66 for blacks). SSF was more strongly associated with %BF in prepuberty and early puberty than in late puberty. Discussion: Despite good correlations between %BF and SSF, the two methods indicate different fat patterns in black and white girls.

16.

Background

Tuberculin skin tests (TSTs) are long-established screening methods for tuberculosis (TB). We aimed to compare agreement between the intradermal Mantoux and multipuncture percutaneous Tine methods and to quantify risk factors for a positive test result.

Methodology/Principal Findings

1512 South African children younger than 5 years of age who were investigated for tuberculosis (TB) during a Bacille Calmette Guerin (BCG) trial were included in this analysis. Children underwent both Mantoux and Tine tests. A positive test was defined as Mantoux ≥15 mm or Tine ≥ Grade 3 for the binary comparison. Agreement was evaluated using kappa (binary) and weighted kappa (hierarchical). Multivariate regression models identified independent risk factors for TST positivity. The Mantoux test was positive in 430 children (28.4%) and the Tine test in 496 children (32.8%, p<0.0001), with observed binary agreement 87.3% (kappa 0.70) and hierarchical agreement 85.0% (weighted kappa 0.66). Among 173 children culture-positive for Mycobacterium tuberculosis, Mantoux was positive in 49.1% and Tine in 54.9%, p<0.0001 (kappa 0.70). Evidence of digit preference was noted for Mantoux readings at 5 mm threshold intervals. After adjustment for confounders, a positive culture, suggestive chest radiograph, and proximity of TB contact were risk factors for a positive test using both TST methods. There were no independent associations between ethnicity, gender, age, or over-crowding, and TST result.

Conclusions/Significance

The Tine test demonstrated a higher positive test rate than the Mantoux, with substantial agreement between TST methods among young BCG-vaccinated children. TB disease and exposure factors, but not demographic variables, were independent risk factors for a positive result using either test method. These findings suggest that the Tine might be a useful screening tool for childhood TB in resource-limited countries.
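
The binary kappa reported in this abstract can be checked from its own figures (1512 children, 430 Mantoux-positive, 496 Tine-positive, 87.3% observed agreement). The sketch below assumes the standard two-category Cohen's kappa, with chance agreement computed from the two positivity rates; the function name `kappa_from_marginals` is a hypothetical helper, not something from the study.

```python
def kappa_from_marginals(p_observed, p_pos_a, p_pos_b):
    """Binary Cohen's kappa from observed agreement and each test's positive rate."""
    # Chance agreement: both positive or both negative under independence.
    p_expected = p_pos_a * p_pos_b + (1 - p_pos_a) * (1 - p_pos_b)
    return (p_observed - p_expected) / (1 - p_expected)


n_children = 1512
kappa = kappa_from_marginals(
    p_observed=0.873,            # observed binary agreement from the abstract
    p_pos_a=430 / n_children,    # Mantoux positive rate
    p_pos_b=496 / n_children,    # Tine positive rate
)
print(round(kappa, 2))  # rounds to 0.70, in line with the reported kappa
```

Chance agreement here is about 57%, so the 87.3% raw agreement shrinks to a kappa of roughly 0.70 once agreement expected by chance is removed.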

17.
The aim of this study was to evaluate whether distance data based on calculations by use of digitalized geographical information systems (GIS) and distance data based on measurements on 1:5000 maps agree sufficiently with on-site distance measurements to be used as input to magnetic field calculations in epidemiological studies. The analyses were performed using the weighted kappa (kappa(w)) statistical method described by Bland and Altman for comparison of measures of agreement. Map measurements showed better agreement with on-site measurements than GIS calculations did. However, we consider both methods appropriate for use in larger epidemiological studies if the results are interpreted with caution. GIS calculations have the advantage of saving both time and cost.

18.
To assess the variability among histopathologists in diagnosing and grading cervical intraepithelial neoplasia eight experienced histopathologists based at different hospitals examined the same set of 100 consecutive colposcopic cervical biopsy specimens and assigned them into one of six diagnostic categories. These were normal squamous epithelium, non-neoplastic squamous proliferations, cervical intraepithelial neoplasia grades I, II, and III, and other. The histopathologists were given currently accepted criteria for diagnosing and grading cervical intraepithelial neoplasia and asked to mark their degree of confidence about their decision on a visual linear analogue scale provided. The degree of agreement between the histopathologists was characterised by kappa statistics, which showed an overall poor agreement (unweighted kappa 0.358). Agreement between observers was excellent for invasive lesions, moderately good for cervical intraepithelial neoplasia grade III, and poor for cervical intraepithelial neoplasia grades I and II (unweighted kappa 0.832, 0.496, 0.172, and 0.175, respectively); the kappa value for all grades of cervical intraepithelial neoplasia taken together was 0.660. The most important source of disagreement lay in the distinction of reactive squamous proliferations from cervical intraepithelial neoplasia grade I. The histopathologists were confident in diagnosing cervical intraepithelial neoplasia grade III and invasive carcinoma (other) but not as confident in diagnosing cervical intraepithelial neoplasia grades I and II and glandular atypia (other). Experienced histopathologists show considerable interobserver variability in grading cervical intraepithelial neoplasia and more importantly in distinguishing between reactive squamous proliferations and cervical intraepithelial neoplasia grade I. 
It is suggested that the three grade division of cervical intraepithelial neoplasia should be abandoned and a borderline category introduced that entails follow up without treatment.

19.

Objective

To compare a novel computerized analysis program with visual cardiotocography (CTG) interpretation results.

Methods

Sixty-two intrapartum CTG tracings with 20- to 30-minute sections were independently interpreted using a novel computerized analysis program, as well as the visual interpretations of eight obstetricians, to evaluate the baseline fetal heart rate (FHR), baseline FHR variability, number of accelerations, number/type of decelerations, uterine contraction (UC) frequency, and the National Institute of Child Health and Human Development (NICHD) 3-Tier FHR classification system.

Results

There was no significant difference in interobserver variation after adding the components of computerized analysis to results from the obstetricians' visual interpretations, with excellent agreement for the baseline FHR (ICC 0.91), the number of accelerations (ICC 0.85), UC frequency (ICC 0.97), and NICHD category I (kappa statistic 0.91); good agreement for baseline variability (kappa statistic 0.68), the numbers of early decelerations (ICC 0.78) and late decelerations (ICC 0.67), category II (kappa statistic 0.78), and overall categories (kappa statistic 0.80); and moderate agreement for the number of variable decelerations (ICC 0.60), and category III (kappa statistic 0.50).

Conclusions

This computerized analysis program is not inferior to visual interpretation, may improve interobserver variations, and could play a vital role in prenatal telemedicine.

20.

Objectives

To evaluate the reliability of semiquantitative Vertebral Fracture Assessment (VFA) on chest Computed Tomography (CT).

Methods

Four observers performed VFA twice upon sagittal reconstructions of 50 routine clinical chest CTs. Intra- and interobserver agreement (absolute agreement or 95% Limits of Agreement) and reliability (Cohen's kappa or intraclass correlation coefficient (ICC)) were calculated for the visual VFA measures (fracture present, worst fracture grade, cumulative fracture grade on patient level) and for percentage height loss of each fractured vertebra compared to the adjacent vertebrae.

Results

Observers classified 24–38% of patients as having at least one vertebral fracture, giving rise to kappas of 0.73–0.84 (intraobserver) and 0.56–0.81 (interobserver). For worst fracture grade we found good intraobserver (76–88%) and interobserver (74–88%) agreement, and excellent reliability with square-weighted kappas of 0.84–0.90 (intraobserver) and 0.84–0.94 (interobserver). For cumulative fracture grade the 95% Limits of Agreement were maximally ±1.99 (intraobserver) and ±2.69 (interobserver) and the reliability (ICC) varied from 0.84–0.94 (intraobserver) and 0.74–0.94 (interobserver). For percentage height loss on a vertebral level the 95% Limits of Agreement were maximally ±11.75% (intraobserver) and ±12.53% (interobserver). The ICC was 0.59–0.90 (intraobserver) and 0.53–0.82 (interobserver). Further investigation is needed to evaluate the prognostic value of this approach.

Conclusion

In conclusion, these results demonstrate acceptable reproducibility of VFA on CT.
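
The 95% Limits of Agreement quoted in this abstract are conventionally obtained as the mean of the paired differences plus or minus 1.96 standard deviations. The sketch below assumes that convention (with the sample standard deviation, n - 1 in the denominator); whether the study used exactly this variant is not stated, and the function name `limits_of_agreement` and the sample readings are hypothetical.

```python
import math


def limits_of_agreement(first, second):
    """Return (lower, upper) 95% limits of agreement for paired readings."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical cumulative fracture grades from two reading sessions.
lower, upper = limits_of_agreement([2, 4, 6], [1, 2, 3])
```

A "±" figure such as those in the abstract corresponds to the half-width 1.96 × SD when the mean difference is close to zero.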
