
1.

Testing the Intraclass Version of Kappa Coefficient of Agreement with Binary Scale and Sample Size Determination

Jun‐mo Nam 《Biometrical journal. Biometrische Zeitschrift》2002,44(5):558-570

The intraclass version of the kappa coefficient has been commonly applied as a measure of agreement for two ratings per subject with binary outcome in reliability studies. We present an efficient statistic for testing the strength of kappa agreement using likelihood scores, and derive asymptotic power and sample size formulas. Exact evaluation shows that the score test is generally conservative and more powerful than a method based on a chi‐square goodness‐of‐fit statistic (Donner and Eliasziw, 1992, Statistics in Medicine 11, 1511–1519). In particular, when the research question is one‐directional, the one‐sided score test is substantially more powerful and the reduction in sample size is appreciable.
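
For reference, the binary intraclass kappa that this test concerns can be estimated from three counts: subjects rated positive twice, subjects rated discordantly, and subjects rated negative twice. The minimal sketch below assumes both ratings share one marginal probability of a positive rating (the usual intraclass assumption); it computes only the point estimate and does not reproduce the paper's score test, power, or sample-size formulas. The function and argument names are illustrative.

```python
def intraclass_kappa_binary(n11: int, n10: int, n00: int) -> float:
    """Intraclass kappa for two binary ratings per subject (illustrative sketch).

    n11: subjects rated positive on both ratings
    n10: subjects with one positive and one negative rating (either order)
    n00: subjects rated negative on both ratings
    """
    n = n11 + n10 + n00
    if n == 0:
        raise ValueError("need at least one subject")
    # Pooled probability of a positive rating; both ratings share one marginal
    p = (2 * n11 + n10) / (2 * n)
    if p in (0.0, 1.0):
        raise ValueError("kappa is undefined when every rating is identical")
    # kappa = (p_o - p_e) / (1 - p_e) with p_e = p^2 + (1 - p)^2,
    # which simplifies to 1 - (proportion discordant) / (2 p (1 - p))
    return 1.0 - (n10 / n) / (2.0 * p * (1.0 - p))
```

With 40 concordant-positive, 20 discordant and 40 concordant-negative subjects, the pooled positive rate is 0.5 and the estimate is 1 - 0.2 / 0.5 = 0.6.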

2.

Assessing agreement with multiple raters on correlated kappa statistics

Hongyuan Cao Pranab K. Sen Anne F. Peery Evan S. Dellon 《Biometrical journal. Biometrische Zeitschrift》2016,58(4):935-943

In clinical studies, it is often of interest to see the diagnostic agreement among clinicians on certain symptoms. Previous work has focused on the agreement between two clinicians under two different conditions or the agreement among multiple clinicians under one condition. Few have discussed the agreement study with a design where multiple clinicians examine the same group of patients under two different conditions. In this paper, we use the intraclass kappa statistic for assessing nominal scale agreement with such a design. We derive an explicit variance formula for the difference of correlated kappa statistics and conduct hypothesis testing for the equality of kappa statistics. Simulation studies show that the method performs well with realistic sample sizes and may be superior to a method that did not take into account the measurement dependence structure. The practical utility of the method is illustrated on data from an eosinophilic esophagitis (EoE) study.

3.

Reproducibility of the cytologic diagnosis of human papillomavirus infection (total citations: 2; self-citations: 0; citations by others: 2)

P L Horn D M Lowell V A LiVolsi C A Boyle 《Acta cytologica》1985,29(5):692-694

As part of a larger epidemiologic investigation of the association between human papillomavirus (HPV) infection and cervical intraepithelial neoplasia, the reliability of the cytologic diagnosis of HPV infection was examined. A random sample of cervicovaginal specimens with cytologic changes characteristic of HPV infection was matched with a second set of slides with regard to the date and severity of the smear and the age of the woman from whom the smear was obtained. The kappa statistic for interobserver agreement was 0.38 (p < 0.0005), increasing to 0.68 (p < 0.0005) when uncertain diagnoses were excluded. Intraobserver agreement ranged from kappa = 0.40 to 1.00. Although this agreement is within the range of reliability found for the diagnosis of other atypical cytologic changes, considerable variation is present. The effect of this variability on the validity of estimating the risk of cervical cancer associated with HPV infection may be considerable.

4.

Efficiency considerations in the analysis of inter-observer agreement

Shoukri MM Donner A 《Biostatistics (Oxford, England)》2001,2(3):323-336

The reliability of binary assessments is often measured by the proportion of agreement above chance, as estimated by the kappa statistic. In this paper, we develop a model to estimate inter-rater and intra-rater reliability when each of the two observers has the opportunity to obtain a pair of replicate measurements on each subject. The model is analogous to the nested beta-binomial model proposed by Rosner (1989, 1992). We show that the gain in precision obtained from increasing the number of measurements per rater from one to two may allow fewer subjects to be included in the study with no net loss in efficiency for estimating the inter-rater reliability.

5.

Reliability of Trachoma Clinical Grading—Assessing Grading of Marginal Cases

Salman A. Rahman Sun N. Yu Abdou Amza Sintayehu Gebreselassie Boubacar Kadri Nassirou Baido Nicole E. Stoller Joseph P. Sheehan Travis C. Porco Bruce D. Gaynor Jeremy D. Keenan Thomas M. Lietman 《PLoS neglected tropical diseases》2014,8(5)

Background

Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.

Methods and Findings

We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation - follicular (TF) and trachomatous inflammation - intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.

Conclusions

The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, which even experienced graders disagree on, increases apparent agreement with the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
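
The reliability error used in this study comes from the standard Murphy decomposition of the Brier score, BS = REL - RES + UNC. A minimal sketch for binary outcomes follows, grouping cases by their forecast value. The abstract does not say how trainee and expert grades were mapped onto forecasts and outcomes, so the function is generic and the name `brier_decomposition` is hypothetical.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score for binary outcomes (sketch).

    forecasts: probabilities in [0, 1]; cases sharing a forecast value form one bin
    outcomes:  observed 0/1 events, same length as forecasts
    Returns (brier, reliability, resolution, uncertainty), which satisfy
    brier == reliability - resolution + uncertainty (up to rounding).
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(forecasts)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    base_rate = sum(outcomes) / n

    # Group observed outcomes by the forecast value that was issued
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)

    reliability = 0.0  # weighted squared gap between forecast and observed rate
    resolution = 0.0   # weighted squared gap between bin rate and base rate
    for f, obs in bins.items():
        obs_rate = sum(obs) / len(obs)
        reliability += len(obs) * (f - obs_rate) ** 2
        resolution += len(obs) * (obs_rate - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)
    return brier, reliability, resolution, uncertainty
```

A lower reliability term means the forecasts match the observed frequencies within each bin; inspecting the per-bin terms is one way to see whether errors concentrate in borderline bins.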

6.

On assessing interrater agreement for multiple attribute responses (total citations: 2; self-citations: 0; citations by others: 2)

L L Kupper K B Hafner 《Biometrics》1989,45(3):957-967

New methods are developed for assessing the extent of interrater agreement when each unit to be rated is characterized by a (possibly empty) subset of a specified set of distinct nominal attributes. For such multiple attribute response data, a two-rater concordance statistic is derived, and associated statistical inference-making procedures are provided. This concordance statistic is corrected for chance agreement by using an underlying hypergeometric model. Numerical examples are given to illustrate the proposed methodology, and comparisons to other agreement statistics (e.g., kappa) are made.

7.

Measuring Pairwise Agreement Among Many Observers. II. Some Improvements and Additions

H. J. A. Schouten 《Biometrical journal. Biometrische Zeitschrift》1982,24(5):431-435

Weighted kappa was defined as a measure of pairwise interobserver agreement for the case where the observers judging one subject are not necessarily the same as those judging another subject. In this paper improved formulas for the large sample variance of the weighted kappa statistic are derived, a new definition of interclass kappa coefficients is suggested, and the intraclass correlation coefficient is shown to be a special case of weighted kappa.

8.

Estrogen and progesterone receptors in breast cancer. Immunohistochemical assay on scraping material.

B Frigo S Pilotti D Coradini G La Malfa F Rilke 《Analytical and quantitative cytology and histology / the International Academy of Cytology [and] American Society of Cytology》1992,14(2):129-136

In order to demonstrate the reliability of immunocytochemical results on cytologic specimens for receptor analysis, the expression of estrogen and progesterone receptors was investigated using immunohistochemistry on frozen sections and on scraping material from the same samples of 50 breast carcinomas. The level of agreement between the two procedures was evaluated by the kappa statistic, as was that between each immunohistochemical procedure and the dextran-coated-charcoal assay since the latter is still the assay employed most frequently for steroid receptor determination and is used for official reports. Statistical results revealed very good agreement regarding the estrogen receptor analysis, with kappa values of .910 and .952 for the comparison of the dextran-coated-charcoal assay with immunocytochemistry on frozen sections and on scrapes, respectively, and .950 for the comparison between the two immunocytochemical procedures. As to progesterone receptors, the kappa values were .795 and .712 for the comparison between the biochemical and immunocytochemical results and .915 for agreement evaluation between the two immunocytochemical procedures. The study showed that the scraping procedure is a valuable tool for the immunocytochemical assessment of steroid receptors in small mammary tumors; it yields representative cellular samples, thus permitting the investigation of heterogeneously distributed substances in tissues.

9.

Mapping quantitative trait loci using multiple phenotypes in general pedigrees

Wang K 《Human heredity》2003,55(1):1-15

The use of correlated phenotypes may dramatically increase the power to detect the underlying quantitative trait loci (QTLs). Current approaches for multiple phenotypes include regression-based methods, the multivariate variance components method, factor analysis and structural equations. Issues with these methods include: 1) they are computationally intensive and are subject to problems of optimization algorithms; 2) existing claims on the asymptotic distribution of the likelihood ratio statistic for the multivariate variance components method are contradictory and erroneous; 3) the dimension reduction of the parameter space under the null hypothesis, a phenomenon that is unique to multivariate analyses, makes the asymptotic distribution of the likelihood ratio statistic more complicated than expected. In this article, three cases of varying complexity are considered. For each case, the efficient score statistic, which is asymptotically equivalent to the likelihood ratio statistic, is derived, as is its asymptotic distribution [correction]. These methods are straightforward to calculate. Finite-sample properties of these score statistics are studied through extensive simulations. These score statistics are for use with general pedigrees.

10.

Measuring pairwise agreement among many observers

H. J. A. Schouten 《Biometrical journal. Biometrische Zeitschrift》1980,22(6):497-504

Weighted kappa is defined as a measure of pairwise interobserver agreement. A weighted intraclass kappa coefficient is proposed to measure agreement on a particular response category. An interclass kappa coefficient is proposed for each pair of response categories. Simple estimation procedures are presented for the case where the observers judging one subject are not necessarily the same as those judging another subject. Large-sample standard errors are derived and a numerical example is given.
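
The pairwise idea can be sketched as follows: average the agreement weight over all ordered pairs of observers within each subject, then correct for chance using pooled category proportions. This is a Fleiss-style estimator written under that assumption, not necessarily Schouten's exact definition, and it omits the large-sample standard errors. The helper names `pairwise_weighted_kappa` and `quadratic_weight` are hypothetical. With identity weights and the same number of observers per subject, the estimator reduces to Fleiss' kappa.

```python
from itertools import permutations


def pairwise_weighted_kappa(ratings, num_categories, weight=None):
    """Chance-corrected pairwise agreement when observers vary by subject (sketch).

    ratings: list of per-subject lists of category codes 0..num_categories-1;
             each subject needs at least two ratings, and the count may vary.
    weight:  agreement weight w(a, b) in [0, 1] with w(a, a) = 1;
             defaults to identity weights (credit only for exact agreement).
    """
    q = num_categories
    if weight is None:
        weight = lambda a, b: 1.0 if a == b else 0.0

    per_subject = []
    counts = [0] * q
    for subject in ratings:
        m = len(subject)
        if m < 2:
            raise ValueError("each subject needs at least two ratings")
        # Mean weight over all ordered pairs of distinct observers for this subject
        pair_sum = sum(weight(a, b) for a, b in permutations(subject, 2))
        per_subject.append(pair_sum / (m * (m - 1)))
        for r in subject:
            counts[r] += 1
    p_obs = sum(per_subject) / len(per_subject)

    # Chance agreement from category proportions pooled over all ratings
    total = sum(counts)
    p = [c / total for c in counts]
    p_exp = sum(p[a] * p[b] * weight(a, b) for a in range(q) for b in range(q))
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def quadratic_weight(num_categories):
    """Quadratic agreement weights 1 - ((a - b) / (q - 1))**2 for ordinal codes."""
    q = num_categories
    return lambda a, b: 1.0 - ((a - b) / (q - 1)) ** 2
```

For example, `pairwise_weighted_kappa([[0, 1], [0, 1]], 2)` gives -1.0: every pair disagrees while pooled chance agreement is 0.5.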

11.

Comparison of a Novel Computerized Analysis Program and Visual Interpretation of Cardiotocography

Chen-Yu Chen Chun Yu Chia-Chen Chang Chii-Wann Lin 《PloS one》2014,9(12)

Objective

To compare a novel computerized analysis program with visual cardiotocography (CTG) interpretation results.

Methods

Sixty-two intrapartum CTG tracings with 20- to 30-minute sections were independently interpreted using a novel computerized analysis program, as well as the visual interpretations of eight obstetricians, to evaluate the baseline fetal heart rate (FHR), baseline FHR variability, number of accelerations, number/type of decelerations, uterine contraction (UC) frequency, and the National Institute of Child Health and Human Development (NICHD) 3-Tier FHR classification system.

Results

There was no significant difference in interobserver variation after adding the components of computerized analysis to results from the obstetricians' visual interpretations, with excellent agreement for the baseline FHR (ICC 0.91), the number of accelerations (ICC 0.85), UC frequency (ICC 0.97), and NICHD category I (kappa statistic 0.91); good agreement for baseline variability (kappa statistic 0.68), the numbers of early decelerations (ICC 0.78) and late decelerations (ICC 0.67), category II (kappa statistic 0.78), and overall categories (kappa statistic 0.80); and moderate agreement for the number of variable decelerations (ICC 0.60), and category III (kappa statistic 0.50).

Conclusions

This computerized analysis program is not inferior to visual interpretation, may improve interobserver variations, and could play a vital role in prenatal telemedicine.

12.

Interlaboratory reproducibility of the immunocytochemical assessment of oestrogen and progesterone receptors and proliferative activity in fine needle aspiration of breast cancer

M. CONFORTINI F. CAROZZI L. BOZZOLA G. MICCINESI F. MIRRI M. MOTTOLESE D. NOFERINI R. NIZZOLI G. TINACCI A. VOCATURO M. ZAPPA & C. MADDAU 《Cytopathology》2002,13(2):92-100

The purpose of this study was to establish the interlaboratory reproducibility of immunocytochemical analysis of oestrogen (ER) and progesterone (PR) expression and Mib1 growth fraction on fine needle aspiration (FNA) smears. A set of 44 immunostained slides for ER, PR and Mib1 were randomly selected from the archives of the Center for the Study and Prevention of Cancer (CSPO) of Florence, Italy, and submitted for reading to 6 Italian laboratories. The generalized kappa statistic was used as an indicator of agreement among the six laboratories. A good correlation for ER and PR was evident. For Mib1 the results showed some discrepancies. In addition to adequate standardization of procedures, these data confirm that the reliability of the immunocytochemistry is strictly linked to accurate analysis of the results.

13.

Measuring agreement of multivariate discrete survival times using a modified weighted kappa coefficient

Guo Y Manatunga AK 《Biometrics》2009,65(1):125-134

Summary. Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established, and the performance of the estimators is examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

14.

Examiner agreement on caries detection and plaque accumulation during dental surveys of elders

P. Mojon P. Favre J. P. Chung E. Budtz-Jørgensen 《Gerodontology》1995,12(1):49-55

Indices used to evaluate plaque accumulation and coronal caries have been widely accepted in epidemiological studies, yet their reliability cannot be guaranteed. The aim of this study was to evaluate the reliability of clinical criteria used in coronal and root caries diagnosis and oral hygiene evaluation as applied in elders. Nineteen elderly subjects, 73 years old on average, were examined at a first appointment by two independent examiners. They were re-examined two weeks later. Plaque accumulation was evaluated using the Plaque Index (PI), and coronal and root caries were detected according to the WHO criteria and Fejerskov et al. (1991), respectively. Recurrent caries was recorded as recommended by WHO and by probing at the tooth-restoration interface. Inter- and intra-examiner agreement was evaluated using kappa statistics. The PI score showed good reliability except for examiner b, for whom simplifying the 4-point scale to a 3-point scale significantly improved reliability. The prevalence of coronal caries was very low, and intra- and inter-examiner agreement was poor. Most of the root caries lesions were covered by plaque, and the kappa values indicated only poor agreement. Recurrent caries was detected with good agreement using WHO criteria, but detection with the probe was not reliable. In conclusion, it seems that examiners should be trained carefully to maximise their reliability and that plaque should be removed to obtain reliable diagnoses of caries. Retraining and calibration may be necessary for surveys continuing over a long period.

15.

Variance and sample size calculations in quality-of-life-adjusted survival analysis (Q-TWiST) (total citations: 2; self-citations: 0; citations by others: 2)

Murray S Cole B 《Biometrics》2000,56(1):173-182

The Quality-Adjusted Time Without Symptoms or Toxicity (Q-TWiST) statistic previously introduced by Glasziou, Simes and Gelber (1990, Statistics in Medicine 9, 1259-1276) combines toxicity, disease-free survival, and overall survival information in assessing the impact of treatments on the lives of patients. This methodology has received positive reviews from clinicians as intuitive and useful, but to date, the variance of this statistic has remained unspecified. We review aspects of the Q-TWiST method for analyzing clinical trial data, extend the method to accommodate multiple treatment arms, and provide closed-form asymptotic variance formulas. We also provide a framework for designing Q-TWiST clinical trials with sample sizes determined using the derived asymptotic variance formulas. Trials currently collecting quality of life data did not have the benefit of these sample size calculation techniques in designing their studies.
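
The Q-TWiST statistic itself is a utility-weighted sum of three restricted mean durations from a partitioned survival analysis: Q-TWiST = u_TOX · TOX + TWiST + u_REL · REL, with TWiST = DFS - TOX and REL = OS - DFS. The sketch below assumes those restricted means (for example, areas under Kaplan-Meier curves up to a common truncation time) have already been computed; it does not implement the variance or sample-size formulas derived in the paper, and the function name `q_twist` is hypothetical.

```python
def q_twist(mean_tox, mean_dfs, mean_os, u_tox, u_rel):
    """Q-TWiST from restricted mean durations (partitioned survival sketch).

    mean_tox: restricted mean time spent with toxicity (TOX)
    mean_dfs: restricted mean disease-free survival
    mean_os:  restricted mean overall survival
    u_tox, u_rel: utility weights in [0, 1] for the TOX and REL states
    """
    if not (0 <= mean_tox <= mean_dfs <= mean_os):
        raise ValueError("expected 0 <= TOX <= DFS <= OS")
    if not (0 <= u_tox <= 1 and 0 <= u_rel <= 1):
        raise ValueError("utility weights must lie in [0, 1]")
    twist = mean_dfs - mean_tox  # time without symptoms or toxicity
    rel = mean_os - mean_dfs     # time after relapse
    return u_tox * mean_tox + twist + u_rel * rel
```

For example, `q_twist(2.0, 20.0, 30.0, 0.5, 0.5)` returns 1.0 + 18.0 + 5.0 = 24.0; setting both utilities to 1 recovers the overall-survival mean of 30.0.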

16.

Computer modelling of kappa carrageenan-mannan interactions

Tristan Turquois Cyrille Rochas François-R. Taravel Igor Tvaroska 《Journal of molecular recognition : JMR》1994,7(4):243-250

Molecular modelling has been used as a theoretical approach to investigate the kappa carrageenan structure and its interaction with mannan chains. Calculations revealed the existence of six minima for the kappa carrageenan structure in solution. Two of them were very close to the structure found in the solid state. The methodology allowed the calculation of the theoretical counterpart of the structures based on X-ray fibre diffraction studies. In the second step of this study, we have shown that interactions between kappa carrageenan double helices and mannan chains are possible. This interaction is made possible by the flexibility of the mannan chains and by structural changes in the kappa carrageenan double helices. The calculations suggest that the disaccharide mannan fragment might be required for recognition. The results of our investigation are in good agreement with a model of gel structure based on experimental data. This approach could be applied to simulate and predict other associations in molecular assemblies.

17.

Intra-Rater and Inter-Rater Reliability of a Medical Record Abstraction Study on Transition of Care after Childhood Cancer

Micòl E. Gianinazzi Corina S. Rueegg Karin Zimmerman Claudia E. Kuehni Gisela Michel the Swiss Paediatric Oncology Group 《PloS one》2015,10(5)

Background

The abstraction of data from medical records is a widespread practice in epidemiological research. However, studies using this means of data collection rarely report reliability. Within the Transition after Childhood Cancer Study (TaCC), which is based on a medical record abstraction, we conducted a second independent abstraction of data with the aim of assessing a) the intra-rater reliability of one rater at two time points; b) possible learning effects between these two time points compared with a gold standard; and c) inter-rater reliability.

Method

Within the TaCC study, we conducted a systematic medical record abstraction in the 9 Swiss clinics with pediatric oncology wards. In a second phase, we selected a subsample of medical records in 3 clinics to conduct a second independent abstraction. We then assessed intra-rater reliability at two time points, the learning effect over time (comparing each rater at two time points with a gold standard) and the inter-rater reliability of a selected number of variables. We calculated percentage agreement and Cohen’s kappa.

Findings

For the assessment of intra-rater reliability, we included 154 records (80 for rater 1; 74 for rater 2). For inter-rater reliability, we could include 70 records. Intra-rater reliability was substantial to excellent (Cohen’s kappa 0.6-0.8), with an observed percentage agreement of 75%-95%. Learning effects were observed in all variables. Inter-rater reliability was substantial to excellent (Cohen’s kappa 0.70-0.83), with high agreement ranging from 86% to 100%.

Conclusions

Our study showed that data abstracted from medical records are reliable. Investigating intra-rater and inter-rater reliability can give confidence to draw conclusions from the abstracted data and increase data quality by minimizing systematic errors.
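
The two measures reported in this study, percentage agreement and Cohen's kappa, can be computed for a single abstracted variable as in the minimal sketch below. Chance agreement uses each rater's own marginal distribution, as in Cohen's definition. The function name and the rating lists in the usage note are hypothetical.

```python
from collections import Counter


def agreement_and_cohens_kappa(rater_a, rater_b):
    """Percentage agreement and Cohen's kappa for two ratings of the same items (sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both ratings
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's own marginal category counts
    marg_a = Counter(rater_a)
    marg_b = Counter(rater_b)
    p_exp = sum(marg_a[c] * marg_b[c] for c in marg_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    return 100.0 * p_obs, kappa
```

For the hypothetical ratings a = [yes, yes, no, no, yes, no, yes, yes] and b = [yes, no, no, no, yes, no, yes, yes], observed agreement is 87.5% and chance agreement is 0.5, giving kappa = 0.75. The same function serves for intra-rater comparisons by passing one rater's two abstraction rounds.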

18.

Examining Agreement between Clinicians when Assessing Sick Children

John Wagai John Senga Greg Fegan Mike English 《PloS one》2009,4(2)

Background

Case management guidelines use a limited set of clinical features to guide assessment and treatment for common childhood diseases in poor countries. Using video records of clinical signs we assessed agreement among experts and assessed whether Kenyan health workers could identify signs defined by expert consensus.

Methodology

104 videos representing 11 clinical sign categories were presented to experts using a web questionnaire. Proportionate agreement and agreement beyond chance were calculated using kappa and the AC1 statistic. 31 videos were selected and presented to local health workers, 20 for which experts had demonstrated clear agreement and 11 for which experts could not demonstrate agreement.

Principal Findings

Experts reached a very high level of chance-adjusted agreement for some videos, while for a few videos no agreement beyond chance was found. Where experts agreed, Kenyan hospital staff of all cadres recognised signs with high mean sensitivity and specificity (sensitivity: 0.897–0.975, specificity: 0.813–0.894); years of experience, gender and hospital had no influence on mean sensitivity or specificity. Local health workers did not agree on videos where experts had low or no agreement. Results of different agreement statistics for multiple observers, the AC1 and Fleiss' kappa, differ across the range of proportionate agreement.
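
Both multi-rater statistics share the same observed agreement (the share of agreeing rater pairs per subject, averaged over subjects) and differ only in the chance term: Fleiss' kappa uses the sum of squared pooled category proportions, while Gwet's AC1 uses the sum of π_k(1 - π_k) divided by (q - 1) for q categories. The sketch below assumes a fixed number of raters per subject; the function name is hypothetical, and the count table in the usage note is invented to show how the two statistics can diverge when one category dominates.

```python
def fleiss_kappa_and_ac1(counts):
    """Fleiss' kappa and Gwet's AC1 from a subjects-by-categories count table (sketch).

    counts[i][k] = number of raters who placed subject i in category k;
    every subject must have the same number of raters (at least two).
    Fleiss' kappa is returned as NaN when its chance agreement equals 1.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_subj = len(counts)
    q = len(counts[0])
    m = sum(counts[0])
    if m < 2 or q < 2 or any(len(row) != q or sum(row) != m for row in counts):
        raise ValueError("need >= 2 categories and the same number (>= 2) of raters per subject")

    # Observed agreement: agreeing ordered rater pairs per subject, averaged over subjects
    p_obs = sum(sum(c * (c - 1) for c in row) / (m * (m - 1)) for row in counts) / n_subj

    # Pooled proportion of all ratings falling in each category
    pi = [sum(row[k] for row in counts) / (n_subj * m) for k in range(q)]

    pe_fleiss = sum(p * p for p in pi)
    pe_ac1 = sum(p * (1 - p) for p in pi) / (q - 1)

    fleiss = float("nan") if pe_fleiss == 1.0 else (p_obs - pe_fleiss) / (1.0 - pe_fleiss)
    ac1 = (p_obs - pe_ac1) / (1.0 - pe_ac1)
    return fleiss, ac1
```

For 10 subjects rated by 3 raters, with 9 subjects placed unanimously in category 0 and one split 2 to 1, observed agreement is about 0.933, yet Fleiss' kappa is -2/58 ≈ -0.034 while AC1 is 782/842 ≈ 0.929. The difference comes entirely from the chance term, which Fleiss' kappa inflates when prevalence is skewed.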

Conclusion

Videos provide a useful means to test agreement amongst geographically diverse groups of health workers. Kenyan health workers are in agreement with experts where clinical signs are clear-cut, supporting the potential value of assessment and management guidelines. However, clinical signs are not always clear-cut. Video recordings offer one means to help standardise interpretation of clinical signs.

19.

Testing Genetic Association by Regressing Genotype over Multiple Phenotypes

Kai Wang 《PloS one》2014,9(9)

Complex disorders are typically characterized by multiple phenotypes. Analyzing these phenotypes jointly is expected to be more powerful than dealing with one of them at a time. A recent approach (O'Reilly et al. 2012) is to regress the genotype at a SNP marker on multiple phenotypes and apply the proportional odds model. In the current research, we introduce an explicit expression for the score test statistic and its non-centrality parameter, which determines its power. The same simulation studies as those reported in Galesloot et al. (2014) were conducted to assess its performance. We demonstrate by theoretical arguments and simulation studies that, despite its potential usefulness for multiple phenotypes, the proportional odds model method can be less powerful than regular methods for univariate traits. We also introduce an implementation of the proposed score statistic in an R package named iGasso.

20.

A non-iterative confidence interval estimating procedure for the intraclass kappa statistic with multinomial outcomes

Zou G Klar N 《Biometrical journal. Biometrische Zeitschrift》2005,47(5):682-690

We obtain the asymptotic sample variance of the intraclass kappa statistic for multinomial outcome data. A modified Wald type procedure based on this theory is then used for confidence interval construction. The results of a simulation study show that the proposed non-iterative approach performs very well in terms of confidence interval coverage and width for samples as small as 50. The procedure is illustrated with two examples from previously published medical studies.