1.

Measuring intrarater association between correlated ordinal ratings

Kerrie P. Nelson Thomas J. Zhou Don Edwards 《Biometrical journal. Biometrische Zeitschrift》2020,62(7):1687-1701

Variability between raters' ordinal scores is commonly observed in imaging tests, leading to uncertainty in the diagnostic process. In breast cancer screening, a radiologist visually interprets mammograms and MRIs, while skin diseases, Alzheimer's disease, and psychiatric conditions are graded based on clinical judgment. Consequently, studies are often conducted in clinical settings to investigate whether a new training tool can improve the interpretive performance of raters. In such studies, a large group of experts each classify a set of patients' test results on two separate occasions, before and after some form of training with the goal of assessing the impact of training on experts' paired ratings. However, due to the correlated nature of the ordinal ratings, few statistical approaches are available to measure association between raters' paired scores. Existing measures are restricted to assessing association at just one time point for a single screening test. We propose here a novel paired kappa to provide a summary measure of association between many raters' paired ordinal assessments of patients' test results before versus after rater training. Intrarater association also provides valuable insight into the consistency of ratings when raters view a patient's test results on two occasions with no intervention undertaken between viewings. In contrast to existing correlated measures, the proposed kappa is a measure that provides an overall evaluation of the association among multiple raters' scores from two time points and is robust to the underlying disease prevalence. We implement our proposed approach in two recent breast-imaging studies and conduct extensive simulation studies to evaluate properties and performance of our summary measure of association.

2.

Peer Review Evaluation Process of Marie Curie Actions under EU’s Seventh Framework Programme for Research

David G. Pina Darko Hren Ana Marušić 《PloS one》2015,10(6)

We analysed the peer review of grant proposals under Marie Curie Actions, a major EU research funding instrument, which involves two steps: an independent assessment (Individual Evaluation Report, IER) performed remotely by 3 raters, and a consensus opinion reached during a meeting by the same raters (Consensus Report, CR). For 24,897 proposals evaluated from 2007 to 2013, the association between average IER and CR scores was very high across different panels, grant calls and years. The median average deviation (AD) index, used as a measure of inter-rater agreement, was 5.4 points on a 0-100 scale overall (interquartile range 3.4-8.3), demonstrating good general agreement among raters. For proposals where one rater disagreed with the other two raters (n=1424; 5.7%), or where all 3 raters disagreed (n=2075; 8.3%), the average IER and CR scores were still highly associated. Disagreement was more frequent for proposals from Economics/Social Sciences and Humanities panels. Greater disagreement was observed for proposals with lower average IER scores. CR scores for proposals with initial disagreement were also significantly lower. Proposals with a large absolute difference between the average IER and CR scores (≥10 points; n=368, 1.5%) generally had lower CR scores. An inter-correlation matrix of individual raters' scores of evaluation criteria of proposals indicated that these scores were, in general, a reflection of raters’ overall scores. Our analysis demonstrated good internal consistency and generally high agreement among raters. Consensus meetings appear to be relevant for particular panels and subsets of proposals with large differences among raters’ scores.
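
The abstract does not spell out how the AD index is computed. As a minimal sketch, assuming the common form in which a proposal's AD is the mean absolute deviation of its raters' scores from their own mean, the per-proposal index and its median across proposals could be computed as follows. The function names and the example scores are illustrative and are not taken from the study.

```python
from statistics import mean, median


def average_deviation(scores):
    """Mean absolute deviation of one proposal's rater scores from their mean.

    Assumption: deviations are taken about the mean (an AD-about-the-median
    variant also exists); the study's exact choice is not stated in the abstract.
    """
    if not scores:
        raise ValueError("at least one score is required")
    centre = mean(scores)
    return sum(abs(s - centre) for s in scores) / len(scores)


def median_average_deviation(proposals):
    """Median of the per-proposal AD values."""
    return median(average_deviation(scores) for scores in proposals)


# Hypothetical scores on a 0-100 scale, three raters per proposal.
example_proposals = [[80, 85, 90], [70, 70, 70], [60, 75, 90]]
example_median_ad = median_average_deviation(example_proposals)
```

A smaller AD means the raters' scores sit closer together, so a median AD of a few points on a 0-100 scale corresponds to raters who typically differ from their panel mean by only that much.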

3.

A Ratio Estimator of the Kappa Index of Agreement Between Two Observers

C. N. Bouza 《Biometrical journal. Biometrische Zeitschrift》1987,29(8):1011-1015

The kappa index is usually used for measuring the agreement between two observers when the scale is nominal. A modification of Cohen's kappa index was given by Krauth. The new estimator was biased and its large-sample variance was obtained. An alternative estimator is developed here. It is a ratio estimator and its mean square error is derived. A comparison with Cohen's and Krauth's estimators is given using the examples from Krauth's paper.
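
For readers unfamiliar with the baseline that this and several other entries build on, here is a minimal sketch of the standard Cohen's kappa for two observers on a nominal scale, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from the marginals. It is not the ratio estimator or Krauth's modification described in the abstract; the function name and the example data are illustrative.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


# Illustrative 2x2 table: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no.
rater_a = ["yes"] * 25 + ["no"] * 25
rater_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
example_kappa = cohens_kappa(rater_a, rater_b)
```

In this example the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is 0.4.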

4.

An Estimating Equations Approach for Modelling Kappa

Neil Klar Stuart R. Lipsitz Joseph G. Ibrahim 《Biometrical journal. Biometrische Zeitschrift》2000,42(1):45-58

Agreement between raters for binary outcome data is typically assessed using the kappa coefficient. There has been considerable recent work extending logistic regression to provide summary estimates of interrater agreement adjusted for covariates predictive of the marginal probability of classification by each rater. We propose an estimating equations approach which can also be used to identify covariates predictive of kappa. Models may include an arbitrary and variable number of raters per subject and yet do not require any stringent parametric assumptions. Examples used to illustrate this procedure include an investigation of factors affecting agreement between primary and proxy respondents from a case-control study and a study of the effects of gender and zygosity on twin concordance for smoking history.

5.

Weighted least-squares approach for comparing correlated kappa (total citations: 3; self-citations: 0; citations by others: 3)

Barnhart HX Williamson JM 《Biometrics》2002,58(4):1012-1019

In the medical sciences, studies are often designed to assess the agreement between different raters or different instruments. The kappa coefficient is a popular index of agreement for binary and categorical ratings. Here we focus on testing for the equality of two dependent kappa coefficients. We use the weighted least-squares (WLS) approach of Koch et al. (1977, Biometrics 33, 133-158) to take into account the correlation between the estimated kappa statistics. We demonstrate how the SAS PROC CATMOD can be used to test for the equality of dependent Cohen's kappa coefficients and dependent intraclass kappa coefficients with nominal categorical ratings. We also test for the equality of dependent Cohen's kappa and dependent weighted kappa with ordinal ratings. The major advantage of the WLS approach is that it allows the data analyst a way of testing dependent kappa with popular SAS software. The WLS approach can handle any number of categories. Analyses of three biomedical studies are used for illustration.

6.

The kappa coefficient of agreement for multiple observers when the number of subjects is small (total citations: 2; self-citations: 0; citations by others: 2)

S T Gross 《Biometrics》1986,42(4):883-893

Published results on the use of the kappa coefficient of agreement have traditionally been concerned with situations where a large number of subjects is classified by a small group of raters. The coefficient is then used to assess the degree of agreement among the raters through hypothesis testing or confidence intervals. A modified kappa coefficient of agreement for multiple categories is proposed and a parameter-free distribution for testing null agreement is provided, for use when the number of raters is large relative to the number of categories and subjects. The large-sample distribution of kappa is shown to be normal in the nonnull case, and confidence intervals for kappa are provided. The results are extended to allow for an unequal number of raters per subject.

7.

Powerful Exact Unconditional Tests for Agreement between Two Raters with Binary Endpoints

Guogen Shan Gregory E. Wilding 《PloS one》2014,9(5)

Asymptotic and exact conditional approaches have often been used for testing agreement between two raters with binary outcomes. The exact conditional approach is guaranteed to respect the test size, in contrast to the traditionally used asymptotic approach based on the standardized Cohen's kappa coefficient. An alternative to the conditional approach is an unconditional strategy, which relaxes the restriction of fixed marginal totals imposed by the conditional approach. Three exact unconditional hypothesis testing procedures are considered in this article: an approach based on maximization, an approach based on the conditional p-value and maximization, and an approach based on estimation and maximization. We compared these testing procedures, based on the commonly used Cohen's kappa, with regard to test size and power. We recommend the following two exact approaches for use in practice due to power advantages: the approach based on conditional p-value and maximization and the approach based on estimation and maximization.

8.

Measuring agreement of multivariate discrete survival times using a modified weighted kappa coefficient

Guo Y Manatunga AK 《Biometrics》2009,65(1):125-134

Summary. Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established and the performance of the estimators is examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

9.

A sequential test for assessing observed agreement between raters

Sotiris Bersimis Athanasios Sachlas Subha Chakraborti 《Biometrical journal. Biometrische Zeitschrift》2018,60(1):128-145

Assessing the agreement between two or more raters is an important topic in medical practice. Existing techniques, which deal with categorical data, are based on contingency tables. This is often an obstacle in practice, as we have to wait a long time to collect enough subjects to construct the contingency table. In this paper, we introduce a nonparametric sequential test for assessing agreement, which can be applied as data accrue and does not require a contingency table, facilitating a rapid assessment of the agreement. The proposed test is based on the cumulative sum of the number of disagreements between the two raters and a suitable statistic representing the waiting time until the cumulative sum exceeds a predefined threshold. We treat the cases of testing two raters' agreement with respect to one or more characteristics and using two or more classification categories, the case where the two raters disagree extremely, and finally the case of testing more than two raters' agreement. The numerical investigation shows that the proposed test has excellent performance. Compared to the existing methods, the proposed method appears to require a significantly smaller sample size with equivalent power. Moreover, the proposed method is easily generalizable and brings the problem of assessing the agreement between two or more raters and one or more characteristics under a unified framework, thus providing an easy-to-use tool for medical practitioners.

10.

Body Shape Preferences: Associations with Rater Body Shape and Sociosexuality

Michael E. Price Nicholas Pound James Dunn Sian Hopkins Jinsheng Kang 《PloS one》2013,8(1)

There is accumulating evidence of condition-dependent mate choice in many species, that is, individual preferences varying in strength according to the condition of the chooser. In humans, for example, people with more attractive faces/bodies, and who are higher in sociosexuality, exhibit stronger preferences for attractive traits in opposite-sex faces/bodies. However, previous studies have tended to use only relatively simple, isolated measures of rater attractiveness. Here we use 3D body scanning technology to examine associations between strength of rater preferences for attractive traits in opposite-sex bodies, and raters’ body shape, self-perceived attractiveness, and sociosexuality. For 118 raters and 80 stimuli models, we used a 3D scanner to extract body measurements associated with attractiveness (male waist-chest ratio [WCR], female waist-hip ratio [WHR], and volume-height index [VHI] in both sexes) and also measured rater self-perceived attractiveness and sociosexuality. As expected, WHR and VHI were important predictors of female body attractiveness, while WCR and VHI were important predictors of male body attractiveness. Results indicated that male rater sociosexuality scores were positively associated with strength of preference for attractive (low) VHI and attractive (low) WHR in female bodies. Moreover, male rater self-perceived attractiveness was positively associated with strength of preference for low VHI in female bodies. The only evidence of condition-dependent preferences in females was a positive association between attractive VHI in female raters and preferences for attractive (low) WCR in male bodies. No other significant associations were observed in either sex between aspects of rater body shape and strength of preferences for attractive opposite-sex body traits. 
These results suggest that among male raters, rater self-perceived attractiveness and sociosexuality are important predictors of preference strength for attractive opposite-sex body shapes, and that rater body traits, with the exception of VHI in female raters, may not be good predictors of these preferences in either sex.

11.

A discussion on disease severity index values. Part I: warning on inherent errors and suggestions to maximise accuracy

C.H. Bock 《The Annals of applied biology》2017,171(2):139-154

A special type of ordinal scale comprising a number of intervals of known numeric ranges can be used when estimating severity of a plant disease. The interval ranges are most often based on the percent area with symptoms [e.g. the Horsfall–Barratt (H–B) scale]. Studies in plant pathology and plant breeding often use this type of ordinal scale. The disease severity is estimated by a rater as a value on the scale and has been used to determine a disease severity index (DSI) on a percentage basis, where DSI (%) = [sum (class frequency × score of rating class)]/[(total number of plants) × (maximal disease index)] × 100. However, very few studies have investigated the effects of different scales on accuracy of the DSI. Therefore, the objectives of this study were to investigate the process of calculating a DSI on a percentage basis from ordinal scale data, and to use simulation approaches to explore the effect of using different methods for calculation of the interval range and the nature of the ordinal scales used on the DSI estimates (%). We found that the DSI is particularly prone to overestimation when using the above formula if the midpoint values of the rating class are not considered. Moreover, the results of the simulation studies show that, if rater estimates are unbiased, compared with other methods tested in this study, the most accurate method for estimation of a DSI is to use the midpoint of the severity range for each class with an amended 10% ordinal scale (an ordinal scale based on a 10% linear scale emphasising severities ≤50% disease, with additional grades at low severities). As for biased conditions, the accuracy for calculating DSI estimates (%) will depend mainly on the degree and direction of the rater bias relative to the actual mean value.
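
The DSI formula quoted in the abstract can be sketched directly in code. The function below implements only that formula as stated, using the ordinal class scores themselves; it does not implement the midpoint-based correction the authors recommend. The function name, the 0-4 scale, and the plant counts are illustrative assumptions.

```python
def disease_severity_index(class_counts, max_score=None):
    """DSI (%) = sum(frequency * score) / (total plants * maximal score) * 100.

    class_counts maps each ordinal score to the number of plants given that score.
    max_score defaults to the highest score present; pass it explicitly if the
    top class of the scale was not observed.
    """
    total_plants = sum(class_counts.values())
    if total_plants == 0:
        raise ValueError("at least one plant must be rated")
    if max_score is None:
        max_score = max(class_counts)
    if max_score <= 0:
        raise ValueError("the maximal score must be positive")
    weighted_sum = sum(score * count for score, count in class_counts.items())
    return 100 * weighted_sum / (total_plants * max_score)


# Hypothetical ratings of 20 plants on a 0-4 scale.
example_counts = {0: 10, 1: 5, 2: 3, 3: 1, 4: 1}
example_dsi = disease_severity_index(example_counts)
```

Here the weighted sum is 18 and the denominator is 20 × 4 = 80, so the DSI is 22.5%.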

12.

Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading

Fanshawe TR Lynch AG Ellis IO Green AR Hanka R 《PloS one》2008,3(8):e2925

Background

We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples.

Methodology/Principal Findings

We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples.

Conclusions/Significance

Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in clinical studies.

13.

Assessing interrater agreement on binary measurements via intraclass odds ratio

Isabella Locatelli Valentin Rousson 《Biometrical journal. Biometrische Zeitschrift》2016,58(4):962-973

14.

Bayesian inference for kappa from single and multiple studies

Basu S Banerjee M Sen A 《Biometrics》2000,56(2):577-582

Cohen's kappa coefficient is a widely popular measure for chance-corrected nominal scale agreement between two raters. This article describes Bayesian analysis for kappa that can be routinely implemented using Markov chain Monte Carlo (MCMC) methodology. We consider the case of m ≥ 2 independent samples of measured agreement, where in each sample a given subject is rated by two rating protocols on a binary scale. A major focus here is on testing the homogeneity of the kappa coefficient across the different samples. The existing frequentist tests for this case assume exchangeability of rating protocols, whereas our proposed Bayesian test does not make any such assumption. Extensive simulation is carried out to compare the performances of the Bayesian and the frequentist tests. The developed methodology is illustrated using data from a clinical trial in ophthalmology.

15.

Matrix-based concordance correlation coefficient for repeated measures

Hiriote S Chinchilli VM 《Biometrics》2011,67(3):1007-1016

Summary. In many clinical studies, Lin's concordance correlation coefficient (CCC) is a common tool to assess the agreement of a continuous response measured by two raters or methods. However, the need for measures of agreement may arise for more complex situations, such as when the responses are measured on more than one occasion by each rater or method. In this work, we propose a new CCC in the presence of repeated measurements, called the matrix-based concordance correlation coefficient (MCCC), based on a matrix norm that possesses the properties needed to characterize the level of agreement between two p × 1 vectors of random variables. It can be shown that the MCCC reduces to Lin's CCC when p = 1. For inference, we propose an estimator for the MCCC based on U-statistics. Furthermore, we derive the asymptotic distribution of the estimator of the MCCC, which is proven to be normal. The simulation studies confirm that, in terms of accuracy, precision, and coverage probability, the estimator of the MCCC works very well in general cases, especially when n is greater than 40. Finally, we use real data from an Asthma Clinical Research Network (ACRN) study and the Penn State Young Women's Health Study for demonstration.
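
As a point of reference for the p = 1 case mentioned above, here is a minimal sketch of the sample version of Lin's CCC for a single pair of measurement series, 2·s_xy / (s_x² + s_y² + (mean_x - mean_y)²), with variances and covariance computed using a 1/n divisor. It is not the matrix-based MCCC or its U-statistic estimator; the function name and the example data are illustrative.

```python
def concordance_correlation(x, y):
    """Sample estimate of Lin's concordance correlation coefficient (1/n divisors)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both series are the same constant")
    return 2 * cov_xy / denominator


# Hypothetical readings: method B reads exactly one unit higher than method A.
method_a = [1, 2, 3, 4, 5]
method_b = [2, 3, 4, 5, 6]
example_ccc = concordance_correlation(method_a, method_b)
```

The two series are perfectly correlated, yet the constant offset of one unit lowers the CCC to 0.8, which is the distinction between agreement and mere correlation that the coefficient is designed to capture.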

16.

A model for radiofrequency electromagnetic field predictions at outdoor and indoor locations in the context of epidemiological research

Alfred Bürgi Patrizia Frei Gaston Theis Evelyn Mohler Charlotte Braun‐Fahrländer Jürg Fröhlich Georg Neubauer Matthias Egger Martin Röösli 《Bioelectromagnetics》2010,31(3):226-236

We present a geospatial model to predict the radiofrequency electromagnetic field from fixed site transmitters for use in epidemiological exposure assessment. The proposed model extends an existing model toward the prediction of indoor exposure, that is, at the homes of potential study participants. The model is based on accurate operation parameters of all stationary transmitters of mobile communication base stations, and radio broadcast and television transmitters for an extended urban and suburban region in the Basel area (Switzerland). The model was evaluated by calculating Spearman rank correlations and weighted Cohen's kappa (κ) statistics between the model predictions and measurements obtained at street level, in the homes of volunteers, and in front of the windows of these homes. The correlation coefficients of the numerical predictions were 0.64 with street level measurements, 0.66 with indoor measurements, and 0.67 with window measurements. The kappa coefficients were 0.48 (95%-confidence interval: 0.35–0.61) for street level measurements, 0.44 (95%-CI: 0.32–0.57) for indoor measurements, and 0.53 (95%-CI: 0.42–0.65) for window measurements. Although the modeling of shielding effects by walls and roofs requires considerable simplifications of a complex environment, we found a comparable accuracy of the model for indoor and outdoor points.

17.

Measurement of interrater agreement with adjustment for covariates

Barlow W 《Biometrics》1996,52(2):695-702

The kappa coefficient measures chance-corrected agreement between two observers in the dichotomous classification of subjects. The marginal probability of classification by each rater may depend on one or more confounding variables, however. Failure to account for these confounders may lead to inflated estimates of agreement. A multinomial model is used that assumes both raters have the same marginal probability of classification, but this probability may depend on one or more covariates. The model may be fit using software for conditional logistic regression. Additionally, likelihood-based confidence intervals for the parameter representing agreement may be computed. A simple example is discussed to illustrate model-fitting and application of the technique.

18.

Care quality: reliability and usefulness of observation data in benchmarking nursing homes and homes for the aged in the Netherlands

Frijters D Gerritsen D Steverink N 《Tijdschrift voor gerontologie en geriatrie》2003,34(1):21-29

Before quality of care indicators could be included in the Benchmark of Nursing Homes and Homes for the Aged in the Netherlands, the reliability and usefulness of the patient data collection had to be established. The patient data items were derived from the Resident Assessment Instruments (RAI) and a questionnaire on social interaction in elderly people. Three nursing homes and three homes for the aged participated in the test with 550 patients. In total, 279 × 2 assessments were collected by independent raters for an inter-rater reliability test, 259 × 2 by the same rater for a test-retest reliability test, and 24 by a single rater. The scores on paired assessment forms were compared using the weighted kappa agreement test. The test results allowed 10 of the 13 quality indicators from RAI to be retained. In addition, new quality indicators could be defined on 'giving attention' and 'unrespectful addressing'. On the basis of a questionnaire for the raters, we estimate that on average 9 to 12 minutes per patient are needed to collect and enter data for the resulting 12 quality indicators.

19.

Measuring Pairwise Agreement Among Many Observers. II. Some Improvements and Additions

H. J. A. Schouten 《Biometrical journal. Biometrische Zeitschrift》1982,24(5):431-435

Weighted kappa was defined as a measure of pairwise interobserver agreement for the case where the observers judging one subject are not necessarily the same as those judging another subject. In this paper improved formulas for the large sample variance of the weighted kappa statistic are derived, a new definition of interclass kappa coefficients is suggested, and the intraclass correlation coefficient is shown to be a special case of weighted kappa.
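
Weighted kappa appears in several entries in this list. As a minimal sketch of the familiar two-rater version for ordinal categories (not the many-observer pairwise definition or the variance formulas of this paper), kappa_w = 1 - (observed weighted disagreement) / (chance-expected weighted disagreement), with disagreement weights |i - j| (linear) or (i - j)² (quadratic). The function name, the `power` parameter, and the example ratings are illustrative assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, categories, power=2):
    """Two-rater weighted kappa; power=1 gives linear weights, power=2 quadratic."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    position = {category: i for i, category in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)
    # Cross-classification table of the two raters' ordinal scores.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        table[position[a]][position[b]] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(abs(i - j) ** power * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(abs(i - j) ** power * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n**2
    if expected == 0:
        raise ValueError("weighted kappa is undefined when only one category is used")
    return 1 - observed / expected


# Hypothetical ratings of six subjects on a three-point ordinal scale.
scores_a = [1, 2, 3, 1, 2, 3]
scores_b = [1, 2, 3, 2, 3, 1]
example_weighted_kappa = weighted_kappa(scores_a, scores_b, categories=[1, 2, 3])
```

With quadratic weights, disagreements two categories apart are penalised four times as heavily as adjacent-category disagreements, which is the weighting under which weighted kappa is closely related to the intraclass correlation coefficient.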

20.

Detection of genes for ordinal traits in nuclear families and a unified approach for association studies

Zhang H Wang X Ye Y 《Genetics》2006,172(1):693-699

There is growing interest in genomewide association analysis using single-nucleotide polymorphisms (SNPs), because traditional linkage studies are not as powerful in identifying genes for common, complex diseases. Tests for linkage disequilibrium have been developed for binary and quantitative traits. However, since many human conditions and diseases are measured on an ordinal scale, methods need to be developed to investigate the association of genes and ordinal traits. Thus, in the current report we propose and derive a score test statistic that identifies genes that are associated with ordinal traits when gametic disequilibrium between a marker and trait loci exists. Through simulation, the performance of this new test is examined for both ordinal traits and quantitative traits. The proposed statistic not only accommodates and is more powerful for ordinal traits, but also has similar power to that of existing tests when the trait is quantitative. Therefore, our proposed statistic has the potential to serve as a unified approach to identifying genes that are associated with any trait, regardless of how the trait is measured. We further demonstrated the advantage of our test by revealing a significant association (P = 0.00067) between alcohol dependence and a SNP in the growth-associated protein 43.