Similar articles
 Found 20 similar articles; search took 31 ms
1.
Using the variance stabilizing technique, a product multinomial model is introduced to generate a new statistic to test observers' uncertainty in a weighted concordance analysis. Distance matrices which follow some specific rules are obtained by linear combinations of hierarchical distance matrices whose elements are equal to 0 or 1 and unit diagonal. The new statistic is compared with the kappa statistic interpreted by considering the covariance matrix generated by the data. By rewriting the test statistic in a barycentric form, one demonstrates how to modify the barycentric coefficients to derive an adequate measure of the interobserver agreement. The methods are illustrated using two examples.

2.
An internal quality control system which is used in the centralized cytology laboratory of a population-based cervical cancer screening programme in Florence is described. It includes a peer review procedure. Abnormal cervical smears are circulated among all the cytologists and a consensus on the final diagnosis is reached. This daily procedure is designed to evaluate the performance of each cytologist and of the laboratory as a whole but can also be considered a valuable training opportunity. During an 18-month period 1197 smears were reviewed by 15 readers using a reporting form with six main categories of reporting (from ‘negative’ to ‘invasive carcinoma’), plus an ‘unsatisfactory’ category. Overall the concordance between the 15 cytologists, assessed using the kappa statistic (range 0.46–0.71; median 0.60), was good. The level of agreement increased when a weighted kappa statistic (range 0.55–0.78; median 0.68) was used. Kappa values were also calculated for specific categories and suggested an increasing concordance with increasing severity of the lesions, the categories of ‘severe dysplasia’ and ‘invasive carcinoma’ showing the highest agreement. The poor results for the ‘moderate dysplasia’ category confirmed the need for combining this group with ‘severe dysplasia’, as proposed in the Bethesda system.
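
As a rough illustration of why a weighted kappa can exceed the unweighted one on ordered categories, the following minimal Python sketch computes Cohen's kappa for two raters, with optional linear disagreement weights. The function name `cohen_kappa`, the three category labels, and the six ratings are hypothetical and are not taken from the study; the abstract does not say which weighting scheme was used, so linear weights are an assumption.

```python
def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa for two raters; linear disagreement weights if weighted=True.

    `categories` must list the ordered rating scale (at least two levels).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions of (rater 1, rater 2) ratings.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal; off-diagonal it is 1
        # (unweighted) or proportional to the category distance (linear).
        if not weighted:
            return 0.0 if i == j else 1.0
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1.0 - d_obs / d_exp


# Hypothetical ratings on an ordered three-level scale.
cats = ["negative", "mild", "severe"]
rater_a = ["negative", "negative", "mild", "mild", "severe", "severe"]
rater_b = ["negative", "mild", "mild", "mild", "severe", "mild"]

print(round(cohen_kappa(rater_a, rater_b, cats), 3))                 # 0.5
print(round(cohen_kappa(rater_a, rater_b, cats, weighted=True), 3))  # 0.571
```

In this toy data both disagreements are between adjacent categories, so the weighted version penalizes them less, which is the same direction of effect the abstract reports.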

3.

Background

Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.

Methods and Findings

We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation - follicular (TF) and trachomatous inflammation - intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.

Conclusions

The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, which even experienced graders disagree on, increases apparent agreement with the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.

4.
Clinical methods of investigation, such as tooth colour determination, should be simple, quick and reproducible. The determination of tooth colours usually relies upon manual comparison of a patient's tooth colour with a colour ring. After some days, however, measurement results frequently lack unequivocal reproducibility. This study aimed to examine an electronic method for reliable colour measurement. The colours of the teeth 14 to 24 were determined by three different examiners in 10 subjects using the colour measuring device Shade Inspector. In total, 12 measurements per tooth were taken. Two measurement time points were scheduled to be taken, namely at study onset (T(1)) and after 6 months (T(2)). At either time point, two measurement series per subject were taken by the different examiners at 2-week intervals. The inter-examiner and intra-examiner agreement of the measurement results was assessed. The concordance for lightness and colour intensity (saturation) was represented by the intra-class correlation coefficient. The categorical variable colour shade (hue) was assessed using the kappa statistic. The study results show that tooth colour can be measured independently of the examiner. Good agreement was found between the examiners.

5.
Taylor JM, Wang Y, Thiébaut R. Biometrics 2005, 61(4):1102-1111
In a randomized clinical trial, a statistic that measures the proportion of treatment effect on the primary clinical outcome that is explained by the treatment effect on a surrogate outcome is a useful concept. We investigate whether a statistic proposed to estimate this proportion can be given a causal interpretation as defined by models of counterfactual variables. For the situation of binary surrogate and outcome variables, two counterfactual models are considered, both of which include the concept of the proportion of the treatment effect, which acts through the surrogate. In general, the statistic does not equal either of the two proportions from the counterfactual models, and can be substantially different. Conditions are given for which the statistic does equal the counterfactual model proportions. A randomized clinical trial with potential surrogate endpoints is undertaken in a scientific context; this context will naturally place constraints on the parameters of the counterfactual model. We conducted a simulation experiment to investigate what impact these constraints had on the relationship between the proportion explained (PE) statistic and the counterfactual model proportions. We found that observable constraints had very little impact on the agreement between the statistic and the counterfactual model proportions, whereas unobservable constraints could lead to more agreement.

6.
Clinical studies are often concerned with assessing whether different raters/methods produce similar values for measuring a quantitative variable. Use of the concordance correlation coefficient as a measure of reproducibility has gained popularity in practice since its introduction by Lin (1989, Biometrics 45, 255-268). Lin's method is applicable for studies evaluating two raters/two methods without replications. Chinchilli et al. (1996, Biometrics 52, 341-353) extended Lin's approach to repeated measures designs by using a weighted concordance correlation coefficient. However, the existing methods cannot easily accommodate covariate adjustment, especially when one needs to model agreement. In this article, we propose a generalized estimating equations (GEE) approach to model the concordance correlation coefficient via three sets of estimating equations. The proposed approach is flexible in that (1) it can accommodate more than two correlated readings and test for the equality of dependent concordant correlation estimates; (2) it can incorporate covariates predictive of the marginal distribution; (3) it can be used to identify covariates predictive of concordance correlation; and (4) it requires minimal distribution assumptions. A simulation study is conducted to evaluate the asymptotic properties of the proposed approach. The method is illustrated with data from two biomedical studies.
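
For orientation, the basic two-rater coefficient that this abstract builds on can be sketched in a few lines of Python. This is only Lin's original moment estimator, not the GEE approach proposed in the article; the function name `concordance_correlation` and the sample readings are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired readings.

    Uses moment estimates with 1/n divisors:
    CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical readings: the second method reads exactly one unit higher.
method_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
method_2 = [2.0, 3.0, 4.0, 5.0, 6.0]
print(concordance_correlation(method_1, method_2))  # 0.8
```

The two series are perfectly correlated, so a Pearson correlation would be 1, but the constant offset pulls the concordance coefficient below 1. That penalty for systematic shift is what makes the coefficient a measure of agreement rather than association.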

7.
Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen’s kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants.

8.
Background

Snakebite is a neglected problem with a high mortality in India. There are no simple clinical prognostic tools which can predict mortality in viper envenomings. We aimed to develop and validate a mortality-risk prediction score for patients of viper envenoming from Southern India.

Methods

We used clinical predictors from a prospective cohort of 248 patients with a syndromic diagnosis of viper envenoming and a positive 20-minute whole blood clotting test (WBCT 20) from a tertiary-care hospital in Puducherry, India. We applied multivariable logistic regression with a backward elimination approach. External validation of this score was done among 140 patients from the same centre and its performance was assessed with the concordance statistic and calibration plots.

Findings

The final model, termed VENOMS from “Viper ENvenOming Mortality Score”, included 7 admission clinical parameters (recorded in the first 48 hours after bite): presence of overt bleeding manifestations, presence of capillary leak syndrome, haemoglobin <10 g/dL, bite to antivenom administration time > 6.5 h, systolic blood pressure < 100 mm Hg, urine output <20 mL/h in 24 h and female gender. The lowest possible VENOMS score of 0 predicted an in-hospital mortality risk of 0.06% while the highest score of 12 predicted a mortality of 99.1%. The model had a concordance statistic of 0.86 (95% CI 0.79–0.94) in the validation cohort. Calibration plots indicated good agreement of predicted and observed outcomes.

Conclusions

The VENOMS score is a good predictor of mortality in viper envenoming in southern India, where the Russell’s viper envenoming burden is high. The score may have potential applications in triaging patients and guiding management after further validation.
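
The concordance statistic reported above can be read as the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor. A minimal Python sketch of that pairwise definition follows, assuming a binary outcome and counting tied scores as half; the function name `c_statistic` and the six scores and outcomes are invented for illustration and are not the study's data.

```python
def c_statistic(scores, outcomes):
    """Concordance statistic for a binary outcome (1 = event, 0 = no event).

    Fraction of (event, non-event) pairs in which the event case has the
    higher score; tied scores count as half a concordant pair.
    """
    events = [s for s, o in zip(scores, outcomes) if o == 1]
    non_events = [s for s, o in zip(scores, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores on a 0-12 scale and in-hospital deaths.
scores = [0, 2, 5, 7, 7, 12]
died = [0, 0, 0, 1, 0, 1]
print(c_statistic(scores, died))  # 0.9375
```

A value of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect ranking. The sketch says nothing about calibration, which the abstract assesses separately with calibration plots.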

9.
Evaluating the pathogenicity of a variant is challenging given the plethora of types of genetic evidence that laboratories consider. Deciding how to weigh each type of evidence is difficult, and standards have been needed. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the assessment of variants in genes associated with Mendelian diseases. Nine molecular diagnostic laboratories involved in the Clinical Sequencing Exploratory Research (CSER) consortium piloted these guidelines on 99 variants spanning all categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign). Nine variants were distributed to all laboratories, and the remaining 90 were evaluated by three laboratories. The laboratories classified each variant by using both the laboratory’s own method and the ACMG-AMP criteria. The agreement between the two methods used within laboratories was high (K-alpha = 0.91) with 79% concordance. However, there was only 34% concordance for either classification system across laboratories. After consensus discussions and detailed review of the ACMG-AMP criteria, concordance increased to 71%. Causes of initial discordance in ACMG-AMP classifications were identified, and recommendations on clarification and increased specification of the ACMG-AMP criteria were made. In summary, although an initial pilot of the ACMG-AMP guidelines did not lead to increased concordance in variant interpretation, comparing variant interpretations to identify differences and having a common framework to facilitate resolution of those differences were beneficial for improving agreement, allowing iterative movement toward increased reporting consistency for variants in genes associated with monogenic disease.

10.
The use of finite element (FE) methods in spinal research is increasing, but there is only limited information available on the influence of different input parameters on the model predictions. The aim of this study was to investigate the role of these parameters in FE models of the vertebral body. Experimental tests were undertaken on porcine lumbar vertebral bodies and scans of the specimens were used to create specimen-specific FE models. Three models were created for each specimen with combinations of generic and specimen-specific parameters. Stiffness and strength predictions were also made directly from the specimen trabecular bone volume fraction (BVF) and cross-sectional area (CSA). The agreement between the experimental results and the FE models with generic morphology was poorer (concordance coefficients = 0.058, 0.125 for stiffness, strength) than those made from the BVF and CSA (concordance coefficients = 0.638, 0.609). The greatest levels of agreement were found with the morphologically specific models including element-specific material properties (concordance coefficients = 0.881, 0.752). This indicates that highly specific models, both in terms of morphology and bone quality, are necessary if the FE tool is to be used effectively for spinal research and clinical practice.

11.
Guo Y, Manatunga AK. Biometrics 2007, 63(1):164-172
Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. Lin's (1989, Biometrics 45, 255-268) concordance correlation coefficient (CCC) has become a popular measure of agreement for correlated continuous outcomes. However, commonly used estimation methods for the CCC do not accommodate censored observations and are, therefore, not applicable for survival outcomes. In this article, we estimate the CCC nonparametrically through the bivariate survival function. The proposed estimator of the CCC is proven to be strongly consistent and asymptotically normal, with a consistent bootstrap variance estimator. Furthermore, we propose a time-dependent agreement coefficient as an extension of Lin's (1989) CCC for measuring the agreement between survival times among subjects who survive beyond a specified time point. A nonparametric estimator is developed for the time-dependent agreement coefficient as well. It has the same asymptotic properties as the estimator of the CCC. Simulation studies are conducted to evaluate the performance of the proposed estimators. A real data example from a prostate cancer study is used to illustrate the method.

12.
In epidemiological studies, cases cannot always be interviewed due to them being too ill or already deceased. Under these circumstances, proxy interviews are often conducted; however, the veridicality of information about mobile phone use gained by proxy interviews has been doubted. The issue is undecided due to the lack of empirical data. We conducted a study of 119 heterosexual couples. Both partners answered two questionnaires about mobile phone use, one about their own use and one about their partner's use. Overall agreement assessed using Cohen's kappa, Passing and Bablok regression, and concordance coefficients between self and proxy data was poor to moderate (e.g., concordance coefficients of 0.55 for duration of use). The only item with good agreement was whether or not a prepaid phone was used (Cohen's kappa 0.78 and 0.63 for male and female estimates, respectively), and to a lesser degree, the onset of mobile phone use (concordance coefficients of 0.66 and 0.61). Poorest agreement was obtained for the side of the head the mobile phone was held during calls (kappa coefficients of 0.20 and 0.24 for female and male estimates, respectively). We conclude that the assessment of mobile phone use by proxy data cannot be relied on except for information about onset of mobile phone use, use of prepaid or contract phones, and, to a lesser degree, duration of daily use. Agreement concerning the important information about side of the head the mobile phone is held during calls was poorest and only slightly better than chance. Bioelectromagnetics 33:561–567, 2012. © 2012 Wiley Periodicals, Inc.

13.
Hiriote S, Chinchilli VM. Biometrics 2011, 67(3):1007-1016
In many clinical studies, Lin's concordance correlation coefficient (CCC) is a common tool to assess the agreement of a continuous response measured by two raters or methods. However, the need for measures of agreement may arise for more complex situations, such as when the responses are measured on more than one occasion by each rater or method. In this work, we propose a new CCC in the presence of repeated measurements, called the matrix-based concordance correlation coefficient (MCCC) based on a matrix norm that possesses the properties needed to characterize the level of agreement between two p × 1 vectors of random variables. It can be shown that the MCCC reduces to Lin's CCC when p = 1. For inference, we propose an estimator for the MCCC based on U-statistics. Furthermore, we derive the asymptotic distribution of the estimator of the MCCC, which is proven to be normal. The simulation studies confirm that overall in terms of accuracy, precision, and coverage probability, the estimator of the MCCC works very well in general cases especially when n is greater than 40. Finally, we use real data from an Asthma Clinical Research Network (ACRN) study and the Penn State Young Women's Health Study for demonstration.

14.
The etiology of chronic Inflammatory Bowel Diseases (IBD) remains unknown, with both genetic and environmental risk factors having been implicated. A recent collaborative study of IBD provides clinical data from families with three or more affected first-degree relatives. The scientific question is whether specific clinical characteristics aggregate among affected individuals within families. Gastroenterological researchers have examined the number of concordant familial pairs in familial aggregation studies, but methods and results have been discrepant. This article investigates concepts of concordance and gives a comprehensive statistical treatment for testing concordance of various clinical traits in familial studies. For dichotomous traits, the distribution of this statistic under the null hypothesis of no familial aggregation is obtained by three methods: asymptotic, probability generating function, and permutation. The permutation method is extended to analyze aggregation for non-dichotomous traits and co-aggregations between two traits. We apply the permutation method to analyze the aforementioned multiply-affected IBD family data. Evidence is found for familial clustering of various traits, some of which are not revealed in existing studies. Such analyses provide a basis for investigating the dependence of trait aggregation upon genetic or environmental risk factors.

15.
Carrasco JL, Jover L. Biometrics 2003, 59(4):849-858
The intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC) are two of the most popular measures of agreement for variables measured on a continuous scale. Here, we demonstrate that ICC and CCC are the same measure of agreement estimated in two ways: by the variance components procedure and by the moment method. We propose estimating the CCC using variance components of a mixed effects model, instead of the common method of moments. With the variance components approach, the CCC can easily be extended to more than two observers, and adjusted using confounding covariates, by incorporating them in the mixed model. A simulation study is carried out to compare the variance components approach with the moment method. The importance of adjusting by confounding covariates is illustrated with a case example.

16.
In clinical research and in more general classification problems, a frequent concern is the reliability of a rating system. In the absence of a gold standard, agreement may be considered as an indication of reliability. When dealing with categorical data, the well-known kappa statistic is often used to measure agreement. The aim of this paper is to obtain a theoretical result about the asymptotic distribution of the kappa statistic with multiple items, multiple raters, multiple conditions, and multiple rating categories (more than two), based on recent work. The result settles a long-standing quest for the asymptotic variance of the kappa statistic in this situation and allows for the construction of asymptotic confidence intervals. A recent application to clinical endoscopy and to the diagnosis of inflammatory bowel diseases (IBDs) is briefly presented to complement the theoretical perspective.

17.
Background

Thermodilution technique using a pulmonary artery catheter is widely used for the assessment of cardiac output (CO) in patients undergoing liver transplantation. However, the unclearness of the risk-benefit ratio of this method has led to an interest in less invasive modalities. Thus, we evaluated whether noninvasive bioreactance CO monitoring is interchangeable with thermodilution technique.

Methods

Nineteen recipients undergoing adult-to-adult living donor liver transplantation were enrolled in this prospective observational study. COs were recorded automatically by the two devices and compared simultaneously at 3-minute intervals. The Bland–Altman plot was used to evaluate the agreement between bioreactance and thermodilution. Clinically acceptable agreement was defined as a percentage error of limits of agreement <30%. The four quadrant plot was used to evaluate concordance between bioreactance and thermodilution. Clinically acceptable concordance was defined as a concordance rate >92%.

Results

A total of 2640 datasets were collected. The mean CO difference between the two techniques was 0.9 l/min, and the 95% limits of agreement were -3.5 l/min and 5.4 l/min with a percentage error of 53.9%. The percentage errors in the dissection, anhepatic, and reperfusion phase were 50.6%, 56.1%, and 53.5%, respectively. The concordance rate between the two techniques was 54.8%.

Conclusion

Bioreactance and thermodilution failed to show acceptable interchangeability in terms of both estimating CO and tracking CO changes in patients undergoing liver transplantation. Thus, the use of bioreactance as an alternative CO monitoring to thermodilution, in spite of its noninvasiveness, would be hard to recommend in these surgical patients.
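
The agreement criterion in this abstract can be illustrated with a short Python sketch of a Bland–Altman summary. The abstract does not give its exact formulas, so the sketch assumes one common convention: limits of agreement are the mean difference ± 1.96 standard deviations of the differences, and percentage error is 1.96 standard deviations divided by the overall mean of the paired readings. The function name `bland_altman` and the four paired readings are hypothetical, and the sketch ignores the repeated measurements per patient that a study like this one would need to account for.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Bias, 95% limits of agreement, and percentage error for paired readings."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    overall_mean = mean((t + r) / 2 for t, r in zip(test, reference))
    return {
        "bias": bias,
        "lower": bias - half_width,
        "upper": bias + half_width,
        "percentage_error": 100 * half_width / overall_mean,
    }


# Hypothetical paired cardiac output readings in l/min.
bioreactance = [5.0, 6.0, 7.0, 8.0]
thermodilution = [4.0, 6.0, 6.0, 8.0]
result = bland_altman(bioreactance, thermodilution)
print(result["bias"])                   # 0.5
print(result["percentage_error"] < 30)  # True under the <30% criterion
```

Under the same convention, the limits reported in the abstract (-3.5 and 5.4 l/min around a bias of 0.9 l/min) span roughly ±4.45 l/min, which is why the percentage error is far above the 30% threshold.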

18.
HER2 fluorescence in situ hybridization (FISH) testing for breast cancer is largely limited to academic centers and commercial laboratories. As testing demands increase, methods for rapid and cost-effective technical validation and quality assessment will be required. Tissue microarray (TMA), a technique for high-throughput biomarker evaluation, could help facilitate these needs. Our objective was to assess the usefulness of TMA technology for validation of HER2 FISH testing. Two TMA blocks containing paired cores from 41 breast cancers were constructed. HER2 FISH was performed in parallel at two institutions and the results compared. One institution, with considerable HER2 FISH experience, served as the reference laboratory. HER2 chromogenic in situ hybridization (CISH) and immunohistochemistry (IHC) were compared to the FISH results. For positive and negative results, the concordance rate between laboratories was 100%. Using kappa statistical analysis to determine interobserver agreement, HER2 to chromosome 17 gene copy ratios showed strong agreement between laboratories with kappa = 0.85 (perfect agreement = 1.0). Four cases displaying low-level amplification by CISH contained chromosome 17 polysomy and gene copy ratios of <2.0 by FISH. Good concordance was observed between HER2 IHC and in situ hybridization testing. TMA is a robust and effective method for the technical validation of HER2 FISH testing and should be considered for use by quality assessment programs.

19.

Objective

To compare a novel computerized analysis program with visual cardiotocography (CTG) interpretation results.

Methods

Sixty-two intrapartum CTG tracings with 20- to 30-minute sections were independently interpreted using a novel computerized analysis program, as well as the visual interpretations of eight obstetricians, to evaluate the baseline fetal heart rate (FHR), baseline FHR variability, number of accelerations, number/type of decelerations, uterine contraction (UC) frequency, and the National Institute of Child Health and Human Development (NICHD) 3-Tier FHR classification system.

Results

There was no significant difference in interobserver variation after adding the components of computerized analysis to results from the obstetricians' visual interpretations, with excellent agreement for the baseline FHR (ICC 0.91), the number of accelerations (ICC 0.85), UC frequency (ICC 0.97), and NICHD category I (kappa statistic 0.91); good agreement for baseline variability (kappa statistic 0.68), the numbers of early decelerations (ICC 0.78) and late decelerations (ICC 0.67), category II (kappa statistic 0.78), and overall categories (kappa statistic 0.80); and moderate agreement for the number of variable decelerations (ICC 0.60), and category III (kappa statistic 0.50).

Conclusions

This computerized analysis program is not inferior to visual interpretation, may improve interobserver variations, and could play a vital role in prenatal telemedicine.

20.
Understanding general selectivity trends across the kinome has implications ranging from target selection, compound prioritization, toxicity and patient tailoring. Several recent publications have described the characterization of kinase inhibitors via large assay panels, offering a range of generalizations that influenced kinase inhibitor research trends. Since a subset of profiled inhibitors overlap across reports, we evaluated the concordance of activity results for the same compound–kinase pairs across four data sources generated from different kinase biochemical assay technologies. Overall, 77% of all results are within 3 fold or qualitatively in agreement across sources. However, the agreement for active compounds is only 37%, indicating that different profiling panels are in better agreement to determine a compound's lack of activity rather than degree of activity. Low concordance is also found when comparing the promiscuity of kinase targets evaluated from different sources, and the pharmacological similarity of kinases. In contrast, the overall promiscuity of kinase inhibitors was consistent across sources. We highlight the difficulty of drawing general conclusions from such data by showing that no significant selectivity difference distinguishes type I vs. type II inhibitors, and limited kinase space similarity that is consistent across different sources. This article is part of a Special Issue entitled: Inhibitors of Protein Kinases (2012).
