
1.

Efficiency considerations in the analysis of inter-observer agreement

Shoukri MM Donner A 《Biostatistics (Oxford, England)》2001,2(3):323-336

The reliability of binary assessments is often measured by the proportion of agreement above chance, as estimated by the kappa statistic. In this paper, we develop a model to estimate inter-rater and intra-rater reliability when each of the two observers has the opportunity to obtain a pair of replicate measurements on each subject. The model is analogous to the nested beta-binomial model proposed by Rosner (1989, 1992). We show that the gain in precision obtained from increasing the number of measurements per rater from one to two may allow fewer subjects to be included in the study with no net loss in efficiency for estimating the inter-rater reliability.
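As a rough illustration of "the proportion of agreement above chance", the sketch below computes plain Cohen's kappa for two raters and binary ratings. It is not the nested beta-binomial model developed in the paper; the function name `cohens_kappa` and the example ratings are assumptions made purely for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary assessments of ten subjects (not data from the paper).
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 8 of 10 subjects and chance agreement is 0.5, so kappa works out to about 0.6.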

2.

Intra-Rater, Inter-Rater and Test-Retest Reliability of an Instrumented Timed Up and Go (iTUG) Test in Patients with Parkinson’s Disease

Rob C. van Lummel Stefan Walgaard Markus A. Hobert Walter Maetzler Jaap H. van Dieën Francisca Galindo-Garre Caroline B. Terwee 《PloS one》2016,11(3)

Background

The “Timed Up and Go” (TUG) is a widely used measure of physical functioning in older people and in neurological populations, including Parkinson’s Disease. When using an inertial sensor measurement system (instrumented TUG [iTUG]), the individual components of the iTUG and the trunk kinematics can be measured separately, which may provide relevant additional information.

Objective

The aim of this study was to determine intra-rater, inter-rater and test-retest reliability of the iTUG in patients with Parkinson’s Disease.

Methods

Twenty-eight PD patients, aged 50 years or older, were included. For the iTUG, the DynaPort Hybrid (McRoberts, The Hague, The Netherlands) was worn at the lower back. The device measured acceleration and angular velocity in three directions at a rate of 100 samples/s. Patients performed the iTUG five times on two consecutive days. Repeated measurements by the same rater on the same day were used to calculate intra-rater reliability. Repeated measurements by different raters on the same day were used to calculate intra-rater and inter-rater reliability. Repeated measurements by the same rater on different days were used to calculate test-retest reliability.

Results

Nineteen ICC values (15%) were ≥ 0.90, which is considered excellent reliability. Sixty-four ICC values (49%) were ≥ 0.70 and < 0.90, which is considered good reliability. Thirty-one ICC values (24%) were ≥ 0.50 and < 0.70, indicating moderate reliability. Sixteen ICC values (12%) were ≥ 0.30 and < 0.50, indicating poor reliability. Two ICC values (2%) were < 0.30, indicating very poor reliability.
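The reliability bands used in these Results can be written as a small lookup. This is only a sketch of the cut-offs quoted in the abstract; the function name `classify_icc` is an assumption, and other studies use different thresholds.

```python
def classify_icc(icc):
    """Map an ICC value to the reliability band quoted in the abstract."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.70:
        return "good"
    if icc >= 0.50:
        return "moderate"
    if icc >= 0.30:
        return "poor"
    return "very poor"


# Hypothetical ICC values, used only to exercise each band.
labels = [classify_icc(v) for v in (0.93, 0.75, 0.55, 0.31, 0.10)]
```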

Conclusions

In conclusion, in patients with Parkinson’s disease the intra-rater, inter-rater, and test-retest reliability of the individual components of the instrumented TUG (iTUG) was excellent to good for total duration and for turning durations, and good to low for the sub durations and for the kinematics of the SiSt and StSi. The results of this fully automated analysis of instrumented TUG movements demonstrate that several reliable TUG parameters can be identified that provide a basis for a more precise, quantitative use of the TUG test in clinical practice.

3.

Adverse Drug Events in Older Hospitalized Patients: Results and Reliability of a Comprehensive and Structured Identification Strategy

Joanna E. Klopotowska Peter C. Wierenga Clementine C. M. Stuijt Lambertus Arisz Marcel G. W. Dijkgraaf Paul F. M. Kuks Henk Asscheman Sophia E. de Rooij Loraine Lie-A-Huen Susanne M. Smorenburg 《PloS one》2013,8(8)

Background

Older patients are at high risk for experiencing Adverse Drug Events (ADEs) during hospitalization. To be able to reduce ADEs in these vulnerable patients, hospitals first need to measure the occurrence of ADEs, especially those that are preventable. However, data on preventable ADEs (pADEs) occurring during hospitalization in older patients are scarce, and no ‘gold standard’ for the identification of ADEs exists.

Methodology

The study was conducted in three hospitals in the Netherlands in 2007. ADEs were retrospectively identified by a team of experts using a comprehensive and structured patient chart review (PCR) combined with a trigger-tool as an aid. This ADE identification strategy was applied to a cohort of 250 older hospitalized patients. To estimate the intra- and inter-rater reliabilities, Cohen’s kappa values were calculated.

Principal Findings

In total, 118 ADEs were detected which occurred in 62 patients. This ADE yield was 1.1 to 2.7 times higher in comparison to other ADE studies in older hospitalized patients. Of the 118 ADEs, 83 (70.3%) were pADEs; 51 pADEs (43.2% of all ADEs identified) caused serious patient harm. Patient harm caused by ADEs resulted in various events. The overall intra-rater agreement of the developed strategy was substantial (κ = 0.74); the overall inter-rater agreement was only fair (κ = 0.24).

Conclusions/Significance

The ADE identification strategy provided a detailed insight into the scope of ADEs occurring in older hospitalized patients, and showed that the majority of (serious) ADEs can be prevented. Several strategy related aspects, as well as setting/study specific aspects, may have contributed to the results gained. These aspects should be considered whenever ADE measurements need to be conducted. The results regarding pADEs can be used to design tailored interventions to effectively reduce harm caused by medication errors. Improvement of the inter-rater reliability of a PCR remains challenging.

4.

Point-of-Care Urine Tests for Smoking Status and Isoniazid Treatment Monitoring in Adult Patients

Ioana Nicolau Lulu Tian Dick Menzies Gaston Ostiguy Madhukar Pai 《PloS one》2012,7(9)

Background

Poor adherence to isoniazid (INH) preventive therapy (IPT) is an impediment to effective control of latent tuberculosis (TB) infection. TB patients who smoke are at higher risk of latent TB infection, active disease, and TB mortality, and may have lower adherence to their TB medications. The objective of our study was to validate IsoScreen and SmokeScreen (GFC Diagnostics, UK), two point-of-care tests for monitoring INH intake and determining smoking status. The tests could be used together in the same individual to help identify patients with a high-risk profile and provide a tailored treatment plan that includes medication management, adherence interventions, and smoking cessation programs.

Methodology/Principal Findings

200 adult outpatients attending the TB and/or the smoking cessation clinic were recruited at the Montreal Chest Institute. Sensitivity and specificity were measured for each test against the corresponding composite reference standard. Test reliability was measured using kappa statistic for intra-rater and inter-rater agreement. Univariate and multivariate logistic regression models were used to explore possible covariates that might be related to false-positive and false-negative test results. IsoScreen had a sensitivity of 93.2% (95% confidence interval [CI] 80.3, 98.2) and specificity of 98.7% (94.8, 99.8). IsoScreen had intra-rater agreement (kappa) of 0.75 (0.48, 0.94) and inter-rater agreement of 0.61 (0.27, 0.90). SmokeScreen had a sensitivity of 69.2% (56.4, 79.8), specificity of 81.6% (73.0, 88.0), intra-rater agreement of 0.77 (0.56, 0.94), and inter-rater agreement of 0.66 (0.42, 0.88). False-positive SmokeScreen tests were strongly associated with INH treatment.
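Sensitivity and specificity against a reference standard are simple ratios of confusion-matrix counts. The sketch below uses hypothetical counts (the abstract does not report the underlying table) and a Wilson score interval as one common choice of 95% confidence interval; the abstract does not say which interval method the authors used, and the function names are assumptions.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts, not taken from the study.
tp, fn, tn, fp = 45, 5, 90, 10
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
```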

Conclusions

IsoScreen had high validity and reliability, whereas SmokeScreen had modest validity and reliability. SmokeScreen tests did not perform well in a population receiving INH due to the association between INH treatment and false-positive SmokeScreen test results. Development of the next generation SmokeScreen assay should account for this potential interference.

5.

Norovirus infections in preterm infants: wide variety of clinical courses

Sven Armbrust Axel Kramer Dirk Olbertz Kathrin Zimmermann Christoph Fusch 《BMC research notes》2009,2(1):1-6

Background

The Clubfoot Assessment Protocol (CAP) was developed for follow-up of children treated for clubfoot. The objective of this study was to analyze reliability and validity of the six items used in the domain CAPMotion Quality using inexperienced assessors.

Findings

Four raters (two paediatric orthopaedic surgeons, two senior physiotherapists) used the CAP scores to analyze, on two different occasions, 11 videotapes containing standardized recordings of motion activity according to the domain CAPMotion Quality. These results were compared to a criterion (two raters, well experienced CAP assessors) for validity and for checking for learning effect. Weighted kappa statistics, exact percentage observer agreement (Po), percentage observer agreement including one level difference (Po-1) and amount of scoring scales defined how reliability was to be interpreted. Inter- and intra-rater differences were calculated using median and interquartile ranges (IQR) on item level and mean and limits of agreement on domain level. Inter-rater reliability varied between fair and moderate (kappa) and had a mean agreement of 48/88% (Po/Po-1). Intra-rater reliability varied between moderate and good with a mean agreement of 63/96%. The intra- and inter-rater differences in the present study were generally small both on item (0.00) and domain level (-1.10). There was exact agreement of 51% and Po-1 of 91% of the six items with the criterion. No learning effect was found.
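The two percentage-agreement measures named above, exact agreement (Po) and agreement allowing one level of difference (Po-1), can be sketched as follows. The function name `agreement_rates` and the ordinal scores are hypothetical and are not taken from the study.

```python
def agreement_rates(scores_a, scores_b):
    """Return (Po, Po-1): exact agreement and agreement within one level."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and of equal length")
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, within_one


# Hypothetical ordinal item scores from two raters.
rater_a = [0, 1, 2, 3, 4, 2, 1, 3]
rater_b = [0, 2, 2, 1, 4, 3, 1, 3]
po, po_1 = agreement_rates(rater_a, rater_b)
```

With these made-up scores, five of eight pairs match exactly (Po = 62.5%) and seven of eight differ by at most one level (Po-1 = 87.5%).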

Conclusion

The CAPMotion quality can be used by inexperienced assessors with sufficient reliability in daily clinical practice and showed acceptable accuracy compared to the criterion.

6.

Inter-Coder Agreement in One-to-Many Classification: Fuzzy Kappa

Andrei P. Kirilenko Svetlana Stepchenkova 《PloS one》2016,11(3)

Content analysis involves classification of textual, visual, or audio data. The inter-coder agreement is estimated by having two or more coders classify the same data units, with subsequent comparison of their results. The existing methods of agreement estimation, e.g., Cohen’s kappa, require that coders place each unit of content into one and only one category (one-to-one coding) from the pre-established set of categories. However, in certain data domains (e.g., maps, photographs, databases of texts and images), this requirement seems overly restrictive. The restriction could be lifted, provided that there is a measure to calculate the inter-coder agreement in the one-to-many protocol. Building on the existing approaches to one-to-many coding in geography and biomedicine, such a measure, fuzzy kappa, which is an extension of Cohen’s kappa, is proposed. It is argued that the measure is especially compatible with data from certain domains, when holistic reasoning of human coders is utilized in order to describe the data and access the meaning of communication.

7.

The ZZP Questionnaire. Reliability of a new resource utilization measure

Frijters DH Achterberg WP 《Tijdschrift voor gerontologie en geriatrie》2007,38(4):165-172

Data to determine the resource utilization of care recipients need to be reliable and the items that are measured need to be useful. In 2006, the Dutch Ministry of Health and Welfare mandated all nursing homes and homes for the elderly to measure the Resource Utilization of all residents with the ZZP Questionnaire. Are the data resulting from this measurement reliable and is each of the 54 items of the ZZP Questionnaire useful? To answer this we tested the reliability of the data in a nursing home and a home for the elderly in two wards each. For 122 residents questionnaires were completed such that the inter- and intra-rater reliability of the answers could be assessed. Ten of the 54 items in the questionnaire showed insufficient inter-rater reliability (<0.40) on the weighted Cohen kappa and another sixteen moderate (0.40 - 0.60). On the intra-rater reliability test seven items had an insufficient kappa and another fifteen moderate. In addition, ten clusters of items could be formed with in-cluster Spearman correlation rates of .75 or higher. From the results of the reliability tests and the item intercorrelation rates we concluded that a substantial number of items need to be improved and that in the ZZP Questionnaire 15 of the 54 items appear to be redundant on statistical grounds.
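For ordinal items such as these, weighted kappa gives partial credit to near-misses. The sketch below implements the standard disagreement-weight form with linear or quadratic weights. The abstract does not say which weighting scheme was used for the ZZP items, so the default here is an assumption, as are the function name and the example ratings.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordered category list."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)
    # Joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    num = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when expected disagreement is 0")
    return 1.0 - num / den


# Hypothetical scores on a three-level ordinal item.
kw = weighted_kappa([1, 2, 3, 3], [1, 2, 3, 2], categories=[1, 2, 3])
```

On the cut-offs quoted in the abstract, a value below 0.40 would count as insufficient and 0.40 to 0.60 as moderate.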

8.

Inference for kappas for longitudinal study data: applications to sexual health research

Ma Y Tang W Feng C Tu XM 《Biometrics》2008,64(3):781-789

Summary. Analysis of instrument reliability and rater agreement is used in a wide range of behavioral, medical, psychosocial, and health-care-related research to assess psychometric properties of instruments, consensus in disease diagnoses, fidelity of psychosocial intervention, and accuracy of proxy outcomes. For categorical outcomes, Cohen's kappa is the most widely used index of agreement and reliability. In many modern-day applications, data are often clustered, making inference difficult to perform using existing methods. In addition, as longitudinal study designs become increasingly popular, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this article, we develop a novel approach based on a new class of kappa estimates to tackle the complexities involved in addressing missing data and other related issues arising from a general multirater and longitudinal data setting. The approach is illustrated with real data in sexual health research.

9.

Reliability of Multifrequency Bioelectrical Impedance Analysis to Quantify Body Composition in Patients After Musculoskeletal Trauma

Brandon Koch Aspen Miller Natalie A. Glass Erin Owen Tessa Kirkpatrick Ruth Grossman Steven M. Leary John Davison Michael C. Willey 《The Iowa orthopaedic journal》2022,42(1):75

Background

Changes in body composition, especially loss of lean mass, commonly occur in the orthopedic trauma population due to physical inactivity and inadequate nutrition. The purpose of this study was to assess inter-rater and intra-rater reliability of a portable bioelectrical impedance analysis (BIA) device to measure body composition in an orthopedic trauma population after operative fracture fixation. BIA uses a weak electric current to measure impedance (resistance) in the body and uses this to calculate the components of body composition using extensively studied formulas.

Methods

Twenty subjects were enrolled up to 72 hours after operative fixation of musculoskeletal injuries and underwent body composition measurements by two independent raters. One measurement was obtained by each rater at the time of enrollment and again between 1-4 hours after the initial measurement. Reliability was assessed using intraclass correlation coefficients (ICC) and minimum detectable change (MDC) values were calculated from these results.

Results

Inter-rater reliability was excellent with ICC values for body fat mass (BFM), lean body mass (LBM), skeletal muscle mass (SMM), dry lean mass (DLM), and percent body fat (PBF) of 0.993, 0.984, 0.984, 0.979, and 0.986 respectively. Intra-rater reliability was also high for BFM, LBM, SMM, DLM, and PBF, at 0.994, 0.989, 0.990, 0.983, 0.987 (rater 1) and 0.994, 0.988, 0.989, 0.985, 0.989 (rater 2). MDC values were calculated to be 4.05 kg for BFM, 4.10 kg for LBM, 2.45 kg for SMM, 1.21 kg for DLM, and 4.83% for PBF.

Conclusion

Portable BIA devices are a versatile and attractive option that can reliably be used to assess body composition and changes in lean body mass in the orthopedic trauma population for both research and clinical endeavors. Level of Evidence: III
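The abstract says MDC values were calculated from the ICC results but does not give the formula. A commonly used convention, assumed here rather than confirmed by the source, derives the standard error of measurement as SEM = SD × √(1 − ICC) and the 95% minimum detectable change as MDC95 = 1.96 × √2 × SEM. The function names and the standard deviation below are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1 for this formula")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change at the 95% level for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical SD of 10 kg combined with an ICC of 0.984 (the SD is made up).
sem = sem_from_icc(10.0, 0.984)
mdc = mdc95(sem)
```

Under this convention the MDC shrinks as the ICC approaches 1, which is why very high ICC values can still correspond to MDC values of several kilograms when between-subject variability is large.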

10.

Betrouwbaarheid van de Zorgzwaartepakket (ZZP) scorelijst

D. H. M. Frijters W. P. Achterberg 《Tijdschrift voor gerontologie en geriatrie》2007,38(4):145-151

The ZZP Questionnaire. Reliability of a new resource utilization measure. Data to determine the resource utilization of care recipients need to be reliable and the items that are measured need to be useful. In 2006, the Dutch Ministry of Health and Welfare mandated all nursing homes and homes for the elderly to measure the Resource Utilization of all residents with the ZZP Questionnaire. Are the data resulting from this measurement reliable and is each of the 54 items of the ZZP Questionnaire useful? To answer this we tested the reliability of the data in a nursing home and a home for the elderly in two wards each. For 122 residents questionnaires were completed such that the inter- and intra-rater reliability of the answers could be assessed. Ten of the 54 items in the questionnaire showed insufficient inter-rater reliability (<0.40) on the weighted Cohen kappa and another sixteen moderate (0.40 – 0.60). On the intra-rater reliability test seven items had an insufficient kappa and another fifteen moderate. In addition, ten clusters of items could be formed with in-cluster Spearman correlation rates of .75 or higher. From the results of the reliability tests and the item intercorrelation rates we concluded that a substantial number of items need to be improved and that in the ZZP Questionnaire 15 of the 54 items appear to be redundant on statistical grounds. Tijdschr Gerontol Geriatr 2007; 38: 166-173

11.

Factors Affecting Accuracy of Data Abstracted from Medical Records

Meredith N. Zozus Carl Pieper Constance M. Johnson Todd R. Johnson Amy Franklin Jack Smith Jiajie Zhang 《PloS one》2015,10(10)

Objective

Medical record abstraction (MRA) is often cited as a significant source of error in research data, yet MRA methodology has rarely been the subject of investigation. Lack of a common framework has hindered application of the extant literature in practice, and, until now, there were no evidence-based guidelines for ensuring data quality in MRA. We aimed to identify the factors affecting the accuracy of data abstracted from medical records and to generate a framework for data quality assurance and control in MRA.

Methods

Candidate factors were identified from published reports of MRA. Content validity of the top candidate factors was assessed via a four-round two-group Delphi process with expert abstractors with experience in clinical research, registries, and quality improvement. The resulting coded factors were categorized into a control theory-based framework of MRA. Coverage of the framework was evaluated using the recent published literature.

Results

Analysis of the identified articles yielded 292 unique factors that affect the accuracy of abstracted data. Delphi processes overall refuted three of the top factors identified from the literature based on importance and five based on reliability (six total factors refuted). Four new factors were identified by the Delphi. The generated framework demonstrated comprehensive coverage. Significant underreporting of MRA methodology in recent studies was discovered.

Conclusion

The framework generated from this research provides a guide for planning data quality assurance and control for studies using MRA. The large number and variability of factors indicate that while prospective quality assurance likely increases the accuracy of abstracted data, monitoring the accuracy during the abstraction process is also required. Recent studies reporting research results based on MRA rarely reported data quality assurance or control measures, and even less frequently reported data quality metrics with research results. Given the demonstrated variability, these methods and measures should be reported with research results.

12.

Inter-rater reliability of physiotherapists using the Action Research Arm Test in chronic stroke

Nicky Spence Nancy C.L. Rodrigues Polykarpos Angelos Nomikos Khalid Mohammed Yaseen Mansour Abdullah Alshehri 《Journal of musculoskeletal & neuronal interactions》2020,20(4):480

Objectives

The purpose of this study is to establish whether physiotherapists’ ratings are consistent when using the Action Research Arm Test (ARAT) to score a chronic stroke patient.

Methods

This was part of a large project establishing the reliability in chronic stroke. This study used a correlational design comparing the association between physiotherapist scores of the same patient, to establish the ARAT’s inter-rater reliability. The COSMIN checklist was followed to enhance the methodology of the study.

Results

Twenty physiotherapists (8 female and 12 male) aged between 25 and 53 years were selected. There were no participant dropouts or withdrawals. The sample size was normally distributed. The physiotherapists appeared representative of the UK physiotherapy population, with the exception of gender. The distribution of scores showed a normal distribution with standard deviation of score of 1.9. Kendall’s W showed agreement of 0.711 between the raters. The scores achieved statistical significance showing consistency between physiotherapists’ scores with chronic stroke. Limitations of the study were the use of a small single center convenience sample that may reduce the generalizability of the findings.

Conclusions

The ARAT is consistent when scored by physiotherapists in a chronic stroke population. The inter-rater reliability range was (0.70 to 0.90) which is categorized as good.
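Kendall's W (coefficient of concordance) summarizes how closely several raters rank the same set of items, from 0 (no agreement) to 1 (identical rankings). The sketch below is the textbook formula without tie correction, W = 12S / (m²(n³ − n)), for m raters and n items. It assumes each rater supplies a complete ranking with no ties, which may not match how the study handled its ARAT scores; the function name and example rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2 or any(len(r) != n for r in rankings):
        raise ValueError("need at least 2 raters and 2 items, equal lengths")
    # Sum of ranks each item received across raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2.0
    # S: squared deviations of the rank sums from their mean.
    s = sum((x - mean_rank_sum) ** 2 for x in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four items by three raters.
w = kendalls_w([[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]])
```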

13.

Reliability and intermethod agreement for body fat assessment among two field and two laboratory methods in adolescents

Vicente-Rodríguez G Rey-López JP Mesana MI Poortvliet E Ortega FB Polito A Nagy E Widhalm K Sjöström M Moreno LA;HELENA Study Group 《Obesity (Silver Spring, Md.)》2012,20(1):221-228

Increasing knowledge about reliability and intermethod agreement for body fat (BF) is of interest for assessment, interpretation, and comparison purposes. The aim was to examine intra- and inter-rater reliability, interday variability, and degree of agreement for BF using air-displacement plethysmography (Bod-Pod), dual-energy X-ray absorptiometry (DXA), bioelectrical impedance analysis (BIA), and skinfold measurements in European adolescents. Fifty-four adolescents (25 females) from Zaragoza and 30 (14 females) from Stockholm, aged 13-17 years, participated in this study. Two trained raters in each center assessed BF with Bod-Pod, DXA, BIA, and anthropometry (DXA only in Zaragoza). Intermethod agreement and reliability were studied using a 4-way ANOVA for the same rater on the first day and two additional measurements on a second day, one each rater. Technical error of measurement (TEM) and percentage coefficient of reliability (%R) were also reported. No significant intrarater, inter-rater, or interday effect was observed for %BF for any method in either of the cities. In Zaragoza, %BF was significantly different when measured by Bod-Pod and BIA in comparison with anthropometry and DXA (all P < 0.001). The same result was observed in Stockholm (P < 0.001), except that DXA was not measured. Bod-Pod, DXA, BIA, and anthropometry are reliable for %BF repeated assessment within the same day by the same or different raters or in consecutive days by the same rater. Bod-Pod showed close agreement with BIA as did DXA with anthropometry; however, Bod-Pod and BIA presented higher values of %BF than anthropometry and DXA.

14.

Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading

Fanshawe TR Lynch AG Ellis IO Green AR Hanka R 《PloS one》2008,3(8):e2925

Background

We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples.

Methodology/Principal Findings

We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples.

Conclusions/Significance

Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in clinical studies.

15.

Feasibility and Inter-Rater Reliability of Physical Performance Measures in Acutely Admitted Older Medical Patients

Ann Christine Bodilsen Helle Gybel Juul-Larsen Janne Petersen Nina Beyer Ove Andersen Thomas Bandholm 《PloS one》2015,10(2)

Objective

Physical performance measures can be used to predict functional decline and increased dependency in older persons. However, few studies have assessed the feasibility or reliability of such measures in hospitalized older patients. Here we assessed the feasibility and inter-rater reliability of four simple measures of physical performance in acutely admitted older medical patients.

Design

During the first 24 hours of hospitalization, the following were assessed twice by different raters in 52 (≥ 65 years) patients admitted for acute medical illness: isometric hand grip strength, 4-meter gait speed, 30-s chair stand and Cumulated Ambulation Score. Relative reliability was expressed as weighted kappa for the Cumulated Ambulation Score or as intra-class correlation coefficient (ICC_1,1) and lower limit of the 95%-confidence interval (LL_95%) for grip strength, gait speed, and 30-s chair stand. Absolute reliability was expressed as the standard error of measurement and the smallest real difference as a percentage of their respective means (SEM% and SRD%).

Results

The primary reasons for admission of the 52 included patients were infectious disease and cardiovascular illness. The mean ± SD age was 78 ± 8.3 years, and 73.1% were women. All patients performed grip strength and Cumulated Ambulation Score testing, 81% performed the gait speed test, and 54% completed the 30-s chair stand test (46% were unable to rise without using the armrests). No systematic bias was found between first and second tests or between raters. The weighted kappa for the Cumulated Ambulation Score was 0.76 (0.60–0.92). The ICC_1,1 values were as follows: grip strength, 0.95 (LL_95% 0.92); gait speed, 0.92 (LL_95% 0.73), and 30-s chair stand, 0.82 (LL_95% 0.67). The SEM% values for grip strength, gait speed, and 30-s chair stand were 8%, 7%, and 18%, and the SRD_95% values were 22%, 17%, and 49%.

Conclusion

In acutely admitted older medical patients, grip strength, gait speed, and the Cumulated Ambulation Score measurements were feasible and showed high inter-rater reliability when administered by different raters. The feasibility and inter-rater reliability of the 30-s chair stand were moderate, complicating the use of the 30-s chair stand in acutely admitted older medical patients. However, the predefined modified version of the chair stand test was both feasible and with high inter-rater reliability in this population.
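The ICC_1,1 named in this abstract is conventionally the one-way random-effects, single-measurement intraclass correlation, computed from between-subject and within-subject mean squares as (MSB − MSW) / (MSB + (k − 1)·MSW) for k ratings per subject. The sketch below follows that textbook definition; the function name and the example scores are hypothetical, and the confidence limits reported in the abstract are not reproduced here.

```python
def icc_1_1(scores):
    """One-way random-effects ICC(1,1); scores[i] holds k ratings of subject i."""
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 ratings per subject")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical grip-strength readings (kg): two raters, five patients.
icc = icc_1_1([[21.0, 22.5], [30.0, 29.0], [18.5, 19.0], [25.0, 26.5], [33.0, 31.5]])
```

The function raises a division error if every score is identical, since both mean squares are then zero and the coefficient is undefined.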

16.

Evaluating Written Patient Information for Eczema in German: Comparing the Reliability of Two Instruments, DISCERN and EQIP

Megan E. McCool Josepha Wahl Inga Schlecht Christian Apfelbacher 《PloS one》2015,10(10)

Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, though primarily in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters’ scores for each instrument was measured with Pearson’s correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters’ scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema.

17.

Verbal Autopsy: Evaluation of Methods to Certify Causes of Death in Uganda

Arthur Mpimbaza Scott Filler Agaba Katureebe Linda Quick Daniel Chandramohan Sarah G. Staedke 《PloS one》2015,10(6)

To assess different methods for determining cause of death from verbal autopsy (VA) questionnaire data, the intra-rater reliability of Physician-Certified Verbal Autopsy (PCVA) and the accuracy of PCVA, expert-derived (non-hierarchical) and data-driven (hierarchical) algorithms were assessed for determining common causes of death in Ugandan children. A verbal autopsy validation study was conducted from 2008-2009 in three different sites in Uganda. The dataset included 104 neonatal deaths (0-27 days) and 615 childhood deaths (1-59 months) with the cause(s) of death classified by PCVA and physician review of hospital medical records (the ‘reference standard’). Of the original 719 questionnaires, 141 (20%) were selected for a second review by the same physicians; the repeat cause(s) of death were compared to the original, and agreement was assessed using the Kappa statistic. Physician reviewers refined non-hierarchical algorithms for common causes of death from existing expert algorithms, from which hierarchical algorithms were developed. The accuracy of PCVA, non-hierarchical, and hierarchical algorithms for determining cause(s) of death from all 719 VA questionnaires was determined using the reference standard. Overall, intra-rater repeatability was high (83% agreement, Kappa 0.79 [95% CI 0.76-0.82]). PCVA performed well, with high specificity for determining cause of neonatal (>67%) and childhood (>83%) deaths, resulting in fairly accurate cause-specific mortality fraction (CSMF) estimates. For most causes of death in children, non-hierarchical algorithms had higher sensitivity, but correspondingly lower specificity, than PCVA and hierarchical algorithms, resulting in inaccurate CSMF estimates. Hierarchical algorithms were specific for most causes of death, and CSMF estimates were comparable to the reference standard and PCVA. Intra-rater reliability of PCVA was high, and overall PCVA performed well. Hierarchical algorithms performed better than non-hierarchical algorithms due to higher specificity and more accurate CSMF estimates. Use of PCVA to determine cause of death from VA questionnaire data is reasonable while automated data-driven algorithms are improved.
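As an illustration of how percent agreement and the Kappa statistic relate when the same reviewer classifies a case twice, the following is a minimal sketch. The function name `cohens_kappa` and the three-category table of counts are hypothetical and are not data from the study; the abstract does not state which software or kappa variant was used.

```python
import numpy as np

def cohens_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table of counts.

    Rows are the categories assigned at the first review, columns the
    categories assigned at the second review.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Proportion of cases on the diagonal (same category both times).
    p_observed = np.trace(table) / n
    # Agreement expected by chance from the two sets of marginal proportions.
    row_marginals = table.sum(axis=1) / n
    col_marginals = table.sum(axis=0) / n
    p_expected = float(row_marginals @ col_marginals)
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa

# Hypothetical counts for three cause-of-death categories (illustration only).
counts = [
    [40, 5, 5],
    [4, 30, 6],
    [6, 4, 41],
]
p_o, kappa = cohens_kappa(counts)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the raw agreement whenever some agreement is expected by chance, which is why both figures are usually reported together.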

18.

Measurement of Shoulder Range of Motion in Patients with Adhesive Capsulitis Using a Kinect

Seung Hak Lee Chiyul Yoon Sun Gun Chung Hee Chan Kim Youngbin Kwak Hee-won Park Keewon Kim 《PloS one》2015,10(6)

Range of motion (ROM) measurements are essential for the evaluation and diagnosis of adhesive capsulitis of the shoulder (AC). However, taking these measurements using a goniometer is inconvenient and sometimes unreliable. The Kinect (Microsoft, Seattle, WA, USA) is gaining attention as a new motion detecting device that is nonintrusive and easy to implement. This study aimed to apply Kinect to measure shoulder ROM in AC; we evaluated its validity by calculating the agreement of the measurements obtained using Kinect with those obtained using a goniometer and assessed its utility for the diagnosis of AC. Both shoulders of 15 healthy volunteers and affected shoulders of 12 patients with AC were included in the study. The passive and active ROM of each were measured with a goniometer for flexion, abduction, and external rotation. Their active shoulder motions for each direction were again captured using Kinect and the ROM values were calculated. The agreement between the two measurements was tested with the intraclass correlation coefficient (ICC). Diagnostic performance using the Kinect ROM was evaluated with Cohen’s kappa value. The cutoff values of the limited ROM were determined in the following ways: the same as passive ROM values, reflecting the mean difference, and based on receiver operating characteristic curves. The ICCs for flexion/abduction/external rotation between goniometric passive ROM and the Kinect ROM were 0.906/0.942/0.911, while those between active ROMs and the Kinect ROMs were 0.864/0.932/0.925. Cohen’s kappa values were 0.88, 0.88, and 1.0 with the cutoff values in the order above. Measurements of the shoulder ROM using Kinect show excellent agreement with those taken using a goniometer. These results indicate that the Kinect can be used to measure shoulder ROM and to diagnose AC as an alternative to a goniometer.
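The abstract does not say which form of the ICC was used. As one possible reading, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), from a subjects-by-methods matrix. The function name `icc_2_1` and the angle values are hypothetical and are not taken from the study.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` has one row per subject and one column per method or rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Sums of squares for subjects (rows), methods (columns) and residual.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    # Corresponding mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical flexion angles in degrees: column 0 goniometer, column 1 Kinect.
angles = [
    [150, 148],
    [120, 124],
    [95, 92],
    [170, 168],
    [135, 139],
]
icc = icc_2_1(angles)
print(f"ICC(2,1) = {icc:.3f}")
```

An absolute-agreement form penalizes a systematic offset between the two devices, which a consistency-type ICC would ignore; which is appropriate depends on whether the devices are meant to be interchangeable.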

19.

Adaptation and performance of a mobile application for early detection of cutaneous leishmaniasis

Luisa Rubiano Neal D. E. Alexander Ruth Mabel Castillo Álvaro José Martínez Jonny Alejandro García Luna Juan David Arango Leonardo Vargas Patricia Madriñán Lina-Rocío Hurtado Yenifer Orobio Carlos A. Rojas Helena del Corral Andrés Navarro Nancy Gore Saravia Eliah Aronoff-Spencer 《PLoS neglected tropical diseases》2021,15(2)

Background

Detection and management of neglected tropical diseases such as cutaneous leishmaniasis present unmet challenges stemming from their prevalence in remote, rural, resource constrained areas having limited access to health services. These challenges are frequently compounded by armed conflict or illicit extractive industries. The use of mobile health technologies has shown promise in such settings, yet data on outcomes in the field remain scarce.

Methods

We adapted a validated prediction rule for the presumptive diagnosis of CL to create a mobile application for use by community health volunteers. We used human-centered design practices and agile development for app iteration. We tested the application in three rural areas where cutaneous leishmaniasis is endemic and an urban setting where patients seek medical attention in the municipality of Tumaco, Colombia. The application was assessed for usability, sensitivity and inter-rater reliability (kappa) when used by community health volunteers (CHV), health workers and a general practitioner, study physician.

Results

The application was readily used and understood. Among 122 screened cases with cutaneous ulcers, sensitivity to detect parasitologically proven CL was >95%. The proportion of participants with parasitologically confirmed CL was high (88%), precluding evaluation of specificity, and driving a high level of crude agreement between the app and parasitological diagnosis. The chance-adjusted agreement (kappa) varied across the components of the risk score. Time to diagnosis was reduced significantly, from 8 to 4 weeks on average when CHV conducted active case detection using the application, compared to passive case detection by health facility-based personnel.

Conclusions

Translating a validated prediction rule to a mHealth technology has shown the potential to improve the capacity of community health workers and healthcare personnel to provide opportune care, and access to health services for underserved populations. These findings support the use of mHealth tools for NTD research and healthcare.
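To see why a high proportion of confirmed cases can produce high crude agreement while leaving chance-adjusted agreement modest, consider the sketch below. The 2x2 counts are hypothetical, chosen only to resemble a sample of 122 with about 88% confirmed cases; they are not the study's data, and the function name `two_by_two_summary` is an assumption made for illustration.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Crude agreement, sensitivity and Cohen's kappa for a 2x2 table.

    tp/fn: reference-positive cases the test called positive/negative.
    fp/tn: reference-negative cases the test called positive/negative.
    """
    n = tp + fp + fn + tn
    crude_agreement = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    # Chance agreement from the marginal proportions of positive calls.
    p_test_pos = (tp + fp) / n
    p_ref_pos = (tp + fn) / n
    p_expected = p_test_pos * p_ref_pos + (1 - p_test_pos) * (1 - p_ref_pos)
    kappa = (crude_agreement - p_expected) / (1 - p_expected)
    return {
        "crude_agreement": crude_agreement,
        "sensitivity": sensitivity,
        "kappa": kappa,
    }

# Hypothetical counts: 107 of 122 participants are reference-positive.
summary = two_by_two_summary(tp=103, fp=12, fn=4, tn=3)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With so few reference-negative cases, most agreement is already expected by chance, so kappa falls well below the crude figure and specificity rests on only a handful of observations.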

20.

Reliability and Validity of the Chinese Version Appropriateness Evaluation Protocol

Wenwei Liu Suwei Yuan Fengqing Wei Jing Yang Zhe Zhang Changbin Zhu Jin Ma 《PloS one》2015,10(8)

Objective

To adapt the Appropriateness Evaluation Protocol (AEP) to the specific settings of health care in China and to validate the Chinese version AEP (C-AEP).

Methods

Forward and backward translations were carried out on the original criteria. Twenty experts participated in the consultancy to form a preliminary version of the C-AEP. To ensure applicability, tests of reliability and validity were performed on 350 admissions and 3,226 hospital days of acute myocardial infarction patients and total hip replacement patients in two tertiary hospitals by two C-AEP reviewers and two physician reviewers. Overall agreement, specific agreement, and Cohen’s Kappa were calculated to compare the concordance of decisions between pairs of reviewers to test inter-rater reliability and convergent validity. The use of “overrides” and opinions of experts were recorded as measurements of content validity. Face validity was tested through collecting perspectives of nonprofessionals. Sensitivity, specificity, and predictive values were also reported.
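Overall and specific agreement between two reviewers can be sketched as follows. The abstract does not define these indices, so the sketch assumes the commonly used positive and negative specific agreement; the counts and the name `agreement_indices` are hypothetical and are not the study's data.

```python
def agreement_indices(both_inappropriate, only_first, only_second, both_appropriate):
    """Overall and specific agreement for two reviewers' binary decisions.

    only_first / only_second: cases judged inappropriate by just one reviewer.
    """
    a, b, c, d = both_inappropriate, only_first, only_second, both_appropriate
    n = a + b + c + d
    return {
        # Share of all cases on which the reviewers made the same decision.
        "overall": (a + d) / n,
        # Agreement specific to "inappropriate" decisions.
        "specific_inappropriate": 2 * a / (2 * a + b + c),
        # Agreement specific to "appropriate" decisions.
        "specific_appropriate": 2 * d / (2 * d + b + c),
    }

# Hypothetical counts for 350 admissions (illustration only).
indices = agreement_indices(
    both_inappropriate=50, only_first=15, only_second=14, both_appropriate=271
)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Reporting the two specific indices alongside overall agreement shows whether reviewers agree as well on the rarer "inappropriate" decisions as on the common "appropriate" ones.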

Results

There are 14 admission and 24 days of care criteria in the initial version of C-AEP. Kappa coefficients indicate substantial agreement between reviewers: with regard to inter-rater reliability, Kappa (κ) coefficients are 0.746 (95% confidence interval [CI] 0.644–0.834) and 0.743 (95% CI 0.698–0.767) for admissions and hospital days, respectively; for convergent validity, the κ statistics are 0.678 (95% CI 0.567–0.778) and 0.691 (95% CI 0.644–0.717), respectively. Overrides account for less than 2% of all judgments. Content validity and face validity were confirmed by experts and nonprofessionals, respectively. According to the C-AEP reviewers, 18.3% of admissions and 28.5% of inpatient days were deemed inappropriate.

Conclusions

The C-AEP is a reliable and valid screening tool in China’s tertiary hospitals. The prevalence of inappropriateness is substantial in our research. To reduce inappropriate utilization, further investigation is needed to elucidate the reasons and risk factors for this inappropriateness.