Similar literature
 Found 20 similar articles.
1.
In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study since the verification procedure is invasive, risky, or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can cause bias in estimates of the VUS. This bias is referred to as verification bias. Existing verification bias correction in three‐way ROC analysis focuses on ordinal tests. We propose verification bias‐correction methods to construct the ROC surface and estimate the VUS for a continuous diagnostic test, based on inverse probability weighting. By applying U‐statistics theory, we develop asymptotic properties for the estimator. A jackknife estimator of variance is also derived. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance. The proposed methods are used to assess the ability of a biomarker to accurately identify stages of Alzheimer's disease.
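To illustrate the inverse probability weighting idea described in this abstract, here is a minimal sketch of a weighted VUS estimate for a small sample. It assumes that the verification probabilities are already known or have been estimated, and that higher test values indicate a more advanced stage. The function name `ipw_vus` and its arguments are hypothetical and are not taken from the article, whose estimator may differ in detail.

```python
def ipw_vus(test, stage, verified, prob_verified):
    """Weighted estimate of P(T1 < T2 < T3) using verified subjects only.

    test          : continuous test results
    stage         : disease stage 1, 2 or 3 (None when not verified)
    verified      : True when the true stage was ascertained
    prob_verified : probability of verification for each subject (assumed known)
    """
    # Each verified subject is weighted by 1 / P(verified); others get weight 0.
    groups = {1: [], 2: [], 3: []}
    for t, s, v, p in zip(test, stage, verified, prob_verified):
        if v:
            groups[s].append((t, 1.0 / p))

    num = 0.0  # weighted count of correctly ordered triples
    den = 0.0  # weighted count of all triples
    for t1, w1 in groups[1]:
        for t2, w2 in groups[2]:
            for t3, w3 in groups[3]:
                w = w1 * w2 * w3
                den += w
                if t1 < t2 < t3:
                    num += w
    if den == 0.0:
        raise ValueError("each stage needs at least one verified subject")
    return num / den
```

With full verification (all probabilities equal to 1) the weights cancel and the function reduces to the ordinary proportion of correctly ordered triples. The triple loop is cubic in the sample size, so this form suits only small illustrative data sets.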

2.
C B Begg, R A Greenes. Biometrics, 1983, 39(1): 207-215
In the assessment of the statistical properties of a diagnostic test, for example the sensitivity and specificity of the test, it is common to derive estimates from a sample limited to those cases for whom subsequent definitive disease verification is obtained. Omission of nonverified cases can seriously bias the estimates. In order to adjust the estimates it is necessary to make assumptions about the mechanism for selecting cases for verification. Methods for making the necessary adjustments can then be derived.
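A minimal sketch of one such adjustment, assuming that selection for verification depends only on whether the test result is positive or negative. Under that assumption the disease rates observed among verified cases are valid within each test-result group, and Bayes' rule recombines them with the test-result frequencies from the full sample. The function name and count arguments are hypothetical; the abstract does not give formulas.

```python
def corrected_accuracy(n_pos, n_neg,
                       ver_pos_dis, ver_pos_nondis,
                       ver_neg_dis, ver_neg_nondis):
    """Sensitivity and specificity adjusted for partial verification.

    n_pos, n_neg    : all test-positive / test-negative subjects
    ver_pos_dis     : verified, test-positive, diseased
    ver_pos_nondis  : verified, test-positive, non-diseased
    ver_neg_dis     : verified, test-negative, diseased
    ver_neg_nondis  : verified, test-negative, non-diseased
    """
    # P(T+) and P(T-) come from the full sample, verified or not.
    p_pos = n_pos / (n_pos + n_neg)
    p_neg = 1.0 - p_pos
    # P(D+ | T) comes from the verified subjects in each test-result group.
    p_dis_pos = ver_pos_dis / (ver_pos_dis + ver_pos_nondis)
    p_dis_neg = ver_neg_dis / (ver_neg_dis + ver_neg_nondis)
    # Bayes' rule gives P(T+ | D+) and P(T- | D-).
    sens = p_dis_pos * p_pos / (p_dis_pos * p_pos + p_dis_neg * p_neg)
    spec = ((1 - p_dis_neg) * p_neg
            / ((1 - p_dis_neg) * p_neg + (1 - p_dis_pos) * p_pos))
    return sens, spec
```

For example, if all 100 test-positive subjects are verified (80 diseased) but only 20 of 100 test-negative subjects are verified (2 diseased), the naive sensitivity from verified cases is 80/82, while the adjusted value is 80/90.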

3.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.

4.
Disease prevalence is ideally estimated using a 'gold standard' to ascertain true disease status on all subjects in a population of interest. In practice, however, the gold standard may be too costly or invasive to be applied to all subjects, in which case a two-phase design is often employed. Phase 1 data consisting of inexpensive and non-invasive screening tests on all study subjects are used to determine the subjects that receive the gold standard in the second phase. Naive estimates of prevalence in two-phase studies can be biased (verification bias). Imputation and re-weighting estimators are often used to avoid this bias. We contrast the forms and attributes of the various prevalence estimators. Distribution theory and simulation studies are used to investigate their bias and efficiency. We conclude that the semiparametric efficient approach is the preferred method for prevalence estimation in two-phase studies. It is more robust and comparable in its efficiency to imputation and other re-weighting estimators. It is also easy to implement. We use this approach to examine the prevalence of depression in adolescents with data from the Great Smoky Mountain Study.
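The simplest of the re-weighting estimators mentioned in this abstract can be sketched as below. This is an illustration only: it assumes the phase 2 sampling probabilities are known by design, the function name `ipw_prevalence` is hypothetical, and it is not the semiparametric efficient estimator that the authors recommend.

```python
def ipw_prevalence(disease, verified, prob_verified):
    """Inverse-probability-weighted prevalence in a two-phase design.

    disease       : 1 or 0 for verified subjects, None otherwise
    verified      : True when the gold standard was applied
    prob_verified : phase 2 selection probability for each subject
    """
    n = len(verified)
    total = 0.0
    for d, v, p in zip(disease, verified, prob_verified):
        if v:
            # A verified subject stands in for 1 / p subjects like them.
            total += d / p
    return total / n
```

Dividing by the full phase 1 sample size `n` is what distinguishes this from the naive estimate, which divides the number of verified diseased subjects by the number of verified subjects.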

5.
The ROC (receiver operating characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky, Faraggi and Schisterman (2006) for estimating the area under the ROC curve. The DR method can be applied for continuous scaled tests and allows for a non‐ignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computer tomography, a diagnostic test for calcification of the arteries.

6.
The sensitivity and specificity of a new medical device are often compared relative to that of an existing device by calculating ratios of sensitivities and specificities. Although it would be ideal for all study subjects to receive the gold standard so true disease status was known for all subjects, it is often not feasible or ethical to obtain disease status for everyone. This paper proposes two unpaired designs where each subject is only administered one of the devices and device results dictate which subjects are to receive disease verification. Estimators of the ratio of accuracy and corresponding confidence intervals are proposed for these designs as well as sample size formulae. Simulation studies are performed to investigate the small sample bias of the estimators and the performance of the variance estimators and sample size formulae. The sample size formulae are applied to the design of a cervical cancer study to compare the accuracy of a new device with the conventional Pap smear.

7.
Albert PS. Biometrics, 2007, 63(3): 947-957
Interest often focuses on estimating sensitivity and specificity of a group of raters or a set of new diagnostic tests in situations in which gold standard evaluation is expensive or invasive. Various authors have proposed semilatent class modeling approaches for estimating diagnostic accuracy in this situation. This article presents imputation approaches for this problem. I show how imputation provides a simpler way of performing diagnostic accuracy and prevalence estimation than the use of semilatent modeling. Furthermore, the imputation approach is more robust to modeling assumptions and, in general, there is only a moderate efficiency loss relative to a correctly specified semilatent class model. I apply imputation to a study designed to estimate the diagnostic accuracy of digital radiography for gastric cancer. The feasibility and robustness of imputation are illustrated with analysis, asymptotic results, and simulations.

8.
Zheng Y, Barlow WE, Cutter G. Biometrics, 2005, 61(1): 259-268
The performance of a medical diagnostic test is often evaluated by comparing the outcome of the test to the patient's true disease state. Receiver operating characteristic analysis may then be used to summarize test accuracy. However, such analysis may encounter several complications in actual practice. One complication is verification bias, i.e., gold standard assessment of disease status may only be partially available and the probability of ascertainment of disease may depend on both the test result and characteristics of the subject. A second issue is that tests interpreted by the same rater may not be independent. Using estimating equations, we generalize previous methods that address these problems. We contrast the performance of alternative estimators of accuracy using robust sandwich variance estimators to permit valid asymptotic inference. We suggest that in the context of an observational cohort study where rich covariate information is available, a weighted estimating equations approach may be preferable for its robustness against model misspecification. We apply the methodology to mammography as performed by community radiologists.

9.
The comparison of the efficiency of two binary diagnostic tests requires one to know the disease status for all patients in the sample, by applying a gold standard. In two-phase studies the gold standard is not applied to all patients in a sample, and the problem of partial verification of the disease arises. At present, one of the most widely used approaches for comparing two binary diagnostic tests is based on likelihood ratios. In this study, the maximum likelihood estimators of likelihood ratios are obtained. Hypothesis tests to compare the likelihood ratios of two binary diagnostic tests when both are applied to the same random sample in the presence of verification bias are deduced, and simulation experiments are performed in order to investigate the asymptotic behaviour of the hypothesis tests. The results obtained have been applied to the study of Alzheimer's disease.

10.
Zhou XH, Castelluccio P, Zhou C. Biometrics, 2005, 61(2): 600-609
In the evaluation of diagnostic accuracy of tests, a gold standard on the disease status is required. However, in many complex diseases, it is impossible or unethical to obtain such a gold standard. If an imperfect standard is used, the estimated accuracy of the tests would be biased. This type of bias is called imperfect gold standard bias. In this article we develop a nonparametric maximum likelihood method for estimating ROC curves and their areas of ordinal-scale tests in the absence of a gold standard. Our simulation study shows that the proposed estimators for the ROC curve areas have good finite-sample properties in terms of bias and mean squared error. Further simulation studies show that our nonparametric approach is comparable to the binormal parametric method, and is easier to implement. Finally, we illustrate the application of the proposed method in a real clinical study on assessing the accuracy of seven specific pathologists in detecting carcinoma in situ of the uterine cervix.

11.
Summary: In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models where the unknown gold standard test is treated as a latent variable are often used. However, these models have been criticized in the literature from both a conceptual and a robustness perspective. As an alternative, we propose an approach where we exploit an imperfect reference standard with unknown diagnostic accuracy and conduct sensitivity analysis by varying this accuracy over scientifically reasonable ranges. In this article, a latent class model with crossed random effects is proposed for estimating the diagnostic accuracy of regional obstetrics and gynaecological (OB/GYN) physicians in diagnosing endometriosis. To avoid the pitfalls of models without a gold standard, we exploit the diagnostic results of a group of OB/GYN physicians with an international reputation for the diagnosis of endometriosis. We construct an ordinal reference standard based on the discordance among these international experts and propose a mechanism for conducting sensitivity analysis relative to the unknown diagnostic accuracy among them. A Monte Carlo EM algorithm is proposed for parameter estimation and a BIC‐type model selection procedure is presented. Through simulations and data analysis we show that this new approach provides a useful alternative to traditional latent class modeling approaches used in this setting.

12.
BACKGROUND: The current, arbitrarily defined gold standard for the diagnosis of H. pylori infection requires histologic examination of two specially stained antral biopsy specimens. However, routine histology is potentially limited in general clinical practice by both sampling and observer error. The current study was designed to examine the diagnostic performance of invasive and non-invasive H. pylori detection methods that would likely be available in general clinical practice. METHODS: The diagnostic performance of rotating clinical pathology faculty using thiazine staining was compared with that of an expert gastrointestinal pathologist in 38 patients. In situ hybridization stains of adjacent biopsy cuts were also examined by the expert pathologist for further comparison. Receiver operator characteristic (ROC) analysis was performed to evaluate whether the diagnostic performance of the expert pathologist differed depending upon the histologic method employed. A similar analysis was made to evaluate the diagnostic performance of pathology trainees relative to the expert. In the absence of an established invasive gold standard, non-invasive testing methods (rapid serum antibodies, formal ELISA antibodies and carbon-14 urea breath testing) were evaluated in 74 patients by comparison with a gold standard defined using a combination of diagnostic tests. RESULTS: Using either rapid urease testing of biopsy specimens or urea breath testing as the gold standard for comparison, the diagnostic performance of the rotating clinical pathology faculty was inferior to that of the expert gastrointestinal pathologist, especially with regard to specificity (e.g., 69 percent for the former versus 88 percent for the latter, relative to rapid urease testing). Although interpretation of in situ hybridization staining by the expert appeared to have an even higher specificity, ROC analysis failed to show a difference. The mean ROC areas for thiazine and in situ hybridization staining for trainee pathologists relative to the expert were 0.88 and 0.94, respectively. In untreated patients, urea breath testing had a sensitivity and specificity of 100 percent as compared with thiazine staining with a sensitivity of 83 percent and a specificity of 97 percent. Post-therapy, breath testing had a sensitivity of 100 percent but a specificity of only 86 percent as compared with invasive testing with a sensitivity and specificity of 100 percent. Rapid serum antibody testing and formal ELISA antibody testing agreed in 93 percent of cases (kappa 0.78) with the rapid test being correct in three of the four disagreements. CONCLUSIONS: The current study illustrates a number of realities regarding H. pylori diagnosis. There is no diagnostic gold standard in general clinical practice. Accurate interpretation of specially stained slides is a learned activity with a tendency towards overdiagnosis early on. Urea breath testing is likely to be the diagnostic method of choice for untreated patients in general clinical practice although antibody testing is almost as accurate. Rapid antibody tests are at least as accurate as formal ELISA antibody tests. Urea breath testing is useful for confirming cure after therapy, but false-positive results may occur in some patients.

13.
In diagnostic studies, a new diagnostic test is often compared with a standard test and both tests are applied to the same patients, which is called a paired design. The true disease state is in general given by the so‐called gold standard (most reliable method for classification), which has to be known for all patients. The benefit of the new diagnostic test can be evaluated by sensitivity and specificity, which are in fact proportions. This means, for the comparison of two diagnostic tests, confidence intervals for the difference of the dependent estimated sensitivities and specificities are calculated. In the literature, many comparisons of different approaches can be found, but none explicitly for diagnostic studies. For this reason we compare 13 approaches for a set of scenarios that represent data of diagnostic studies (e.g., with sensitivity and specificity ≥0.8). With simulation studies, we show that the nonparametric interval with normal approximation can be recommended for the difference of two dependent sensitivities or specificities without restriction, the Wald interval with the limitation of slightly anti‐conservative results for small sample sizes, and the nonparametric intervals with t‐approximation, and the Tango interval with the limitation of conservative results for high correlations.
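The Wald interval named in this abstract can be sketched for paired data as follows. The abstract gives no formulas, so this assumes the standard textbook form for the difference of two dependent proportions; the function name and arguments are hypothetical. For sensitivities, `n` is the number of diseased patients, `b` the number detected only by the new test, and `c` the number detected only by the standard test.

```python
import math


def wald_paired_difference(b, c, n, z=1.959963984540054):
    """Wald confidence interval for the difference of two paired proportions.

    b : discordant pairs positive on test 1 only
    c : discordant pairs positive on test 2 only
    n : total number of pairs
    z : normal quantile (default gives a two-sided 95% interval)
    """
    diff = (b - c) / n
    # Estimated standard error of (b - c) / n under multinomial sampling.
    se = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se
```

Only the discordant pairs enter the formula, which is why the interval can behave poorly when they are few, consistent with the small-sample caveat in the abstract.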

14.
Colorectal cancer (CRC) is ranked as the second most common cause of cancer deaths and the third most common cancer globally. It has been described as a ‘silent disease’ which is often easily treatable if detected early, before progression to carcinoma. Colonoscopy, which is the gold standard for diagnosis, is not only expensive but also invasive; thus, effective and non-invasive diagnostic methods are urgently needed. Unfortunately, the current methods are not sensitive and specific enough in detecting adenomas and early colorectal neoplasia, hampering treatment and, consequently, survival rates. Studies have shown that imbalances that render the gut microbiota dysbiotic are implicated in the development of adenomas, ultimately resulting in CRC. The differences found in the makeup and diversity of the gut microbiota of healthy individuals relative to CRC patients have in recent times gained attention as potential biomarkers in early non-invasive diagnosis of CRC, with promising sensitivity, specificity and even cost-effectiveness. This review summarizes recent studies in the application of these microbiota biomarkers in early CRC diagnosis, limitations encountered in the area of the faecal microbiota studies as biomarkers for CRC, and future research exploits that address these limitations.

15.
Rodenberg C, Zhou XH. Biometrics, 2000, 56(4): 1256-1262
A receiver operating characteristic (ROC) curve is commonly used to measure the accuracy of a medical test. It is a plot of the true positive fraction (sensitivity) against the false positive fraction (1-specificity) for increasingly stringent positivity criteria. Bias can occur in estimation of an ROC curve if only some of the tested patients are selected for disease verification and if analysis is restricted only to the verified cases. This bias is known as verification bias. In this paper, we address the problem of correcting for verification bias in estimation of an ROC curve when the verification process and efficacy of the diagnostic test depend on covariates. Our method applies the EM algorithm to ordinal regression models to derive ML estimates for ROC curves as a function of covariates, adjusted for covariates affecting the likelihood of being verified. Asymptotic variance estimates are obtained using the observed information matrix of the observed data. These estimates are derived under the missing-at-random assumption, which means that selection for disease verification depends only on the observed data, i.e., the test result and the observed covariates. We also address the issues of model selection and model checking. Finally, we illustrate the proposed method on data from a two-phase study of dementia disorders, where selection for verification depends on the screening test result and age.
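The definition of the ROC curve at the start of this abstract can be made concrete with a minimal sketch of the empirical curve for fully verified data. It does not implement the covariate-adjusted, bias-corrected EM method of the article; the function names are hypothetical, and a subject is assumed to test positive when the score is at or above the threshold.

```python
def empirical_roc(scores, labels):
    """Return (false positive fraction, true positive fraction) points.

    scores : test results, higher values suggesting disease
    labels : 1 for diseased, 0 for non-diseased (all subjects verified)
    """
    n_dis = sum(labels)
    n_non = len(labels) - n_dis
    points = [(0.0, 0.0)]  # strictest criterion: nobody tests positive
    # Relax the positivity threshold step by step through the observed scores.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_non, tp / n_dis))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Applying these functions to verified cases only is exactly the restriction that the abstract warns can produce verification bias.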

16.
Continuous biomarkers are common for disease screening and diagnosis. To reach a dichotomous clinical decision, a threshold would be imposed to distinguish subjects with disease from nondiseased individuals. Among various performance metrics, specificity at a controlled sensitivity level (or vice versa) is often desirable because it directly targets the clinical utility of the intended clinical test. Meanwhile, covariates, such as age, race, as well as sample collection conditions, could impact the biomarker distribution and may also confound the association between biomarker and disease status. Therefore, covariate adjustment is important in such biomarker evaluation. Most existing covariate adjustment methods do not specifically target the desired sensitivity/specificity level, but rather do so for the entire biomarker distribution. As such, they might be more prone to model misspecification. In this paper, we suggest a parsimonious quantile regression model for the diseased population, only locally at the controlled sensitivity level, and assess specificity with covariate-specific control of the sensitivity. Variance estimates are obtained from a sample-based approach and bootstrap. Furthermore, our proposed local model extends readily to a global one for covariate adjustment for the receiver operating characteristic (ROC) curve over the sensitivity continuum. We demonstrate computational efficiency of this proposed method and restore the inherent monotonicity in the estimated covariate-adjusted ROC curve. The asymptotic properties of the proposed estimators are established. Simulation studies show favorable performance of the proposal. Finally, we illustrate our method in biomarker evaluation for aggressive prostate cancer.

17.
Various global health initiatives are currently advocating the elimination of schistosomiasis within the next decade. Schistosomiasis is a highly debilitating tropical infectious disease with severe burden of morbidity and thus operational research accurately evaluating diagnostics that quantify the epidemic status for guiding effective strategies is essential. Latent class models (LCMs) have been generally considered in epidemiology and in particular in recent schistosomiasis diagnostic studies as a flexible tool for evaluating diagnostics because assessing the true infection status (via a gold standard) is not possible. However, within the biostatistics literature, classical LCMs have already been criticised for real-life problems under violation of the conditional independence (CI) assumption and when applied to a small number of diagnostics (i.e. most often 3-5 diagnostic tests). Solutions of relaxing the CI assumption and accounting for zero-inflation, as well as collecting partial gold standard information, have been proposed, offering the potential for more robust model estimates. In the current article, we examined such approaches in the context of schistosomiasis via analysis of two real datasets and extensive simulation studies. Our main conclusions highlighted poor model fit in low prevalence settings and the necessity of collecting partial gold standard information in such settings in order to improve the accuracy and reduce bias of sensitivity and specificity estimates.

18.
Sensitivity and specificity have traditionally been used to assess the performance of a diagnostic procedure. Diagnostic procedures with both high sensitivity and high specificity are desirable, but these procedures are frequently too expensive, hazardous, and/or difficult to operate. A less sophisticated procedure may be preferred, if the loss of sensitivity or specificity is determined to be clinically acceptable. This paper addresses the problem of simultaneous testing of sensitivity and specificity for an alternative test procedure with a reference test procedure when a gold standard is present. The hypothesis is formulated as a compound hypothesis of two non‐inferiority (one‐sided equivalence) tests. We present an asymptotic test statistic based on the restricted maximum likelihood estimate in the framework of comparing two correlated proportions under the prospective and retrospective sampling designs. The sample size and power of an asymptotic test statistic are derived. The actual type I error and power are calculated by enumerating the exact probabilities in the rejection region. For applications that require high sensitivity as well as high specificity, a large number of positive subjects and a large number of negative subjects are needed. We also propose a weighted sum statistic as an alternative test by comparing a combined measure of sensitivity and specificity of the two procedures. The sample size determination is independent of the sampling plan for the two tests.

19.
Salivary diagnostics has great potential to be used in the early detection and prevention of many cancerous diseases. If implemented with rigour and efficiency, it can result in improving patient survival times and achieving earlier diagnosis of disease. Recently, extraordinary efforts have been taken to develop non‐invasive technologies that can be applied without complicated and expensive procedures. Saliva is a biofluid that has demonstrated excellent properties and can be used as a diagnostic fluid, since many of the biomarkers suggested for cancers can also be found in whole saliva, apart from blood or other body fluids. The currently accepted gold standard methods for biomarker development include chromatography, mass spectrometry, gel electrophoresis, microarrays and polymerase chain reaction‐based quantification. However, salivary diagnostics is a flourishing field with the rapid development of novel technologies associated with point‐of‐care diagnostics, RNA sequencing, electrochemical detection and liquid biopsy. Those technologies will help introduce population‐based screening programs, thus enabling early detection, prognosis assessment and disease monitoring. The purpose of this review is to give a comprehensive update on the emerging diagnostic technologies and tools for the early detection of cancerous diseases based on saliva.

20.
Diagnostic or screening tests are widely used in medical fields to classify patients according to their disease status. Several statistical models for meta‐analysis of diagnostic test accuracy studies have been developed to synthesize test sensitivity and specificity of a diagnostic test of interest. Because of the correlation between test sensitivity and specificity, modeling the two measures using a bivariate model is recommended. In this paper, we extend the current standard bivariate linear mixed model (LMM) by proposing two variance‐stabilizing transformations: the arcsine square root and the Freeman–Tukey double arcsine transformation. We compared the performance of the proposed methods with the standard method through simulations using several performance measures. The simulation results showed that our proposed methods performed better than the standard LMM in terms of bias, root mean square error, and coverage probability in most of the scenarios, even when data were generated assuming the standard LMM. We also illustrated the methods using two real data sets.
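The two transformations named in this abstract can be sketched as below, applied to a single study's count of correct test results. The abstract does not spell out the formulas, so these are the usual textbook definitions with their approximate variances; the function names are hypothetical, and the bivariate mixed model itself is not shown.

```python
import math


def arcsine_sqrt(events, n):
    """Arcsine square root transform of events / n and its approximate variance."""
    value = math.asin(math.sqrt(events / n))
    return value, 1 / (4 * n)


def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    value = (math.asin(math.sqrt(events / (n + 1)))
             + math.asin(math.sqrt((events + 1) / (n + 1))))
    return value, 1 / (n + 0.5)
```

In both cases the approximate variance depends only on the sample size and not on the underlying proportion, which is the variance-stabilizing property. Some references define the double arcsine transform as half of the sum used here, with variance 1 / (4n + 2).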
