Similar References
20 similar records found
1.
The sensitivity and specificity of a new medical device are often compared with those of an existing device by calculating ratios of sensitivities and specificities. Although it would be ideal for all study subjects to receive the gold standard so that true disease status is known for everyone, obtaining disease status for all subjects is often infeasible or unethical. This paper proposes two unpaired designs in which each subject is administered only one of the devices and the device results dictate which subjects receive disease verification. Estimators of the ratio of accuracy and corresponding confidence intervals are proposed for these designs, as well as sample size formulae. Simulation studies are performed to investigate the small-sample bias of the estimators and the performance of the variance estimators and sample size formulae. The sample size formulae are applied to the design of a cervical cancer study comparing the accuracy of a new device with the conventional Pap smear.
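A minimal sketch of the headline quantity: the ratio of two sensitivities with a delta-method confidence interval on the log scale. This is not the paper's unpaired-design estimator (which must also account for result-dependent verification); it assumes fully verified, independent samples, and the counts below are hypothetical.

```python
import math

def ratio_ci(tp1, n1, tp2, n2):
    """Ratio of two sensitivities with a delta-method CI on the log scale.

    tp1/n1: true positives / verified diseased for device 1; same for device 2.
    """
    se1, se2 = tp1 / n1, tp2 / n2
    ratio = se1 / se2
    # Var(log ratio) ~ (1-se1)/(n1*se1) + (1-se2)/(n2*se2) for independent samples
    var_log = (1 - se1) / (n1 * se1) + (1 - se2) / (n2 * se2)
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    lo = ratio * math.exp(-z * math.sqrt(var_log))
    hi = ratio * math.exp(z * math.sqrt(var_log))
    return ratio, (lo, hi)

print(ratio_ci(tp1=90, n1=100, tp2=80, n2=100))  # ratio 1.125 with 95% CI
```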

2.
A "gold" standard test, providing definitive verification of disease status, may be quite invasive or expensive. Current technological advances provide less invasive, or less expensive, diagnostic tests. Ideally, a diagnostic test is evaluated by comparing it with a definitive gold standard test. However, the decision to perform the gold standard test to establish the presence or absence of disease is often influenced by the results of the diagnostic test, along with other measured, or not measured, risk factors. If only data from patients who received the gold standard test were used to assess the test performance, the commonly used measures of diagnostic test performance--sensitivity and specificity--are likely to be biased. Sensitivity would often be higher, and specificity would be lower, than the true values. This bias is called verification bias. Without adjustment for verification bias, one may possibly introduce into the medical practice a diagnostic test with apparent, but not truly, high sensitivity. In this article, verification bias is treated as a missing covariate problem. We propose a flexible modeling and computational framework for evaluating the performance of a diagnostic test, with adjustment for nonignorable verification bias. The presented computational method can be utilized with any software that can repetitively use a logistic regression module. The approach is likelihood-based, and allows use of categorical or continuous covariates. An explicit formula for the observed information matrix is presented, so that one can easily compute standard errors of estimated parameters. The methodology is illustrated with a cardiology data example. We perform a sensitivity analysis of the dependency of verification selection process on disease.  相似文献   

3.
Exact tests for one sample correlated binary data
In this paper we develop exact tests for one-sample correlated binary data whose cluster sizes are at most two. Although significant progress has been made in the development and implementation of exact tests for uncorrelated data, exact tests for correlated data are rare; the lack of a tractable likelihood function has made them difficult to develop. However, when cluster sizes are at most two, only three parameters are needed to characterize the problem. One parameter is fixed under the null hypothesis, while the other two can be removed by conditional and unconditional approaches, respectively, to construct exact tests. We compare the exact and asymptotic p-values in several cases and apply the proposed method to real-life data.

4.
Alonzo TA, Kittelson JM. Biometrics 2006;62(2):605-612
The accuracy (sensitivity and specificity) of a new screening test can be compared with that of a standard test by applying both tests to a group of subjects in which disease status can be determined by a gold standard (GS) test. However, it is not always feasible to administer a GS test to all study subjects. For example, a study is planned to determine whether a new screening test for cervical cancer ("ThinPrep") is better than the standard test ("Pap"), and in this setting it is not feasible (or ethical) to determine disease status by biopsy in order to identify women with and without disease for participation in a study. When determination of disease status is not possible for all study subjects, the relative accuracy of two screening tests can still be estimated by using a paired screen-positive (PSP) design in which all subjects receive both screening tests, but receive the GS test only if one of the screening tests is positive. Unfortunately, in the cervical cancer example the PSP design is also infeasible because it is not technically possible to administer both the ThinPrep and Pap at the same time. In this article, we describe a randomized paired screen-positive (RPSP) design in which subjects are randomized to receive one of the two screening tests initially, and receive the other screening test and GS only if the first screening test is positive. We derive maximum likelihood estimators and confidence intervals for the relative accuracy of the two screening tests, and assess the small-sample behavior of these estimators using simulation studies. Sample size formulae are derived and applied to the cervical cancer screening trial example, and the efficiency of the RPSP design is compared with other designs.
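A small sketch of why relative sensitivity remains estimable even though screen-negative subjects are never verified: in a paired design the unknown number of diseased subjects cancels from the ratio. The counts are hypothetical, not from the trial.

```python
def relative_sensitivity(tp_new, tp_std):
    """Relative true-positive rate of new vs. standard screen in a paired
    screen-positive design: sens_new/sens_std = (TP_new/N_D)/(TP_std/N_D),
    so the unknown number of diseased subjects N_D cancels."""
    return tp_new / tp_std

# Hypothetical: among verified (screen-positive) subjects, 60 cancers were
# detected by the new screen and 48 by the standard screen.
print(relative_sensitivity(60, 48))  # 1.25
```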

5.
Sensitivity and specificity have traditionally been used to assess the performance of a diagnostic procedure. Procedures with both high sensitivity and high specificity are desirable, but they are frequently too expensive, hazardous, and/or difficult to operate. A less sophisticated procedure may be preferred if the loss of sensitivity or specificity is clinically acceptable. This paper addresses the problem of simultaneously testing the sensitivity and specificity of an alternative test procedure against those of a reference procedure when a gold standard is available. The hypothesis is formulated as a compound hypothesis of two non-inferiority (one-sided equivalence) tests. We present an asymptotic test statistic based on the restricted maximum likelihood estimate in the framework of comparing two correlated proportions under prospective and retrospective sampling designs. The sample size and power of the asymptotic test are derived, and the actual type I error and power are calculated by enumerating the exact probabilities in the rejection region. Applications that require high sensitivity as well as high specificity need large numbers of both positive and negative subjects. We also propose a weighted sum statistic as an alternative test, comparing a combined measure of the sensitivity and specificity of the two procedures. The sample size determination is independent of the sampling plan for the two tests.
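A hedged sketch of the intersection-union logic with a plain Wald statistic in place of the paper's restricted-MLE statistic; all counts and margins are hypothetical.

```python
import math

def paired_noninferiority_z(b, c, n, delta):
    """One-sided Wald Z for non-inferiority of a paired difference of proportions.

    b: pairs where the alternative test succeeds and the reference fails
    c: pairs where the reference succeeds and the alternative fails
    n: total pairs (diseased subjects for sensitivity, non-diseased for specificity)
    delta: non-inferiority margin; H0: p_alt - p_ref <= -delta
    """
    diff = (b - c) / n
    var = (b + c - (b - c) ** 2 / n) / n ** 2
    return (diff + delta) / math.sqrt(var)

# Intersection-union test: the compound hypothesis is rejected only if BOTH
# one-sided tests reject at level alpha.
z_sens = paired_noninferiority_z(b=8, c=12, n=200, delta=0.05)   # diseased subjects
z_spec = paired_noninferiority_z(b=15, c=10, n=300, delta=0.05)  # non-diseased subjects
z_crit = 1.6449  # 95th percentile of the standard normal
print(z_sens, z_spec, "non-inferior on both" if min(z_sens, z_spec) > z_crit else "fail")
```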

6.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, maximum likelihood estimation for a latent class model computed via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified, nonparametric methods that use data-based indicator variables for disease status, and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high-specificity tests, the performance of the simpler approximations may be just as good as that of the MLE.
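A minimal simulation sketch of the EM algorithm for a static two-class latent class model with conditionally independent tests (the article's setting additionally lets sensitivity change over time); all accuracies are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 4 conditionally independent binary tests with hypothetical accuracies
n, prev = 2000, 0.3
sens_true = np.array([0.85, 0.80, 0.90, 0.75])
spec_true = np.array([0.95, 0.90, 0.85, 0.97])
d = rng.random(n) < prev
p = np.where(d[:, None], sens_true, 1 - spec_true)
x = (rng.random((n, 4)) < p).astype(float)  # test results, shape (n, 4)

# EM for a two-class latent class model under conditional independence
prev_hat, sens, spec = 0.5, np.full(4, 0.7), np.full(4, 0.7)
for _ in range(500):
    # E-step: posterior probability each subject is diseased
    l1 = prev_hat * np.prod(sens**x * (1 - sens)**(1 - x), axis=1)
    l0 = (1 - prev_hat) * np.prod((1 - spec)**x * spec**(1 - x), axis=1)
    w = l1 / (l1 + l0)
    # M-step: weighted updates of prevalence, sensitivities, specificities
    prev_hat = w.mean()
    sens = (w[:, None] * x).sum(0) / w.sum()
    spec = ((1 - w)[:, None] * (1 - x)).sum(0) / (1 - w).sum()

print(round(prev_hat, 3), sens.round(3), spec.round(3))
```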

7.
In diagnostic studies, a new diagnostic test is often compared with a standard test, with both tests applied to the same patients (a paired design). The true disease state is in general given by the so-called gold standard (the most reliable method of classification), which must be known for all patients. The benefit of the new diagnostic test can be evaluated by sensitivity and specificity, which are in fact proportions; for the comparison of two diagnostic tests, this means calculating confidence intervals for the difference of the dependent estimated sensitivities and specificities. Many comparisons of different interval approaches can be found in the literature, but none explicitly for diagnostic studies. For this reason we compare 13 approaches over a set of scenarios that represent data of diagnostic studies (e.g., with sensitivity and specificity ≥0.8). With simulation studies, we show that the nonparametric interval with normal approximation can be recommended without restriction for the difference of two dependent sensitivities or specificities; the Wald interval, with the limitation of slightly anti-conservative results for small sample sizes; and the nonparametric interval with t-approximation and the Tango interval, with the limitation of conservative results for high correlations.
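For reference, a minimal version of one of the 13 compared approaches, the Wald interval for the difference of two dependent proportions, built from the paired 2x2 counts; the counts are hypothetical.

```python
import math

def wald_ci_paired_diff(n11, n10, n01, n00, z=1.959963984540054):
    """Wald CI for the difference of two dependent proportions (e.g., the
    sensitivities of two tests evaluated on the same diseased subjects).

    n11: both tests positive; n10: only test 1; n01: only test 2; n00: neither.
    """
    n = n11 + n10 + n01 + n00
    diff = (n10 - n01) / n
    var = (n10 + n01 - (n10 - n01) ** 2 / n) / n ** 2
    half = z * math.sqrt(var)
    return diff - half, diff + half

# Hypothetical paired results on 150 diseased subjects
print(wald_ci_paired_diff(n11=110, n10=20, n01=10, n00=10))
```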

8.
Zhou XH, Castelluccio P, Zhou C. Biometrics 2005;61(2):600-609
In the evaluation of the diagnostic accuracy of tests, a gold standard for disease status is required. In many complex diseases, however, it is impossible or unethical to obtain such a gold standard, and if an imperfect standard is used, the estimated accuracy of the tests will be biased; this type of bias is called imperfect gold standard bias. In this article we develop a nonparametric maximum likelihood method for estimating the ROC curves, and the areas under them, of ordinal-scale tests in the absence of a gold standard. Our simulation study shows that the proposed estimators of ROC curve area have good finite-sample properties in terms of bias and mean squared error. Further simulations show that our nonparametric approach is comparable to the binormal parametric method and is easier to implement. Finally, we illustrate the application of the proposed method in a real clinical study assessing the accuracy of seven pathologists in detecting carcinoma in situ of the uterine cervix.

9.

Background

Culture remains the diagnostic gold standard for many bacterial infections, and the method against which other tests are often evaluated. Specificity of culture is 100% if the pathogenic organism is not found in healthy subjects, but the sensitivity of culture is more difficult to determine and may be low. Here, we apply Bayesian latent class models (LCMs) to data from patients with a single Gram-negative bacterial infection and define the true sensitivity of culture together with the impact of misclassification by culture on the reported accuracy of alternative diagnostic tests.

Methods/Principal Findings

Data from published studies describing the application of five diagnostic tests (culture and four serological tests) to a patient cohort with suspected melioidosis were re-analysed using several Bayesian LCMs. Sensitivities, specificities, and positive and negative predictive values (PPVs and NPVs) were calculated. Of 320 patients with suspected melioidosis, 119 (37%) had culture-confirmed melioidosis. Using the final model (a Bayesian LCM with conditional dependence between serological tests), the sensitivity of culture was estimated to be 60.2%. Prediction accuracy of the final model was assessed using a classification tool to grade patients according to the likelihood of melioidosis, which indicated that an estimated disease prevalence of 61.6% was credible. Estimates of the sensitivities, specificities, PPVs and NPVs of the four serological tests were significantly different from previously published values in which culture was used as the gold standard.

Conclusions/Significance

Culture has low sensitivity and low NPV for the diagnosis of melioidosis and is an imperfect gold standard against which to evaluate alternative tests. Models should be used to support the evaluation of diagnostic tests against an imperfect gold standard. The poor sensitivity/specificity of culture is likely not specific to melioidosis, but rather a generic problem for many bacterial and fungal infections.
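A minimal Gibbs-sampling sketch of a Bayesian LCM assuming conditional independence between all tests (the paper's final model additionally allows conditional dependence between the serological tests); the data are simulated and every accuracy value is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: culture plus three serological tests, hypothetical accuracies,
# assuming conditional independence (the simplest of the LCMs considered)
n, prev = 320, 0.4
sens_true = np.array([0.60, 0.70, 0.75, 0.65])  # note: culture sensitivity < 1
spec_true = np.array([1.00, 0.90, 0.85, 0.92])
d_true = rng.random(n) < prev
probs = np.where(d_true[:, None], sens_true, 1 - spec_true)
x = (rng.random((n, 4)) < probs).astype(int)

# Gibbs sampler with Beta(1,1) priors on prevalence, sensitivities, specificities
prev_s, sens, spec = 0.5, np.full(4, 0.8), np.full(4, 0.8)
draws = []
for it in range(3000):
    # 1. sample each subject's latent disease status given current parameters
    l1 = prev_s * np.prod(sens**x * (1 - sens)**(1 - x), axis=1)
    l0 = (1 - prev_s) * np.prod((1 - spec)**x * spec**(1 - x), axis=1)
    d = rng.random(n) < l1 / (l1 + l0)
    # 2. sample parameters from their Beta full conditionals
    prev_s = rng.beta(1 + d.sum(), 1 + n - d.sum())
    sens = rng.beta(1 + x[d].sum(0), 1 + (1 - x[d]).sum(0))
    spec = rng.beta(1 + (1 - x[~d]).sum(0), 1 + x[~d].sum(0))
    if it >= 1000:  # discard burn-in
        draws.append(np.concatenate(([prev_s], sens)))

post = np.mean(draws, axis=0)
print("posterior means: prevalence", round(post[0], 3), "sensitivities", post[1:].round(3))
```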

10.
The ROC (receiver operating characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample of the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification), and estimation of the ROC curve based only on subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias, originally developed by Rotnitzky, Faraggi and Schisterman (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a non-ignorable selection-to-verification process. We develop the estimator's asymptotic distribution and examine its finite-sample properties via a simulation study. We exemplify the DR procedure for estimating ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.

11.
Disease prevalence is ideally estimated using a 'gold standard' to ascertain true disease status for all subjects in a population of interest. In practice, however, the gold standard may be too costly or invasive to apply to all subjects, in which case a two-phase design is often employed. Phase 1 data, consisting of inexpensive and non-invasive screening tests on all study subjects, determine which subjects receive the gold standard in the second phase. Naive estimates of prevalence in two-phase studies can be biased (verification bias), and imputation and re-weighting estimators are often used to avoid this bias. We contrast the forms and attributes of the various prevalence estimators, using distribution theory and simulation studies to investigate their bias and efficiency. We conclude that the semiparametric efficient approach is the preferred method for prevalence estimation in two-phase studies: it is more robust, comparable in efficiency to imputation and other re-weighting estimators, and easy to implement. We use this approach to examine the prevalence of depression in adolescents with data from the Great Smoky Mountain Study.
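A short simulation sketch of the re-weighting idea (the paper ultimately recommends the more involved semiparametric efficient estimator): each verified subject is weighted by the inverse of its known verification probability. All design numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Phase 1: cheap screening test on everyone; phase 2: gold standard on a
# subsample whose selection probability depends on the screen result.
n, prev = 5000, 0.12
d = rng.random(n) < prev
screen = np.where(d, rng.random(n) < 0.85, rng.random(n) < 0.25)  # imperfect screen
pi = np.where(screen, 0.9, 0.1)            # known verification probabilities
verified = rng.random(n) < pi

# Naive estimate uses only verified subjects -> verification bias
naive = d[verified].mean()
# Re-weighting: each verified subject counts 1/pi times
ipw = (d[verified] / pi[verified]).sum() / (1 / pi[verified]).sum()
print(f"true {prev:.3f}  naive {naive:.3f}  re-weighted {ipw:.3f}")
```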

12.
Yu Shen, Dongfeng Wu, Marvin Zelen. Biometrics 2001;57(4):1009-1017
Consider two diagnostic procedures having binary outcomes. If one of the tests results in a positive finding, a more definitive diagnostic procedure is administered to establish the presence or absence of disease. Using both tests improves the overall screening sensitivity more when the two tests are independent than when they are positively correlated. We estimate the correlation coefficient of the two tests and derive statistical methods for testing the independence of the two diagnostic procedures conditional on disease status. The statistical tests are used to investigate the independence of mammography and clinical breast examination, with the aim of establishing the benefit of early detection of breast cancer. The data used in the analysis are obtained from periodic screening examinations of three randomized clinical trials of breast cancer screening. Analysis of each of these trials confirms the independence of the clinical breast and mammography examinations. Based on these three large clinical trials, we conclude that a clinical breast exam considerably increases the overall sensitivity relative to screening with mammography alone and should be routinely included in early breast cancer detection programs.
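A minimal sketch of the quantity being tested: the phi correlation of two binary tests within a disease stratum, with the corresponding one-degree-of-freedom chi-square test of conditional independence; the 2x2 table is hypothetical.

```python
import math

def conditional_phi(n11, n10, n01, n00):
    """Phi (correlation) coefficient and chi-square statistic for the 2x2 table
    of two binary screening tests within one disease stratum."""
    n = n11 + n10 + n01 + n00
    p1, p2 = (n11 + n10) / n, (n11 + n01) / n   # marginal positive rates
    cov = n11 / n - p1 * p2
    phi = cov / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return phi, n * phi**2   # chi-square (1 df) test of independence

# Hypothetical 2x2 table for two screening exams among diseased subjects;
# this particular table is exactly consistent with conditional independence.
print(conditional_phi(n11=50, n10=30, n01=25, n00=15))  # phi = 0, chi2 = 0
```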

13.
The effect of conditional dependence on the evaluation of diagnostic tests
Vacek PM. Biometrics 1985;41(4):959-968
The accuracy of a new diagnostic test is often determined by comparison with a reference test which also has unknown error rates. Maximum likelihood estimation of the error rates of both tests is possible if they are simultaneously applied to two populations with different disease prevalences. The estimation procedure assumes that the two tests are independent, conditional on a subject's true diagnostic status. If the tests are conditionally dependent, error rates for both tests can be substantially underestimated. Estimators for the prevalence rates in the two populations can be positively or negatively biased, depending on the relative magnitude of the two conditional covariances and the value of the prevalence parameter.

14.
McNemar's test is popular for assessing the difference between proportions when two observations are taken on each experimental unit. It is useful under a variety of epidemiological study designs that produce correlated binary outcomes. In studies involving outcome ascertainment, cost or feasibility concerns often lead researchers to employ error-prone surrogate diagnostic tests. Assuming an available gold standard diagnostic method, we address point and confidence interval estimation of the true difference in proportions and the paired-data odds ratio by incorporating external or internal validation data. We distinguish two special cases, depending on whether it is reasonable to assume that the diagnostic test properties remain the same for both assessments (e.g., at baseline and at follow-up). Likelihood-based analysis yields closed-form estimates when validation data are external and requires numerical optimization when they are internal; the latter approach offers important advantages in terms of robustness and efficient odds ratio estimation. We consider internal validation study designs geared toward optimizing efficiency given a fixed cost allocated for measurements. Two motivating examples are presented, using gold standard and surrogate bivariate binary diagnoses of bacterial vaginosis (BV) in women participating in the HIV Epidemiology Research Study (HERS).
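For orientation, a sketch of the basic gold-standard-data quantities that the article's validation-data methods adjust for misclassification: the McNemar chi-square and the paired-data odds ratio from the discordant counts; the counts are hypothetical.

```python
def mcnemar(b, c):
    """McNemar statistics from the discordant pair counts.

    b: units positive at the first assessment only; c: at the second only.
    Returns the classical chi-square statistic and the paired-data odds ratio.
    """
    chi2 = (b - c) ** 2 / (b + c)   # ~ chi-square with 1 df under H0
    odds_ratio = b / c
    return chi2, odds_ratio

chi2, or_ = mcnemar(b=25, c=10)
print(chi2, or_)   # chi2 ~ 6.43 exceeds 3.84, so reject at the 5% level
```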

15.
Comparing disease prevalence in two groups is an important topic in medical research, and prevalence rates are obtained by classifying subjects according to whether they have the disease. Either high-cost infallible gold-standard classifiers or low-cost fallible classifiers can be used to classify subjects, but statistical analysis based on data sets with misclassification leads to biased results. As a compromise between the two classification approaches, partially validated sets are often used, in which all individuals are classified by fallible classifiers and some are validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulae for disease prevalence studies based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show that (i) the score test produces close-to-nominal levels and is preferred in practice; and (ii) the sample size formula based on the score test is also fairly accurate in terms of empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies.

16.
CMAJ 1983;129(9):947-954
The use of simple maths with the likelihood ratio strategy fits in nicely with our clinical views. By making the most of the entire range of diagnostic test results (i.e., several levels, each with its own likelihood ratio, rather than a single cut-off point and a single ratio) and by permitting us to keep track of the likelihood that a patient has the target disorder at each point along the diagnostic sequence, this strategy allows us to place patients at an extremely high or an extremely low likelihood of disease. Thus, the numbers of patients with ultimately false-positive results (who suffer the slings of labelling and the arrows of needless therapy) and of those with ultimately false-negative results (who therefore miss their chance for diagnosis and, possibly, efficacious therapy) will be dramatically reduced. The following guidelines will be useful in interpreting signs, symptoms and laboratory tests with the likelihood ratio strategy. Seek out, and demand from the clinical or laboratory experts who ought to know, the likelihood ratios for key symptoms and signs, and for several levels (rather than just the positive and negative results) of diagnostic test results. Identify, when feasible, the logical sequence of diagnostic tests. Estimate the pretest probability of disease for the patient and, using either the nomogram or the conversion formulas, apply the likelihood ratio that corresponds to the first diagnostic test result. Remembering that the post-test probability or odds from the first test becomes the pretest probability or odds for the next diagnostic test, repeat the process for all the pertinent symptoms, signs and laboratory studies that pertain to the target disorder. Be aware, however, that these combinations may not be independent, and convergent diagnostic tests, if treated as independent, will combine to overestimate the final post-test probability of disease. You are now far more sophisticated in interpreting diagnostic tests than most of your teachers. In the last part of our series we will show you some rather complex strategies that combine diagnosis and therapy, quantify our as yet nonquantified ideas about use, and require the use of at least a hand calculator.
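A minimal sketch of the arithmetic behind the nomogram: convert the pretest probability to odds, multiply by each test level's likelihood ratio in sequence, and convert back; the likelihood ratios below are hypothetical.

```python
def post_test_probability(pretest_prob, *likelihood_ratios):
    """Chain likelihood ratios: post-test odds = pretest odds x LR1 x LR2 x ...

    Valid only if the tests are (approximately) conditionally independent;
    convergent tests treated as independent overestimate the final probability.
    """
    odds = pretest_prob / (1 - pretest_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Hypothetical: 20% pretest probability, then a sign with LR 4.0 and a
# laboratory result whose level carries LR 2.5
print(round(post_test_probability(0.20, 4.0, 2.5), 3))  # 0.714
```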

17.
The nonparametric Behrens-Fisher hypothesis is the most appropriate null hypothesis for the two-sample comparison when one does not wish to make restrictive assumptions about possible distributions. In this paper, a numerical approach is described by which the likelihood ratio test can be calculated for the nonparametric Behrens-Fisher problem. The approach taken here effectively reduces the number of parameters in the score equations to one by using a recursive formula for the remaining parameters; the resulting one-dimensional problem can be solved numerically. The power of the likelihood ratio test is compared by simulation to that of the generalized Wilcoxon test of Brunner and Munzel. The tests have similar power for all alternatives considered when a simulated null distribution is used to generate cutoff values. The methods are illustrated on shoulder pain data from a clinical trial.
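A sketch of the comparator test only, using SciPy's implementation of the Brunner-Munzel statistic (the paper's likelihood ratio test itself requires the recursive score-equation computation it describes); the data are simulated.

```python
import numpy as np
from scipy.stats import brunnermunzel

rng = np.random.default_rng(3)

# Two samples with unequal variances (the Behrens-Fisher situation)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.4, 2.5, size=45)

# Generalized Wilcoxon test of Brunner and Munzel:
# H0 is P(X < Y) + 0.5 * P(X = Y) = 0.5
stat, p = brunnermunzel(x, y)
print(f"statistic {stat:.3f}  p-value {p:.3f}")
```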

18.
Dendukuri N, Joseph L. Biometrics 2001;57(1):158-167
Many analyses of results from multiple diagnostic tests assume the tests are statistically independent conditional on the true disease status of the subject. This assumption may be violated in practice, especially in situations where none of the tests is a perfectly accurate gold standard. Classical inference for models accounting for the conditional dependence between tests requires that results from at least four different tests be used in order to obtain an identifiable solution, but it is not always feasible to have results from this many tests. We use a Bayesian approach to draw inferences about the disease prevalence and test properties while adjusting for the possibility of conditional dependence between tests, particularly when we have only two tests. We propose both fixed and random effects models. Since with fewer than four tests the problem is nonidentifiable, the posterior distributions are strongly dependent on the prior information about the test properties and the disease prevalence, even with large sample sizes. If the degree of correlation between the tests is known a priori with high precision, then our methods adjust for the dependence between the tests. Otherwise, our methods provide adjusted inferences that incorporate all of the uncertainty inherent in the problem, typically resulting in wider interval estimates. We illustrate our methods using data from a study on the prevalence of Strongyloides infection among Cambodian refugees to Canada.

19.
Prospective studies of diagnostic test accuracy have important advantages over retrospective designs. Yet when the disease being detected by the diagnostic test(s) has a low prevalence rate, a prospective design can require an enormous sample of patients. We consider two strategies to reduce the costs of prospective studies of binary diagnostic tests: stratification and two-phase sampling. Utilizing neither, one, or both of these strategies provides four study design options: (1) the conventional design involving a simple random sample (SRS) of patients from the clinical population; (2) a stratified design in which patients from higher-prevalence subpopulations are more heavily sampled; (3) a simple two-phase design using an SRS in the first phase and selection for the second phase based on the test results from the first; and (4) a two-phase design with stratification in the first phase. We describe estimators for sensitivity and specificity and their variances for each design, along with sample size estimation. We offer some recommendations for choosing among the various designs and illustrate them with two examples.

20.
The pitfall of several reviews of noninvasive venous assessment has been expressing the test data solely in terms of diagnostic accuracy (the ratio of correct tests to all tests performed), a measure that varies with disease prevalence. The advantages of receiver operating characteristic (ROC) curve analysis are twofold: (1) it describes the dynamic relationship between sensitivity (the ratio of true-positive tests to patients with deep venous thrombosis) and specificity (the ratio of true-negative tests to patients without deep venous thrombosis) independent of disease prevalence; and (2) the threshold criterion that defines a positive test can be set by the best balance between sensitivity and specificity and then applied to a given patient population for its diagnostic accuracy. Venous volume plethysmography is a widely used, simple and rapid method. It was compared with the "gold standard" of phlebography in a prospective blind study of 70 limbs clinically suspected of deep venous thrombosis (DVT). Venous volume displacement plethysmography was defined objectively by three quantitative parameters: (1) maximum venous outflow, (2) integer ratio, and (3) segmental venous capacitance ratio. The DVT cases (22 of 70 phlebograms positive) were divided by anatomic location into calf vein DVT or proximal DVT (popliteal vein or above). By combining these three parameters, a balance between sensitivity and specificity was obtained to provide a rapid, objective method for screening patients with suspected DVT.
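A minimal sketch of empirical ROC analysis: threshold-free (FPR, TPR) points and the AUC via the rank formula; the scores and labels are hypothetical, not the plethysmography data.

```python
import numpy as np

def empirical_roc(scores, labels):
    """Empirical ROC points (FPR, TPR) over all thresholds, plus the AUC by
    the rank (Mann-Whitney) formula; labels are 1 for DVT, 0 for no DVT."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    order = np.argsort(-scores)                 # descending thresholds
    tpr = np.cumsum(labels[order]) / labels.sum()
    fpr = np.cumsum(1 - labels[order]) / (1 - labels).sum()
    pos, neg = scores[labels == 1], scores[labels == 0]
    auc = (pos[:, None] > neg).mean() + 0.5 * (pos[:, None] == neg).mean()
    return fpr, tpr, auc

# Hypothetical plethysmography-style scores for 6 limbs
fpr, tpr, auc = empirical_roc([0.9, 0.8, 0.7, 0.55, 0.5, 0.3], [1, 1, 0, 1, 0, 0])
print(auc)   # 0.889: probability a DVT limb outscores a non-DVT limb
```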

