Similar Articles
1.
Dukic V  Gatsonis C 《Biometrics》2003,59(4):936-946
Current meta-analytic methods for diagnostic test accuracy are generally applicable to a selection of studies reporting only estimates of sensitivity and specificity, or at most, to studies whose results are reported using an equal number of ordered categories. In this article, we propose a new meta-analytic method to evaluate test accuracy and arrive at a summary receiver operating characteristic (ROC) curve for a collection of studies evaluating diagnostic tests, even when test results are reported in an unequal number of nonnested ordered categories. We discuss both non-Bayesian and Bayesian formulations of the approach. In the Bayesian setting, we propose several ways to construct summary ROC curves and their credible bands. We illustrate our approach with data from a recently published meta-analysis evaluating a single serum progesterone test for diagnosing pregnancy failure.
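Summary ROC curves built from ordinal categories are often expressed in binormal form. The abstract does not state the parametric form of the authors' curve, so the sketch below is only the generic binormal curve ROC(t) = Φ(a + b·Φ⁻¹(t)) and its area Φ(a/√(1+b²)); the parameter values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def binormal_roc(fpr: float, a: float, b: float) -> float:
    """True positive rate at false positive rate `fpr` under a binormal model:
    ROC(t) = Phi(a + b * Phi^{-1}(t)), for 0 < t < 1."""
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))

def binormal_auc(a: float, b: float) -> float:
    """Area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return _STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)

# Hypothetical summary parameters, e.g. pooled across studies.
a, b = 1.2, 0.9
for t in (0.05, 0.10, 0.20):
    print(f"FPR={t:.2f}  TPR={binormal_roc(t, a, b):.3f}")
print(f"AUC={binormal_auc(a, b):.3f}")
```

With a = 0 and b = 1 the curve reduces to the chance diagonal, which is a quick sanity check on any fitted summary curve.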

2.
Many medical diagnostic studies involve three ordinal diagnostic groups, in which diagnostic accuracy can be summarized by the volume or partial volume under a receiver operating characteristic (ROC) surface. In this paper, we study the statistical comparison of diagnostic accuracy across multiple diagnostic tests when three ordinal diagnostic groups are involved. Under the assumption that the test results follow a multivariate normal distribution within each diagnostic group, we provide the asymptotic variances and covariances of the maximum likelihood estimates of the volumes under the ROC surfaces of multiple diagnostic tests, and we propose statistical tests of whether the diagnostic accuracy, as measured by the volume under the ROC surface, is the same across tests. We also propose a confidence interval for the difference between two volumes under two ROC surfaces. Our approach depends crucially on the normality assumptions and might not be robust when they are violated. Finally, we apply the proposed methodology to a real data set of 118 subjects to compare the accuracy of multiple neuropsychological tests in diagnosing early-stage Alzheimer's disease (AD).
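The volume under the ROC surface (VUS) equals the probability that one randomly drawn result from each of the three ordered groups falls in the correct order. The paper estimates it under multivariate normality; the sketch below swaps in the simpler nonparametric (empirical) estimate, counting ties as failures. A useless test has VUS = 1/6.

```python
from itertools import product

def empirical_vus(group1, group2, group3):
    """Estimate the volume under the ROC surface as the proportion of
    triples (x, y, z), one from each ordered group, with x < y < z."""
    triples = len(group1) * len(group2) * len(group3)
    if triples == 0:
        raise ValueError("each group needs at least one observation")
    hits = sum(1 for x, y, z in product(group1, group2, group3) if x < y < z)
    return hits / triples

# Hypothetical scores for healthy, mildly impaired, and early-AD subjects.
print(empirical_vus([0.2, 0.5, 0.9], [0.7, 1.1], [1.0, 1.6, 2.0]))
```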

3.
Assessing the diagnostic accuracy of a sequence of tests
We consider the assessment of the overall diagnostic accuracy of a sequence of tests (e.g. repeated screening tests). The complexity of diagnostic choices when two or more continuous tests are used in sequence is illustrated, and different approaches to reducing the dimensionality are presented and evaluated. For instance, in practice, when a single test is used repeatedly in routine screening, the same screening threshold is typically used at each screening visit. One possible alternative is to adjust the threshold at successive visits according to individual-specific characteristics. Such possibilities represent particular slices of a receiver operating characteristic surface, corresponding to all possible combinations of test thresholds. Both the development and the examples focus on the setting where an overall test is defined to be positive if any of the individual tests is positive ('believe the positive'). The ideas developed are illustrated by an application to screening for prostate cancer using prostate-specific antigen.
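Each vector of thresholds gives one point on the ROC surface of a "believe the positive" rule. As a minimal sketch (the function and argument names are hypothetical, not from the paper), the overall sensitivity and specificity at one threshold vector can be computed directly from per-subject results:

```python
def believe_the_positive(scores, diseased, thresholds):
    """Sensitivity and specificity of the combined rule 'positive if any
    test exceeds its threshold'.

    scores: list of per-subject tuples (x1, x2, ...) of continuous results
    diseased: list of booleans, True if the subject has the disease
    thresholds: one threshold per test (may differ between visits)
    """
    tp = fn = tn = fp = 0
    for xs, d in zip(scores, diseased):
        positive = any(x > c for x, c in zip(xs, thresholds))
        if d:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

Sweeping the thresholds independently traces the full surface; tying them together (the same threshold at every visit) picks out the one-dimensional slice used in routine screening.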

4.
Alonzo TA  Kittelson JM 《Biometrics》2006,62(2):605-612
The accuracy (sensitivity and specificity) of a new screening test can be compared with that of a standard test by applying both tests to a group of subjects in which disease status can be determined by a gold standard (GS) test. However, it is not always feasible to administer a GS test to all study subjects. For example, a study is planned to determine whether a new screening test for cervical cancer ("ThinPrep") is better than the standard test ("Pap"), and in this setting it is not feasible (or ethical) to determine disease status by biopsy in order to identify women with and without disease for participation in a study. When determination of disease status is not possible for all study subjects, the relative accuracy of two screening tests can still be estimated by using a paired screen-positive (PSP) design in which all subjects receive both screening tests but receive the GS test only if one of the screening tests is positive. Unfortunately, in the cervical cancer example, the PSP design is also infeasible because it is not technically possible to administer both the ThinPrep and Pap at the same time. In this article, we describe a randomized paired screen-positive (RPSP) design in which subjects are randomized to receive one of the two screening tests initially, and receive the other screening test and the GS only if the first screening test is positive. We derive maximum likelihood estimators and confidence intervals for the relative accuracy of the two screening tests, and assess the small-sample behavior of these estimators using simulation studies. Sample size formulae are derived and applied to the cervical cancer screening trial example, and the efficiency of the RPSP design is compared with other designs.

5.
  1. Obtaining accurate estimates of disease prevalence is crucial for the monitoring and management of wildlife populations but can be difficult if different diagnostic tests yield conflicting results and if the accuracy of each diagnostic test is unknown. Bayesian latent class analysis (BLCA) modeling offers a potential solution, providing estimates of prevalence levels and diagnostic test accuracy under the realistic assumption that no diagnostic test is perfect.
  2. In typical applications of this approach, the specificity of one test is fixed at or close to 100%, allowing the model to simultaneously estimate the sensitivity and specificity of all other tests, in addition to infection prevalence. In wildlife systems, a test with near‐perfect specificity is not always available, so we simulated data to investigate how decreasing this fixed specificity value affects the accuracy of model estimates.
  3. We used simulations to explore how the trade‐off between diagnostic test specificity and sensitivity impacts prevalence estimates and found that directional biases depend on pathogen prevalence. Both the precision and accuracy of results depend on the sample size, the diagnostic tests used, and the true infection prevalence, so these factors should be considered when applying BLCA to estimate disease prevalence and diagnostic test accuracy in wildlife systems. A wildlife disease case study, focusing on leptospirosis in California sea lions, demonstrated the potential for Bayesian latent class methods to provide reliable estimates under real‐world conditions.
  4. We delineate conditions under which BLCA improves upon the results from a single diagnostic across a range of prevalence levels and sample sizes, demonstrating when this method is preferable for disease ecologists working in a wide variety of pathogen systems.
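The BLCA model itself is not reproduced here. As a simpler single-test illustration of why the fixed specificity matters, the sketch below uses the classical Rogan-Gladen correction, apparent = π·Se + (1 − π)·(1 − Sp), solved for the true prevalence π; the observed proportion and sensitivity are hypothetical.

```python
def apparent_prevalence(prev, se, sp):
    """P(test positive) for an imperfect test: prev*Se + (1-prev)*(1-Sp)."""
    return prev * se + (1.0 - prev) * (1.0 - sp)

def rogan_gladen(apparent, se, sp):
    """Back-correct an apparent prevalence for assumed Se and Sp.
    Requires Se + Sp > 1; the result is clipped to [0, 1]."""
    youden = se + sp - 1.0
    if youden <= 0:
        raise ValueError("Se + Sp must exceed 1")
    return min(1.0, max(0.0, (apparent + sp - 1.0) / youden))

# How the corrected prevalence shifts as the assumed specificity is relaxed.
observed = 0.20  # hypothetical proportion of positive tests
for sp in (1.00, 0.98, 0.95, 0.90):
    print(sp, round(rogan_gladen(observed, se=0.85, sp=sp), 3))
```

At low prevalence, a few points of lost specificity account for a large share of the positives, which is why the estimated prevalence moves so much as the assumed specificity falls.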

6.
Leisenring W  Alonzo T  Pepe MS 《Biometrics》2000,56(2):345-351
Positive and negative predictive values of a diagnostic test are key clinically relevant measures of test accuracy. Surprisingly, statistical methods for comparing tests with regard to these parameters have not been available for the most common study design, in which each test is applied to each study individual. In this paper, we propose a statistic for comparing the predictive values of two diagnostic tests using this paired study design. The proposed statistic is a score statistic derived from a marginal regression model and bears some relation to McNemar's statistic. Just as McNemar's statistic can be used to compare the sensitivities and specificities of diagnostic tests (parameters that condition on disease status), our statistic can be considered an analog of McNemar's test for comparing predictive values (parameters that condition on test outcome). We report the results of a simulation study designed to examine the properties of this test under a variety of conditions. The method is illustrated with data from a study of methods for diagnosing coronary artery disease.
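For reference, the classical McNemar test that the abstract uses as its analogy compares paired sensitivities through the discordant pairs among diseased subjects. The sketch below is that standard statistic, not the authors' predictive-value score statistic; the chi-square(1) p-value uses P(X > s) = erfc(√(s/2)).

```python
from math import erfc, sqrt

def mcnemar(b, c, continuity=False):
    """McNemar chi-square (1 df) for paired binary outcomes.

    b: pairs positive on test A only; c: pairs positive on test B only.
    Comparing sensitivities uses the discordant pairs among diseased
    subjects; comparing specificities uses those among non-diseased subjects.
    Returns (statistic, two-sided p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical diseased subjects: 15 detected by A only, 5 by B only.
print(mcnemar(15, 5))
```

Because predictive values condition on test outcome rather than disease status, the discordant-pair trick does not carry over directly, which is the gap the paper's statistic addresses.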

7.
Liu D  Zhou XH 《Biometrics》2011,67(3):906-916
Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this article, we propose semiparametric estimation of covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates' effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. Under the assumption that the gold standard is missing at random (MAR), we consider weighted estimating equations for the location-scale parameters and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease study.

8.
Recent advances in technology promise to yield a multitude of tests for disease diagnosis and prognosis. When multiple sources of information are available, it is often of interest to construct a composite score that can provide better classification accuracy than any individual measurement. In this paper, we consider robust procedures for optimally combining tests when test results are measured prior to disease onset and disease status evolves over time. To account for censoring of disease onset time, the most commonly used approach to combining tests to detect subsequent disease status is to fit a proportional hazards model (Cox, 1972) and use the estimated risk score. However, simulation studies suggested that such a risk score may have poor accuracy when the proportional hazards assumption fails. We propose the use of a nonparametric transformation model (Han, 1987) as a working model to derive an optimal composite score with theoretical justification. We demonstrate that the proposed score is the optimal score when the model holds and is optimal "on average" among linear scores even if the model fails. Time-dependent sensitivity, specificity, and receiver operating characteristic curve functions are used to quantify the accuracy of the resulting composite score. We provide consistent and asymptotically Gaussian estimators of these accuracy measures. A simple model-free resampling procedure is proposed to obtain consistent variance estimators for all of these measures. We illustrate the new proposals with simulation studies and an analysis of a breast cancer gene expression data set.
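Time-dependent sensitivity and specificity treat subjects with onset by a horizon t as cases and those still disease-free at t as controls. The minimal sketch below assumes every onset time is fully observed; handling censored onset times, as the paper does, requires additional machinery that this sketch omits. Names are illustrative.

```python
def time_dependent_accuracy(scores, onset_times, cutoff, horizon):
    """Cumulative/dynamic sensitivity and specificity at time `horizon`,
    assuming onset times are fully observed (no censoring).

    Cases: onset time <= horizon; controls: onset time > horizon.
    Sensitivity = P(score > cutoff | case);
    specificity = P(score <= cutoff | control).
    """
    cases = [s for s, t in zip(scores, onset_times) if t <= horizon]
    controls = [s for s, t in zip(scores, onset_times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    sens = sum(s > cutoff for s in cases) / len(cases)
    spec = sum(s <= cutoff for s in controls) / len(controls)
    return sens, spec
```

Varying `cutoff` at a fixed horizon traces the time-dependent ROC curve for a composite score.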

9.
Planning studies involving diagnostic tests is complicated by the fact that virtually no test provides perfectly accurate results. The misclassification induced by imperfect sensitivities and specificities of diagnostic tests must be taken into account, whether the primary goal of the study is to estimate the prevalence of a disease in a population or to investigate the properties of a new diagnostic test. Previous work on sample size requirements for estimating the prevalence of disease in the case of a single imperfect test showed very large discrepancies in size when compared to methods that assume a perfect test. In this article we extend these methods to include two conditionally independent imperfect tests, and apply several different criteria for Bayesian sample size determination to the design of such studies. We consider both disease prevalence studies and studies designed to estimate the sensitivity and specificity of diagnostic tests. As the problem is typically nonidentifiable, we investigate the limits on the accuracy of parameter estimation as the sample size approaches infinity. Through two examples from infectious diseases, we illustrate the changes in sample sizes that arise when two tests are applied to individuals in a study rather than a single test. Although smaller sample sizes are often found in the two-test situation, they can still be prohibitively large unless accurate information is available about the sensitivities and specificities of the tests being used.
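Under conditional independence given true disease status, the four result patterns of two tests have the probabilities computed below. The sketch makes the nonidentifiability concrete: the data supply only three free cell probabilities, while the model has five unknowns (prevalence plus two sensitivities and two specificities).

```python
def two_test_cell_probs(prev, se1, sp1, se2, sp2):
    """Probabilities of the four result patterns (T1, T2) for two tests that are
    conditionally independent given true disease status."""
    probs = {}
    for t1 in (1, 0):
        for t2 in (1, 0):
            p_dis = (se1 if t1 else 1 - se1) * (se2 if t2 else 1 - se2)
            p_non = ((1 - sp1) if t1 else sp1) * ((1 - sp2) if t2 else sp2)
            probs[(t1, t2)] = prev * p_dis + (1 - prev) * p_non
    return probs

# Hypothetical parameters for two imperfect tests.
print(two_test_cell_probs(0.1, 0.85, 0.95, 0.70, 0.99))
```

Because different parameter combinations can yield the same four probabilities, prior information on sensitivities and specificities is what keeps the posterior from staying wide as the sample size grows.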

10.
Zheng Y  Barlow WE  Cutter G 《Biometrics》2005,61(1):259-268
The performance of a medical diagnostic test is often evaluated by comparing the outcome of the test to the patient's true disease state. Receiver operating characteristic analysis may then be used to summarize test accuracy. However, such analysis may encounter several complications in actual practice. One complication is verification bias, i.e., gold standard assessment of disease status may only be partially available and the probability of ascertainment of disease may depend on both the test result and characteristics of the subject. A second issue is that tests interpreted by the same rater may not be independent. Using estimating equations, we generalize previous methods that address these problems. We contrast the performance of alternative estimators of accuracy using robust sandwich variance estimators to permit valid asymptotic inference. We suggest that in the context of an observational cohort study where rich covariate information is available, a weighted estimating equations approach may be preferable for its robustness against model misspecification. We apply the methodology to mammography as performed by community radiologists.
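The weighting idea behind correcting verification bias can be seen in a simple inverse-probability-weighted estimate of sensitivity and specificity: each verified subject counts as 1/p subjects, where p is that subject's verification probability. This is a minimal sketch assuming the probabilities are known or estimated elsewhere; it omits the covariates and rater clustering the paper handles.

```python
def ipw_accuracy(records):
    """Inverse-probability-weighted sensitivity and specificity under
    partial verification.

    records: iterable of (test_positive, verified, diseased, p_verify), where
    p_verify is the probability that this subject was sent for verification
    (assumed known or estimated separately). `diseased` is ignored when
    verified is False.
    """
    tp = w_diseased = tn = w_healthy = 0.0
    for test_pos, verified, diseased, p_verify in records:
        if not verified:
            continue
        w = 1.0 / p_verify  # each verified subject stands in for 1/p subjects
        if diseased:
            w_diseased += w
            tp += w * bool(test_pos)
        else:
            w_healthy += w
            tn += w * (not test_pos)
    return tp / w_diseased, tn / w_healthy
```

Dropping the weights (using only verified subjects at face value) over-represents screen positives and typically inflates sensitivity.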

11.
Data with varying ages at disease onset arise frequently in studies mapping disease-associated genes. Naively combining affected subjects with different ages at onset may greatly reduce the power to detect disease genes. In this paper, we present a weighted score test statistic to detect linkage between a marker and a latent disease locus using affected sib pairs, where the weights give each affected sib pair a differential contribution to the test statistic according to its ages at onset. We show that the weighted test has the correct type I error rate asymptotically. For illustration, we analyze a data set from the 12th Genetic Analysis Workshop. The results show that the weighted tests appear to pinpoint the location of latent disease genes better than the mean IBD test that weights all pairs equally regardless of age at onset. To avoid the potential power loss due to improper weights, we propose a combined test statistic that takes the maximum of two tests, one weighted by the age-dependent penetrance function and the other possibly invariant to age. In an analytical study comparing the combined test with the age-weighted and equal-weight tests, we show that the combined test retains most of the power of the better of its two component tests.
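A generic weighted mean-IBD statistic shows the structure of such a test. Under no linkage, with fully informative markers, a sib pair shares 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, so the proportion shared has mean 1/2 and variance 1/8. The sketch below standardizes a weighted sum accordingly; the weights are hypothetical, and this is not necessarily the paper's exact score statistic.

```python
from math import sqrt

def weighted_ibd_z(ibd_proportions, weights):
    """Weighted mean-IBD statistic for affected sib pairs.

    ibd_proportions: estimated proportion of alleles shared IBD per pair (0, 0.5, 1).
    Under no linkage with fully informative markers, each proportion has mean 1/2
    and variance 1/8, so the statistic is approximately N(0, 1).
    """
    num = sum(w * (p - 0.5) for p, w in zip(ibd_proportions, weights))
    den = sqrt(sum(w * w for w in weights) / 8.0)
    return num / den

def combined_max(ibd_proportions, age_weights):
    """Maximum of the age-weighted and equal-weight statistics (one-sided)."""
    equal = [1.0] * len(ibd_proportions)
    return max(weighted_ibd_z(ibd_proportions, age_weights),
               weighted_ibd_z(ibd_proportions, equal))
```

The maximum of two correlated standard normal statistics is not itself standard normal, so the combined test needs an adjusted critical value to keep its type I error rate.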

12.
Purpose: To develop a real-time alignment monitoring system (RAMS) that compensates for the limitations of the conventional room-laser-based alignment system, and to verify its feasibility through reproducibility and accuracy tests.
Methods: The RAMS was composed of a room laser sensing array (RLSA), an electric circuit, an analog-to-digital converter (ADC), and a control PC. In the RLSA, photodiodes were arranged in a pattern that gives the RAMS a resolution of 1 mm, and they were used for quantitative assessment of the alignment condition. To verify the usability of the system, we conducted tests of temporal reproducibility, repeatability, and accuracy.
Results: The temporal reproducibility test suggested that the RAMS signal was stable over time. The repeatability test yielded a maximum coefficient of variation of 1.14%, suggesting that the signal was stable over repeated set-ups. The accuracy test confirmed that the “on” and “off” signals could be distinguished by signal intensity, since the “off” signal was below 75% of the “on” signal in every case. In addition, we confirmed that the system can detect 1 mm of movement by monitoring the pattern of “on” and “off” signals.
Conclusion: We developed a room-laser-based alignment monitoring system. The feasibility test verified that the system is capable of quantitative alignment monitoring in real time. We expect the RAMS to demonstrate the potential of room-laser-based alignment monitoring.
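The two quantities used in the feasibility tests can be sketched in a few lines: the coefficient of variation of repeated readings, and an on/off decision against a reference "on" level. The 75% cut-off below is a hypothetical parameter echoing the margin reported above, not a documented RAMS setting.

```python
from statistics import mean, stdev

def coefficient_of_variation(readings):
    """Sample coefficient of variation, in percent."""
    return 100.0 * stdev(readings) / mean(readings)

def classify_photodiodes(signals, on_level, off_fraction=0.75):
    """Label each photodiode 'on' or 'off' relative to a reference 'on' level.
    off_fraction is a hypothetical cut-off, not a documented RAMS parameter."""
    return ["on" if s >= off_fraction * on_level else "off" for s in signals]

# Hypothetical repeated readings from one photodiode across set-ups.
print(coefficient_of_variation([2.01, 1.99, 2.03, 1.98]))
print(classify_photodiodes([1.0, 0.5, 0.8], on_level=1.0))
```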

13.
Disease screening is a fundamental part of health care. To evaluate the accuracy of a new screening modality, ideally the results of the screening test are compared with those of a definitive diagnostic test in a set of study subjects. However, definitive diagnostic tests are often invasive and cannot be applied to subjects whose screening tests are negative for disease. For example, in cancer screening, the assessment of true disease status requires a biopsy sample, which for ethical reasons can only be obtained if a subject's screening test indicates presence of cancer. Although the absolute accuracy of screening tests cannot be evaluated in such circumstances, it is possible to compare the accuracies of screening tests. Specifically, using relative true positive rate (the ratio of the true positive rate of one test to another) and relative false positive rate (the ratio of the false positive rates of two tests) as measures of relative accuracy, we show that inference about relative accuracy can be made from such studies. Analogies with case-control studies can be drawn where inference about absolute risk cannot be made, but inference about relative risk can. In this paper, we develop a marginal regression analysis framework for making inference about relative accuracy when only screen positives are followed for true disease. In this context factors influencing the relative accuracies of tests can be evaluated. It is important to determine such factors in order to understand circumstances in which one test is preferable to another. The methods are applied to two cancer screening studies, one concerning the effect of race on screening for prostate cancer and the other concerning the effect of tumour grade on the detection of cervical cancer with cytology versus cervicography screening.
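The reason relative accuracy remains estimable is that the unknown numbers of diseased and non-diseased subjects cancel from each ratio: rTPR = TPR_A / TPR_B = n(A+, D+) / n(B+, D+), and both counts come from verified screen positives. The sketch below computes these crude ratios for a paired design; it does not implement the paper's regression framework.

```python
def relative_accuracy(results):
    """Relative TPR and FPR of test A versus test B when both tests are applied
    to everyone but only screen positives (A or B positive) are verified.

    results: iterable of (a_positive, b_positive, diseased); diseased may be
    None for subjects negative on both tests (never verified).
    rTPR = TPR_A / TPR_B and rFPR = FPR_A / FPR_B; the unknown numbers of
    diseased and non-diseased subjects cancel from each ratio.
    """
    a_d = b_d = a_h = b_h = 0
    for a_pos, b_pos, diseased in results:
        if not (a_pos or b_pos):
            continue  # double negatives are never verified
        if diseased:
            a_d += a_pos
            b_d += b_pos
        else:
            a_h += a_pos
            b_h += b_pos
    return a_d / b_d, a_h / b_h
```

An rTPR above 1 with an rFPR near 1 would favor test A; the function raises `ZeroDivisionError` if test B has no positives in a disease group.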

14.
The receiver operating characteristic curve is a popular tool to characterize the capabilities of diagnostic tests with continuous or ordinal responses. One common design for assessing the accuracy of diagnostic tests involves multiple readers and multiple tests, in which all readers read all test results from the same patients. This design is most commonly used in a radiology setting, where the results of diagnostic tests depend on a radiologist's subjective interpretation. The most widely used approach for analyzing data from such a study is the Dorfman-Berbaum-Metz (DBM) method (Dorfman et al., 1992), which applies a standard analysis of variance (ANOVA) model to the jackknife pseudovalues of the areas under the ROC curves (AUCs). Although the DBM method has performed well in published simulation studies, there is no clear theoretical basis for this approach. In this paper, focusing on continuous outcomes, we investigate its theoretical basis. Our results indicate that the DBM method does not satisfy the usual assumptions of standard ANOVA models and thus might lead to erroneous inference. We then propose a marginal model approach based on the AUCs that can also adjust for covariates. Consistent and asymptotically normal estimators are derived for the regression coefficients. We compare our approach with the DBM method via simulation and by an application to data from a breast cancer study. The simulation results show that both our method and the DBM method perform well when the tests under study have the same accuracy, and that our method outperforms the DBM method for inference on individual AUCs when the accuracies differ. The marginal model approach can be easily extended to ordinal outcomes.
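The first step of the DBM method, turning one reader-test AUC into per-case jackknife pseudovalues, can be sketched with the Mann-Whitney AUC: each pseudovalue is N·AUC − (N − 1)·AUC₍₋ₖ₎, deleting one case at a time. This sketch covers a single reader-test combination and omits the subsequent ANOVA; it needs at least two cases in each group.

```python
def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of the AUC: P(X_d > X_h) + 0.5 * P(X_d == X_h)."""
    total = 0.0
    for x in diseased:
        for y in healthy:
            total += 1.0 if x > y else 0.5 if x == y else 0.0
    return total / (len(diseased) * len(healthy))

def jackknife_pseudovalues(diseased, healthy):
    """Case-level jackknife pseudovalues of the AUC: N*AUC - (N-1)*AUC_(-k),
    deleting one case (diseased or healthy) at a time; N = total cases."""
    n = len(diseased) + len(healthy)
    full = empirical_auc(diseased, healthy)
    pseudo = []
    for i in range(len(diseased)):
        loo = empirical_auc(diseased[:i] + diseased[i + 1:], healthy)
        pseudo.append(n * full - (n - 1) * loo)
    for j in range(len(healthy)):
        loo = empirical_auc(diseased, healthy[:j] + healthy[j + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo
```

DBM then treats these pseudovalues as if they were independent observations in a readers × tests × cases ANOVA, which is the assumption the paper scrutinizes.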

15.
Absence of a perfect reference test is an acknowledged source of bias in diagnostic studies. In the case of tuberculous pleuritis, standard reference tests such as smear microscopy, culture and biopsy have poor sensitivity. Yet meta‐analyses of new tests for this disease have always assumed the reference standard is perfect, leading to biased estimates of the new test’s accuracy. We describe a method for joint meta‐analysis of sensitivity and specificity of the diagnostic test under evaluation, while considering the imperfect nature of the reference standard. We use a Bayesian hierarchical model that takes into account within‐ and between‐study variability. We show how to obtain pooled estimates of sensitivity and specificity, and how to plot a hierarchical summary receiver operating characteristic curve. We describe extensions of the model to situations where multiple reference tests are used, and where index and reference tests are conditionally dependent. The performance of the model is evaluated using simulations and illustrated using data from a meta‐analysis of nucleic acid amplification tests (NAATs) for tuberculous pleuritis. The estimate of NAAT specificity was higher and the sensitivity lower compared to a model that assumed that the reference test was perfect.

16.
In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models where the unknown gold standard test is treated as a latent variable are often used. However, these models have been criticized in the literature from both a conceptual and a robustness perspective. As an alternative, we propose an approach where we exploit an imperfect reference standard with unknown diagnostic accuracy and conduct sensitivity analysis by varying this accuracy over scientifically reasonable ranges. In this article, a latent class model with crossed random effects is proposed for estimating the diagnostic accuracy of regional obstetrics and gynaecological (OB/GYN) physicians in diagnosing endometriosis. To avoid the pitfalls of models without a gold standard, we exploit the diagnostic results of a group of OB/GYN physicians with an international reputation for the diagnosis of endometriosis. We construct an ordinal reference standard based on the discordance among these international experts and propose a mechanism for conducting sensitivity analysis relative to the unknown diagnostic accuracy among them. A Monte Carlo EM algorithm is proposed for parameter estimation and a BIC‐type model selection procedure is presented. Through simulations and data analysis we show that this new approach provides a useful alternative to traditional latent class modeling approaches used in this setting.

17.
A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is designed to distinguish humans from machines. Most of the existing tests require reading distorted text embedded in a background image. However, many existing CAPTCHAs are either too difficult for humans due to excessive distortions or are trivial for automated algorithms to solve. These CAPTCHAs also suffer from inherent language as well as alphabet dependencies and are not equally convenient for people of different demographics. Therefore, there is a need to devise other Turing tests which can mitigate these challenges. One such test is matching two faces to establish if they belong to the same individual or not. Utilizing face recognition as the Turing test, we propose FR-CAPTCHA based on finding matching pairs of human faces in an image. We observe that, compared to existing implementations, FR-CAPTCHA achieves a human accuracy of 94% and is robust against automated attacks.

18.
The Exact Test for Cytonuclear Disequilibria
C. J. Basten  M. A. Asmussen 《Genetics》1997,146(3):1165-1171
We extend the analysis of the statistical properties of cytonuclear disequilibria in two major ways. First, we develop the asymptotic sampling theory for the nonrandom associations between the alleles at a haploid cytoplasmic locus and the alleles and genotypes at a diploid nuclear locus, when there is an arbitrary number of alleles at each marker. This includes the derivation of the maximum likelihood estimators and their sampling variances for each disequilibrium measure, together with simple tests of the null hypothesis of no disequilibrium. In addition to these new asymptotic tests, we provide the first implementation of Fisher's exact test for the genotypic cytonuclear disequilibria and some approximations of the exact test. We also outline an exact test for allelic cytonuclear disequilibria in multiallelic systems. An exact test should be used for data sets when either the marginal frequencies are extreme or the sample size is small. The utility of this new sampling theory is illustrated through applications to recent nuclear-mtDNA and nuclear-cpDNA data sets. The results also apply to population surveys of nuclear loci in conjunction with markers in cytoplasmically inherited microorganisms.
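For the simplest case (a diallelic nuclear locus A/a and a two-state cytoplasmic marker C/c), each disequilibrium is a joint frequency minus the product of its marginal frequencies. The sketch below computes point estimates of the three genotypic measures and the allelic measure; it does not implement the exact test, and the counts in the usage line are hypothetical.

```python
def cytonuclear_disequilibria(counts):
    """Genotypic and allelic cytonuclear disequilibria for a diallelic nuclear
    locus (A/a) and a two-state cytoplasmic marker (C/c).

    counts: dict mapping (genotype, cytotype) -> number of individuals, with
    genotype in {'AA', 'Aa', 'aa'} and cytotype in {'C', 'c'}.
    Returns (D_AA, D_Aa, D_aa, D_allelic), each measured relative to cytotype C.
    """
    n = sum(counts.values())
    freq = {key: value / n for key, value in counts.items()}
    p_c = sum(freq.get((g, "C"), 0.0) for g in ("AA", "Aa", "aa"))
    geno = {g: freq.get((g, "C"), 0.0) + freq.get((g, "c"), 0.0)
            for g in ("AA", "Aa", "aa")}
    # Genotypic disequilibria: joint genotype-cytotype frequency minus product.
    d = {g: freq.get((g, "C"), 0.0) - geno[g] * p_c for g in geno}
    # Allelic disequilibrium: joint frequency of A with C minus product of marginals.
    p_a = geno["AA"] + 0.5 * geno["Aa"]
    joint_a_c = freq.get(("AA", "C"), 0.0) + 0.5 * freq.get(("Aa", "C"), 0.0)
    d_allelic = joint_a_c - p_a * p_c
    return d["AA"], d["Aa"], d["aa"], d_allelic

print(cytonuclear_disequilibria({("AA", "C"): 30, ("Aa", "C"): 10, ("aa", "C"): 10,
                                 ("AA", "c"): 10, ("Aa", "c"): 20, ("aa", "c"): 20}))
```

By construction the three genotypic measures sum to zero, and the allelic measure equals D_AA + D_Aa/2.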

19.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, maximum likelihood estimation for a latent class model via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data in which test sensitivity changes over time. We also propose two simplified, nonparametric methods that use data-based indicator variables for disease status, and we compare their accuracy with the maximum likelihood estimation (MLE) results. We find that for tests with high specificity, the simpler approximations may perform as well as the MLE.
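The paper's longitudinal extension is not reproduced here. The sketch below shows only the basic cross-sectional latent class model fitted by EM, assuming three binary tests that are conditionally independent given disease status (with one population, three tests give 7 free cell probabilities for 7 parameters, the minimum for identifiability). Function names, the starting values, and the fixed iteration count are illustrative choices.

```python
from itertools import product

def expected_pattern_counts(prev, se, sp, n=10000):
    """Expected counts of each 0/1 result pattern under the latent class model."""
    counts = {}
    for pattern in product((0, 1), repeat=len(se)):
        p_d, p_h = prev, 1.0 - prev
        for j, r in enumerate(pattern):
            p_d *= se[j] if r else 1.0 - se[j]
            p_h *= (1.0 - sp[j]) if r else sp[j]
        counts[pattern] = n * (p_d + p_h)
    return counts

def latent_class_em(pattern_counts, n_tests, iters=5000, init=(0.5, 0.8, 0.8)):
    """EM for a two-class latent class model with conditionally independent
    binary tests (no gold standard). Returns (prevalence, [Se_j], [Sp_j])."""
    prev, se0, sp0 = init
    se, sp = [se0] * n_tests, [sp0] * n_tests
    total = sum(pattern_counts.values())
    for _ in range(iters):
        # E-step: posterior probability of disease for each result pattern.
        post = {}
        for pattern in pattern_counts:
            like_d, like_h = prev, 1.0 - prev
            for j, r in enumerate(pattern):
                like_d *= se[j] if r else 1.0 - se[j]
                like_h *= (1.0 - sp[j]) if r else sp[j]
            post[pattern] = like_d / (like_d + like_h)
        # M-step: posterior-weighted proportions.
        w_d = sum(n * post[p] for p, n in pattern_counts.items())
        w_h = total - w_d
        prev = w_d / total
        se = [sum(n * post[p] for p, n in pattern_counts.items() if p[j]) / w_d
              for j in range(n_tests)]
        sp = [sum(n * (1.0 - post[p]) for p, n in pattern_counts.items() if not p[j]) / w_h
              for j in range(n_tests)]
    return prev, se, sp
```

Starting from Se + Sp > 1 steers EM away from the label-switched solution in which "diseased" and "healthy" are interchanged.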

20.
Recently, Brown, Hwang, and Munk (1998) proposed an unbiased test for the average equivalence problem that noticeably improves in power on the standard two one‐sided tests procedure. Nevertheless, from a practical point of view there are some objections to this test, which mainly concern the ‘unusual’ shape of its critical region. We show that every unbiased test has a critical region with such an ‘unusual’ shape. We therefore discuss three (biased) modifications of the unbiased test and conclude that a suitable modification represents a good compromise between a most powerful test and a test with an appealing shape of its critical region. To facilitate the use of these tests, figures showing the rejection regions are provided. Finally, we compare all tests in an example from neurophysiology, which shows that it is beneficial to use these improved tests instead of the two one‐sided tests procedure.
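For comparison, the baseline two one-sided tests (TOST) procedure declares equivalence only when both one-sided nulls (true difference ≤ −Δ and true difference ≥ Δ) are rejected. The sketch below uses a large-sample normal approximation in place of the usual t-distribution, and it is the standard procedure, not the unbiased test or its modifications.

```python
from statistics import NormalDist

def tost_equivalent(diff, se, margin, alpha=0.05):
    """Two one-sided tests (large-sample normal version) for average equivalence.

    Declares equivalence when both H01: diff <= -margin and H02: diff >= margin
    are rejected at level alpha, i.e. when the (1 - 2*alpha) confidence interval
    for the true difference lies inside (-margin, margin).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower_stat = (diff + margin) / se   # tests H01
    upper_stat = (margin - diff) / se   # tests H02
    return lower_stat > z and upper_stat > z

# Hypothetical estimated difference, standard error, and equivalence margin.
print(tost_equivalent(0.1, 0.05, 0.3))
```

In the (difference, standard error) plane, TOST rejects in a triangle-like region that shrinks to nothing as the standard error grows, which is part of why it loses power relative to the unbiased test.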

