Similar Articles
1.
Alonzo TA, Kittelson JM. Biometrics 2006;62(2):605-612.
The accuracy (sensitivity and specificity) of a new screening test can be compared with that of a standard test by applying both tests to a group of subjects in which disease status can be determined by a gold standard (GS) test. However, it is not always feasible to administer a GS test to all study subjects. For example, a study is planned to determine whether a new screening test for cervical cancer ("ThinPrep") is better than the standard test ("Pap"), and in this setting it is not feasible (or ethical) to determine disease status by biopsy in order to identify women with and without disease for participation in a study. When determination of disease status is not possible for all study subjects, the relative accuracy of two screening tests can still be estimated by using a paired screen-positive (PSP) design in which all subjects receive both screening tests, but only have the GS test if one of the screening tests is positive. Unfortunately, in the cervical cancer example the PSP design is also infeasible because it is not technically possible to administer both the ThinPrep and Pap at the same time. In this article, we describe a randomized paired screen-positive (RPSP) design in which subjects are randomized to receive one of the two screening tests initially, and only receive the other screening test and GS if the first screening test is positive. We derive maximum likelihood estimators and confidence intervals for the relative accuracy of the two screening tests, and assess the small sample behavior of these estimators using simulation studies. Sample size formulae are derived and applied to the cervical cancer screening trial example, and the efficiency of the RPSP design is compared with other designs.
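A minimal simulation sketch of why a screen-positive layout identifies relative sensitivity even though subjects negative on both tests are never verified (all rates below, and the conditional independence of the two tests given disease status, are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                         # screened subjects (hypothetical)
prev = 0.01                         # assumed disease prevalence
se_new, se_std = 0.90, 0.75         # assumed sensitivities
sp_new, sp_std = 0.95, 0.97         # assumed specificities

d = rng.random(n) < prev
# test results generated independently given disease status (an assumption)
t_new = np.where(d, rng.random(n) < se_new, rng.random(n) < 1 - sp_new)
t_std = np.where(d, rng.random(n) < se_std, rng.random(n) < 1 - sp_std)

# screen-positive design: gold standard only for subjects positive on either test
verified = t_new | t_std
tp_new = np.sum(verified & d & t_new)    # verified diseased, new test positive
tp_std = np.sum(verified & d & t_std)    # verified diseased, standard test positive
print(tp_new / tp_std, "vs. true relative sensitivity", se_new / se_std)
```

Because every diseased subject who is positive on either test receives the gold standard, the two true-positive counts share the same unknown denominator, so their ratio estimates the relative true-positive rate.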

2.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
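A sketch of the propensity-stratification correction on simulated data (the data-generating values, the five strata, and the logistic working models are all assumptions; the paper's estimator details may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000
sigmoid = lambda e: 1 / (1 + np.exp(-e))
x = rng.normal(size=n)                                        # observed covariate
d = rng.random(n) < sigmoid(-2 + x)                           # true disease status
t = np.where(d, rng.random(n) < 0.85, rng.random(n) < 0.10)   # test result
v = rng.random(n) < sigmoid(-1 + 2 * t + 0.8 * x)             # MAR verification

def disease_rate(tval, n_strata=5):
    """P(D=1 | T=tval): average verified-only disease rates over propensity strata."""
    m = t == tval
    ps = LogisticRegression().fit(x[m][:, None], v[m]).predict_proba(x[m][:, None])[:, 1]
    stratum = np.digitize(ps, np.quantile(ps, np.linspace(0, 1, n_strata + 1)[1:-1]))
    # assumes every stratum contains at least one verified subject
    return sum((stratum == k).mean() * d[m][(stratum == k) & v[m]].mean()
               for k in range(n_strata))

p1, p0, pt = disease_rate(1), disease_rate(0), t.mean()
prev = p1 * pt + p0 * (1 - pt)
print("sens:", p1 * pt / prev, "spec:", (1 - p0) * (1 - pt) / (1 - prev))  # truth: 0.85, 0.90
```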

3.
Clinical trials with Poisson distributed count data as the primary outcome are common in various medical areas such as relapse counts in multiple sclerosis trials or the number of attacks in trials for the treatment of migraine. In this article, we present approximate sample size formulae for testing noninferiority using asymptotic tests which are based on restricted or unrestricted maximum likelihood estimators of the Poisson rates. The Poisson outcomes are allowed to be observed for unequal follow-up schemes, and both the situations that the noninferiority margin is expressed in terms of the difference and the ratio are considered. The exact type I error rates and powers of these tests are evaluated and the accuracy of the approximate sample size formulae is examined. The test statistic using the restricted maximum likelihood estimators (for the difference test problem) and the test statistic that is based on the logarithmic transformation and employs the maximum likelihood estimators (for the ratio test problem) show favorable type I error control and can be recommended for practical application. The approximate sample size formulae show high accuracy even for small sample sizes and provide power values identical or close to the desired ones. The methods are illustrated by a clinical trial example from anesthesia.
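For orientation, a generic Wald-type approximation for the ratio formulation; this sketch uses unrestricted ML rate estimates and a log transformation, not the paper's exact formulae, and all defaults are assumptions:

```python
import math
from scipy.stats import norm

def n_per_group(lam_c, lam_e, rho0, t_c=1.0, t_e=1.0, alpha=0.025, power=0.9):
    """Approximate per-group sample size for noninferiority on the rate ratio,
    H0: lam_e / lam_c >= rho0 (rho0 > 1, lower rates better), via a Wald test
    on the log rate ratio with per-subject follow-up times t_e and t_c."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    var1 = 1 / (t_e * lam_e) + 1 / (t_c * lam_c)   # variance with one subject per group
    return math.ceil(z**2 * var1 / (math.log(rho0) - math.log(lam_e / lam_c))**2)

print(n_per_group(lam_c=2.0, lam_e=2.0, rho0=1.3))  # equal true rates, margin 1.3
```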

4.
Disease prevalence is ideally estimated using a 'gold standard' to ascertain true disease status on all subjects in a population of interest. In practice, however, the gold standard may be too costly or invasive to be applied to all subjects, in which case a two-phase design is often employed. Phase 1 data consisting of inexpensive and non-invasive screening tests on all study subjects are used to determine the subjects that receive the gold standard in the second phase. Naive estimates of prevalence in two-phase studies can be biased (verification bias). Imputation and re-weighting estimators are often used to avoid this bias. We contrast the forms and attributes of the various prevalence estimators. Distribution theory and simulation studies are used to investigate their bias and efficiency. We conclude that the semiparametric efficient approach is the preferred method for prevalence estimation in two-phase studies. It is more robust and comparable in its efficiency to imputation and other re-weighting estimators. It is also easy to implement. We use this approach to examine the prevalence of depression in adolescents with data from the Great Smoky Mountains Study.
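A minimal sketch of the re-weighting idea with assumed, known phase-2 sampling fractions; the semiparametric efficient estimator favored in the paper adds an augmentation term not shown here:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
d = rng.random(n) < 0.12                                         # true disease status
screen = np.where(d, rng.random(n) < 0.9, rng.random(n) < 0.2)   # phase-1 screening test
pi = np.where(screen, 0.9, 0.1)                                  # known phase-2 fractions
v = rng.random(n) < pi                                           # receives gold standard

prev_naive = d[v].mean()               # verification-biased estimate
prev_ipw = np.mean(v * d / pi)         # Horvitz-Thompson re-weighting estimate
print(prev_naive, prev_ipw, d.mean())  # naive is biased upward here
```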

5.
Recent advancement in technology promises to yield a multitude of tests for disease diagnosis and prognosis. When there are multiple sources of information available, it is often of interest to construct a composite score that can provide better classification accuracy than any individual measurement. In this paper, we consider robust procedures for optimally combining tests when test results are measured prior to disease onset and disease status evolves over time. To account for censoring of disease onset time, the most commonly used approach to combining tests to detect subsequent disease status is to fit a proportional hazards model (Cox, 1972) and use the estimated risk score. However, simulation studies suggested that such a risk score may have poor accuracy when the proportional hazards assumption fails. We propose the use of a nonparametric transformation model (Han, 1987) as a working model to derive an optimal composite score with theoretical justification. We demonstrate that the proposed score is the optimal score when the model holds and is optimal "on average" among linear scores even if the model fails. Time-dependent sensitivity, specificity, and receiver operating characteristic curve functions are used to quantify the accuracy of the resulting composite score. We provide consistent and asymptotically Gaussian estimators of these accuracy measures. A simple model-free resampling procedure is proposed to obtain all consistent variance estimators. We illustrate the new proposals with simulation studies and an analysis of a breast cancer gene expression data set.
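The comparator the abstract mentions, a proportional hazards composite score, can be sketched with lifelines on simulated data (the paper's transformation-model estimator and time-dependent accuracy measures are not implemented here):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(3)
n = 1_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)       # two baseline test results
T = rng.exponential(1 / np.exp(1.0 * z1 + 0.5 * z2))  # event times from a PH model
C = rng.exponential(2.0, size=n)                      # independent censoring
df = pd.DataFrame({"z1": z1, "z2": z2,
                   "time": np.minimum(T, C), "event": (T <= C).astype(int)})

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
score = cph.predict_partial_hazard(df)                # estimated composite risk score
print(concordance_index(df["time"], -score, df["event"]))  # higher score = higher risk
```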

6.
Chen S, Cox C. Biometrics 1992;48(2):593-598.
We consider a regression to the mean problem with a very large sample for the first measurement and a relatively small subsample for the second measurement, selected on the basis of the initial measurement. This is a situation that often occurs in screening trials. We propose to estimate the unselected population mean and variance from the first measurement in the larger sample. Using these estimates, the correlation between the two measurements, as well as an effect of treatment, can be estimated in simple and explicit form. Under the condition that the size of the subsample is of a smaller order, the new estimators for all four parameters are asymptotically as efficient as the usual maximum likelihood estimators. Tests based on this new approach are also discussed. An illustration from a cholesterol screening study is included.
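A no-treatment moment sketch of the idea (all values hypothetical): the large first-phase sample pins down the unselected mean, and the truncated-sample relation E[Y | selected] = mu + rho (E[X | selected] - mu) then gives the correlation in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, rho = 200.0, 30.0, 0.7        # hypothetical cholesterol-like values
n1, cutoff = 100_000, 240.0

x = rng.normal(mu, sigma, n1)            # first measurement, full screening sample
mu_hat = x.mean()                        # unselected mean from the large sample

sel = x > cutoff                         # subsample selected on the initial value
y = mu + rho * (x[sel] - mu) + rng.normal(0, sigma * np.sqrt(1 - rho**2), sel.sum())

rho_hat = (y.mean() - mu_hat) / (x[sel].mean() - mu_hat)   # explicit moment estimator
print(rho_hat)                                             # close to 0.7
```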

7.
Prospective studies of diagnostic test accuracy have important advantages over retrospective designs. Yet, when the disease being detected by the diagnostic test(s) has a low prevalence rate, a prospective design can require an enormous sample of patients. We consider two strategies to reduce the costs of prospective studies of binary diagnostic tests: stratification and two-phase sampling. Utilizing neither, one, or both of these strategies provides us with four study design options: (1) the conventional design involving a simple random sample (SRS) of patients from the clinical population; (2) a stratified design where patients from higher-prevalence subpopulations are more heavily sampled; (3) a simple two-phase design using a SRS in the first phase and selection for the second phase based on the test results from the first; and (4) a two-phase design with stratification in the first phase. We describe estimators for sensitivity and specificity and their variances for each design, along with sample size estimation. We offer some recommendations for choosing among the various designs. We illustrate the study designs with two examples.

8.
The global burden of mental illness is projected to become severe in the near future, demanding the development of more effective treatments. Most psychiatric diseases are moderately to highly heritable and believed to involve many genes. Development of new treatment options demands more knowledge of the molecular basis of psychiatric diseases. Toward this end, we propose to develop new statistical methods, with improved sensitivity and accuracy, to identify disease-related genes specialized for psychiatric diseases. Qualitative psychiatric diagnoses such as case-control status often suffer from high rates of misdiagnosis and oversimplify the disease phenotypes. Our proposed method utilizes endophenotypes, the quantitative traits hypothesized to underlie disease syndromes, to better characterize the heterogeneous phenotypes of psychiatric diseases. We employ structural equation modeling using the liability-index model to link multiple genetically regulated expressions from PrediXcan and the manifest variables, including endophenotypes and case-control status. The proposed method can be considered a general method for multivariate regression, which is particularly helpful for psychiatric diseases. We derive penalized retrospective likelihood estimators to deal with the typical small sample size issue. Simulation results demonstrate the advantages of the proposed method, and a real data analysis of Alzheimer's disease illustrates the practical utility of the techniques. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative database.

9.
Inverse Adaptive Cluster Sampling
Consider a population in which the variable of interest tends to be at or near zero for many of the population units but a subgroup exhibits values distinctly different from zero. Such a population can be described as rare in the sense that the proportion of elements having nonzero values is very small. Obtaining an estimate of a population parameter such as the mean or total that is nonzero is difficult under classical fixed sample-size designs since there is a reasonable probability that a fixed sample size will yield all zeroes. We consider inverse sampling designs that use stopping rules based on the number of rare units observed in the sample. We look at two stopping rules in detail and derive unbiased estimators of the population total. The estimators do not rely on knowing what proportion of the population exhibit the rare trait but instead use an estimated value. Hence, the estimators are similar to those developed for poststratification sampling designs. We also incorporate adaptive cluster sampling into the sampling design to allow for the case where the rare elements tend to cluster within the population in some manner. The formulas for the variances of the estimators do not allow direct analytic comparison of the efficiency of the various designs and stopping rules, so we provide the results of a small simulation study to obtain some insight into the differences among the stopping rules and sampling approaches. The results indicate that a modified stopping rule that incorporates an adaptive sampling component and utilizes an initial random sample of fixed size is the best in the sense of having the smallest variance.
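A simulation sketch of the key ingredient of such stopping rules (population values invented): drawing without replacement until the k-th rare unit appears and using (k - 1)/(n - 1) as the proportion estimate, which the replications show to be essentially unbiased:

```python
import numpy as np

rng = np.random.default_rng(5)
N, n_rare, k = 10_000, 50, 5             # population size, rare units, stopping rule
pop_is_rare = np.zeros(N, dtype=bool)
pop_is_rare[:n_rare] = True

est = []
for _ in range(5_000):
    order = rng.permutation(N)            # sample without replacement
    n = np.nonzero(pop_is_rare[order])[0][k - 1] + 1   # draws to see the k-th rare unit
    est.append((k - 1) / (n - 1))         # inverse-sampling proportion estimator
print(np.mean(est), n_rare / N)           # both close to 0.005
```

The paper's estimators of the population total, and the adaptive-cluster variants, build on this proportion estimate rather than on a known rare-trait prevalence.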

10.
Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are oversampled in case-control studies. Logistic regression is a common tool to estimate the relative risks of the disease with respect to a set of covariates. Very often in such a study, information on ages at onset of the disease for all cases and ages at survey for controls is known. Standard logistic regression analysis using age as a covariate is based on a dichotomous outcome and does not efficiently use such age-at-onset (time-to-event) information. We propose to analyze age-at-onset data using a modified case-cohort method by treating the control group as an approximation of a subcohort, assuming rare events. We investigate the asymptotic bias of this approximation and show that the asymptotic bias of the proposed estimator is small when the disease rate is low. We evaluate the finite sample performance of the proposed method through a simulation study and illustrate the method using a breast cancer case-control data set.

11.

Background

The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy.

Methodology/Principal Findings

We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of the number of phenotypic records to the number of risk loci and the observed heritability.
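If I recall the published result correctly (worth verifying against the paper), the accuracy takes the form r = sqrt(lambda / (lambda + 1)) with lambda = N h^2 / M, i.e., the records-per-locus ratio times the observed heritability:

```python
import math

def predicted_accuracy(n_records, n_loci, h2):
    """Expected accuracy of predicted genetic risk, r = sqrt(N*h2 / (N*h2 + M))."""
    lam = n_records * h2 / n_loci        # (records per locus) x heritability
    return math.sqrt(lam / (lam + 1))

print(predicted_accuracy(n_records=10_000, n_loci=1_000, h2=0.5))  # about 0.91
```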

Conclusions/Significance

This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.

12.
Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under- or overestimation of the sample size. Both situations are unfavourable, as the first decreases the power and the latter leads to a waste of resources. Hence, designs have been suggested that allow a re-assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one-sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance, depending on the sample size per group and the number of groups.
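A toy illustration of the naive one-sample estimator's behavior (sample sizes, correlation, and mean shift are invented): with equal group means it tracks the unblinded estimator, while a mean difference inflates it:

```python
import numpy as np

rng = np.random.default_rng(6)
n_per, rho = 2_000, 0.6
cov = [[1.0, rho], [rho, 1.0]]

for delta in (0.0, 1.0):                        # equal vs. different group means
    g1 = rng.multivariate_normal([0.0, 0.0], cov, n_per)
    g2 = rng.multivariate_normal([delta, delta], cov, n_per)
    pooled = np.vstack([g1, g2])
    r_blinded = np.corrcoef(pooled.T)[0, 1]     # naive one-sample estimator
    r_unblinded = (np.corrcoef(g1.T)[0, 1] + np.corrcoef(g2.T)[0, 1]) / 2
    print(f"delta={delta}: blinded={r_blinded:.3f}, unblinded={r_unblinded:.3f}")
```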

13.
This paper addresses optimal design and efficiency of two-phase (2P) case-control studies in which the first phase uses an error-prone exposure measure, Z, while the second phase measures true, dichotomous exposure, X, in a subset of subjects. Optimal design of a separate second phase, to be added to a preexisting study, is also investigated. Differential misclassification is assumed throughout. Results are also applicable to 2P cohort studies with error-prone and error-free measures of disease status but error-free exposure measures. While software based on the mean score method of Reilly and Pepe (1995, Biometrika 82, 299-314) can find optimal designs given pilot data, the lack of simple formulae makes it difficult to generalize about efficiency compared to one-phase (1P) studies based on X alone. Here, formulae for the optimal ratios of cases to controls and first- to second-phase sizes, and the optimal second-phase stratified sampling fractions, given a fixed budget, are given. The maximum efficiency of 2P designs compared to a 1P design is deduced and is shown to be bounded from above by a function of the sensitivities and specificities of Z. The efficiency of 'balanced' separate second-phase designs (Breslow and Cain, 1988, Biometrika 75, 11-20), in which equal numbers of subjects are chosen from each first-phase stratum, relative to the optimal design is deduced, enabling situations where balanced designs are nearly optimal to be identified.

14.
Hamada M, Kiryu H, Iwasaki W, Asai K. PLoS ONE 2011;6(2):e16450.
In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suited to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly used accuracy measures (e.g., sensitivity, PPV, MCC, and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework for designing MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.
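A concrete member of this family is the gamma-centroid-style rule: for independent marginal probabilities p_i, predicting 1 wherever p_i exceeds 1/(gamma + 1) maximizes the expected gain gamma*TP + TN (a sketch of one special case; the class in the paper is broader):

```python
import numpy as np

def mea_estimate(p, gamma=1.0):
    """Maximum-expected-accuracy point estimate on a binary space: setting
    y_i = 1 earns gamma * p_i in expected gain, y_i = 0 earns 1 - p_i, so the
    optimal rule thresholds each marginal at 1 / (gamma + 1)."""
    return (np.asarray(p) > 1.0 / (gamma + 1.0)).astype(int)

print(mea_estimate([0.25, 0.4, 0.9], gamma=1.0))   # [0 0 1], threshold 0.5
print(mea_estimate([0.25, 0.4, 0.9], gamma=4.0))   # [1 1 1], threshold 0.2
```

Raising gamma trades specificity for sensitivity, which is how such estimators are tuned toward a given accuracy measure.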

15.
In this article, we compare Wald-type, logarithmic transformation, and Fieller-type statistics for the classical 2-sided equivalence testing of the rate ratio under matched-pair designs with a binary end point. These statistics can be implemented through sample-based, constrained least squares estimation and constrained maximum likelihood (CML) estimation methods. Sample size formulae based on the CML estimation method are developed. We consider formulae that control a prespecified power or confidence width. Our simulation studies show that statistics based on the CML estimation method generally outperform other statistics and methods with respect to actual type I error rate and average width of confidence intervals. Also, the corresponding sample size formulae are valid asymptotically in the sense that the exact power and actual coverage probability for the estimated sample size are generally close to their prespecified values. The methods are illustrated with a real example from a clinical laboratory study.
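A sample-based Wald sketch of the two one-sided tests (the margins, and the standard delta-method variance of the log ratio, are my assumptions; the paper recommends the CML-based statistics instead):

```python
import math
from scipy.stats import norm

def tost_paired_ratio(a, b, c, d, lo=0.8, hi=1.25, alpha=0.05):
    """TOST for the ratio of marginal proportions from a paired 2x2 table:
    a = both tests positive, b = test 1 only, c = test 2 only, d = both negative.
    Delta-method SE of the log ratio: sqrt((b + c) / ((a + b) * (a + c)))."""
    n1, n2 = a + b, a + c                       # marginal positive counts
    log_ratio = math.log(n1 / n2)
    se = math.sqrt((b + c) / (n1 * n2))
    p_lo = 1 - norm.cdf((log_ratio - math.log(lo)) / se)  # H0: ratio <= lo
    p_hi = norm.cdf((log_ratio - math.log(hi)) / se)      # H0: ratio >= hi
    return log_ratio, max(p_lo, p_hi)           # conclude equivalence if < alpha

print(tost_paired_ratio(a=80, b=12, c=10, d=98))
```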

16.
Sensitivity and specificity are common measures used to evaluate the performance of a diagnostic test. A diagnostic test is often administered at a subunit level, e.g., at the level of a vessel, ear, or eye of a patient, so that treatment can be targeted at the specific subunit. Therefore, it is essential to evaluate the diagnostic test at the subunit level. Often, patients with more negative subunit test results are less likely to receive the gold standard test than patients with more positive subunit test results. To account for this type of missing data and for correlation between subunit test results, we propose a weighted generalized estimating equations (WGEE) approach to evaluate subunit sensitivities and specificities. A simulation study was conducted to evaluate the performance of the WGEE estimators and the weighted least squares (WLS) estimators (Barnhart and Kosinski, 2003) under a missing at random assumption. The results suggest that the WGEE estimator is consistent under various scenarios of percentage of missing data and sample size, while the WLS approach can yield biased estimators due to a misspecified missing data mechanism. We illustrate the methodology with a cardiology example.

17.
Summary. Genomewide association studies attempting to unravel the genetic etiology of complex traits have recently gained attention. Frequently, these studies employ a sequential genotyping strategy: a large panel of markers is examined in a subsample of subjects, and the most promising markers are genotyped in the remaining subjects. In this article, we introduce a novel method for such designs, enabling investigators to, for example, modify marker densities and sample proportions while strongly controlling the family-wise type I error rate. Loss of efficiency is avoided by redistributing the conditional type I error rates of discarded markers. Our approach can be combined with cost-optimal designs and entails greater flexibility than all previously suggested designs. Among other features, it allows for marker selection based upon biological criteria instead of statistical criteria alone, or the option to modify the sample size at any time during the course of the project. For practical applicability, we develop a new algorithm, subsequently evaluate it by simulations, and illustrate it using a real data set.

18.
Summary. We propose robust and efficient tests and estimators for gene–environment/gene–drug interactions in family-based association studies in which haplotypes, dichotomous/quantitative phenotypes, and complex exposure/treatment variables are analyzed. Using causal inference methodology, we show that the tests and estimators are robust against unmeasured confounding due to population admixture and stratification, provided that Mendel's law of segregation holds and that the considered exposure/treatment variable is not affected by the candidate gene under study. We illustrate the practical relevance of our approach by an application to a chronic obstructive pulmonary disease study. The data analysis suggests a gene–environment interaction between a single nucleotide polymorphism in the Serpine2 gene and smoking status/pack-years of smoking. Simulation studies show that the proposed methodology is sufficiently powered for realistic sample sizes and that it provides valid tests and effect size estimators in the presence of admixture and stratification.

19.
In cancer clinical proteomics, MALDI and SELDI profiling are used to search for biomarkers of potentially curable early-stage disease. A given number of samples must be analysed in order to detect clinically relevant differences between cancers and controls with adequate statistical power. From clinical proteomic profiling studies, expression data for each peak (protein or peptide) from two or more clinically defined groups of subjects are typically available. Typically, both exposure and confounder information on each subject is also available, and usually the samples are not from randomized subjects. Moreover, the data are usually available in replicate. At the design stage, however, covariates are not typically available and are often ignored in sample size calculations. This leads to the use of insufficient numbers of samples and reduced power when there are imbalances in the numbers of subjects between different phenotypic groups. A method is proposed for accommodating information on covariates, data imbalances, and design characteristics, such as the technical replication and the observational nature of these studies, in sample size calculations. It assumes knowledge of a joint distribution for the protein expression values and the covariates. When discretized covariates are considered, the effect of the covariates enters the calculations as a function of the proportions of subjects with specific attributes. This makes it relatively straightforward (even when pilot data on subject covariates are unavailable) to specify and to adjust for the effect of the expected heterogeneities. The new method suggests certain experimental designs which lead to the use of a smaller number of samples when planning a study. Analysis of data from the proteomic profiling of colorectal cancer reveals that fewer samples are needed when a study is balanced than when it is unbalanced, and when the IMAC30 chip-type is used. The method is implemented in the clippda package and is available in R at: http://www.bioconductor.org/help/bioc-views/release/bioc/html/clippda.html.

20.
Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
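A toy, landscape-free simulation of the core point (all habitat and prevalence numbers invented): a convenience sample that under-covers high-prevalence habitat is biased, while a simple random sample is not:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
high_risk = rng.random(N) < 0.3                 # animals in a high-prevalence area
infected = rng.random(N) < np.where(high_risk, 0.10, 0.01)

n = 1_000
srs = rng.choice(N, n, replace=False)           # probability sample
w = np.where(high_risk, 0.2, 1.0)               # harvest avoids the high-risk area
conv = rng.choice(N, n, replace=False, p=w / w.sum())

# true prevalence ~0.037; the convenience estimate lands near 0.017
print(infected.mean(), infected[srs].mean(), infected[conv].mean())
```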

