Similar Literature
20 similar documents retrieved (search time: 109 ms)
1.
Holcroft CA, Spiegelman D. Biometrics 1999;55(4):1193-1201
We compared several validation study designs for estimating the odds ratio of disease with a misclassified exposure. We assumed that the outcome and the misclassified binary covariate are available for all subjects and that the error-free binary covariate is measured in a subsample, the validation sample. We considered designs in which the total size of the validation sample is fixed and the probability of selection into the validation sample may depend on the outcome and misclassified covariate values. Design comparisons were conducted for rare and common disease scenarios, where the optimal design is the one that minimizes the variance of the maximum likelihood estimator of the true log odds ratio relating the outcome to the exposure of interest. Misclassification rates were assumed to be independent of the outcome. We used a sensitivity analysis to assess the effect of misspecifying the misclassification rates. Under the scenarios considered, our results suggested that a balanced design, which allocates equal numbers of validation subjects to each of the four outcome/mismeasured-covariate categories, is preferable for its simplicity and good performance. A user-friendly Fortran program, available from the second author, calculates the optimal sampling fractions for all designs considered and the efficiencies of these designs relative to the optimal hybrid design for any scenario of interest.
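The balanced design this abstract recommends is easy to implement. Below is a minimal Python sketch, assuming hypothetical arrays `y` and `x_star` for the outcome and mismeasured covariate; the function name and validation budget are invented for illustration.

```python
import numpy as np

def balanced_validation_sample(y, x_star, n_valid, rng=None):
    """Select a validation subsample of total size about n_valid, allocating
    (as nearly as possible) equal numbers to each of the four
    outcome (y) x mismeasured-covariate (x_star) cells."""
    rng = np.random.default_rng(rng)
    per_cell = n_valid // 4
    chosen = []
    for yv in (0, 1):
        for xv in (0, 1):
            cell = np.flatnonzero((y == yv) & (x_star == xv))
            # If a cell has fewer subjects than the target, take everyone in it.
            k = min(per_cell, cell.size)
            chosen.append(rng.choice(cell, size=k, replace=False))
    return np.concatenate(chosen)

# Example: 2,000 subjects, validation budget of 200
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 2000)
x_star = rng.integers(0, 2, 2000)
idx = balanced_validation_sample(y, x_star, 200, rng=rng)
```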

2.
Outcome-dependent sampling (ODS) schemes can be a cost-effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies, but when the outcome is measured on a continuous scale, dichotomizing it can entail a loss of efficiency. Recent epidemiologic studies have therefore used ODS schemes in which, in addition to an overall random sample, a number of supplemental samples are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of the covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator is asymptotically normal. The likelihood ratio statistic based on the semiparametric empirical likelihood has Wilks-type properties: under the null, it asymptotically follows a chi-square distribution and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained under an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability-weighted pseudolikelihood estimators, and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.
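The sampling scheme described here, an overall simple random sample plus supplemental samples drawn from the tails of the continuous outcome, can be sketched in a few lines of Python. The tail cutoffs and sample sizes below are invented for illustration.

```python
import numpy as np

def ods_sample(y, n_srs, n_tail, q_low=0.1, q_high=0.9, rng=None):
    """Outcome-dependent sample: an overall simple random sample plus
    supplemental samples from the lower and upper tails of y."""
    rng = np.random.default_rng(rng)
    srs = rng.choice(len(y), size=n_srs, replace=False)
    lo, hi = np.quantile(y, [q_low, q_high])
    # Supplemental pools exclude anyone already in the random sample
    low_pool = np.setdiff1d(np.flatnonzero(y < lo), srs)
    high_pool = np.setdiff1d(np.flatnonzero(y > hi), srs)
    low = rng.choice(low_pool, size=min(n_tail, low_pool.size), replace=False)
    high = rng.choice(high_pool, size=min(n_tail, high_pool.size), replace=False)
    return np.concatenate([srs, low, high])

y = np.random.default_rng(0).normal(size=5000)
idx = ods_sample(y, n_srs=300, n_tail=100)
```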

3.
The two-stage case-control design has been widely used in epidemiologic studies for its cost-effectiveness and improvement of study efficiency (White, 1982, American Journal of Epidemiology 115, 119-128; Breslow and Cain, 1988, Biometrika 75, 11-20). The evolution of modern biomedical studies has called for cost-effective designs with a continuous outcome and exposure variables. In this article, we propose a new two-stage outcome-dependent sampling (ODS) scheme with a continuous outcome variable, where both the first-stage and the second-stage data arise from ODS schemes. We develop a semiparametric empirical likelihood estimation approach for inference about the regression parameters in the proposed design. Simulation studies were conducted to investigate the small-sample behavior of the proposed estimator. We demonstrate that, for a given statistical power, the proposed design requires a substantially smaller sample size than the alternative designs. The proposed method is illustrated with an environmental health study conducted at the National Institutes of Health.

4.
The nested case-control (NCC) design is a popular sampling method in large epidemiological studies for its cost-effectiveness in investigating the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to the NCC data, and we propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling, and show that it can be well adapted to NCC designs, where the sampling scheme is a dynamic process and control selection is not independent. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established, and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite-sample performance of the proposed estimators and to compare their efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to examine the behavior of the proposed estimator when model assumptions are violated; we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

5.
One-stage and two-stage closed-form estimators of latent cell frequencies in multidimensional contingency tables are derived from the weighted least squares criterion. The first-stage estimator is asymptotically equivalent to the conditional maximum likelihood estimator and does not necessarily have minimum asymptotic variance. The second-stage estimator does attain minimum asymptotic variance relative to any other existing estimator. The closed-form estimators are defined for any number of latent cells in contingency tables of any order, under exact general linear constraints on the logarithms of the nonlatent and latent cell frequencies.

6.
Two-stage designs for experiments with a large number of hypotheses
MOTIVATION: When a large number of hypotheses are investigated, the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the 'promising' hypotheses, which are further investigated at the second stage with an increased sample size. A multiple test procedure based on sequential individual P-values is proposed to control the FDR for the case of independent normal distributions with known variance. RESULTS: The power of optimal two-stage designs is substantially larger than the power of the corresponding single-stage design with equal costs. Extensions to the case of unknown variances and correlated test statistics are investigated by simulations. Moreover, it is shown that the simple multiple test procedure that uses first-stage data only for screening and derives the test decisions solely from second-stage data is a very powerful option.
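The screening idea in the last sentence is easy to simulate. The following Python sketch, in which all thresholds, effect sizes, and sample sizes are invented for illustration, screens hypotheses on stage-1 z-test p-values and then applies Benjamini-Hochberg only to independent stage-2 data for the survivors; it is a toy version of the 'simple' procedure, not the authors' optimal design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, n1, n2 = 1000, 10, 30          # hypotheses, stage-1 and stage-2 sample sizes
mu = np.where(rng.random(m) < 0.05, 0.8, 0.0)  # 5% of hypotheses carry a true effect

# Stage 1: screen with a lenient one-sided z-test (known variance = 1)
x1 = rng.normal(mu[:, None], 1.0, (m, n1)).mean(axis=1)
p1 = stats.norm.sf(x1 * np.sqrt(n1))
keep = p1 < 0.2                    # 'promising' hypotheses proceed to stage 2

# Stage 2: fresh data, Benjamini-Hochberg on stage-2 p-values only
x2 = rng.normal(mu[keep, None], 1.0, (keep.sum(), n2)).mean(axis=1)
p2 = stats.norm.sf(x2 * np.sqrt(n2))
order = np.argsort(p2)
k = p2.size
bh = p2[order] <= 0.05 * np.arange(1, k + 1) / k
n_rej = bh.nonzero()[0].max() + 1 if bh.any() else 0
print(f"{keep.sum()} hypotheses screened in, {n_rej} rejected at FDR 0.05")
```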

7.
Polley MY, Cheung YK. Biometrics 2008;64(1):232-241
We deal with the design problem of early-phase dose-finding clinical trials with monotone biologic endpoints, such as biological measurements, laboratory values of serum level, and gene expression. A specific objective of this type of trial is to identify the minimum dose that exhibits adequate drug activity and shifts the mean of the endpoint away from that of the zero dose, the so-called minimum effective dose. Stepwise test procedures for dose finding have been well studied in the context of nonhuman studies where the sampling plan is done in one stage. In this article, we extend the notion of stepwise testing to a two-stage enrollment plan in an attempt to reduce the potential sample size requirement by shutting down unpromising doses at an interim futility analysis. In particular, we examine four two-stage designs and apply them to design a statin trial with four doses and a placebo in patients with Hodgkin's disease. We discuss the calibration of the design parameters and the implementation of the proposed methods. In the context of the statin trial, a calibrated two-stage design can reduce the average total sample size by up to 38% (from 125 to 78) relative to a one-stage step-down test, while maintaining comparable error rates and probability of correct selection. The price for the reduction in the average sample size is a slight increase in the maximum total sample size, from 125 to 130.

8.
Hauck WW. Biometrics 1984;40(4):1117-1123
The finite-sample properties of various point estimators of a common odds ratio from multiple 2 × 2 tables have been considered in a number of simulation studies. However, the conditional maximum likelihood estimator has received only limited attention. That omission is partially rectified here for cases of relatively small numbers of tables and moderate to large within-table sample sizes. The conditional maximum likelihood estimator is found to be superior to the unconditional maximum likelihood estimator, and equal or superior to the Mantel-Haenszel estimator in both bias and precision.
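Both estimators compared here are straightforward to compute. The Python sketch below, with three made-up 2 × 2 tables, computes the Mantel-Haenszel estimator in closed form and obtains the conditional MLE by maximizing the product of Fisher noncentral hypergeometric likelihoods of each table's exposed-case count given its margins.

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical 2x2 tables [[a, b], [c, d]]; per-table OR = ad / bc
tables = np.array([[[12, 5], [8, 10]],
                   [[20, 9], [14, 16]],
                   [[7, 3], [5, 9]]])
a, b, c, d = (tables[:, 0, 0], tables[:, 0, 1],
              tables[:, 1, 0], tables[:, 1, 1])
n = tables.sum(axis=(1, 2))

# Mantel-Haenszel estimator of the common odds ratio
or_mh = np.sum(a * d / n) / np.sum(b * c / n)

# Conditional MLE: maximize the product of Fisher noncentral
# hypergeometric likelihoods of a_i given each table's margins
def neg_cond_loglik(log_psi):
    psi = np.exp(log_psi)
    ll = 0.0
    for ai, bi, ci, di in zip(a, b, c, d):
        M, n1, N = ai + bi + ci + di, ai + bi, ai + ci
        ll += stats.nchypergeom_fisher.logpmf(ai, M, n1, N, psi)
    return -ll

res = optimize.minimize_scalar(neg_cond_loglik, bounds=(-5, 5), method="bounded")
print(f"MH OR = {or_mh:.3f}, conditional MLE OR = {np.exp(res.x):.3f}")
```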

9.
Optimal sampling in retrospective logistic regression via two-stage method
Case-control sampling is popular in epidemiological research because of its cost and time savings. In a logistic regression model, when little is known a priori about the covariance matrix of the estimated regression coefficients, no fixed-sample-size analysis is available. In this study, we propose a two-stage sequential analysis in which the optimal sample fraction and the sample size required to achieve a predetermined volume of a joint confidence set are estimated at an interim analysis. The additionally required observations are collected in the second stage according to the estimated optimal sample fraction. At the end of the experiment, data from the two stages are combined and analyzed for statistical inference. Simulation studies are conducted to justify the proposed two-stage procedure, and an example is presented for illustration. The proposed two-stage procedure is found to perform adequately, in the sense that the resulting joint confidence set has a well-controlled volume and achieves the required coverage probability. Furthermore, the optimal sample fractions in all the scenarios considered are close to a balanced allocation; hence, the proposed procedure can be simplified by always using a balanced design.
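The core of such an interim analysis, estimating the coefficient covariance at stage 1 and scaling up the sample size until the joint confidence set is small enough, can be sketched as follows. This Python sketch uses statsmodels for the logistic fits; the true coefficients, the target volume, and the n^(-p/2) scaling shortcut for the ellipsoid volume are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
p, n1 = 3, 200                      # number of coefficients, stage-1 sample size

def simulate(n):
    # Hypothetical data-generating model with coefficients (-0.5, 0.8, -0.3)
    X = sm.add_constant(rng.normal(size=(n, p - 1)))
    eta = X @ np.array([-0.5, 0.8, -0.3])
    y = rng.random(n) < 1 / (1 + np.exp(-eta))
    return X, y.astype(float)

# Interim analysis on stage-1 data
X1, y1 = simulate(n1)
fit1 = sm.Logit(y1, X1).fit(disp=0)
vol1 = np.sqrt(np.linalg.det(fit1.cov_params()))  # ellipsoid volume, up to constants

# The covariance shrinks like 1/n, so the volume scales roughly as n^(-p/2);
# solve for the total n that halves the interim volume (target is illustrative)
vol_target = 0.5 * vol1
n_total = int(np.ceil(n1 * (vol1 / vol_target) ** (2 / p)))

# Stage 2: collect the remaining observations and pool both stages
X2, y2 = simulate(n_total - n1)
fit = sm.Logit(np.concatenate([y1, y2]), np.vstack([X1, X2])).fit(disp=0)
print(f"stage-1 n = {n1}, total n = {n_total}")
```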

10.

Aims

The fitting of statistical distributions to microbial sampling data is a common application in quantitative microbiology and risk assessment. An underlying assumption of most fitting techniques is that the data were collected by simple random sampling, which is often not the case. This study develops a weighted maximum likelihood estimation framework that is appropriate for microbiological samples collected with unequal probabilities of selection.

Methods and Results

Two examples, based on the collection of food samples during processing, are provided to demonstrate the weighted maximum likelihood estimation framework and to highlight the magnitude of the biases in the maximum likelihood estimator when data are inappropriately treated as a simple random sample (a toy sketch of the weighting idea follows this abstract).

Conclusions

Failure to properly weight samples to account for how data are collected can introduce substantial biases into inferences drawn from the data.

Significance and Impact of the Study

The proposed methodology will reduce or eliminate an important source of bias in inferences drawn from the analysis of microbial data. This will also make comparisons between studies and the combination of results from different studies more reliable, which is important for risk assessment applications.
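A minimal sketch of inverse-probability-weighted maximum likelihood, assuming normally distributed (e.g. log-scale) measurements and known selection probabilities; the data-generating numbers and the function name are invented for illustration. Weighting each unit's log-likelihood contribution by the inverse of its selection probability removes the bias that the naive estimator shows.

```python
import numpy as np
from scipy import optimize, stats

def weighted_normal_mle(x, pi):
    """Weighted MLE of (mu, sigma) for normal data x observed with known
    selection probabilities pi; weights are inverse selection probabilities."""
    w = 1.0 / pi
    def nll(theta):
        mu, log_sigma = theta
        return -np.sum(w * stats.norm.logpdf(x, mu, np.exp(log_sigma)))
    res = optimize.minimize(nll, x0=[x.mean(), np.log(x.std())])
    return res.x[0], np.exp(res.x[1])

# Biased sample: units with larger values are more likely to be selected
rng = np.random.default_rng(4)
pop = rng.normal(2.0, 0.8, 50_000)                    # e.g. log10 concentrations
pi = stats.norm.cdf((pop - 2.0) / 0.8) * 0.1 + 0.01    # selection probabilities
keep = rng.random(pop.size) < pi
mu_w, sigma_w = weighted_normal_mle(pop[keep], pi[keep])
print(f"naive mean = {pop[keep].mean():.2f}, weighted MLE mu = {mu_w:.2f}")
```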

11.
Obtaining accurate estimates of diversity indices is difficult because the number of species encountered in a sample increases with sampling intensity. We introduce a novel method that requires the presence of species in a sample to be assessed, while counts of the number of individuals per species are required for only a small part of the sample. To account for species included as incidence data in the species abundance distribution, we modify the likelihood function of the classical Poisson log-normal distribution. Using simulated community assemblages, we contrast diversity estimates based on a community sample, a subsample randomly extracted from the community sample, and a mixture sample in which incidence data are added to a subsample. We show that the mixture sampling approach provides more accurate estimates than the subsample, at little extra cost. Diversity indices estimated from a freshwater zooplankton community sampled using the mixture approach show the same pattern of results as the simulation study. Our method efficiently increases the accuracy of diversity estimates and improves comprehension of the left tail of the species abundance distribution. We show how to choose the sample size needed for a compromise between information gained, accuracy of the estimates, and cost expended when assessing biological diversity. The sample size estimates are obtained from key community characteristics, such as the expected number of species in the community, the expected number of individuals in a sample, and the evenness of the community.

12.
Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of the two-stage case-control design to the continuous-outcome case. We further show that two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation.

13.
There is an increasing interest in the use of two-stage case-control studies to reduce genotyping costs in the search for genes underlying common disorders. Instead of analyzing the data from the second stage separately, a more powerful test can be performed by combining the data from both stages. However, standard tests cannot be used, because only the markers that are significant in the first stage are selected for the second stage, and the test statistics at the two stages are dependent because they partly involve the same data. Theoretical approximations are not available for commonly used test statistics, and in this specific context simulations can be problematic because of the computational burden. We therefore derived a cost-effective approximation, that is, one that is accurate but fast in terms of central processing unit (CPU) time, for the distribution of Pearson's statistic on 2 × m contingency tables in a two-stage design with combined data. We included this approximation in an iterative method for designing optimal two-stage studies. Simulations supported the accuracy of our approximation. Numerical results confirmed that the use of two-stage designs reduces the genotyping burden substantially. Compared to not combining the data, combining the data decreases the required sample sizes on average by 15% and the genotyping burden by 5%.

14.
Inference after two-stage single-arm designs with a binary endpoint is challenging due to the nonunique ordering of the sampling space in multistage designs. We illustrate the problem of specifying test-compatible confidence intervals for designs with nonconstant second-stage sample size and present two approaches that guarantee confidence intervals consistent with the test decision. First, we extend the well-known Clopper-Pearson approach of inverting a family of two-sided hypothesis tests from the group-sequential case to designs with fully adaptive sample size. Test compatibility is achieved by using a sample-space ordering derived from a test-compatible estimator. The resulting confidence intervals tend to be conservative but assure the nominal coverage probability. In order to assess the possibility of further improving these confidence intervals, we pursue a direct optimization approach that minimizes the mean width of the confidence intervals. While the latter approach produces more stable coverage probabilities, it is also slightly anti-conservative and yields only negligible improvements in mean width. We conclude that the Clopper-Pearson-type confidence intervals based on a test-compatible estimator are the best choice if the nominal coverage probability is not to be undershot and compatibility of the test decision and confidence interval is to be preserved.
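The single-stage Clopper-Pearson interval that this work extends to adaptive designs has a compact beta-quantile form. A minimal Python sketch; the numbers in the example call are arbitrary.

```python
from scipy import stats

def clopper_pearson(x, n, alpha=0.05):
    """Exact (conservative) two-sided Clopper-Pearson interval for a
    binomial proportion, via the beta-quantile representation."""
    lower = stats.beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = stats.beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lower, upper

print(clopper_pearson(7, 40))   # e.g. 7 responses out of 40 patients
```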

15.
Nested case-control sampling is designed to reduce the costs of large cohort studies, so it is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM-aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I is associated with ischemic heart disease. The study was based on a population of 3784 Danes with 231 cases of ischemic heart disease, where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information in addition to the relative risk estimates of covariates.

16.
Matrix models are widely used in biology to predict the temporal evolution of stage-structured populations. One issue related to matrix models that is often disregarded is sampling variability: because the samples used to estimate the vital rates of the models are of finite size, a sampling error is attached to parameter estimation, which in turn has repercussions on all the predictions of the model. In this study, we address the question of building confidence bounds around the predictions of matrix models that account for this sampling variability. We focus on a density-dependent Usher model, the maximum likelihood estimator of its parameters, and the predicted stationary stage vector. The asymptotic distribution of the stationary stage vector is specified, assuming that the parameters of the model remain in a region of the parameter space where the model admits a unique equilibrium point. Tests for density dependence are also provided. The model is applied to a tropical rain forest in French Guiana.
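To make the notion of a stationary stage vector concrete, here is a toy density-dependent Usher-type projection in Python, iterated to its equilibrium. The stage count, transition probabilities, and recruitment function are all invented for illustration and are not the model fitted in the paper.

```python
import numpy as np

def usher_step(n, rec0=0.5, beta=0.001):
    """One step of a toy density-dependent Usher-type projection:
    per-capita recruitment into stage 1 declines as total size grows."""
    stay = np.array([0.6, 0.7, 0.9])      # probability of remaining in each stage
    A = np.diag(stay)
    A[1, 0], A[2, 1] = 0.3, 0.2           # probabilities of moving up one stage
    rec = rec0 / (1.0 + beta * n.sum())   # density-dependent recruitment rate
    A[0, :] += rec                        # every stage contributes recruits
    return A @ n

n = np.array([100.0, 50.0, 20.0])
for _ in range(10_000):                   # iterate to the (assumed unique) equilibrium
    n_next = usher_step(n)
    if np.max(np.abs(n_next - n)) < 1e-10:
        break
    n = n_next
print("stationary stage vector:", np.round(n, 1))
```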

17.
In this article, we propose a two-stage approach to modeling multilevel clustered non-Gaussian data with sufficiently large numbers of continuous measures per cluster. Such data are common in biological and medical studies that use monitoring or image-processing equipment. We consider a general class of hierarchical models that generalizes the model in the global two-stage (GTS) method for nonlinear mixed effects models, by using any root-n-consistent and asymptotically normal estimators from stage 1 as pseudodata in the stage 2 model, and by extending the stage 2 model to accommodate random effects from multiple levels of clustering. The second-stage model is a standard linear mixed effects model with normal random effects, but the cluster-specific distributions, conditional on the random effects, can be non-Gaussian. This methodology provides a flexible framework for modeling not only a location parameter but also other characteristics of the conditional distributions that may be of specific interest. For estimation of the population parameters, we propose a conditional restricted maximum likelihood (CREML) approach and establish the asymptotic properties of the CREML estimators. The general approach is illustrated using quartiles as the cluster-specific parameters estimated in the first stage, and is applied to data from a collagen fibril development study. We demonstrate by simulation that, in samples with small numbers of independent clusters, the CREML estimators may perform better than conditional maximum likelihood estimators, which are a direct extension of the estimators from the GTS method.
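The two-stage idea, computing cluster-specific summaries first and then fitting a mixed model to those summaries as pseudodata, can be sketched with statsmodels. Everything below (the data-generating process, the median as the stage-1 summary, a single random effect for center) is an illustrative simplification, not the CREML estimator proposed in the article.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Stage 1: 30 subjects (clusters) nested in 5 centers, 500 measures each;
# summarize each subject's measurements by a cluster-specific median
rows = []
for subj in range(30):
    center = subj % 5
    age = rng.uniform(20, 60)
    b = rng.normal(0, 0.5)                    # subject-level random deviation
    x = rng.normal(b + 0.02 * age, 1.0, 500)  # within-subject measurements
    rows.append({"subject": subj, "center": center,
                 "age": age, "q50": np.median(x)})
df = pd.DataFrame(rows)

# Stage 2: linear mixed model on the stage-1 pseudodata,
# with a random intercept for center
fit = sm.MixedLM.from_formula("q50 ~ age", groups="center", data=df).fit()
print(fit.params)
```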

18.
Methods for the analysis of unmatched case-control data based on a finite population sampling model are developed. Under this model, and the prospective logistic model for disease probabilities, a likelihood for case-control data that accommodates very general sampling of controls is derived. This likelihood has the form of a weighted conditional logistic likelihood. The flexibility of the methods is illustrated by providing a number of control sampling designs and a general scheme for their analyses, including frequency matching, counter-matching, case-base, randomized recruitment, and quota sampling. A study of risk factors for childhood asthma illustrates an application of the counter-matching design. Some asymptotic efficiency results are presented and computational methods discussed. Further, it is shown that a 'marginal' likelihood provides a link to unconditional logistic methods. The methods are examined in a simulation study that compares frequency matching and counter-matching using conditional and unconditional logistic analyses; the results indicate that the conditional logistic likelihood has superior efficiency. Extensions that accommodate sampling of cases and multistage designs are presented. Finally, we compare the analysis methods presented here to other approaches, compare counter-matching and two-stage designs, and suggest areas for further research.

19.
We have developed an approximate maximum likelihood framework for estimating the selection coefficients in a simple fertility selection model with random union of zygotes. We consider a sampling scheme in which a random sample from each (discrete) generation of a population observed over several generations is collected and genotyped at one nuclear locus and one cytonuclear locus simultaneously. Simulation results show excellent small-sample performance of the resulting approximate MLE. The asymptotic variance-covariance matrix of our estimator is also obtained. We further show that these estimates can be used to construct simple test statistics for various types of selection hypotheses, including a test of neutrality.

20.
Two-stage designs are a well-known cost-effective way of conducting biomedical studies when the exposure variable is expensive or difficult to measure. Recent research developments have further allowed one or both stages of the two-stage design to be outcome dependent on a continuous outcome variable. This outcome-dependent sampling feature enables further efficiency gains in parameter estimation and overall cost reduction of the study (e.g. Wang, X. and Zhou, H., 2010. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics 66, 502-511; Zhou, H., Song, R., Wu, Y. and Qin, J., 2011. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202). In this paper, we develop a semiparametric mixed effects regression model for data from a two-stage design in which the second-stage data are sampled with an outcome-auxiliary-dependent sampling (OADS) scheme. Our method allows the cluster or center effects of the study subjects to be accounted for. We propose an estimated likelihood function to estimate the regression parameters. A simulation study indicates that greater efficiency gains can be achieved under the proposed two-stage OADS design with center effects than under alternative sampling schemes. We illustrate the proposed method by analyzing a data set from the Collaborative Perinatal Project.
