
1.

Statistical Inference for a Two‐Stage Outcome‐Dependent Sampling Design with a Continuous Outcome

Haibo Zhou; Rui Song; Yuanshan Wu; Jing Qin 《Biometrics》2011,67(1):194-202

Summary: The two‐stage case–control design has been widely used in epidemiology studies for its cost‐effectiveness and improvement of the study efficiency (White, 1982, American Journal of Epidemiology 115, 119–128; Breslow and Cain, 1988, Biometrika 75, 11–20). The evolution of modern biomedical studies has called for cost‐effective designs with a continuous outcome and exposure variables. In this article, we propose a new two‐stage outcome‐dependent sampling (ODS) scheme with a continuous outcome variable, where both the first‐stage data and the second‐stage data are from ODS schemes. We develop a semiparametric empirical likelihood estimation for inference about the regression parameters in the proposed design. Simulation studies were conducted to investigate the small‐sample behavior of the proposed estimator. We demonstrate that, for a given statistical power, the proposed design will require a substantially smaller sample size than the alternative designs. The proposed method is illustrated with an environmental health study conducted at the National Institutes of Health.

2.

A partial linear model in the outcome-dependent sampling setting to evaluate the effect of prenatal PCB exposure on cognitive function in children

Zhou H; Qin G; Longnecker MP 《Biometrics》2011,67(3):876-885

Outcome-dependent sampling (ODS) has been widely used in biomedical studies because it is a cost-effective way to improve study efficiency. However, in the setting of a continuous outcome, the representation of the exposure variable has been limited to the framework of linear models, due to the challenge in terms of both theory and computation. Partial linear models (PLM) are a powerful inference tool to nonparametrically model the relation between an outcome and the exposure variable. In this article, we consider a case study of a PLM for data from an ODS design. We propose a semiparametric maximum likelihood method to make inferences with a PLM. We develop the asymptotic properties and conduct simulation studies to show that the proposed ODS estimator can produce a more efficient estimate than that from a traditional simple random sampling design with the same sample size. Using this newly developed method, we were able to explore an open question in epidemiology: whether in utero exposure to background levels of polychlorinated biphenyls (PCBs) is associated with children's intellectual impairment. Our model provides further insights into the relation between low-level PCB exposure and children's cognitive function. The results shed new light on a body of inconsistent epidemiologic findings.

3.

A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome

Song Rui; Zhou Haibo; Kosorok Michael R. 《Biometrika》2009,96(1):221-228

Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of the two-stage case-control designs to the continuous-outcome case. We further show that the two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation.

4.

A semiparametric empirical likelihood method for biased sampling schemes with auxiliary covariates

Wang X; Zhou H 《Biometrics》2006,62(4):1149-1160

We consider a semiparametric inference procedure for data from epidemiologic studies conducted with a two-component sampling scheme where both a simple random sample and multiple outcome- or outcome-/auxiliary-dependent samples are observed. This sampling scheme allows the investigators to oversample certain subpopulations believed to have more information about the regression model while still gaining insights about the underlying population through the simple random sample. We focus on settings where there is no additional information about the parent cohort and the sampling probability is nonidentifiable. We motivate our problem with an ongoing study to assess the association between the mutation level of epidermal growth factor receptor (EGFR) and the antitumor response to EGFR-targeted therapy among nonsmall cell lung cancer patients. The proposed method applies to both binary and multicategorical outcome data and allows an arbitrary link function in the framework of generalized linear models. Simulation studies show that the proposed estimator has nice small sample properties. The proposed method is illustrated with a data example.

5.

Semiparametric inference for a 2-stage outcome-auxiliary-dependent sampling design with continuous outcome

Zhou H; Wu Y; Liu Y; Cai J 《Biostatistics (Oxford, England)》2011,12(3):521-534

Two-stage design has long been recognized to be a cost-effective way for conducting biomedical studies. In many trials, auxiliary covariate information may also be available, and it is of interest to exploit these auxiliary data to improve the efficiency of inferences. In this paper, we propose a 2-stage design with continuous outcome where the second-stage data is sampled with an "outcome-auxiliary-dependent sampling" (OADS) scheme. We propose an estimator which is the maximizer for an estimated likelihood function. We show that the proposed estimator is consistent and asymptotically normally distributed. The simulation study indicates that greater study efficiency gains can be achieved under the proposed 2-stage OADS design by utilizing the auxiliary covariate information when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a data set from an environmental epidemiologic study.

6.

Partial linear inference for a 2-stage outcome-dependent sampling design with a continuous outcome

Qin G; Zhou H 《Biostatistics (Oxford, England)》2011,12(3):506-520

The outcome-dependent sampling (ODS) design, which allows observation of the exposure variable to depend on the outcome, has been shown to be cost efficient. In this article, we propose a new statistical inference method, an estimated penalized likelihood method, for a partial linear model in the setting of a 2-stage ODS with a continuous outcome. We develop the asymptotic properties and conduct simulation studies to demonstrate the performance of the proposed estimator. A real environmental study data set is used to illustrate the proposed method.
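
As a rough illustration of what "observation of the exposure depends on the outcome" can mean with a continuous outcome, the sketch below draws a simple random sample and then supplements it with units from the lower and upper tails of the outcome. It is not the design or estimator of any article listed here; the function names `make_population` and `ods_sample`, the cutpoints, and the linear data-generating model are all assumptions made for the example.

```python
import random


def make_population(size, beta=0.5, seed=1):
    """Simulate (y, x) pairs from a toy linear model y = beta * x + noise (assumed model)."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = beta * x + rng.gauss(0.0, 1.0)
        population.append((y, x))
    return population


def ods_sample(population, n_srs, n_low, n_high, low_cut, high_cut, seed=0):
    """Pick unit indices: a simple random sample plus supplemental draws from the outcome tails.

    The exposure x would be measured only for the selected units; selection of the
    supplemental units depends on the outcome y, which is what makes the sample biased.
    """
    rng = random.Random(seed)
    indices = list(range(len(population)))
    srs = rng.sample(indices, n_srs)
    chosen = set(srs)
    # Supplemental pools: units not already sampled whose outcome lies in a tail.
    low_pool = [i for i in indices if i not in chosen and population[i][0] < low_cut]
    high_pool = [i for i in indices if i not in chosen and population[i][0] > high_cut]
    low = rng.sample(low_pool, min(n_low, len(low_pool)))
    high = rng.sample(high_pool, min(n_high, len(high_pool)))
    return {"srs": srs, "low": low, "high": high}


population = make_population(2000)
sample = ods_sample(population, n_srs=100, n_low=30, n_high=30, low_cut=-1.0, high_cut=1.0)
print({name: len(idx) for name, idx in sample.items()})
```

Because the tail units are over-represented, a naive regression on the pooled sample would generally be biased; the likelihood-based methods described in these abstracts are designed to account for that selection.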

7.

Semiparametric estimation exploiting covariate independence in two-phase randomized trials

Dai JY; LeBlanc M; Kooperberg C 《Biometrics》2009,65(1):178-187

Summary: Recent results for case–control sampling suggest when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields a great deal of efficiency gain. We consider the efficient estimation of the treatment–biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton–Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution, thus is straightforward to maximize. It offers a closed-form variance estimate with limited increase in variance relative to the fully efficient SPMLE. Our results suggest exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment–biomarker interactions.

8.

Generalized case-control sampling under generalized linear models

Jacob M. Maronge; Ran Tao; Jonathan S. Schildcrout; Paul J. Rathouz 《Biometrics》2023,79(1):332-343

A generalized case-control (GCC) study, like the standard case-control study, leverages outcome-dependent sampling (ODS) to extend to nonbinary responses. We develop a novel, unifying approach for analyzing GCC study data using the recently developed semiparametric extension of the generalized linear model (GLM), which is substantially more robust to model misspecification than existing approaches based on parametric GLMs. For valid estimation and inference, we use a conditional likelihood to account for the biased sampling design. We describe analysis procedures for estimation and inference for the semiparametric GLM under a conditional likelihood, and we discuss problems with estimation and inference under a conditional likelihood when the response distribution is misspecified. We demonstrate the flexibility of our approach over existing ones through extensive simulation studies, and we apply the methodology to an analysis of the Asset and Health Dynamics Among the Oldest Old study, which motivates our research. The proposed approach yields a simple yet versatile solution for handling ODS in a wide variety of possible response distributions and sampling schemes encountered in practice.

9.

Cox Regression in Nested Case–Control Studies with Auxiliary Covariates

Mengling Liu; Wenbin Lu; Chi‐hong Tseng 《Biometrics》2010,66(2):374-381

Summary: Nested case–control (NCC) design is a popular sampling method in large epidemiological studies for its cost effectiveness to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

10.

Nonparametric and semiparametric estimation with sequentially truncated survival data

Rebecca A. Betensky; Jing Qian; Jingyao Hou 《Biometrics》2023,79(2):1000-1013

In observational cohort studies with complex sampling schemes, truncation arises when the time to event of interest is observed only when it falls below or exceeds another random time, that is, the truncation time. In more complex settings, observation may require a particular ordering of event times; we refer to this as sequential truncation. Estimators of the event time distribution have been developed for simple left-truncated or right-truncated data. However, these estimators may be inconsistent under sequential truncation. We propose nonparametric and semiparametric maximum likelihood estimators for the distribution of the event time of interest in the presence of sequential truncation, under two truncation models. We show the equivalence of an inverse probability weighted estimator and a product limit estimator under one of these models. We study the large sample properties of the proposed estimators and derive their asymptotic variance estimators. We evaluate the proposed methods through simulation studies and apply the methods to an Alzheimer's disease study. We have developed an R package, seqTrun, for implementation of our method.

11.

Mixed effect regression analysis for a cluster-based two-stage outcome-auxiliary-dependent sampling design with a continuous outcome

Xu W; Zhou H 《Biostatistics (Oxford, England)》2012,13(4):650-664

Two-stage design is a well-known cost-effective way for conducting biomedical studies when the exposure variable is expensive or difficult to measure. Recent research development further allowed one or both stages of the two-stage design to be outcome dependent on a continuous outcome variable. This outcome-dependent sampling feature enables further efficiency gain in parameter estimation and overall cost reduction of the study (e.g. Wang, X. and Zhou, H., 2010. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics 66, 502-511; Zhou, H., Song, R., Wu, Y. and Qin, J., 2011. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202). In this paper, we develop a semiparametric mixed effect regression model for data from a two-stage design where the second-stage data are sampled with an outcome-auxiliary-dependent sample (OADS) scheme. Our method allows the cluster- or center-effects of the study subjects to be accounted for. We propose an estimated likelihood function to estimate the regression parameters. Simulation study indicates that greater study efficiency gains can be achieved under the proposed two-stage OADS design with center-effects when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a dataset from the Collaborative Perinatal Project.

12.

Multiple augmentation with partial missing regressors

Ma S 《Biometrical journal. Biometrische Zeitschrift》2006,48(1):83-92

In large cohort studies, it is common that a subset of the regressors may be missing for some study subjects by design or happenstance. In this article, we apply the multiple data augmentation techniques to semiparametric models for epidemiologic data when a subset of the regressors are missing for some subjects, under the assumption that the data are missing at random in the sense of Rubin (2004) and that the missingness probabilities depend jointly on the observable subset of regressors, on a set of observable extraneous variables and on the outcome. Computational algorithms for the Poor Man's and the Asymptotic Normal data augmentations are investigated. Simulation studies show that the data augmentation approach generates satisfactory estimates and is computationally affordable. Under certain simulation scenarios, the proposed approach can achieve asymptotic efficiency similar to the maximum likelihood approach. We apply the proposed technique to the Multi-Ethnic Study of Atherosclerosis (MESA) data and the South Wales Nickel Worker Study data.

13.

Variance estimation for systematic designs in spatial surveys

Fewster RM 《Biometrics》2011,67(4):1518-1531

Summary: In spatial surveys for estimating the density of objects in a survey region, systematic designs will generally yield lower variance than random designs. However, estimating the systematic variance is well known to be a difficult problem. Existing methods tend to overestimate the variance, so although the variance is genuinely reduced, it is over‐reported, and the gain from the more efficient design is lost. The current approaches to estimating a systematic variance for spatial surveys are to approximate the systematic design by a random design, or approximate it by a stratified design. Previous work has shown that approximation by a random design can perform very poorly, while approximation by a stratified design is an improvement but can still be severely biased in some situations. We develop a new estimator based on modeling the encounter process over space. The new “striplet” estimator has negligible bias and excellent precision in a wide range of simulation scenarios, including strip‐sampling, distance‐sampling, and quadrat‐sampling surveys, and including populations that are highly trended or have strong aggregation of objects. We apply the new estimator to survey data for the spotted hyena (Crocuta crocuta) in the Serengeti National Park, Tanzania, and find that the reported coefficient of variation for estimated density is 20% using approximation by a random design, 17% using approximation by a stratified design, and 11% using the new striplet estimator. This large reduction in reported variance is verified by simulation.

14.

Improving trial generalizability using observational studies

Dasom Lee; Shu Yang; Lin Dong; Xiaofei Wang; Donglin Zeng; Jianwen Cai 《Biometrics》2023,79(2):1213-1225

Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application on the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small-cell lung patients after surgery.

15.

A targeted maximum likelihood estimator for two-stage designs

Rose S; van der Laan MJ 《The international journal of biostatistics》2011,7(1):17

We consider two-stage sampling designs, including so-called nested case control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample, collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second-stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) in two-stage sampling designs and present simulation studies featuring this estimator.
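
The missing-data view of a two-stage design can be made concrete with plain inverse probability weighting: each second-stage unit is weighted by the inverse of its probability of being selected into the subsample. The sketch below shows only that weighting step, as a normalized (Hajek-type) weighted mean; it is not the IPCW-TMLE of the article, and the function name `ipw_mean` and the assumption that selection probabilities are known are choices made for the example.

```python
def ipw_mean(values, sampling_probs):
    """Normalized inverse-probability-weighted mean over second-stage units.

    values: measurements available only for units selected into the second stage.
    sampling_probs: each unit's (assumed known) probability of second-stage selection.
    """
    if len(values) != len(sampling_probs):
        raise ValueError("values and sampling_probs must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in sampling_probs):
        raise ValueError("sampling probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in sampling_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# A unit sampled with probability 0.25 stands in for four first-stage units,
# so it counts twice as much as a unit sampled with probability 0.5.
print(ipw_mean([1.0, 3.0], [0.5, 0.25]))
```

With equal selection probabilities the weights cancel and the function reduces to the ordinary sample mean, which is a quick sanity check on the weighting.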

16.

Estimating the encounter rate variance in distance sampling (total citations: 1; self-citations: 0; citations by others: 1)

Fewster RM; Buckland ST; Burnham KP; Borchers DL; Jupp PE; Laake JL; Thomas L 《Biometrics》2009,65(1):225-236

Summary: The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias.

17.

Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data (total citations: 1; self-citations: 0; citations by others: 1)

Xue Liugen; Zhu Lixing 《Biometrika》2007,94(4):921-937

A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals.

18.

Semiparametric estimation of the transformation model by leveraging external aggregate data in the presence of population heterogeneity

Yu-Jen Cheng; Yen-Chun Liu; Chang-Yu Tsai; Chiung-Yu Huang 《Biometrics》2023,79(3):1996-2009

Leveraging information in aggregate data from external sources to improve estimation efficiency and prediction accuracy with smaller scale studies has drawn a great deal of attention in recent years. Yet, conventional methods often either ignore uncertainty in the external information or fail to account for the heterogeneity between internal and external studies. This article proposes an empirical likelihood-based framework to improve the estimation of the semiparametric transformation models by incorporating information about the t-year subgroup survival probability from external sources. The proposed estimation procedure incorporates an additional likelihood component to account for uncertainty in the external information and employs a density ratio model to characterize population heterogeneity. We establish the consistency and asymptotic normality of the proposed estimator and show that it is more efficient than the conventional pseudopartial likelihood estimator without combining information. Simulation studies show that the proposed estimator yields little bias and outperforms the conventional approach even in the presence of information uncertainty and heterogeneity. The proposed methodologies are illustrated with an analysis of a pancreatic cancer study.

19.

Evaluating the Predictive Value of Biomarkers with Stratified Case‐Cohort Design

Dandan Liu; Tianxi Cai; Yingye Zheng 《Biometrics》2012,68(4):1219-1227

Summary: Identification of novel biomarkers for risk assessment is important for both effective disease prevention and optimal treatment recommendation. Discovery relies on the precious yet limited resource of stored biological samples from large prospective cohort studies. Case‐cohort sampling design provides a cost‐effective tool in the context of biomarker evaluation, especially when the clinical condition of interest is rare. Existing statistical methods focus on making efficient inference on relative hazard parameters from the Cox regression model. Drawing on recent theoretical development on the weighted likelihood for semiparametric models under two‐phase studies (Breslow and Wellner, 2007), we propose statistical methods to evaluate accuracy and predictiveness of a risk prediction biomarker, with censored time‐to‐event outcome under stratified case‐cohort sampling. We consider nonparametric methods and a semiparametric method. We derive large sample properties of proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate new procedures using data from the Framingham Offspring Study to evaluate the accuracy of a recently developed risk score incorporating biomarker information for predicting cardiovascular disease.

20.

A Combined Product Estimator in Sample Survey

Parvinder Kaur 《Biometrical journal. Biometrische Zeitschrift》1984,26(7):749-753

For the estimation of the population mean in stratified random sampling a ‘Combined Product Estimator’ is proposed which is more efficient than the ‘Combined Ratio’ and ‘Separate Ratio’ estimators. Also, the proposed estimator has exact expressions for bias and mean square error. An empirical illustration is given to compare the efficiencies of different estimators.
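
For orientation, the sketch below computes the textbook form of a combined product estimate in stratified sampling: the stratified mean of the study variable multiplied by the stratified mean of an auxiliary variable, divided by the known population mean of the auxiliary variable. Whether the article's proposed estimator matches this form exactly is an assumption; the function name `combined_product_estimate` and the input layout are hypothetical.

```python
def combined_product_estimate(strata, aux_population_mean):
    """Textbook combined product estimate of a population mean under stratified sampling.

    strata: list of dicts with keys
        'weight' : stratum weight N_h / N,
        'y'      : sampled values of the study variable,
        'x'      : sampled values of the auxiliary variable.
    aux_population_mean: known population mean of the auxiliary variable.
    """
    # Stratified (weighted) sample means of the study and auxiliary variables.
    y_st = sum(s["weight"] * sum(s["y"]) / len(s["y"]) for s in strata)
    x_st = sum(s["weight"] * sum(s["x"]) / len(s["x"]) for s in strata)
    return y_st * x_st / aux_population_mean


strata = [
    {"weight": 0.5, "y": [2.0, 4.0], "x": [1.0, 3.0]},
    {"weight": 0.5, "y": [6.0, 8.0], "x": [5.0, 7.0]},
]
print(combined_product_estimate(strata, aux_population_mean=4.0))
```

Product-type estimators are usually considered when the study and auxiliary variables are negatively correlated, whereas ratio-type estimators are preferred under positive correlation.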