Similar Literature
20 similar documents found.
1.
Complex disease by definition results from the interplay of genetic and environmental factors. However, it is currently unclear how gene-environment interaction can best be used to locate complex disease susceptibility loci, particularly in the context of studies where between 1,000 and 1,000,000 markers are scanned for association with disease. We present a joint test of marginal association and gene-environment interaction for case-control data. We compare the power and sample size requirements of this joint test to other analyses: the marginal test of genetic association, the standard test for gene-environment interaction based on logistic regression, and the case-only test for interaction that exploits gene-environment independence. Although for many penetrance models the joint test of genetic marginal effect and interaction is not the most powerful, it is nearly optimal across all penetrance models we considered. In particular, it generally has better power than the marginal test when the genetic effect is restricted to exposed subjects and much better power than the tests of gene-environment interaction when the genetic effect is not restricted to a particular exposure level. This makes the joint test an attractive tool for large-scale association scans where the true gene-environment interaction model is unknown.
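For intuition, here is a minimal sketch of such a joint test on simulated case-control data: the genetic main effect and the gene-environment interaction are tested together with a 2-degree-of-freedom likelihood-ratio test. All variable names, effect sizes, and the simulation setup below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, n)        # genotype: risk-allele count (0/1/2)
e = rng.binomial(1, 0.5, n)        # binary environmental exposure
logit = -1.0 + 0.2 * g + 0.3 * e + 0.25 * g * e
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # case-control status

# Full model includes G, E, and GxE; the null model drops both genetic terms,
# so the joint test of marginal effect and interaction has 2 degrees of freedom.
X_full = sm.add_constant(np.column_stack([g, e, g * e]))
X_null = sm.add_constant(e)
ll_full = sm.Logit(y, X_full).fit(disp=0).llf
ll_null = sm.Logit(y, X_null).fit(disp=0).llf

lrt = 2 * (ll_full - ll_null)
print(f"LRT = {lrt:.2f}, p = {stats.chi2.sf(lrt, df=2):.3g}")
```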

2.
In non-randomized studies, the assessment of a causal effect of treatment or exposure on outcome is hampered by possible confounding. Applying multiple regression models including the effects of treatment and covariates on outcome is the well-known classical approach to adjust for confounding. In recent years other approaches have been promoted. One of them is based on the propensity score and considers the effect of possible confounders on treatment as a relevant criterion for adjustment. Another proposal is based on using an instrumental variable. Here inference relies on a factor, the instrument, which affects treatment but is thought to be otherwise unrelated to outcome, so that it mimics randomization. Each of these approaches can basically be interpreted as a simple reweighting scheme, designed to address confounding. The procedures will be compared with respect to their fundamental properties, namely, which bias they aim to eliminate, which effect they aim to estimate, and which parameter is modelled. We will expand our overview of methods for analysis of non-randomized studies to methods for analysis of randomized controlled trials and show that analyses of both study types may target different effects and different parameters. The considerations will be illustrated using a breast cancer study with a so-called Comprehensive Cohort Study design, including a randomized controlled trial and a non-randomized study in the same patient population as sub-cohorts. This design offers ideal opportunities to discuss and illustrate the properties of the different approaches.
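To make the "simple reweighting scheme" interpretation concrete, the sketch below implements propensity-score-based inverse probability weighting on simulated data. The setup and all names are illustrative assumptions; the instrumental-variable weighting discussed in the paper is not shown.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                           # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))  # treatment depends on x
y = 1.0 * a + 1.5 * x + rng.normal(size=n)       # true treatment effect = 1.0

# Model the effect of the confounder on treatment (the propensity score) ...
ps = sm.Logit(a, sm.add_constant(x)).fit(disp=0).predict()

# ... then reweight each group by the inverse of its treatment probability
# and compare weighted outcome means (normalized / Hajek form).
w = a / ps + (1 - a) / (1 - ps)
ate = np.average(y[a == 1], weights=w[a == 1]) - np.average(y[a == 0], weights=w[a == 0])
print(f"IPW estimate: {ate:.2f} (true 1.0)")
```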

3.
A time-dependent measure, termed the rate ratio, was proposed to assess the local dependence between two types of recurrent event processes in one-sample settings. However, the one-sample work does not consider modeling the dependence by covariates such as subject characteristics and treatments received. The focus of this paper is to understand how, and to what degree, covariates influence the dependence strength for bivariate recurrent events. We propose the covariate-adjusted rate ratio, a measure of covariate-adjusted dependence. We propose a semiparametric regression model for jointly modeling the frequency and dependence of bivariate recurrent events: the first level is a proportional rates model for the marginal rates and the second level is a proportional rate ratio model for the dependence structure. We develop a pseudo-partial likelihood to estimate the parameters in the proportional rate ratio model. We establish the asymptotic properties of the estimators and evaluate the finite sample performance via simulation studies. We illustrate the proposed models and methods using a soft tissue sarcoma study that examines the effects of initial treatments on the marginal frequencies of local/distant sarcoma recurrence and the dependence structure between the two types of cancer recurrence.

4.
Exposure measurement error can result in a biased estimate of the association between an exposure and outcome. When the exposure-outcome relationship is linear on the appropriate scale (e.g. linear, logistic) and the measurement error is classical, that is the result of random noise, the result is attenuation of the effect. When the relationship is non-linear, measurement error distorts the true shape of the association. Regression calibration is a commonly used method for correcting for measurement error, in which each individual's unknown true exposure in the outcome regression model is replaced by its expectation conditional on the error-prone measure and any fully measured covariates. Regression calibration is simple to execute when the exposure is untransformed in the linear predictor of the outcome regression model, but less straightforward when non-linear transformations of the exposure are used. We describe a method for applying regression calibration in models in which a non-linear association is modelled by transforming the exposure using a fractional polynomial model. It is shown that taking a Bayesian estimation approach is advantageous. By use of Markov chain Monte Carlo algorithms, one can sample from the distribution of the true exposure for each individual. Transformations of the sampled values can then be performed directly and used to find the expectation of the transformed exposure required for regression calibration. A simulation study shows that the proposed approach performs well. We apply the method to investigate the relationship between usual alcohol intake and subsequent all-cause mortality using an error model that adjusts for the episodic nature of alcohol consumption.
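As background for the calibration step, the sketch below shows the classical (non-Bayesian) version of regression calibration with an untransformed exposure, using replicate error-prone measures to estimate E[X | W]. Everything here is a simplified assumption; the fractional-polynomial and MCMC machinery of the paper is not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3000
x = rng.normal(2.0, 1.0, n)            # true exposure (unobserved)
w1 = x + rng.normal(0, 1.0, n)         # two error-prone replicates
w2 = x + rng.normal(0, 1.0, n)
y = 0.5 * x + rng.normal(0, 1.0, n)    # outcome linear in the true exposure

# Naive analysis: regressing y on one error-prone measure attenuates the slope
naive = sm.OLS(y, sm.add_constant(w1)).fit().params[1]

# Regression calibration: replace x by its expectation given the measurements
wbar = (w1 + w2) / 2
var_x = np.var(wbar, ddof=1) - np.var(w1 - w2, ddof=1) / 4   # method of moments
lam = var_x / np.var(wbar, ddof=1)                           # attenuation factor
x_hat = wbar.mean() + lam * (wbar - wbar.mean())             # E[x | w1, w2]
calib = sm.OLS(y, sm.add_constant(x_hat)).fit().params[1]
print(f"naive slope {naive:.2f}, calibrated slope {calib:.2f} (true 0.5)")
```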

5.
In the regression analysis of clustered data it is important to allow for the possibility of distinct between- and within-cluster exposure effects on the outcome measure, represented, respectively, by regression coefficients for the cluster mean and the deviation of the individual-level exposure value from this mean. In twin data, the within-pair regression effect represents association conditional on exposures shared within pairs, including any common genetic or environmental influences on the outcome measure. It has therefore been proposed that a comparison of the within-pair regression effects between monozygous (MZ) and dizygous (DZ) twins can be used to examine whether the association between exposure and outcome has a genetic origin. We address this issue by proposing a bivariate model for exposure and outcome measurements in twin-pair data. The between- and within-pair regression coefficients are shown to be weighted averages of ratios of the exposure and outcome variances and covariances, from which it is straightforward to determine the conditions under which the within-pair regression effect in MZ pairs will be different from that in DZ pairs. In particular, we show that a correlation structure in twin pairs for exposure and outcome that appears to be due to genetic factors will not necessarily be reflected in distinct MZ and DZ values for the within-pair regression coefficients. We illustrate these results in a study of female twin pairs from Australia and North America relating mammographic breast density to weight and body mass index.
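The between/within decomposition itself is easy to exhibit. In the sketch below (simulated data, illustrative values throughout), a shared pair-level factor confounds the exposure-outcome association, so the between-pair slope is inflated while the within-pair slope recovers the conditional effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_pairs = 500
shared = rng.normal(size=n_pairs)                     # shared pair-level factor
x = shared[:, None] + rng.normal(0, 1, (n_pairs, 2))  # exposure of each twin
y = 0.6 * x + shared[:, None] + rng.normal(0, 1, (n_pairs, 2))

x_mean = x.mean(axis=1)                               # cluster (pair) mean
x_dev = (x - x_mean[:, None]).ravel()                 # within-pair deviation

X = sm.add_constant(np.column_stack([np.repeat(x_mean, 2), x_dev]))
groups = np.repeat(np.arange(n_pairs), 2)
fit = sm.OLS(y.ravel(), X).fit(cov_type="cluster", cov_kwds={"groups": groups})
# params: [intercept, between-pair slope (confounded), within-pair slope (~0.6)]
print(fit.params)
```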

6.
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.
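A stripped-down version of the ingredients such tests rely on is sketched below: penalized working models for exposure and outcome, and a standardized score built from the product of their residuals, which is approximately standard normal under the null when at least one working model holds. This is a simplification under assumed sparse linear models, not the paper's full procedure.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 300, 500                       # high-dimensional: p > n
L = rng.normal(size=(n, p))
a = L[:, 0] + rng.normal(size=n)      # exposure depends sparsely on L
y = L[:, 0] + rng.normal(size=n)      # null model: a has no effect on y

r_a = a - LassoCV(cv=5).fit(L, a).predict(L)   # exposure-model residuals
r_y = y - LassoCV(cv=5).fit(L, y).predict(L)   # outcome-model residuals

# Standardized score from the residual products; no sample splitting used here
score = np.sum(r_a * r_y) / np.sqrt(np.sum((r_a * r_y) ** 2))
print(f"score = {score:.2f}, two-sided p = {2 * stats.norm.sf(abs(score)):.3f}")
```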

7.
Causal inference has been increasingly reliant on observational studies with rich covariate information. To build tractable causal procedures, such as the doubly robust estimators, it is imperative to first extract important features from high or even ultra-high dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra-high dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency in the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all the predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome model-free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal performs favorably compared with existing methods in a range of realistic settings. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
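The screening goal (keep every predictor of the outcome in both models) can be illustrated with a crude stand-in for the paper's screening statistic; plain marginal correlation is used below, and all names and dimensions are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, p = 500, 2000
X = rng.normal(size=(n, p))
a = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))     # cols 0-1 confound
y = 1.0 * a + X[:, 0] + X[:, 1] + X[:, 2] + rng.normal(size=n)  # col 2: outcome-only

# Screen on the outcome: rank covariates by absolute marginal correlation with y
Xc = (X - X.mean(0)) / X.std(0)
scores = np.abs(Xc.T @ ((y - y.mean()) / y.std())) / n
keep = np.argsort(scores)[-20:]       # retain the top 20 outcome predictors

# Use the retained covariates in the propensity score, then weight (Hajek IPW)
ps = sm.Logit(a, sm.add_constant(X[:, keep])).fit(disp=0).predict()
ate = (np.sum(a * y / ps) / np.sum(a / ps)
       - np.sum((1 - a) * y / (1 - ps)) / np.sum((1 - a) / (1 - ps)))
print(f"IPW estimate after outcome screening: {ate:.2f} (true 1.0)")
```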

8.
J. M. Robins, S. D. Mark, W. K. Newey. Biometrics, 1992, 48(2): 479-495
In order to estimate the causal effects of one or more exposures or treatments on an outcome of interest, one has to account for the effect of "confounding factors" which both covary with the exposures or treatments and are independent predictors of the outcome. In this paper we present regression methods which, in contrast to standard methods, adjust for the confounding effect of multiple continuous or discrete covariates by modelling the conditional expectation of the exposures or treatments given the confounders. In the special case of a univariate dichotomous exposure or treatment, this conditional expectation is identical to what Rosenbaum and Rubin have called the propensity score. They have also proposed methods to estimate causal effects by modelling the propensity score. Our methods generalize those of Rosenbaum and Rubin in several ways. First, our approach straightforwardly allows for multivariate exposures or treatments, each of which may be continuous, ordinal, or discrete. Second, even in the case of a single dichotomous exposure, our approach does not require subclassification or matching on the propensity score so that the potential for "residual confounding," i.e., bias, due to incomplete matching is avoided. Third, our approach allows a rather general formalization of the idea that it is better to use the "estimated propensity score" than the true propensity score even when the true score is known. The additional power of our approach derives from the fact that we assume the causal effects of the exposures or treatments can be described by the parametric component of a semiparametric regression model. To illustrate our methods, we reanalyze the effect of current cigarette smoking on the level of forced expiratory volume in one second in a cohort of 2,713 adult white males. We compare the results with those obtained using standard methods.
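The core idea, adjusting by modelling the conditional expectation of the exposure given the confounders, can be shown in miniature. The sketch below (simulated data, illustrative coefficients) regresses the outcome on the exposure residual; under a linear causal model the resulting coefficient is the causal effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)                        # confounder
a = 0.7 * x + rng.normal(size=n)              # continuous exposure
y = 1.2 * a + 2.0 * x + rng.normal(size=n)    # true causal effect = 1.2

# Model E[exposure | confounders] rather than adjusting in the outcome model
e_a = sm.OLS(a, sm.add_constant(x)).fit().predict()

# The residual a - E[a|x] is uncorrelated with x, so its coefficient in a
# regression for y is free of confounding by x.
fit = sm.OLS(y, sm.add_constant(a - e_a)).fit()
print(f"estimated effect: {fit.params[1]:.2f} (true 1.2)")
```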

9.
Since the seminal work of Prentice and Pyke, the prospective logistic likelihood has become the standard method of analysis for retrospectively collected case-control data, in particular for testing the association between a single genetic marker and a disease outcome in genetic case-control studies. In the study of multiple genetic markers with relatively small effects, especially those with rare variants, various aggregated approaches based on the same prospective likelihood have been developed to integrate subtle association evidence among all the markers considered. Many of the commonly used tests are derived from the prospective likelihood under a common-random-effect assumption, which assumes a common random effect for all subjects. We develop the locally most powerful aggregation test based on the retrospective likelihood under an independent-random-effect assumption, which allows the genetic effect to vary among subjects. In contrast to the fact that disease prevalence information cannot be used to improve efficiency for the estimation of odds ratio parameters in logistic regression models, we show that it can be utilized to enhance the testing power in genetic association studies. Extensive simulations demonstrate the advantages of the proposed method over the existing ones. A real genome-wide association study is analyzed for illustration.

10.
We propose a method for testing gene-environment (G × E) interactions on a complex trait in family-based studies in which a phenotypic ascertainment criterion has been imposed. This novel approach employs G-estimation, a semiparametric estimation technique from the causal inference literature, to avoid modeling of the association between the environmental exposure and the phenotype, to gain robustness against unmeasured confounding due to population substructure, and to acknowledge the ascertainment conditions. The proposed test allows for incomplete parental genotypes. It is compared by simulation studies to an analogous conditional likelihood-based approach and to the QBAT-I test, which also invokes the G-estimation principle but ignores ascertainment. We apply our approach to a study of chronic obstructive pulmonary disease.

11.
Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome, while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter, ω, denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (ω = 1), BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC, with ω = ∞, estimates the exposure effect with smaller bias than traditional BMA, and with improved coverage. We then compare BAC, a recent approach of Crainiceanu, Dominici, and Parmigiani (2008, Biometrika 95, 635–651), and traditional BMA in a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY for the period 1999–2005. Using each approach, we estimate the short-term effects of PM2.5 on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.

12.
We address estimation of the marginal effect of a time-varying binary treatment on a continuous longitudinal outcome in the context of observational studies using electronic health records, when the relationship of interest is confounded, mediated, and further distorted by an informative visit process. We allow the longitudinal outcome to be recorded only sporadically and assume that its monitoring timing is informed by patients' characteristics. We propose two novel estimators based on linear models for the mean outcome that incorporate an adjustment for confounding and the informative monitoring process through generalized inverse probability of treatment weights and a proportional intensity model, respectively. We allow for a flexible modeling of the intercept function as a function of time. Our estimators have closed-form solutions, and their asymptotic distributions can be derived. Extensive simulation studies show that both estimators outperform standard methods such as the ordinary least squares estimator or estimators that only account for informative monitoring or confounders. We illustrate our methods using data from the Add Health study, assessing the effect of depressive mood on weight in adolescents.
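A stylized, long-format version of the double weighting is sketched below: inverse probability of treatment weights handle confounding, and inverse visit-probability weights handle informative monitoring. For brevity the true visit probabilities are used directly; in practice they would come from a fitted intensity model, and the paper's proportional-intensity machinery is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n, T = 1000, 5
x = np.repeat(rng.normal(size=n), T)              # baseline confounder, long format
a = rng.binomial(1, 1 / (1 + np.exp(-x)))         # treatment per record
y = 1.0 * a + x + rng.normal(size=n * T)          # true effect = 1.0

p_visit = 1 / (1 + np.exp(-(0.5 * x + 0.5 * a)))  # informative monitoring
visit = rng.binomial(1, p_visit).astype(bool)     # which records are observed

ps = sm.Logit(a, sm.add_constant(x)).fit(disp=0).predict()
iptw = a / ps + (1 - a) / (1 - ps)                # confounding adjustment
iivw = 1 / p_visit                                # monitoring adjustment (known here)
w = (iptw * iivw)[visit]

fit = sm.WLS(y[visit], sm.add_constant(a[visit].astype(float)), weights=w).fit()
print(f"doubly weighted estimate: {fit.params[1]:.2f} (true 1.0)")
```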

13.
Covariate-adjusted regression was recently proposed for situations where both predictors and response in a regression model are not directly observed, but are observed after being contaminated by unknown functions of a common observable covariate. The method has been appealing because of its flexibility in targeting the regression coefficients under different forms of distortion. We extend this methodology proposed for regression into the framework of varying coefficient models, where the goal is to target the covariate-adjusted relationship between longitudinal variables. The proposed covariate-adjusted varying coefficient model (CAVCM) is illustrated with an analysis of a longitudinal data set containing calcium absorption and intake measurements on 188 subjects. We estimate the age-dependent relationship between these two variables adjusted for the covariate body surface area. Simulation studies demonstrate the flexibility of CAVCM in handling different forms of distortion in the longitudinal setting.

14.
Commonly used semiparametric estimators of causal effects specify parametric models for the propensity score (PS) and the conditional outcome. An example is an augmented inverse probability weighting (IPW) estimator, frequently referred to as a doubly robust estimator, because it is consistent if at least one of the two models is correctly specified. However, in many observational studies, the role of the parametric models is often not to provide a representation of the data-generating process but rather to facilitate the adjustment for confounding, making the assumption of at least one true model unlikely to hold. In this paper, we propose a crude analytical approach to study the large-sample bias of estimators when the models are assumed to be approximations of the data-generating process, namely, when all models are misspecified. We apply our approach to three prototypical estimators of the average causal effect, two IPW estimators, using a misspecified PS model, and an augmented IPW (AIPW) estimator, using misspecified models for the outcome regression (OR) and the PS. For the two IPW estimators, we show that normalization, in addition to having a smaller variance, also offers some protection against bias due to model misspecification. To analyze the question of when the use of two misspecified models is better than one, we derive necessary and sufficient conditions for when the AIPW estimator has a smaller bias than a simple IPW estimator and when it has a smaller bias than an IPW estimator with normalized weights. If the misspecification of the outcome model is moderate, the comparisons of the biases of the IPW and AIPW estimators show that the AIPW estimator has a smaller bias than the IPW estimators. However, all biases include a scaling with the PS-model error, and we suggest caution in modeling the PS whenever such a model is involved. For numerical and finite sample illustrations, we include three simulation studies and corresponding approximations of the large-sample biases. In a dataset from the National Health and Nutrition Examination Survey, we estimate the effect of smoking on blood lead levels.
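For a concrete baseline, the sketch below computes the three prototypical estimators on simulated data with correctly specified working models; misspecification of the kind analyzed in the paper can be mimicked by dropping x from either model. All values are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 1.0 * a + x + rng.normal(size=n)                     # true effect = 1.0

ones = np.ones(n)
ps = sm.Logit(a, np.column_stack([ones, x])).fit(disp=0).predict()
or_fit = sm.OLS(y, np.column_stack([ones, a, x])).fit()  # outcome regression
mu1 = or_fit.predict(np.column_stack([ones, ones, x]))   # predicted y under a=1
mu0 = or_fit.predict(np.column_stack([ones, np.zeros(n), x]))

ipw = np.mean(a * y / ps - (1 - a) * y / (1 - ps))                       # Horvitz-Thompson
hajek = (np.sum(a * y / ps) / np.sum(a / ps)
         - np.sum((1 - a) * y / (1 - ps)) / np.sum((1 - a) / (1 - ps)))  # normalized weights
aipw = np.mean(mu1 - mu0 + a * (y - mu1) / ps - (1 - a) * (y - mu0) / (1 - ps))
print(f"IPW={ipw:.2f}  normalized IPW={hajek:.2f}  AIPW={aipw:.2f}  (true 1.0)")
```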

15.
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that the four most significant factors determining average power to detect association are: disease model, average marker-marker disequilibrium, haplotype diversity, and the trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
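The power computation from a non-centrality parameter is short enough to state directly. The sketch below assumes a hypothetical per-gene NCP, with the significance level matching the 0.01 used in the paper; the NCP itself would come from the haplotype frequencies under the specified genetic model and scales linearly with sample size.

```python
from scipy import stats

def lrt_power(ncp, df, alpha=0.01):
    """Power of a chi-square likelihood-ratio test with non-centrality `ncp`."""
    crit = stats.chi2.ppf(1 - alpha, df)        # critical value under the null
    return stats.ncx2.sf(crit, df, ncp)         # tail probability under the alternative

# e.g., 4 case/control haplotypes => 3 df; hypothetical per-gene NCP of 25.
# Since the NCP grows linearly in n, sample-size requirements follow by scaling.
print(f"power = {lrt_power(25.0, df=3):.3f}")
```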

16.
Mendelian Randomisation (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilising genetic variants as instrumental variables (IVs) for the exposure. The effect estimates obtained from MR studies are often interpreted as the lifetime effect of the exposure in question. However, the causal effects of some exposures are thought to vary throughout an individual’s lifetime with periods during which an exposure has a greater effect on a particular outcome. Multivariable MR (MVMR) is an extension of MR that allows for multiple, potentially highly related, exposures to be included in an MR estimation. MVMR estimates the direct effect of each exposure on the outcome conditional on all the other exposures included in the estimation. We explore the use of MVMR to estimate the direct effect of a single exposure at different time points in an individual’s lifetime on an outcome. We use simulations to illustrate the interpretation of the results from such analyses and the key assumptions required. We show that causal effects at different time periods can be estimated through MVMR when the association between the genetic variants used as instruments and the exposure measured at those time periods varies. However, this estimation will not necessarily identify exact time periods over which an exposure has the most effect on the outcome. Prior knowledge regarding the biological basis of exposure trajectories can help interpretation. We illustrate the method through estimation of the causal effects of childhood and adult BMI on C-reactive protein and smoking behaviour.
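With summary statistics, the MVMR estimation described here reduces to a weighted regression of variant-outcome associations on the matrix of variant-exposure associations (one column per time period), with no intercept. The sketch below is a simulated illustration; all effect sizes and the inverse-variance weighting are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
m = 100                                                 # genetic instruments
beta_child = rng.normal(0, 0.05, m)                     # SNP effects on childhood exposure
beta_adult = 0.4 * beta_child + rng.normal(0, 0.05, m)  # correlated adult-exposure effects
se_out = np.full(m, 0.01)
# true direct effects: 0.1 (childhood), 0.3 (adulthood)
beta_out = 0.1 * beta_child + 0.3 * beta_adult + rng.normal(0, se_out)

# Inverse-variance weighted MVMR: no intercept; identification requires that the
# instrument-exposure associations differ across the two time periods.
X = np.column_stack([beta_child, beta_adult])
fit = sm.WLS(beta_out, X, weights=1 / se_out**2).fit()
print(fit.params)   # direct effects of exposure at each time period
```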

17.
Most studies investigating the relationship between passive smoking and child health have found a significant effect on respiratory illness and lung function. The wide range of findings is based on diverse types of studies which use multiple criteria for respiratory illness, smoke exposure, and outcome variables. The aim of this review is to examine these studies in an attempt to focus attention on methodological criteria which relate to the strength of the association and likelihood of a causal relationship between passive smoking and child health. We examined 30 studies and judged their strength by examining (1) data collection, (2) surveillance bias, (3) definition of amount of smoking, (4) definition of illness, (5) detection bias, (6) outcome variables, and (7) control for confounding variables. Poor scores were noted in the use of "blinded" data collectors (37 percent of possible score), use of multiple specific outcome variables (51 percent), and definition of the quantity of smoking (56 percent). Good scores were noted in the detection of illnesses (98 percent), recall by study subjects of symptoms of illness (71 percent), control for confounding variables (81 percent), and definition of illnesses (86 percent). The range of scores for the studies was from 44 percent to 89 percent (of the total possible score). While a few well-designed studies demonstrate a significant effect of passive smoking on child health, most studies had significant design problems that prevent reliance on their conclusions. Thus, many questions remain, and future studies should consider important methodological standards to determine more accurately the effect of passive smoking on child health.

18.
In modern genetic epidemiology studies, the association between the disease and a genomic region, such as a candidate gene, is often investigated using multiple SNPs. We propose a multilocus test of genetic association that can account for genetic effects that might be modified by variants in other genes or by environmental factors. We consider use of Tukey's venerable and parsimonious 1-degree-of-freedom model of interaction, which is natural when individual SNPs within a gene are associated with disease through a common biological mechanism; in contrast, many standard regression models are designed as if each SNP has unique functional significance. On the basis of Tukey's model, we propose a novel but computationally simple generalized test of association that can simultaneously capture both the main effects of the variants within a genomic region and their interactions with the variants in another region or with an environmental exposure. We compared performance of our method with that of two standard tests of association, one ignoring gene-gene/gene-environment interactions and the other based on a saturated model of interactions. We demonstrate major power advantages of our method both in analysis of data from a case-control study of the association between colorectal adenoma and DNA variants in the NAT2 genomic region, which are well known to be related to a common biological phenotype, and under different models of gene-gene interactions with use of simulated data.

19.
In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area.

20.
In this paper, we propose a unified Bayesian joint modeling framework for studying association between a binary treatment outcome and a baseline matrix-valued predictor. Specifically, a joint modeling approach relating an outcome to a matrix-valued predictor through a probabilistic formulation of multilinear principal component analysis is developed. This framework establishes a theoretical relationship between the outcome and the matrix-valued predictor, although the predictor is not explicitly expressed in the model. Simulation studies are provided showing that the proposed method is superior or competitive to other methods, such as a two-stage approach and a classical principal component regression in terms of both prediction accuracy and estimation of association; its advantage is most notable when the sample size is small and the dimensionality in the imaging covariate is large. Finally, our proposed joint modeling approach is shown to be a very promising tool in an application exploring the association between baseline electroencephalography data and a favorable response to treatment in a depression treatment study by achieving a substantial improvement in prediction accuracy in comparison to competing methods.

