Similar literature
20 similar documents retrieved (search time: 31 ms)
1.
Gustafson P, Le Nhu D. Biometrics 2002;58(4):878-887
It is well known that imprecision in the measurement of predictor variables typically leads to bias in estimated regression coefficients. We compare the bias induced by measurement error in a continuous predictor with that induced by misclassification of a binary predictor in the contexts of linear and logistic regression. To make the comparison fair, we consider misclassification probabilities for a binary predictor that correspond to dichotomizing an imprecise continuous predictor in lieu of its precise counterpart. On this basis, nondifferential binary misclassification is seen to yield more bias than nondifferential continuous measurement error. However, it is known that differential misclassification results if a binary predictor is actually formed by dichotomizing a continuous predictor subject to nondifferential measurement error. When the postulated model linking the response and precise continuous predictor is correct, this differential misclassification is found to yield less bias than continuous measurement error, in contrast with nondifferential misclassification, i.e., dichotomization reduces the bias due to mismeasurement. This finding, however, is sensitive to the form of the underlying relationship between the response and the continuous predictor. In particular, we give a scenario where dichotomization involves a trade-off between model fit and misclassification bias. We also examine how the bias depends on the choice of threshold in the dichotomization process and on the correlation between the imprecise predictor and a second precise predictor.
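The contrast between attenuation from continuous measurement error and the bias from dichotomizing a noisy predictor can be reproduced with a short simulation. The sketch below is illustrative only (standard-normal predictor, unit error variance, linear model), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                    # precise continuous predictor
w = x + rng.normal(size=n)                # nondifferential measurement error
y = 1.0 * x + rng.normal(size=n)          # true regression slope = 1

def slope(a, b):
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

b_true = slope(x, y)                      # ~1.0
b_err = slope(w, y)                       # attenuated toward 1/(1 + 1) = 0.5

# dichotomize at the median: imprecise binary vs precise binary predictor
gap_true = y[x > 0].mean() - y[x <= 0].mean()   # ~2*E[x | x > 0]
gap_err = y[w > 0].mean() - y[w <= 0].mean()    # shrunk by about 1/sqrt(2)
```

Here the continuous slope is halved while the dichotomized group difference retains about 71% of its value, consistent with the finding that dichotomizing an imprecise continuous predictor can reduce the bias due to mismeasurement.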

2.
Mendelian Randomisation (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilising genetic variants as instrumental variables (IVs) for the exposure. The effect estimates obtained from MR studies are often interpreted as the lifetime effect of the exposure in question. However, the causal effects of some exposures are thought to vary throughout an individual’s lifetime with periods during which an exposure has a greater effect on a particular outcome. Multivariable MR (MVMR) is an extension of MR that allows for multiple, potentially highly related, exposures to be included in an MR estimation. MVMR estimates the direct effect of each exposure on the outcome conditional on all the other exposures included in the estimation. We explore the use of MVMR to estimate the direct effect of a single exposure at different time points in an individual’s lifetime on an outcome. We use simulations to illustrate the interpretation of the results from such analyses and the key assumptions required. We show that causal effects at different time periods can be estimated through MVMR when the association between the genetic variants used as instruments and the exposure measured at those time periods varies. However, this estimation will not necessarily identify exact time periods over which an exposure has the most effect on the outcome. Prior knowledge regarding the biological basis of exposure trajectories can help interpretation. We illustrate the method through estimation of the causal effects of childhood and adult BMI on C-reactive protein and smoking behaviour.
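As a toy illustration of this idea (not the authors' code; the effect sizes, instrument count, and variable names below are invented), the MVMR direct effects of an exposure measured at two time points can be estimated by two-stage least squares, provided the instruments' associations with the exposure differ between the periods:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100_000, 10
G = rng.integers(0, 3, size=(n, k)).astype(float)  # genotypes coded 0/1/2
a1, a2 = rng.normal(size=k), rng.normal(size=k)    # period-specific SNP effects
U = rng.normal(size=n)                             # unobserved confounder
X1 = G @ a1 + U + rng.normal(size=n)               # exposure in childhood
X2 = 0.5 * X1 + G @ a2 + U + rng.normal(size=n)    # exposure in adulthood
Y = 0.4 * X1 - 0.2 * X2 + U + rng.normal(size=n)   # direct effects 0.4 and -0.2

# two-stage least squares: project both exposures onto the instruments,
# then regress the outcome on the projections jointly
X = np.column_stack([X1, X2])
Xhat = G @ np.linalg.lstsq(G, X, rcond=None)[0]
beta = np.linalg.lstsq(np.column_stack([Xhat, np.ones(n)]), Y, rcond=None)[0]
# beta[0] ~ 0.4 (childhood effect conditional on adulthood), beta[1] ~ -0.2
```

If `a2` were proportional to `a1` (instrument-exposure associations not varying between periods), the second stage would be collinear and the direct effects unidentified, which is exactly the key assumption highlighted above.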

3.

Background

Random forest (RF) is a machine-learning method that generally works well with high-dimensional problems and allows for nonlinear relationships between predictors; however, the presence of correlated predictors has been shown to impact its ability to identify strong predictors. The Random Forest-Recursive Feature Elimination algorithm (RF-RFE) mitigates this problem in smaller data sets, but this approach has not been tested in high-dimensional omics data sets.

Results

We integrated 202,919 genotypes and 153,422 methylation sites in 680 individuals, and compared the abilities of RF and RF-RFE to detect simulated causal associations, which included simulated genotype–methylation interactions, between these variables and triglyceride levels. Results show that RF was able to identify strong causal variables with a few highly correlated variables, but it did not detect other causal variables.

Conclusions

Although RF-RFE decreased the importance of correlated variables, in the presence of many correlated variables, it also decreased the importance of causal variables, making both hard to detect. These findings suggest that RF-RFE may not scale to high-dimensional data.
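The recursive feature elimination loop at the heart of RF-RFE is simple to state: score feature importance, drop the weakest feature, repeat. The sketch below is a generic version of that loop; for brevity it scores importance by absolute correlation with the outcome rather than by refitting a random forest at each step (in practice one would plug in random forest importances, e.g. from scikit-learn), so it illustrates the algorithm's structure only:

```python
import numpy as np

def rfe(X, y, n_keep, importance):
    """Recursively drop the least important remaining feature
    until only n_keep features are left."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        scores = importance(X[:, active], y)
        active.pop(int(np.argmin(scores)))
    return sorted(active)

def corr_importance(X, y):
    # stand-in for random forest importances
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 6))
y = 2 * X[:, 0] + 2 * X[:, 3] + 0.1 * rng.normal(size=2000)
kept = rfe(X, y, 2, corr_importance)   # recovers the causal columns 0 and 3
```

The importance function is recomputed on the surviving columns at every iteration, which is what lets RFE shed redundant correlated features that a single one-shot ranking would keep.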

4.
While there is recognition that more informative clinical endpoints can support better decision-making in clinical trials, it remains a common practice to categorize endpoints originally measured on a continuous scale. The primary motivation for this categorization (and most commonly dichotomization) is the simplicity of the analysis. There is, however, a long-standing argument that this simplicity can come at a high cost. Specifically, larger sample sizes are needed to achieve the same level of accuracy when using a dichotomized outcome instead of the original continuous endpoint. The degree of “loss of information” has been studied in the contexts of parallel-group designs and two-stage Phase II trials. Limited attention, however, has been given to the quantification of the associated losses in dose-ranging trials. In this work, we propose an approach to estimate these losses in Phase II dose-ranging trials that is independent of the actual dose-ranging design used and depends only on the clinical setting. The approach uses the notion of a nonparametric optimal benchmark for dose-finding trials, an evaluation tool that facilitates the assessment of a dose-finding design by providing an upper bound on its performance under a given scenario in terms of the probability of selecting the target dose. After demonstrating how the benchmark can be applied to Phase II dose-ranging trials, we use it to quantify the dichotomization losses. Using parameters from real clinical trials in various therapeutic areas, we find that the ratio of sample sizes needed to obtain the same precision using continuous and binary (dichotomized) endpoints varies between 70% and 75% under the majority of scenarios but can drop to 50% in some cases.
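The "loss of information" from dichotomization can be approximated with textbook two-arm sample-size formulas, well short of the benchmark machinery used in the paper. The sketch below (the 0.3-SD effect and median split are invented for illustration) compares the per-arm sample size needed with a continuous endpoint against a dichotomized one:

```python
from statistics import NormalDist

N = NormalDist()

def n_continuous(delta, sigma, alpha=0.05, power=0.8):
    """Per-arm n for a two-sample comparison of normal means."""
    z = N.inv_cdf(1 - alpha / 2) + N.inv_cdf(power)
    return 2 * (z * sigma / delta) ** 2

def n_dichotomized(mu0, mu1, sigma, cut, alpha=0.05, power=0.8):
    """Per-arm n after dichotomizing the same endpoint at `cut`."""
    z = N.inv_cdf(1 - alpha / 2) + N.inv_cdf(power)
    p0, p1 = N.cdf((cut - mu0) / sigma), N.cdf((cut - mu1) / sigma)
    return z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p0 - p1) ** 2

# 0.3-SD treatment effect, cut halfway between the group means
ratio = n_continuous(0.3, 1) / n_dichotomized(0, 0.3, 1, 0.15)
```

This crude approximation gives a ratio near 64% at the median split (close to the asymptotic 2/π), the same order of magnitude as the 70-75% reported above for dose-ranging designs.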

5.
With single blastocyst transfer becoming more common in ART, there is a greater demand for convenient and reliable cryostorage of surplus blastocysts. Vitrification has emerged in the last decade as a promising alternative to slow freezing. Blastocysts present a unique challenge in cryostorage owing to their size, multicellular structure and the presence of the blastocoele. Continued experience and the introduction of many technological developments have improved vitrification as a technology and the results of its application to blastocyst cryostorage. Current information concerning the safety and efficacy of blastocyst vitrification is reviewed, along with the variables that can affect the outcome of the procedure.

6.
Learning causality from data is known as the causal discovery problem, a relatively new but important field. In many applications latent variables exist, and ignoring them entirely can seriously bias the estimation results. In this paper, a method combining exploratory factor analysis and path analysis (EFA-PA) is proposed to infer causality in the presence of latent variables. Our method models latent variables together with their linear causal relationships to observed variables, which improves the accuracy of the causal model; such a model can be regarded as the simplest possible causal model for continuous data. EFA-PA is similar in spirit to structural equation modelling, but a structural equation model must be repeatedly modified during data fitting until an acceptable model is reached, whereas the model obtained by EFA-PA avoids this subjectivity and reduces estimation complexity. EFA-PA provides a basis for correctly estimating the causal relationships between observed variables in the presence of latent variables. Experiments show that the EFA-PA estimation model outperforms the other models considered, including the structural equation model.

7.
Trend tests based on cross-classified dose-response data are a central problem in medicine. Most existing test methods apply only to binary response variables, and approaches for binary response tables can suffer from the lack of a clear choice of dichotomization. For multivariate responses with ordered categories, some work has been done on simple stochastic order, likelihood ratio order and so on. However, methods of statistical inference on increasing convex order for more than two multinomial populations have not been fully developed. For testing the increasing convex order alternative, this article provides a model-free test method that can be used for two-way tables and stratified data. Two real examples illustrate how to apply the test.

8.
Adaptive diversification is driven by selection in ecologically different environments. In the absence of geographical barriers to dispersal, this adaptive divergence (AD) may be constrained by gene flow (GF). And yet the reverse may also be true, with AD constraining GF (i.e. 'ecological speciation'). Both of these causal effects have frequently been inferred from the presence of negative correlations between AD and GF in nature - yet the bi-directional causality warrants caution in such inferences. We discuss how the ability of correlative studies to infer causation might be improved through the simultaneous measurement of multiple ecological and evolutionary variables. On the one hand, inferences about the causal role of GF can be made by examining correlations between AD and the potential for dispersal. On the other hand, inferences about the causal role of AD can be made by examining correlations between GF and environmental differences. Experimental manipulations of dispersal and environmental differences are a particularly promising approach for inferring causation. At present, the best studies find strong evidence that GF constrains AD, and some studies also find the reverse. Improvements in empirical approaches promise to eventually allow general inferences about the relative strength of different causal interactions during adaptive diversification.

9.
Inverse-probability-of-treatment weighted (IPTW) estimation has been widely used to consistently estimate the causal parameters in marginal structural models, with time-dependent confounding effects adjusted for. Just like other causal inference methods, the validity of IPTW estimation typically requires the crucial condition that all variables are precisely measured. However, this condition is often violated in practice for various reasons. It has been well documented that ignoring measurement error often leads to biased inference results. In this paper, we consider the IPTW estimation of the causal parameters in marginal structural models in the presence of error-contaminated and time-dependent confounders. We explore several methods to correct for the effects of measurement error on the estimation of causal parameters. Numerical studies are reported to assess the finite sample performance of the proposed methods.
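Setting the measurement-error correction aside, the IPTW step itself can be sketched numerically. The toy example below uses a single precisely measured binary confounder rather than time-dependent ones, with invented effect sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
L = rng.integers(0, 2, n)                         # binary confounder
T = rng.random(n) < np.where(L == 1, 0.8, 0.2)    # treatment depends on L
Y = 1.0 * T + 2.0 * L + rng.normal(size=n)        # true causal effect = 1

naive = Y[T].mean() - Y[~T].mean()                # confounded contrast (~2.2 here)

# fit P(T=1 | L) by group frequencies, then weight by 1 / P(T = observed | L)
p1 = np.array([T[L == 0].mean(), T[L == 1].mean()])[L]
w = np.where(T, 1 / p1, 1 / (1 - p1))
iptw = np.average(Y[T], weights=w[T]) - np.average(Y[~T], weights=w[~T])
```

The weighted contrast recovers the true effect of 1 because the weights create a pseudo-population in which `T` is independent of `L`; if `L` were recorded with error, the fitted propensities would be wrong and the estimate biased, which is the problem the paper's corrections address.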

10.
Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method for Bayesian network analysis, designed to increase the power to detect potential causal relationships between variables (including potentially a mixture of both discrete and continuous variables). Our first improvement relates to the treatment of missing data. When there is missing data, the standard approach is to remove every individual with any missing data before performing analysis. This can be wasteful and undesirable when there are many individuals with missing data, perhaps with only one or a few variables missing. This motivates the use of imputation. We present a new imputation method that uses a version of nearest neighbour imputation, whereby missing data from one individual is replaced with data from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables to be used to select the nearest neighbour are chosen by sampling without replacement the complete data and estimating a best fit Bayesian network. We show that this approach leads to marked improvements in the recall and precision of directed edges in the final network identified, and we illustrate the approach through application to data from a recent study investigating the causal relationship between methylation and gene expression in early inflammatory arthritis patients. We also describe a second improvement in the form of a pseudo-Bayesian approach for upweighting certain network edges, which can be useful when there is prior evidence concerning their directions.
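Stripped of the Bayesian-network-guided variable selection described above, the core nearest-neighbour donor step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it measures distance on all observed columns rather than on a network-selected subset:

```python
import numpy as np

def nn_impute(X):
    """Replace each missing entry with the value from the nearest
    complete-case row, with distance measured on the columns that
    are observed for the incomplete row."""
    X = np.asarray(X, dtype=float).copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        d = np.linalg.norm(donors[:, obs] - X[i, obs], axis=1)
        X[i, ~obs] = donors[np.argmin(d), ~obs]
    return X

X = [[1.0, 2.0], [10.0, 20.0], [1.1, float("nan")]]
filled = nn_impute(X)   # row 2's donor is row 0, so the gap becomes 2.0
```

Unlike mean imputation, donor-based imputation preserves the joint distribution of the variables, which matters when the imputed data will feed a structure-learning algorithm.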

11.
Leukaemias are a heterogeneous group of tumours including acute and chronic forms. Considerable efforts have been made to identify risk factors for these diseases, but only a minority of leukaemia cases can currently be attributed to identified or hypothesized factors. This review highlights recent epidemiological literature concerning adult leukaemia, discussing in detail the hereditary, environmental and medical risks. Chromosomal syndromes and genetically based diseases carry a high risk of leukaemia, but rarely occur in the population. Environmental and occupational exposures to chemicals including pesticides have been widely studied, although the results are not consistent, with the exception of benzene. Smoking seems to be a weak causal risk factor. The risk of ionizing radiation has further been quantified in recent studies, although the effects of low doses have not yet been clarified. The results for non-ionizing radiation continue to be inconsistent, but a large effect of electromagnetic fields on the risk of leukaemia appears to be unlikely. Medically applied radio- and chemotherapy are clearly associated with subsequent leukaemia development, and there are links between leukaemia and viral infections. Future research should emphasize the shortcomings in exposure assessment that pervade many studies, and interactions between different risk factors need to be taken into consideration. Received: 25 September 1997 / Accepted in revised form: 14 October 1997

12.
Inferring the gene regulatory network (GRN) is crucial to understanding the working of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. all the relevant factors in the causal network have been observed and there are no unobserved common causes. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and therefore are impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data allowing the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviation from Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.

13.
Various assumptions have been used in the literature to identify natural direct and indirect effects in mediation analysis. These effects are of interest because they allow for effect decomposition of a total effect into a direct and indirect effect even in the presence of interactions or non-linear models. In this paper, we consider the relation and interpretation of various identification assumptions in terms of causal diagrams interpreted as a set of non-parametric structural equations. We show that for such causal diagrams, two sets of assumptions for identification that have been described in the literature are in fact equivalent in the sense that if either set of assumptions holds for all models inducing a particular causal diagram, then the other set of assumptions will also hold for all models inducing that diagram. We moreover build on prior work concerning a complete graphical identification criterion for covariate adjustment for total effects to provide a complete graphical criterion for using covariate adjustment to identify natural direct and indirect effects. Finally, we show that this criterion is equivalent to the two sets of independence assumptions used previously for mediation analysis.
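In the simplest special case covered by these identification results - a linear model with no exposure-mediator interaction and no unmeasured confounding - the natural effects reduce to familiar regression quantities, which a short simulation can verify (all coefficients below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
X = rng.normal(size=n)                      # exposure
M = 0.6 * X + rng.normal(size=n)            # mediator
Y = 0.3 * X + 0.5 * M + rng.normal(size=n)  # outcome: NDE = 0.3, NIE = 0.6*0.5

total = np.cov(X, Y)[0, 1] / np.var(X)      # total effect ~ 0.6
A = np.column_stack([X, M, np.ones(n)])
nde = np.linalg.lstsq(A, Y, rcond=None)[0][0]   # X coefficient given M ~ 0.3
nie = total - nde                               # indirect effect ~ 0.3
```

The point of the non-parametric identification assumptions discussed above is precisely that this decomposition survives interactions and non-linearities, where the simple coefficient arithmetic shown here no longer applies.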

14.
Summary. We propose robust and efficient tests and estimators for gene–environment/gene–drug interactions in family-based association studies in which haplotypes, dichotomous/quantitative phenotypes, and complex exposure/treatment variables are analyzed. Using causal inference methodology, we show that the tests and estimators are robust against unmeasured confounding due to population admixture and stratification, provided that Mendel's law of segregation holds and that the considered exposure/treatment variable is not affected by the candidate gene under study. We illustrate the practical relevance of our approach by an application to a chronic obstructive pulmonary disease study. The data analysis suggests a gene–environment interaction between a single nucleotide polymorphism in the Serpine2 gene and smoking status/pack-years of smoking. Simulation studies show that the proposed methodology is sufficiently powered for realistic sample sizes and that it provides valid tests and effect size estimators in the presence of admixture and stratification.

15.
Thanks to their different senses, human observers acquire multiple kinds of information from their environment. Complex cross-modal interactions occur during this perceptual process. This article proposes a framework to analyze and model these interactions through a rigorous and systematic data-driven process. This requires considering the general relationships between the physical events or factors involved in the process, not only in quantitative terms, but also in terms of the influence of one factor on another. We use tools from information theory and probabilistic reasoning to derive relationships between the random variables of interest, where the central notion is that of conditional independence. Using mutual information analysis to guide the model elicitation process, a probabilistic causal model encoded as a Bayesian network is obtained. We exemplify the method by using data collected in an audio-visual localization task for human subjects, and we show that it yields a well-motivated model with good predictive ability. The model elicitation process offers new prospects for the investigation of the cognitive mechanisms of multisensory perception.
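A plug-in estimate of mutual information for discrete variables - the kind of quantity used to guide the model elicitation above - can be sketched in a few lines (illustrative only; the paper's analysis pipeline is not reproduced here):

```python
import math
from collections import Counter

def mutual_info(x, y):
    """Plug-in estimate of I(X; Y) in bits for paired discrete samples."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

mi_dep = mutual_info([0, 1, 0, 1], [0, 1, 0, 1])   # fully dependent: 1 bit
mi_ind = mutual_info([0, 0, 1, 1], [0, 1, 0, 1])   # independent: 0 bits
```

Near-zero mutual information between two variables (or conditional on a third) is the empirical signal of the conditional independencies that determine the structure of the resulting Bayesian network.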

16.
In Italy, in the eastern area of the Campania region, the illegal dumping and burning of waste have been documented, which could potentially affect the local population's health. In particular, toxic waste exposure has been suggested to associate with increased cancer development/mortality in these areas, although a causal link has not yet been established. In this pilot study, we evaluated blood levels of toxic heavy metals and persistent organic pollutants (POPs) in 95 patients with different cancer types residing in this area and in 27 healthy individuals. While we did not find any significant correlation between the blood levels of POPs and the provenance of the patients, we did observe high blood concentrations of heavy metals in some municipalities, including Giugliano, where many illegal waste disposal sites have previously been documented. Our results showed that patients with different cancer types from Giugliano had higher blood levels of heavy metals than healthy controls. Despite the obvious limitations of this exploratory study, our preliminary observations encourage further research assessing the possible association between exposure to hazardous waste, increased blood metals, and increased risk of cancer.

17.
Multiple imputation (MI) has emerged in the last two decades as a frequently used approach in dealing with incomplete data. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages that are capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing-data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple-imputation-based estimates for various population quantities under different rounding rules from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.

18.
Wang Y, Mogg R, Lunceford J. Biometrics 2012;68(2):617-627
Biomarkers play an increasing role in the clinical development of new therapeutics. Earlier clinical decisions facilitated by biomarkers can lead to reduced costs and duration of drug development. Associations between biomarkers and clinical endpoints are often viewed as initial evidence supporting the intended purpose. As a result, even though it is widely understood that correlation is not proof of a causal relationship, correlation continues to be used as a metric for biomarker qualification in practice. In this article, we introduce a causal correlation framework where two different types of correlations are defined at the individual level. We show that the correlation estimate is a composite of different components, and needs to be interpreted with caution when used for biomarker qualification to avoid misleading conclusions. Otherwise, a significant correlation can be concluded even in the absence of a true underlying association. We also show how the causal quantities of interest are testable in a crossover design and provide discussion on the challenges that exist in a parallel group setting.

19.
Summary. We examine situations where interest lies in the conditional association between outcome and exposure variables, given potential confounding variables. Concern arises that some potential confounders may not be measured accurately, whereas others may not be measured at all. Some form of sensitivity analysis might be employed, to assess how this limitation in available data impacts inference. A Bayesian approach to sensitivity analysis is straightforward in concept: a prior distribution is formed to encapsulate plausible relationships between unobserved and observed variables, and posterior inference about the conditional exposure–disease relationship then follows. In practice, though, it can be challenging to form such a prior distribution in both a realistic and simple manner. Moreover, it can be difficult to develop an attendant Markov chain Monte Carlo (MCMC) algorithm that will work effectively on a posterior distribution arising from a highly nonidentified model. In this article, a simple prior distribution for acknowledging both poorly measured and unmeasured confounding variables is developed. It requires that only a small number of hyperparameters be set by the user. Moreover, a particular computational approach for posterior inference is developed, because application of MCMC in a standard manner is seen to be ineffective in this problem.

20.
Recently, instrumental variables methods have been used to address non-compliance in randomized experiments. Complicating such analyses is often the presence of missing data. The standard model for missing data, missing at random (MAR), has some unattractive features in this context. In this paper we compare MAR-based estimates of the complier average causal effect (CACE) with an estimator based on an alternative, nonignorable model for the missing data process, developed by Frangakis and Rubin (1999, Biometrika, 86, 365-379). We also introduce a new missing data model that, like the Frangakis-Rubin model, is specially suited for models with instrumental variables, but makes different substantive assumptions. We analyze these issues in the context of a randomized trial of breast self-examination (BSE). In the study two methods of teaching BSE, consisting of either mailed information about BSE (the standard treatment) or the attendance of a course involving theoretical and practical sessions (the new treatment), were compared with the aim of assessing whether teaching programs could increase BSE practice and improve examination skills. The study was affected by the two sources of bias mentioned above: only 55% of women assigned to receive the new treatment complied with their assignment and 35% of the women did not respond to the post-test questionnaire. Comparing the causal estimand of the new treatment using the MAR, Frangakis-Rubin, and our new approach, the results suggest that for these data the MAR assumption appears least plausible, and that the new model appears most plausible among the three choices.
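Setting the missing-data complication aside, the CACE under one-sided noncompliance is the classic instrumental variables (Wald) ratio of the intention-to-treat effect to the compliance rate. A complete-data sketch (invented effect sizes; the 55% compliance rate is chosen to echo the trial above):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
Z = rng.integers(0, 2, n)                 # randomized assignment
U = rng.normal(size=n)                    # unobserved confounder of compliance
complier = U > -0.125                     # ~55% compliers, correlated with U
D = (Z == 1) & complier                   # treatment received (one-sided)
Y = 0.5 * D + U + rng.normal(size=n)      # true CACE = 0.5

per_protocol = Y[D].mean() - Y[~D].mean()            # biased: compliers differ
itt = Y[Z == 1].mean() - Y[Z == 0].mean()            # diluted by noncompliance
cace = itt / (D[Z == 1].mean() - D[Z == 0].mean())   # IV (Wald) estimate ~ 0.5
```

The as-treated contrast overstates the effect because compliers have systematically higher `U`, while the IV ratio recovers the complier effect; the paper's contribution is how to preserve this property when 35% of outcomes are also missing.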

