Similar Documents
20 similar documents retrieved.
1.
Realistic power calculations for large cohort studies and nested case control studies are essential for successfully answering important and complex research questions in epidemiology and clinical medicine. For this, we provide a methodical framework for general realistic power calculations via simulations that we put into practice by means of an R‐based template. We consider staggered recruitment and individual hazard rates, competing risks, interaction effects, and the misclassification of covariates. The study cohort is assembled with respect to given age‐, gender‐, and community distributions. Nested case‐control analyses with a varying number of controls enable comparisons of power with a full cohort analysis. Time‐to‐event generation under competing risks, including delayed study‐entry times, is realized on the basis of a six‐state Markov model. Incidence rates, prevalence of risk factors and prefixed hazard ratios allow for the assignment of age‐dependent transition rates given in the form of Cox models. These provide the basis for a central simulation‐algorithm, which is used for the generation of sample paths of the underlying time‐inhomogeneous Markov processes. With the inclusion of frailty terms into the Cox models the Markov property is specifically biased. An “individual Markov process given frailty” creates some unobserved heterogeneity between individuals. Different left‐truncation‐ and right‐censoring patterns call for the use of Cox models for data analysis. p‐values are recorded over repeated simulation runs to allow for the desired power calculations. For illustration, we consider scenarios with a “testing” character as well as realistic scenarios. This enables the validation of a correct implementation of theoretical concepts and concrete sample size recommendations against an actual epidemiological background, here given with possible substudy designs within the German National Cohort.  相似文献   
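As a rough illustration of the simulation-based power idea described above (far simpler than the paper's six-state Markov setup, and written in Python rather than the paper's R template), the sketch below simulates a single binary risk factor with exponential event times and administrative censoring, fits a Cox model in each replicate, and reports the rejection rate. It assumes the lifelines package is available; all function and parameter names (simulate_power, hr_true, base_rate, ...) are illustrative.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def simulate_power(n=2000, prevalence=0.2, hr_true=1.5,
                   base_rate=0.01, follow_up=10.0,
                   n_sim=200, alpha=0.05, seed=1):
    """Empirical power of a Cox model to detect hr_true for a binary exposure."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.binomial(1, prevalence, size=n)
        # exponential event times with an exposure-specific hazard
        rate = base_rate * hr_true ** x
        t_event = rng.exponential(1.0 / rate)
        time = np.minimum(t_event, follow_up)            # administrative censoring
        event = (t_event <= follow_up).astype(int)
        df = pd.DataFrame({"time": time, "event": event, "x": x})
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        if cph.summary.loc["x", "p"] < alpha:
            rejections += 1
    return rejections / n_sim

print(simulate_power())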

2.
Outcome misclassification occurs frequently in binary-outcome studies and can result in biased estimation of quantities such as the incidence, prevalence, cause-specific hazards, cumulative incidence functions, and so forth. A number of remedies have been proposed to address the potential misclassification of the outcomes in such data. The majority of these remedies lie in the estimation of misclassification probabilities, which are in turn used to adjust analyses for outcome misclassification. A number of authors advocate using a gold-standard procedure on a sample internal to the study to learn about the extent of the misclassification. With this type of internal validation, the problem of quantifying the misclassification also becomes a missing data problem as, by design, the true outcomes are only ascertained on a subset of the entire study sample. Although the process of estimating misclassification probabilities appears simple conceptually, the estimation methods proposed so far have several methodological and practical shortcomings. Most methods rely on the missing outcome data being missing completely at random (MCAR), a rather stringent assumption which is unlikely to hold in practice. Some of the existing methods also tend to be computationally intensive. To address these issues, we propose a computationally efficient, easy-to-implement pseudo-likelihood estimator of the misclassification probabilities under a missing at random (MAR) assumption, in studies with an available internal validation sample. We present the estimator through the lens of studies with competing-risks outcomes, though the estimator extends beyond this setting. We describe the consistency and asymptotic distributional properties of the resulting estimator, and derive a closed-form estimator of its variance. The finite-sample performance of this estimator is evaluated via simulations. Using data from a real-world study with competing-risks outcomes, we illustrate how the proposed method can be used to estimate misclassification probabilities. We also show how the estimated misclassification probabilities can be used in an external study to adjust for possible misclassification bias when modeling cumulative incidence functions.
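To make the internal-validation idea concrete, here is a minimal sketch of a much cruder alternative to the authors' pseudo-likelihood: an inverse-probability-weighted moment estimator of the misclassification probabilities under MAR, where the probability of being validated is modeled from fully observed quantities. All names (misclassification_probs, y_obs, validated, ...) are illustrative, and the weighting model is an assumption of this sketch, not the paper's method.

import numpy as np
from sklearn.linear_model import LogisticRegression

def misclassification_probs(y_obs, y_true, validated, covariates):
    """IPW estimates of P(observed = j | true = k) with an internal validation sample.

    y_obs      : observed (possibly misclassified) outcome, 0/1, all subjects
    y_true     : true outcome, only meaningful where validated == 1
    validated  : 1 if the subject is in the validation subsample, else 0
    covariates : array (n, p) of fully observed variables driving validation (MAR)
    """
    y_obs = np.asarray(y_obs)
    y_true = np.asarray(y_true, dtype=float)
    validated = np.asarray(validated)
    # model the probability of being validated from fully observed data
    design = np.column_stack([covariates, y_obs])
    pi = LogisticRegression().fit(design, validated).predict_proba(design)[:, 1]
    w = validated / np.clip(pi, 1e-6, None)          # inverse-probability weights
    probs = {}
    for k in (0, 1):
        in_k = (y_true == k) & (validated == 1)
        denom = np.sum(w[in_k])
        for j in (0, 1):
            probs[(j, k)] = np.sum(w[in_k & (y_obs == j)]) / denom
    return probs   # probs[(j, k)] estimates P(Y_obs = j | Y_true = k)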

3.
Conventional methods for sample size calculation for population-based longitudinal studies tend to overestimate the statistical power by overlooking important determinants of the required sample size, such as the measurement errors and unmeasured etiological determinants, etc. In contrast, a simulation-based sample size calculation, if designed properly, allows these determinants to be taken into account and offers flexibility in accommodating complex study design features. The Canadian Longitudinal Study on Aging (CLSA) is a Canada-wide, 20-year follow-up study of 30,000 people between the ages of 45 and 85 years, with in-depth information collected every 3 years. A simulation study, based on an illness-death model, was conducted to: (1) investigate the statistical power profile of the CLSA to detect the effect of environmental and genetic risk factors, and their interaction on age-related chronic diseases; and (2) explore the design alternatives and implementation strategies for increasing the statistical power of population-based longitudinal studies in general. The results showed that the statistical power to identify the effect of environmental and genetic risk exposures, and their interaction on a disease was boosted when: (1) the prevalence of the risk exposures increased; (2) the disease of interest is relatively common in the population; and (3) risk exposures were measured accurately. In addition, the frequency of data collection every three years in the CLSA led to a slightly lower statistical power compared to the design assuming that participants underwent health monitoring continuously. The CLSA had sufficient power to detect a small (1<hazard ratio (HR)≤1.5) or moderate effect (1.5< HR≤2.0) of the environmental risk exposure, as long as the risk exposure and the disease of interest were not rare. It had enough power to detect a moderate or large (2.0<HR≤3.0) effect of the genetic risk exposure when the prevalence of the risk exposure was not very low (≥0.1) and the disease of interest was not rare (such as diabetes and dementia). The CLSA had enough power to detect a large effect of the gene-environment interaction only when both risk exposures had relatively high prevalence (0.2) and the disease of interest was very common (such as diabetes). The minimum detectable hazard ratios (MDHR) of the CLSA for the environmental and genetic risk exposures obtained from this simulation study were larger than those calculated according to the conventional sample size calculation method. For example, the MDHR for the environmental risk exposure was 1.15 according to the conventional method if the prevalence of the risk exposure was 0.1 and the disease of interest was dementia. In contrast, the MDHR was 1.61 if the same exposure was measured every 3 years with a misclassification rate of 0.1 according to this simulation study. With a given sample size, higher statistical power could be achieved by increasing the measuring frequency in participants with high risk of declining health status or changing risk exposures, and by increasing measurement accuracy of diseases and risk exposures. A properly designed simulation-based sample size calculation is superior to conventional methods when rigorous sample size calculation is necessary.  相似文献   

4.
In epidemiologic studies, measurement error in the exposure variable can have a detrimental effect on the power of hypothesis tests for detecting the impact of exposure on the development of a disease. To adjust for misclassification in hypothesis testing involving a misclassified binary exposure variable, we consider a retrospective case–control scenario under the assumption of nondifferential misclassification. We develop a test under a Bayesian approach, based on a posterior distribution generated by an MCMC algorithm with a normal prior under realistic assumptions. We compare this test with an equivalent likelihood ratio test developed under the frequentist approach, using various simulated settings, both in the presence and in the absence of validation data. In our simulations, we considered varying degrees of sensitivity and specificity, sample sizes, exposure prevalence, and proportions of validated and unvalidated data. In these scenarios, our simulation study shows that the adjusted model (with validation data) is always better than the unadjusted model (without validation data). However, we show that an exception is possible in a fixed-budget scenario where collection of the validation data comes at a much higher cost. We also show that the Bayesian and frequentist hypothesis testing procedures reach the same conclusions for the scenarios under consideration; the Bayesian approach is, however, computationally more stable in rare-exposure contexts. A real case–control study is used to illustrate the application of both hypothesis testing procedures.
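A toy version of the Bayesian idea, assuming the sensitivity and specificity of the exposure measurement are known and nondifferential, flat priors on the true exposure prevalences, and no validation data: a short Metropolis sampler for the posterior of the log odds ratio. This is only a sketch of the general approach, not the authors' MCMC procedure or priors, and the counts in the example call are made up.

import numpy as np
from scipy.stats import binom

def posterior_log_or(a_obs, n_case, b_obs, n_ctrl, se=0.85, sp=0.95,
                     n_iter=20000, step=0.05, seed=2):
    """Metropolis sampler for the log OR given misclassified exposure counts.

    a_obs, b_obs : apparent exposed counts among n_case cases / n_ctrl controls
    se, sp       : assumed (known) sensitivity and specificity, nondifferential
    """
    rng = np.random.default_rng(seed)
    def log_post(p):                       # p = true exposure prevalence in (cases, controls)
        if np.any(p <= 0) or np.any(p >= 1):
            return -np.inf
        q = se * p + (1 - sp) * (1 - p)    # apparent exposure prevalences
        return (binom.logpmf(a_obs, n_case, q[0]) +
                binom.logpmf(b_obs, n_ctrl, q[1]))     # flat priors on (0, 1)
    p = np.array([a_obs / n_case, b_obs / n_ctrl])
    lp = log_post(p)
    draws = []
    for _ in range(n_iter):
        prop = p + rng.normal(0, step, size=2)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            p, lp = prop, lp_prop
        draws.append(np.log(p[0] / (1 - p[0])) - np.log(p[1] / (1 - p[1])))
    return np.array(draws[n_iter // 2:])   # discard burn-in

samples = posterior_log_or(a_obs=120, n_case=500, b_obs=80, n_ctrl=500)
print(np.mean(samples > 0))                # posterior probability that OR > 1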

5.
A recent study examining the relationship between distance to nearby power lines and childhood cancer risk re‐opened the debate about which exposure metrics are appropriate for power frequency magnetic field investigations. Using data from two large population‐based UK and German studies we demonstrate that distance to power lines is a comparatively poor predictor of measured residential magnetic fields. Even at proximities of 50 m or less, the positive predictive value of having a household measurement over 0.2 µT was only 19.4%. Clearly using distance from power lines, without taking account of other variables such as load, results in a poor proxy of residential magnetic field exposure. We conclude that such high levels of exposure misclassification render the findings from studies that rely on distance alone uninterpretable. Bioelectromagnetics 30:183–188, 2009. © 2008 Wiley‐Liss, Inc.  相似文献   

6.
Phenotypic misclassification (between cases) has been shown to reduce the power to detect association in genetic studies. However, it is conceivable that complex traits are heterogeneous with respect to individual genetic susceptibility and disease pathophysiology, and that the effect of heterogeneity has a larger magnitude than the effect of phenotyping errors. Although an intuitively clear concept, the effect of heterogeneity on genetic studies of common diseases has received little attention. Here we investigate the impact of phenotypic and genetic heterogeneity on the statistical power of genome wide association studies (GWAS). We first performed a study of simulated genotypic and phenotypic data. Next, we analyzed the Wellcome Trust Case-Control Consortium (WTCCC) data for diabetes mellitus (DM) type 1 (T1D) and type 2 (T2D), using varying proportions of each type of diabetes in order to examine the impact of heterogeneity on the strength and statistical significance of association previously found in the WTCCC data. In both simulated and real data, heterogeneity (presence of “non-cases”) reduced the statistical power to detect genetic association and greatly decreased the estimates of risk attributed to genetic variation. This finding was also supported by the analysis of loci validated in subsequent large-scale meta-analyses. For example, heterogeneity of 50% increases the required sample size by approximately three times. These results suggest that accurate phenotype delineation may be more important for detecting true genetic associations than increase in sample size.  相似文献   
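A back-of-the-envelope calculation (not taken from the paper) shows the mechanism: mixing "non-cases" into the case group shrinks the observable allele-frequency difference, and the required sample size for a two-proportion test grows roughly with the inverse square of that difference. The allele frequencies and thresholds below are illustrative.

import numpy as np
from scipy.stats import norm

def n_per_group(p1, p0, alpha=5e-8, power=0.8):
    """Sample size per group for a two-proportion z-test (GWAS-style alpha)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 * (p1 * (1 - p1) + p0 * (1 - p0)) / (p1 - p0) ** 2

p_control, p_case = 0.30, 0.36           # risk-allele frequencies
for h in (0.0, 0.25, 0.5):               # fraction of mislabelled "non-cases" among cases
    p_case_obs = (1 - h) * p_case + h * p_control
    print(h, int(np.ceil(n_per_group(p_case_obs, p_control))))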

7.
Prepregnancy BMI is a widely used marker of maternal nutritional status that relies on maternal self‐report of prepregnancy weight and height. Pregravid BMI has been associated with adverse health outcomes for the mother and infant, but the impact of BMI misclassification on measures of effect has not been quantified. The authors applied published probabilistic bias analysis methods to quantify the impact of exposure misclassification bias on well‐established associations between self‐reported prepregnancy BMI category and five pregnancy outcomes (small for gestational age (SGA) and large for gestational age (LGA) birth, spontaneous preterm birth (sPTB), gestational diabetes mellitus (GDM), and preeclampsia) derived from a hospital‐based delivery database in Pittsburgh, PA (2003–2005; n = 18,362). The bias analysis method recreates the data that would have been observed had BMI been correctly classified, assuming given classification parameters. The point estimates derived from the bias analysis account for random error as well as systematic error caused by exposure misclassification bias and additional uncertainty contributed by classification errors. In conventional multivariable logistic regression models, underweight women were at increased risk of SGA and sPTB, and reduced risk of LGA, whereas overweight, obese, and severely obese women had elevated risks of LGA, GDM, and preeclampsia compared with normal‐weight women. After applying the probabilistic bias analysis method, adjusted point estimates were attenuated, indicating the conventional estimates were biased away from the null. However, the majority of relations remained readily apparent. This analysis suggests that in this population, associations between self‐reported prepregnancy BMI and pregnancy outcomes are slightly overestimated.  相似文献   
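For readers unfamiliar with probabilistic bias analysis, the sketch below follows the generic recipe (sample sensitivity and specificity from prior distributions, back-correct the 2×2 cell counts, recompute the odds ratio, and add conventional random error) rather than the authors' record-level implementation; the cell counts and prior ranges are invented for illustration.

import numpy as np

def probabilistic_bias_analysis(a, b, c, d, n_draws=50000, seed=3):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls (observed)."""
    rng = np.random.default_rng(seed)
    corrected = []
    for _ in range(n_draws):
        se = rng.uniform(0.80, 0.95)           # prior for sensitivity (nondifferential)
        sp = rng.uniform(0.90, 0.99)           # prior for specificity
        # back-correct expected true counts: observed exposed = se*A + (1-sp)*B
        A = (a - (1 - sp) * (a + b)) / (se + sp - 1)   # true exposed cases
        C = (c - (1 - sp) * (c + d)) / (se + sp - 1)   # true exposed controls
        B, D = (a + b) - A, (c + d) - C
        if min(A, B, C, D) <= 0:
            continue                            # discard impossible corrections
        log_or = np.log(A * D / (B * C))
        se_log = np.sqrt(1 / A + 1 / B + 1 / C + 1 / D)
        corrected.append(log_or + rng.normal(0, se_log))   # add random error
    return np.exp(np.percentile(corrected, [2.5, 50, 97.5]))

print(probabilistic_bias_analysis(a=450, b=850, c=300, d=1000))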

8.
Wang MC, Chen YQ. Biometrics 2000, 56(3): 789-794
Recurrent event data are frequently encountered in longitudinal follow-up studies when the occurrences of multiple events are considered as the major outcomes. Suppose that the recurrent events are of the same type and the variable of interest is the recurrence time between successive events. In many applications, the distributional pattern of recurrence times can be used as an index for the progression of a disease. Such a distributional pattern is important for understanding the natural history of a disease or for confirming long-term treatment effect. In this article, we discuss and define the comparability of recurrence times. Nonparametric and semiparametric methods are developed for testing trend of recurrence time distributions and estimating trend parameters in regression models. The construction of the methods is based on comparable recurrence times from stratified data. A real data example is presented to illustrate the use of methodology.  相似文献   

9.

Background

Misclassification has been shown to have a high prevalence in binary responses in both livestock and human populations. Leaving these errors uncorrected before analyses will have a negative impact on the overall goal of genome-wide association studies (GWAS) including reducing predictive power. A liability threshold model that contemplates misclassification was developed to assess the effects of mis-diagnostic errors on GWAS. Four simulated scenarios of case–control datasets were generated. Each dataset consisted of 2000 individuals and was analyzed with varying odds ratios of the influential SNPs and misclassification rates of 5% and 10%.

Results

Analyses of binary responses subject to misclassification resulted in underestimation of influential SNPs and failed to estimate the true magnitude and direction of the effects. Once the misclassification algorithm was applied there was a 12% to 29% increase in accuracy, and a substantial reduction in bias. The proposed method was able to capture the majority of the most significant SNPs that were not identified in the analysis of the misclassified data. In fact, in one of the simulation scenarios, 33% of the influential SNPs were not identified using the misclassified data, compared with the analysis using the data without misclassification. However, using the proposed method, only 13% were not identified. Furthermore, the proposed method was able to identify with high probability a large portion of the truly misclassified observations.

Conclusions

The proposed model provides a statistical tool to correct, or at least attenuate, the negative effects of misclassified binary responses in GWAS. Across different levels of misclassification probability as well as odds ratios of significant SNPs, the model proved to be robust. In fact, SNP effects and the misclassification probability were accurately estimated, and the truly misclassified observations were identified with high probability compared with non-misclassified responses. This study was limited to situations where the misclassification probability was assumed to be the same in cases and controls, which is not always true for real human disease data. Thus, it is of interest to evaluate the performance of the proposed model in that situation, which is the current focus of our research.

10.
Na Cai, Wenbin Lu, Hao Helen Zhang. Biometrics 2012, 68(4): 1093-1102
Summary: In the analysis of longitudinal data, it is not uncommon that observation times of repeated measurements are subject-specific and correlated with underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical in these situations to ensure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers a shared random-effect model and assumes a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting for their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance–covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. An additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to bladder cancer data is also given to illustrate the methodology.

11.
Misclassification in binary outcomes can severely bias effect estimates of regression models when the models are naively applied to error-prone data. Here, we discuss response misclassification in studies on the special class of bilateral diseases. Such diseases can affect neither, one, or both entities of a paired organ, for example, the eyes or ears. If measurements are available on both organ entities, disease occurrence in a person is often defined as disease occurrence in at least one entity. In this setting, there are two reasons for response misclassification: (a) ignorance of missing disease assessment in one of the two entities and (b) error-prone disease assessment in the single entities. We investigate the consequences of ignoring both types of response misclassification and present an approach to adjust for the bias from misclassification by optimizing an adequate likelihood function. The inherent modelling assumptions and problems in the case of entity-specific misclassification are discussed. This work was motivated by studies on age-related macular degeneration (AMD), a disease that can occur separately in each eye of a person. We illustrate and discuss the proposed analysis approach based on real-world data from a study on AMD and on simulated data.
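To make the two error sources concrete, the small calculation below derives person-level sensitivity and specificity for the "affected in at least one eye" definition from eye-level values, assuming, purely for illustration, that the two eyes are assessed independently and that each eye is left unassessed with the same probability. This is not the likelihood-based adjustment proposed in the paper; names and numbers are illustrative.

def person_level_error(se, sp, p_miss=0.0):
    """Person-level Se/Sp for 'disease in at least one eye', given eye-level se/sp.

    p_miss: probability that an eye is not assessed at all
            (an unassessed eye contributes no positive finding, mimicking error source (a)).
    """
    # effective probability that a truly affected eye is scored positive
    se_eff = (1 - p_miss) * se
    # effective probability that a truly unaffected eye is scored negative (or missing)
    sp_eff = (1 - p_miss) * sp + p_miss
    person_sp = sp_eff ** 2                          # both unaffected eyes scored negative
    se_one = 1 - (1 - se_eff) * sp_eff               # exactly one eye truly affected
    se_both = 1 - (1 - se_eff) ** 2                  # both eyes truly affected
    return {"person specificity": person_sp,
            "person sensitivity (one eye affected)": se_one,
            "person sensitivity (both eyes affected)": se_both}

print(person_level_error(se=0.9, sp=0.95, p_miss=0.1))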

12.
In many clinical studies that involve follow-up, it is common to observe one or more sequences of longitudinal measurements, as well as one or more time-to-event outcomes. A competing risks situation arises when the probability of occurrence of one event is altered or hindered by another time-to-event outcome. Recently, much attention has been paid to the joint analysis of a single longitudinal response and a single time-to-event outcome when the missing data mechanism in the longitudinal process is non-ignorable. In this paper, we propose an extension where multiple longitudinal responses are jointly modeled with competing risks (multiple times to event). Our shared parameter joint model consists of a system of multiphase non-linear mixed effects sub-models for the multiple longitudinal responses, and a system of cause-specific non-proportional hazards frailty sub-models for the competing risks, with associations among the multiple longitudinal responses and competing risks modeled using latent parameters. The joint model is applied to a data set of patients who are on mechanical circulatory support and awaiting heart transplant, using readily available software. While on mechanical circulatory support, patient liver and renal functions may worsen, and these in turn may influence one of the two possible competing outcomes: (i) death before transplant; (ii) transplant. In one application, we propose a system of multiphase cause-specific non-proportional hazards sub-models in which the frailty can be time-varying. Performance under different scenarios was assessed using simulation studies. By using the proposed joint modeling of the multiphase sub-models, one can: (i) identify non-linear trends in multiple longitudinal outcomes; (ii) estimate time-varying hazards and cumulative incidence functions of the competing risks; (iii) identify risk factors for both types of outcomes, where the effect may or may not change with time; and (iv) assess the association between multiple longitudinal and competing risks outcomes, where the association may or may not change with time.

13.
In epidemiologic studies, subjects are often misclassified as to their level of exposure. Ignoring this misclassification error in the analysis introduces bias in the estimates of certain parameters and invalidates many hypothesis tests. For situations in which there is misclassification of exposure in a follow-up study with categorical data, we have developed a model that permits consideration of any number of exposure categories and any number of multiple-category covariates. When used with logistic and Poisson regression procedures, this model helps assess the potential for bias when misclassification is ignored. When reliable ancillary information is available, the model can be used to correct for misclassification bias in the estimates produced by these regression procedures.  相似文献   

14.
In a randomized two-group parallel trial the mean causal effect is typically estimated as the difference in means or proportions for patients receiving, say, either treatment (T) or control (C). Treatment effect heterogeneity (TEH), or unit-treatment interaction, the variability of the causal effect (defined in terms of potential outcomes) across individuals, is often ignored. Since only one of the outcomes, either Y(T) or Y(C), is observed for each unit in such studies, the TEH is not directly estimable. For convenience, it is often assumed to be minimal or zero. We are particularly interested in the 'treatment risk' for binary outcomes, that is, the proportion of individuals who would succeed on C but fail on T. Previous work has shown that the treatment risk can be bounded (Albert, Gadbury and Mascha, 2005), and that the confidence interval width around it can be narrowed using clustered or correlated data (Mascha and Albert, 2006). Without further parameter constraints, treatment risk is unidentifiable. We show, however, that the treatment risk can be directly estimated when the four underlying population counts comprising the joint distribution of the potential outcomes, Y(T) and Y(C), follow constraints consistent with the Dirichlet multinomial. We propose a test of zero treatment risk and show it to have good size and power. Methods are applied to both a randomized as well as a non-randomized study. Implications for medical decision-making at the policy and individual levels are discussed.  相似文献   
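The bounding argument referred to above is easy to state: in a parallel trial only the two marginal success proportions are identified, so the treatment risk (success on C but failure on T) is constrained by Fréchet-type inequalities. Below is a minimal sketch with illustrative proportions; the paper's Dirichlet-multinomial constraint goes further and yields direct estimation rather than bounds.

def treatment_risk_bounds(p_success_T, p_success_C):
    """Bounds on P(success on C and failure on T) from the observed marginals.

    Only the marginals P(Y(T)=1) and P(Y(C)=1) are identified in a parallel trial;
    the joint cell is bounded by the Frechet inequalities.
    """
    lower = max(0.0, p_success_C - p_success_T)
    upper = min(p_success_C, 1.0 - p_success_T)
    return lower, upper

# e.g. 70% succeed on treatment, 60% on control:
print(treatment_risk_bounds(p_success_T=0.70, p_success_C=0.60))   # (0.0, 0.3)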

15.
In the context of analyzing multiple functional limitation responses collected longitudinally from the Longitudinal Study of Aging (LSOA), we investigate the heterogeneity of these outcomes with respect to their associations with previous functional status and other risk factors in the presence of informative drop-out and confounding by baseline outcomes. We accommodate the longitudinal nature of the multiple outcomes with a unique extension of the nested random effects logistic model with an autoregressive structure, to include drop-out and baseline outcome components with shared random effects. Estimation of fixed effects and variance components is by maximum likelihood with numerical integration. This shared parameter selection model assumes that drop-out is conditionally independent of the multiple functional limitation outcomes given the underlying random effect representing an individual's trajectory of functional status across time. Although it is not possible to fully assess the adequacy of this assumption, we assess the robustness of this approach by varying the assumptions underlying the proposed model, such as the random effects structure, the drop-out component, and omission of the baseline functional outcomes as dependent variables in the model. Heterogeneity among the associations between each functional limitation outcome and a set of risk factors for functional limitation, such as previous functional limitation and physical activity, exists for the LSOA data of interest. Less heterogeneity is observed among the estimates of time-level random effects variance components that are allowed to vary across functional outcomes and time. We also note that, under an autoregressive structure, bias results from omitting the baseline outcome component linked to the follow-up outcome component by subject-level random effects.

16.
We study the effect of misclassification of a binary covariate on the parameters of a logistic regression model. In particular we consider 2 × 2 × 2 tables. We assume that a binary covariate is subject to misclassification that may depend on the observed outcome. This type of misclassification is known as (outcome dependent) differential misclassification. We examine the resulting asymptotic bias on the parameters of the model and derive formulas for the biases and their approximations as a function of the odds and misclassification probabilities. Conditions for unbiased estimation are also discussed. The implications are illustrated numerically using a case control study. For completeness we briefly examine the effect of covariate dependent misclassification of exposures and of outcomes.  相似文献   
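The direction and size of such bias are easy to explore numerically. The sketch below takes true joint cell probabilities for an outcome-by-covariate table, applies outcome-dependent (differential) misclassification to the covariate, and compares the resulting observed odds ratio with the true one; the probabilities and error rates are illustrative, not the paper's, and the calculation ignores the third stratification variable of the 2×2×2 setting.

import numpy as np

def observed_or(p_true, se, sp):
    """Observed OR after differential misclassification of a binary covariate.

    p_true : 2x2 array of true joint probabilities, rows = outcome (0/1),
             columns = true covariate (0/1)
    se, sp : length-2 sequences; sensitivity/specificity of the covariate
             measurement within outcome groups 0 and 1 (differential)
    """
    p_obs = np.zeros((2, 2))
    for y in (0, 1):
        exposed, unexposed = p_true[y, 1], p_true[y, 0]
        p_obs[y, 1] = se[y] * exposed + (1 - sp[y]) * unexposed
        p_obs[y, 0] = (1 - se[y]) * exposed + sp[y] * unexposed
    return (p_obs[1, 1] * p_obs[0, 0]) / (p_obs[1, 0] * p_obs[0, 1])

# true odds ratio = (0.08 * 0.60) / (0.12 * 0.20) = 2.0;
# the covariate is measured more accurately in cases than in controls
p_true = np.array([[0.60, 0.20],     # outcome 0: P(x=0), P(x=1)
                   [0.12, 0.08]])    # outcome 1: P(x=0), P(x=1)
print(observed_or(p_true, se=[0.7, 0.95], sp=[0.9, 0.98]))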

17.
Cohort studies and clinical trials may involve multiple events. When occurrence of one of these events prevents the observance of another, the situation is called “competing risks”. A useful measure in such studies is the cumulative incidence of an event, which is useful in evaluating interventions or assessing disease prognosis. When outcomes in such studies are subject to misclassification, the resulting cumulative incidence estimates may be biased. In this work, we study the mechanism of bias in cumulative incidence estimation due to outcome misclassification. We show that even moderate levels of misclassification can lead to seriously biased estimates in a frequently unpredictable manner. We propose an easy to use estimator for correcting this bias that is uniformly consistent. Extensive simulations suggest that this method leads to unbiased estimates in practical settings. The proposed method is useful, both in settings where misclassification probabilities are known by historical data or can be estimated by other means, and for performing sensitivity analyses when the misclassification probabilities are not precisely known.  相似文献   
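As a simplified illustration of the kind of correction involved (ignoring censoring, which the paper's estimator handles), note that with complete follow-up the observed cause-specific cumulative incidence functions are a known linear mixture of the true ones, so they can be corrected by inverting the misclassification matrix at each time point. A minimal sketch under that no-censoring assumption:

import numpy as np

def corrected_cif(times, obs_cause, grid, M):
    """Correct cumulative incidence functions for cause misclassification (no censoring).

    times     : event times for all subjects
    obs_cause : observed (possibly misclassified) cause label, values 0..K-1
    grid      : time points at which to evaluate the CIFs
    M         : K x K matrix, M[j, k] = P(observed cause j | true cause k)
    """
    times, obs_cause, M = np.asarray(times), np.asarray(obs_cause), np.asarray(M)
    K = M.shape[0]
    M_inv = np.linalg.inv(M)
    out = np.zeros((len(grid), K))
    for i, t in enumerate(grid):
        # observed CIF vector at time t: proportion failed from each observed cause
        F_obs = np.array([np.mean((times <= t) & (obs_cause == k)) for k in range(K)])
        out[i] = M_inv @ F_obs        # corrected CIF vector at time t
    return out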

18.
To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case–control status, and this estimated case–control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.  相似文献   

19.
Background
Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias.

Methods
We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a “full model” that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias.

Results
We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used. Overall our results indicate a nonsignificantly decreased lung cancer risk due to radiotherapy among nonsmokers, and a mildly increased risk among smokers.

Conclusions
We described easy to implement Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to misclassification and missing data.

20.
In recent decades, a large number of epidemiological studies investigating the change of prevalence of hay fever showed an increase in the occurrence of this disease. However, other studies carried out in the 1990s yielded contradictory results. Many environmental factors have been hypothesized to contribute to the increasing hay fever rate, including both indoor and ambient air pollution, reduced exposure to microbial stimulation and changes in diets. However, the observed increase has not convincingly been explained by any of these factors and there is limited evidence of changes in exposure to these risk factors over time. Additionally, recent studies show that no further increase in asthma, hay fever and atopic sensitisation in adolescents and adults has been observed during the 1990s and the beginning of the new century. As the pattern of pollen counts has changed over the years, partly due to the global warming but also as a consequence of a change in the use of land, the changing prevalence of hay fever might partly be driven by this different pollen exposure. Epidemiological data for hay fever in Switzerland are available from 1926 until 2000 (with large gaps between 1926 and 1958 and 1958 to 1986) whereas pollen data are available from 1969 until the present. This allows an investigation as to whether these data are correlated provided the same time spans are compared. It would also be feasible to correlate the pollen data with meteorological data which, however, is not the subject of our investigation. Our study focuses on analyzing time series of pollen counts and of pollen season lengths in order to identify their trends, and to ascertain whether there is a relationship between these trends and the changes in the hay fever prevalence. It is shown in this paper that the pollen exposure has been decreasing in Basel since the beginning of the 1990s whereas the rate of the hay fever prevalence in Switzerland remained approximately unchanged in this period but with a slight tendency to decrease. In Locarno, most of the pollen species also show a decreasing trend, while in Zurich, the development is somewhat different as the pollen counts of most of the pollen types have been increasing. It is interesting, however, that some of the pollen counts of this station (grass, stinging nettle, mugwort and ragweed) have been decreasing in the period 1982–2007.  相似文献   

