Similar literature
20 similar documents found.
1.
Chen J  Rodriguez C 《Biometrics》2007,63(4):1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.
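For orientation, the idea of conditioning on the number of cases in a matching stratum can be written down in its textbook form. The expression below is a sketch under a simplifying assumption that is not the abstract's setting: the covariate vector $x_{ij}$ of subject $j$ in stratum $i$ is fully observed, whereas the abstract deals with phase-ambiguous haplotypes. The symbols $x_{ij}$, $\beta$, and $M_i$ are introduced here for illustration only. For a stratum with one case (indexed $j = 1$) and $M_i$ matched controls, the conditional likelihood contribution is:

```latex
L_i(\beta) \;=\; \frac{\exp\!\left(\beta^{\top} x_{i1}\right)}
                      {\sum_{j=1}^{M_i + 1} \exp\!\left(\beta^{\top} x_{ij}\right)}
```

The stratum-specific intercept cancels between numerator and denominator, which is why matching factors need not be modeled explicitly in this form.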

2.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

3.
Guolo A 《Biometrics》2008,64(4):1207-1214
We investigate the use of prospective likelihood methods to analyze retrospective case-control data where some of the covariates are measured with error. We show that prospective methods can be applied and the case-control sampling scheme can be ignored if one adequately models the distribution of the error-prone covariates in the case-control sampling scheme. Indeed, subject to this, the prospective likelihood methods result in consistent estimates, and information-based standard errors are asymptotically correct. However, the distribution of such covariates is not the same in the population and under case-control sampling, dictating the need to model the distribution flexibly. In this article, we illustrate the general principle by modeling the distribution of the continuous error-prone covariates using the skew-normal distribution. The performance of the method is evaluated through simulation studies, which show satisfactory results in terms of bias and coverage. Finally, the method is applied to the analysis of two data sets which refer, respectively, to a cholesterol study and a study on breast cancer.

4.
Joint analysis of recurrent and nonrecurrent terminal events has attracted substantial attention in the literature. However, formal methodology is lacking for such analysis when the event time data are on discrete scales, even though some modeling and inference strategies have been developed for discrete-time survival analysis. We propose a discrete-time joint modeling approach for the analysis of recurrent and terminal events where the two types of events may be correlated with each other. The proposed joint modeling assumes a shared frailty to account for the dependence among recurrent events and between the recurrent and the terminal events. Also, the joint modeling allows for time-dependent covariates and rich families of transformation models for the recurrent and terminal events. A major advantage of our approach is that it does not assume a distribution for the frailty, nor does it assume a Poisson process for the analysis of the recurrent event. The utility of the proposed analysis is illustrated by simulation studies and two real applications, where the application to the biochemists' rank promotion data jointly analyzes the biochemists' citation numbers and times to rank promotion, and the application to the scleroderma lung study data jointly analyzes the adverse events and off-drug time among patients with the symptomatic scleroderma-related interstitial lung disease.

5.
Shih JH  Chatterjee N 《Biometrics》2002,58(3):502-509
In case-control family studies with a survival endpoint, age of onset of disease can be used to assess the familial aggregation of the disease and the relationship between the disease and genetic or environmental risk factors. Because of the retrospective nature of the case-control study, methods for analyzing prospectively collected correlated failure time data do not apply directly. In this article, we propose a semiparametric quasi-partial-likelihood approach to simultaneously estimate the effect of covariates on the age of onset and the association of ages of onset among family members that does not require specification of the baseline marginal distribution. We conducted a simulation study to evaluate the performance of the proposed approach and compare it with the existing semiparametric ones. Simulation results demonstrate that the proposed approach has better performance in terms of consistency and efficiency. We illustrate the methodology using a subset of data from the Washington Ashkenazi Study.

6.
It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399-418) developed an efficient retrospective maximum-likelihood method for analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108-127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations where some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have K + 1 levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. The likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We corrected standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function. The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma.

7.
Many case-control tests of rare variation are implemented in statistical frameworks that make correction for confounders like population stratification difficult. Simple permutation of disease status is unacceptable for resolving this issue because the replicate data sets do not have the same confounding as the original data set. These limitations make it difficult to apply rare-variant tests to samples in which confounding most likely exists, e.g., samples collected from admixed populations. To enable the use of such rare-variant methods in structured samples, as well as to facilitate permutation tests for any situation in which case-control tests require adjustment for confounding covariates, we propose to establish the significance of a rare-variant test via a modified permutation procedure. Our procedure uses Fisher's noncentral hypergeometric distribution to generate permuted data sets with the same structure present in the actual data set such that inference is valid in the presence of confounding factors. We use simulated sequence data based on coalescent models to show that our permutation strategy corrects for confounding due to population stratification that, if ignored, would otherwise inflate the size of a rare-variant test. We further illustrate the approach by using sequence data from the Dallas Heart Study of energy metabolism traits. Researchers can implement our permutation approach by using the R package BiasedUrn.
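The idea of a confounder-preserving permutation can be sketched in a few lines of Python. This is an illustration, not the authors' procedure and not the BiasedUrn API. It relies on the fact that Fisher's noncentral hypergeometric distribution is the distribution of independent Bernoulli draws with unequal odds, conditioned on their total. The sketch therefore draws each subject's status independently and keeps only draws with the observed number of cases (simple rejection sampling). The function name `permute_status` and the probabilities in `case_probs` are hypothetical. In practice the probabilities would come from a model of disease status given the confounders alone.

```python
import random


def permute_status(case_probs, n_cases, rng, max_tries=100_000):
    """Draw one case/control vector with exactly n_cases cases.

    Each subject i is a case with probability case_probs[i], independently;
    draws whose total differs from n_cases are rejected. Conditioning on the
    total in this way yields Fisher's noncentral hypergeometric sampling.
    """
    for _ in range(max_tries):
        draw = [1 if rng.random() < p else 0 for p in case_probs]
        if sum(draw) == n_cases:
            return draw
    raise RuntimeError("no draw matched the required number of cases")


# Hypothetical disease probabilities given confounders only (assumption).
case_probs = [0.8, 0.7, 0.6, 0.2, 0.2, 0.1]
rng = random.Random(0)

# 200 permuted data sets, each with the same number of cases (3).
replicates = [permute_status(case_probs, 3, rng) for _ in range(200)]
```

Subjects whose confounder profile makes disease more likely are cases more often across replicates, so the permuted data sets retain the confounding structure that a simple shuffle of labels would destroy. Rejection sampling is only practical for small examples. A dedicated sampler would be needed at realistic sample sizes.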

8.
Median regression with censored cost data
Bang H  Tsiatis AA 《Biometrics》2002,58(3):643-649
Because of the skewness of the distribution of medical costs, we consider modeling the median as well as other quantiles when establishing regression relationships to covariates. In many applications, the medical cost data are also right censored. In this article, we propose semiparametric procedures for estimating the parameters in median regression models based on weighted estimating equations when censoring is present. Numerical studies are conducted to show that our estimators perform well with small samples and the resulting inference is reliable in circumstances of practical importance. The methods are applied to a dataset for medical costs of patients with colorectal cancer.

9.
In genetic association testing, failure to properly control for population structure can lead to severely inflated type 1 error and power loss. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods to account for population structure and covariates are based on linear mixed models (LMMs), which are primarily designed for quantitative traits. For binary traits, however, LMM is a misspecified model and can lead to deteriorated performance. We propose CARAT, a binary-trait association testing approach based on a mixed-effects quasi-likelihood framework, which exploits the dichotomous nature of the trait and achieves computational efficiency through estimating equations. We show in simulation studies that CARAT consistently outperforms existing methods and maintains high power in a wide range of population structure settings and trait models. Furthermore, CARAT is based on a retrospective approach, which is robust to misspecification of the phenotype model. We apply our approach to a genome-wide analysis of Crohn disease, in which we replicate association with 17 previously identified regions. Moreover, our analysis on 5p13.1, an extensively reported region of association, shows evidence for the presence of multiple independent association signals in the region. This example shows how CARAT can leverage known disease risk factors to shed light on the genetic architecture of complex traits.

10.
A hierarchical modeling framework for multiple observer transect surveys
PB Conn  JL Laake  DS Johnson 《PLoS ONE》2012,7(8):e42294
Ecologists often use multiple observer transect surveys to census animal populations. In addition to animal counts, these surveys produce sequences of detections and non-detections for each observer. When combined with additional data (i.e. covariates such as distance from the transect line), these sequences provide the additional information to estimate absolute abundance when detectability on the transect line is less than one. Although existing analysis approaches for such data have proven extremely useful, they have some limitations. For instance, it is difficult to extrapolate from observed areas to unobserved areas unless a rigorous sampling design is adhered to; it is also difficult to share information across spatial and temporal domains or to accommodate habitat-abundance relationships. In this paper, we introduce a hierarchical modeling framework for multiple observer line transects that removes these limitations. In particular, abundance intensities can be modeled as a function of habitat covariates, making it easier to extrapolate to unsampled areas. Our approach relies on a complete data representation of the state space, where unobserved animals and their covariates are modeled using a reversible jump Markov chain Monte Carlo algorithm. Observer detections are modeled via a bivariate normal distribution on the probit scale, with dependence induced by a distance-dependent correlation parameter. We illustrate performance of our approach with simulated data and on a known population of golf tees. In both cases, we show that our hierarchical modeling approach yields accurate inference about abundance and related parameters. In addition, we obtain accurate inference about population-level covariates (e.g. group size). We recommend that ecologists consider using hierarchical models when analyzing multiple-observer transect data, especially when it is difficult to rigorously follow pre-specified sampling designs. We provide a new R package, hierarchicalDS, to facilitate the building and fitting of these models.

11.
Kneib T  Fahrmeir L 《Biometrics》2006,62(1):109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

12.
In a typical case-control study, exposure information is collected at a single time point for the cases and controls. However, case-control studies are often embedded in existing cohort studies containing a wealth of longitudinal exposure history about the participants. Recent medical studies have indicated that incorporating past exposure history, or a constructed summary measure of cumulative exposure derived from the past exposure history, when available, may lead to more precise and clinically meaningful estimates of the disease risk. In this article, we propose a flexible Bayesian semiparametric approach to model the longitudinal exposure profiles of the cases and controls and then use measures of cumulative exposure based on a weighted integral of this trajectory in the final disease risk model. The estimation is done via a joint likelihood. In the construction of the cumulative exposure summary, we introduce an influence function, a smooth function of time that characterizes the association pattern of the exposure profile on the disease status, with different time windows potentially having differential influence/weights. This enables us to analyze how the present disease status of a subject is influenced by his/her past exposure history conditional on the current ones. The joint likelihood formulation allows us to properly account for uncertainties associated with both stages of the estimation process in an integrated manner. Analysis is carried out in a hierarchical Bayesian framework using reversible jump Markov chain Monte Carlo algorithms. The proposed methodology is motivated by, and applied to, a case-control study of prostate cancer where longitudinal biomarker information is available for the cases and controls.

13.
T R Fears  C C Brown 《Biometrics》1986,42(4):955-960
There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.

14.
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

15.
For patients on dialysis, hospitalizations remain a major risk factor for mortality and morbidity. We use data from a large national database, United States Renal Data System, to model time-varying effects of hospitalization risk factors as functions of time since initiation of dialysis. To account for the three-level hierarchical structure in the data where hospitalizations are nested in patients and patients are nested in dialysis facilities, we propose a multilevel mixed effects varying coefficient model (MME-VCM) where multilevel (patient- and facility-level) random effects are used to model the dependence structure of the data. The proposed MME-VCM also includes multilevel covariates, where baseline demographics and comorbidities are among the patient-level factors, and staffing composition and facility size are among the facility-level risk factors. To address the challenge of high-dimensional integrals due to the hierarchical structure of the random effects, we propose a novel two-step approximate EM algorithm based on the fully exponential Laplace approximation. Inference for the varying coefficient functions and variance components is achieved via derivation of the standard errors using score contributions. The finite sample performance of the proposed estimation procedure is studied through simulations.

16.
This article discusses the statistical analysis of panel count data when the underlying recurrent event process and observation process may be correlated. For the recurrent event process, we propose a new class of semiparametric mean models that allows for the interaction between the observation history and covariates. For inference on the model parameters, a monotone spline-based least squares estimation approach is developed, and the resulting estimators are consistent and asymptotically normal. In particular, our new approach does not rely on the model specification of the observation process. The proposed inference procedure performs well through simulation studies, and it is illustrated by the analysis of bladder tumor data.

17.
Results from experiments studying different factors determining invasibility (e.g. land use, disturbance, biotic interactions) at different spatial scales are mainly used in isolation, probably because a methodology for integration is lacking. Recent studies show that factors affecting invasibility most likely do so in a hierarchical manner, with different factors acting more strongly at different spatial scales. Climate can be considered the dominant factor at the continental scale, while at regional and landscape scale topography, land cover and land use become increasingly important. At smaller spatial scales, soil type, disturbance, biotic interactions, resources, and microclimate may become significant. In the current paper, we propose a hierarchical framework for combining results from different types of studies. In this hierarchical system, factors operating at a smaller scale are subordinate to factors operating at a larger scale, but if conditions at higher levels are satisfied, the small-scale factors may become indispensable for making accurate predictions. Depending on the aim of the study, the accuracy of prediction can be selected by the researcher, which in its turn determines which data are required. We discuss several applications of the framework and indicate some options for future research. Although the complexity of natural systems presents fundamental limits to predictions, we think this framework can provide a useful tool for the identification of areas of risk for biological invasions, for improving our understanding of invasibility, and for identifying gaps in our current knowledge.

18.
Mixture modeling is a popular approach to accommodate overdispersion, skewness, and multimodality features that are very common for health care utilization data. However, mixture modeling tends to rely on subjective judgment regarding the appropriate number of mixture components or some hypothesis about how to cluster the data. In this work, we adopt a nonparametric, variational Bayesian approach to allow the model to select the number of components while estimating their parameters. Our model allows for a probabilistic classification of observations into clusters and simultaneous estimation of a Gaussian regression model within each cluster. When we apply this approach to data on patients with interstitial lung disease, we find distinct subgroups of patients with differences in means and variances of health care costs, health and treatment covariates, and relationships between covariates and costs. The subgroups identified are readily interpretable, suggesting that this nonparametric variational approach to inference can discover valid insights into the factors driving treatment costs. Moreover, the learning algorithm we employed is very fast and scalable, which should make the technique accessible for a broad range of applications.

19.
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.

20.
Studies on HIV dynamics in AIDS research are very important in understanding the pathogenesis of HIV-1 infection and also in assessing the effectiveness of antiretroviral (ARV) treatment. Viral dynamic models can be formulated through a system of nonlinear ordinary differential equations (ODE), but there has been only limited development of statistical methodologies for inference. This article, motivated by an AIDS clinical study, discusses a hierarchical Bayesian nonlinear mixed-effects modeling approach to dynamic ODE models without a closed-form solution. In this model, we fully integrate viral load, medication adherence, drug resistance, pharmacokinetics, baseline covariates and time-dependent drug efficacy into the data analysis for characterizing long-term virologic responses. Our method is applied to a data set from an AIDS clinical study. The results suggest that modeling HIV dynamics and virologic responses with consideration of time-varying clinical factors as well as baseline characteristics may be important for HIV/AIDS studies in providing quantitative guidance to better understand the virologic responses to ARV treatment and to help the evaluation of clinical trial design in response to existing therapies.
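To make "a system of nonlinear ODEs without a closed-form solution" concrete, the sketch below integrates a basic three-compartment viral dynamics model (uninfected target cells, infected cells, free virus) with forward Euler steps. The abstract does not state its model, so the equations, the constant drug efficacy, the parameter values, and the function name `simulate_viral_load` are all assumptions made for illustration. The abstract's model uses time-dependent efficacy and a hierarchical Bayesian fit, neither of which is attempted here.

```python
def simulate_viral_load(efficacy, days=20.0, dt=0.01):
    """Forward-Euler integration of a basic viral dynamics model.

    efficacy is a constant in [0, 1]: the fraction of new infections
    blocked by treatment (a simplifying assumption).
    Returns the viral load at every time step, including the initial value.
    """
    # Hypothetical parameter values, chosen only for illustration.
    lam, d = 100.0, 0.1         # production and death rate of target cells
    k = 2e-4                    # infection rate constant
    delta = 0.5                 # death rate of infected cells
    n_virions, c = 100.0, 3.0   # virions per infected cell, clearance rate

    t_cells, infected, virus = 600.0, 20.0, 1000.0  # assumed initial state
    trajectory = [virus]
    for _ in range(round(days / dt)):
        new_infections = (1.0 - efficacy) * k * t_cells * virus
        d_t = lam - d * t_cells - new_infections
        d_i = new_infections - delta * infected
        d_v = n_virions * delta * infected - c * virus
        t_cells += dt * d_t
        infected += dt * d_i
        virus += dt * d_v
        trajectory.append(virus)
    return trajectory


untreated = simulate_viral_load(efficacy=0.0)
treated = simulate_viral_load(efficacy=1.0)
print(f"final viral load, untreated: {untreated[-1]:.1f}")
print(f"final viral load, treated:   {treated[-1]:.4f}")
```

Because the trajectory is only available numerically, a likelihood for observed viral loads has to call a solver like this inside every evaluation, which is part of what makes inference for such models computationally demanding.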
