Similar articles
20 similar articles found (search time: 15 ms)
1.
Zhang H  Zheng G  Li Z 《Biometrics》2006,62(4):1124-1131
Using unphased genotype data, we studied statistical inference for association between a disease and a haplotype in matched case-control studies. Statistical inference for haplotype data is complicated due to ambiguity of genotype phases. An estimating equation-based method is developed for estimating odds ratios and testing disease-haplotype association. The method can potentially also be applied to testing haplotype-environment interaction. Simulation studies show that the proposed method has good performance. The performance of the method in the presence of departures from Hardy-Weinberg equilibrium is also studied.

2.
Chen J  Chatterjee N 《Biometrics》2006,62(1):28-35
Genetic epidemiologic studies often collect genotype data at multiple loci within a genomic region of interest from a sample of unrelated individuals. One popular method for analyzing such data is to assess whether haplotypes, i.e., the arrangements of alleles along individual chromosomes, are associated with the disease phenotype or not. For many study subjects, however, the exact haplotype configuration on the pair of homologous chromosomes cannot be derived with certainty from the available locus-specific genotype data (phase ambiguity). In this article, we consider estimating haplotype-specific association parameters in the Cox proportional hazards model, using genotype, environmental exposure, and the disease endpoint data collected from cohort or nested case-control studies. We study alternative Expectation-Maximization algorithms for estimating haplotype frequencies from cohort and nested case-control studies. Based on a hazard function of the disease derived from the observed genotype data, we then propose a semiparametric method for joint estimation of relative-risk parameters and the cumulative baseline hazard function. The method is greatly simplified under a rare disease assumption, for which an asymptotic variance estimator is also proposed. The performance of the proposed estimators is assessed via simulation studies. An application of the proposed method is presented, using data from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study.
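For readers unfamiliar with how an Expectation-Maximization algorithm resolves phase ambiguity, the following is a minimal sketch of the textbook EM for haplotype frequencies from unphased biallelic genotypes in a simple random sample. It is not the cohort or nested case-control variant studied in the article above. It assumes Hardy-Weinberg equilibrium, genotypes coded as counts (0, 1, 2) of one allele per locus, and a small number of loci; the function names are illustrative only.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all ordered haplotype pairs whose allele counts add up to the genotype."""
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (h1, h2)
        for h1 in haps
        for h2 in haps
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    n_loci = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_loci))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    candidates = [compatible_pairs(g) for g in genotypes]
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        # E-step: split each subject over its compatible haplotype pairs
        for pairs in candidates:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical two-locus sample; only the (1, 1) genotype is phase-ambiguous.
    sample = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)]
    for hap, freq in em_haplotype_freqs(sample).items():
        print(hap, round(freq, 4))
```

Enumerating all 2^L haplotypes is only practical for a handful of loci; it keeps the sketch short but is not how production software handles longer haplotypes.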

3.
Shih JH  Chatterjee N 《Biometrics》2002,58(3):502-509
In case-control family studies with a survival endpoint, the age of onset of disease can be used to assess the familial aggregation of the disease and the relationship between the disease and genetic or environmental risk factors. Because of the retrospective nature of the case-control study, methods for analyzing prospectively collected correlated failure time data do not apply directly. In this article, we propose a semiparametric quasi-partial-likelihood approach to simultaneously estimate the effect of covariates on the age of onset and the association of ages of onset among family members that does not require specification of the baseline marginal distribution. We conducted a simulation study to evaluate the performance of the proposed approach and compare it with the existing semiparametric ones. Simulation results demonstrate that the proposed approach has better performance in terms of consistency and efficiency. We illustrate the methodology using a subset of data from the Washington Ashkenazi Study.

4.
Mukherjee B  Zhang L  Ghosh M  Sinha S 《Biometrics》2007,63(3):834-844
In case-control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence in order to derive more efficient estimation techniques than the traditional logistic regression analysis (Chatterjee and Carroll, 2005, Biometrika 92, 399-418). However, covariates that stratify the population, such as age, ethnicity, and the like, could potentially lead to nonindependence. In this article, we provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence in the control population. We illustrate the methods by applying them to data from a population-based case-control study on ovarian cancer conducted in Israel. A simulation study is conducted to compare our method with other popular choices. The results reflect that the semiparametric Bayesian model allows incorporation of key scientific evidence in the form of a prior and offers a flexible, robust alternative when standard parametric model assumptions do not hold.
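To illustrate why gene-environment independence can buy efficiency, the sketch below contrasts the classical case-only estimator of the multiplicative interaction odds ratio with the usual case-control estimator for a binary gene and a binary exposure. This is a simpler, well-known technique swapped in for illustration; it is neither the retrospective likelihood of Chatterjee and Carroll nor the semiparametric Bayesian model described above. It assumes independence of G and E in the source population and a rare disease, and all counts are hypothetical.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio of the 2x2 table [[a, b], [c, d]] and its Woolf variance."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def interaction_estimates(cases, controls):
    """Each argument is a tuple of counts ordered (G1E1, G1E0, G0E1, G0E0).

    Returns, for each estimator, the interaction odds ratio and the
    approximate variance of its logarithm.
    """
    log_or_cases, var_cases = log_or_and_var(*cases)
    log_or_controls, var_controls = log_or_and_var(*controls)
    return {
        # Valid only under G-E independence and a rare disease (assumptions).
        "case_only": (math.exp(log_or_cases), var_cases),
        # Ratio of the G-E odds ratio in cases to that in controls.
        "case_control": (
            math.exp(log_or_cases - log_or_controls),
            var_cases + var_controls,
        ),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen so that G and E are unassociated in controls.
    result = interaction_estimates(
        cases=(60, 40, 80, 120), controls=(20, 80, 50, 200)
    )
    for name, (odds_ratio, log_var) in result.items():
        print(name, round(odds_ratio, 3), round(log_var, 4))
```

The case-only variance omits the four control-cell terms, which is where the efficiency gain comes from; the price is bias whenever the independence assumption fails, for example under the population stratification this abstract is concerned with.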

5.
In the study of complex traits, the utility of linkage analysis and single marker association tests can be limited for researchers attempting to elucidate the complex interplay between a gene and environmental covariates. For these purposes, tests of gene-environment interactions are needed. In addition, recent studies have indicated that haplotypes, which are specific combinations of nucleotides on the same chromosome, may be more suitable as the unit of analysis for statistical tests than single genetic markers. The difficulty with this approach is that, in standard laboratory genotyping, haplotypes are often not directly observable. Instead, unphased marker phenotypes are collected. In this article, we present a method for estimating and testing haplotype-environment interactions when linkage phase is potentially ambiguous. The method builds on the work of Schaid et al. [2002] and is applicable to any trait that can be placed in the generalized linear model framework. Simulations were run to illustrate the salient features of the method. In addition, the method was used to test for haplotype-smoking exposure interaction with data from the Childhood Asthma Management Program.

6.
It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399-418) developed an efficient retrospective maximum-likelihood method for analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108-127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations when some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have K + 1 levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. The likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We corrected standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function. The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma.

7.
To detect the role of a candidate gene for a trait in a sample of individuals, we may test SNP haplotype or diplotype effects. For a limited sample size, many haplotype or diplotype categories may contain few individuals. This reduces power when testing the association between the trait and the haplotypes or diplotypes, because these categories provide little additional information while increasing the degrees of freedom. The present paper proposes a new strategy to group rare categories based on a measure of similarity between haplotypes or diplotypes and compares it to two other possible strategies to deal with rare categories: a SNP selection strategy based on haplotype diversity, and a grouping strategy that pools all rare categories into a single baseline group. This comparison is performed by means of simulation under four scenarios. We show that this new strategy yields the largest increase in power irrespective of the model underlying the candidate gene in the studied trait. This strategy therefore provides a powerful alternative to currently used methods to reduce the number of rare categories.
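The simplest of the comparison strategies mentioned above, pooling every rare category into one group, can be sketched in a few lines. This is only the pooling comparator, not the similarity-based grouping the paper proposes; the threshold `min_count`, the label `"rare"`, and the diplotype labels are hypothetical choices made for the example.

```python
from collections import Counter


def pool_rare_categories(labels, min_count=5, pooled_label="rare"):
    """Replace every category observed fewer than min_count times by one pooled label."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else pooled_label for lab in labels]


if __name__ == "__main__":
    # Hypothetical diplotype labels for 14 individuals.
    diplotypes = ["AA"] * 6 + ["AB"] * 5 + ["AC"] * 2 + ["BC"]
    pooled = pool_rare_categories(diplotypes, min_count=5)
    print(Counter(diplotypes))
    print(Counter(pooled))
```

In this toy sample the number of categories drops from four to three, so a categorical association test would spend one fewer degree of freedom, at the cost of treating the pooled diplotypes as if they had a common effect.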

8.
Chen J  Rodriguez C 《Biometrics》2007,63(4):1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.

9.
The problem of estimating haplotype frequencies from population data has been considered by numerous investigators, resulting in a wide variety of possible algorithmic and statistical solutions. We propose a relatively unique approach that employs an artificial neural network (ANN) to predict the most likely haplotype frequencies from a sample of population genotype data. Through an innovative ANN design for mapping genotype patterns to diplotypes, we have produced a prototype that demonstrates the feasibility of this approach, with provisional results that correlate well with estimates produced by the expectation maximization algorithm for haplotype frequency estimation. Given the computational demands of estimating haplotype frequencies for 20 or more single-nucleotide polymorphisms, the ANN approach is promising because its design fits well with parallel computing architectures.

10.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

11.
A new method for haplotype inference including full-sib information (total citations: 1; self-citations: 0; citations by others: 1)
Ding XD  Simianer H  Zhang Q 《Genetics》2007,177(3):1929-1940
Recent literature has suggested that haplotype inference through close relatives, especially from nuclear families, can be an alternative strategy in determining linkage phase and estimating haplotype frequencies. For the case in which parental genotypes cannot be obtained and only full-sib information is used, a new approach is suggested to infer phase and to reconstruct haplotypes. We present a maximum-likelihood method via an expectation-maximization algorithm, called FSHAP, using only full-sib information when parent information is not available. FSHAP can deal with families with an arbitrary number of children, and missing parents or missing genotypes can be handled as well. In a simulation study we compare FSHAP with another existing expectation-maximization (EM)-based approach (FAMHAP), the conditioning approach implemented in FBAT and GENEHUNTER, which is only pedigree based and assumes linkage equilibrium. In most situations, FSHAP has the smallest discrepancy of haplotype frequency estimation and the lowest error rate in haplotype reconstruction; only in some cases does FAMHAP yield comparable results. GENEHUNTER produces the largest discrepancy, and FBAT produces the highest error rate in offspring in most situations. Among the methods compared, FSHAP has the highest accuracy in reconstructing the diplotypes of the unavailable parents. Potential limitations of the method, e.g., in analyzing very large haplotypes, are indicated and possible solutions are discussed.

12.
OBJECTIVES: The question of interest is estimating the relationship between haplotypes and an outcome measure, based upon unphased genotypes. The outcome of interest might be predicting the presence of disease in a logistic model, predicting a numeric drug response in a linear model, or predicting survival time in a parametric survival model with censoring. Explanatory variables may include phased haplotype design variables, environmental variables, or interactions between them. METHODS: We extend existing generalized linear haplotype models to parametric survival outcomes. To improve the stability of model variance estimates, a profile likelihood solution is proposed. An adjustment for population stratification is also considered. Here we investigate data sampled from known 'strata' (e.g., gender or ethnicity) that influence haplotype prior probabilities and thus the regression model weights. Differing linear model variance estimates, and the effect of stratification and departures from Hardy-Weinberg Equilibrium (HWE) on parameter estimates, are compared and contrasted via simulation. RESULTS: From simulations, we observed an improvement in statistical power when using a solution to profile likelihood equations. We also saw that stratification had little impact on estimates. Haplotypes that are not in HWE had a negative impact on power to test hypotheses. Finally, profile likelihood solutions for haplotypes deviating from HWE had improved power and confidence interval coverage of regression model coefficients.

13.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

14.
Summary: Genetic association studies often investigate the effect of haplotypes on an outcome of interest. Haplotypes are not observed directly, and this complicates the inclusion of such effects in survival models. We describe a new estimating equations approach for Cox's regression model to assess haplotype effects for survival data. These estimating equations are simple to implement and avoid the use of the EM algorithm, which may be slow in the context of the semiparametric Cox model with incomplete covariate information. These estimating equations also lead to easily computable, direct estimators of standard errors, and thus overcome some of the difficulty in obtaining variance estimators based on the EM algorithm in this setting. We also develop an easily implemented goodness‐of‐fit procedure for Cox's regression model including haplotype effects. Finally, we apply the procedures presented in this article to investigate possible haplotype effects of the PAF‐receptor on cardiovascular events in patients with coronary artery disease, and compare our results to those based on the EM algorithm.

15.
《Prion》2013,7(6):449-462
ABSTRACT

The sequence of the prion protein gene (PRNP) affects susceptibility to spongiform encephalopathies, or prion diseases, in many species. In white-tailed deer, both coding and non-coding single nucleotide polymorphisms have been identified in this gene that correlate with chronic wasting disease (CWD) susceptibility. Previous studies examined individual nucleotide or amino acid mutations; here we examine all nucleotide polymorphisms and their combined effects on CWD. A 626 bp region of PRNP was examined from 703 free-ranging white-tailed deer. Deer were sampled between 2002 and 2010 by hunter harvest or government culling in Illinois and Wisconsin. Fourteen variable nucleotide positions were identified (4 new and 10 previously reported). We identified 68 diplotypes composed of 24 predicted haplotypes, with the most common diplotype occurring in 123 individuals. Diplotypes that were found exclusively among positive or negative animals were rare, each occurring in less than 1% of the deer studied. Only one haplotype (C, odds ratio 0.240) and 2 diplotypes (AC and BC, odds ratios of 0.161 and 0.108 respectively) have significant associations with CWD resistance. Each contains mutations (one synonymous nucleotide 555C/T and one nonsynonymous nucleotide 286G/A) at positions reported to be significantly associated with reduced CWD susceptibility. Results suggest that deer populations with higher frequencies of haplotype C or diplotypes AC and BC might have a reduced risk for CWD infection, while populations with lower frequencies may have higher risk for infection. Understanding the genetic basis of CWD has improved our ability to assess herd susceptibility and direct management efforts within CWD infected areas.

16.
Targeted maximum likelihood estimation is a versatile tool for estimating parameters in semiparametric and nonparametric models. We work through an example applying targeted maximum likelihood methodology to estimate the parameter of a marginal structural model. In the case we consider, we show how this can be easily done by clever use of standard statistical software. We point out differences between targeted maximum likelihood estimation and other approaches (including estimating function based methods). The application we consider is to estimate the effect of adherence to antiretroviral medications on virologic failure in HIV positive individuals.

17.
MOTIVATION: With the availability of large-scale, high-density single-nucleotide polymorphism markers and information on haplotype structures and frequencies, a great challenge is how to take advantage of haplotype information in the association mapping of complex diseases in case-control studies. RESULTS: We present a novel approach for association mapping based on directly mining haplotypes (i.e. phased genotype pairs) produced from case-control data or case-parent data via a density-based clustering algorithm, which can be applied to whole-genome screens as well as candidate-gene studies in small genomic regions. The method directly explores the sharing of haplotype segments in affected individuals that are rarely present in normal individuals. The measure of sharing between two haplotypes is defined by a new similarity metric that combines the length of the shared segments and the number of common alleles around any marker position of the haplotypes, which is robust against recent mutations/genotype errors and recombination events. The effectiveness of the approach is demonstrated by using both simulated datasets and real datasets. The results show that the algorithm is accurate for different population models and for different disease models, even for genes with small effects, and it outperforms some recently developed methods.

18.
Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype–haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.

19.
This paper proposes a semiparametric methodology for modeling multivariate and conditional distributions. We first build a multivariate distribution whose dependence structure is induced by a Gaussian copula and whose marginal distributions are estimated nonparametrically via mixtures of B‐spline densities. The conditional distribution of a given variable is obtained in closed form from this multivariate distribution. We take a Bayesian approach, using Markov chain Monte Carlo methods for inference. We study the frequentist properties of the proposed methodology via simulation and apply the method to estimation of conditional densities of summary statistics, used for computing conditional local false discovery rates, from genetic association studies of schizophrenia and cardiovascular disease risk factors.
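As a rough illustration of why a Gaussian copula gives conditional distributions in closed form, the sketch below evaluates the conditional CDF for the bivariate case. It assumes the observations have already been mapped to the unit interval by their marginal CDFs (the paper estimates those marginals with B-spline mixtures, which is not reproduced here) and that the copula correlation `rho` is known; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def conditional_copula_cdf(u2, u1, rho):
    """P(U2 <= u2 | U1 = u1) under a bivariate Gaussian copula with correlation rho.

    u1 and u2 must lie strictly between 0 and 1, and abs(rho) must be below 1.
    """
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    # On the normal-score scale, Z2 given Z1 = z1 is N(rho * z1, 1 - rho**2).
    return _STD_NORMAL.cdf((z2 - rho * z1) / sqrt(1.0 - rho * rho))


if __name__ == "__main__":
    # With rho = 0 the variables are independent, so the result equals u2.
    print(conditional_copula_cdf(0.3, 0.8, 0.0))
    # Positive dependence and a high u1 shift probability mass upward.
    print(conditional_copula_cdf(0.5, 0.9, 0.7))
```

The same normal-score argument extends to more than two variables through the usual conditional mean and variance of a multivariate normal, which is what makes the closed form available.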

20.
An underlying complex genetic susceptibility exists in multiple sclerosis (MS), and an association with the HLA-DRB1*1501-DQB1*0602 haplotype has been repeatedly demonstrated in high-risk (northern European) populations. It is unknown whether the effect is explained by the HLA-DRB1 or the HLA-DQB1 gene within the susceptibility haplotype, which are in strong linkage disequilibrium (LD). African populations are characterized by greater haplotypic diversity and distinct patterns of LD compared with northern Europeans. To better localize the HLA gene responsible for MS susceptibility, case-control and family-based association studies were performed for DRB1 and DQB1 loci in a large and well-characterized African American data set. A selective association with HLA-DRB1*15 was revealed, indicating a primary role for the DRB1 locus in MS independent of DQB1*0602. This finding is unlikely to be solely explained by admixture, since a substantial proportion of the susceptibility chromosomes from African American patients with MS displayed haplotypes consistent with an African origin.
