Similar articles
 20 similar articles found
1.
Ito T  Inoue E  Kamatani N 《Genetics》2004,168(4):2339-2348
Analysis of the association between haplotypes and phenotypes is becoming increasingly important. We have devised an expectation-maximization (EM)-based algorithm to test the association between a phenotype and a haplotype or a haplotype set and to estimate diplotype-based penetrance using individual genotype and phenotype data from cohort studies and clinical trials. The algorithm estimates, in addition to haplotype frequencies, penetrances for subjects with a given haplotype and those without it (dominant mode). Relative risk can thus also be estimated. In the dominant mode, the maximum likelihood under the assumption of no association between the phenotype and presence of the haplotype (L(0max)) and the maximum likelihood under the assumption of association (L(max)) were calculated. The statistic -2 log(L(0max)/L(max)) was used to test the association. The present algorithm along with the analyses in recessive and genotype modes was implemented in the computer program PENHAPLO. Results of analysis of simulated data indicated that the test had considerable power under certain conditions. Analyses of two real data sets from cohort studies, one concerning the MTHFR gene and the other the NAT2 gene, revealed significant associations between the presence of haplotypes and occurrence of side effects. Our algorithm may be especially useful for analyzing data concerning the association between genetic information and individual responses to drugs.
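The test statistic -2 log(L(0max)/L(max)) in the abstract above can be illustrated with a minimal sketch. This is not the PENHAPLO implementation: the function names are hypothetical, the two maximized log-likelihoods are assumed to have been computed elsewhere, and referring the statistic to a chi-square distribution with 1 degree of freedom is an assumption made here for illustration only.

```python
import math


def lrt_statistic(log_l0_max, log_l_max):
    """Return -2 log(L0max / Lmax) from the two maximized log-likelihoods."""
    return -2.0 * (log_l0_max - log_l_max)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumption: a 1-df reference distribution; the abstract does not state it.
    Uses P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    """
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


# Hypothetical log-likelihood values, for illustration only.
stat = lrt_statistic(log_l0_max=-105.0, log_l_max=-100.0)
p_value = chi2_sf_1df(stat)
```

Working with log-likelihoods rather than likelihoods avoids numerical underflow when the sample is large.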

2.
There have been increasing efforts to relate drug efficacy and disease predisposition with genetic polymorphisms. We present statistical tests for association of haplotype frequencies with discrete and continuous traits in samples of unrelated individuals. Haplotype frequencies are estimated through the expectation-maximization algorithm, and each individual in the sample is expanded into all possible haplotype configurations with corresponding probabilities, conditional on their genotype. A regression-based approach is then used to relate inferred haplotype probabilities to the response. The relationship of this technique to commonly used approaches developed for case-control data is discussed. We confirm the proper size of the test under H(0) and find an increase in power under the alternative by comparing test results using inferred haplotypes with single-marker tests using simulated data. More importantly, analysis of real data comprising a dense map of single nucleotide polymorphisms spaced along a 12-cM chromosomal region allows us to confirm the utility of the haplotype approach as well as the validity and usefulness of the proposed statistical technique. The method appears to be successful in relating data from multiple, correlated markers to response.
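The EM step described in the abstract above (expanding each individual into every haplotype configuration compatible with the genotype, weighted by its conditional probability) can be sketched as follows. This is a generic textbook EM under stated assumptions, not the authors' code: loci are biallelic, genotypes are coded as minor-allele counts 0/1/2, there is no missing data, Hardy-Weinberg equilibrium is assumed, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List every ordered haplotype pair (h1, h2) whose allele counts sum to the genotype."""
    pairs = []
    for h1 in product((0, 1), repeat=len(genotype)):
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        if all(b in (0, 1) for b in h2):
            pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM (HWE assumed)."""
    haplotypes = list(product((0, 1), repeat=len(genotypes[0])))
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    expansions = [compatible_pairs(g) for g in genotypes]
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior weight of each compatible pair, given the genotype.
        for pairs in expansions:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy two-locus sample: six unambiguous individuals and two double heterozygotes.
sample = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 2
estimated = em_haplotype_frequencies(sample)
```

Enumerating all 2^n haplotypes is only practical for a handful of loci; the partition-and-ligation technique mentioned elsewhere in this list is one way of scaling beyond that.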

3.
Chen J  Chatterjee N 《Biometrics》2006,62(1):28-35
Genetic epidemiologic studies often collect genotype data at multiple loci within a genomic region of interest from a sample of unrelated individuals. One popular method for analyzing such data is to assess whether haplotypes, i.e., the arrangements of alleles along individual chromosomes, are associated with the disease phenotype or not. For many study subjects, however, the exact haplotype configuration on the pair of homologous chromosomes cannot be derived with certainty from the available locus-specific genotype data (phase ambiguity). In this article, we consider estimating haplotype-specific association parameters in the Cox proportional hazards model, using genotype, environmental exposure, and the disease endpoint data collected from cohort or nested case-control studies. We study alternative Expectation-Maximization algorithms for estimating haplotype frequencies from cohort and nested case-control studies. Based on a hazard function of the disease derived from the observed genotype data, we then propose a semiparametric method for joint estimation of relative-risk parameters and the cumulative baseline hazard function. The method is greatly simplified under a rare disease assumption, for which an asymptotic variance estimator is also proposed. The performance of the proposed estimators is assessed via simulation studies. An application of the proposed method is presented, using data from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study.

4.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.

5.
MOTIVATION: With the availability of large-scale, high-density single-nucleotide polymorphism markers and information on haplotype structures and frequencies, a great challenge is how to take advantage of haplotype information in the association mapping of complex diseases in case-control studies. RESULTS: We present a novel approach for association mapping based on directly mining haplotypes (i.e. phased genotype pairs) produced from case-control data or case-parent data via a density-based clustering algorithm, which can be applied to whole-genome screens as well as candidate-gene studies in small genomic regions. The method directly explores the sharing of haplotype segments in affected individuals that are rarely present in normal individuals. The measure of sharing between two haplotypes is defined by a new similarity metric that combines the length of the shared segments and the number of common alleles around any marker position of the haplotypes, which is robust against recent mutations/genotype errors and recombination events. The effectiveness of the approach is demonstrated by using both simulated datasets and real datasets. The results show that the algorithm is accurate for different population models and for different disease models, even for genes with small effects, and it outperforms some recently developed methods.

6.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

7.
Inference of haplotypes is important for many genetic approaches, including the process of assigning a phenotype to a genetic region. Usually, the population frequencies of haplotypes, as well as the diplotype configuration of each subject, are estimated from a set of genotypes of the subjects in a sample from the population. We have developed an algorithm to infer haplotype frequencies and the combination of haplotype copies in each pool by using pooled DNA data. The input data are the genotypes in pooled DNA samples, each of which contains the quantitative genotype data from one to six subjects. The algorithm infers by the maximum-likelihood method both frequencies of the haplotypes in the population and the combination of haplotype copies in each pool by an expectation-maximization algorithm. The algorithm was implemented in the computer program LDPooled. We also used the bootstrap method to calculate the standard errors of the estimated haplotype frequencies. Using this program, we analyzed the published genotype data for the SAA (n=156), MTHFR (n=80), and NAT2 (n=116) genes, as well as the smoothelin gene (n=102). Our study has shown that the frequencies of major (frequency >0.1 in a population) haplotypes can be inferred rather accurately from the pooled DNA data by the maximum-likelihood method, although with some limitations. The estimated D and D' values had large variations except when the |D| values were >0.1. The estimated linkage-disequilibrium measure ρ² for 36 linked loci of the smoothelin gene when one- and two-subject pool protocols were used suggested that the gross pattern of the distribution of the measure can be reproduced using the two-subject pool data.
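The linkage-disequilibrium measures named in the abstract above can be computed from two-locus haplotype frequencies with the standard population-genetics definitions, sketched below. This is not the LDPooled code: the function name and argument names are hypothetical, and treating the abstract's rho-squared measure as the usual squared allelic correlation is an assumption.

```python
def ld_measures(p_ab_major, p_a_only, p_b_only, p_neither):
    """Return (D, D', r^2) for two biallelic loci from the four haplotype frequencies.

    p_ab_major: frequency of haplotype A-B
    p_a_only:   frequency of haplotype A-b
    p_b_only:   frequency of haplotype a-B
    p_neither:  frequency of haplotype a-b
    """
    p_a = p_ab_major + p_a_only          # allele frequency of A
    p_b = p_ab_major + p_b_only          # allele frequency of B
    d = p_ab_major - p_a * p_b           # D = p_AB - p_A * p_B
    # D is scaled by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Complete association between the two loci: D' and r^2 both equal 1.
d, d_prime, r2 = ld_measures(0.5, 0.0, 0.0, 0.5)
```

The four frequencies are expected to sum to 1; monomorphic loci give a zero denominator, which the sketch reports as 0 rather than raising an error.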

8.
Rohde K  Fürst R 《Human heredity》2003,56(1-3):41-47
To find associations between genetic traits and haplotypes inferred from closely spaced multilocus phase-unknown genotypes, we use a three-stage approach. Haplotype frequencies and the most likely haplotype pair for each individual are estimated from random samples of individual or small (nuclear) family genotypes via an EM algorithm. If the most likely haplotype pair configuration of the whole sample outweighs the less likely ones, we may consider the estimated haplotypes as alleles of a multi-allelic marker and carry out the conventional statistics, TDT or ANOVA for quantitative traits. If the most likely haplotype pair configuration and the less likely ones do not differ much in their weight, we sample the TDT or ANOVA statistic over all haplotype pair configurations using the Metropolis-Hastings algorithm. Applications of our method to simulated data as well as real data are given.

9.
Haplotype-based risk models can lead to powerful methods for detecting the association of a disease with a genomic region of interest. In population-based studies of unrelated individuals, however, the haplotype status of some subjects may not be discernible without ambiguity from available locus-specific genotype data. A score test for detecting haplotype-based association using genotype data has been developed in the context of generalized linear models for analysis of data from cross-sectional and retrospective studies. In this article, we develop a test for association using genotype data from cohort and nested case-control studies where subjects are prospectively followed until disease incidence or censoring (end of follow-up) occurs. Assuming a proportional hazard model for the haplotype effects, we derive an induced hazard function of the disease given the genotype data, and hence propose a test statistic based on the associated partial likelihood. The proposed test procedure can account for differential follow-up of subjects, can adjust for possibly time-dependent environmental co-factors and can make efficient use of valuable age-at-onset information that is available on cases. We provide an algorithm for computing the test statistic using readily available statistical software. Utilizing simulated data in the context of two genomic regions GPX1 and GPX3, we evaluate the validity of the proposed test for small sample sizes and study its power in the presence and absence of missing genotype data.

10.
Tan Q  Christiansen L  Bathum L  Li S  Kruse TA  Christensen K 《Genetics》2006,172(3):1821-1828
Although the case-control or the cross-sectional design has been popular in genetic association studies of human longevity, such a design is prone to false positive results due to sampling bias and a potential secular trend in gene-environment interactions. To avoid these problems, the cohort or follow-up study design has been recommended. With the observed individual survival information, the Cox regression model has been used for single-locus data analysis. In this article, we present a novel survival analysis model that combines population survival with individual genotype and phenotype information in assessing the genetic association with human longevity in cohort studies. By monitoring the changes in the observed genotype frequencies over the follow-up period in a birth cohort, we are able to assess the effects of the genotypes and/or haplotypes on individual survival. With the estimated parameters, genotype- and/or haplotype-specific survival and hazard functions can be calculated without any parametric assumption on the survival distribution. In addition, our model estimates haplotype frequencies in a birth cohort over the follow-up time, which is not observable in the multilocus genotype data. A computer simulation study was conducted to specifically assess the performance and power of our haplotype-based approach for given risk and frequency parameters under different sample sizes. Application of our method to paraoxonase 1 genotype data detected a haplotype that significantly reduces carriers' hazard of death and thus reveals and stresses the important role of genetic variation in maintaining human survival at advanced ages.

11.
The analysis of the haplotype-phenotype relationship has become more and more important. We have developed an algorithm, using individual genotypes at linked loci as well as their quantitative phenotypes, to estimate the parameters of the distribution of the phenotypes for subjects with and without a particular haplotype by an expectation-maximization (EM) algorithm. We assumed that the phenotype for a diplotype configuration follows a normal distribution. The algorithm simultaneously calculates the maximum likelihood (L0max) under the null hypothesis (i.e., nonassociation between the haplotype and phenotype), and the maximum likelihood (Lmax) under the alternative hypothesis (i.e., association between the haplotype and phenotype). Then we tested the association between the haplotype and the phenotype using a test statistic, -2 log(L0max/Lmax). The above algorithm along with some extensions for different modes of inheritance was implemented as a computer program, QTLHAPLO. Simulation studies using single-nucleotide polymorphism (SNP) genotypes have clarified that the estimation was very accurate when the linkage disequilibrium between linked loci was rather high. Empirical power using the simulated data was high enough. We applied QTLHAPLO for the analysis of the real data of the genotypes at the calpain 10 gene obtained from diabetic and control subjects in various laboratories.

12.
Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes   Total citations: 4, self-citations: 0, citations by others: 4
As more SNP marker data becomes available, researchers have used haplotypes of markers, rather than individual polymorphisms, for association analysis of candidate genes. In order to perform haplotype analysis in a population-based case-control study, haplotypes must be determined by estimation in the absence of family information or laboratory methods for establishing phase. Here, we test the accuracy of the Expectation-Maximization (EM) algorithm for estimating haplotype state and frequency in the CCR2-CCR5 gene region by comparison with haplotype state and frequency determined by pedigree analysis. To do this, we have characterized haplotypes comprising alleles at seven biallelic loci in the CCR2-CCR5 chemokine receptor gene region, a span of 20 kb on chromosome 3p21. Three-generation CEPH families (n=40), totaling 489 individuals, were genotyped by the 5' nuclease assay (TaqMan). Haplotype states and frequencies were compared in 103 grandparents who were assumed to have mated at random. Both pedigree analysis and the EM algorithm yielded the same small number of haplotypes for which linkage disequilibrium was nearly maximal. The haplotype frequencies generated by the two methods were nearly identical. These results suggest that the EM algorithm estimation of haplotype states, frequency, and linkage disequilibrium analysis will be an effective strategy in the CCR2-CCR5 gene region. For genetic epidemiology studies, CCR2-CCR5 allele and haplotype frequencies were determined in African-American (n=30), Hispanic (n=24) and European-American (n=34) populations.

13.
We describe an approach for picking haplotype-tagging single nucleotide polymorphisms (htSNPs) that is presently being taken in two large nested case-control studies within a multiethnic cohort (MEC), which are engaged in a search for associations between risk of prostate and breast cancer and common genetic variations in candidate genes. Based on a preliminary sample of 70 control subjects chosen at random from each of the 5 ethnic groups in the MEC we estimate haplotype frequencies using a variant of the Excoffier-Slatkin E-M algorithm after genotyping a high density of SNPs selected every 3-5 kb in and surrounding a candidate gene. In order to evaluate the performance of a candidate set of htSNPs (which will be genotyped in the much larger case-control sample) we treat the haplotype frequencies estimated above as known, and carry out a formal calculation of the uncertainty of the number of copies of common haplotypes carried by an individual, summarizing this calculation as a coefficient of determination, R_h^2. A candidate set of htSNPs of a given size is chosen so as to maximize the minimum value of R_h^2 over the common haplotypes, h.

14.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

15.
To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P<.000001; n=233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

16.
Cohort studies provide information on relative hazards and pure risks of disease. For rare outcomes, large cohorts are needed to have sufficient numbers of events, making it costly to obtain covariate information on all cohort members. We focus on nested case-control designs that are used to estimate relative hazard in the Cox regression model. In 1997, Langholz and Borgan showed that pure risk can also be estimated from nested case-control data. However, these approaches do not take advantage of some covariates that may be available on all cohort members. Researchers have used weight calibration to increase the efficiency of relative hazard estimates from case-cohort studies and nested case-control studies. Our objective is to extend weight calibration approaches to nested case-control designs to improve precision of estimates of relative hazards and pure risks. We show that calibrating sample weights additionally against follow-up times multiplied by relative hazards during the risk projection period improves estimates of pure risk. Efficiency improvements for relative hazards for variables that are available on the entire cohort also contribute to improved efficiency for pure risks. We develop explicit variance formulas for the weight-calibrated estimates. Simulations show how much precision is improved by calibration and confirm the validity of inference based on asymptotic normality. Examples are provided using data from the American Association of Retired Persons Diet and Health Cohort Study.

17.
Genetic epidemiologic studies often involve investigation of the association of a disease with a genomic region in terms of the underlying haplotypes, that is the combination of alleles at multiple loci along homologous chromosomes. In this article, we consider the problem of estimating haplotype-environment interactions from case-control studies when some of the environmental exposures themselves may be influenced by genetic susceptibility. We specify the distribution of the diplotypes (haplotype pair) given environmental exposures for the underlying population based on a novel semiparametric model that allows haplotypes to be potentially related with environmental exposures, while allowing the marginal distribution of the diplotypes to maintain certain population genetics constraints such as Hardy-Weinberg equilibrium. The marginal distribution of the environmental exposures is allowed to remain completely nonparametric. We develop a semiparametric estimating equation methodology and related asymptotic theory for estimation of the disease odds ratios associated with the haplotypes, environmental exposures, and their interactions, parameters that characterize haplotype-environment associations and the marginal haplotype frequencies. The problem of phase ambiguity of genotype data is handled using a suitable expectation-maximization algorithm. We study the finite-sample performance of the proposed methodology using simulated data. An application of the methodology is illustrated using a case-control study of colorectal adenoma, designed to investigate how the smoking-related risk of colorectal adenoma can be modified by "NAT2," a smoking-metabolism gene that may potentially influence susceptibility to smoking itself.

18.
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. Model parameters such as the haplotype effect size were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes.

19.
We applied a new approach based on Mantel statistics to analyze the Genetic Analysis Workshop 14 simulated data with prior knowledge of the answers. The method was developed in order to improve the power of a haplotype sharing analysis for gene mapping in complex disease. The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes from case-control studies. The genetic similarity is measured as the shared length between haplotype pairs around a genetic marker. The phenotypic similarity is measured as the mean corrected cross-product based on the respective phenotypes. Cases with phenotype P1 and unrelated controls were drawn from the population of Danacaa. Power to detect main effects was compared to the χ² test for association based on 3-marker haplotypes and a global permutation test for haplotype association to test for main effects. Power to detect gene × gene interaction was compared to unconditional logistic regression. The results suggest that the Mantel statistics might be more powerful than alternative tests.

20.
We have developed a simulation tool HapSim for the generation of haplotype data. The simulated haplotypes are such that their allele frequencies and linkage disequilibrium coefficients match exactly those estimated in a real sample. AVAILABILITY: The program is available as an R package and can be downloaded from http://cran.r-project.org/.
