Similar Articles
20 similar articles found.
1.
Tan Q, Christiansen L, Bathum L, Li S, Kruse TA, Christensen K. Genetics 2006;172(3):1821-1828
Although the case-control or the cross-sectional design has been popular in genetic association studies of human longevity, such a design is prone to false positive results due to sampling bias and a potential secular trend in gene-environment interactions. To avoid these problems, the cohort or follow-up study design has been recommended. With the observed individual survival information, the Cox regression model has been used for single-locus data analysis. In this article, we present a novel survival analysis model that combines population survival with individual genotype and phenotype information in assessing the genetic association with human longevity in cohort studies. By monitoring the changes in the observed genotype frequencies over the follow-up period in a birth cohort, we are able to assess the effects of the genotypes and/or haplotypes on individual survival. With the estimated parameters, genotype- and/or haplotype-specific survival and hazard functions can be calculated without any parametric assumption on the survival distribution. In addition, our model estimates haplotype frequencies in a birth cohort over the follow-up time, which is not observable in the multilocus genotype data. A computer simulation study was conducted to specifically assess the performance and power of our haplotype-based approach for given risk and frequency parameters under different sample sizes. Application of our method to paraoxonase 1 genotype data detected a haplotype that significantly reduces carriers' hazard of death and thus reveals and stresses the important role of genetic variation in maintaining human survival at advanced ages.
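The core idea, that genotypes favouring survival become more common among those still alive, can be illustrated with a standard identity: if genotype g has birth-cohort frequency p_g and survival function S_g(t), its frequency among survivors at time t is p_g S_g(t) / Σ_k p_k S_k(t). The sketch below assumes constant genotype-specific hazards (exponential survival) purely for illustration; the article's model makes no parametric assumption, so this is not the authors' estimator, and the frequencies and hazards are made up.

```python
import math

def survivor_frequencies(initial_freqs, hazards, t):
    """Genotype frequencies among cohort members still alive at time t.

    Illustrative assumption: a constant hazard per genotype, so that
    S_g(t) = exp(-hazard_g * t).
    """
    weights = {g: p * math.exp(-hazards[g] * t) for g, p in initial_freqs.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical cohort: carriers of a protective genotype have a lower hazard.
freqs0 = {"carrier": 0.3, "non_carrier": 0.7}
hazards = {"carrier": 0.05, "non_carrier": 0.08}
for t in (0, 10, 20, 30):
    f = survivor_frequencies(freqs0, hazards, t)
    print(t, round(f["carrier"], 3))
```

The carrier frequency rises with follow-up time; running the identity in reverse (from observed frequency changes back to survival differences) is the direction the article's model takes.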

2.
The general availability of reliable and affordable genotyping technology has enabled genetic association studies to move beyond small case-control studies to large prospective studies. For prospective studies, genetic information can be integrated into the analysis via haplotypes, with focus on their association with a censored survival outcome. We develop non-iterative, regression-based methods to estimate associations between common haplotypes and a censored survival outcome in large cohort studies. Our non-iterative methods--weighted estimation and weighted haplotype combination--are both based on the Cox regression model, but differ in how the imputed haplotypes are integrated into the model. Our approaches enable haplotype imputation to be performed once as a simple data-processing step, and thus avoid implementation based on sophisticated algorithms that iterate between haplotype imputation and risk estimation. We show that non-iterative weighted estimation and weighted haplotype combination provide valid tests for genetic associations and reliable estimates of moderate associations between common haplotypes and a censored survival outcome, and are straightforward to implement in standard statistical software. We apply the methods to an analysis of HSPB7-CLCNKA haplotypes and risk of adverse outcomes in a prospective cohort study of outpatients with chronic heart failure.
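The weighted approach can be pictured as replacing an unobserved haplotype count with its posterior expectation. A minimal sketch, assuming each person's compatible haplotype pairs and their posterior probabilities have already been computed (the function name and data layout are hypothetical, not taken from the article):

```python
from typing import List, Tuple

HapPair = Tuple[str, str]

def expected_dosage(pairs: List[Tuple[HapPair, float]], hap: str) -> float:
    """Expected number of copies (0 to 2) of `hap` carried by one person.

    `pairs` lists every haplotype pair compatible with the person's unphased
    genotype together with its posterior probability; the weights must sum to 1.
    """
    total_w = sum(w for _, w in pairs)
    if abs(total_w - 1.0) > 1e-9:
        raise ValueError("posterior weights must sum to 1")
    # (h1 == hap) + (h2 == hap) counts copies of `hap` in this phase resolution.
    return sum(w * ((h1 == hap) + (h2 == hap)) for (h1, h2), w in pairs)

# A double heterozygote at two SNPs (alleles A/a, B/b) is compatible with
# either AB/ab or Ab/aB; these weights are made-up numbers.
person = [(("AB", "ab"), 0.8), (("Ab", "aB"), 0.2)]
print(expected_dosage(person, "AB"))
```

The expected dosage could then enter a standard Cox model as an ordinary covariate, which is the sense in which imputation happens once as a preprocessing step; how the article's two variants differ in detail is not reproduced here.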

3.
Haplotype inference has become an important part of human genetic data analysis because of its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype-disease associations using unphased multilocus genotype data, ranging from early simple gene-counting methods to recent work using the generalized linear model. However, these methods are either confined to the case-control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using a haplotype-based parameterization, our model infers the effects of specific haplotypes (point estimation) and constructs confidence intervals for haplotype risks (interval estimation). Based on the estimated parameters, the model calculates haplotype frequencies conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group of haplotypes or all of them. Because it features direct maximization in haplotype estimation, our method also facilitates a computer simulation approach for correcting the significance levels of individual haplotypes for multiple testing. By applying the model to an empirical data set, we show that our method, based on the well-known logistic regression model, is a useful tool for haplotype association analysis of human disease traits.

4.
5.
OBJECTIVE: To develop a method to estimate haplotype effects on dichotomous outcomes when phase is unknown, that can also estimate reliable effects of rare haplotypes. METHODS: In short, the method uses a logistic regression approach, with weights attached to all possible haplotype combinations of an individual. An EM-algorithm was used: in the E-step the weights are estimated, and the M-step consists of maximizing the joint log-likelihood. When rare haplotypes were present, a penalty function was introduced. We compared four different penalties. To investigate statistical properties of our method, we performed a simulation study for different scenarios. The evaluation criteria are the mean bias of the parameter estimates, the root of the mean squared error, the coverage probability, power, Type I error rate and the false discovery rate. RESULTS: For the unpenalized approach, mean bias was small, coverage probabilities were approximately 95%, power ranged from 15.2 to 44.7% depending on haplotype frequency, and Type I error rate was around 5%. All penalty functions reduced the standard errors of the rare haplotypes, but introduced bias. This trade-off decreased power. CONCLUSION: The unpenalized weighted log-likelihood approach performs well. A penalty function can help to estimate an effect for rare haplotypes.
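The E-step weights mentioned above are posterior probabilities of each way of resolving an individual's phase. As a hedged illustration, the sketch below runs the classic EM algorithm for haplotype frequencies at two biallelic SNPs under Hardy-Weinberg equilibrium; it omits the outcome, the logistic likelihood, and the penalty, so it shows only the phase-weighting component, not the article's full method. Genotypes are coded as the number of lowercase alleles at each SNP (an assumed convention).

```python
from itertools import product

HAPS = ["AB", "Ab", "aB", "ab"]

def compatible_pairs(g1, g2):
    """Unordered haplotype pairs consistent with an unphased two-SNP genotype.

    g1, g2 = number of lowercase alleles (0, 1 or 2) at SNP 1 and SNP 2.
    """
    pairs = set()
    for h1, h2 in product(HAPS, repeat=2):
        n_a = (h1[0] == "a") + (h2[0] == "a")
        n_b = (h1[1] == "b") + (h2[1] == "b")
        if n_a == g1 and n_b == g2:
            pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    freqs = {h: 0.25 for h in HAPS}  # uniform starting values
    n = len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            # E-step: HWE probability of each pair, normalised to a posterior weight.
            raw = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(raw)
            for (a, b), r in zip(pairs, raw):
                counts[a] += r / total
                counts[b] += r / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new = {h: c / (2 * n) for h, c in counts.items()}
        if max(abs(new[h] - freqs[h]) for h in HAPS) < tol:
            return new
        freqs = new
    return freqs

# Made-up sample; (1, 1) are double heterozygotes with unknown phase.
data = [(0, 0)] * 30 + [(2, 2)] * 20 + [(1, 1)] * 10 + [(0, 1)] * 5
print({h: round(p, 3) for h, p in em_haplotype_freqs(data).items()})
```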

6.
Haplotype-based association analysis has been recognized as a tool with high resolution and potentially great power for identifying modest etiological effects of genes. However, in practice, its efficacy has not been as successfully reproduced as expected in theory. One primary cause is that such analysis tends to require a large number of parameters to capture the abundant haplotype varieties, and many of those are expended on rare haplotypes for which studies would have insufficient power to detect association even if it existed. To concentrate statistical power on more-relevant inferences, in this study, we developed a regression-based approach using clustered haplotypes to assess haplotype-phenotype association. Specifically, we generalized the probabilistic clustering methods of Tzeng to the generalized linear model (GLM) framework established by Schaid et al. The proposed method uses unphased genotypes and incorporates both phase uncertainty and clustering uncertainty. Its GLM framework allows adjustment of covariates and can model qualitative and quantitative traits. It can also evaluate the overall haplotype association or the individual haplotype effects. We applied the proposed approach to study the association between hypertriglyceridemia and the apolipoprotein A5 gene. Through simulation studies, we assessed the performance of the proposed approach and demonstrated its validity and power in testing for haplotype-trait association.

7.
Klein JP. Biometrics 1992;48(3):795-806
Consider a survival experiment where individuals within a certain subset of the population share a common, unobservable, random frailty. Such a frailty could be an unobservable genetic or early environmental effect if individuals were in sibling groups or an environmental effect if individuals were grouped by households. Suppose that if the frailty, omega, is known, the Cox proportional hazards model for the observable covariates is valid with the consequence of the random effect being a multiplicative factor on the hazard rate. Assuming that the random frailties follow a gamma distribution, estimates of the fixed and random effects are obtained by using an EM algorithm based on a profile likelihood construction. The method developed is applied to the Framingham Heart Study to examine the risks of smoking and cholesterol levels, adjusting for potential random effects.
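Integrating the frailty out of the conditional model gives the population-level survival. Assuming, as is conventional, a gamma frailty with mean 1 and variance θ, the marginal survival is (1 + θH)^(-1/θ), where H is the covariate-adjusted cumulative baseline hazard, and the implied within-group Kendall's tau is θ/(θ + 2). A small sketch of these two closed forms (this is not the EM fitting procedure of the article):

```python
import math

def marginal_survival(cum_hazard: float, theta: float) -> float:
    """Population-level survival under a gamma frailty with mean 1, variance theta.

    Conditional on frailty w, S(t | w) = exp(-w * H(t)); averaging over a
    gamma(shape=1/theta, scale=theta) frailty gives (1 + theta*H)^(-1/theta).
    """
    if theta == 0:
        return math.exp(-cum_hazard)  # no frailty: ordinary proportional hazards
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)

def kendall_tau(theta: float) -> float:
    """Within-group Kendall's tau implied by a gamma frailty (Clayton dependence)."""
    return theta / (theta + 2.0)

H = 0.5  # made-up cumulative hazard, already multiplied by exp(beta'x)
for theta in (0.0, 0.5, 1.0):
    print(theta, round(marginal_survival(H, theta), 4), round(kendall_tau(theta), 3))
```

Larger θ means more heterogeneity between groups, stronger within-group dependence, and a marginal survival curve that flattens relative to exp(-H).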

8.
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotypes on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. Model parameters such as the haplotype effect size were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models: dominant, recessive, and additive. The results showed that the proposed method generated unbiased parameter estimates, proper type I error rates, and proper coverage probabilities for the true β. The model performed well with small or large sample sizes, as well as short or long haplotypes.

9.
OBJECTIVES: The question of interest is estimating the relationship between haplotypes and an outcome measure, based upon unphased genotypes. The outcome of interest might be predicting the presence of disease in a logistic model, predicting a numeric drug response in a linear model, or predicting survival time in a parametric survival model with censoring. Explanatory variables may include phased haplotype design variables, environmental variables, or interactions between them. METHODS: We extend existing generalized linear haplotype models to parametric survival outcomes. To improve the stability of model variance estimates, a profile likelihood solution is proposed. An adjustment for population stratification is also considered. Here we investigate data sampled from known 'strata' (e.g., gender or ethnicity) that influence haplotype prior probabilities and thus the regression model weights. Differing linear model variance estimates, and the effect of stratification and departures from Hardy-Weinberg Equilibrium (HWE) on parameter estimates, are compared and contrasted via simulation. RESULTS: From simulations, we observed an improvement in statistical power when using a solution to profile likelihood equations. We also saw that stratification had little impact on estimates. Haplotypes that are not in HWE had a negative impact on power to test hypotheses. Finally, profile likelihood solutions for haplotypes deviating from HWE had improved power and confidence interval coverage of regression model coefficients.
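The survival extension rests on the censored log-likelihood, in which an observed event contributes log h(t) and every subject contributes -H(t). A minimal sketch with an exponential baseline, chosen here only because it has a closed-form MLE (the abstract does not say which parametric family is used), and with no haplotype covariates; the follow-up data are made up:

```python
import math
from typing import Sequence

def exp_censored_loglik(rate: float, times: Sequence[float], events: Sequence[int]) -> float:
    """Log-likelihood of an exponential model with right censoring.

    An observed event (event=1) adds log h(t) = log(rate); every subject adds
    log S(t) = -rate * t, the log-probability of surviving to t.
    """
    return sum(e * math.log(rate) - rate * t for t, e in zip(times, events))

def exp_censored_mle(times: Sequence[float], events: Sequence[int]) -> float:
    """Closed-form MLE: number of observed events divided by total follow-up time."""
    return sum(events) / sum(times)

times = [2.0, 5.0, 3.5, 8.0, 1.0]
events = [1, 0, 1, 1, 0]  # 0 = censored
rate_hat = exp_censored_mle(times, events)
print(round(rate_hat, 4))
```

In a haplotype model, the rate would become exp(β'x) times a baseline, with each phase resolution weighted by its posterior probability, as in the generalized linear haplotype models the abstract builds on.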

10.
Two dinucleotide short tandem-repeat polymorphisms (STRPs) and a polymorphic Alu element spanning a 22-kb region of the PLAT locus on chromosome 8p12-q11.2 were typed in 1,287-1,420 individuals originating from 30 geographically diverse human populations, as well as in 29 great apes. These data were analyzed as haplotypes consisting of each of the dinucleotide repeats and the flanking Alu insertion/deletion polymorphism. The global pattern of STRP/Alu haplotype variation and linkage disequilibrium (LD) is informative for the reconstruction of human evolutionary history. Sub-Saharan African populations have high levels of haplotype diversity within and between populations, relative to non-Africans, and have highly divergent patterns of LD. Non-African populations have both a subset of the haplotype diversity present in Africa and a distinct pattern of LD. The pattern of haplotype variation and LD observed at the PLAT locus suggests a recent common ancestry of non-African populations, from a small population originating in eastern Africa. These data indicate that, throughout much of modern human history, sub-Saharan Africa has maintained both a large effective population size and a high level of population substructure. Additionally, Papua New Guinean and Micronesian populations have rare haplotypes observed otherwise only in African populations, suggesting ancient gene flow from Africa into Papua New Guinea, as well as gene flow between Melanesian and Micronesian populations.

11.
The human dopaminergic system is a significant focal point of study in the fields of neuropsychiatry and pharmacology, and it is also a promising nuclear DNA marker in studies of human genome diversity. In this study, we assayed six polymorphic markers in the dopamine D2 receptor gene (DRD2) in 482 unrelated individuals from nine ethnic populations of India. Our results demonstrate that the six markers are highly polymorphic in all populations and the constructed haplotypes show a high level of heterozygosity. Of the eight possible three-site haplotypes, only three were shared by all populations. The haplotypes exhibited fairly high frequencies across multiple populations; the Kurumba population showed all eight three-site haplotypes. The ancestral haplotype (B2-D2-A1) was observed at high frequency only in the Siddi population. Haplotypes based on all six markers revealed 16 haplotypes, of which only 6 are common, with a frequency greater than 5% in at least one of the nine populations. Only three haplotypes were shared by all nine populations, with a cumulative frequency ranging from 80.8% (Kurumba) to 96.6% (Onge). Great variation in levels of linkage disequilibrium (LD) was detected, ranging from complete LD in the Badaga to virtually no LD in the Siddi. This range of LD likely reflects different population histories, such as African ancestry in the Siddi and recent founding events in the population isolates, Badaga and Kota.
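Haplotype heterozygosity of the kind reported here is commonly computed as Nei's unbiased gene diversity, n/(n - 1) (1 - Σ p_i²), where n is the number of sampled chromosomes and p_i the haplotype frequencies. The abstract does not state which estimator was used, so treat this as an assumed, illustrative choice; the haplotype labels and counts are made up.

```python
from collections import Counter

def haplotype_heterozygosity(haplotypes):
    """Nei's unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical sample of 10 chromosomes carrying three haplotypes.
sample = ["H1"] * 6 + ["H2"] * 2 + ["H3"] * 2
print(round(haplotype_heterozygosity(sample), 4))
```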

12.
Over the last decade, multiparental populations have become a mainstay of genetics research in diploid species. Our goal was to extend this paradigm to autotetraploids by developing software for quantitative trait locus (QTL) mapping in connected F1 populations derived from a set of shared parents. For QTL discovery, phenotypes are regressed on the dosage of parental haplotypes to estimate additive effects. Statistical properties of the model were explored by simulating half-diallel diploid and tetraploid populations with different population sizes and numbers of parents. Across scenarios, the number of progeny per parental haplotype (pph) largely determined the statistical power for QTL detection and accuracy of the estimated haplotype effects. Multiallelic QTL with heritability 0.2 were detected with 90% probability at 25 pph and genome-wide significance level 0.05, and the additive haplotype effects were estimated with over 90% accuracy. Following QTL discovery, the software enables a comparison of models with multiple QTL and nonadditive effects. To illustrate, we analyzed potato tuber shape in a half-diallel population with three tetraploid parents. A well-known QTL on chromosome 10 was detected, for which the inclusion of digenic dominance lowered the Deviance Information Criterion (DIC) by 17 points compared to the additive model. The final model also contained a minor QTL on chromosome 1, but higher-order dominance and epistatic effects were excluded based on the DIC. In terms of practical impacts, the software is already being used to select offspring based on the effect and dosage of particular haplotypes in breeding programs.

13.
Recently, there has been a great deal of interest in the analysis of multivariate survival data. In most epidemiological studies, survival times of the same cluster are related because of some unobserved risk factors such as environmental or genetic factors. Therefore, modelling of dependence between events of correlated individuals is required to ensure correct inference on the effects of treatments or covariates on the survival times. In the past decades, extensions of the proportional hazards model have been widely considered for modelling multivariate survival data by incorporating a random effect which acts multiplicatively on the hazard function. In this article, we consider the proportional odds model, an alternative to the proportional hazards model in which the hazard ratio between individuals eventually converges to unity. This is a reasonable property particularly when the treatment effect fades out gradually and the homogeneity of the population increases over time. The objective of this paper is to assess the influence of the random effect on the within-subject correlation and the population heterogeneity. We are particularly interested in the properties of the proportional odds model with a univariate random effect and with a correlated random effect. The correlations between survival times are derived explicitly for both choices of mixing distributions and are shown to be independent of the covariates. The time path of the odds function among the survivors is also examined to study the effect of the choice of mixing distribution. Modelling multivariate survival data using a univariate mixing distribution may be inadequate, as the random effect characterises not only the dependence of the survival times but also the conditional heterogeneity among the survivors. A robust estimate for the correlation of the logarithm of the survival times within a cluster is obtained regardless of the choice of the mixing distributions.
The sensitivity of the estimate of the regression parameter under a misspecification of the mixing distribution is studied through simulation. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
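The statement that the hazard ratio converges to unity under the proportional odds model can be checked numerically. The sketch assumes a log-logistic baseline, whose odds of failure are t^k, so a group with odds ratio e^β has hazard ratio e^β (1 + t^k) / (1 + e^β t^k); the baseline and parameter values are illustrative assumptions, and no random effect is included.

```python
import math

def po_hazard_ratio(t: float, beta: float, k: float = 1.5) -> float:
    """Hazard ratio between two groups under a proportional odds model.

    Illustrative assumption: log-logistic baseline with odds of failure t**k,
    so the comparison group has odds exp(beta) * t**k. The hazard of
    S(t) = 1 / (1 + c * t**k) is c*k*t**(k-1) / (1 + c*t**k).
    """
    c = math.exp(beta)
    h0 = k * t ** (k - 1) / (1 + t ** k)
    h1 = c * k * t ** (k - 1) / (1 + c * t ** k)
    return h1 / h0

for t in (0.01, 1.0, 10.0, 100.0):
    print(t, round(po_hazard_ratio(t, beta=math.log(3)), 3))
```

The ratio starts near the odds ratio (3 here) at early times and decays toward 1, matching the "fading effect" behaviour described in the abstract.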

14.
The power variance function distributions, which include the gamma and compound Poisson (CP) distributions among others, are commonly used in frailty models for family data. In a previous paper, we presented a frailty model constructed by randomizing the scale parameter in a CP distribution. When combined with a parametric baseline hazard, this yields a model with heterogeneity on both the individual and the family level and a subgroup with zero frailty, corresponding to people not experiencing the event. In this paper, we discuss covariates in the model. Depending on where the covariates are inserted in the model, one may have proportional hazards at the individual level, the family level, and a larger group level (for covariates shared by many families, e.g. ethnic groups) or get accelerated failure times. Each of these alternatives gives a specific interpretation of the covariate effects. An application to data on infant mortality in siblings from the Medical Birth Registry of Norway is included. We compare the results for some of the different covariate modeling options.
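The zero-frailty subgroup follows from the construction of a compound Poisson frailty. Assuming the usual parameterization, with Z the sum of N ~ Poisson(ρ) independent gamma(shape η, rate ν) variables, the marginal survival is the Laplace transform exp(-ρ(1 - (ν/(ν + H))^η)) evaluated at the cumulative hazard H, and it levels off at P(Z = 0) = e^(-ρ). A sketch with made-up parameters; it shows a single-level CP frailty, not the article's randomized-scale family model:

```python
import math

def cp_frailty_survival(cum_hazard, rho, eta, nu):
    """Marginal survival under a compound Poisson frailty.

    Frailty Z = X_1 + ... + X_N with N ~ Poisson(rho) and X_i ~ Gamma(shape=eta,
    rate=nu). Its Laplace transform is
        E[exp(-s Z)] = exp(-rho * (1 - (nu / (nu + s)) ** eta)),
    and the marginal survival is that transform evaluated at s = H(t).
    """
    return math.exp(-rho * (1.0 - (nu / (nu + cum_hazard)) ** eta))

rho, eta, nu = 1.2, 0.5, 2.0
zero_frailty_share = math.exp(-rho)  # P(N = 0): these people never experience the event
for H in (0.0, 1.0, 10.0, 1e6):
    print(H, round(cp_frailty_survival(H, rho, eta, nu), 4))
print("P(Z = 0) =", round(zero_frailty_share, 4))
```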

15.
The impact of the ongoing rapid climate change on natural systems is a major issue for human societies. An important challenge for ecologists is to identify the climatic factors that drive temporal variation in demographic parameters, and, ultimately, the dynamics of natural populations. The analysis of long-term monitoring data at the individual scale is often the only available approach to reliably estimate demographic parameters of vertebrate populations. We review statistical procedures used in these analyses to study links between climatic factors and survival variation in vertebrate populations. We evaluated the efficiency of various statistical procedures from an analysis of survival in a population of white stork, Ciconia ciconia, a simulation study and a critical review of 78 papers published in the ecological literature. We identified six potential methodological problems: (i) the use of statistical models that are not well-suited to the analysis of long-term monitoring data collected at the individual scale; (ii) low ratios of number of statistical units to number of candidate climatic covariates; (iii) collinearity among candidate climatic covariates; (iv) the use of statistics, to assess statistical support for climatic covariate effects, that deal poorly with unexplained variation in survival; (v) spurious detection of effects due to the co-occurrence of trends in survival and the climatic covariate time series; and (vi) assessment of the magnitude of climatic effects on survival using measures that cannot be compared across case studies. The critical review of the ecological literature revealed that five of these six methodological problems were often poorly tackled. As a consequence, we concluded that many of these studies generated hypotheses but only few provided solid evidence for impacts of climatic factors on survival or reliable measures of the magnitude of such impacts.
We provide practical advice to solve efficiently most of the methodological problems identified. The only frequent issue that still lacks a straightforward solution was the low ratio of the number of statistical units to the number of candidate climatic covariates. In the perspective of increasing this ratio and therefore of producing more robust analyses of the links between climate and demography, we suggest leads to improve the procedures for designing field protocols and selecting a set of candidate climatic covariates. Finally, we present recent statistical methods with potential interest for assessing the impact of climatic factors on demographic parameters.

16.
A key question for the implementation of marker-assisted selection (MAS) using markers in linkage disequilibrium with quantitative trait loci (QTLs) is how many markers surrounding each QTL should be used to ensure the marker or marker haplotypes are in sufficient linkage disequilibrium (LD) with the QTL. In this paper we compare the accuracy of MAS using either single markers or marker haplotypes in an Angus cattle data set consisting of 9323 genome-wide single nucleotide polymorphisms (SNPs) genotyped in 379 Angus cattle. The extent of LD in the data set was such that the average marker-marker r² was 0.2 at 200 kb. The accuracy of MAS increased as the number of markers in the haplotype surrounding the QTL increased, although only when the number of markers in the haplotype was 4 or greater did the accuracy exceed that achieved when the SNP in the highest LD with the QTL was used. A large number of phenotypic records (>1000) were required to accurately estimate the effects of the haplotypes.
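The marker-marker r² quoted above is the squared correlation between alleles at two loci. From the four haplotype frequencies of two biallelic markers, D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A short sketch of that standard formula with made-up frequencies (not the article's analysis pipeline):

```python
def ld_r2(p_AB, p_Ab, p_aB, p_ab):
    """Squared allelic correlation r^2 between two biallelic loci.

    Inputs are the four haplotype frequencies; both loci must be polymorphic,
    otherwise the denominator is zero.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # allele A frequency at locus 1
    p_B = p_AB + p_aB  # allele B frequency at locus 2
    D = p_AB - p_A * p_B  # equivalently p_AB*p_ab - p_Ab*p_aB
    return D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))

print(ld_r2(0.4, 0.1, 0.1, 0.4))
```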

17.
Xiong M, Fan R, Jin L. Human Heredity 2002;53(3):158-172
As a dense map of single nucleotide polymorphism (SNP) markers is available, population-based linkage disequilibrium (LD) mapping or association study is becoming one of the major tools for identifying quantitative trait loci (QTL) and for fine gene mapping. However, in many cases, LD between the marker and trait locus is not very strong. Approaches that maximize the potential of detecting LD will be essential for the success of LD mapping of QTL. In this paper, we propose two strategies for increasing the probability of detecting LD: (1) phenotypic selection and (2) haplotype LD mapping. To provide the foundations for LD mapping of QTL under selection, we develop analytic tools for assessing the impact of phenotypic selection on allele and haplotype frequencies, and LD under three trait models: single trait locus, two unlinked trait loci, and two linked trait loci with or without epistasis. In addition to a traditional χ² test, which compares the difference in allele or haplotype frequencies in the selected sample and population sample, we present multiple regression methods for LD mapping of QTL, and investigate which methods are effective in employing phenotypic selection for QTL mapping. We also develop a statistical framework for investigating and comparing the power of the single marker and multilocus haplotype test for LD mapping of QTL. Finally, the proposed methods are applied to mapping QTL influencing variation in systolic blood pressure in an isolated Chinese population.
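The traditional χ² test mentioned above can be sketched for a single biallelic marker as a 2 x 2 table of allele counts, selected sample versus population sample. This is a hedged illustration with made-up counts and no continuity correction; for 1 degree of freedom the upper-tail p-value equals erfc(√(x/2)).

```python
import math

def allele_chi2(sel_minor, sel_total, pop_minor, pop_total):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele table.

    Rows: selected sample vs. population sample; columns: minor vs. other allele.
    Counts are allele (chromosome) counts, not individuals.
    """
    table = [
        [sel_minor, sel_total - sel_minor],
        [pop_minor, pop_total - pop_minor],
    ]
    n = sel_total + pop_total
    row = [sel_total, pop_total]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

stat = allele_chi2(60, 200, 40, 200)
p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
print(round(stat, 3), round(p_value, 4))
```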

18.
Meuwissen TH, Goddard ME. Genetics 2007;176(4):2551-2560
A novel multipoint method, based on an approximate coalescence approach, to analyze multiple linked markers is presented. Unlike other approximate coalescence methods, it considers all markers simultaneously but only two haplotypes at a time. We demonstrate the use of this method for linkage disequilibrium (LD) mapping of QTL and estimation of effective population size. The method estimates identity-by-descent (IBD) probabilities between pairs of marker haplotypes. Both LD and combined linkage and LD mapping rely on such IBD probabilities. The method is approximate in that it considers only the information on a pair of haplotypes, whereas a full modeling of the coalescence process would simultaneously consider all haplotypes. However, full coalescence modeling is computationally feasible only for few linked markers. Using simulations of the coalescence process, the method is shown to give almost unbiased estimates of the effective population size. Compared to direct marker and haplotype association analyses, IBD-based QTL mapping showed clearly higher power to detect a QTL and a more realistic confidence interval for its position. The modeling of LD could be extended to estimate other LD-related parameters such as recombination rates.

19.

Background

Accurately modeling LD in simulations is essential to correctly evaluate new and existing association methods. At present, there has been minimal research comparing the quality of existing gene region simulation methods to produce LD structures similar to an existing gene region. Here we compare the ability of three approaches to accurately simulate the LD within a gene region: HapSim (2005), Hapgen (2009), and a minor extension to simple haplotype resampling.

Methodology/Principal Findings

In order to observe the variation and bias for each method, we compare the simulated pairwise LD measures and minor allele frequencies to the original HapMap data in an extensive simulation study. When possible, we also evaluate the effects of changing parameters. HapSim produces samples of haplotypes with lower LD, on average, compared to the original haplotype set, while both our resampling method and Hapgen do not introduce this bias. The variation introduced across the replicates by our resampling method is quite small and may not provide enough sampling variability to make a generalizable simulation study.

Conclusion

We recommend using Hapgen to simulate replicate haplotypes from a gene region. Hapgen produces moderate sampling variation between the replicates while retaining the overall unique LD structure of the gene region.
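The abstract does not describe the "minor extension" to haplotype resampling, so the sketch below shows only plain resampling of reference haplotypes with replacement and the kind of comparison the study makes: pairwise r² in the replicates versus the reference set. The 0/1-coded reference haplotypes, replicate count, and sample size are made-up assumptions.

```python
import random
from statistics import mean, pstdev

def r2_from_haplotypes(haps, i, j):
    """Pairwise r^2 between SNPs i and j from 0/1-coded haplotypes."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n  # frequency of the 1-1 haplotype
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else (p_ij - p_i * p_j) ** 2 / denom

def resample_replicates(haps, n_reps, size, seed=1):
    """Plain resampling of reference haplotypes with replacement."""
    rng = random.Random(seed)
    return [rng.choices(haps, k=size) for _ in range(n_reps)]

reference = [(0, 0, 1)] * 40 + [(1, 1, 0)] * 35 + [(1, 0, 0)] * 15 + [(0, 1, 1)] * 10
reps = resample_replicates(reference, n_reps=200, size=100)
r2_reps = [r2_from_haplotypes(r, 0, 1) for r in reps]
print("reference r2:", round(r2_from_haplotypes(reference, 0, 1), 3))
print("replicate mean / sd:", round(mean(r2_reps), 3), round(pstdev(r2_reps), 3))
```

Comparing the replicate mean with the reference value checks for bias, and the replicate spread is the sampling variability the study weighs against it.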

20.
We present a novel approach to disease-gene mapping via cladistic analysis of single-nucleotide polymorphism (SNP) haplotypes obtained from large-scale, population-based association studies, applicable to whole-genome screens, candidate-gene studies, or fine-scale mapping. Clades of haplotypes are tested for association with disease, exploiting the expected similarity of chromosomes with recent shared ancestry in the region flanking the disease gene. The method is developed in a logistic-regression framework and can easily incorporate covariates such as environmental risk factors or additional unlinked loci to allow for population structure. To evaluate the power of this approach to detect disease-marker association, we have developed a simulation algorithm to generate high-density SNP data with short-range linkage disequilibrium based on empirical patterns of haplotype diversity. The results of the simulation study highlight substantial gains in power over single-locus tests for a wide range of disease models, despite overcorrection for multiple testing.
