Similar articles
 20 similar articles found (search time: 31 ms)
1.
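Advances in genome-wide association studies of complex diseases: genetic statistical analysis   Total citations: 7, self-citations: 0, citations by others: 7
Yan Weili 《遗传》 (Hereditas (Beijing)) 2008,30(5):543-549
In 2005, Science published the first genome-wide association study, on human age-related macular degeneration. Genome-wide association studies of a series of complex diseases, including obesity, type 2 diabetes, coronary heart disease, and Alzheimer's disease, were reported in succession afterwards; this period has been called the first wave of human genome-wide association studies. This article introduces the methods, software, and application examples of statistical analysis in genome-wide association studies; compares methods for adjusting P-values for multiple testing in association analysis, including Bonferroni correction, step-down Bonferroni correction, simulation-based methods, and methods that control the false discovery rate; and discusses how population stratification can affect association results and why, together with research progress on, and application examples of, methods for controlling population stratification in genome-wide association studies. In the first wave of genome-wide association studies, classical statistical genetic methods uncovered many gene-phenotype associations and were able to explain them, including associations involving many previously unknown genes and chromosomal regions. However, the continued development of genome-wide association studies requires further elucidating the roles that interactions among genes in the genome, and the interplay between complex gene-gene interaction networks and environmental factors, play in the development of complex diseases. Existing statistical methods will certainly not meet this need, and the development of more advanced statistical methods is imperative. Finally, the article provides website information for statistical analysis software for genome-wide association studies.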
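
The P-value adjustments compared in this abstract can be illustrated with a minimal Python sketch. It assumes that "step-down Bonferroni" refers to Holm's procedure and that false discovery rate control refers to the Benjamini-Hochberg procedure; the abstract names neither explicitly, and the function names below are illustrative only.

```python
def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Step-down Bonferroni (Holm): the k-th smallest P-value is scaled by (m - k + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank is 0-based, so the factor is m - rank
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """FDR adjustment: the k-th smallest P-value is scaled by m / k."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):  # walk from the largest P-value down
        rank = m - offset  # 1-based rank among ascending P-values
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For the same input, Bonferroni is the most conservative of the three and the FDR adjustment the least, which is the trade-off such comparisons usually turn on.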

2.
Deng HW  Chen WM  Recker RR 《Genetics》2001,157(2):885-897
In association studies searching for genes underlying complex traits, the results are often inconsistent, and population admixture has been recognized qualitatively as one major potential cause. Hardy-Weinberg equilibrium (HWE) is often employed to test for population admixture; however, its power is generally unknown. Through analytical and simulation approaches, we quantify the power of the HWE test for population admixture and the effects of population admixture on increasing the type I error rate of association studies under various scenarios of population differentiation and admixture. We found that (1) the power of the HWE test for detecting population admixture is usually small; (2) population admixture seriously elevates type I error rate for detecting genes underlying complex traits, the extent of which depends on the degrees of population differentiation and admixture; (3) HWE testing for population admixture should be performed with random samples or only with controls at the candidate genes, or the test can be performed for combined samples of cases and controls at marker loci that are not linked to the disease; (4) testing HWE for population admixture generally reduces false positive association findings of genes underlying complex traits but the effect is small; and (5) with population admixture, a linkage disequilibrium method that employs cases only is more robust and yields many fewer false positive findings than conventional case-control analyses. Therefore, unless random samples are carefully selected from one homogeneous population, admixture is always a legitimate concern for positive findings in association studies except for the analyses that deliberately control population admixture.
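
As a rough illustration of the HWE test discussed here, the following Python sketch computes the standard one-degree-of-freedom chi-square goodness-of-fit statistic from genotype counts at a biallelic locus. It is not the authors' power analysis; the function name is hypothetical, and the sketch assumes both alleles are present (a monomorphic locus would give zero expected counts).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square statistic, P-value) for a 1-df HWE goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Pooling two equally sized subpopulations that are each in HWE but have allele frequencies 0.9 and 0.1 gives counts such as (82, 36, 82), a marked heterozygote deficit that the test detects easily. That case is deliberately extreme; with the milder differentiation typical of real samples the deficit is far smaller, consistent with the low power reported in the abstract.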

3.
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure in GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, using both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure in GEI statistics. We find that our approach effectively controls for population structure in statistics for GEIs as well as for genetic variants.

4.
Sergeev AS 《Genetika》2000,36(9):1279-1287
Distributions of age at onset are widely used in the genetic epidemiology of age-dependent diseases. Examples are estimation of recurrence risks in genetic counselling and testing genetic hypotheses in segregation and linkage analyses. In this study, morbidity parameters are defined, including age-specific morbidity rates, morbidity net risk (incidence), and cumulative incidence (population risk, an integrated measure of population susceptibility to the disease at the moment of the study). Age-specific morbidity risks are calculated from the respective morbidity rates, which are analogous to mortality rates used in demography. Population data typically used for calculation of morbidity rates are discussed. Methods of calculation of morbidity rates based on the data of single and interval epidemiological studies are described. Methods for calculating standard errors of these parameters, estimating their statistical reliability, and testing statistical hypotheses are discussed.

5.
Jiang N  Wang M  Jia T  Wang L  Leach L  Hackett C  Marshall D  Luo Z 《PloS one》2011,6(8):e23192

Background

It is well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for this influence, either by predicting the structure parameters or by correcting the inflation in the test statistic due to stratification, these may not be feasible or may introduce further statistical problems in practical implementation.

Methodology

We propose here a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into the test for significance of genetic association between a polymorphic marker and phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.

Results/Conclusions

The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other widely used methods in the GWAS literature.

6.

Background  

Large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Since the genetic markers may be correlated, a Bonferroni correction is typically too stringent a correction for multiple testing. Permutation testing is a standard statistical technique for determining statistical significance when performing multiple correlated tests for genetic association. However, permutation testing for large-scale genetic association studies is computationally demanding and calls for optimized algorithms and software. PRESTO is a new software package for genetic association studies that performs fast computation of multiple-testing adjusted P-values via permutation of the trait.
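
The idea of multiple-testing adjusted P-values obtained by permuting the trait can be sketched in a few lines of Python. This is a generic max-statistic permutation scheme, not PRESTO's optimized algorithm; the test statistic (difference in mean allele count between cases and controls) and all function names are assumptions chosen for illustration.

```python
import random


def allele_count_diff(genotypes, trait):
    """Absolute difference in mean allele count between cases (1) and controls (0)."""
    cases = [g for g, t in zip(genotypes, trait) if t == 1]
    controls = [g for g, t in zip(genotypes, trait) if t == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_permutation(markers, trait, n_perm=1000, seed=0):
    """Adjusted P-values from the permutation distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = [allele_count_diff(m, trait) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the trait; marker correlation stays intact
        max_stat = max(allele_count_diff(m, shuffled) for m in markers)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction so that no adjusted P-value is exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because each permutation keeps the genotype matrix fixed, correlation between markers is reflected in the null distribution of the maximum, which is why this kind of adjustment is less stringent than Bonferroni for correlated markers.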

7.
MOTIVATION: Current methodology and software for quantitative trait loci (QTL) analyses do not use all available information and are inadequate to deal with the huge number of QTL analyses that will be needed in forthcoming genetical genomics studies. RESULTS: We show that a mixed model statistical framework provides a very flexible tool for QTL modeling in a variety of populations, be it a cross between inbred lines, a within-population study, or experiments involving a mixture of populations or crosses. The software allows multitrait and multiQTL analyses, inclusion of infinitesimal genetic value and a batch multitrait option suitable for genetical genomics studies. It also allows massive association studies between single nucleotide polymorphisms and the trait(s) of interest. AVAILABILITY: The software (Qxpak), together with a manual and example files, is freely available for research purposes. So far, the compiled program is available for Linux systems; the Windows version will follow soon. See http://www.icrea.es/pag.asp?id=Miguel.Perez

8.
The HapMap project has given case-control association studies a unique opportunity to uncover the genetic basis of complex diseases. However, persistent issues in such studies remain the proper quantification of, testing for, and correction for population stratification (PS). In this paper, we present the first unified paradigm that addresses all three fundamental issues within one statistical framework. Our unified approach makes use of an omnibus quantity (delta), which can be estimated in a case-control study from suitable null loci. We show how this estimated value can be used to quantify PS, to statistically test for PS, and to correct for PS, all in the context of case-control studies. Moreover, we provide guidelines for interpreting values of delta in association studies (e.g., at alpha = 0.05, a delta of size 0.416 is small, a delta of size 0.653 is medium, and a delta of size 1.115 is large). A novel feature of our testing procedure is its ability to test for either strictly any PS or only 'practically important' PS. We also performed simulations to compare our correction procedure with Genomic Control (GC). Our results show that, unlike GC, it maintains good Type I error rates and power across all levels of PS.

9.
Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious association may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent works on PD and EA tests have widely focused on improvements of test corrections for those confounding effects. Despite significant algorithmic improvements, there are still a number of open questions on how to check that false discoveries are under control and implement test corrections, or how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.

10.
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping that allow genetic association studies to control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.

11.
A key step toward the discovery of a gene related to a trait is the finding of an association between the trait and one or more haplotypes. Haplotype analyses can also provide critical information regarding the function of a gene; however, when unrelated subjects are sampled, haplotypes are often ambiguous because of unknown linkage phase of the measured sites along a chromosome. A popular method of accounting for this ambiguity in case-control studies uses a likelihood that depends on haplotype frequencies, so that the haplotype frequencies can be compared between the cases and controls; however, this traditional method is limited to a binary trait (case vs. control), and it does not provide a method of testing the statistical significance of specific haplotypes. To address these limitations, we developed new methods of testing the statistical association between haplotypes and a wide variety of traits, including binary, ordinal, and quantitative traits. Our methods allow adjustment for nongenetic covariates, which may be critical when analyzing genetically complex traits. Furthermore, our methods provide several different global tests for association, as well as haplotype-specific tests, which give a meaningful advantage in attempts to understand the roles of many different haplotypes. The statistics can be computed rapidly, making it feasible to evaluate the associations between many haplotypes and a trait. To illustrate the use of our new methods, they are applied to a study of the association of haplotypes (composed of genes from the human-leukocyte-antigen complex) with humoral immune response to measles vaccination. Limited simulations are also presented to demonstrate the validity of our methods, as well as to provide guidelines on how our methods could be used.

12.
BACKGROUND: Haplotype sharing statistics have been introduced in an ad-hoc way, often relying heavily on permutation testing. As a result, applying these approaches to whole genome association studies or evaluating their properties in extensive simulation experiments is problematic. Further, permutation testing may be inappropriate in the presence of phase ambiguity and population stratification. AIMS: To present a simple framework for a class of haplotype sharing statistics useful for association mapping in case-parent trio data. This framework allows derivation of novel haplotype sharing tests as well as simple variance estimators and asymptotic distributions for haplotype sharing tests. RESULTS AND CONCLUSIONS: We validated that our approach is appropriately sized using simulated data, and illustrate the methodology by analyzing a Crohn's disease dataset. We find that haplotype-based analyses are much more powerful than single-locus analyses for these data.

13.
Aggressive manifestations and their consequences are a major issue for mankind, highlighting the need to understand the contributory factors. Still, aggression-related genetic analyses have so far mainly been conducted on small population subsets such as individuals suffering from a certain psychiatric disorder or a narrow-range age cohort, but no data on the general population are yet available. In the present study, our aim was to identify polymorphisms in genes affecting neurobiological processes that might explain some of the inter-individual variation in aggression levels in the non-clinical Caucasian adult population. 55 single nucleotide polymorphisms (SNPs) were simultaneously determined in 887 subjects who also filled out the self-report Buss-Perry Aggression Questionnaire (BPAQ). Single marker association analyses between genotypes and aggression scores indicated a significant role of rs7322347, located in the HTR2A gene encoding serotonin receptor 2a, following Bonferroni correction for multiple testing (p = 0.0007), both for males and females. Taking the four BPAQ subscales individually, scores for Hostility, Anger and Physical Aggression each showed significant association with the rs7322347 T allele, while no association was found with Verbal Aggression. Of the subscales, the relationship with rs7322347 was strongest in the case of Hostility, where statistical significance virtually equaled that observed with the whole BPAQ. In conclusion, this is the first study to our knowledge analyzing SNPs in a wide variety of genes in terms of aggression in a large non-clinical adult sample, and it also describes a novel candidate polymorphism predisposing to aggressive traits.

14.
Sequencing and exome-chip technologies have motivated development of novel statistical tests to identify rare genetic variation that influences complex diseases. Although many rare-variant association tests exist for case-control or cross-sectional studies, far fewer methods exist for testing association in families. This is unfortunate, because cosegregation of rare variation and disease status in families can amplify association signals for rare variants. Many researchers have begun sequencing (or genotyping via exome chips) familial samples that were either recently collected or previously collected for linkage studies. Because many linkage studies of complex diseases sampled affected sibships, we propose a strategy for association testing of rare variants for use in this study design. The logic behind our approach is that rare susceptibility variants should be found more often on regions shared identical by descent by affected sibling pairs than on regions not shared identical by descent. We propose both burden and variance-component tests of rare variation that are applicable to affected sibships of arbitrary size and that do not require genotype information from unaffected siblings or independent controls. Our approaches are robust to population stratification and produce analytic p values, thereby enabling our approach to scale easily to genome-wide studies of rare variation. We illustrate our methods by using simulated data and exome chip data from sibships ascertained for hypertension collected as part of the Genetic Epidemiology Network of Arteriopathy (GENOA) study.

15.
Although case-control association studies have been widely used, they are insufficient for many complex diseases, such as Alzheimer's disease and breast cancer, since these diseases may have multiple subtypes with distinct morphologies and clinical implications. Many multigroup studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), have been undertaken by recruiting subjects based on their multiclass primary disease status, while extensive secondary outcomes have been collected. The aim of this paper is to develop a general regression framework for the analysis of secondary phenotypes collected in multigroup association studies. Our regression framework is built on a conditional model for the secondary outcome given the multigroup status and covariates and its relationship with the population regression of interest of the secondary outcome given the covariates. Then, we develop generalized estimation equations to estimate the parameters of interest. We use both simulations and a large-scale imaging genetic data analysis from the ADNI to evaluate the effect of the multigroup sampling scheme on standard genome-wide association analyses based on linear regression methods, while comparing it with our statistical methods that appropriately adjust for the multigroup sampling scheme. Data used in preparation of this article were obtained from the ADNI database.

16.
The central questions asked in whole-genome association studies are how to locate associated regions in the genome and how to estimate the significance of these findings. Researchers usually do this by testing each SNP separately for association and then applying a suitable correction for multiple-hypothesis testing. However, SNPs are correlated by the unobserved genealogy of the population, and a more powerful statistical methodology would attempt to take this genealogy into account. Leveraging the genealogy in association studies is challenging, however, because the inference of the genealogy from the genotypes is a computationally intensive task, in particular when recombination is modeled, as in ancestral recombination graphs. Furthermore, if large numbers of genealogies are imputed from the genotypes, the power of the study might decrease if these imputed genealogies create an additional multiple-hypothesis testing burden. Indeed, we show in this paper that several existing methods that aim to address this problem suffer either from low power or from a very high false-positive rate; their performance is generally not better than the standard approach of separate testing of SNPs. We suggest a new genealogy-based approach, CAMP (coalescent-based association mapping), that takes into account the trade-off between the complexity of the genealogy and the power lost due to the additional multiple hypotheses. Our experiments show that CAMP yields a significant increase in power relative to that of previous methods and that it can more accurately locate the associated region.

17.
Recent studies have reported that single-nucleotide polymorphisms (SNPs) in the first intron of the FTO gene are associated with BMI in whites. To determine whether the gene is also associated with BMI in Asians, we performed a replication study of the association of the gene with BMI in a Korean population. Two SNPs in the FTO gene (rs1421085 and rs17817449) were genotyped using the TaqMan method in a Korean population (n = 1,733). The two SNPs were then tested for association with BMI through statistical analyses. The rs1421085 C allele (P = 0.0015, effect size = 0.0056) and rs17817449 G allele (P = 0.0019, effect size = 0.0053) were found to be significantly associated with increased BMI. Our results suggest that FTO may be one of the worldwide obesity-risk genes.

18.
Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and for testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.

19.
European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans.
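
A minimal sketch of how principal components can be extracted from a genotype matrix is given below, assuming NumPy is available and that genotypes are coded as allele counts (0, 1 or 2). The per-SNP standardization and the function name are illustrative assumptions; the abstract does not describe the exact preprocessing used in the study.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Project individuals onto the top principal components of a genotype matrix.

    genotypes: array of shape (individuals, SNPs) holding allele counts 0, 1 or 2.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)            # centre each SNP
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]     # drop monomorphic SNPs, scale the rest
    # Left singular vectors times singular values give the PC scores.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

When two groups differ in allele frequency at many SNPs, their scores on the first component fall on opposite sides of zero, which is the kind of separation the abstract reports between northern and southern European ancestry. Such scores are commonly included as covariates in association tests.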

20.
Case-control studies of association in structured or admixed populations   Total citations: 7, self-citations: 0, citations by others: 7
Case-control tests for association are an important tool for mapping complex-trait genes. But population structure can invalidate this approach, leading to apparent associations at markers that are unlinked to disease loci. Family-based tests of association can avoid this problem, but such studies are often more expensive and in some cases--particularly for late-onset diseases--are impractical. In this review article we describe a series of approaches published over the past 2 years which use multilocus genotype data to enable valid case-control tests of association, even in the presence of population structure. These tests can be classified into two categories. "Genomic control" methods use the independent marker loci to adjust the distribution of a standard test statistic, while "structured association" methods infer the details of population structure en route to testing for association. We discuss the statistical issues involved in the different approaches and present results from simulations comparing the relative performance of the methods under a range of models.
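
The "genomic control" idea can be sketched as follows: an inflation factor is estimated from the chi-square statistics of independent null markers, and the statistic at a candidate marker is divided by it. The Python sketch below uses the common median-based estimator with a floor of 1; these choices and the function names are assumptions for illustration and may differ from the variants compared in the review.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_chi2):
    """Estimate lambda from 1-df chi-square statistics at independent null markers."""
    lam = statistics.median(null_chi2) / CHI2_1DF_MEDIAN
    return max(1.0, lam)  # never inflate a statistic when lambda falls below 1


def genomic_control(chi2, null_chi2):
    """Deflate a candidate marker's chi-square statistic by the inflation factor."""
    return chi2 / inflation_factor(null_chi2)
```

The corrected statistic is then referred to the usual chi-square distribution with one degree of freedom. Because a single factor is applied genome-wide, the correction is uniform across markers, one of the statistical issues that distinguishes it from structured association.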
