Similar literature
20 similar articles found.
1.
OBJECTIVES: Genetic association studies are usually based upon restricted sets of 'tag' markers selected to represent the total sequence variation. Tag selection is often determined by some threshold for the r² coefficient of linkage disequilibrium (LD) between tag and untyped markers. It is widely assumed that power to detect an effect at the untyped sites is retained by typing the tag marker in a sample scaled by the inverse of the selected threshold (1/r²). However, unless only a single causal variant occurs at a locus, it has been shown [Eur J Hum Genet 2006;14:426-437] that significant power loss can occur if this principle is applied. We sought to investigate whether unexpected loss of power is an exceptional case or a more general concern. In the absence of detailed knowledge about the genetic architecture at complex disease loci, we developed a mathematical approach to test all possible situations. METHODS: We derived mathematical formulae allowing the calculation of all possible odds ratios (OR) at a tag marker locus given the effect size that would be observed by typing a second locus and the r² between the two loci. For a range of allele frequencies, r² between loci, and strengths of association at the causal locus (OR from 0.5 to 2) that we consider realistic for complex disease loci, we next determined the sample sizes that would be necessary to give equivalent power to detect association by genotyping tag and causal loci, and compared these with the sample sizes predicted by applying 1/r². RESULTS: Under most of the hypothetical scenarios we examined, the calculated sample sizes required to maintain power by typing markers that tag the causal locus at even moderately high r² (0.8) were greater than those calculated by applying 1/r². Even in populations with apparently similar measurements of allele frequency, LD structure, and effect size at the susceptibility allele, the required sample size to detect association with a tag marker can vary substantially. We also show that in apparently similar populations, associations to either allele at the tag site are possible. CONCLUSIONS: In the majority of hypothetical scenarios we examined, indirect tests of association have less power than is predicted by scaling sample sizes by 1/r². Our findings hold even for what we consider likely to be larger than average effect sizes in complex diseases (OR = 1.5-2) and even for moderately high r² values between the markers. Until a substantial number of disease genes have been identified through methods that are not based on tagging, and are therefore not biased towards those situations most favourable to tagging, it is impossible to know how the true scenarios are distributed across the range of possible scenarios. Nevertheless, while association designs based upon tag marker selection are by necessity the tool of choice for de novo gene discovery, our data suggest that power to initially detect association may often be less than assumed. Moreover, our data suggest that, to avoid genuine findings being subsequently discarded because of unpredictable losses of power, follow-up studies in other samples should be based upon more detailed analyses of the gene rather than simply on the tag SNPs showing association in the discovery study.
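The 1/r² rule discussed in this abstract can be sketched in a few lines. This is only the conventional rule of thumb that the abstract argues is often too optimistic, not the authors' formulae; the function names and the example numbers are illustrative assumptions.

```python
import math


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def naive_tag_sample_size(n_causal: int, r2: float) -> int:
    """Sample size for the tag marker under the 1/r^2 rule of thumb."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_causal / r2)


# Hypothetical numbers: 1,000 samples are assumed to suffice at the causal locus.
print(naive_tag_sample_size(1000, 0.5))  # 2000
print(r_squared(0.5, 0.5, 0.5))          # 1.0 (perfect LD)
```

According to the abstract, the sample size actually required at the tag marker is in most scenarios larger than this naive figure.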

2.
Genome-wide association (GWA) analyses have generally been used to detect individual loci contributing to the phenotypic diversity in a population by the effects of these loci on the trait mean. More rarely, loci have also been detected based on variance differences between genotypes. Several hypotheses have been proposed to explain the possible genetic mechanisms leading to such variance signals. However, little is known about what causes these signals, or whether this genetic variance-heterogeneity reflects mechanisms of importance in natural populations. Previously, we identified a variance-heterogeneity GWA (vGWA) signal for leaf molybdenum concentrations in Arabidopsis thaliana. Here, fine-mapping of this association reveals that the vGWA emerges from the effects of three independent genetic polymorphisms that all are in strong LD with the markers displaying the genetic variance-heterogeneity. By revealing the genetic architecture underlying this vGWA signal, we uncovered the molecular source of a significant amount of hidden additive genetic variation or “missing heritability”. Two of the three polymorphisms underlying the genetic variance-heterogeneity are promoter variants for Molybdate transporter 1 (MOT1), and the third a variant located ~25 kb downstream of this gene. A fourth independent association was also detected ~600 kb upstream of MOT1. Use of a T-DNA knockout allele highlights Copper Transporter 6; COPT6 (AT2G26975) as a strong candidate gene for this association. Our results show that an extended LD across a complex locus including multiple functional alleles can lead to a variance-heterogeneity between genotypes in natural populations. Further, they provide novel insights into the genetic regulation of ion homeostasis in A. thaliana, and empirically confirm that variance-heterogeneity based GWA methods are a valuable tool to detect novel associations of biological importance in natural populations.

3.
4.
Attention-deficit/hyperactivity disorder (ADHD) has an estimated prevalence of 3-5% in adults. Genome-wide association (GWA) studies have not been performed in adults with ADHD and studies in children have so far been inconclusive, possibly because of the small sample sizes. Larger GWA studies have been performed on bipolar disorder (BD) and BD symptoms, and several potential risk genes have been reported. ADHD and BD share many clinical features and comorbidity between these two disorders is common. We therefore wanted to examine whether the reported BD genetic variants in CACNA1C, ANK3, MYO5B, TSPAN8 and ZNF804A loci are associated with ADHD or with scores on the Mood Disorder Questionnaire (MDQ), a commonly used screening instrument for bipolar spectrum disorders. We studied 561 adult Norwegian ADHD patients and 711 controls from the general population. No significant associations or trends were found between any of the single nucleotide polymorphisms (SNPs) studied and ADHD [odds ratios (ORs) ≤ 1.05]. However, a weak association was found between rs1344706 in ZNF804A (OR = 1.25; P = 0.05) and MDQ. In conclusion, it seems unlikely that these six SNPs with strong evidence of association in BD GWA studies are shared risk variants between ADHD and BD.

5.

Background

Obesity is a major health problem. Although heritability is substantial, genetic mechanisms predisposing to obesity are not very well understood. We have performed a genome wide association study (GWA) for early onset (extreme) obesity.

Methodology/Principal Findings

a) GWA (Genome-Wide Human SNP Array 5.0 comprising 440,794 single nucleotide polymorphisms) for early onset extreme obesity based on 487 extremely obese young German individuals and 442 healthy lean German controls; b) confirmatory analyses on 644 independent families with at least one obese offspring and both parents. We aimed to identify and subsequently confirm the 15 SNPs (minor allele frequency ≥10%) with the lowest p-values of the GWA by four genetic models: additive, recessive, dominant and allelic. Six single nucleotide polymorphisms (SNPs) in FTO (fat mass and obesity associated gene) within one linkage disequilibrium (LD) block, including the GWA SNP rendering the lowest p-value (rs1121980; log-additive model: nominal p = 1.13×10⁻⁷, corrected p = 0.0494; odds ratio (OR) for CT 1.67, 95% confidence interval (CI) 1.22–2.27; OR for TT 2.76, 95% CI 1.88–4.03), belonged to the 15 SNPs showing the strongest evidence for association with obesity. For confirmation we genotyped 11 of these in the 644 independent families (of the six FTO SNPs we chose only two representing the LD block). For both FTO SNPs the initial association was confirmed (both Bonferroni corrected p<0.01). However, none of the nine non-FTO SNPs revealed significant transmission disequilibrium.

Conclusions/Significance

Our GWA for extreme early onset obesity substantiates that variation in FTO strongly contributes to early onset obesity. This is a further proof of concept for GWA to detect genes relevant for highly complex phenotypes. We concurrently show that nine additional SNPs with initially low p-values in the GWA were not confirmed in our family study, thus suggesting that of the best 15 SNPs in the GWA only the FTO SNPs represent true positive findings.
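The relation between a nominal and a multiplicity-corrected p-value, as reported in this abstract, can be illustrated with a Bonferroni correction. This is a minimal sketch: the abstract does not state which correction or how many tests were used for the GWA stage, so using the full array size of 440,794 SNPs as the number of tests is an assumption. Under that assumption the corrected value is about 0.0498, near but not identical to the reported 0.0494.

```python
def bonferroni(p_nominal: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_nominal * n_tests)


# Nominal p-value and array size are taken from the abstract; treating the
# array size as the number of tests is an assumption made for illustration.
print(bonferroni(1.13e-7, 440_794))
```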

6.
An integrated haplotype map of the human major histocompatibility complex (cited 26 times in total: 0 self-citations, 26 by others)
Numerous studies have clearly indicated a role for the major histocompatibility complex (MHC) in susceptibility to autoimmune diseases. Such studies have focused on the genetic variation of a small number of classical human-leukocyte-antigen (HLA) genes in the region. Although these genes represent good candidates, given their immunological roles, linkage disequilibrium (LD) surrounding these genes has made it difficult to rule out neighboring genes, many with immune function, as influencing disease susceptibility. It is likely that a comprehensive analysis of the patterns of LD and variation, by using a high-density map of single-nucleotide polymorphisms (SNPs), would enable a greater understanding of the nature of the observed associations, as well as lead to the identification of causal variation. We present herein an initial analysis of this region, using 201 SNPs, nine classical HLA loci, two TAP genes, and 18 microsatellites. This analysis suggests that LD and variation in the MHC, aside from the classical HLA loci, are essentially no different from those in the rest of the genome. Furthermore, these data show that multi-SNP haplotypes will likely be a valuable means for refining association signals in this region.

7.
Ahn MJ, Won HH, Lee J, Lee ST, Sun JM, Park YH, Ahn JS, Kwon OJ, Kim H, Shim YM, Kim J, Kim K, Kim YH, Park JY, Kim JW, Park K. Human Genetics 2012;131(3):365-372
The proportion of never smoker non-small cell lung cancer (NSCLC) in Asia is about 30-40%. Despite the striking demographics and high prevalence of never smoker NSCLC, the exact causes still remain undetermined. Although several genome wide association (GWA) studies were conducted to find susceptibility loci for lung cancer in never smokers, no regions were replicated except for 5p15.33, suggesting locus heterogeneity and different environmental toxic effects. To identify genetic loci associated with susceptibility of lung cancer in never smokers, we performed a GWA analysis using the Affymetrix 6.0 SNP array. For discovery GWA set, we recruited 446 never smoking Korean patients with NSCLC and 497 normal subjects. We tested association of SNPs with lung cancer susceptibility using the Cochran-Armitage trend test. For validation, 39 SNPs were selected from the top 50 SNPs and five additional SNPs were selected in the DAB1 gene region which showed significant associations in the GWA analysis. The validation SNPs were genotyped in an independent sample including 434 patients and 1,000 controls. Among the 44 validation SNPs, two SNPs (rs11080466 and rs11663246) near the APCDD1, NAPG and FAM38B genes in the 18p11.22 region were replicated. P value of rs11080466 was 1.08 × 10⁻⁶ in the combined sets (2.68 × 10⁻⁵ in the discovery set and 2.60 × 10⁻³ in the validation set) and odds ratio was 0.68 (0.58-0.79). We observed similar association for rs11663246. Our result suggests the 18p11.22 region as a novel lung cancer susceptibility locus in never smokers.
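The Cochran-Armitage trend test named in this abstract can be sketched with the standard library alone. This is a textbook version using additive weights (0, 1, 2) and a normal approximation, not the authors' pipeline; the function name and the genotype counts below are hypothetical.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases and controls hold counts per genotype (e.g. 0, 1, 2 copies of the
    minor allele). Returns the z statistic and a two-sided p-value from the
    normal approximation.
    """
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    cols = [r + s for r, s in zip(cases, controls)]  # genotype totals

    # Weighted difference between case and control genotype counts.
    t_stat = sum(w * (r * s_tot - s * r_tot)
                 for w, r, s in zip(weights, cases, controls))

    # Variance of the statistic under the null hypothesis of no trend.
    k = len(weights)
    var = sum(weights[i] ** 2 * cols[i] * (n_tot - cols[i]) for i in range(k))
    var -= 2 * sum(weights[i] * weights[j] * cols[i] * cols[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= r_tot * s_tot / n_tot

    z = t_stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts, not data from the study.
z, p = cochran_armitage(cases=(10, 20, 30), controls=(30, 20, 10))
print(round(z * z, 6))  # 20.0
```

When cases and controls have proportional genotype counts, the statistic is zero and the p-value is 1.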

8.
An eggplant (Solanum melongena) association panel of 191 accessions, comprising a mixture of breeding lines, old varieties and landrace selections was SNP genotyped and phenotyped for key breeding fruit and plant traits at two locations over two seasons. A genome-wide association (GWA) analysis was performed using the mixed linear model, which takes into account both a kinship matrix and the sub-population membership of the accessions. Overall, 194 phenotype/genotype associations were uncovered, relating to 30 of the 33 measured traits. These associations involved 79 SNP loci mapping to 39 distinct chromosomal regions distributed over all 12 eggplant chromosomes. A comparison of the map positions of these SNPs with those of loci derived from conventional linkage mapping showed that GWA analysis both validated many of the known controlling loci and detected a large number of new marker/trait associations. Exploiting established syntenic relationships between eggplant chromosomes and those of tomato and pepper recognized orthologous regions in ten eggplant chromosomes harbouring genes influencing breeders’ traits.

9.
Linkage disequilibrium (LD) mapping has been applied to many simple, monogenic, overtly Mendelian human traits, with great success. However, extensions and applications of LD mapping approaches to more complex human quantitative traits have not been straightforward. In this article, we consider the analysis of biallelic DNA marker loci and human quantitative trait loci in settings that involve sampling individuals from opposite ends of the trait distribution. The purpose of this sampling strategy is to enrich samples for individuals likely to possess (and not possess) trait-influencing alleles. Simple statistical models for detecting LD between a trait-influencing allele and neighboring marker alleles are derived that make use of this sampling scheme. The power of the proposed method is investigated analytically for some hypothetical gene-effect scenarios. Our studies indicate that LD mapping of loci influencing human quantitative trait variation should be possible in certain settings. Finally, we consider possible extensions of the proposed methods, as well as areas for further consideration and improvement.

10.
Genome-wide association (GWA) studies usually detect common genetic variants with low-to-medium effect sizes. Many contributing variants are not revealed, since they fail to reach significance after strong correction for multiple comparisons. The WTCCC study for hypertension, for example, failed to identify genome-wide significant associations. We hypothesized that genetic variation in genes expressed specifically in the endothelium may be important for hypertension development. Results from the WTCCC study were combined with previously published gene expression data from mice to specifically investigate SNPs located within endothelial-specific genes, bypassing the requirement for genome-wide significance. Six SNPs from the WTCCC study were selected for independent replication in 5205 hypertensive patients and 5320 population-based controls, and subsequently in a cohort of 16537 individuals. A common variant (rs10860812) in the DRAM (damage-regulated autophagy modulator) locus showed association with hypertension (P = 0.008) in the replication study. The minor allele (A) had a protective effect (OR = 0.93; 95% CI 0.88–0.98 per A-allele), which replicates the association in the WTCCC GWA study. However, a second follow-up, in the larger cohort, failed to reveal an association with blood pressure. We further tested the endothelial-specific genes for co-localization with a panel of newly discovered SNPs from large meta-GWAS on hypertension or blood pressure. There was no significant overlap between those genes and hypertension or blood pressure loci. The result does not support the hypothesis that genetic variation in genes expressed in endothelium plays an important role for hypertension development. Moreover, the discordant association of rs10860812 with blood pressure in the case-control study versus the larger Malmö Preventive Project study highlights the importance of rigorous replication in multiple large independent studies.
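A per-allele odds ratio with a 95% confidence interval, of the kind quoted in this abstract, can be computed from a 2 x 2 table of allele counts. The sketch below uses the Wald (Woolf) interval on the log scale; the abstract does not say how its interval was obtained, so this method, the function name, and the example counts are all assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Allelic odds ratio with a Wald (Woolf) confidence interval.

    a, b: minor and major allele counts in cases;
    c, d: minor and major allele counts in controls.
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not data from the study.
print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that excludes 1, as the reported 0.88–0.98 does, corresponds to a nominally significant association at the 5% level.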

11.
The prevalence of obesity in children and adults in the United States has increased dramatically over the past decade. Besides environmental factors, genetic factors are known to play an important role in the pathogenesis of obesity. A number of genetic determinants of adult BMI have already been established through genome‐wide association (GWA) studies. In this study, we examined 25 single‐nucleotide polymorphisms (SNPs) corresponding to 13 previously reported genomic loci in 6,078 children with measures of BMI. Fifteen of these SNPs yielded at least nominally significant association to BMI, representing nine different loci including INSIG2, FTO, MC4R, TMEM18, GNPDA2, NEGR1, BDNF, KCTD15, and 1q25. Other loci revealed no evidence for association, namely at MTCH2, SH2B1, 12q13, and 3q27. For the 15 associated variants, the genotype score explained 1.12% of the total variation for BMI z‐score. We conclude that among 13 loci that have been reported to associate with adult BMI, at least nine also contribute to the determination of BMI in childhood as demonstrated by their associations in our pediatric cohort.

12.
Variability in cystic fibrosis (CF) lung disease is partially due to non-CFTR genetic modifiers. Mucin genes are very polymorphic, and mucins play a key role in the pathogenesis of CF lung disease; therefore, mucin genes are strong candidates as genetic modifiers. DNA from CF patients recruited for extremes of lung phenotype was analyzed by Southern blot or PCR to define variable number tandem repeat (VNTR) length polymorphisms for MUC1, MUC2, MUC5AC, and MUC7. VNTR length polymorphisms were tested for association with lung disease severity and for linkage disequilibrium (LD) with flanking single nucleotide polymorphisms (SNPs). No strong associations were found for MUC1, MUC2, or MUC7. A significant association was found between the overall distribution of MUC5AC VNTR length and CF lung disease severity (p = 0.025; n = 468 patients); in addition, there was robust association of the specific 6.4 kb HinfI VNTR fragment with severity of lung disease (p = 6.2×10⁻⁴ after Bonferroni correction). There was strong LD between MUC5AC VNTR length modes and flanking SNPs. The severity-associated 6.4 kb VNTR allele of MUC5AC was confirmed to be genetically distinct from the 6.3 kb allele, as it showed significantly stronger association with nearby SNPs. These data provide detailed respiratory mucin gene VNTR allele distributions in CF patients. Our data also show a novel link between the MUC5AC 6.4 kb VNTR allele and severity of CF lung disease. The LD pattern with surrounding SNPs suggests that the 6.4 kb allele contains, or is linked to, important functional genetic variation.

13.
The Ethiopian plateau hosts thousands of durum wheat (Triticum turgidum subsp. durum) farmer varieties (FV) with high adaptability and breeding potential. To harness their unique allelic diversity, we produced a large nested association mapping (NAM) population intercrossing fifty Ethiopian FVs with an international elite durum wheat variety (Asassa). The Ethiopian NAM population (EtNAM) is composed of fifty interconnected bi‐parental families, totalling 6280 recombinant inbred lines (RILs) that represent both a powerful quantitative trait loci (QTL) mapping tool, and a large pre‐breeding panel. Here, we discuss the molecular and phenotypic diversity of the EtNAM founder lines, then we use an array featuring 13 000 single nucleotide polymorphisms (SNPs) to characterize a subset of 1200 EtNAM RILs from 12 families. Finally, we test the usefulness of the population by mapping phenology traits and plant height using a genome wide association (GWA) approach. EtNAM RILs showed high allelic variation and a genetic makeup combining genetic diversity from Ethiopian FVs with the international durum wheat allele pool. EtNAM SNP data were projected on the fully sequenced AB genome of wild emmer wheat, and were used to estimate pairwise linkage disequilibrium (LD) measures that reported an LD decay distance of 7.4 Mb on average, and balanced founder contributions across EtNAM families. GWA analyses identified 11 genomic loci individually affecting up to 3 days in flowering time and more than 1.6 cm in height. We argue that the EtNAM is a powerful tool to support the production of new durum wheat varieties targeting local and global agriculture.

14.
Kostem E, Lozano JA, Eskin E. Genetics 2011;188(2):449-460
Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method shows superior performance to the correlation- and distance-based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.

15.
Identifying causal genetic variants underlying heritable phenotypic variation is a long‐standing goal in evolutionary genetics. We previously identified several quantitative trait loci (QTL) for five morphological traits in a captive population of zebra finches (Taeniopygia guttata) by whole‐genome linkage mapping. We here follow up on these studies with the aim to narrow down on the quantitative trait variants (QTN) in one wild and three captive populations. First, we performed an association study using 672 single nucleotide polymorphisms (SNPs) within candidate genes located in the previously identified QTL regions in a sample of 939 wild‐caught zebra finches. Then, we validated the most promising SNP–phenotype associations (n = 25 SNPs) in 5228 birds from four populations. Genotype–phenotype associations were generally weak in the wild population, where linkage disequilibrium (LD) spans only short genomic distances. In contrast, in captive populations, where LD blocks are large, apparent SNP effects on morphological traits (i.e. associations) were highly repeatable with independent data from the same population. Most of those SNPs also showed significant associations with the same trait in other captive populations, but the direction and magnitude of these effects varied among populations. This suggests that the tested SNPs are not the causal QTN but rather physically linked to them, and that LD between SNPs and causal variants differs between populations due to founder effects. While the identification of QTN remains challenging in nonmodel organisms, we illustrate that it is indeed possible to confirm the location and magnitude of QTL in a population with stable linkage between markers and causal variants.

16.
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.

17.
A genome-wide association study of seed protein and oil content in soybean (cited 8 times in total: 0 self-citations, 8 by others)

Background

Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content.

Results

A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil.

Conclusions

This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).

18.
Understanding the genomic signatures, genes, and traits underlying local adaptation of organisms to heterogeneous environments is of central importance to the field of evolutionary biology. To identify loci underlying local adaptation, models that combine allelic and environmental variation while controlling for the effects of population structure have emerged as the method of choice. Although these models have been evaluated in simulation studies, there has not been a thorough investigation of the empirical evidence supporting local adaptation across the alleles they identify. To evaluate these methods, we use 875 Arabidopsis thaliana Eurasian accessions and two mixed models (GEMMA and LFMM) to identify candidate SNPs underlying local adaptation to climate. Subsequently, to assess evidence of local adaptation and function among significant SNPs, we examine allele frequency differentiation and recent selection across Eurasian populations, in addition to their distribution along quantitative trait loci (QTL) explaining fitness variation between Italy and Sweden populations and cis‐regulatory/nonsynonymous sites showing significant selective constraint. Our results indicate that significant LFMM/GEMMA SNPs show low allele frequency differentiation and linkage disequilibrium across locally adapted Italy and Sweden populations, in addition to a poor association with fitness QTL peaks (highest logarithm of odds score). Furthermore, when examining derived allele frequencies across the Eurasian range, we find that these SNPs are enriched in low‐frequency variants that show very large climatic differentiation but low levels of linkage disequilibrium. These results suggest that their enrichment along putative functional sites most likely represents deleterious variation that is independent of local adaptation. Among all the genomic signatures examined, only SNPs showing high absolute allele frequency differentiation (AFD) and linkage disequilibrium (LD) between Italy and Sweden populations showed a strong association with fitness QTL peaks and were enriched along selectively constrained cis‐regulatory/nonsynonymous sites. Using these SNPs, we find strong evidence linking flowering time, freezing tolerance, and the abscisic‐acid pathway to local adaptation.

19.
Pei YF, Li J, Zhang L, Papasian CJ, Deng HW. PLoS ONE 2008;3(10):e3551
The power of genetic association analyses is often compromised by missing genotypic data which contributes to lack of significant findings, e.g., in in silico replication studies. One solution is to impute untyped SNPs from typed flanking markers, based on known linkage disequilibrium (LD) relationships. Several imputation methods are available and their usefulness in association studies has been demonstrated, but factors affecting their relative performance in accuracy have not been systematically investigated. Therefore, we investigated and compared the performance of five popular genotype imputation methods, MACH, IMPUTE, fastPHASE, PLINK and Beagle, to assess and compare the effects of factors that affect imputation accuracy rates (ARs). Our results showed that a stronger LD and a lower MAF for an untyped marker produced better ARs for all the five methods. We also observed that a greater number of haplotypes in the reference sample resulted in higher ARs for MACH, IMPUTE, PLINK and Beagle, but had little influence on the ARs for fastPHASE. In general, MACH and IMPUTE produced similar results and these two methods consistently outperformed fastPHASE, PLINK and Beagle. Our study is helpful in guiding application of imputation methods in association analyses when genotype data are missing.

20.

Background  

Genome wide association (GWA) studies are now being widely undertaken aiming to find the link between genetic variations and common diseases. Ideally, a well-powered GWA study will involve the measurement of hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of individuals. The sheer volume of data generated by these experiments creates very high analytical demands. There are a number of important steps during the analysis of such data, many of which may present severe bottlenecks. The data need to be imported and reviewed to perform initial quality control (QC) before proceeding to association testing. Evaluation of results may involve further statistical analysis, such as permutation testing, or further QC of associated markers, for example, reviewing raw genotyping intensities. Finally significant associations need to be prioritised using functional and biological interpretation methods, browsing available biological annotation, pathway information and patterns of linkage disequilibrium (LD).
