Similar Literature
1.
Complex trait genome-wide association studies (GWAS) provide an efficient strategy for evaluating large numbers of common variants in large numbers of individuals and for identifying trait-associated variants. Nevertheless, GWAS often leave much of the trait heritability unexplained. We hypothesized that some of this unexplained heritability might be due to common and rare variants that reside in GWAS-identified loci but lack appropriate proxies on modern genotyping arrays. To assess this hypothesis, we re-examined 7 genes (APOE, APOC1, APOC2, SORT1, LDLR, APOB, and PCSK9) in 5 loci associated with low-density lipoprotein cholesterol (LDL-C) in multiple GWAS. For each gene, we first catalogued genetic variation by re-sequencing 256 Sardinian individuals with extreme LDL-C values. Next, we genotyped variants identified by us and by the 1000 Genomes Project (totaling 3,277 SNPs) in 5,524 volunteers. We found that in one locus (PCSK9) the GWAS signal could be explained by a previously described low-frequency variant and that in three loci (PCSK9, APOE, and LDLR) there were additional variants independently associated with LDL-C, including a novel and rare LDLR variant that seems specific to Sardinians. Overall, this more detailed assessment of SNP variation in these loci increased estimates of the heritability of LDL-C accounted for by these genes from 3.1% to 6.5%. All association signals and the heritability estimates were successfully confirmed in a sample of ~10,000 Finnish and Norwegian individuals. Our results thus suggest that focusing on variants accessible via GWAS can lead to clear underestimates of the trait heritability explained by a set of loci. Further, our results suggest that, as a prelude to large-scale sequencing efforts, targeted re-sequencing paired with large-scale genotyping will increase estimates of complex trait heritability explained by known loci.
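As a rough illustration of why adding low-frequency variants can raise the heritability "accounted for" by a set of loci, the minimal Python sketch below uses the textbook additive approximation: an unlinked variant with allele frequency p and per-allele effect beta explains 2p(1-p)beta^2 of the phenotypic variance. This approximation is an assumption for illustration, not the estimator used in the study, which would also have to handle linkage disequilibrium between variants. The frequencies and effect sizes are invented.

```python
from typing import Iterable, Tuple


def variance_explained(variants: Iterable[Tuple[float, float]], phenotype_var: float) -> float:
    """Fraction of phenotypic variance explained by independent additive variants.

    Each variant is (allele frequency p, per-allele effect beta in phenotype units).
    Assumes Hardy-Weinberg proportions and no linkage disequilibrium between the
    variants, so each contributes 2 * p * (1 - p) * beta**2 to the genetic variance.
    """
    if phenotype_var <= 0:
        raise ValueError("phenotype variance must be positive")
    genetic_var = 0.0
    for p, beta in variants:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
        genetic_var += 2.0 * p * (1.0 - p) * beta ** 2
    return genetic_var / phenotype_var


if __name__ == "__main__":
    # Invented values on a standardized phenotype (variance 1).
    common_only = [(0.5, 0.1)]
    with_rare = common_only + [(0.01, 1.0)]
    print(variance_explained(common_only, 1.0), variance_explained(with_rare, 1.0))
```

With these invented numbers, the explained fraction rises from 0.5% to 2.48% once a single rare, large-effect variant is included, even though that variant is carried by only about 2% of individuals.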

2.
Large-scale genome-wide association studies (GWAS) have identified many loci associated with body mass index (BMI), but few studies have focused on obesity as a binary trait. Here we report the results of a GWAS and candidate SNP genotyping study of obesity, including extremely obese cases and never-overweight controls as well as families segregating extreme obesity and thinness. We first performed a GWAS on 520 cases (BMI > 35 kg/m²) and 540 control subjects (BMI < 25 kg/m²), on measures of obesity and obesity-related traits. We subsequently followed up obesity-associated signals by genotyping the top ~500 SNPs from the GWAS in the combined sample of cases, controls and family members, totaling 2,256 individuals. For the binary trait of obesity, we found 16 genome-wide significant signals within the FTO gene (strongest signal at rs17817449, P = 2.5 × 10⁻¹²). We next examined obesity-related quantitative traits (such as total body weight, waist circumference and waist-to-hip ratio) and detected genome-wide significant signals for waist-to-hip ratio at NRXN3 (rs11624704, P = 2.67 × 10⁻⁹), a gene previously associated with body weight and fat distribution. Our study demonstrates how a relatively small sample ascertained through extreme phenotypes can detect genuine associations in a GWAS.

3.
In contrast to large GWA studies based on thousands of individuals and large meta-analyses combining GWAS results, we analyzed a small case/control sample for uric acid nephrolithiasis. Our cohort of closely related individuals is derived from a small, genetically isolated village in Sardinia, with well-characterized genealogical data linking the extant population back to the 16th century. The number of risk alleles involved in complex disorders is expected to be smaller in isolated founder populations than in more diverse populations, and the power to detect association with complex traits may be increased when related, homogeneous affected individuals are selected, as they are more likely to be enriched with and share specific risk variants than unrelated affected individuals from the general population. When related individuals are included in an association study, correlations among relatives must be accurately taken into account to ensure the validity of the results. A recently proposed association method uses an empirical genotypic covariance matrix estimated from genome-screen data to allow for additional population structure and cryptic relatedness that may not be captured by the genealogical data. We apply the method to our data, and we also investigate the properties of this method, as well as other association methods, in our highly inbred population, as previous applications were to outbred samples. The most promising regions identified in our initial study in the genetic isolate were then further investigated in an independent sample collected from the Italian population. Among the loci that showed association in this study, we observed evidence of a possible involvement of the region encompassing the gene LRRC16A, already associated with serum uric acid levels in a large meta-analysis of 14 GWAS, suggesting that this locus might be part of a uric acid metabolism pathway involved in gout as well as in nephrolithiasis.
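In many implementations, the "empirical genotypic covariance matrix" mentioned above is a genomic relationship matrix computed from standardized genotypes. The minimal Python sketch below builds such a matrix under that assumption. The exact standardization used by the cited method may differ, and real data would also need missing-genotype handling, which is omitted here.

```python
import math
from typing import List, Sequence


def genomic_relationship_matrix(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Standardized genomic relationship matrix (GRM).

    genotypes[i][m] is the number (0, 1 or 2) of copies of the coded allele that
    individual i carries at marker m. Entry (j, k) is the average over markers of
    (g_jm - 2p_m)(g_km - 2p_m) / (2p_m(1 - p_m)); monomorphic markers are skipped.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for m in range(n_markers):
        column = [genotypes[i][m] for i in range(n)]
        p = sum(column) / (2.0 * n)  # sample frequency of the coded allele
        if p == 0.0 or p == 1.0:
            continue  # a monomorphic marker carries no information
        scale = math.sqrt(2.0 * p * (1.0 - p))
        z = [(g - 2.0 * p) / scale for g in column]
        for j in range(n):
            for k in range(n):
                grm[j][k] += z[j] * z[k]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic markers")
    return [[value / used for value in row] for row in grm]


if __name__ == "__main__":
    # Three hypothetical individuals at three markers: the first two are identical.
    for row in genomic_relationship_matrix([[0, 1, 2], [0, 1, 2], [2, 1, 0]]):
        print([round(v, 3) for v in row])
```

In a mixed-model association test, a matrix like this would serve as the covariance structure of the polygenic random effect. That model-fitting step is not shown.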

4.
The success of genome-wide association studies (GWAS) in identifying risk loci of complex diseases is now well established. One persistent major hurdle is the cost of these studies, which puts them beyond the reach of most research groups. Performing GWAS on pools of DNA samples may be an effective strategy to reduce the costs of these studies. In this study, we performed pooling-based GWAS with more than 550,000 SNPs in two case-control cohorts consisting of patients with Type II diabetes (T2DM) and with chronic rhinosinusitis (CRS). In the T2DM study, the results of the pooling experiment were compared to individual genotypes obtained from a previously published GWAS. TCF7L2 and HHEX SNPs associated with T2DM by the traditional GWAS were among the top-ranked SNPs in the pooling experiment. This dataset was also used to refine the best strategy for correctly identifying SNPs that will remain significant based on individual genotyping. In the CRS study, the top hits from the pooling-based GWAS located within ten kilobases of known genes were validated by individual genotyping of 1,536 SNPs. Forty-one percent (598 of the 1,457 SNPs that passed quality control) were associated with CRS at a nominal P value of 0.05, confirming the potential of pooling-based GWAS to identify SNPs that differ in allele frequencies between two groups of subjects. Overall, our results demonstrate that a pooling experiment on high-density genotyping arrays can accurately determine the minor allele frequency as compared to individual genotyping and produce a list of top-ranked SNPs that captures genuine allelic differences between a group of cases and controls. The low cost associated with a pooling-based GWAS clearly justifies its use in screening for genetic determinants of complex diseases.
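The abstract does not say how allele frequencies were read off the pooled arrays. One widely used approach, assumed here purely for illustration, is the k-corrected ratio p = A / (A + k*B). Here A and B are the two allele signals measured on the pool, and k is the average A/B ratio observed in individually genotyped heterozygotes. The minimal Python sketch below applies this correction to made-up signal values.

```python
from statistics import mean
from typing import Sequence


def k_correction(het_signal_a: Sequence[float], het_signal_b: Sequence[float]) -> float:
    """Mean A/B signal ratio over individually genotyped heterozygotes.

    In a heterozygote the true frequency of allele A is exactly 0.5, so any
    departure of A/B from 1 reflects unequal probe efficiency for the two alleles.
    """
    ratios = [a / b for a, b in zip(het_signal_a, het_signal_b) if b > 0]
    if not ratios:
        raise ValueError("need at least one heterozygote with a positive B signal")
    return mean(ratios)


def pooled_allele_frequency(signal_a: float, signal_b: float, k: float) -> float:
    """k-corrected estimate of the frequency of allele A in a DNA pool."""
    denominator = signal_a + k * signal_b
    if denominator <= 0:
        raise ValueError("signals must give a positive denominator")
    return signal_a / denominator


if __name__ == "__main__":
    # Hypothetical probe where allele A gives twice the signal of allele B.
    k = k_correction([2.0, 2.2, 1.8], [1.0, 1.1, 0.9])
    # A pool with true frequency 0.3 would then give signals 2*0.3 and 1*0.7.
    print(round(pooled_allele_frequency(0.6, 0.7, k), 3))
```

Without the correction, the naive ratio 0.6 / (0.6 + 0.7) would overestimate the frequency of the more strongly hybridizing allele.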

5.
Single‐nucleotide polymorphisms (SNPs) are rapidly becoming the standard markers in population genomics studies; however, their use in nonmodel organisms is limited due to the lack of cost‐effective approaches to uncover genome‐wide variation and the large number of individuals needed in the screening process to reduce ascertainment bias. To discover SNPs for population genomics studies in the fungal symbionts of the mountain pine beetle (MPB), we developed a roadmap to discover SNPs and to produce a genotyping platform. We undertook a whole‐genome sequencing approach for Leptographium longiclavatum in combination with available genomics resources of another MPB symbiont, Grosmannia clavigera. We sequenced 71 individuals pooled into four groups using Illumina sequencing technology. We generated between 27 and 30 million reads of 75 bp that resulted in a total of 1181 contigs longer than 2 kb and an assembled genome size of 28.9 Mb (N50 = 48 kb, average depth = 125x). A total of 9052 proteins were annotated, and between 9531 and 17 266 SNPs were identified in the four pools. A subset of 206 genes (containing 574 SNPs, 11% false positives) was used to develop a genotyping platform for this species. Using this roadmap, we developed a genotyping assay with a total of 147 SNPs located in 121 genes using the Illumina® Sequenom iPLEX Gold. Our preliminary genotyping (success rate = 85%) of 304 individuals from 36 populations supports the utility of this approach for population genomics studies in other MPB fungal symbionts and other fungal nonmodel species.
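The N50 quoted above is the length of the shortest contig among the longest contigs that together contain at least half of all assembled bases. The minimal Python sketch below computes it from a list of contig lengths. The toy lengths are invented and unrelated to the L. longiclavatum assembly.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the shortest contig among the longest contigs that together
    contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths or lengths[-1] <= 0:
        raise ValueError("need at least one contig, all with positive length")
    half_total = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half_total:
            return length
    return lengths[-1]  # not reached: the loop always covers the full total


if __name__ == "__main__":
    # Toy assembly (lengths in kb): 60 + 50 = 110 >= 125 / 2.
    print(n50([60, 50, 10, 5]))
```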

6.
If a healthy stable host population at the disease-free equilibrium is subject to the Allee effect, can a small number of infected individuals with a fatal disease cause the host population to go extinct? That is, does the Allee effect matter at high densities? To answer this question, we use a susceptible-infected epidemic model to obtain model parameters that lead to host population persistence (with or without infected individuals) and to host extinction. We prove that the presence of an Allee effect in host demographics matters even at large population densities. We show that a small perturbation to the disease-free equilibrium can eventually lead to host population extinction. In addition, we prove that additional deaths due to a fatal infectious disease effectively increase the Allee threshold of the host population demographics.

7.
It is widely acknowledged that genome-wide association studies (GWAS) of complex human disease fail to explain a large portion of heritability, primarily due to lack of statistical power—a problem that is exacerbated when seeking detection of interactions of multiple genomic loci. An untapped source of information that is already widely available, and that is expected to grow in coming years, is population samples. Such samples contain genetic marker data for additional individuals, but not their relevant phenotypes. In this article we develop a highly efficient testing framework based on a constrained maximum-likelihood estimate in a case–control–population setting. We leverage the available population data and optional modeling assumptions, such as Hardy–Weinberg equilibrium (HWE) in the population and linkage equilibrium (LE) between distal loci, to substantially improve power of association and interaction tests. We demonstrate, via simulation and application to actual GWAS data sets, that our approach is substantially more powerful and robust than standard testing approaches that ignore or make naive use of the population sample. We report several novel and credible pairwise interactions, in bipolar disorder, coronary artery disease, Crohn’s disease, and rheumatoid arthritis.

8.
If a healthy stable host population at the disease-free equilibrium is subject to the Allee effect, can a small number of infected individuals with a fatal disease cause the host population to go extinct? That is, does the Allee effect matter at high densities? To answer this question, we use a susceptible–infected epidemic model to obtain model parameters that lead to host population persistence (with or without infected individuals) and to host extinction. We prove that the presence of an Allee effect in host demographics matters even at large population densities. We show that a small perturbation to the disease-free equilibrium can eventually lead to host population extinction. In addition, we prove that additional deaths due to a fatal infectious disease effectively increase the Allee threshold of the host population demographics.

9.
Although approaches for performing genome‐wide association studies (GWAS) are well developed, conventional GWAS requires high‐density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP‐GWAS (extreme‐phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well‐characterized kernel row number trait, which was selected to enable comparisons between the results of XP‐GWAS and conventional GWAS. An exome‐sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait‐associated variants were significantly enriched in regions identified by conventional GWAS. XP‐GWAS was able to resolve several linked QTL and detect trait‐associated variants within a single gene under a QTL peak. XP‐GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest.
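The core comparison in XP-GWAS is between allele frequencies estimated in the two extreme pools. The Python sketch below is a simplified stand-in for that comparison, not necessarily the test used in the study. It estimates each pool's allele frequency from sequencing read counts and compares the two with a two-proportion z-test. Every read is treated as an independent draw, which ignores unequal DNA contributions and sequencing error, so real analyses generally need a more conservative model. All counts are hypothetical.

```python
import math
from typing import Tuple


def pooled_frequency_z_test(
    alt_high: int, depth_high: int, alt_low: int, depth_low: int
) -> Tuple[float, float, float, float]:
    """Compare alternative-allele frequencies between a high and a low extreme pool.

    Returns (freq_high, freq_low, z, two_sided_p). Reads are treated as
    independent binomial draws, which is a simplifying assumption.
    """
    if depth_high <= 0 or depth_low <= 0:
        raise ValueError("read depths must be positive")
    freq_high = alt_high / depth_high
    freq_low = alt_low / depth_low
    pooled = (alt_high + alt_low) / (depth_high + depth_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / depth_high + 1 / depth_low))
    if se == 0:
        # Monomorphic across both pools: no evidence of a difference.
        return freq_high, freq_low, 0.0, 1.0
    z = (freq_high - freq_low) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return freq_high, freq_low, z, p_value


if __name__ == "__main__":
    # Hypothetical variant sequenced at 60x depth in each pool.
    print(pooled_frequency_z_test(42, 60, 18, 60))
```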

10.
Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association. Tests of HWE are commonly performed using a simple χ² goodness-of-fit test. We show that this χ² test can have inflated type I error rates, even in relatively large samples (e.g., samples of 1,000 individuals that include approximately 100 copies of the minor allele). On the basis of previous work, we describe exact tests of HWE together with efficient computational methods for their implementation. Our methods adequately control type I error in large and small samples and are computationally efficient. They have been implemented in freely available code that will be useful for quality assessment of genotype data and for the detection of genetic association or population stratification in very large data sets.
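An exact test of HWE for a biallelic SNP can be computed without enumerating genotype tables, using a recurrence over possible heterozygote counts that conditions on the observed allele counts. The Python function below is a minimal sketch of that widely used recurrence. It was written here from scratch and is not taken from the authors' distributed code. It returns the two-sided p-value: the total probability of all heterozygote counts that are no more likely than the one observed.

```python
from typing import List


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact two-sided test of Hardy-Weinberg equilibrium for one biallelic SNP."""
    if min(n_het, n_hom1, n_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("no genotypes")
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the minor allele
    probs: List[float] = [0.0] * (rare + 1)  # unnormalized P(het count)

    # Start near the expected heterozygote count; it must share parity with `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down: each step trades two hets for one homozygote of each kind.
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hom_r += 1
        hom_c += 1
        het -= 2

    # Walk up: each step trades one homozygote of each kind for two hets.
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        hom_r -= 1
        hom_c -= 1
        het += 2

    total = sum(probs)
    observed = probs[n_het]
    p_value = sum(p for p in probs if p <= observed) / total
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical SNP with a marked heterozygote deficit.
    print(hwe_exact_p(20, 60, 20))
```

The recurrence only ever multiplies neighbouring probability ratios, so it avoids the large factorials of the direct formula. This is what keeps the test cheap enough to run on every SNP in a large data set.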

11.
Over the past two decades many quantitative trait loci (QTL) have been detected; however, very few have been incorporated into breeding programs. The recent development of genome-wide association studies (GWAS) in plants provides the opportunity to detect QTL in germplasm collections such as unstructured populations from breeding programs. The overall goal of the barley Coordinated Agricultural Project was to conduct GWAS with the intent to couple QTL detection and breeding. The basic idea is that breeding programs generate a vast amount of phenotypic data and, combined with cheap genotyping, it should be possible to use GWAS to detect QTL that would be immediately accessible and used by breeding programs. There are several constraints to using breeding program-derived phenotype data for conducting GWAS, namely limited population size and unbalanced data sets. We chose the highly heritable trait heading date to study these two variables. We examined 766 spring barley breeding lines (panel #1) grown in balanced trials and a subset of 384 spring barley breeding lines (panel #2) grown in balanced and unbalanced trials. In panel #1, we detected three major QTL for heading date that had been detected in previous bi-parental mapping studies. Simulation studies showed that population sizes greater than 384 individuals are required to consistently detect QTL. We also showed that unbalanced data sets from panel #2 can be used to detect the three major QTL. However, unbalanced data sets resulted in an increase in the false-positive rate. Interestingly, one-step analysis performed better than two-step analysis in reducing the false-positive rate. The results of this work show that it is possible to use phenotypic data from breeding programs to detect QTL, but that careful consideration of population size and experimental design is required.

12.
Patterns of genetic diversity have previously been shown to mirror geography on a global scale and within continents and individual countries. Using genome-wide SNP data on 5174 Swedes with extensive geographical coverage, we analyzed the genetic structure of the Swedish population. We observed strong differences between the far northern counties and the remaining counties. The population of Dalarna county, in north middle Sweden, which borders southern Norway, also appears to differ markedly from other counties, possibly because this county has more individuals with remote Finnish or Norwegian ancestry than other counties. An analysis of genetic differentiation (based on pairwise F_ST) indicated that the populations of Sweden's southernmost counties are genetically closer to the HapMap CEU samples of Northern European ancestry than to the populations of Sweden's northernmost counties. In a comparison of extended homozygous segments, we detected a clear divide between southern and northern Sweden, with small differences between the southern counties and considerably more segments in northern Sweden. Both the increased degree of homozygosity in the north and the large genetic differences between the south and the north may have arisen due to a small population in the north and the vast geographical distances between towns and villages there, in contrast to the more densely settled southern parts of Sweden. Our findings have implications for future genome-wide association studies (GWAS) with respect to the matching of cases and controls and the need for within-county matching. We have shown that genetic differences within a single country may be substantial, even when viewed on a European scale. Thus, population stratification needs to be accounted for even within a country like Sweden, which is often perceived to be relatively homogeneous and a favourable resource for genetic mapping; otherwise, inferences based on genetic data may lead to false conclusions.
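The abstract does not name the F_ST estimator it used. For illustration only, the Python sketch below implements one common choice: Hudson's estimator, combined across SNPs as a ratio of averages. It takes population allele frequencies and the number of sampled chromosomes as input. The input values are hypothetical.

```python
from typing import Sequence


def hudson_fst(
    freqs_a: Sequence[float], freqs_b: Sequence[float], n_a: int, n_b: int
) -> float:
    """Hudson F_ST across SNPs, combined as a ratio of averages.

    freqs_a / freqs_b: frequency of the same allele at each SNP in the two populations.
    n_a / n_b: number of sampled chromosomes (2 x individuals for diploids), > 1.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency vectors must have equal length")
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    numerator = denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite-sample noise.
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n_a - 1) - p2 * (1 - p2) / (n_b - 1)
        # Probability that two alleles, one from each population, differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no variable SNPs")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical frequencies for a southern and a northern county sample.
    print(hudson_fst([0.10, 0.45, 0.80], [0.15, 0.40, 0.70], 400, 300))
```

Small negative values, such as those for two identical populations, arise from the sample-size correction and are usually read as no detectable differentiation.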

13.
The increasing affordability of sequencing and genotyping technologies has transformed the field of molecular ecology in recent decades. By correlating marker variants with trait variation using association analysis, large‐scale genotyping and phenotyping of individuals from wild populations has enabled the identification of genomic regions that contribute to phenotypic differences among individuals. Such “gene mapping” studies are enabling us to better predict evolutionary potential and the ability of populations to adapt to challenges, such as a changing environment. These studies are also allowing us to gain insight into the evolutionary processes maintaining variation in natural populations, to better understand genotype‐by‐environment and epistatic interactions and to track the dynamics of allele frequency change at loci contributing to traits under selection. Gene mapping in the wild using genome‐wide association scans (GWAS) does, however, come with a number of methodological challenges, not least the population structure in space and time inherent to natural populations. We here provide an overview of these challenges, summarize the exciting methodological advances and applications of association mapping in natural populations reported in this special issue and provide some guidelines for future “wild GWAS” research.

14.
Kostem E, Lozano JA, Eskin E. Genetics. 2011;188(2):449-460.
Genome-wide association studies (GWASs) have been effective in identifying genomic regions associated with disease traits. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost-effectively, with the goal of giving investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method outperforms the traditional correlation- and distance-based follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.
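The abstract does not describe the selection method itself. As a deliberately naive baseline, the sketch below scores each untyped candidate SNP by the largest value of |r * z_tag| over the typed tag SNPs, then keeps the top k candidates for genotyping. Here r is the reference-panel LD correlation between a tag and the candidate, and z_tag is the tag's association statistic. This is a hypothetical heuristic, not the authors' algorithm, which is available at the URL above. The SNP names and values are invented.

```python
from typing import Dict, List, Tuple


def rank_follow_up_snps(
    tag_z: Dict[str, float], ld_r: Dict[Tuple[str, str], float], k: int
) -> List[str]:
    """Rank untyped candidates by the largest |r * z_tag| over tag SNPs (naive heuristic).

    tag_z maps each tag SNP to its observed association z-score.
    ld_r maps (tag, candidate) pairs to their LD correlation r in a reference panel.
    """
    scores: Dict[str, float] = {}
    for (tag, candidate), r in ld_r.items():
        if tag not in tag_z:
            continue  # no observed statistic for this tag
        projected = abs(r * tag_z[tag])  # crude guess at the candidate's |z|
        scores[candidate] = max(scores.get(candidate, 0.0), projected)
    return sorted(scores, key=lambda c: scores[c], reverse=True)[:k]


if __name__ == "__main__":
    tags = {"t1": 6.0, "t2": 2.0}
    ld = {("t1", "c1"): 0.9, ("t1", "c2"): 0.3, ("t2", "c3"): 0.95}
    print(rank_follow_up_snps(tags, ld, 2))
```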

15.
Small, isolated populations are constantly threatened by loss of genetic diversity due to drift. Such situations are found, for instance, in laboratory culturing. In guarding against diversity loss, monitoring of potential changes in population structure is paramount; this monitoring is most often achieved using microsatellite markers, which can be costly in terms of time and money when many loci are scored in large numbers of individuals. Here, we present a case study reducing the number of microsatellites to the minimum necessary to correctly detect the population structure of two Drosophila nigrosparsa populations. The number of loci was gradually reduced from 11 to 1, using the Allelic Richness (AR) and Private Allelic Richness (PAR) as criteria for locus removal. The effect of each reduction step was evaluated by the number of genetic clusters detectable from the data and by the allocation of individuals to the clusters; in the latter, excluding ambiguous individuals was tested to reduce the rate of incorrect assignments. We demonstrate that more than 95% of the individuals can still be correctly assigned when using eight loci and that the major population structure is still visible when using two highly polymorphic loci. The differences between sorting the loci by AR and PAR were negligible. The method presented here will most efficiently reduce genotyping costs when small sets of loci (“core sets”) for long-time use in large-scale population screenings are compiled.
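Assuming the usual rarefaction definition of allelic richness (El Mousadik and Petit 1996), AR is the expected number of distinct alleles in a random subsample of g gene copies. Here g is typically the smallest sample size among the populations compared. The minimal Python sketch below computes this quantity and ranks loci by it. It assumes that the least allele-rich loci are the ones dropped first, which is one plausible reading of the removal criterion. Private allelic richness needs a related but more involved rarefaction and is not shown. Allele counts are hypothetical.

```python
import math
from typing import Mapping


def allelic_richness(allele_counts: Mapping[str, int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts maps each allele to its count among the N sampled gene copies.
    Uses the rarefaction formula sum_i [1 - C(N - N_i, g) / C(N, g)].
    """
    total = sum(allele_counts.values())
    if not 0 < g <= total:
        raise ValueError("g must be between 1 and the number of sampled gene copies")
    all_subsets = math.comb(total, g)
    return sum(
        1.0 - math.comb(total - n_i, g) / all_subsets  # P(allele i appears in subsample)
        for n_i in allele_counts.values()
        if n_i > 0
    )


if __name__ == "__main__":
    # Hypothetical microsatellite loci: allele (repeat length) -> count.
    loci = {
        "locus_A": {"150": 9, "152": 1},
        "locus_B": {"201": 4, "205": 3, "209": 3},
    }
    g = 6  # common rarefied sample size
    ranked = sorted(loci, key=lambda name: allelic_richness(loci[name], g), reverse=True)
    print(ranked)  # least allele-rich loci come last and would be removed first
```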

16.
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program, and here we report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, the statistical method used to build the GS model, the number of markers, and the trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best-performing statistical method for grain yield, where no large-effect QTL were detected by GWAS, while for flowering time, where a single very large-effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.
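RR-BLUP can be viewed as ridge regression of the phenotype on all marker genotypes, with every marker effect shrunk by the same penalty lambda (the ratio of residual variance to marker-effect variance). The standard-library Python sketch below fits that model by solving (X'X + lambda*I) beta = X'y on centered data, with an unpenalized intercept. It assumes lambda is already known; in practice it is usually estimated, for example by REML. The five-fold cross-validation is omitted. This is not the software used in the study, and the toy data are invented.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lambda")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rr_blup_fit(
    genotypes: Sequence[Sequence[float]], phenotypes: Sequence[float], lam: float
) -> Tuple[float, List[float]]:
    """Ridge regression of phenotypes on markers; returns (intercept, marker effects)."""
    n, p = len(genotypes), len(genotypes[0])
    y_mean = sum(phenotypes) / n
    x_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    xc = [[row[j] - x_means[j] for j in range(p)] for row in genotypes]
    yc = [y - y_mean for y in phenotypes]
    # Normal equations with the same ridge penalty added to every marker.
    lhs = [
        [sum(xc[i][a] * xc[i][b] for i in range(n)) + (lam if a == b else 0.0) for b in range(p)]
        for a in range(p)
    ]
    rhs = [sum(xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    effects = _solve(lhs, rhs)
    intercept = y_mean - sum(mu * e for mu, e in zip(x_means, effects))
    return intercept, effects


def predict(intercept: float, effects: Sequence[float], genotype: Sequence[float]) -> float:
    """Genomic estimated breeding value for one line."""
    return intercept + sum(e * g for e, g in zip(effects, genotype))
```

Larger lambda values shrink all marker effects toward zero. That suits traits without large-effect QTL, such as grain yield above, and is one reason a single large-effect QTL can favour other methods.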

17.
Capsaicinoids are unique compounds produced only in peppers (Capsicum spp.). Several studies using classical quantitative trait loci (QTLs) mapping and genomewide association studies (GWAS) have identified QTLs controlling capsaicinoid content in peppers; however, neither the QTLs common to each population nor the candidate genes underlying them have been identified due to the limitations of each approach used. Here, we performed QTL mapping and GWAS for capsaicinoid content in peppers using two recombinant inbred line (RIL) populations and one GWAS population. Whole‐genome resequencing and genotyping by sequencing (GBS) were used to construct high‐density single nucleotide polymorphism (SNP) maps. Five QTL regions on chromosomes 1, 2, 3, 4 and 10 were commonly identified in both RIL populations over multiple locations and years. Furthermore, a total of 109 610 SNPs derived from two GBS libraries were used to analyse the GWAS population consisting of 208 C. annuum‐clade accessions. A total of 69 QTL regions were identified from the GWAS, 10 of which were co‐located with the QTLs identified from the two biparental populations. Within these regions, we were able to identify five candidate genes known to be involved in capsaicinoid biosynthesis. Our results demonstrate that QTL mapping and GBS‐GWAS represent a powerful combined approach for the identification of loci controlling complex traits.

18.
The advent of the pangenome era has unraveled previously unknown genetic variation existing within diverse crop plants, including rice. This untapped genetic variation is believed to account for a major portion of the phenotypic variation in crop plants. However, conventional single-reference-guided genotyping often fails to capture a large portion of this genetic variation, leading to reference bias. This makes it difficult to identify and utilize novel population- or cultivar-specific genes for crop improvement. Thus, we developed a Rice Pangenome Genotyping Array (RPGA) harboring probes assaying 80K single-nucleotide polymorphisms (SNPs) and presence–absence variants spanning the entire 3K rice pangenome. This array provides a simple, user-friendly and cost-effective (60–80 USD per sample) solution for rapid pangenome-based genotyping in rice. The genome-wide association study (GWAS) conducted using RPGA-SNP genotyping data of a rice diversity panel detected a total of 42 loci, including previously known as well as novel genomic loci regulating grain size/weight traits in rice. Eight of these trait-associated loci (dispensable loci) could not be detected with conventional single-reference-genome-based GWAS. A WD repeat-containing PROTEIN 12 gene underlying one such dispensable locus on chromosome 7 (qLWR7), along with other non-dispensable loci, was subsequently detected using high-resolution quantitative trait loci mapping, confirming the authenticity of RPGA-led GWAS. This demonstrates the potential of RPGA-based genotyping to overcome reference bias. The application of RPGA-based genotyping for population structure analysis, hybridity testing, ultra-high-density genetic map construction and chromosome-level genome assembly, and marker-assisted selection was also demonstrated.
A web application (http://www.rpgaweb.com) was further developed to provide an easy-to-use platform for the imputation of RPGA-based genotyping data using the 3K rice reference panel and subsequent GWAS.

19.
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control the effects of population structure, which may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure in GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, using both simulations and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure in GEI statistics. We find that our approach effectively controls for population structure in statistics for GEIs as well as for genetic variants.

20.
Previous genome-wide association studies (GWAS) have shown several risk alleles to be associated with breast cancer. However, the variants identified so far contribute to only a small proportion of disease risk. The objective of our GWAS was to identify additional novel breast cancer susceptibility variants and to replicate these findings in an independent cohort. We performed a two-stage association study in a cohort of 3,064 women from Alberta, Canada. In Stage I, we interrogated 906,600 single nucleotide polymorphisms (SNPs) on Affymetrix SNP 6.0 arrays using 348 breast cancer cases and 348 controls. We used single-locus association tests to determine statistical significance for the observed differences in allele frequencies between cases and controls. In Stage II, we attempted to replicate 35 significant markers identified in Stage I in an independent study of 1,153 cases and 1,215 controls. Genotyping of Stage II samples was done using the Sequenom MassARRAY iPLEX platform. Six loci from four different gene regions (chromosomes 4, 5, 16 and 19) showed statistically significant differences between cases and controls in both Stage I and Stage II testing, and also in joint analysis. The identified variants were from the EDNRA, ROPN1L, C16orf61 and ZNF577 gene regions. The presented joint analyses from the two-stage study design were not significant after genome-wide correction. The SNPs identified in this study may serve as potential candidate loci for breast cancer risk in a further Stage III replication study in the Alberta population or for independent validation in Caucasian cohorts elsewhere.
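The Stage I comparison of allele frequencies between cases and controls can be illustrated with the standard allelic (allele-count) chi-square test, which is one common single-locus test. The abstract does not state which test was used, so this choice is an assumption. The sketch also computes a Bonferroni threshold for the 906,600 SNPs as one simple form of genome-wide correction, which may differ from the correction applied in the study. Counts are hypothetical.

```python
import math

# Bonferroni-corrected threshold for the 906,600 SNPs interrogated in Stage I.
GENOME_WIDE_ALPHA = 0.05 / 906_600


def allelic_chi2_test(case_alt: int, case_alleles: int, ctrl_alt: int, ctrl_alleles: int):
    """1-df chi-square test on the 2x2 table of allele counts (cases vs controls).

    Counts are allele copies, i.e. two per diploid individual.
    Returns (chi2, p_value).
    """
    table = [
        [case_alt, case_alleles - case_alt],
        [ctrl_alt, ctrl_alleles - ctrl_alt],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("empty table")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("degenerate table: an allele or a group has no counts")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical SNP: 348 cases and 348 controls give 696 allele copies per group.
    chi2, p = allelic_chi2_test(230, 696, 180, 696)
    print(f"chi2={chi2:.2f} p={p:.2e} genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

A nominally significant Stage I SNP can fall far short of a threshold near 5.5e-8. This is why a second stage and a joint analysis are needed before a variant can be reported with confidence.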
