Similar articles
Found 20 similar articles; search took 15 ms
1.
The level of population structure and the extent of linkage disequilibrium (LD) can have large impacts on the power, resolution, and design of genome-wide association studies (GWAS) in plants. Until recently, LD and population structure had not been explored in oat due to the lack of a high-throughput, high-density marker system. The objectives of this research were to survey the level of population structure and the extent of LD in oat germplasm and determine their implications for GWAS. In total, 1,205 lines and 402 diversity array technology (DArT) markers were used to explore population structure. Principal component analysis and model-based cluster analysis of these data indicated that, for the lines used in this study, relatively weak population structure exists. To explore LD decay, map distances of 2,225 linked DArT marker pairs were compared with LD (estimated as r²). Results showed that LD between linked markers decayed rapidly to r² = 0.2 for marker pairs with a map distance of 1.0 centimorgan (cM). For GWAS, we suggest a minimum of one marker every cM, but higher densities of markers should increase marker-QTL association and therefore detection power. Additionally, it was found that LD was relatively consistent across the majority of germplasm clusters. These findings suggest that GWAS in oat can include germplasm with diverse origins and backgrounds. The results from this research demonstrate the feasibility of GWAS and related analyses in oat.
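As an illustration of the LD measure quoted in this abstract, the following is a minimal sketch of how r² between two biallelic loci can be computed from phased haplotypes. It uses the textbook definition r² = D² / (pA(1 − pA) pB(1 − pB)); the function name `r_squared` and the 0/1 allele coding are assumptions made for the example, and the study itself may have estimated r² differently (DArT markers are dominant).

```python
def r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, one per phased haplotype,
    with alleles coded 0/1 at locus A and locus B (hypothetical coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

Plotting such pairwise values against map distance is one common way to read off an LD decay distance like the 1.0 cM reported above.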

2.
Perennial ryegrass (Lolium perenne L.) is a highly valued temperate climate grass species grown as a forage crop and for amenity uses. Due to its outbreeding nature and recent domestication, a high degree of genetic diversity is expected among cultivars. The aim of this study was to assess the extent of linkage disequilibrium (LD) within European elite germplasm and to evaluate the appropriate methodology for genetic association mapping in perennial ryegrass. A high level of genetic diversity was observed in a set of 380 perennial ryegrass elite genotypes when genotyped with 40 SSRs and 2 STS markers. A Bayesian structure analysis identified two subpopulations, which were confirmed by principal coordinate analysis (PCoA). One subpopulation consisted mainly of genotypes originating from the UK, while germplasm mostly from Continental Europe was grouped into the second subpopulation. LD (r²) decay was rapid and occurred within 0.4 cM across European varieties, when population structure was taken into consideration. However, an extended LD of up to 6.6 cM was detected within the variety Aberdart. High genetic diversity and rapid LD decay provide means for high resolution association mapping in elite materials of perennial ryegrass. However, different strategies need to be applied depending on the material used. Genome-wide association study (GWAS) with several hundred markers can be applied within synthetic varieties to identify large (up to 10 cM) genomic regions affecting trait variation. A combination of available and novel DNA markers is needed to achieve the resolution required for GWAS in elite breeding materials. An even higher marker density of several million SNPs might be needed for GWAS in diverse ecotype collections, potentially resulting in quantitative trait polymorphism (QTP) identification.

3.
Recent advances in sequencing allow population‐genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction‐site‐associated DNA sequence (RAD‐seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well‐characterized single nucleotide polymorphism (SNP) data set from 21 three‐spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single‐outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population‐genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population‐demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population‐genomic data set, making it especially valuable for nonmodel species.
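The network idea in this abstract (loci as vertices, LD as connections) can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the LDna algorithm itself: it keeps an edge only when pairwise r² meets a fixed threshold and reports the connected components. The function name `ld_clusters`, the dictionary input format, and the single fixed threshold are all assumptions made for illustration.

```python
def ld_clusters(pairwise_r2, threshold):
    """Group loci into connected components of a thresholded LD graph.

    `pairwise_r2` maps (locus_i, locus_j) -> r^2 for each measured pair.
    An edge is kept only when r^2 >= threshold (a simplifying assumption).
    """
    graph = {}
    for (i, j), r2 in pairwise_r2.items():
        # Register both loci as vertices even if the edge is dropped.
        graph.setdefault(i, set())
        graph.setdefault(j, set())
        if r2 >= threshold:
            graph[i].add(j)
            graph[j].add(i)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(component))
    return clusters
```

For example, with loci a to e where a–b, b–c and d–e are in high LD, a threshold of 0.8 yields two clusters, one per group of tightly linked loci.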

4.
Information about the extent and genomic distribution of linkage disequilibrium (LD) is of fundamental importance for association mapping. The main objectives of this study were to (1) investigate genetic diversity within germplasm groups of elite European maize (Zea mays L.) inbred lines, (2) examine the population structure of elite European maize germplasm, and (3) determine the extent and genomic distribution of LD between pairs of simple sequence repeat (SSR) markers. We examined genetic diversity and LD in a cross section of European and US elite breeding material comprising 147 inbred lines genotyped with 100 SSR markers. For gene diversity within each group, significant (P<0.05) differences existed among the groups. The LD was significant (P<0.05) for 49% of the SSR marker pairs in the 80 flint lines and for 56% of the SSR marker pairs in the 57 dent lines. The ratio of linked to unlinked loci in LD was 1.1 for both germplasm groups. The high incidence of LD suggests that the extent of LD between SSR markers should allow the detection of marker-phenotype associations in a genome scan. However, our results also indicate that a high proportion of the observed LD is generated by forces, such as relatedness, population stratification, and genetic drift, which cause a high risk of detecting false positives in association mapping.

5.
A major goal in evolutionary biology is to understand the genetic basis of adaptive traits. In migratory birds, wing morphology is such a trait. Our previous work on the great reed warbler (Acrocephalus arundinaceus) shows that wing length is highly heritable and under sexually antagonistic selection. Moreover, a quantitative trait locus (QTL) mapping analysis detected a pronounced QTL for wing length on chromosome 2, suggesting that wing morphology is partly controlled by genes with large effects. Here, we re‐evaluate the genetic basis of wing length in great reed warblers using a genomewide association study (GWAS) approach based on restriction site‐associated DNA sequencing (RADseq) data. We use GWAS models that account for relatedness between individuals and include covariates (sex, age and tarsus length). The resulting association landscape was flat with no peaks on chromosome 2 or elsewhere, which is in line with expectations for polygenic traits. Analysis of the distribution of p‐values did not reveal biases, and the inflation factor was low. Effect sizes were, however, not uniformly distributed on some chromosomes, and the Z chromosome had weaker associations than autosomes. The level of linkage disequilibrium (LD) in the population decayed to background levels within c. 1 kbp. There could be several reasons why our QTL study and GWAS gave contrasting results, including differences in how associations are modelled (cosegregation in pedigree vs. LD associations), how covariates are accounted for in the models, type of marker used (multi‐ vs. biallelic), difference in power or a combination of these. Our study highlights that the genetic architecture even of highly heritable traits is difficult to characterize in wild populations.

6.
High-density genetic markers are the prerequisite for understanding linkage disequilibrium (LD) and genome-wide association studies (GWASs) of complex traits in crops. To evaluate the LD pattern in oilseed rape, we sequenced a previous association panel containing 189 B. napus inbred lines using double-digested restriction-site associated DNA (ddRAD) and genotyped 19,327 RAD tags. A total of 15,921 RAD tags were assigned to a published genetic linkage map and the majority (71.1%) of these tags were uniquely mapped to the draft reference genome “Darmor-bzh.” The distance of LD decay was 1,214 kb across the genome at the background level (r² = 0.26), with the distances of LD decay being 405 kb and 2,111 kb in the A and C subgenomes, respectively. A total of 361 haplotype blocks with length > 100 kb were identified in the entire genome. The association panel could be classified into two groups, P1 and P2, which are essentially consistent with the geographical origins of varieties. A large number of group-specific haplotypes were identified, reflecting that varieties in the P1 and P2 groups experienced distinct selection in breeding programs to adapt to their different growth habitats. GWAS repeatedly detected two loci significantly associated with oil content of seeds based on the developed SNPs, suggesting that the high-density SNPs were useful for understanding the genetic determinants of complex traits in GWAS.

7.

Background  

Haplotype based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g. Monte-Carlo (Gibbs sampling); EM (expectation maximization); and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data.

8.
Several genes have been suggested as dyslexia candidates. Some of these candidate genes have been recently shown to be associated with literacy measures in sample cohorts derived from the general population. Here, we have conducted an association study in a novel sample derived from the Australian population (the Raine cohort) to further investigate the role of dyslexia candidate genes. We analysed markers, previously reported to be associated with dyslexia, located within the MRPL19/C2ORF3, KIAA0319, DCDC2 and DYX1C1 genes in a sample of 520 individuals and tested them for association with reading and spelling measures. Association signals were detected for several single nucleotide polymorphisms (SNPs) within DYX1C1 with both the reading and spelling tests. The high linkage disequilibrium (LD) we observed across the DYX1C1 gene suggests that the association signal might not be refined by further genetic mapping.

9.
Genome-wide association studies (GWAS) have successfully identified susceptibility loci from marginal association analysis of SNPs. Valuable insight into genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to further develop gene set enrichment analysis methods, which are initiated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a “gene set ridge regression in association studies (GRASS)” algorithm. GRASS summarizes the genetic structure for each gene as eigenSNPs and uses a novel form of regularized regression technique, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD in the same gene. We show by simulation that this algorithm performs well in situations in which there are a large number of predictors compared to sample size. We applied the GRASS algorithm to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the top two significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology.

10.
11.
12.
In the era of big data, univariate models have been widely used as a workhorse tool for quickly producing marginal estimators, even in a high-dimensional dense setting in which many features are “true” but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of LD pattern on the GWAS marginal estimator and investigate whether or not additionally accounting for the LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators, including the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we uncover that the relative performance of the GWAS marginal and BLUP estimators depends strongly on the ratio of GWAS sample size to the number of genetic variants. In particular, our finding reveals that the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. On the other hand, the BLUP estimator has substantially better performance than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for the LD (such as in the BLUP) is most needed when GWAS sample size is large. We illustrate the importance of our results by using simulated data and real GWAS.

13.
The hemochromatosis gene (HFE) maps to 6p21.3, in close linkage with the HLA Class I genes. Linkage disequilibrium (LD) studies were designed to narrow down the most likely candidate region for HFE, as an alternative to traditional linkage analysis. However, both the HLA-A and D6S105 subregions, which are situated 2–3 cM and approximately 3 Mb apart, have been suggested to contain HFE. The present report extends our previous study based upon the analysis of a large number of HFE and normal chromosomes from 66 families of Breton ancestry. In addition to the previously used RFLP markers spanning the 400-kb surrounding HLA-A, we examined three microsatellites: D6S510, HLA-F, and D6S105. Our combined data not only confirm a peak of LD at D6S105, but also reveal a complex pattern of LD over the i82 to D6S105 interval. Within our ethnically well-defined population of Brittany, the association of HFE with D6S105 is as great as that with HLA-A, while the internal markers display a lower LD. Fine haplotype analysis enabled us to identify two categories of haplotypes segregating with HFE. In contrast to the vast majority of normal haplotypes, 50% of HFE haplotypes are completely conserved over the HLA-A to D6S105 interval. These haplotypes could have been conserved through recombination suppression, selective forces and/or other evolutionary factors. This particular haplotypic configuration might account for the apparent inconsistencies between genetic linkage and LD data, and additionally greatly complicates positional cloning of HFE through disequilibrium mapping. (The authors contributed equally to this work.)

14.
We present here the first study of linkage disequilibrium (LD) in cultivated grapevine, Vitis vinifera L. subsp. vinifera (sativa), an outcrossing highly heterozygous perennial species. Our goal was to characterize the amount and pattern of LD at the scale of a few centimorgans (cM) between 38 microsatellite loci located on five linkage groups, in order to assess its origin and potential applications. We used a core collection of 141 cultivars representing the diversity of the cultivated compartment. LD was evaluated with both independence tests and multilocus r², both on raw genotypic and reconstructed haplotypic data. Significant genotypic LD was found only within linkage groups, extending up to 16.8 cM. It appeared not to be influenced by the weak structure of the sample and seemed to be mainly of haplotypic origin. Significant haplotypic LD was found over 30 cM. Both genotypic and haplotypic r² values declined to around 0.1 within 5–10 cM, suggesting a rather narrow genetic base of the cultivated compartment and limited recombination since domestication events. These first results open up a few application opportunities for association mapping of QTLs and marker assisted selection.

15.
The narrow genetic base of cultivated cotton germplasm is hindering cotton productivity worldwide. Although potential genetic diversity exists in the Gossypium genus, it is largely ‘underutilized’ due to photoperiodism and the lack of innovative tools to overcome such challenges. The application of linkage disequilibrium (LD)-based association mapping is an alternative powerful molecular tool to dissect and exploit the natural genetic diversity conserved within cotton germplasm collections, greatly accelerating still ‘lagging’ cotton marker-assisted selection (MAS) programs. However, the extent of genome-wide LD has not been determined in cotton. We report the extent of genome-wide LD and association mapping of fiber quality traits by using a core set of 95 microsatellite markers in a total of 285 exotic Gossypium hirsutum accessions, comprising 208 landrace stocks and 77 photoperiodic variety accessions. We demonstrated the existence of useful genetic diversity within exotic cotton germplasm. In this germplasm set, 11–12% of SSR loci pairs revealed a significant LD. At the significance threshold (r² ≥ 0.1), the genome-wide average LD declined within a genetic distance of < 10 cM in the landrace stock germplasm and > 30 cM in the variety germplasm. Genome-wide LD at r² ≥ 0.2 was reduced on average to 1–2 cM in the landrace stock germplasm and 6–8 cM in the variety germplasm, providing evidence of the potential for association mapping of agronomically important traits in cotton. We observed significant population structure and relatedness in the assayed germplasm. Consequently, the application of the mixed linear model (MLM), considering both kinship (K) and population structure (Q), detected between 6% and 13% of SSR markers associated with the main fiber quality traits in cotton.
Our results highlight for the first time the feasibility and potential of association mapping, with consideration of the population structure and stratification existing in cotton germplasm resources. The number of SSR markers associated with fiber quality traits in diverse cotton germplasm, which broadly covered many historical meiotic events, should be useful to effectively exploit potentially new genetic variation by using MAS programs.

16.
Although linkage maps are important tools in evolutionary biology, their availability for wild populations is limited. The population of song sparrows (Melospiza melodia) on Mandarte Island, Canada, is among the more intensively studied wild animal populations. Its long‐term pedigree data, together with extensive genetic sampling, have allowed the study of a range of questions in evolutionary biology and ecology. However, the availability of genetic markers has been limited. We here describe 191 new microsatellite loci, including 160 high‐quality polymorphic autosomal, 7 Z‐linked and 1 W‐linked markers. We used these markers to construct a linkage map for song sparrows with a total sex‐averaged map length of 1731 cM and covering 35 linkage groups, and hence, these markers cover most of the 38–40 chromosomes. Female and male map lengths did not differ significantly. We then bioinformatically mapped these loci to the zebra finch (Taeniopygia guttata) genome and found that linkage groups were conserved between song sparrows and zebra finches. Compared to the zebra finch, marker order within small linkage groups was well conserved, whereas the larger linkage groups showed some intrachromosomal rearrangements. Finally, we show that as expected, recombination frequency between linked loci explained the majority of variation in gametic phase disequilibrium. Yet, there was substantial overlap in gametic phase disequilibrium between pairs of linked and unlinked loci. Given that the microsatellites described here lie on 35 of the 38–40 chromosomes, these markers will be useful for studies in this species, as well as for comparative genomics studies with other species.

17.
Population genetics of genomics-based crop improvement methods   (Total citations: 1; self-citations: 0; citations by others: 1)
Many genome-wide association studies (GWAS) in humans are concluding that, even with very large sample sizes and high marker densities, most of the genetic basis of complex traits may remain unexplained. At the same time, recent research in plant GWAS is showing much greater success with fewer resources. Both GWAS and genomic selection (GS), a method for predicting phenotypes by the use of genome-wide marker data, are receiving considerable attention among plant breeders. In this review we explore how differences in population genetic histories, as well as past selection for traits of interest, have produced trait architectures and patterns of linkage disequilibrium (LD) that frequently differ dramatically between domesticated plants and humans, making detection of quantitative trait loci (QTL) effects in crops more rewarding and less costly than in humans.

18.
Knowledge about the forces generating and conserving linkage disequilibrium (LD) is important for drawing conclusions about the prospects and limitations of association mapping. The objectives of our research were to examine the importance of (1) selection, (2) mutation, and (3) genetic drift for generating LD in a typical maize breeding program. We conducted computer simulations based on genotypic data of Central European maize open-pollinated varieties which have played an important role as founders of the European flint heterotic group. The breeding scheme and the dimensioning underlying our simulations essentially reflect the maize breeding program of the University of Hohenheim. Results suggested that in a plant breeding program of the examined dimension and breeding scheme, genetic drift and selection are major forces generating LD. The currently used population-based association mapping tests do not explicitly correct for LD caused by these two forces. Therefore, increased type I error rates are expected if these tests are applied to plant breeding populations. As a consequence, we recommend using family-based association tests for association mapping approaches in plant breeding populations.

19.
Jiang N, Wang M, Jia T, Wang L, Leach L, Hackett C, Marshall D, Luo Z. PLoS ONE, 2011, 6(8): e23192

Background

It is well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for this influence, either by predicting the structure parameters or by correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation.

Methodology

We propose here a novel statistical method to control spurious LD in GWAS from population structure by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need of structure prediction which may be infeasible or inadequate in practice and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.

Results/Conclusions

The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other widely implemented methods in the GWAS literature.

20.

Background

The advent of genome-wide association studies has led to many novel disease-SNP associations, opening the door to focused study on their biological underpinnings. Because of the importance of analyzing these associations, numerous statistical methods have been devoted to them. However, fewer methods have attempted to associate entire genes or genomic regions with outcomes, which is potentially more useful knowledge from a biological perspective, and those methods currently implemented are often permutation-based.

Results

One property of some permutation-based tests is that their power varies as a function of whether significant markers are in regions of linkage disequilibrium (LD) or not, which we show from a theoretical perspective. We therefore develop two methods for quantifying the degree of association between a genomic region and outcome, both of whose power does not vary as a function of LD structure. One method uses dimension reduction to “filter” redundant information when significant LD exists in the region, while the other, called the summary-statistic test, controls for LD by scaling marker Z-statistics using knowledge of the correlation matrix of markers. An advantage of this latter test is that it does not require the original data, but only their Z-statistics from univariate regressions and an estimate of the correlation structure of markers, and we show how to modify the test to protect the type 1 error rate when the correlation structure of markers is misspecified. We apply these methods to sequence data of oral cleft and compare our results to previously proposed gene tests, in particular permutation-based ones. We evaluate the versatility of the modification of the summary-statistic test since the specification of correlation structure between markers can be inaccurate.

Conclusion

We find a significant association in the sequence data between the 8q24 region and oral cleft using our dimension reduction approach and a borderline significant association using the summary-statistic based approach. We also implement the summary-statistic test using Z-statistics from an already-published GWAS of Chronic Obstructive Pulmonary Disorder (COPD) and correlation structure obtained from HapMap. We experiment with the modification of this test because the correlation structure is assumed imperfectly known.
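To make the idea of scaling marker Z-statistics by their correlation matrix concrete, here is a minimal sketch of one generic way to do it. It is not claimed to be the authors' summary-statistic test: it simply uses the fact that if the Z-statistics are jointly normal with mean zero and correlation matrix R under the null, their sum has variance equal to the sum of all entries of R. The function name `region_z_test` and the list-of-lists input format are assumptions made for the example.

```python
import math


def region_z_test(z_stats, corr):
    """Combine per-marker Z-statistics for a region, accounting for LD.

    `z_stats` is a list of marker Z-statistics from univariate regressions.
    `corr` is the marker correlation (LD) matrix as a list of lists.
    Returns (combined_z, two_sided_p). Illustrative sketch only.
    """
    m = len(z_stats)
    if m == 0 or len(corr) != m or any(len(row) != m for row in corr):
        raise ValueError("corr must be an m x m matrix matching z_stats")
    # Under the null, Var(sum of Z) = sum of all entries of the correlation matrix.
    variance = sum(sum(row) for row in corr)
    if variance <= 0:
        raise ValueError("correlation matrix gives non-positive variance")
    combined_z = sum(z_stats) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(combined_z) / math.sqrt(2))
    return combined_z, p_value
```

With two markers each at Z = 2, independent markers give a combined statistic of about 2.83, whereas two markers in perfect LD give exactly 2, so the redundant signal is not counted twice.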
