Similar literature
 20 similar documents found (search time: 531 ms)
1.
Association mapping is a powerful approach for exploring the molecular basis of phenotypic variations in plants. A maize (Zea mays L.) association mapping panel including 527 inbred lines with tropical, subtropical and temperate backgrounds, representing the global maize diversity, was genotyped using 1,536 single nucleotide polymorphisms (SNPs). In total, 926 SNPs with minor allele frequencies of ≥0.1 were used to estimate the pattern of genetic diversity and relatedness among individuals. The analysis revealed broad phenotypic diversity and complex genetic relatedness in the maize panel. Two different Bayesian approaches identified three specific subpopulations, which were then reconfirmed by principal component analysis (PCA) and tree-based analyses. Marker–trait associations were performed to assess the suitability of different models for false-positive correction by population structure (Q matrix/PCA) and familial kinship (K matrix) alone or in combination in this panel. The K, Q + K and PCA + K models could reduce the false positives, and the Q + K model performed slightly better for flowering time, ear height and ear diameter. Our findings suggest that this maize panel is suitable for association mapping in order to understand the relationship between genotypic and phenotypic variations for agriculturally complex quantitative traits using optimal statistical methods.
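The minor allele frequency (MAF) filter mentioned in this abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1 or 2) with `None` for missing calls; the function names, the coding scheme and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Return the MAF of one biallelic SNP.

    Assumption: each genotype is the count of one allele (0, 1 or 2)
    and None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0  # no data: treat the marker as uninformative
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def filter_by_maf(snps: Dict[str, Sequence[Optional[int]]],
                  threshold: float = 0.1) -> List[str]:
    """Keep the names of SNPs whose MAF is at least `threshold`."""
    return [name for name, calls in snps.items()
            if minor_allele_frequency(calls) >= threshold]


# Hypothetical toy panel: three SNPs scored on a handful of lines.
toy_snps = {
    "snp1": [0, 0, 0, 0, 2],
    "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "snp3": [2, 2, None, 0, 0],
}
kept = filter_by_maf(toy_snps, threshold=0.1)
```

The default threshold of 0.1 mirrors the cutoff quoted in the abstract; other studies in this list use 0.05 or 0.20.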

2.
False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid model over-fitting in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days.

3.
Comparison of SSRs and SNPs in assessment of genetic relatedness in maize   Total citations: 3, self-citations: 0, citations by others: 3
Yang X, Xu Y, Shah T, Li H, Han Z, Li J, Yan J. Genetica, 2011, 139(8): 1045-1054
Advances in high-throughput SNP genotyping and genome sequencing technologies have enabled genome-wide association mapping in dissecting the genetic basis of complex quantitative traits. In this study, 82 SSRs and 884 SNPs with minor allele frequencies (MAF) over 0.20 were used to compare their ability to assess population structure, principal component analysis (PCA) and relative kinship in a maize association panel consisting of 154 inbred lines. Compared to SNPs, SSRs provided more information on genetic diversity. The expected heterozygosity (He) of SSRs and SNPs averaged 0.65 and 0.44, and the polymorphic information content of these two markers was 0.61 and 0.34 in this panel, respectively. Additionally, SSRs performed better at clustering all lines into groups using STRUCTURE and PCA approaches, and estimating relative kinship. For both marker systems, the same clusters were observed based on PCA and the first two eigenvectors accounted for a similar percentage of the genetic variation in this panel. The correlation coefficients of each eigenvector from SSRs and SNPs decreased sharply when the eigenvector varied from 1 to 3, but stayed around 0 for eigenvectors beyond the third. The kinship estimates based on SSRs and SNPs were moderately correlated (r² = 0.69). All these results suggest that SSR markers with moderate density are more informative than SNPs for assessing genetic relatedness in maize association mapping panels.
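The abstract reports expected heterozygosity (He) and polymorphic information content (PIC) without stating the formulas. The sketch below assumes the standard definitions, He = 1 − Σ pᵢ² and PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ are the allele frequencies at one locus; whether the paper used exactly these forms is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# A biallelic SNP is most informative at equal allele frequencies.
snp_he = expected_heterozygosity([0.5, 0.5])
snp_pic = polymorphic_information_content([0.5, 0.5])

# A hypothetical four-allele SSR with equal frequencies scores higher.
ssr_he = expected_heterozygosity([0.25] * 4)
ssr_pic = polymorphic_information_content([0.25] * 4)
```

Under these definitions a biallelic marker cannot exceed He = 0.5 or PIC = 0.375, which is consistent with the SNP averages of 0.44 and 0.34 quoted above and with multi-allelic SSRs reaching higher values.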

4.
Information about the extent and genomic distribution of linkage disequilibrium (LD) is of fundamental importance for association mapping. The main objectives of this study were to (1) investigate genetic diversity within germplasm groups of elite European maize (Zea mays L.) inbred lines, (2) examine the population structure of elite European maize germplasm, and (3) determine the extent and genomic distribution of LD between pairs of simple sequence repeat (SSR) markers. We examined genetic diversity and LD in a cross section of European and US elite breeding material comprising 147 inbred lines genotyped with 100 SSR markers. For gene diversity within each group, significant (P<0.05) differences existed among the groups. The LD was significant (P<0.05) for 49% of the SSR marker pairs in the 80 flint lines and for 56% of the SSR marker pairs in the 57 dent lines. The ratio of linked to unlinked loci in LD was 1.1 for both germplasm groups. The high incidence of LD suggests that the extent of LD between SSR markers should allow the detection of marker-phenotype associations in a genome scan. However, our results also indicate that a high proportion of the observed LD is generated by forces, such as relatedness, population stratification, and genetic drift, which cause a high risk of detecting false positives in association mapping.

5.
We present a new method, termed QBlossoc, for linkage disequilibrium (LD) mapping of genetic variants underlying a quantitative trait. The method uses principles similar to a previously published method, Blossoc, for LD mapping of case/control studies. The method builds local genealogies along the genome and looks for a significant clustering of quantitative trait values in these trees. We analyze its efficiency in terms of localization and ranking of true positives among a large number of negatives and compare the results with single-marker approaches. Simulation results of markers at densities comparable to contemporary genotype chips show that QBlossoc is more accurate in localization of true positives as expected since it uses the additional information of LD between markers simultaneously. More importantly, however, for genomewide surveys, QBlossoc places regions with true positives higher on a ranked list than single-marker approaches, again suggesting that a true signal displays itself more strongly in a set of adjacent markers than a spurious (false) signal. The method is both memory and central processing unit (CPU) efficient. It has been tested on a real data set of height data for 5000 individuals measured at ~317,000 markers and completed analysis within 5 CPU days.

6.
Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data we investigate the feasibility of both approaches and how these affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
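The traditional estimate this abstract contrasts itself with, comparing within- and between-genotype variability, can be sketched as a balanced one-way ANOVA. This is only the classical broad-sense calculation at the plot level, not the marker-based mixed model the authors propose; the function name and the toy data are illustrative assumptions.

```python
from typing import Dict, Sequence


def broad_sense_heritability(replicates: Dict[str, Sequence[float]]) -> float:
    """Plot-level broad-sense heritability from a balanced one-way ANOVA.

    `replicates` maps each genotype to its replicate phenotype values.
    The genetic variance is (MS_between - MS_within) / r, truncated at 0.
    """
    groups = list(replicates.values())
    g = len(groups)
    if g < 2:
        raise ValueError("need at least two genotypes")
    r = len(groups[0])
    if r < 2 or any(len(x) != r for x in groups):
        raise ValueError("need a balanced design with at least two replicates")

    grand_mean = sum(sum(x) for x in groups) / (g * r)
    means = [sum(x) / r for x in groups]
    ms_between = r * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    ms_within = sum((v - m) ** 2
                    for x, m in zip(groups, means) for v in x) / (g * (r - 1))

    var_genetic = max((ms_between - ms_within) / r, 0.0)
    total = var_genetic + ms_within
    return var_genetic / total if total > 0 else 0.0


# Hypothetical trait values: three genotypes, two replicate plots each.
h2 = broad_sense_heritability({"A": [9, 11], "B": [19, 21], "C": [29, 31]})
```

Because every genotype is treated as an unrelated group, this estimate ignores the genetic relatedness that marker-based approaches take into account.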

7.
Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.

8.
Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.

9.
Breeding programs aim to improve the yield and quality of peanut (Arachis hypogaea L.); using association mapping to identify genetic markers linked to these quantitative traits could facilitate selection efficiency. A peanut association panel was established consisting of 268 lines with extensive phenotypic and genetic variation, meeting the requirements for association analysis. These lines were grown over 3 years and the key agronomic traits, including protein and oil content were examined. Population structure (Q) analysis showed two subpopulations and clustering analysis was consistent with Q‐based membership assignment and closely related to botanical type. Relative Kinship (K) indicated that most of the panel members have no or weak familial relatedness, with 52.78% of lines showing K = 0. Linkage disequilibrium (LD) analysis showed a high level of LD occurs in the panel. Model comparisons indicated false positives can be effectively controlled by taking Q and K into consideration and more false positives were generated by K than Q. A preliminary association analysis using a Q + K model found markers significantly associated with oil, protein, oleic acid, and linoleic acid, and identified a set of alleles with positive and negative effects. These results show that this panel is suitable for association analysis, providing a resource for marker‐assisted selection for peanut improvement.

10.
Linkage disequilibrium (LD) refers to the correlation among neighboring alleles, reflecting non-random patterns of association between alleles at (nearby) loci. A better understanding of LD in the porcine genome is of direct relevance for identification of genes and mutations with a certain effect on the traits of interest. Here, 215 SNPs in seven genomic regions were genotyped in individuals of three breeds. Pairwise linkage disequilibrium was calculated for all marker pairs. To estimate the extent of LD, all pairwise LD values were plotted against the distance between the markers. Based on SNP markers in four genomic regions analyzed in three panels from populations of Large White, Dutch Landrace, and Meishan origin, useful LD is estimated to extend for approximately 40 to 60 kb in the porcine genome.
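The procedure described here, computing LD for every marker pair and relating it to physical distance, can be sketched as follows. The sketch assumes phased haplotypes coded 0/1 and uses the common r² statistic, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B; the abstract does not say which LD measure or phasing method was used, so both are assumptions, as are the function names and toy data.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci from phased 0/1 haplotypes."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus: r^2 is undefined, reported as 0 here
    return d * d / denom


def ld_by_distance(positions: Sequence[int],
                   haplotypes: Sequence[Sequence[int]]) -> List[Tuple[int, float]]:
    """Return (distance in bp, r^2) for every marker pair, sorted by distance."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        distance = abs(positions[j] - positions[i])
        pairs.append((distance, r_squared(haplotypes[i], haplotypes[j])))
    return sorted(pairs)


# Hypothetical markers: two in complete LD, one independent of both.
toy_positions = [100, 300, 1000]
toy_haplotypes = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
decay = ld_by_distance(toy_positions, toy_haplotypes)
```

Plotting the resulting pairs, or averaging r² within distance bins, gives the kind of decay curve from which an extent such as 40 to 60 kb is read.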

11.
12.
Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also potentially inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness, and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. However, in contrast, studies where there is a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.

13.
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E. Genetics, 2008, 178(3): 1709-1723
Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.

14.
Technological developments allow increasing numbers of markers to be deployed in case-control studies searching for genetic factors that influence disease susceptibility. However, with vast numbers of markers, true 'hits' may become lost in a sea of false positives. This problem may be particularly acute for infectious diseases, where the control group may contain unexposed individuals with susceptible genotypes. To explore this effect, we used a series of stochastic simulations to model a scenario based loosely on bovine tuberculosis. We find that a candidate gene approach tends to have greater statistical power than studies that use large numbers of single nucleotide polymorphisms (SNPs) in genome-wide association tests, almost regardless of the number of SNPs deployed. With modest sample sizes (250 each of cases and controls), both approaches struggle to detect genetic effects when these are weak or when an appreciable proportion of individuals are unexposed to the disease, but these issues are largely mitigated if sample sizes can be increased to 2000 or more of each class. We conclude that the power of any genotype-phenotype association test will be improved if the sampling strategy takes account of exposure heterogeneity, though this is not necessarily easy to do.

15.
Detecting and localizing selective sweeps on the basis of SNP data has recently received considerable attention. Here we introduce the use of hidden Markov models (HMMs) for the detection of selective sweeps in DNA sequences. Like previously published methods, our HMMs use the site frequency spectrum, and the spatial pattern of diversity along the sequence, to identify selection. In contrast to earlier approaches, our HMMs explicitly model the correlation structure between linked sites. The detection power of our methods, and their accuracy for estimating the selected site location, is similar to that of competing methods for constant size populations. In the case of population bottlenecks, however, our methods frequently showed fewer false positives.

16.
Basic knowledge of linkage disequilibrium (LD) is necessary in order to determine the resolution of association studies. We investigated the extent and patterns of LD in a self-incompatible species (Prunus avium L.), in 3 groups (wild cherry, sweet cherry landraces and sweet cherry modern varieties), using a set of 35 microsatellite markers and the gametophytic self-incompatibility locus. Since population structure might create spurious LD, we used the information provided by a structure analysis published in a previous study to perform the LD analysis. In the current study, we detected a greater LD extent in sweet cherry than in wild cherry, which is plausibly due to the bottleneck associated with domestication and breeding. Higher LD values in sweet cherry sub-groups may be explained by smaller sample sizes. We also showed that the remaining structure in the groups of sweet cherry, in particular landraces, is responsible for a part of the LD extent. Intra-group relatedness may also account for extensive LD in two sub-groups. These results demonstrate, if ever necessary, the importance of controlling the genetic structure and relatedness when estimating LD. Moreover, LD decays very rapidly with genetic linkage distance in both wild and sweet cherries, which seems promising for future association studies.

17.
Among the several linkage disequilibrium measures known to capture different features of the non-independence between alleles at different loci, the most commonly used for diallelic loci is the r² measure. In the present study, we tackled the problem of the bias of the r² estimate, which results from the sample structure and/or the relatedness between genotyped individuals. We derived two novel linkage disequilibrium measures for diallelic loci that are both extensions of the usual r² measure. The first one, r_S², uses the population structure matrix, which consists of information about the origins of each individual and the admixture proportions of each individual genome. The second one, r_V², includes the kinship matrix into the calculation. These two corrections can be applied together in order to correct for both biases and are defined either on phased or unphased genotypes. We proved that these novel measures are linked to the power of association tests under the mixed linear model including structure and kinship corrections. We validated them on simulated data and applied them to real data sets collected on Vitis vinifera plants. Our results clearly showed the usefulness of the two corrected r² measures, which actually captured 'true' linkage disequilibrium unlike the usual r² measure.

18.
Experimental evolution studies can be used to explore genomic response to artificial and natural selection. In such studies, loci that display larger allele frequency change than expected by genetic drift alone are assumed to be directly or indirectly associated with traits under selection. However, such studies report surprisingly many loci under selection, suggesting that current tests for allele frequency change may be subject to P‐value inflation and hence be anticonservative. One factor known from genomewide association (GWA) studies to cause P‐value inflation is population stratification, such as relatedness among individuals. Here, we suggest that by treating presence of an individual in a population after selection as a binary response variable, existing GWA methods can be used to account for relatedness when estimating allele frequency change. We show that accounting for relatedness like this effectively reduces false‐positives in tests for allele frequency change in simulated data with varying levels of population structure. However, once relatedness has been accounted for, the power to detect causal loci under selection is low. Finally, we demonstrate the presence of P‐value inflation in allele frequency change in empirical data spanning multiple generations from an artificial selection experiment on tarsus length in two free‐living populations of house sparrow and correct for this using genomic control. Our results indicate that since allele frequencies in large parts of the genome may change when selection acts on a heritable trait, such selection is likely to have considerable and immediate consequences for the eco‐evolutionary dynamics of the affected populations.
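The genomic control correction mentioned at the end of this abstract can be sketched in a few lines. The sketch assumes the usual median-based form: the inflation factor λ is the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.4549), and each statistic is divided by λ when λ exceeds 1. How the authors implemented it is not stated, and the names and toy values below are illustrative.

```python
from statistics import median
from typing import List, Sequence

# Median of the chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats: Sequence[float]) -> float:
    """Genomic inflation factor: observed median over the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats: Sequence[float]) -> List[float]:
    """Deflate the test statistics by lambda; lambda below 1 is treated as 1."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [stat / lam for stat in chi2_stats]


# Hypothetical statistics whose median is twice the expected value.
toy_stats = [0.2, 0.9098728, 3.0]
lam = inflation_factor(toy_stats)
corrected = genomic_control(toy_stats)
```

A value of λ close to 1 indicates little inflation; values well above 1 suggest that stratification or relatedness is inflating the test statistics.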

19.

Background

A large single nucleotide polymorphism (SNP) dataset was used to analyze genome-wide diversity in a diverse collection of watermelon cultivars representing globally cultivated, watermelon genetic diversity. The marker density required for conducting successful association mapping depends on the extent of linkage disequilibrium (LD) within a population. Use of genotyping by sequencing reveals large numbers of SNPs that in turn generate opportunities in genome-wide association mapping and marker-assisted selection, even in crops such as watermelon for which few genomic resources are available. In this paper, we used genome-wide genetic diversity to study LD, selective sweeps, and pairwise FST distributions among worldwide cultivated watermelons to track signals of domestication.

Results

We examined 183 Citrullus lanatus var. lanatus accessions representing domesticated watermelon and generated a set of 11,485 SNP markers using genotyping by sequencing. With a diverse panel of worldwide cultivated watermelons, we identified a set of 5,254 SNPs with a minor allele frequency of ≥ 0.05, distributed across the genome. All ancestries were traced to Africa and an admixture of various ancestries constituted secondary gene pools across various continents. A sliding window analysis using pairwise FST values was used to resolve selective sweeps. We identified strong selection on chromosomes 3 and 9 that might have contributed to the domestication process. Pairwise analysis of adjacent SNPs within a chromosome as well as within a haplotype allowed us to estimate genome-wide LD decay. LD was also detected within individual genes on various chromosomes. Principal component and ancestry analyses were used to account for population structure in a genome-wide association study. We further mapped important genes for soluble solid content using a mixed linear model.

Conclusions

Information concerning the SNP resources, population structure, and LD developed in this study will help in identifying agronomically important candidate genes from the genomic regions underlying selection and for mapping quantitative trait loci using a genome-wide association study in sweet watermelon.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-767) contains supplementary material, which is available to authorized users.

20.
Measuring the information content of markers in relationship/relatedness inferences is important in selecting highly informative markers to attain a given statistical power with the minimal genotyping effort. Using information-theoretic principles, I introduce the informativeness for relationship (I(R)) and the informativeness for relatedness (I(r)) to measure the amount of information provided by markers in inferring pairwise relationships (R) and relatedness (r), respectively. I also propose a fast and accurate algorithm to calculate the power (PW(R)) of a set of markers in differentiating two candidate relationships, and the reciprocal of the mean squared deviations of relatedness estimates (RMSD) to measure the amount of information of markers actually used by an estimator in estimating relatedness. All four measurements (I(R), I(r), PW(R), RMSD) apply to dominant and codominant markers and to haploid and diploid individuals, and take account of mutations and typing errors in data. The statistical properties of the four measurements and their relationships are investigated analytically and are examined by applying these methods to simulated and empirical data.
