Similar literature
1.
First-generation selection (FGS) and second-generation selection (SGS) breeding populations of loblolly pine from east Texas were studied to estimate the genetic diversity, population structure, linkage disequilibrium (LD), signatures of selection and association of breeding traits with a genome-wide panel of 4,264 single nucleotide polymorphisms (SNPs). Relatively high levels of observed (Ho = 0.178–0.198) and expected (He = 0.180–0.198) heterozygosities were observed in all populations. The amount of inbreeding was very low with many populations exhibiting a slight excess of heterozygotes. The population structure was weak, but FST indicated more pronounced differentiation in the SGS populations. As expected for outcrossing natural populations, the genome-wide LD was low, but marker density was insufficient to deduce the decay rate. Numerous associations were found between various phenotypic traits and SNPs, but only a few remained significant after false positive correction. Signatures of diversifying and balancing selection were found in markers representing important biological functions. These results present the first step in the application of marker-assisted selection (MAS) to the Western Gulf Forest Tree Improvement Program (WGFTIP) for loblolly pine and will contribute to the knowledgebase necessary for genomic selection technology.
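
As an illustration of the diversity statistics reported in this abstract, the following minimal sketch computes observed and expected heterozygosity for a single biallelic SNP and the inbreeding coefficient derived from them. The 0/1/2 genotype coding, the function names, and the toy data are assumptions made for illustration; the abstract does not state how the authors computed these values.

```python
def heterozygosity(genotypes):
    """Return (Ho, He) for one biallelic SNP.

    genotypes: alternate-allele counts per individual (0, 1 or 2).
    This coding is an assumption for illustration.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: fraction of heterozygous individuals.
    ho = sum(1 for g in genotypes if g == 1) / n
    # Alternate-allele frequency from allele counts (2 alleles per diploid).
    p = sum(genotypes) / (2 * n)
    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 2 * p * (1 - p)
    return ho, he


def inbreeding_coefficient(ho, he):
    """F = 1 - Ho/He; negative values indicate an excess of heterozygotes."""
    if he == 0:
        raise ValueError("He is zero (monomorphic locus); F is undefined")
    return 1 - ho / he


if __name__ == "__main__":
    ho, he = heterozygosity([0, 1, 1, 1])  # toy data, not from the study
    print(ho, he, inbreeding_coefficient(ho, he))
```

A negative coefficient from such a calculation corresponds to the "slight excess of heterozygotes" wording used above; genome-wide values would normally be averaged over all SNPs.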

2.
Through stochastic simulations, estimates of breeding value accuracies and response to selection were assessed under traditional pedigree-based and genomic-based evaluation methods. More specifically, several key parameters such as the trait's heritability (0.2 and 0.6), the number of QTLs underlying the trait (100 to 200), and the marker density (1 to 10 SNPs/cM) were evaluated. Additionally, the impact of two contrasting mating designs (partial diallel vs. single-pair mating) was investigated. Response to selection was then assessed in a seed production population (seed orchard consisting of unrelated selections) for different effective population sizes (Ne = 5 to 25). The simulated candidate population comprised a fixed size of 2050 individuals with fast linkage disequilibrium decay, generally found in forest tree populations. Following the genetic/genomic evaluation, top-ranked individuals were selected to meet the predetermined effective population size in the target production population. The combination of low h², high Ne, and dense marker coverage resulted in maximum relative genomic prediction efficiency and the most efficient exploitation of the Mendelian sampling term (within-family additive genetic variance). Since genomic prediction of breeding values constitutes the methodological foundation of genomic selection, our results can be used to address important questions when similar scenarios are considered.

3.
Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated by taking different information sources from the reference population into account: (1) allele frequencies, (2) LD pattern, (3) haplotypes, (4) haploid chromosomes, and (5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted using genomic relationships among 529 reference individuals and their relationships with selection candidates and with a deterministic formula where the number of effective chromosome segments (Me) was estimated based on genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002 ± 0.0001 (allele frequencies), 0.022 ± 0.001 (LD pattern), 0.018 ± 0.001 (haplotypes), 0.100 ± 0.008 (haploid chromosomes), and 0.318 ± 0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated Me. Thus, reliabilities can be predicted accurately using empirically estimated Me, and the level of relationship with reference individuals has a much higher effect on the reliability than linkage disequilibrium per se. Furthermore, accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.
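
The abstract refers to a deterministic formula based on the number of effective chromosome segments (Me) without stating it. A commonly cited form in the genomic prediction literature is r² = N·h² / (N·h² + Me), where N is the reference population size and h² the heritability. The sketch below assumes that form; it may differ from the exact formula the authors used, and the function names are hypothetical.

```python
import math


def expected_reliability(n_ref, h2, me):
    """Expected reliability r^2 = N*h2 / (N*h2 + Me).

    Assumed formula (a commonly cited deterministic approximation),
    not necessarily the one used in the study summarized above.
    """
    if n_ref <= 0 or not 0 < h2 <= 1 or me <= 0:
        raise ValueError("n_ref and me must be positive, h2 in (0, 1]")
    return n_ref * h2 / (n_ref * h2 + me)


def expected_accuracy(n_ref, h2, me):
    """Accuracy is the square root of reliability."""
    return math.sqrt(expected_reliability(n_ref, h2, me))


def implied_me(n_ref, h2, reliability):
    """Invert the formula: Me = N*h2 * (1/r^2 - 1)."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must be in (0, 1)")
    return n_ref * h2 * (1.0 / reliability - 1.0)


if __name__ == "__main__":
    # 529 reference individuals and h2 = 0.6 are taken from the abstract;
    # the reliability of 0.318 is the value reported for family relationships.
    print(implied_me(529, 0.6, 0.318))
```

Under this assumed formula, a lower reliability at fixed N and h² implies a larger Me, which is one way to read the contrast between candidate groups that share only allele frequencies or LD patterns with the reference population and those that share real family relationships.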

4.
Knowledge of linkage disequilibrium (LD) is important for effective genome-wide association studies and accurate genomic prediction. Chinese Merino (Xinjiang type) is a well-known fine wool sheep breed. However, the extent of LD across its genome remains unexplored. In this study, we calculated autosomal LD based on genome-wide SNPs of 635 Chinese Merino (Xinjiang type) sheep genotyped with the Illumina Ovine SNP50 BeadChip. A moderate level of LD (r² ≥ 0.25) across the whole genome was observed at short distances of 0–10 kb. Further, the ancestral effective population size (Ne) was estimated from the extent of LD; Ne increased with the number of generations in the past and declined rapidly within the most recent 50 generations, which is consistent with the history of Chinese Merino sheep breeding, initiated in 1971. We also noted that even when the effective population size was estimated across different single chromosomes, Ne only ranged from 140.36 to 183.33 at five generations in the past, exhibiting a rapid decrease compared with that at ten generations in the past. These results indicated that the genetic diversity in Chinese Merino sheep recently decreased and proper protective measures should be taken to maintain the diversity. Our datasets provided essential genetic information to track molecular variations which potentially contribute to phenotypic variation in Chinese Merino sheep.
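
The abstract does not say which estimator links LD to Ne. A widely used approximation is E[r²] ≈ 1 / (1 + 4·Ne·c), with c the recombination distance in Morgans, where LD at distance c is taken to reflect Ne roughly 1/(2c) generations ago. The sketch below assumes that approximation; the function names and example numbers are illustrative only and are not results from the study.

```python
def ne_from_ld(r2, c_morgans):
    """Estimate Ne from mean r^2 at recombination distance c (Morgans).

    Rearranges the assumed approximation E[r^2] = 1 / (1 + 4*Ne*c).
    """
    if not 0 < r2 < 1:
        raise ValueError("r2 must be in (0, 1)")
    if c_morgans <= 0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Approximate age (in generations) of the Ne reflected at distance c."""
    if c_morgans <= 0:
        raise ValueError("c_morgans must be positive")
    return 1.0 / (2.0 * c_morgans)


if __name__ == "__main__":
    # Illustrative inputs only: mean r^2 of 0.25 between markers 1 cM apart.
    print(ne_from_ld(0.25, 0.01), generations_ago(0.01))
```

Because short marker distances map to distant generations and long distances to recent ones, binning SNP pairs by distance yields an Ne trajectory over time, which is the kind of curve the abstract describes.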

5.
Genomic selection in forest tree breeding
Genomic selection (GS) involves selection decisions based on genomic breeding values estimated as the sum of the effects of genome-wide markers capturing most quantitative trait loci (QTL) for the target trait(s). GS is revolutionizing breeding practice in domestic animals. The same approach and concepts can be readily applied to forest tree breeding where long generation times and late-expressing complex traits are also a challenge. GS in forest trees would have additional advantages: large training populations can be easily assembled and accurately phenotyped for several traits, and the extent of linkage disequilibrium (LD) can be high in elite populations with small effective population size (Ne) frequently used in advanced forest tree breeding programs. Deterministic equations were used to assess the impact of LD (modeled by Ne and intermarker distance), the size of the training set, trait heritability, and the number of QTL on the predicted accuracy of GS. Results indicate that GS has the potential to radically improve the efficiency of tree breeding. The benchmark accuracy of conventional BLUP selection is reached by GS even at a marker density of ~2 markers/cM when Ne ≤ 30, while up to 20 markers/cM are necessary for larger Ne. Shortening the breeding cycle by 50% with GS provides an increase of ≥100% in selection efficiency. With the rapid technological advances and declining costs of genotyping, our cautiously optimistic outlook is that GS has great potential to accelerate tree breeding. However, further simulation studies and proof-of-concept experiments of GS are needed before recommending it for operational implementation.

6.

Background

Recent developments in SNP discovery and high throughput genotyping technology have made the use of high-density SNP markers to predict breeding values feasible. This involves estimation of the SNP effects in a training data set, and use of these estimates to evaluate the breeding values of other "evaluation" individuals. Simulation studies have shown that these predictions of breeding values can be accurate when training and evaluation individuals are (closely) related. However, many general applications of genomic selection require the prediction of breeding values of "unrelated" individuals, i.e. individuals from the same population, but not particularly closely related to the training individuals.

Methods

Accuracy of selection was investigated by computer simulation of small populations. Using scaling arguments, the results were extended to different populations, training data sets and genome sizes, and different trait heritabilities.

Results

Prediction of breeding values of unrelated individuals required a substantially higher marker density and number of training records than when prediction individuals were offspring of training individuals. However, when the number of records was 2*Ne*L and the number of markers was 10*Ne*L, the breeding values of unrelated individuals could be predicted with accuracies of 0.88–0.93, where Ne is the effective population size and L the genome size in Morgan. Reducing this requirement to 1*Ne*L individuals reduced prediction accuracies to 0.73–0.83.

Conclusion

For livestock populations, 1NeL requires ~30,000 training records, but this may be reduced if training and evaluation animals are related. A prediction equation is presented that predicts accuracy when training and evaluation individuals are related. For humans, 1NeL requires ~350,000 individuals, which means that human disease risk prediction is possible only for diseases that are determined by a limited number of genes. Otherwise, genotyping and phenotypic recording need to become very common in the future.
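
The scaling rule in this abstract (records as a multiple of Ne*L, markers as 10*Ne*L) is simple arithmetic, sketched below. The Ne and L values in the example are assumptions chosen only because they reproduce the orders of magnitude quoted above (~30,000 and ~350,000); the abstract itself does not state which values were used.

```python
def required_records(ne, genome_morgans, multiple=1.0):
    """Training records needed as multiple * Ne * L (L in Morgans)."""
    if ne <= 0 or genome_morgans <= 0 or multiple <= 0:
        raise ValueError("all arguments must be positive")
    return multiple * ne * genome_morgans


def required_markers(ne, genome_morgans, multiple=10.0):
    """Markers needed as multiple * Ne * L; the abstract uses 10 * Ne * L."""
    if ne <= 0 or genome_morgans <= 0 or multiple <= 0:
        raise ValueError("all arguments must be positive")
    return multiple * ne * genome_morgans


if __name__ == "__main__":
    # Hypothetical inputs: Ne = 1000 and L = 30 Morgans for livestock,
    # Ne = 10000 and L = 35 Morgans for humans.
    print(required_records(1000, 30))               # 1*Ne*L, livestock
    print(required_records(1000, 30, multiple=2))   # 2*Ne*L, livestock
    print(required_markers(1000, 30))               # 10*Ne*L, livestock
    print(required_records(10000, 35))              # 1*Ne*L, humans
```

The rule makes the dependence on effective population size explicit: halving Ne halves both the training records and the markers needed for the same accuracy among unrelated individuals.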

7.
The effective population size (Ne) is a key parameter to quantify the magnitude of genetic drift and inbreeding, with important implications in human evolution. The increasing availability of high-density genetic markers allows the estimation of historical changes in Ne across time using measures of genome diversity or linkage disequilibrium between markers. Directional selection is expected to reduce diversity and Ne, and this reduction is modulated by the heterogeneity of the genome in terms of recombination rate. Here we investigate by computer simulations the consequences of selection (both positive and negative) and recombination rate heterogeneity in the estimation of historical Ne. We also investigate the relationship between diversity parameters and Ne across the different regions of the genome using human marker data. We show that the estimates of historical Ne obtained from linkage disequilibrium between markers (NeLD) are virtually unaffected by selection. In contrast, those estimates obtained by coalescence mutation-recombination-based methods can be strongly affected by it, which could have important consequences for the estimation of human demography. The simulation results are supported by the analysis of human data. The estimates of NeLD obtained for particular genomic regions do not correlate, or correlate only very weakly, with recombination rate, nucleotide diversity, proportion of polymorphic sites, background selection statistic, minor allele frequency of SNPs, loss of function and missense variants and gene density. This suggests that NeLD measures mainly reflect demographic changes in population size across generations.

8.
Genomic regions under high selective pressure present specific runs of homozygosity (ROH), which provide valuable information on the genetic mechanisms underlying the adaptation to environment-imposed challenges. In broiler chickens, the adaptation to conventional production systems in tropical environments led the animals with favorable genotypes to be naturally selected, increasing the frequency of these alleles in the next generations. In this study, ~1400 chickens from a paternal broiler line were genotyped with the 600 K Affymetrix® Axiom® high-density (HD) genotyping array for estimation of linkage disequilibrium (LD), effective population size (Ne), inbreeding and ROH. The average LD between adjacent single nucleotide polymorphisms (SNPs) in all autosomes was 0.37, and the LD decay was higher in microchromosomes followed by intermediate and macrochromosomes. The Ne of the ancestral population was high and declined over time, maintaining a sufficient number of animals to keep the inbreeding coefficient of this population at low levels. The ROH analysis revealed genomic regions that harbor genes associated with homeostasis maintenance and immune system mechanisms, which may have been selected in response to heat stress. Our results give a comprehensive insight into the relationship between shared ROH regions and putative regions related to survival and production traits in a paternal broiler line selected for over 20 years. These findings contribute to the understanding of the effects of environmental and artificial selection in shaping the distribution of functional variants in the chicken genome.

9.
The fluctuation of population size has not been well studied in previous studies of theoretical linkage disequilibrium (LD) expectation. In this study, an improved theoretical prediction of LD decay was derived to account for the effects of changes in effective population sizes. The equation was used to estimate effective population size (Ne) assuming a constant Ne and LD at equilibrium, and these Ne estimates implied the past changes of Ne for a certain number of generations until equilibrium, which differed based on recombination rate. As the influence of recent population history on the Ne estimates is larger than that of old population history, recent changes in population size can be inferred more accurately than old changes. The theoretical predictions based on this improved expression showed accurate agreement with the simulated values. When applied to human genome data, the detailed recent history of human populations was obtained. The inferred past population history of each population showed good correspondence with historical studies. Specifically, four populations (three African ancestries and one Mexican ancestry) showed population growth that was significantly less than that of other populations, and two populations originating from China showed prominent exponential growth. During the examination of overall LD decay in the human genome, a selection pressure on chromosome 14, at the gephyrin gene, was observed in all populations.

10.
Whole-genome resequencing technology has improved rapidly during recent years and is expected to improve further such that the sequencing of an entire human genome sequence for $1000 is within reach. Our main aim here is to use whole-genome sequence data for the prediction of genetic values of individuals for complex traits and to explore the accuracy of such predictions. This is relevant for the fields of plant and animal breeding and, in human genetics, for the prediction of an individual's risk for complex diseases. Here, population history and genomic architectures were simulated under the Wright–Fisher population and infinite-sites mutation model, and prediction of genetic value was by the genomic selection approach, where a Bayesian nonlinear model was used to predict the effects of individual SNPs. The Bayesian model assumed a priori that only few SNPs are causative, i.e., have an effect different from zero. When using whole-genome sequence data, accuracies of prediction of genetic value were increased by >40% relative to the use of dense ∼30K SNP chips. At equal high density, the inclusion of the causative mutations yielded an extra increase of accuracy of 2.5–3.7%. Predictions of genetic value remained accurate even when the training and evaluation data were 10 generations apart. Best linear unbiased prediction (BLUP) of SNP effects does not take full advantage of the genome sequence data, and nonlinear predictions, such as the Bayesian method used here, are needed to achieve maximum accuracy. On the basis of theoretical work, the results could be extended to more realistic genome and population sizes. Genome resequencing technologies are currently developing at a very rapid rate, which we for simplicity call genome sequencing even though it is used on a species with a reference sequence.
The current generation sequencing technology is two orders of magnitude faster and more cost effective than the technologies used for the sequencing of the human genome (Shendure and Ji 2008; TenBosch and Grody 2008). Future technologies are expected to reduce cost by another 100-fold so that sequencing an entire human genome for $1000 is considered achievable in the near future (Mardis 2008). The question arises: How can we make best use of entire genome sequence data on many individuals? One use will be the ability to predict the genetic value of an individual for complex traits. In the fields of animal and plant breeding, this would be of great practical benefit because most important traits are complex, quantitative traits, i.e., traits that are affected by many genes and by the environment. In humans the promise of personalized medicine relies on the ability to predict an individual's genetic risk for complex, multifactorial diseases, such as Crohn's disease (Barrett et al. 2008), and the ability to predict response to alternative treatments. The first aim of this article is to explore the accuracy of this prediction using the full genome sequence of the individual. The use of high-density SNP genotype data to predict genetic value, called genomic selection, was first proposed by Meuwissen et al. (2001). In its most sophisticated form, a Bayesian model was used to predict the effects of thousands of SNPs on the total genetic value simultaneously, where a priori it was assumed that only few SNPs were useful for predicting the trait [because they were in linkage disequilibrium (LD) with mutations causing variation in the trait], while many SNPs were not useful. Even among the SNPs that were useful for prediction, it was assumed that the distribution of effects was not normal because there were occasionally SNPs in LD with quantitative trait loci (QTL) that may occasionally have very large effect.
To model this, the distribution of SNP effects was assumed to follow a distribution with thicker tails than the normal distribution (e.g., the t-distribution is often used). In the case of whole-genome sequence data, the polymorphisms that are causing the genetic differences between the individuals are among those being analyzed. For the sake of simplicity we call all polymorphisms in the sequence data SNPs while recognizing that other types of polymorphisms such as indels will be included. Assuming that the causal SNPs are included in the analysis simplifies the prior distribution of the SNP effects, because the effects of all the other SNPs, even if they are in LD with the causal SNPs, are expected to disappear. Thus, the prior distribution simplifies to the fact that some SNPs are expected to be causative and have an effect drawn from the distribution of the gene effects. The distribution of gene effects is investigated extensively in the evolutionary and other literature and is reported to be gamma (Hayes and Goddard 2001) or exponentially distributed (Erickson et al. 2004; Rocha et al. 2004), where the latter is a special form of the gamma distribution. On the downside, whole-genome sequence data will contain millions of SNPs and it may be difficult for genomic selection to separate the relatively few causative SNPs from all the others. Meuwissen et al. (2001) also investigated a model in which all SNPs were assumed to have an effect drawn from the same normal distribution [the so-called genome-wide best linear unbiased prediction (GWBLUP) model]. Although this model seems biologically implausible, it has been found to perform well in data from dairy cattle (VanRaden et al. 2009).
However, we hypothesize that with sequence level data the BLUP model will not perform as well as models that assume that only some causal SNPs need to be included in the model. The aims here are to investigate the following: how accurately genetic values for complex traits can be predicted by genomic selection when whole-genome sequence data are available on a large number of individuals; whether it makes a difference to have the whole-genome sequence available, including the causative mutations, vs. very dense SNP marker genotypes; whether the estimates of the SNP effects can be used on individuals that are many generations separated from the data set in which they were estimated; the effect of the statistical model used on accuracy of prediction; and how accurately causative mutations can be detected and mapped. Because whole-genome sequence data on many individuals are not yet available, and because we needed to know the true genetic values of the individuals, the aforementioned questions were investigated by computer simulations of whole-genome sequence data.

11.
The routine collection and use of genomic data are useful for effectively managing breeding programs for endangered populations. Linkage disequilibrium (LD) using high‐density DNA markers has been widely used to determine population structures and predict the genomic regions that are associated with economic traits in beef cattle. The extent of LD also provides information about historical events, including past effective population size (Ne), and it allows inferences on the genetic diversity of breeds. The objective of this study was to estimate the LD and Ne in three Korean cattle breeds that are genetically similar but have different coat colors (Brown, Brindle and Jeju Black Hanwoo). Brindle and Jeju Black are endangered breeds with small populations, whereas Brown Hanwoo is the main breeding population in Korea. DNA samples from these cattle breeds were genotyped using the Illumina BovineSNP50 Bead Chip. We examined 13 cattle breeds, including European taurines, African taurines and indicines, and hybrids to compare their LD values. Brown Hanwoo consistently had the lowest mean LD compared to Jeju Black, Brindle and the other 13 cattle breeds (0.13, 0.19, 0.21 and 0.15–0.22 respectively). The high LD values of Brindle and Jeju Black contributed to small Ne values (53 and 60 respectively), which were distinct from that of Brown Hanwoo (531) at 11 generations ago. The differences in LD and Ne for each breed reflect the breeding strategy applied. The Ne for these endangered cattle breeds remains low; thus, effort is needed to bring them back to a sustainable track.

12.

Background

Next-generation sequencing techniques, such as genotyping-by-sequencing (GBS), provide alternatives to single nucleotide polymorphism (SNP) arrays. The aim of this work was to evaluate the potential of GBS compared to SNP array genotyping for genomic selection in livestock populations.

Methods

The value of GBS was quantified by simulation analyses in which three parameters were varied: (i) genome-wide sequence read depth (x) per individual from 0.01x to 20x or using SNP array genotyping; (ii) number of genotyped markers from 3000 to 300 000; and (iii) size of training and prediction sets from 500 to 50 000 individuals. The latter was achieved by distributing the total available x of 1000x, 5000x, or 10 000x per genotyped locus among the varying number of individuals. With SNP arrays, genotypes were called from sequence data directly. With GBS, genotypes were called from sequence reads that varied between loci and individuals according to a Poisson distribution with mean equal to x. Simulated data were analyzed with ridge regression and the accuracy and bias of genomic predictions and response to selection were quantified under the different scenarios.

Results

Accuracies of genomic predictions using GBS data or SNP array data were comparable when large numbers of markers were used and x per individual was ~1x or higher. The bias of genomic predictions was very high at a very low x. When the total available x was distributed among the training individuals, the accuracy of prediction was maximized when a large number of individuals was used that had GBS data with low x for a large number of markers. Similarly, response to selection was maximized under the same conditions due to increasing both accuracy and selection intensity.

Conclusions

GBS offers great potential for developing genomic selection in livestock populations because it makes it possible to cover large fractions of the genome and to vary the sequence read depth per individual. Thus, the accuracy of predictions is improved by increasing the size of training populations and the intensity of selection is increased by genotyping a larger number of selection candidates.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0102-z) contains supplementary material, which is available to authorized users.
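
The Methods of this abstract state that GBS read counts vary between loci and individuals according to a Poisson distribution with mean x. A minimal sketch of what that implies for genotype information at low depth follows. It assumes that each read samples either allele of a heterozygote with probability 0.5 and that there are no sequencing errors; both assumptions, and the function names, are illustrative and are not taken from the article.

```python
import math


def p_no_reads(x):
    """Probability that a locus receives zero reads when depth ~ Poisson(x)."""
    if x < 0:
        raise ValueError("mean depth x must be non-negative")
    return math.exp(-x)


def p_het_both_alleles_seen(x):
    """Probability that both alleles of a heterozygote are observed.

    With total depth ~ Poisson(x) and each read drawn from either allele
    with probability 0.5 (an assumption), the read counts of the two alleles
    are independent Poisson(x/2), so both are seen with probability
    (1 - exp(-x/2))**2.
    """
    if x < 0:
        raise ValueError("mean depth x must be non-negative")
    return (1.0 - math.exp(-x / 2.0)) ** 2


if __name__ == "__main__":
    # Depths spanning the range studied in the abstract (0.01x to 20x).
    for depth in (0.01, 0.1, 1.0, 5.0, 20.0):
        print(depth, p_no_reads(depth), p_het_both_alleles_seen(depth))
```

At very low x most heterozygotes appear homozygous or missing in the raw reads. This is one plausible reason why predictions at very low depth are reported as strongly biased, although the abstract does not give the mechanism.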

13.
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.

14.

Background

The impact of additive-genetic relationships captured by single nucleotide polymorphisms (SNPs) on the accuracy of genomic breeding values (GEBVs) has been demonstrated, but recent studies on data obtained from Holstein populations have ignored this fact. However, this impact and the accuracy of GEBVs due to linkage disequilibrium (LD), which is fairly persistent over generations, must be known to implement future breeding programs.

Materials and methods

The data set used to investigate these questions consisted of 3,863 German Holstein bulls genotyped for 54,001 SNPs, their pedigree and daughter yield deviations for milk yield, fat yield, protein yield and somatic cell score. A cross-validation methodology was applied, where the maximum additive-genetic relationship (amax) between bulls in training and validation was controlled. GEBVs were estimated by a Bayesian model averaging approach (BayesB) and an animal model using the genomic relationship matrix (G-BLUP). The accuracy of GEBVs due to LD was estimated by a regression approach using accuracy of GEBVs and accuracy of pedigree-based BLUP-EBVs.

Results

Accuracy of GEBVs obtained by both BayesB and G-BLUP decreased with decreasing amax for all traits analyzed. The decay of accuracy tended to be larger for G-BLUP and with smaller training size. Differences between BayesB and G-BLUP became evident for the accuracy due to LD, where BayesB clearly outperformed G-BLUP with increasing training size.

Conclusions

GEBV accuracy of current selection candidates varies due to different additive-genetic relationships relative to the training data. Accuracy of future candidates can be lower than reported in previous studies because information from close relatives will not be available when selection on GEBVs is applied. A Bayesian model averaging approach exploits LD information considerably better than G-BLUP and thus is the most promising method. Cross-validations should account for family structure in the data to allow for long-lasting genomic based breeding plans in animal and plant breeding.

15.
We propose an extended Gaussian mixture model for the distribution of causal effects of common single nucleotide polymorphisms (SNPs) for human complex phenotypes that depends on linkage disequilibrium (LD) and heterozygosity (H), while also allowing for independent components for small and large effects. Using a precise methodology showing how genome-wide association studies (GWASs) summary statistics (z-scores) arise through LD with underlying causal SNPs, we applied the model to GWAS of multiple human phenotypes. Our findings indicated that causal effects are distributed with dependence on total LD and H, whereby SNPs with lower total LD and H are more likely to be causal with larger effects; this dependence is consistent with models of the influence of negative pressure from natural selection. Compared with the basic Gaussian mixture model it is built on, the extended model, primarily through quantification of selection pressure, reproduces with greater accuracy the empirical distributions of z-scores, thus providing better estimates of genetic quantities, such as polygenicity and heritability, that arise from the distribution of causal effects.

16.
The evolutionary transition from outcrossing to selfing can have important genomic consequences. Decreased effective population size and the reduced efficacy of selection are predicted to play an important role in the molecular evolution of the genomes of selfing species. We investigated evidence for molecular signatures of the genomic selfing syndrome using 66 species of Primula including distylous (outcrossing) and derived homostylous (selfing) taxa. We complemented our comparative analysis with a microevolutionary study of P. chungensis, which is polymorphic for mating system and consists of both distylous and homostylous populations. We generated chloroplast and nuclear genomic data sets for distylous, homostylous, and distylous–homostylous species and identified patterns of nonsynonymous to synonymous divergence (dN/dS) and polymorphism (πN/πS) in species or lineages with contrasting mating systems. Our analysis of coding sequence divergence and polymorphism detected strongly reduced genetic diversity and heterozygosity, decreased efficacy of purifying selection, purging of large-effect deleterious mutations, and lower rates of adaptive evolution in samples from homostylous compared with distylous populations, consistent with theoretical expectations of the genomic selfing syndrome. Our results demonstrate that self-fertilization is a major driver of molecular evolutionary processes with genomic signatures of selfing evident in both old and relatively young homostylous populations.

17.
Despite important advances from genome-wide association studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. However, under imperfect LD between markers and QTL, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one, inducing only a small decrease in R². However, with distantly related individuals, b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced through the use of variable selection or differential shrinkage of estimates of marker effects.
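To make the coefficient b in the abstract above concrete, the following is a minimal sketch of how such a regression could be computed on simulated data. It is not the authors' implementation: the scaling of the genomic relationships (centered allele counts divided by the sum of 2p(1−p)), the simulation of unrelated individuals at independent loci, and all function names are assumptions made for illustration.

```python
import random


def simulate_genotypes(n_individuals, n_loci, rng):
    """Simulate unrelated individuals at independent biallelic loci (allele counts 0/1/2)."""
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]


def genomic_relationships(genotypes):
    """Off-diagonal genomic relationships from an n x m matrix of allele counts.

    Assumed scaling: centered allele counts, divided by the sum of 2p(1-p)
    over polymorphic loci.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    keep = [j for j in range(m) if 0.0 < freqs[j] < 1.0]  # drop monomorphic loci
    scale = sum(2.0 * freqs[j] * (1.0 - freqs[j]) for j in keep)
    if scale == 0.0:
        raise ValueError("no polymorphic loci")
    centered = [[row[j] - 2.0 * freqs[j] for j in keep] for row in genotypes]
    rel = []
    for i in range(n):
        for k in range(i + 1, n):
            rel.append(sum(a * b for a, b in zip(centered[i], centered[k])) / scale)
    return rel


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical scenario: markers and causal loci are disjoint, independent sets.
rng = random.Random(1)
geno = simulate_genotypes(30, 400, rng)
causal = [row[:200] for row in geno]
markers = [row[200:] for row in geno]

# b: regression of marker-derived relationships on those at causal loci.
b = regression_slope(genomic_relationships(causal), genomic_relationships(markers))
print("b =", round(b, 3), " (1 - b)^2 =", round((1.0 - b) ** 2, 3))
```

When the marker set and the causal set coincide, the slope is exactly one and (1−b)² vanishes; with unrelated individuals and markers that are independent of the causal loci, as simulated here, b is expected to be small.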

18.

Background

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist in soybean. The objectives of this study were to (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations.

Results

Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller.

Conclusions

Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
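The SNP filtering described in the Results (minor-allele frequency above a threshold, missing-value percentage at or below a threshold) can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: the data layout (a dict mapping SNP names to lists of 0/1/2 allele counts, with `None` for a missing call), the function names, and the toy data are all assumptions.

```python
def snp_stats(calls):
    """Return (minor-allele frequency, missing rate) for one SNP.

    `calls` holds allele counts 0/1/2 per line, with None for a missing call.
    """
    observed = [g for g in calls if g is not None]
    missing_rate = 1.0 - len(observed) / len(calls)
    if not observed:
        return None, missing_rate
    freq = sum(observed) / (2.0 * len(observed))
    return min(freq, 1.0 - freq), missing_rate


def filter_snps(snps, min_maf=0.05, max_missing=0.05):
    """Keep SNP names with MAF > min_maf and missing rate <= max_missing."""
    kept = []
    for name, calls in snps.items():
        maf, missing_rate = snp_stats(calls)
        if maf is not None and maf > min_maf and missing_rate <= max_missing:
            kept.append(name)
    return kept


# Toy data (hypothetical): three SNPs scored on 20 lines.
toy = {
    "snp_common": [0, 1, 2, 1] * 5,        # no missing calls, MAF 0.5
    "snp_rare": [0] * 19 + [1],            # MAF 0.025, fails the MAF filter
    "snp_sparse": [0, 1, 2, None] * 5,     # 25% missing calls
}
print(filter_snps(toy))                    # strict missing-data threshold
print(filter_snps(toy, max_missing=0.8))   # relaxed threshold, as in the 80% setting
```

Relaxing `max_missing` mirrors the comparison in the abstract between the strictly filtered SNP set and the larger set that tolerates up to 80% missing values before imputation.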

19.
The transition from outcrossing to selfing is predicted to reduce the genome-wide efficacy of selection because of the lower effective population size (Ne) that accompanies this change in mating system. However, strongly recessive deleterious mutations exposed in the homozygous backgrounds of selfers should be under strong purifying selection. Here, we examine estimates of the distribution of fitness effects (DFE) and changes in the magnitude of effective selection coefficients (Nes) acting on mutations during the transition from outcrossing to selfing. Using forward simulations, we investigated the ability of a DFE inference approach to detect the joint influence of mating system and the dominance of deleterious mutations on selection efficacy. We investigated predictions from our simulations in the annual plant Eichhornia paniculata, in which selfing has evolved from outcrossing on multiple occasions. We used range-wide sampling to generate population genomic datasets and identified nonsynonymous and synonymous polymorphisms segregating in outcrossing and selfing populations. We found that the transition to selfing was accompanied by a change in the DFE, with a larger fraction of effectively neutral sites (Nes < 1), a result consistent with the effects of reduced Ne in selfers. Moreover, an increased proportion of sites in selfers were under strong purifying selection (Nes > 100), and simulations suggest that this is due to the exposure of recessive deleterious mutations. We conclude that the transition to selfing has been accompanied by the genome-wide influences of reduced Ne and strong purifying selection against deleterious recessive mutations, an example of purging at the molecular level.

20.
R J Haasl, B A Payseur. Heredity, 2011, 106(1): 158-171
Although growing numbers of single nucleotide polymorphisms (SNPs) and microsatellites (short tandem repeat polymorphisms or STRPs) are used to infer population structure, their relative properties in this context remain poorly understood. SNPs and STRPs mutate differently, suggesting multi-locus genotypes at these loci might differ in ability to detect population structure. Here, we use coalescent simulations to measure the power of sets of SNPs and STRPs to identify population structure. To maximize the applicability of our results to empirical studies, we focus on the popular STRUCTURE analysis and evaluate the role of several biological and practical factors in the detection of population structure. We find that: (1) fewer unlinked STRPs than SNPs are needed to detect structure at recent divergence times <0.3 Ne generations; (2) accurate estimation of the number of populations requires many fewer STRPs than SNPs; (3) for both marker types, declines in power due to modest gene flow (Nem=1.0) are largely negated by increasing marker number; (4) variation in the STRP mutational model affects power modestly; (5) SNP haplotypes (θ=1, no recombination) provide power comparable with STRP loci (θ=10); (6) ascertainment schemes that select highly variable STRP or SNP loci increase power to detect structure, though ascertained data may not be suitable to other inference; and (7) when samples are drawn from an admixed population and one of its parent populations, the reduction in power to detect two populations is greater for STRPs than SNPs. These results should assist the design of multi-locus studies to detect population structure in nature.
