首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 984 毫秒
1.
Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) lead to 5–17% increases in accuracy, relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.Subject terms: Quantitative trait, Genetic models  相似文献   

2.
Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R2) of G-BLUP reaches trait-heritability, asymptotically. However, under imperfect LD between markers and QTL, prediction R2 based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)2, where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one inducing small decrease in R2. However, with distantly related individuals b reaches very low values imposing a very low upper bound on prediction R2. Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R2 may be of the order of 20% of the trait heritability. We show how PA can be enhanced with use of variable selection or differential shrinkage of estimates of marker effects.  相似文献   

3.

Background

The impact of additive-genetic relationships captured by single nucleotide polymorphisms (SNPs) on the accuracy of genomic breeding values (GEBVs) has been demonstrated, but recent studies on data obtained from Holstein populations have ignored this fact. However, this impact and the accuracy of GEBVs due to linkage disequilibrium (LD), which is fairly persistent over generations, must be known to implement future breeding programs.

Materials and methods

The data set used to investigate these questions consisted of 3,863 German Holstein bulls genotyped for 54,001 SNPs, their pedigree and daughter yield deviations for milk yield, fat yield, protein yield and somatic cell score. A cross-validation methodology was applied, where the maximum additive-genetic relationship (amax) between bulls in training and validation was controlled. GEBVs were estimated by a Bayesian model averaging approach (BayesB) and an animal model using the genomic relationship matrix (G-BLUP). The accuracy of GEBVs due to LD was estimated by a regression approach using accuracy of GEBVs and accuracy of pedigree-based BLUP-EBVs.

Results

Accuracy of GEBVs obtained by both BayesB and G-BLUP decreased with decreasing amax for all traits analyzed. The decay of accuracy tended to be larger for G-BLUP and with smaller training size. Differences between BayesB and G-BLUP became evident for the accuracy due to LD, where BayesB clearly outperformed G-BLUP with increasing training size.

Conclusions

GEBV accuracy of current selection candidates varies due to different additive-genetic relationships relative to the training data. Accuracy of future candidates can be lower than reported in previous studies because information from close relatives will not be available when selection on GEBVs is applied. A Bayesian model averaging approach exploits LD information considerably better than G-BLUP and thus is the most promising method. Cross-validations should account for family structure in the data to allow for long-lasting genomic based breeding plans in animal and plant breeding.  相似文献   

4.
Genomic prediction utilizes single nucleotide polymorphism (SNP) chip data to predict animal genetic merit. It has the advantage of potentially capturing the effects of the majority of loci that contribute to genetic variation in a trait, even when the effects of the individual loci are very small. To implement genomic prediction, marker effects are estimated with a training set, including individuals with marker genotypes and trait phenotypes; subsequently, genomic estimated breeding values (GEBV) for any genotyped individual in the population can be calculated using the estimated marker effects. In this study, we aimed to: (i) evaluate the potential of genomic prediction to predict GEBV for nematode resistance traits and BW in sheep, within and across populations; (ii) evaluate the accuracy of these predictions through within-population cross-validation; and (iii) explore the impact of population structure on the accuracy of prediction. Four data sets comprising 752 lambs from a Scottish Blackface population, 2371 from a Sarda×Lacaune backcross population, 1000 from a Martinik Black-Belly×Romane backcross population and 64 from a British Texel population were used in this study. Traits available for the analysis were faecal egg count for Nematodirus and Strongyles and BW at different ages or as average effect, depending on the population. Moreover, immunoglobulin A was also available for the Scottish Blackface population. Results show that GEBV had moderate to good within-population predictive accuracy, whereas across-population predictions had accuracies close to zero. This can be explained by our finding that in most cases the accuracy estimates were mostly because of additive genetic relatedness between animals, rather than linkage disequilibrium between SNP and quantitative trait loci. Therefore, our results suggest that genomic prediction for nematode resistance and BW may be of value in closely related animals, but that with the current SNP chip genomic predictions are unlikely to work across breeds.  相似文献   

5.
The prediction accuracies of genomic selection depend on several factors, including the genetic architecture of target traits, the number of traits considered at a given time, and the statistical models. Here, we assessed the potential of single-trait (ST) and multi-trait (MT) genomic prediction models for durum wheat on yield and quality traits using a breeding panel (BP) of 170 varieties and advanced breeding lines, and a doubled-haploid (DH) population of 154 lines. The two populations were genotyped with the Infinium iSelect 90K SNP assay and phenotyped for various traits. Six ST-GS models (RR-BLUP, G-BLUP, BayesA, BayesB, Bayesian LASSO, and RKHS) and three MT prediction approaches (MT-BayesA, MT-Matrix, and MT-SI approaches which use economic selection index as a trait value) were applied for predicting yield, protein content, gluten index, and alveograph measures. The ST prediction accuracies ranged from 0.5 to 0.8 for the various traits and models and revealed comparable prediction accuracies for most of the traits in both populations, except BayesA and BayesB, which better predicted gluten index, tenacity, and strength in the DH population. The MT-GS models were more accurate than the ST-GS models only for grain yield in the BP. Using BP as a training set to predict the DH population resulted in poor predictions. Overall, all the six ST-GS models appear to be applicable for GS of yield and gluten strength traits in durum wheat, but we recommend the simple computational models RR-BLUP or G-BLUP for predicating single trait and MT-SI for predicting yield and protein simultaneously.  相似文献   

6.

Background

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection has been studied in several crop species, but no reports exist in soybean. The objectives of this study were (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations.

Results

Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller.

Conclusions

Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.  相似文献   

7.
The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.  相似文献   

8.

Background

The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction.

Methods

Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we evaluated the following genomic prediction models: genome-enabled BLUP (GBLUP), ridge regression BLUP (RRBLUP), principal component analysis followed by ridge regression (RRPCA), BayesC and Bayesian stochastic search variable selection. Prediction accuracy was measured as the correlation between predicted breeding values and observed phenotypes divided by the square root of the heritability. The data used concerned laying hens with phenotypes for number of eggs in the first production period and known genotypes. The hens were from two closely-related brown layer lines (B1 and B2), and a third distantly-related white layer line (W1). Lines had 1004 to 1023 training animals and 238 to 240 validation animals. Training datasets consisted of animals of either single lines, or a combination of two or all three lines, and had 30 508 to 45 974 segregating single nucleotide polymorphisms.

Results

Genomic prediction models yielded 0.13 to 0.16 higher accuracies than pedigree-based BLUP. When excluding the line itself from the training dataset, genomic predictions were generally inaccurate. Use of multiple lines marginally improved prediction accuracy for B2 but did not affect or slightly decreased prediction accuracy for B1 and W1. Differences between models were generally small except for RRPCA which gave considerably higher accuracies for B2. Correlations between genomic predictions from different methods were higher than 0.96 for W1 and higher than 0.88 for B1 and B2. The greater differences between methods for B1 and B2 were probably due to the lower accuracy of predictions for B1 (~0.45) and B2 (~0.40) compared to W1 (~0.76).

Conclusions

Multi-line genomic prediction did not affect or slightly improved prediction accuracy for closely-related lines. For distantly-related lines, multi-line genomic prediction yielded similar or slightly lower accuracies than single-line genomic prediction. Bayesian variable selection and GBLUP generally gave similar accuracies. Overall, RRPCA yielded the greatest accuracies for two lines, suggesting that using PCA helps to alleviate the “n ≪ p” problem in genomic prediction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0057-5) contains supplementary material, which is available to authorized users.  相似文献   

9.
Fruit quality is polygenic; each component has variable heritability and is difficult to assess. Genomic selection, which allows the prediction of phenotypes based on the whole-genome genotype, could vastly help to improve fruit quality. The goal of this study is to evaluate the accuracy of genomic selection for several metabolomic and quality traits by cross-validation and to estimate the impact of different factors on its accuracy. We analyzed data from 45 phenotypic traits and genotypic data obtained from a previous study of genetic association on a collection of 163 tomato accessions. We tested the influence of (1) the size of training population, (2) the number and density of molecular markers and (3) individual relatedness on the accuracy of prediction. The prediction accuracy of phenotypic values was largely related to the heritability of the traits. The size of training population increased the accuracy of predictions. Using 122 accessions and 5995 single nucleotide polymorphisms (SNPs) was the optimal condition. The density of markers and their numbers also affected the accuracy of the prediction. Using 2313 SNP markers distributed 0.1 cM or more apart from each other reduced the accuracy of prediction, and no gain in prediction accuracy was found when more markers were used in the model. Additionally, the more accessions were related, the more accurate were the predictions. Finally, the structure of the population negatively affected the prediction accuracy. In conclusion, the results obtained by cross-validation illustrated the effect of several parameters on the accuracy of prediction and revealed the potential of genomic selection in tomato breeding programs.  相似文献   

10.
Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross‐validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross‐validations. The accuracies of GEBVs from random cross‐validations (range 0.17–0.61) were higher than were those from sire family cross‐validations (range 0.00–0.51). The GEBV accuracies of 0.41–0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. Although BayesR was not significantly different from GBLUP in prediction accuracy, it identified several candidate genes which are known to be associated with NLB and LSIZE. The approach provides a way to make use of all data available in genomic prediction for traits that have limited recording.  相似文献   

11.

Background

The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates.

Methodology/Principal Findings

We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross validation procedure and receiver operator characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area-under-the-ROC-curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale) and five-fold cross validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data.

Conclusions/Significance

These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking bTB phenotypes. However, a larger training population will be required to improve prediction accuracies.  相似文献   

12.

Background

In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data.

Methods

Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.

Results

Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.

Conclusions

Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0149-x) contains supplementary material, which is available to authorized users.  相似文献   

13.
T Druet  I M Macleod  B J Hayes 《Heredity》2014,112(1):39-47
Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, given the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population, and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) were evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP), which selected a subset of individuals for sequencing that maximized the number of unique haplotypes (from single-nucleotide polymorphism panel data) sequenced gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (x 600), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%.  相似文献   

14.
The current approach to using machine learning (ML) algorithms in healthcare is to either require clinician oversight for every use case or use their predictions without any human oversight. We explore a middle ground that lets ML algorithms abstain from making a prediction to simultaneously improve their reliability and reduce the burden placed on human experts. To this end, we present a general penalized loss minimization framework for training selective prediction-set (SPS) models, which choose to either output a prediction set or abstain. The resulting models abstain when the outcome is difficult to predict accurately, such as on subjects who are too different from the training data, and achieve higher accuracy on those they do give predictions for. We then introduce a model-agnostic, statistical inference procedure for the coverage rate of an SPS model that ensembles individual models trained using K-fold cross-validation. We find that SPS ensembles attain prediction-set coverage rates closer to the nominal level and have narrower confidence intervals for its marginal coverage rate. We apply our method to train neural networks that abstain more for out-of-sample images on the MNIST digit prediction task and achieve higher predictive accuracy for ICU patients compared to existing approaches.  相似文献   

15.
Maize (Zea mays L.) serves as model plant for heterosis research and is the crop where hybrid breeding was pioneered. We analyzed genomic and phenotypic data of 1254 hybrids of a typical maize hybrid breeding program based on the important Dent × Flint heterotic pattern. Our main objectives were to investigate genome properties of the parental lines (e.g., allele frequencies, linkage disequilibrium, and phases) and examine the prospects of genomic prediction of hybrid performance. We found high consistency of linkage phases and large differences in allele frequencies between the Dent and Flint heterotic groups in pericentromeric regions. These results can be explained by the Hill–Robertson effect and support the hypothesis of differential fixation of alleles due to pseudo-overdominance in these regions. In pericentromeric regions we also found indications for consistent marker–QTL linkage between heterotic groups. With prediction methods GBLUP and BayesB, the cross-validation prediction accuracy ranged from 0.75 to 0.92 for grain yield and from 0.59 to 0.95 for grain moisture. The prediction accuracy of untested hybrids was highest, if both parents were parents of other hybrids in the training set, and lowest, if none of them were involved in any training set hybrid. Optimizing the composition of the training set in terms of number of lines and hybrids per line could further increase prediction accuracy. We conclude that genomic prediction facilitates a paradigm shift in hybrid breeding by focusing on the performance of experimental hybrids rather than the performance of parental lines in testcrosses.  相似文献   

16.

Background

A haplotype approach to genomic prediction using high density data in dairy cattle as an alternative to single-marker methods is presented. With the assumption that haplotypes are in stronger linkage disequilibrium (LD) with quantitative trait loci (QTL) than single markers, this study focuses on the use of haplotype blocks (haploblocks) as explanatory variables for genomic prediction. Haploblocks were built based on the LD between markers, which allowed variable reduction. The haploblocks were then used to predict three economically important traits (milk protein, fertility and mastitis) in the Nordic Holstein population.

Results

The haploblock approach improved prediction accuracy compared with the commonly used individual single nucleotide polymorphism (SNP) approach. Furthermore, using an average LD threshold to define the haploblocks (LD≥0.45 between any two markers) increased the prediction accuracies for all three traits, although the improvement was most significant for milk protein (up to 3.1 % improvement in prediction accuracy, compared with the individual SNP approach). Hotelling’s t-tests were performed, confirming the improvement in prediction accuracy for milk protein. Because the phenotypic values were in the form of de-regressed proofs, the improved accuracy for milk protein may be due to higher reliability of the data for this trait compared with the reliability of the mastitis and fertility data. Comparisons between best linear unbiased prediction (BLUP) and Bayesian mixture models also indicated that the Bayesian model produced the most accurate predictions in every scenario for the milk protein trait, and in some scenarios for fertility.

Conclusions

The haploblock approach to genomic prediction is a promising method for genomic selection in animal breeding. Building haploblocks based on LD reduced the number of variables without the loss of information. This method may play an important role in the future genomic prediction involving while genome sequences.  相似文献   

17.
18.

Background

Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which reference individuals and selection candidates are from different populations, and to investigate the impact of differences in allele substitution effects across populations and of the number of QTL underlying a trait on the accuracy.

Methods

A deterministic formula to estimate the accuracy of across-population genomic prediction was derived based on selection index theory. Moreover, accuracies were deterministically predicted using a formula based on population parameters and empirically calculated using simulated phenotypes and a GBLUP (genomic best linear unbiased prediction) model. Phenotypes of 1033 Holstein-Friesian, 105 Groninger White Headed and 147 Meuse-Rhine-Yssel cows were simulated by sampling 3000, 300, 30 or 3 QTL from the available high-density SNP (single nucleotide polymorphism) information of three chromosomes, assuming a correlation of 1.0, 0.8, 0.6, 0.4, or 0.2 between allele substitution effects across breeds. The simulated heritability was set to 0.95 to resemble the heritability of deregressed proofs of bulls.

Results

Accuracies estimated with the deterministic formula based on selection index theory were similar to empirical accuracies for all scenarios, while accuracies predicted with the formula based on population parameters overestimated empirical accuracies by ~25 to 30%. When the between-breed genetic correlation differed from 1, i.e. allele substitution effects differed across breeds, empirical and deterministic accuracies decreased in proportion to the genetic correlation. Using a multi-trait model, it was possible to accurately estimate the genetic correlation between the breeds based on phenotypes and high-density genotypes. The number of QTL underlying the simulated trait did not affect the accuracy.

Conclusions

The deterministic formula based on selection index theory estimated the accuracy of across-population genomic predictions well. The deterministic formula using population parameters overestimated the across-population genomic accuracy, but may still be useful because of its simplicity. Both formulas could accommodate for genetic correlations between populations lower than 1. The number of QTL underlying a trait did not affect the accuracy of across-population genomic prediction using a GBLUP method.  相似文献   

19.
Accuracy of predicting genomic breeding values for carcass merit traits including hot carcass weight, longissimus muscle area (REA), carcass average backfat thickness (AFAT), lean meat yield (LMY) and carcass marbling score (CMAR) was evaluated based on 543 Angus and 400 Charolais steers genotyped on the Illumina BovineSNP50 Beadchip. For the genomic prediction within Angus, the average accuracy was 0.35 with a range from 0.32 (LMY) to 0.37 (CMAR) across different training/validation data‐splitting strategies and statistical methods. The within‐breed genomic prediction for Charolais yielded an average accuracy of 0.36 with a range from 0.24 (REA) to 0.46 (AFAT). The across‐breed prediction had the lowest accuracy, which was on average near zero. When the data from the two breeds were combined to predict the breeding values of either breed, the prediction accuracy averaged 0.35 for Angus with a range from 0.33 (REA) to 0.39 (CMAR) and averaged 0.33 for Charolais with a range from 0.18 (REA) to 0.46 (AFAT). The prediction accuracy was slightly higher on average when the data were split by animal's birth year than when the data were split by sire family. These results demonstrate that the genetic relationship or relatedness of selection candidates with the training population has a great impact on the accuracy of predicting genomic breeding values under the density of the marker panel used in this study.  相似文献   

20.

Background

Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.

Methods

Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls.

Results

For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy.All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time.

Conclusions

The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号