Prediction of Genetic Values of Quantitative Traits in Plant Breeding Using Pedigree and Molecular Markers
Authors: José Crossa, Gustavo de los Campos, Paulino Pérez, Daniel Gianola, Juan Burgueño, José Luis Araus, Dan Makumbi, Ravi P. Singh, Susanne Dreisigacker, Jianbing Yan, Vivi Arief, Marianne Banziger, Hans-Joachim Braun
Abstract: The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predictive values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.

Pedigree-based prediction of genetic values based on the additive infinitesimal model (Fisher 1918) has played a central role in genetic improvement of complex traits in plants and animals. Animal breeders have used this model for predicting breeding values either in a mixed model (best linear unbiased prediction, BLUP) (Henderson 1984) or in a Bayesian framework (Gianola and Fernando 1986). More recently, plant breeders have incorporated pedigree information into linear mixed models for predicting breeding values (Crossa et al. 2006, 2007; Oakey et al. 2006; Burgueño et al. 2007; Piepho et al. 2007).

The availability of thousands of genome-wide molecular markers has made possible the use of genomic selection (GS) for prediction of genetic values (Meuwissen et al. 2001) in plants (e.g., Bernardo and Yu 2007; Piepho 2009; Jannink et al. 2010) and animals (Gonzalez-Recio et al. 2008; VanRaden et al. 2008; Hayes et al. 2009; de los Campos et al. 2009a). Implementing GS poses several statistical and computational challenges, such as how models can cope with the curse of dimensionality, collinearity between markers, or the complexity of quantitative traits. Parametric (e.g., Meuwissen et al. 2001) and semiparametric (e.g., Gianola et al. 2006; Gianola and van Kaam 2008) methods address these problems differently.

In standard genetic models, phenotypic outcomes, $y_i$, are viewed as the sum of a genetic value, $g_i$, and a model residual, $\varepsilon_i$; that is, $y_i = g_i + \varepsilon_i$. In parametric models for GS, $g_i$ is described as a regression on marker covariates $x_{ij}$ ($j = 1, \ldots, p$ molecular markers) of the form $g_i = \sum_{j=1}^{p} x_{ij}\beta_j$, such that $y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i$ (or $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, in matrix notation), where $\beta_j$ is the regression of $y_i$ on the jth marker covariate $x_{ij}$.

Estimation of $\boldsymbol{\beta}$ via multiple regression by ordinary least squares (OLS) is not feasible when $p > n$. A commonly used alternative is to estimate marker effects jointly using penalized methods such as ridge regression (Hoerl and Kennard 1970) or the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) or their Bayesian counterparts. This approach yields greater accuracy of estimated genetic values and can be coupled with geostatistical techniques commonly used in plant breeding to model multienvironment trials (Piepho 2009).

In ridge regression (or its Bayesian counterpart) the extent of shrinkage is homogeneous across markers, which may not be appropriate if some markers are located in regions that are not associated with genetic variance, while markers in other regions may be linked to QTL (Goddard and Hayes 2007). To overcome this limitation, many authors have proposed methods that use marker-specific shrinkage. In a Bayesian setting, this can be implemented using priors of marker effects that are mixtures of scaled-normal densities. Examples of this are methods Bayes A and Bayes B of Meuwissen et al. (2001) and the Bayesian LASSO of Park and Casella (2008).

An alternative to parametric regressions is to use semiparametric methods such as reproducing kernel Hilbert spaces (RKHS) regression (Gianola and van Kaam 2008). The Bayesian RKHS regression regards genetic values as random variables coming from a Gaussian process centered at zero and with a (co)variance structure that is proportional to a kernel matrix K (de los Campos et al. 2009b); that is, $\mathrm{Cov}(g_i, g_j) \propto K(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathbf{x}_i$ and $\mathbf{x}_j$ are vectors of marker genotypes for the ith and jth individuals, respectively, and $K(\cdot,\cdot)$ is a positive definite function evaluated in marker genotypes. In a finite-dimensional setting this amounts to modeling the vector of genetic values, $\mathbf{g} = \{g_i\}$, as multivariate normal; that is, $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$, where $\sigma_g^2$ is a variance parameter. One of the most attractive features of RKHS regression is that the methodology can be used with almost any information set (e.g., covariates, strings, images, graphs). A second advantage is that with RKHS the model is represented in terms of n unknowns, which gives RKHS a great computational advantage relative to some parametric methods, especially when $p \gg n$.

This study presents an evaluation of several methods for GS, using two extensive data sets. One contains phenotypic records of a series of wheat trials and recently generated genomic data. The other data set pertains to international maize trials in which different traits were measured in maize lines evaluated under severe drought and well-watered conditions.
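
The contrast drawn above between a marker regression with p unknowns and a kernel formulation with n unknowns can be illustrated with a small numerical sketch. It is not the authors' implementation: the simulated 0/1 marker scores, the fixed penalty `lam`, the Gaussian kernel with bandwidth `h`, and all function names are assumptions made for illustration. The sketch fits ridge regression in a p > n setting, reproduces the same predictions from a linear kernel K = XX' using only n unknowns, and then swaps in a Gaussian kernel as one possible semiparametric choice.

```python
import numpy as np


def simulate_markers(n=60, p=500, seed=0):
    """Simulate centered 0/1 marker scores and phenotypes (hypothetical data)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, p)).astype(float)
    X -= X.mean(axis=0)                      # center each marker column
    beta = rng.normal(0.0, 0.1, size=p)      # many small marker effects
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    return X, y


def ridge_primal(X, y, lam):
    """Ridge estimate of the p marker effects: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def kernel_fit(K, y, lam):
    """Kernel form with n unknowns: alpha = (K + lam*I)^-1 y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def gaussian_kernel(A, B, h=1.0):
    """K(a, b) = exp(-h * ||a - b||^2 / p) for the rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-h * np.maximum(sq, 0.0) / A.shape[1])


def demo():
    X, y = simulate_markers()
    train, test = np.arange(45), np.arange(45, 60)
    lam = 10.0  # assumed penalty; in practice it would be tuned or inferred

    # Parametric route: p = 500 marker effects estimated from n = 45 records.
    beta = ridge_primal(X[train], y[train], lam)
    pred_primal = X[test] @ beta

    # Same predictions from n = 45 unknowns, using the linear kernel K = XX'.
    alpha = kernel_fit(X[train] @ X[train].T, y[train], lam)
    pred_dual = (X[test] @ X[train].T) @ alpha
    print("primal == dual:", np.allclose(pred_primal, pred_dual))

    # Semiparametric route: replace the linear kernel with a Gaussian kernel.
    alpha_g = kernel_fit(gaussian_kernel(X[train], X[train]), y[train], 1.0)
    pred_rkhs = gaussian_kernel(X[test], X[train]) @ alpha_g
    for name, pred in [("ridge", pred_primal), ("Gaussian RKHS", pred_rkhs)]:
        print(name, "test correlation:", np.corrcoef(y[test], pred)[0, 1])


if __name__ == "__main__":
    demo()
```

With variances treated as known, `K @ kernel_fit(K, y, lam)` is the posterior mean of $\mathbf{g}$ under $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$ when `lam` equals the ratio of residual to genetic variance. A full Bayesian analysis would also infer those variances, which this sketch does not attempt. The test-set correlation printed at the end mirrors the idea of predictive ability assessed by cross-validation, here with a single hold-out split on simulated data.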