Similar articles
1.
Bayesian LASSO for quantitative trait loci mapping   Cited by: 7 (self-citations: 1, citations by others: 6)
Yi N  Xu S 《Genetics》2008,179(2):1045-1055
The mapping of quantitative trait loci (QTL) aims to identify molecular markers or genomic loci that influence the variation of complex traits. The problem is complicated by the facts that QTL data usually contain a large number of markers across the entire genome and most of them have little or no effect on the phenotype. In this article, we propose several Bayesian hierarchical models for mapping multiple QTL that simultaneously fit and estimate all possible genetic effects associated with all markers. The proposed models use prior distributions for the genetic effects that are scale mixtures of normal distributions with mean zero and variances distributed to give each effect a high probability of being near zero. We consider two types of priors for the variances, exponential and scaled inverse-chi-square distributions, which result in a Bayesian version of the popular least absolute shrinkage and selection operator (LASSO) model and the well-known Student's t model, respectively. Unlike most applications where fixed values are preset for hyperparameters in the priors, we treat all hyperparameters as unknowns and estimate them along with other parameters. Markov chain Monte Carlo (MCMC) algorithms are developed to simulate the parameters from the posteriors. The methods are illustrated using well-known barley data.
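The link between an exponential prior on the effect variances and the LASSO can be checked numerically. The sketch below is an illustration only, not the authors' MCMC sampler: it assumes the standard parameterization in which each variance follows an exponential distribution with rate λ²/2, so that the marginal prior on an effect is a Laplace (double-exponential) distribution with rate λ. The function names and the value λ = 2 are hypothetical.

```python
import math
import random


def sample_effects_via_scale_mixture(lam, n, seed=0):
    """Draw n effects from N(0, tau2) with tau2 ~ Exponential(rate = lam^2 / 2)."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n):
        tau2 = rng.expovariate(lam ** 2 / 2.0)  # effect-specific variance
        effects.append(rng.gauss(0.0, math.sqrt(tau2)))
    return effects


def mean_absolute_value(values):
    """Sample estimate of E|effect|; equals 1 / lam under a Laplace(lam) prior."""
    return sum(abs(v) for v in values) / len(values)


def second_moment(values):
    """Sample estimate of E[effect^2]; equals 2 / lam^2 under a Laplace(lam) prior."""
    return sum(v * v for v in values) / len(values)


if __name__ == "__main__":
    lam = 2.0  # assumed regularization parameter, for illustration only
    draws = sample_effects_via_scale_mixture(lam, 100_000)
    print("mean |effect|:", mean_absolute_value(draws), "Laplace value:", 1.0 / lam)
    print("E[effect^2]:  ", second_moment(draws), "Laplace value:", 2.0 / lam ** 2)
```

Both sample moments should land close to the Laplace values, which is the sense in which the hierarchical normal-exponential prior reproduces the LASSO penalty. In the abstract's setting λ is itself treated as unknown and estimated, which this sketch does not attempt.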

2.

Background

The theory of genomic selection is based on the prediction of the effects of quantitative trait loci (QTL) in linkage disequilibrium (LD) with markers. However, there is increasing evidence that genomic selection also relies on "relationships" between individuals to accurately predict genetic values. Therefore, a better understanding of what genomic selection actually predicts is relevant so that appropriate methods of analysis are used in genomic evaluations.

Methods

Simulation was used to compare the performance of estimates of breeding values based on pedigree relationships (Best Linear Unbiased Prediction, BLUP), genomic relationships (gBLUP), and based on a Bayesian variable selection model (Bayes B) to estimate breeding values under a range of different underlying models of genetic variation. The effects of different marker densities and varying animal relationships were also examined.

Results

This study shows that genomic selection methods can predict a proportion of the additive genetic value when genetic variation is controlled by common quantitative trait loci (QTL model), rare loci (rare variant model), all loci (infinitesimal model) and a random association (a polygenic model). The Bayes B method was able to estimate breeding values more accurately than gBLUP under the QTL and rare variant models, for the alternative marker densities and reference populations. The Bayes B and gBLUP methods had similar accuracies under the infinitesimal model.

Conclusions

Our results suggest that Bayes B is superior to gBLUP for estimating breeding values from genomic data. The underlying model of genetic variation greatly affects the predictive ability of genomic selection methods, and the superiority of Bayes B over gBLUP is highly dependent on the presence of large QTL effects. The use of SNP sequence data will outperform the less dense marker panels. However, the size and distribution of QTL effects and the size of reference populations still greatly influence the effectiveness of using sequence data for genomic prediction.

3.
Large fruit size is a critical trait for any new sweet cherry (Prunus avium L.) cultivar, as it is directly related to grower profitability. Therefore, determining the genetic control of fruit size in relevant breeding germplasm is a high priority. The objectives of this study were (1) to determine the number and positions of quantitative trait loci (QTL) for sweet cherry fruit size utilizing data simultaneously from multiple families and their pedigreed ancestors, and (2) to estimate fruit size QTL genotype probabilities and genomic breeding values for the plant materials. The sweet cherry material used was a five-generation pedigree consisting of 23 founders and parents and 424 progeny individuals from four full-sib families, which were phenotyped for fruit size and genotyped with 78 RosCOS single nucleotide polymorphism and 86 simple sequence repeat markers. These data were analyzed by a Bayesian approach implemented in FlexQTL™ software. Six QTL were identified: three on linkage group (G) 2 with one each on groups 1, 3, and 6. Of these QTL, the second G2 QTL and the G6 QTL were previously discovered while other QTL were novel. The predicted QTL genotypes show that some QTL were segregating in all families while other QTL were segregating in a subset of the families. The progeny varied for breeding value, with some progeny having higher breeding values than their parents. The results illustrate the use of multiple pedigree-linked families for integrated QTL mapping in an outbred crop to discover novel QTL and predict QTL genotypes and breeding values.

4.
The availability of high density panels of molecular markers has prompted the adoption of genomic selection (GS) methods in animal and plant breeding. In GS, parametric, semi-parametric and non-parametric regression models are used for predicting quantitative traits. This article shows how to use neural networks with radial basis functions (RBFs) for prediction with dense molecular markers. We illustrate the use of the linear Bayesian LASSO regression model and of two non-linear regression models, reproducing kernel Hilbert spaces (RKHS) regression and radial basis function neural networks (RBFNN) on simulated data and real maize lines genotyped with 55,000 markers and evaluated for several trait-environment combinations. The empirical results of this study indicated that the three models showed similar overall prediction accuracy, with a slight and consistent superiority of RKHS and RBFNN over the additive Bayesian LASSO model. Results from the simulated data indicate that RKHS and RBFNN models captured epistatic effects; however, adding non-signal (redundant) predictors (interaction between markers) can adversely affect the predictive accuracy of the non-linear regression models.

5.

Key message

Proof of concept of Bayesian integrated QTL analyses across pedigree-related families from breeding programs of an outbreeding species. Results include QTL confidence intervals, individuals’ genotype probabilities and genomic breeding values.

Abstract

Bayesian QTL linkage mapping approaches offer the flexibility to study multiple full sib families with known pedigrees simultaneously. Such a joint analysis increases the probability of detecting these quantitative trait loci (QTL) and provides insight into the magnitude of QTL across different genetic backgrounds. Here, we present an improved Bayesian multi-QTL pedigree-based approach on an outcrossing species using progenies with different (complex) genetic relationships. Different modeling assumptions were studied in the QTL analyses, i.e., the a priori expected number of QTL varied and polygenic effects were considered. The inferences include number of QTL, additive QTL effect sizes and supporting credible intervals, posterior probabilities of QTL genotypes for all individuals in the dataset, and QTL-based as well as genome-wide breeding values. All these features have been implemented in the FlexQTL™ software. We analyzed fruit firmness in a large apple dataset that comprised 1,347 individuals forming 27 full sib families and their known ancestral pedigrees, with genotypes for 87 SSR markers on 17 chromosomes. We report strong or positive evidence for 14 QTL for fruit firmness on eight chromosomes, validating our approach as several of these QTL were reported previously, though dispersed over a series of studies based on single mapping populations. Interpretation of linked QTL was possible via individuals’ QTL genotypes. The correlation between the genomic breeding values and phenotypes was on average 90%, but varied with the number of detected QTL in a family. The detailed posterior knowledge on QTL of potential parents is critical for the efficiency of marker-assisted breeding.

6.
Xu S 《Biometrics》2007,63(2):513-521
Summary. The genetic variance of a quantitative trait is often controlled by the segregation of multiple interacting loci. Linear model regression analysis is usually applied to estimating and testing effects of these quantitative trait loci (QTL). Including all the main effects and the effects of interaction (epistatic effects), the dimension of the linear model can be extremely high. Variable selection via stepwise regression or stochastic search variable selection (SSVS) is the common procedure for epistatic effect QTL analysis. These methods are computationally intensive, yet they may not be optimal. The LASSO (least absolute shrinkage and selection operator) method is computationally more efficient than the above methods. As a result, it has been widely used in regression analysis for large models. However, LASSO has never been applied to genetic mapping for epistatic QTL, where the number of model effects is typically many times larger than the sample size. In this study, we developed an empirical Bayes method (E-BAYES) to map epistatic QTL under the mixed model framework. We also tested the feasibility of using LASSO to estimate epistatic effects, examined the fully Bayesian SSVS, and reevaluated the penalized likelihood (PENAL) methods in mapping epistatic QTL. Simulation studies showed that all the above methods performed satisfactorily well. However, E-BAYES appears to outperform all other methods in terms of minimizing the mean-squared error (MSE) with relatively short computing time. Application of the new method to real data was demonstrated using a barley dataset.

7.
Genetic analysis across a whole plant genome based on pedigree information offers considerable potential for enhancing genetic gain from plant breeding programs through quantitative trait loci (QTL) mapping and marker-assisted selection. Here, we report its application for graphically genotyping varieties used in Chinese japonica rice (Oryza sativa L.) pedigree breeding programs. We identified 34 important chromosomal regions from the founder parent that are under selection in the breeding programs, and by comparing donor genomic regions that are under selection with QTL locations of agronomic traits, we found that QTL clustered in important genomic regions, in accordance with association analyses of natural populations and other previous studies. The convergence of genomic regions under selection with QTL locations suggests that donor genomic regions harboring key genes/QTL for important agronomic traits have been selected by plant breeders since the 1950s from the founder rice plants. The results provide better understanding of the effects of selection in breeding programs on the traits of rice cultivars. They also provide potentially valuable information for enhancing rice breeding programs through screening candidate parents for targeted molecular markers, improving crop yield potential and identifying suitable genetic material for use in future breeding programs.

8.
Sugar-related traits are of great importance in sugarcane breeding. In the present study, quantitative trait loci (QTL) mapping validated with association mapping was used to identify expressed sequence tag-simple sequence repeats (EST-SSRs) associated with sugar-related traits. For linkage mapping, 524 EST-SSRs, 241 Amplified Fragment Length Polymorphisms, and 10 genomic SSR markers were mapped using 283 F1 progenies derived from an interspecific cross. Six regions were identified using Multiple QTL Mapping, and 14 unlinked markers using single marker analysis. Association analysis was performed on a set of 200 accessions, based on the mixed linear model. Validation of the EST-SSR markers using association mapping within the target QTL genomic regions identified two EST-SSR markers showing a putative relationship with uridine diphosphate (UDP) glycosyltransferase and beta-amylase, which are associated with pol and sugar yield. These functional markers can be used for marker-assisted selection of sugarcane.

9.
Genome-wide association and genomic selection in animal breeding   Cited by: 2 (self-citations: 0, citations by others: 2)
Hayes B  Goddard M 《Génome》2010,53(11):876-883
Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effect of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
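The abstract does not spell out how the genomic relationship matrix is built. As a rough sketch, the code below assumes one widely used construction (VanRaden's first method), in which genotypes coded 0/1/2 are centered by twice the allele frequency and the cross-products are scaled by 2Σp(1−p). The function name and the toy genotypes are hypothetical, and allele frequencies are estimated from the sample itself.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j (1 - p_j)) from 0/1/2 genotype codes.

    genotypes: list of rows, one per individual, one column per SNP.
    Assumes VanRaden's first method; allele frequencies come from the sample.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy_genotypes = [  # 4 individuals x 3 SNPs, illustrative values only
        [0, 1, 2],
        [1, 1, 0],
        [2, 0, 1],
        [1, 2, 1],
    ]
    for row in genomic_relationship_matrix(toy_genotypes):
        print([round(value, 3) for value in row])
```

In a GBLUP analysis this matrix would take the place of the pedigree relationship matrix in the mixed-model equations. That step is left out here because it depends on variance components the abstract does not give.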

10.
Genomic selection can increase genetic gain per generation through early selection. Genomic selection is expected to be particularly valuable for traits that are costly to phenotype and expressed late in the life cycle of long-lived species. Alternative approaches to genomic selection prediction models may perform differently for traits with distinct genetic properties. Here the performance of four different original methods of genomic selection that differ with respect to assumptions regarding distribution of marker effects, including (i) ridge regression-best linear unbiased prediction (RR-BLUP), (ii) Bayes A, (iii) Bayes Cπ, and (iv) Bayesian LASSO are presented. In addition, a modified RR-BLUP (RR-BLUP B) that utilizes a selected subset of markers was evaluated. The accuracy of these methods was compared across 17 traits with distinct heritabilities and genetic architectures, including growth, development, and disease-resistance properties, measured in a Pinus taeda (loblolly pine) training population of 951 individuals genotyped with 4853 SNPs. The predictive ability of the methods was evaluated using a 10-fold, cross-validation approach, and differed only marginally for most method/trait combinations. Interestingly, for fusiform rust disease-resistance traits, Bayes Cπ, Bayes A, and RR-BLUP B had higher predictive ability than RR-BLUP and Bayesian LASSO. Fusiform rust is controlled by few genes of large effect. A limitation of RR-BLUP is the assumption of equal contribution of all markers to the observed variation. However, RR-BLUP B performed equally well as the Bayesian approaches. The genotypic and phenotypic data used in this study are publicly available for comparative analysis of genomic selection prediction models.
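A k-fold evaluation of predictive ability, as described above, might look like the following sketch. It makes several assumptions that are not in the abstract: marker effects come from plain ridge regression with a fixed penalty `lam` (RR-BLUP would instead derive the penalty from estimated variance components), no intercept is fitted, and predictive ability is taken as the Pearson correlation between held-out predictions and observed phenotypes. All function names are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_fit(x_rows, y, lam):
    """Marker effects from (X'X + lam * I) beta = X'y (no intercept)."""
    m = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) + (lam if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(m)]
    return solve_linear_system(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def cross_validated_predictive_ability(x_rows, y, lam, k=10, seed=0):
    """Correlation between held-out predictions and observations over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    predicted = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        beta = ridge_fit([x_rows[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            predicted[i] = sum(b * x for b, x in zip(beta, x_rows[i]))
    return pearson(predicted, y)
```

Each individual is predicted exactly once, from a model trained without it, so the resulting correlation is an out-of-sample measure. Swapping `ridge_fit` for another estimator would allow methods to be compared on identical folds.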

11.

Background

Genomic selection makes it possible to reduce pedigree-based inbreeding over best linear unbiased prediction (BLUP) by increasing emphasis on own rather than family information. However, pedigree inbreeding might not accurately reflect loss of genetic variation and the true level of inbreeding due to changes in allele frequencies and hitch-hiking. This study aimed at understanding the impact of using long-term genomic selection on changes in allele frequencies, genetic variation and level of inbreeding.

Methods

Selection was performed in simulated scenarios with a population of 400 animals for 25 consecutive generations. Six genetic models were considered with different heritabilities and numbers of QTL (quantitative trait loci) affecting the trait. Four selection criteria were used, including selection on own phenotype and on estimated breeding values (EBV) derived using phenotype-BLUP, genomic BLUP and Bayesian Lasso. Changes in allele frequencies at QTL, markers and linked neutral loci were investigated for the different selection criteria and different scenarios, along with the loss of favourable alleles and the rate of inbreeding measured by pedigree and runs of homozygosity.

Results

For each selection criterion, hitch-hiking in the vicinity of the QTL appeared more extensive when accuracy of selection was higher and the number of QTL was lower. When inbreeding was measured by pedigree information, selection on genomic BLUP EBV resulted in lower levels of inbreeding than selection on phenotype BLUP EBV, but this did not always apply when inbreeding was measured by runs of homozygosity. Compared to genomic BLUP, selection on EBV from Bayesian Lasso led to less genetic drift, reduced loss of favourable alleles and more effectively controlled the rate of both pedigree and genomic inbreeding in all simulated scenarios. In addition, selection on EBV from Bayesian Lasso showed a higher selection differential for mendelian sampling terms than selection on genomic BLUP EBV.

Conclusions

Neutral variation can be shaped to a great extent by the hitch-hiking effects associated with selection, rather than just by genetic drift. When implementing long-term genomic selection, strategies for genomic control of inbreeding are essential, due to a considerable hitch-hiking effect, regardless of the method that is used for prediction of EBV.

12.
Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP reaches trait-heritability, asymptotically. However, under imperfect LD between markers and QTL, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one, inducing a small decrease in R². However, with distantly related individuals b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with use of variable selection or differential shrinkage of estimates of marker effects.

13.

Key message

Using newly developed euchromatin-derived genomic SSR markers and a flexible Bayesian mapping method, 13 significant agricultural QTLs were identified in a segregating population derived from a four-way cross of tomato.

Abstract

So far, many QTL mapping studies in tomato have been performed for progeny obtained from crosses between two genetically distant parents, e.g., domesticated tomatoes and wild relatives. However, QTL information of quantitative traits related to yield (e.g., flower or fruit number, and total or average weight of fruits) in such intercross populations would be of limited use for breeding commercial tomato cultivars because individuals in the populations have specific genetic backgrounds underlying extremely different phenotypes between the parents such as large fruit in domesticated tomatoes and small fruit in wild relatives, which may not be reflective of the genetic variation in tomato breeding populations. In this study, we constructed an F2 population derived from a cross between two commercial F1 cultivars in tomato to extract QTL information practical for tomato breeding. This cross corresponded to a four-way cross, because the four parental lines of the two F1 cultivars were considered to be the founders. We developed 2510 new expressed sequence tag (EST)-based (euchromatin-derived) genomic SSR markers and selected 262 markers from these new SSR markers and publicly available SSR markers to construct a linkage map. QTL analysis for ten agricultural traits of tomato was performed based on the phenotypes and marker genotypes of F2 plants using a flexible Bayesian method. As a result, 13 QTL regions were detected for six traits by the Bayesian method developed in this study.

14.
Historically in plant breeding a large number of statistical models has been developed and used for studying genotype × environment interaction. These models have helped plant breeders to assess the stability of economically important traits and to predict the performance of newly developed genotypes evaluated under varying environmental conditions. In the last decade, the use of relatively low numbers of markers has facilitated the mapping of chromosome regions associated with phenotypic variability (e.g., QTL mapping) and, to a lesser extent, revealed the differential response of these chromosome regions across environments (i.e., QTL × environment interaction). QTL technology has been useful for marker-assisted selection of simple traits; however, it has not been efficient for predicting complex traits affected by a large number of loci. Recently the appearance of cheap, abundant markers has made it possible to saturate the genome with high density markers and use marker information to predict genomic breeding values, thus increasing the precision of genetic value prediction over that achieved with the traditional use of pedigree information. Genomic data also allow assessing chromosome regions through marker effects and studying the pattern of covariability of marker effects across differential environmental conditions. In this review, we outline the most important models for assessing genotype × environment interaction, QTL × environment interaction, and marker effect (gene) × environment interaction. Since analyzing genetic and genomic data is one of the most challenging statistical problems researchers currently face, different models from different areas of statistical research must be attempted in order to make significant progress in understanding genetic effects and their interaction with environment.

15.
Genomic best linear unbiased prediction (BLUP) is a statistical method that uses relationships between individuals calculated from single-nucleotide polymorphisms (SNPs) to capture relationships at quantitative trait loci (QTL). We show that genomic BLUP exploits not only linkage disequilibrium (LD) and additive-genetic relationships, but also cosegregation to capture relationships at QTL. Simulations were used to study the contributions of those types of information to accuracy of genomic estimated breeding values (GEBVs), their persistence over generations without retraining, and their effect on the correlation of GEBVs within families. We show that accuracy of GEBVs based on additive-genetic relationships can decline with increasing training data size and speculate that modeling polygenic effects via pedigree relationships jointly with genomic breeding values using Bayesian methods may prevent that decline. Cosegregation information from half sibs contributes little to accuracy of GEBVs in current dairy cattle breeding schemes but from full sibs it contributes considerably to accuracy within family in corn breeding. Cosegregation information also declines with increasing training data size, and its persistence over generations is lower than that of LD, suggesting the need to model LD and cosegregation explicitly. The correlation between GEBVs within families depends largely on additive-genetic relationship information, which is determined by the effective number of SNPs and training data size. As genomic BLUP cannot capture short-range LD information well, we recommend Bayesian methods with t-distributed priors.

16.
Fine mapping of quantitative trait loci (QTL) from previous linkage studies was performed on pig chromosomes 1, 4, 7, 8, 17, and X which were known to harbor QTL. Traits were divided into: growth performance, carcass, internal organs, cut yields, and meat quality. Fifty families of an F2 population produced by crossing local Brazilian Piau boars with commercial sows were used. The linkage map consisted of 237 SNP and 37 microsatellite markers covering 866 centimorgans. QTL were identified by regression interval mapping using GridQTL. Individual marker effects were estimated by Bayesian LASSO regression using R. In total, 32 QTL affecting the evaluated traits were detected along the chromosomes studied. Seven of the QTL were known from previous studies using our F2 population, and 25 novel QTL resulted from the increased marker coverage. Six of the seven QTL that were significant at the 5% genome-wide level had SNPs within their confidence interval whose effects were among the 5% largest effects. The combined use of microsatellites along with SNP markers increased the saturation of the genome map and led to smaller confidence intervals of the QTL. The results showed that the tested models yield similar improvements in QTL mapping accuracy.

17.

Background

Genomic selection has become an important tool in the genetic improvement of animals and plants. The objective of this study was to investigate the impacts of breeding value estimation method, reference population structure, and trait genetic architecture, on long-term response to genomic selection without updating marker effects.

Methods

Three methods were used to estimate genomic breeding values: a BLUP method with relationships estimated from genome-wide markers (GBLUP), a Bayesian method, and a partial least squares regression method (PLSR). A shallow (individuals from one generation) or deep reference population (individuals from five generations) was used with each method. The effects of the different selection approaches were compared under four different genetic architectures for the trait under selection. Selection was based on one of the three genomic breeding values, on pedigree BLUP breeding values, or performed at random. Selection continued for ten generations.

Results

Differences in long-term selection response were small. For a genetic architecture with a very small number of three to four quantitative trait loci (QTL), the Bayesian method achieved a response that was 0.05 to 0.1 genetic standard deviation higher than other methods in generation 10. For genetic architectures with approximately 30 to 300 QTL, PLSR (shallow reference) or GBLUP (deep reference) had an average advantage of 0.2 genetic standard deviation over the Bayesian method in generation 10. GBLUP resulted in 0.6% and 0.9% less inbreeding than PLSR and BM and on average a one third smaller reduction of genetic variance. Responses in early generations were greater with the shallow reference population while long-term response was not affected by reference population structure.

Conclusions

The ranking of estimation methods was different with than without selection. Under selection, applying GBLUP led to lower inbreeding and a smaller reduction of genetic variance while a similar response to selection was achieved. The reference population structure had a limited effect on long-term accuracy and response. Use of a shallow reference population, most closely related to the selection candidates, gave early benefits while in later generations, when marker effects were not updated, the estimation of marker effects based on a deeper reference population did not pay off.

18.
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.
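The contrast between variable selection and spreading effects across the genome is easiest to see in an idealized case. The sketch below assumes an orthonormal marker design, which real genotype data with LD do not satisfy; under that assumption both estimators have closed forms in terms of the least-squares effect b. The LASSO soft-thresholds b, setting small effects exactly to zero, whereas ridge regression scales every effect by the same factor and never produces an exact zero. The function names, the penalty value, and the example effects are hypothetical.

```python
def lasso_orthonormal(b, lam):
    """Minimizer of 0.5 * (b - x)**2 + lam * abs(x): soft-thresholding of b."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # effects smaller than lam in magnitude are removed entirely


def ridge_orthonormal(b, lam):
    """Minimizer of 0.5 * (b - x)**2 + 0.5 * lam * x**2: proportional shrinkage."""
    return b / (1.0 + lam)


if __name__ == "__main__":
    least_squares_effects = [3.0, 0.4, -0.2, -1.5]  # illustrative values only
    lam = 0.5
    print([lasso_orthonormal(b, lam) for b in least_squares_effects])
    # -> [2.5, 0.0, 0.0, -1.0]
    print([round(ridge_orthonormal(b, lam), 4) for b in least_squares_effects])
```

With correlated markers neither closed form holds, which is one reason the abstract's conclusions about LD and sample size cannot be read off from this toy case.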

19.
Fruit quality and repeat flowering are two major foci of several strawberry breeding programs. The identification of quantitative trait loci (QTL) and molecular markers linked to these traits could improve breeding efficiency. In this work, an F1 population derived from the cross ‘Delmarvel’ × ‘Selva’ was used to develop a genetic linkage map for QTL analyses of fruit-quality traits and number of weeks of flowering. Some QTL for fruit-quality traits were identified on the same homoeologous groups found in previous studies, supporting trait association in multiple genetic backgrounds and utility in multiple breeding programs. None of the QTL for soluble solids colocated with a QTL for titratable acids, and, although the total soluble solid contents were significantly and positively correlated with titratable acids, the correlation coefficient value of 0.2452 and independence of QTL indicate that selection for high soluble solids can be practiced independently of selection for low acidity. One genomic region associated with the total number of weeks of flowering was identified quantitatively on LG IV-S-1. The most significant marker, FxaACAO2I8C-145S, explained 43.3% of the phenotypic variation. The repeat-flowering trait, scored qualitatively, mapped to the same region as the QTL. Dominance of the repeat-flowering allele was demonstrated by the determination that the repeat-flowering parent was heterozygous. This genomic region appears to be the same region identified in multiple mapping populations and testing environments. Markers linked in multiple populations and testing environments to fruit-quality traits and repeat flowering should be tested widely for use in marker-assisted breeding.

20.
The Bayesian LASSO (BL) has been pointed out to be an effective approach to sparse model representation and successfully applied to quantitative trait loci (QTL) mapping and genomic breeding value (GBV) estimation using genome-wide dense sets of markers. However, the BL relies on a single parameter known as the regularization parameter to simultaneously control the overall model sparsity and the shrinkage of individual covariate effects. This may be idealistic when dealing with a large number of predictors whose effect sizes may differ by orders of magnitude. Here we propose the extended Bayesian LASSO (EBL) for QTL mapping and unobserved phenotype prediction, which introduces an additional level to the hierarchical specification of the BL to explicitly separate out these two model features. Compared to the adaptiveness of the BL, the EBL is “doubly adaptive” and thus, more robust to tuning. In simulations, the EBL outperformed the BL in regard to the accuracy of both effect size estimates and phenotypic value predictions, with comparable computational time. Moreover, the EBL proved to be less sensitive to tuning than the related Bayesian adaptive LASSO (BAL), which introduces locus-specific regularization parameters as well, but involves no mechanism for distinguishing between model sparsity and parameter shrinkage. Consequently, the EBL seems to point to a new direction for QTL mapping, phenotype prediction, and GBV estimation.

Regularization or shrinkage methods are gaining increasing recognition as a valuable alternative to variable selection techniques in dealing with oversaturated or otherwise ill-defined regression problems in both the classical and Bayesian frameworks (e.g., O'Hara and Sillanpää 2009). Many studies (e.g., Xu 2003; Wang et al. 2005; Zhang and Xu 2005; De los Campos et al. 2009; Usai et al. 2009; Wu et al. 2009; Xu et al. 2009) have documented the potential of shrinkage methods for quantitative trait locus (QTL) mapping and genomic breeding value (GBV) estimation using genome-wide dense sets of markers. Lee et al. (2008) make a clear connection between phenotype prediction and GBV estimation, suggesting that methods developed for one are also applicable to the other. We thus use the two concepts interchangeably throughout this article.

Regularized regression methods, such as ridge regression (Hoerl and Kennard 1970) or the least absolute shrinkage and selection operator (LASSO) (Tibshirani 1996), are essentially penalized likelihood procedures, where suitable penalty functions are added to the negative log-likelihood to automatically shrink spurious effects (effects of redundant covariates) toward zero, while allowing relevant effects to take values farther from zero.

It has been pointed out that these non-Bayesian shrinkage methods are not suitable for oversaturated models. Zou and Hastie (2005) and Park and Casella (2008) noted that the LASSO cannot select a number of nonzero effects exceeding the sample size. Xu (2003) found that for ridge regression to work, the number of model effects should be in the same order as the number of observations. This is impractical for genomic selection, which capitalizes on the variation due to small-marker effects, the number of which can exceed the sample size, by contrast to QTL mapping where interest lies mostly in a small subset of loci with large effects on the focal phenotype. In connection with the LASSO, the Bayesian LASSO (BL) (Park and Casella 2008; Yi and Xu 2008) has been proposed to overcome this limitation by imposing a selective shrinkage across regression parameters. Xu (2003) also proposed a Bayesian shrinkage method for QTL mapping, which extends ridge regression in a similar fashion.

Although the BL has been successfully applied to QTL mapping (e.g., Yi and Xu 2008) and to GBV estimation (e.g., De los Campos et al. 2009), it relies on a single parameter known as the regularization parameter to simultaneously regulate the overall model sparsity and the extent to which individual regression coefficients are shrunken. However, this is unrealistic when dealing with a large number of predictors whose effect sizes may differ by orders of magnitude. It is therefore natural to ask whether this practice can be relaxed and how such an attempt may impinge on the model performance (e.g., Sun et al. 2010).

Here we propose an extension to the Bayesian LASSO for QTL mapping and unobserved phenotype prediction. Our method, the extended Bayesian LASSO (EBL), introduces locus-specific regularization parameters and utilizes a parameterization that clearly separates the overall model sparsity from the degree of shrinkage of individual regression parameters. We use simulated data to investigate the performance of the EBL relative to the Bayesian LASSO in mapping QTL and in predicting unobserved phenotypes. We also compare the performance of the EBL to the Bayesian adaptive LASSO (BAL) recently proposed by Sun et al. (2010), which also assumes locus-specific regularization parameters.
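The penalized-likelihood view described in this excerpt, and the way a single regularization parameter enters the Bayesian LASSO, can be written out as follows. The notation is assumed here for illustration and does not come from the excerpt: $y$ is the phenotype vector, $X$ the marker matrix with $p$ columns, $\beta_j$ the effect of marker $j$, $\lambda$ the regularization parameter, and $\tau_j^2$ a marker-specific prior variance. With Gaussian residuals the negative log-likelihood is proportional to the residual sum of squares, and the last line is the standard scale-mixture-of-normals representation of the Laplace prior, written without conditioning on the residual variance for brevity.

```latex
% Ridge regression: squared (L2) penalty on the marker effects
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \beta_j^2

% LASSO: absolute-value (L1) penalty on the marker effects
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Bayesian LASSO: Laplace prior as a scale mixture of normals
\beta_j \mid \tau_j^2 \sim \mathcal{N}(0, \tau_j^2), \qquad
\tau_j^2 \sim \text{Exp}\!\left(\tfrac{\lambda^2}{2}\right)
\;\;\Longrightarrow\;\;
p(\beta_j) = \frac{\lambda}{2}\, e^{-\lambda \lvert \beta_j \rvert}
```

In this form the single $\lambda$ governs the exponential prior of every $\tau_j^2$, so it controls both how many effects are pulled toward zero and how strongly each one is shrunk. That coupling is what the excerpt says the EBL separates; the EBL's own parameterization is not given in the excerpt and is not reproduced here.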
