A Two-Stage Approximation for Analysis of Mixture Genetic Models in Large Pedigrees
Authors: D. Habier, L. R. Totir, R. L. Fernando
Institution: Institute of Animal Breeding and Husbandry, Christian-Albrechts University of Kiel, 24098 Kiel, Germany; Pioneer Hi-Bred International, Johnston, Iowa 50131; and Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, Iowa 50011
Abstract: Information from cosegregation of marker and QTL alleles, in addition to linkage disequilibrium (LD), can improve genomic selection. Variance components linear models have been proposed for this purpose, but accommodating dominance and epistasis is not straightforward with them. A full-Bayesian analysis of a mixture genetic model is favorable in this respect, but is computationally infeasible for whole-genome analyses. Thus, we propose an approximate two-step approach that neglects information from trait phenotypes in inferring ordered genotypes and segregation indicators of markers. Quantitative trait loci (QTL) fine-mapping scenarios, using high-density markers and pedigrees of five generations without genotyped females, were simulated to test this strategy against an exact full-Bayesian approach. The latter performed better in estimating QTL genotypes, but precision of QTL location and accuracy of genomic breeding values (GEBVs) did not differ for the two methods at realistically low LD. If, however, LD was higher, the exact approach resulted in a slightly higher accuracy of GEBVs. In conclusion, the two-step approach makes mixture genetic models computationally feasible for high-density markers and large pedigrees. Furthermore, markers need to be sampled only once and results can be used for the analysis of all traits. Further research is needed to evaluate the two-step approach for complex pedigrees and to analyze alternative strategies for modeling LD between QTL and markers.

Due to advances in molecular genetics, high-density single-nucleotide polymorphisms (SNPs) are becoming available in animal and plant breeding. These can be used for whole-genome analyses such as prediction of genomic breeding values (GEBVs) and fine mapping of quantitative trait loci (QTL). Genomic selection (GS) (Meuwissen et al. 2001) is promising to improve response to selection by exploiting linkage disequilibrium (LD) between SNPs and QTL (Hayes et al. 2009; VanRaden et al. 2009), but the accuracy of GEBVs depends on additive-genetic relationships between the individuals used to estimate SNP effects and selection candidates (Habier et al. 2007, 2010). Use of cosegregation information, in addition to LD, may reduce this dependency and improve GS. Calus et al. (2008) used a variance components linear model for this purpose in which random QTL effects are modeled conditional on marker haplotypes. The covariance between founder haplotypes allows one to include LD (Meuwissen and Goddard 2000), and the covariance between nonfounder haplotypes computed as in Fernando and Grossman (1989) allows one to include cosegregation. The resulting covariance matrices, however, can be nonpositive definite, which necessitates bending with the effect that information can be lost (Legarra and Fernando 2009). Furthermore, accommodating dominance and epistasis is not straightforward with linear models, especially for crossbred data. In contrast, with mixture genetic models, genetic covariance matrices do not enter into the analysis, and accommodating dominance and epistasis is more straightforward (Goddard 1998; Pong-Wong et al. 1998; Stricker and Fernando 1998; Du et al. 1999; Du and Hoeschele 2000; Hoeschele 2001; Yi and Xu 2002; Pérez-Enciso 2003; Yi et al. 2003, 2005).

Mixture model analyses, however, are more computationally demanding because the unknowns to be estimated in these analyses include the effects of unobservable QTL genotypes. In linear model analyses, in contrast, it is effects of observable marker genotypes that are estimated. The mixture model analysis can be thought of as a weighted sum of linear model analyses corresponding to each possible state for the unobservable QTL genotypes, where the weights are the probabilities of the QTL genotype states conditional on the observed marker genotypes and trait phenotypes.
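The weighted-sum view can be illustrated with a toy enumeration. The following is a minimal sketch and not the authors' implementation: it assumes unrelated individuals, a single biallelic QTL with a known additive effect, and normally distributed residuals, so each per-state "analysis" reduces to evaluating a likelihood. The function names and the parameters `mu`, `a`, and `sd` are hypothetical and do not come from the article.

```python
import itertools
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_state_weights(y, prior, mu, a, sd):
    """Enumerate every joint QTL genotype state and compute its posterior weight.

    y     : list of trait phenotypes, one per individual
    prior : prior[i][k] is the probability that individual i carries k copies
            (k = 0, 1, 2) of the QTL allele given its marker genotypes
            (assumed independent across individuals in this toy example)
    mu, a : assumed known overall mean and additive QTL effect
    sd    : assumed known residual standard deviation
    """
    n = len(y)
    states = list(itertools.product(range(3), repeat=n))  # 3**n joint states
    weights = []
    for g in states:
        w = 1.0
        for i in range(n):
            # prior probability of the genotype times likelihood of the phenotype
            w *= prior[i][g[i]] * normal_pdf(y[i], mu + a * g[i], sd)
        weights.append(w)
    total = sum(weights)
    return states, [w / total for w in weights]


def expected_genotypes(states, weights):
    """Posterior mean QTL allele count per individual as a weighted sum over states."""
    n = len(states[0])
    return [sum(w * g[i] for g, w in zip(states, weights)) for i in range(n)]
```

The list of states has length 3**n, so even this single-locus toy becomes impractical beyond a few dozen individuals, which is the motivation for the strategies discussed next.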
In practice, the analysis also needs to consider all possible haplotypes at the markers, because even when all marker genotypes are observed, some of the marker haplotypes may not be known. As a result, the computational burden of these analyses arises because the number of unknown genotype and haplotype states to be summed over grows exponentially with the number of individuals in the pedigree and the number of loci.

It can be shown that, conditional on the genotypes of their parents, genotypes of offspring are independent of the genotypes of all their ancestors. This conditional independence can be exploited to efficiently compute the weighted summation in the mixture model analysis, provided the pedigree is not too complex (Lauritzen and Sheehan 2003). In genetics, this strategy is called peeling (Elston and Stewart 1971; Cannings et al. 1978) and is equivalent to variable elimination in graphical models (Lauritzen and Sheehan 2003). This approach, however, becomes infeasible when the pedigree is complex and the number of loci is large. Another strategy for analysis of mixture models is based on using Markov chain Monte Carlo (MCMC) theory to draw samples of QTL genotypes and marker haplotypes conditional on the observed marker genotypes and trait phenotypes. Pérez-Enciso (2003) developed an MCMC-based Bayesian analysis for a mixture genetic model that uses information from both LD and cosegregation to fine map a single QTL, but this approach becomes computationally infeasible for whole-genome analyses without approximations.

In this article, we investigate a two-stage, approximate analysis that uses information from both LD and cosegregation. In the first stage, ordered genotypes of markers are sampled conditional only on the observed, unordered marker genotypes, ignoring information from the trait phenotypes. These samples are drawn using a Gibbs sampler with overlapping blocks (Thomas et al. 2000; Abraham et al. 2007) in which peeling is performed within a block while conditioning on variables outside the block. From these samples, founder haplotype probabilities and segregation probabilities for the QTL, also called probabilities of descent of QTL alleles (PDQs), are calculated. In the second stage, these probabilities are used to sample QTL genotypes conditional on the trait phenotypes. In this analysis, information from LD is incorporated by allowing the QTL allele frequencies in founders to be dependent on the marker haplotypes, and information from cosegregation is incorporated by using the PDQs from the first stage to sample QTL alleles in nonfounders. The approximation comes from ignoring trait phenotypes in sampling ordered marker genotypes. A major advantage of the two-step approach is that markers have to be sampled only once and can then be used to analyze all quantitative traits with a mixture model.

The objective of this study is to test the hypothesis that the effect of this approximation is negligible given high-density SNPs. To test this hypothesis, results from the two-stage, approximate analysis are compared to a full-Bayesian analysis that does not ignore the information from the trait phenotypes in sampling the ordered marker genotypes. The full-Bayesian approach was selected because it is considered to be the ideal statistical model, as it accounts for all uncertainties (Hoeschele 2001). Because the full-Bayesian approach is computationally too demanding for application to GS, the approximate and full-Bayesian analyses are used to fine map within a simulated chromosomal region that is known to contain a QTL, which makes the comparison computationally feasible. If the consequences of ignoring trait phenotypes to sample ordered marker genotypes are negligible, further research to apply mixture genetic models to GS and comparisons with linear models are justifiable.
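How the first-stage outputs might enter the second stage can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' sampler: it draws QTL alleles from their prior only (it does not condition on trait phenotypes, which the second stage described above does), it assumes a biallelic QTL, and the function name and data layout (`pedigree`, `founder_freq`, `pdq`) are hypothetical.

```python
import random


def drop_qtl_alleles(pedigree, founder_freq, pdq, rng):
    """Draw one set of ordered QTL genotypes through a pedigree.

    pedigree     : list of (id, sire, dam) tuples, parents listed before
                   offspring; sire and dam are None for founders
    founder_freq : {id: (p_pat, p_mat)}, probability that the founder's
                   paternal / maternal marker haplotype carries QTL allele 1
                   (this is where LD information would enter)
    pdq          : {id: (p_sire, p_dam)}, probability that the allele received
                   from the sire (dam) is that parent's paternal allele
                   (this is where cosegregation information would enter)
    rng          : random.Random instance
    Returns {id: (allele_from_sire, allele_from_dam)} with alleles coded 0/1.
    """
    alleles = {}
    for ind, sire, dam in pedigree:
        if sire is None and dam is None:
            # Founder: allele frequency depends on the marker haplotype.
            p_pat, p_mat = founder_freq[ind]
            alleles[ind] = (int(rng.random() < p_pat), int(rng.random() < p_mat))
        else:
            # Nonfounder: pick each parent's paternal or maternal allele
            # according to the corresponding probability of descent.
            p_sire, p_dam = pdq[ind]
            from_sire = alleles[sire][0] if rng.random() < p_sire else alleles[sire][1]
            from_dam = alleles[dam][0] if rng.random() < p_dam else alleles[dam][1]
            alleles[ind] = (from_sire, from_dam)
    return alleles
```

Because `founder_freq` and `pdq` depend only on the markers, they could in principle be computed once and reused across traits, which mirrors the advantage of the two-step approach stated above.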