Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

Authors:	Gustavo de los Campos, Ana I. Vazquez, Rohan Fernando, Yann C. Klimentidis, Daniel Sorensen

Affiliation:	1. Biostatistics Department, University of Alabama at Birmingham, Birmingham, Alabama, United States of America; 2. Animal Science Department, Iowa State University, Ames, Iowa, United States of America; 3. Division of Epidemiology and Biostatistics, The University of Arizona, Tucson, Arizona, United States of America; 4. Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark; University of Melbourne, Australia

Abstract:	Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression-type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. However, under imperfect LD between markers and QTL, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one, which induces only a small decrease in R². However, with distantly related individuals, b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that, for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
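The two quantities the abstract turns on, the G-BLUP predictor and the regression coefficient b, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the simulated genotypes (independent loci, so no LD), the sample sizes, and the choice to estimate b from the off-diagonal elements of the two relationship matrices are all assumptions made for illustration. The sketch also checks numerically that the kernel form of G-BLUP gives the same predictions as ridge regression on the standardized markers, which is the sense in which G-BLUP is a ridge-regression-type method.

```python
import numpy as np


def standardize(X):
    """Center and scale each genotype column (assumes no monomorphic loci)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def genomic_relationship(Z):
    """Genomic relationship matrix G = ZZ'/p from standardized genotypes Z."""
    return Z @ Z.T / Z.shape[1]


def gblup_predict(G, y_train, train, test, h2):
    """Kernel form of G-BLUP: predict test phenotypes from training records."""
    lam = (1.0 - h2) / h2  # residual-to-genetic variance ratio (h2 assumed known)
    mu = y_train.mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + G[np.ix_(test, train)] @ np.linalg.solve(K, y_train - mu)


def ridge_predict(Z, y_train, train, test, h2):
    """Ridge regression on marker effects with the matching penalty p * lam."""
    lam = (1.0 - h2) / h2
    p = Z.shape[1]
    mu = y_train.mean()
    Ztr = Z[train]
    beta = np.linalg.solve(Ztr.T @ Ztr + p * lam * np.eye(p), Ztr.T @ (y_train - mu))
    return mu + Z[test] @ beta


def relationship_regression(G_markers, G_causal):
    """Slope b from regressing off-diagonal marker relationships on causal ones."""
    iu = np.triu_indices_from(G_markers, k=1)
    gm, gq = G_markers[iu], G_causal[iu]
    return np.cov(gm, gq)[0, 1] / np.var(gq, ddof=1)


# Hypothetical simulation: 300 individuals, 500 independent loci, 50 of them causal.
rng = np.random.default_rng(0)
n, p, n_causal, h2 = 300, 500, 50, 0.5
freq = rng.uniform(0.2, 0.8, size=p)
X = rng.binomial(2, freq, size=(n, p)).astype(float)
causal = rng.choice(p, size=n_causal, replace=False)
markers = np.setdiff1d(np.arange(p), causal)  # causal loci are not genotyped

Z_markers = standardize(X[:, markers])
Z_causal = standardize(X[:, causal])
g = Z_causal @ rng.normal(size=n_causal)
g = g / g.std() * np.sqrt(h2)                 # genetic values with variance h2
y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

train, test = np.arange(250), np.arange(250, n)
G_markers = genomic_relationship(Z_markers)
G_causal = genomic_relationship(Z_causal)

pred_kernel = gblup_predict(G_markers, y[train], train, test, h2)
pred_ridge = ridge_predict(Z_markers, y[train], train, test, h2)
b = relationship_regression(G_markers, G_causal)

print("kernel and ridge predictions agree:", np.allclose(pred_kernel, pred_ridge))
print("estimated b:", b, " (1 - b)^2:", (1.0 - b) ** 2)
```

Because the simulated loci are independent and the individuals unrelated, marker-derived relationships carry little information about relationships at the causal loci in this toy setting. Real genotype data with LD and family structure would behave differently, as the abstract describes.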

