Genotype Imputation with Millions of Reference Samples期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

按检索

Genotype Imputation with Millions of Reference Samples

Authors:	Brian?L Browning Sharon?R Browning

Institution:	1.Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA;2.Department of Biostatistics, University of Washington, Seattle, WA 98195, USA

Abstract:	We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle’s throughput was more than 100× greater than Impute2’s throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs.

Keywords:

设为首页 | 免责声明 | 关于勤云 | 加入收藏