Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure

Authors:	Bea Angelica Andersson, Wei Zhao, Benjamin C. Haller, Åke Brännström, Xiao-Ru Wang

Affiliation:	1. Department of Ecology and Environmental Sciences, Umeå University, Umeå, Sweden; 2. Department of Computational Biology, Cornell University, Ithaca, New York, USA; 3. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden; Advancing Systems Analysis Program, International Institute for Applied Systems Analysis, Laxenburg, Austria; Complexity Science and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Kunigami, Japan

Abstract:	The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods—downsampling, imputation and subsampling—with sample sizes of 4–100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000, the sum of 0- and 4-fold SNPs); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies should consider downsampling for small data sets, and use samples larger than 4 (ideally larger than 8) individuals, with more than 5000 SNPs in order to improve the robustness of DFE inference and enable comparative analyses.

Keywords:	DFE; missing-data treatment; population structure; sample size; SLiM simulation
