Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure
Authors: Bea Angelica Andersson, Wei Zhao, Benjamin C. Haller, Åke Brännström, Xiao-Ru Wang
Affiliations: 1. Department of Ecology and Environmental Sciences, Umeå University, Umeå, Sweden; 2. Department of Computational Biology, Cornell University, Ithaca, New York, USA; 3. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden; 4. Advancing Systems Analysis Program, International Institute for Applied Systems Analysis, Laxenburg, Austria; 5. Complexity Science and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Kunigami, Japan

Abstract: The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutation arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing-data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three missing-data treatments (downsampling, imputation and subsampling) across sample sizes of 4–100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (fewer than 8 individuals) and becomes unpredictable with too few SNPs (fewer than 5000, counting 0-fold and 4-fold SNPs together); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies consider downsampling for small data sets and use samples of more than 4 (ideally more than 8) individuals with more than 5000 SNPs, in order to improve the robustness of DFE inference and enable comparative analyses.
Keywords: DFE; missing-data treatment; population structure; sample size; SLiM simulation
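The abstract does not define how downsampling handles missing data. A minimal sketch, assuming it means hypergeometric projection of every site to a fixed number of sampled chromosomes (the approach used by SFS projection in tools such as dadi and easySFS), is shown below; it is not necessarily the authors' exact procedure. The function name `downsample_sfs` and the `(derived_count, n_called)` input format are hypothetical.

```python
from math import comb


def downsample_sfs(sites, n_target):
    """Expected unfolded SFS at n_target chromosomes from sites with missing data.

    Hypothetical helper. `sites` is an iterable of (derived_count, n_called)
    pairs, one per site. Sites with fewer than n_target called chromosomes are
    dropped; every remaining site adds its hypergeometric probabilities of
    showing j derived copies in a draw of n_target chromosomes without
    replacement to SFS bin j.
    """
    sfs = [0.0] * (n_target + 1)
    for derived, n_called in sites:
        if n_called < n_target:
            continue  # too much missing data at this site to project it
        total = comb(n_called, n_target)
        ancestral = n_called - derived
        # j is bounded by the derived copies available and by the ancestral
        # copies needed to fill the remaining n_target - j draws
        for j in range(max(0, n_target - ancestral), min(derived, n_target) + 1):
            sfs[j] += comb(derived, j) * comb(ancestral, n_target - j) / total
    return sfs


# Four sites projected to 2 chromosomes; the last site (1 called chromosome) is dropped.
print(downsample_sfs([(1, 4), (2, 4), (0, 3), (1, 1)], 2))
```

Each retained site contributes a total weight of 1 spread across the bins, so downsampling trades sites with heavy missingness for a uniform sample size. In a DFE analysis, one such SFS would typically be built for 0-fold sites and another for 4-fold sites.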