A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization
Authors: Andrei-Alin Popescu, Andrea L Harper, Martin Trick, Ian Bancroft, Katharina T Huber
Institution: School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk NR4 7TJ, United Kingdom; Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, United Kingdom; Department of Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, United Kingdom
Abstract: Population structure is a confounding factor in genome-wide association studies, increasing the rate of false-positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These carry a considerable computational burden, which limits their applicability to large datasets such as those produced by next-generation sequencing techniques. To address this, nonmodel-based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and the number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and its accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko.
Keywords: admixture inference; kernel-PCA; population structure; genome-wide association studies; Q-matrix
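
The abstract names linear kernel-PCA as one of PSIKO's two building blocks. As a rough illustration of that step only, the sketch below computes principal-component scores from a sample-by-sample linear kernel, which is why the cost of the eigendecomposition depends on the number of individuals rather than the number of markers. This is not the PSIKO implementation: the function name `linear_kernel_pca`, the 0/1/2 genotype encoding, and the centering choice are assumptions made for the example, and the least-squares admixture step is not shown.

```python
import numpy as np


def linear_kernel_pca(genotypes, n_components=2):
    """Illustrative linear kernel-PCA (hypothetical helper, not PSIKO's code).

    genotypes: array of shape (n_samples, n_snps), assumed to hold
               allele counts coded 0, 1, or 2.
    Returns (scores, eigenvalues) for the top n_components.
    """
    X = np.asarray(genotypes, dtype=float)
    n = X.shape[0]

    # Linear kernel: inner products between samples, an n x n matrix.
    K = X @ X.T

    # Double-centering the kernel is equivalent to mean-centering each SNP.
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J

    # eigh handles symmetric matrices and returns ascending eigenvalues,
    # so pick the largest ones explicitly.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)  # guard against tiny negatives
    vecs = vecs[:, order]

    # Sample scores: unit eigenvectors scaled by sqrt(eigenvalue).
    scores = vecs * np.sqrt(vals)
    return scores, vals
```

With a linear kernel, these scores match ordinary PCA scores of the mean-centered genotype matrix up to the sign of each component. A call such as `linear_kernel_pca(G, n_components=2)` on a genotype array `G` returns one row of coordinates per individual.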