

Data mining and machine learning approaches for the integration of genome-wide association and methylation data: methodology and main conclusions from GAW20
Authors: Burcu Darst, Corinne D. Engelman, Ye Tian, Justo Lorenzo Bermejo
Institution: 1. Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Rochester, USA; 2. Division of Statistical Genomics, Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, USA; 3. Department of Biostatistics, Boston University School of Public Health, Boston, USA; 4. Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, USA; 5. Lunenfeld-Tanenbaum Research Institute, Sinai Health System, University of Toronto, Toronto, Canada; 6. Department of Biology, Dordt College, Sioux Center, USA; 7. Department of Mathematics and Statistics, Dordt College, Sioux Center, USA; 8. Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Hong Kong, China; 9. CUHK Shenzhen Research Institute, Shenzhen, China; 10. Department of Statistics, Columbia University, New York, USA
Abstract:

Background

GAW20 working group 5 brought together researchers who contributed 7 papers evaluating methods to detect genetic-by-epigenetic interactions. GAW20 distributed real data from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, including single-nucleotide polymorphism (SNP) markers, methylation (cytosine-phosphate-guanine [CpG]) markers, and phenotype information on up to 995 individuals. In addition, a simulated data set based on the real data was provided.
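For orientation only, one common way to formalize a genetic-by-epigenetic interaction is a regression with a product term between the genotype G_i of individual i at a SNP (additively coded 0/1/2) and the methylation level M_i at a CpG site. This is an illustrative assumption, not the specific model used by any of the contributed papers, which also relied on mixed models, mediation analysis, and machine learning:

```latex
y_i = \beta_0 + \beta_G\, G_i + \beta_M\, M_i + \beta_{GM}\, G_i M_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad H_0:\ \beta_{GM} = 0 .
```

Under this formulation, a SNP–CpG interaction pair is one for which the null hypothesis of no interaction, \beta_{GM} = 0, is rejected.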

Results

The 7 contributed papers analyzed these data sets with a range of statistical methods, including generalized linear mixed models, mediation analysis, machine learning, the W-test, and sparsity-inducing regularized regression. These methods generally appeared to perform well. Several papers confirmed a number of causative SNPs, either across the many simulation replicates or in the real data on chromosome 11. Findings were also reported for other SNPs, CpG sites, and SNP–CpG interaction pairs.

Conclusions

In the simulation (200 replications), power appeared generally good for large interaction effects, but detecting smaller effects will require larger studies or consortium collaborations to achieve sufficient power.
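To illustrate why power depends so strongly on interaction effect size, the following minimal Python sketch estimates by simulation the power to detect a single SNP × CpG product term in an ordinary least-squares model. It is not the GAW20 simulation model. The minor allele frequency, main-effect sizes, nominal alpha of 0.05, normal approximation to the t test, and the function names `interaction_pvalue` and `simulated_power` are all hypothetical choices. Only the sample size of 995 and the 200 replications echo the numbers above.

```python
import math

import numpy as np


def interaction_pvalue(y, g, m):
    """Two-sided p-value for the G x M term in y ~ 1 + G + M + G*M (OLS).

    Assumption: uses a normal approximation to the t distribution,
    which is adequate for sample sizes in the hundreds.
    """
    X = np.column_stack([np.ones_like(g), g, m, g * m])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)               # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # OLS covariance of beta
    z = beta[3] / math.sqrt(cov[3, 3])             # Wald statistic for beta_GM
    return math.erfc(abs(z) / math.sqrt(2.0))      # P(|Z| > |z|)


def simulated_power(effect, n=995, reps=200, alpha=0.05, maf=0.3, seed=0):
    """Fraction of replicates in which the interaction term reaches p < alpha.

    All data-generating parameters here are hypothetical illustrations.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        g = rng.binomial(2, maf, size=n).astype(float)   # additive SNP coding 0/1/2
        m = rng.normal(size=n)                            # standardized CpG methylation
        y = 0.1 * g + 0.1 * m + effect * g * m + rng.normal(size=n)
        if interaction_pvalue(y, g, m) < alpha:
            hits += 1
    return hits / reps
```

For example, comparing `simulated_power(0.05)` with `simulated_power(0.5)` shows power rising from modest to near-complete as the interaction effect grows, with n held fixed. Under genome-wide significance thresholds, which are far stricter than 0.05, small effects lose even more power, consistent with the call for larger studies.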
This article is indexed in SpringerLink and other databases.