Deep learning prediction of attention-deficit hyperactivity disorder in African Americans by copy number variation |
| |
Authors: | Yichuan Liu, Hui-Qi Qu, Xiao Chang, Kenny Nguyen, Jingchun Qu, Lifeng Tian, Joseph Glessner, Patrick MA Sleiman, Hakon Hakonarson |
| |
Affiliation: | 1. Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; 2. Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; 3. Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; 4. Division of Pulmonary Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA |
| |
Abstract: | Current understanding of the molecular networks and mechanisms underlying attention-deficit hyperactivity disorder (ADHD) remains incomplete. Previous studies suggest that genomic structural variations play an important role in the pathogenesis of ADHD. For effective modeling, deep learning approaches have become a method of choice, with the ability to predict the impact of genetic variations involving complicated mechanisms. In this study, we examined copy number variation in whole genome sequencing data from 116 African American children with ADHD and 408 African American controls. We divided the human genome into 150 regions, and the variation intensity in each region was used as a feature vector for deep learning modeling to classify ADHD patients. The accuracy of deep learning for predicting ADHD diagnosis was consistently around 78% in a two-fold shuffle test, compared with ∼50% for traditional k-means clustering methods. Additional whole genome sequencing data from 351 European American children, including 89 ADHD cases and 262 controls, were used for independent validation with the feature vectors obtained from the African American analysis. The accuracy of ADHD labeling was lower in this setting (∼70–75%) but still above the results from traditional methods. The regions with the highest weights overlapped with previously reported ADHD-associated copy number variation regions, including genes such as GRM1 and GRM8, key drivers of metabotropic glutamate receptor signaling. A notable and unexpected finding is that structural variations in non-coding (intronic/intergenic) genomic regions can carry prediction weights as high as those of variations in coding regions. |
| |
Keywords: | Deep learning; African Americans; attention-deficit hyperactivity disorder; copy number variations; whole genome sequencing |
|
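The feature construction summarized in the abstract (dividing the genome into 150 regions and using the variation intensity in each region as a feature vector) can be illustrated with a minimal sketch. The abstract does not say how regions were delimited or how intensity was computed, so everything here is an assumption made for illustration: the function name `region_intensity`, equal-width regions over a linearised genome coordinate, and an intensity defined as the covered fraction of a region weighted by the deviation from the diploid copy number of 2.

```python
from typing import List, Tuple

# Hypothetical CNV call on a linearised genome: (start, end, copy_number),
# with end exclusive. This representation is an assumption, not the paper's.
Cnv = Tuple[int, int, int]


def region_intensity(cnvs: List[Cnv], genome_length: int,
                     n_regions: int = 150) -> List[float]:
    """Return one feature per region for a single sample.

    Assumed definition of intensity: for each CNV, the fraction of the
    region it covers multiplied by |copy_number - 2|, summed over CNVs.
    """
    # Equal-width region boundaries (an assumption; the abstract only
    # states that the genome was divided into 150 regions).
    edges = [genome_length * i // n_regions for i in range(n_regions + 1)]
    features = [0.0] * n_regions
    for start, end, copy_number in cnvs:
        deviation = abs(copy_number - 2)
        for i in range(n_regions):
            lo, hi = edges[i], edges[i + 1]
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                features[i] += deviation * overlap / (hi - lo)
    return features


# Toy usage: a 1,500-base "genome" gives 150 regions of 10 bases each.
sample = [(0, 10, 1), (15, 25, 4)]  # one deletion, one duplication
vector = region_intensity(sample, genome_length=1500)
```

Each sample then yields a 150-element vector that could be passed to any classifier; the network architecture and training procedure used in the study are not described in the abstract and are not reproduced here.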
|