SNP characteristics predict replication success in association studies |
| |
Authors: | Ivan P. Gorlov, Jason H. Moore, Bo Peng, Jennifer L. Jin, Olga Y. Gorlova, Christopher I. Amos |
| |
Affiliation: | 1. Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, 74 College Street Vail 7th Floor, HB 7260 Vail, Hanover, NH, 03755, USA; 2. The Geisel School of Medicine, Dartmouth College, HB 7937, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, 03756, USA; 3. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, Unit 1410, Houston, TX, 77030, USA; 4. Department of Mathematics, Dartmouth College, HB 1819, Hanover, NH, 03755, USA
|
| |
Abstract: | Successful independent replication is the most direct approach for distinguishing real genotype–disease associations from false discoveries in genome-wide association studies (GWAS). Selection of SNPs for replication has been based primarily on P values from the discovery stage, although additional characteristics of SNPs may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWASs to identify predictors of SNP reproducibility. SNP reproducibility was defined as the proportion of successful replications among all replication attempts. The study reporting an association for the first time was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that -Log(P), where P is the P value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include the type of SNP (e.g., missense vs. intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of -Log(P). We used data from two lung cancer GWASs as well as recently reported disease-associated SNPs to validate RS. -Log(P) outperforms RS when only the very top SNPs are selected, while RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
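The two quantities named in the abstract, SNP reproducibility (the proportion of successful replications among all replication attempts) and -Log(P) from the discovery study, can be illustrated with a minimal sketch. The abstract does not state the logarithm base, so base 10 is assumed here. The function names, record fields, SNP identifiers, and all numbers are hypothetical and chosen only for illustration. The reproducibility score (RS) is not reproduced, because the abstract does not give its rules.

```python
import math


def reproducibility(successes, attempts):
    """Proportion of successful replications among all replication attempts."""
    if attempts <= 0:
        raise ValueError("at least one replication attempt is required")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must lie between 0 and attempts")
    return successes / attempts


def minus_log10_p(p):
    """-log10(P) for a discovery-stage P value (base 10 is an assumption)."""
    if not 0 < p <= 1:
        raise ValueError("P value must be in the interval (0, 1]")
    return -math.log10(p)


# Hypothetical records: made-up identifiers and counts, not data from the paper.
snps = [
    {"id": "snpB", "discovery_p": 1e-5, "successes": 1, "attempts": 4},
    {"id": "snpA", "discovery_p": 1e-8, "successes": 3, "attempts": 4},
]

# Rank candidates for replication by -log10(P), strongest discovery signal first.
ranked = sorted(snps, key=lambda s: minus_log10_p(s["discovery_p"]), reverse=True)

for s in ranked:
    print(
        s["id"],
        round(minus_log10_p(s["discovery_p"]), 2),
        reproducibility(s["successes"], s["attempts"]),
    )
```

Ranking by -log10(P) alone corresponds to the conventional selection rule the abstract describes. Any additional predictors (SNP type, minor allele frequency, gene features) would enter as extra fields on each record.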