Background
GAW20 working group 5 brought together researchers who contributed 7 papers with the aim of evaluating methods to detect genetic-by-epigenetic interactions. GAW20 distributed real data from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, including single-nucleotide polymorphism (SNP) markers, methylation (cytosine-phosphate-guanine [CpG]) markers, and phenotype information on up to 995 individuals. In addition, a simulated data set based on the real data was provided.
Results
The 7 contributed papers analyzed these data sets with a number of different statistical methods, including generalized linear mixed models, mediation analysis, machine learning, W-test, and sparsity-inducing regularized regression. These methods generally appeared to perform well. Several papers confirmed a number of causative SNPs in either the large number of simulation sets or the real data on chromosome 11. Findings were also reported for different SNPs, CpG sites, and SNP–CpG site interaction pairs.
Conclusions
In the simulation (200 replications), power appeared generally good for large interaction effects, but smaller effects will require larger studies or consortium collaboration to achieve sufficient power.

Background
This paper summarizes the contributions from the Genome-wide Association Study group (GWAS group) of GAW20. The contributions focused on topics such as association tests, phenotype imputation, and the application of empirical kinships, and their goals varied. The contributions applied different methods to either the real data set or the simulated data set based on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study. Different outcomes and covariates were considered, and quality control procedures varied across the contributions.
Results
Heritability and family structure played a major role in some contributions. Including family information and data-based adaptive weights was found to improve power in genome-wide association studies. Gene-level approaches were found to be more powerful than single-marker analysis. Other contributions compared pedigree-based and empirical kinship matrices and reported similar results in heritability estimation, association mapping, and genomic prediction. A new approach for linkage mapping of triglyceride levels identified a novel linkage signal.
Conclusions
This summary paper reports on promising statistical approaches and findings of the members of the GWAS group, applied to real and simulated data, which encompass the current topics of epigenetics and pharmacogenomics.

Background
X chromosome inactivation (XCI) is an important gene regulation mechanism in females that equalizes X-chromosome expression levels between the two sexes. Generally, one of the two X chromosomes in females is randomly chosen to be inactivated. Nonrandom XCI (XCI skewing) is also observed in females and has been reported to play an important role in many X-linked diseases. However, no statistical measure is available for the degree of XCI skewing based on family data in population genetics.
Results
In this article, we propose a statistical approach to measure the degree of XCI skewing based on family trios, represented by a ratio of two genotypic relative risks in females. The point estimate of the ratio is obtained from the maximum likelihood estimates of the two genotypic relative risks. When parental genotypes are missing in some family trios, the expectation-conditional-maximization algorithm is adopted to obtain the corresponding maximum likelihood estimates. Further, the confidence interval of the ratio is derived based on the likelihood ratio test. Simulation results show that the likelihood-based confidence interval has an accurate coverage probability under the situations considered. To illustrate its practical use, we apply the proposed method to rheumatoid arthritis data from the USA and find that a locus, rs2238907, may undergo XCI skewing against the at-risk allele, although this needs to be further confirmed by molecular genetics.
Conclusions
The proposed statistical measure for the skewness of XCI is applicable to complete family trio data or family trio data with some paternal genotypes missing. The likelihood-based confidence interval has an accurate coverage probability under the situations considered. Therefore, our proposed statistical measure is generally recommended in practice for discovering potential loci that undergo XCI skewing.

Background
Determination of genome-wide DNA methylation is significant for both basic research and drug development. As a key epigenetic modification, this biochemical process can modulate gene expression and thereby influence cell differentiation, which can possibly lead to cancer. Because the biochemical mechanism of DNA methylation is intricate, obtaining a precise prediction is a considerable challenge. Existing approaches have yielded good predictions, but they either need to combine many features and prerequisites or deal with only hypermethylation and hypomethylation.
Results
In this paper, we propose a deep learning method for prediction of genome-wide DNA methylation, in which the Methylation Regression is implemented by Convolutional Neural Networks (MRCNN). By minimizing a continuous loss function, our model converges and, according to the evaluation results, is more precise than the state-of-the-art method (DeepCpG). MRCNN also discovers de novo motifs through analysis of features from the training process.
Conclusions
Genome-wide DNA methylation could be evaluated based on the corresponding local DNA sequences of target CpG loci. With the autonomous learning pattern of deep learning, MRCNN enables accurate predictions of genome-wide DNA methylation status without predefined features and discovers some de novo methylation-related motifs that match known motifs by extracting sequence patterns.
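The idea of regressing a methylation level on the local DNA sequence can be sketched as follows. This is a toy illustration only, not the MRCNN architecture: it assumes a single convolutional kernel with hand-set weights, global max-pooling, a sigmoid output, and a squared-error loss. All function names, the kernel, and the weights are hypothetical.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as one-hot rows over A, C, G, T (unknown bases become all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_max(encoded, kernel):
    """Slide one kernel (width x 4) along the sequence, apply ReLU, then max-pool over positions."""
    width = len(kernel)
    best = 0.0  # ReLU floor: negative responses are clipped to zero
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, score)
    return best

def predict(seq, kernels, weights, bias):
    """Combine pooled kernel responses linearly and squash to a level in (0, 1)."""
    encoded = one_hot(seq)
    features = [conv1d_max(encoded, k) for k in kernels]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(pred, target):
    """Continuous (regression) loss for one CpG site."""
    return (pred - target) ** 2

# Hypothetical kernel that responds to a "CG" dinucleotide.
CG_KERNEL = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

with_cg = predict("AACGTT", [CG_KERNEL], weights=[1.0], bias=-1.0)
without_cg = predict("AAAATT", [CG_KERNEL], weights=[1.0], bias=-1.0)
```

In a trained model the kernel weights would be learned by minimizing the loss over many CpG sites, and inspecting the learned kernels is one way sequence motifs could be read off; here the weights are fixed by hand purely to show the data flow.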
Background
Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials.
Methods
We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study, namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate, at the systems level, candidate gene predictions from Gentrepid with drug data from three publicly available drug databases: the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank.
Results
Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets.
Conclusions
By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.

Background
The rise in popularity and accessibility of DNA methylation data to evaluate epigenetic associations with disease has led to numerous methodological questions. As part of GAW20, our working group of 8 research groups focused on gene searching methods.
Results
Although the methods were varied, we identified 3 main themes within our group. First, many groups tackled the question of how best to use pedigree information in downstream analyses, finding that (a) the use of kinship matrices is common practice, (b) ascertainment corrections may be necessary, and (c) pedigree information may be useful for identifying parent-of-origin effects. Second, many groups also considered multimarker versus single-marker tests. Multimarker tests had modestly improved power versus single-marker methods on simulated data, and on real data identified additional associations that were not identified with single-marker methods, including identification of a gene with a strong biological interpretation. Finally, some of the groups explored methods to combine single-nucleotide polymorphism (SNP) and DNA methylation into a single association analysis.
Conclusions
A causal inference method showed promise at discovering new mechanisms of SNP activity; gene-based methods of summarizing SNP and DNA methylation data also showed promise. Even though numerous questions still remain in the analysis of DNA methylation data, our discussions at GAW20 suggest some emerging best practices.

This study aimed to construct a classification and regression tree (CART) model of glycosaminoglycans (GAGs) for the differential diagnosis of mucopolysaccharidoses (MPS). Two-dimensional electrophoresis and liquid chromatography–tandem mass spectrometry (LC–MS/MS) were used for the qualitative and quantitative analysis of GAGs. Specific enzyme assays and targeted gene sequencing were performed to confirm the diagnosis. Machine learning tools were used to develop the CART model based on the GAG profile. The qualitative and quantitative CART models showed 96.3% and 98.3% accuracy, respectively, in the differential diagnosis of MPS. The thresholds of different GAGs diagnostic of specific MPS types were established. In 60 MPS-positive cases, 46 different mutations were identified in six specific genes. Among 31 different mutations identified in IDUA, nine were nonsense mutations and two were gross deletions, while the remainder were missense mutations. In the IDS gene, four missense mutations, two frameshift mutations, and one deletion were identified. In the NAGLU gene, c.1693C>T and c.1914_1914insT were the most common mutations. Two ARSB mutations and one mutation each in SGSH and GALNS were observed. The LC–MS/MS-based GAG pattern showed higher accuracy in the differential diagnosis of MPS. The mutation spectrum of MPS, specifically in the IDUA and IDS genes, is highly heterogeneous among the cases studied.
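How a CART model arrives at a diagnostic threshold for a single marker can be sketched as a search for the cut point that minimizes the weighted Gini impurity of the two resulting groups. This is a minimal sketch of the general CART splitting rule, not the study's model; the function names are hypothetical, and the measurements and group labels below are synthetic values in arbitrary units, not data from the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best

# Synthetic marker levels (arbitrary units) for two hypothetical groups.
levels = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
groups = ["group_A", "group_A", "group_A", "group_B", "group_B", "group_B"]
threshold, impurity = best_threshold(levels, groups)
```

A full tree repeats this search over every marker at each node and recurses on the resulting groups; with several markers and more than two classes, the same impurity criterion applies unchanged.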