Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Xia Xiaoxuan, Weng Haoyi, Men Ruoting, Sun Rui, Zee Benny Chung Ying, Chong Ka Chun, Wang Maggie Haitian. 《BMC genetics》2018,19(1):67-37

Background

Association studies using a single type of omics data have been successful in identifying disease-associated genetic markers, but the underlying mechanisms are unaddressed. To provide a possible explanation of how these genetic factors affect the disease phenotype, integration of multiple omics data is needed.

Results

We propose a novel method, LIPID (likelihood inference proposal for indirect estimation), that uses both single nucleotide polymorphism (SNP) and DNA methylation data jointly to analyze the association between a trait and SNPs. The total effect of SNPs is decomposed into direct and indirect effects, where the indirect effects are the focus of our investigation. Simulation studies show that LIPID performs better in various scenarios than existing methods. Application to the GAW20 data also leads to encouraging results, as the genes identified appear to be biologically relevant to the phenotype studied.

Conclusions

The proposed LIPID method is shown to be meritorious in extensive simulations and in real-data analyses.
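
As a rough illustration of the decomposition described in the Results of this entry, the sketch below splits a total SNP effect into a direct part and an indirect part that passes through methylation. It uses the standard product-of-coefficients decomposition from linear mediation analysis, not the LIPID likelihood procedure itself, which the abstract does not specify. The function names, effect sizes, and simulated data are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def decompose_effect(snp, methylation, trait):
    """Split the total SNP effect on a trait into direct and indirect parts.

    Hypothetical helper: ordinary linear mediation, not the LIPID method.
    """
    total = ols(snp, trait)[1]                 # trait ~ SNP
    a = ols(snp, methylation)[1]               # methylation ~ SNP
    coef = ols(np.column_stack([snp, methylation]), trait)
    direct, b = coef[1], coef[2]               # trait ~ SNP + methylation
    return {"total": total, "direct": direct, "indirect": a * b}


# Simulated data (assumed effect sizes, for illustration only).
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=500).astype(float)   # genotype coded 0/1/2
methylation = 0.5 * snp + rng.normal(size=500)
trait = 0.2 * snp + 0.8 * methylation + rng.normal(size=500)

effects = decompose_effect(snp, methylation, trait)
print(effects)
```

For least-squares fits on the same sample, the total effect equals the direct effect plus the indirect effect, which makes the decomposition easy to sanity-check.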

2.

Background

The rise in popularity and accessibility of DNA methylation data to evaluate epigenetic associations with disease has led to numerous methodological questions. As part of GAW20, our working group of 8 research groups focused on gene searching methods.

Results

Although the methods were varied, we identified 3 main themes within our group. First, many groups tackled the question of how best to use pedigree information in downstream analyses, finding that (a) the use of kinship matrices is common practice, (b) ascertainment corrections may be necessary, and (c) pedigree information may be useful for identifying parent-of-origin effects. Second, many groups also considered multimarker versus single-marker tests. Multimarker tests had modestly improved power versus single-marker methods on simulated data, and on real data identified additional associations that were not identified with single-marker methods, including identification of a gene with a strong biological interpretation. Finally, some of the groups explored methods to combine single-nucleotide polymorphism (SNP) and DNA methylation into a single association analysis.

Conclusions

A causal inference method showed promise at discovering new mechanisms of SNP activity; gene-based methods of summarizing SNP and DNA methylation data also showed promise. Even though numerous questions still remain in the analysis of DNA methylation data, our discussions at GAW20 suggest some emerging best practices.

3.

Background

Fenofibrate (Fb) is a known treatment for elevated triglyceride (TG) levels. The Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study was designed to investigate potential contributors to the effects of Fb on TG levels. Here, we summarize the analyses of 8 papers whose authors had access to the GOLDN data and were grouped together because they pursued investigations into Fb treatment responses as part of GAW20. These papers report explorations of a variety of genetics, epigenetics, and study design questions. Data regarding treatment with 160 mg of micronized Fb per day for 3 weeks included pretreatment and posttreatment TG and methylation levels (ML) at approximately 450,000 epigenetic markers (cytosine-phosphate-guanine [CpG] sites). In addition, approximately 1 million single-nucleotide polymorphisms (SNPs) were genotyped or imputed in each of the study participants, drawn from 188 pedigrees.

Results

The analyses of a variety of subsets of the GOLDN data used a number of analytic approaches such as linear mixed models, a kernel score test, penalized regression, and artificial neural networks.

Conclusions

Results indicate that (a) CpG ML are responsive to Fb; (b) CpG ML should be included in models predicting the TG level responses to Fb; (c) common and rare variants are associated with TG responses to Fb; (d) the interactions of common variants and CpG ML should be included in models predicting the TG response; and (e) sample size is a critical factor in the successful construction of predictive models representing the response to Fb.

4.

Background

An important feature in many genomic studies is quality control and normalization. This is particularly important when analyzing epigenetic data, where the process of obtaining measurements can be bias prone. The GAW20 data was from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), a study with multigeneration families, where DNA cytosine-phosphate-guanine (CpG) methylation was measured pre- and posttreatment with fenofibrate. We performed quality control assessment of the GAW20 DNA methylation data, including normalization, assessment of batch effects and detection of sample swaps.

Results

We show that even after normalization, the GOLDN methylation data has systematic differences pre- and posttreatment. Through investigation of (a) CpG sites containing a single nucleotide polymorphism, (b) the stability of breeding values for methylation across time points, and (c) autosomal gender-associated CpGs, 13 sample swaps were detected, 11 of which were posttreatment.

Conclusions

This paper demonstrates several ways to perform quality control of methylation data in the absence of raw data files and highlights the importance of normalization and quality control of the GAW20 methylation data from the GOLDN study.

5.
6.

Background

New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.

Methods

The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge.

Results

Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data.

Conclusions

The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

7.

Background

An accumulation of evidence has revealed the important role of epigenetic factors in explaining the etiopathogenesis of human diseases. Several empirical studies have successfully incorporated methylation data into models for disease prediction. However, it is still a challenge to integrate different types of omics data into prediction models, and the contribution of methylation information to prediction remains to be fully clarified.

Results

A stratified drug-response prediction model was built based on an artificial neural network to predict the change in the circulating triglyceride level after fenofibrate intervention. Associated single-nucleotide polymorphisms (SNPs), methylation of selected cytosine-phosphate-guanine (CpG) sites, age, sex, and smoking status were included as predictors. The model with selected SNPs achieved a mean 5-fold cross-validation prediction error rate of 43.65%. After adding methylation information into the model, the error rate dropped to 41.92%. The combination of significant SNPs, CpG sites, age, sex, and smoking status achieved the lowest prediction error rate of 41.54%.

Conclusions

Compared to using SNP data only, adding methylation data in prediction models slightly improved the error rate; further prediction error reduction is achieved by a combination of genome, methylation genome, and environmental factors.
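
The mean 5-fold cross-validation error rate reported in the Results of this entry can be computed along the following lines. This is only a sketch: a nearest-centroid classifier stands in for the artificial neural network, which the abstract does not describe in detail, and the data, function names, and fold assignment are assumptions made for illustration.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (assumption): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def mean_cv_error(X, y, k=5, seed=0):
    """Average misclassification rate over k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean(pred != y[test_idx]))
    return float(np.mean(errors))


# Simulated predictors for two response classes (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.repeat([0, 1], 100)

cv_error = mean_cv_error(X, y)
print(f"mean 5-fold CV error rate: {cv_error:.2%}")
```

Comparing models, as the entry does for SNP-only versus SNP-plus-methylation predictors, amounts to calling `mean_cv_error` with different columns of `X` while keeping the same folds.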

8.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated with open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.

9.

Background

While inferring continental-level ancestry from genomic information is relatively simple, distinguishing between individuals from closely associated sub-populations (e.g., from the same continent) is still a difficult challenge.

Methods

We study the problem of predicting human biogeographical ancestry from genomic data under resource constraints. In particular, we focus on the case where the analysis is constrained to using single nucleotide polymorphisms (SNPs) from just one chromosome. We propose methods to construct such ancestry informative SNP panels using correlation-based and outlier-based methods.

Results

We assessed the performance of the proposed SNP panels derived from just one chromosome, using data from the 1000 Genomes Project, Phase 3. For continental-level ancestry classification, we achieved an overall classification rate of 96.75% using 206 single nucleotide polymorphisms (SNPs). For sub-population level ancestry prediction, we achieved average pairwise binary classification rates as follows: subpopulations in Europe: 76.6% (58 SNPs); Africa: 87.02% (87 SNPs); East Asia: 73.30% (68 SNPs); South Asia: 81.14% (75 SNPs); America: 85.85% (68 SNPs).

Conclusion

Our results demonstrate that one single chromosome (in particular, Chromosome 1), if carefully analyzed, could hold enough information for accurate prediction of human biogeographical ancestry. This has significant implications in terms of the computational resources required for analysis of ancestry, and in the applications of such analyses, such as in studies of genetic diseases, forensics, and soft biometrics.

10.

Background

We analyzed 143 pedigrees (364 nuclear families) in the Collaborative Study on the Genetics of Alcoholism (COGA) data provided to the participants in the Genetic Analysis Workshop 14 (GAW14) with the goal of comparing results obtained from genome linkage analysis using microsatellite markers with results obtained using SNP markers for two measures of alcoholism (maximum number of drinks, MAXDRINK, and an electrophysiological measure from EEG, TTTH1). First, we constructed haplotype blocks by using the entire set of single-nucleotide polymorphisms (SNPs) in chromosomes 1, 4, and 7. These chromosomes have shown linkage signals for MAXDRINK or EEG-TTTH1 in previous reports. Second, we randomly selected one, two, three, four, and five SNPs from each block (referred to as Rep1 to Rep5, respectively) to conduct linkage analysis using a variance component approach. Finally, results of all SNP analyses were compared with those obtained using microsatellite markers.

Results

The LOD scores obtained from SNPs were slightly higher but the curves were not radically different from those obtained from microsatellite analyses. The peaks of linkage regions from SNP sets were slightly shifted to the left when compared to those from microsatellite markers. The reduced sets of SNPs provide signals in the same linkage regions but with a smaller LOD score suggesting a significant impact of the decrease in information content on linkage results. The widths of 1 LOD support interval of linkage regions from SNP sets were smaller when compared to those of microsatellite markers. However, two linkage regions obtained from the microsatellite linkage analysis on chromosome 7 for LOG of TTTH1 were not detected in the SNP based analyses.

Conclusion

The linkage results from SNPs showed narrower linkage regions and slightly higher LOD scores when compared to those of microsatellite markers. The different builds of the genetic maps used for microsatellite and SNP markers and/or errors in genotyping may account for the microsatellite linkage signals on chromosome 7 that were not identified using SNPs. Also, unresolved map issues between SNPs and microsatellite markers may be partly responsible for the shifted linkage peaks when comparing the two types of markers.
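
The second step of the design in this entry, drawing one to five random SNPs from each haplotype block to form Rep1 through Rep5, might be sketched as follows. The block contents and SNP names are hypothetical, and the handling of blocks that contain fewer SNPs than requested (here: take all of them) is an assumption, since the abstract does not say what was done in that case.

```python
import random


def sample_snps_per_block(blocks, n_per_block, seed=0):
    """Pick up to n_per_block SNPs at random, without replacement, from each block."""
    rng = random.Random(seed)
    selected = []
    for block_id in sorted(blocks):
        snps = blocks[block_id]
        # Assumption: a block smaller than n_per_block contributes all its SNPs.
        k = min(n_per_block, len(snps))
        selected.extend(rng.sample(snps, k))
    return selected


# Hypothetical haplotype blocks (names are placeholders, not real markers).
blocks = {
    "block1": ["snpA", "snpB", "snpC"],
    "block2": ["snpD", "snpE"],
    "block3": ["snpF", "snpG", "snpH", "snpI", "snpJ", "snpK"],
}

# Rep1..Rep5: marker sets with 1..5 SNPs drawn per block.
replicates = {f"Rep{n}": sample_snps_per_block(blocks, n, seed=n) for n in range(1, 6)}
for name, snps in replicates.items():
    print(name, len(snps), snps)
```

Each replicate would then be passed to the linkage analysis in place of the full SNP set.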

11.

Background

Systematic evaluation and study of single nucleotide polymorphisms (SNPs) made possible by high throughput genotyping technologies and bioinformatics promises to provide breakthroughs in the understanding of complex diseases. Understanding how the millions of SNPs in the human genome are involved in conferring susceptibility or resistance to disease, or in rendering a drug efficacious or toxic in the individual, is a major goal of the relatively new field of pharmacogenomics. Esophageal squamous cell carcinoma is a high-mortality cancer with complex etiology and progression involving both genetic and environmental factors. We examined the association between esophageal cancer risk and patterns of 61 SNPs in a case-control study for a population from Shanxi Province in North Central China that has among the highest rates of esophageal squamous cell carcinoma in the world.

Methods

High-throughput Masscode mass spectrometry genotyping was done on genomic DNA from 574 individuals (394 cases and 180 age-frequency matched controls). SNPs were chosen from among genes involving DNA repair enzymes, and Phase I and Phase II enzymes. We developed a novel adaptation of the Decision Forest pattern recognition method named Decision Forest for SNPs (DF-SNPs). The method was designed to analyze the SNP data.

Results

The classifier developed with DF-SNPs to separate the cases from the controls gave concordance, sensitivity, and specificity of 94.7%, 99.0%, and 85.1%, respectively, suggesting its usefulness for hypothesizing which SNPs or combinations of SNPs could be involved in susceptibility to esophageal cancer. Importantly, the DF-SNPs algorithm incorporated a randomization test for assessing the relevance (or importance) of individual SNPs, SNP types (homozygous common, heterozygous, and homozygous variant) and patterns of SNP types (SNP patterns) that differentiate cases from controls. For example, we found that the different genotypes of SNP GADD45B E1122 are all associated with cancer risk.

Conclusion

The DF-SNPs method can be used to differentiate esophageal squamous cell carcinoma cases from controls based on individual SNPs, SNP types and SNP patterns. The method could be useful to identify potential biomarkers from the SNP data and complement existing methods for genotype analyses.
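
For readers unfamiliar with the three performance measures quoted in the Results of this entry, the sketch below shows how they are usually derived from a confusion matrix. It assumes that "concordance" means overall agreement between predicted and true labels. The counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and overall concordance.

    tp/fn: cases classified correctly/incorrectly;
    tn/fp: controls classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                    # fraction of cases detected
    specificity = tn / (tn + fp)                    # fraction of controls detected
    concordance = (tp + tn) / (tp + fn + tn + fp)   # overall agreement (assumed meaning)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "concordance": concordance,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

Because cases outnumber controls in the study, concordance is weighted toward sensitivity, which is why it can sit well above the specificity.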

12.

Background

Recent development of high-resolution single nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations. There is increasing recognition of the importance of SNPs for medicine and developmental biology. However, an SNP data set typically has a large number of SNPs (e.g., 400 thousand SNPs in a genome-wide Parkinson disease data set) and only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data.

Results

In this paper, we use a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples using HapMap data and Parkinson disease (PD) data are given to demonstrate the effectiveness of the proposed method and to illustrate that it has the potential to become a useful analysis tool for SNP data sets. Using the Parkinson disease data as an example, we perform a whole-genome analysis. From the 367440 SNPs with a missing rate of less than 1% across all 22 chromosomes, we select 357 SNPs. For the unique genes in which those SNPs are located, a gene-gene similarity value is computed using GOSemSim, and gene pairs with a similarity value greater than a threshold are selected to construct several groups of genes. For the SNPs involved in these groups of genes, the statistical software PLINK is employed to compute the pairwise SNP-SNP interactions, and SNPs with significance of P < 0.01 are chosen to identify SNP networks based on their P values. Here, SNP networks are constructed based on Gene Ontology knowledge, and therefore each SNP network plays a role in a biological process. An analysis shows that such networks are directly or indirectly related to Parkinson disease.

Conclusions

Experimental results show that our approach is suitable for handling genetic variation and provides useful knowledge in a genome-wide SNP study.
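
The first filtering step in the Results of this entry, keeping only SNPs with less than 1% missing genotypes, could look roughly like the sketch below. It assumes genotypes are stored as a samples-by-SNPs array with missing calls encoded as NaN. The function name and toy matrix are hypothetical, and the sketch does not reproduce the shrunken dissimilarity measure, GOSemSim, or PLINK steps.

```python
import numpy as np


def filter_by_missing_rate(genotypes, max_missing=0.01):
    """Keep SNP columns whose fraction of missing (NaN) calls is below max_missing."""
    missing_rate = np.isnan(genotypes).mean(axis=0)   # per-SNP missing fraction
    keep = missing_rate < max_missing
    return genotypes[:, keep], keep


# Toy matrix: 200 samples x 3 SNPs (illustrative only).
genotypes = np.zeros((200, 3))
genotypes[0, 1] = np.nan        # SNP 1: 1/200 = 0.5% missing, kept
genotypes[:10, 2] = np.nan      # SNP 2: 10/200 = 5% missing, dropped

filtered, keep = filter_by_missing_rate(genotypes)
print(keep, filtered.shape)
```

The same boolean-mask pattern applies to the later significance filter, where interactions with P < 0.01 are retained.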

13.

Background

Transgenerational epigenetic inheritance has been posited as a possible contributor to the observed heritability of metabolic syndrome (MetS). Yet the extent to which estimates of epigenetic inheritance for DNA methylation sites are inflated by environmental and genetic covariance within families is still unclear. We applied current methods to quantify the environmental and genetic contributors to the observed heritability and familial correlations of four previously associated MetS methylation sites at three genes (CPT1A, SOCS3 and ABCG1) using real data made available through the GAW20.

Results

Our findings support the role of both shared environment and genetic variation in explaining the heritability of MetS and the four MetS cytosine-phosphate-guanine (CpG) sites, although the resulting heritability estimates were indistinguishable from one another. Familial correlations by type of relative pair generally followed our expectation based on relatedness, but in the case of sister and parent pairs we observed nonsignificant trends toward greater correlation than expected, as would be consistent with the role of shared environmental factors in the inflation of our estimated correlations.

Conclusions

Our work provides an interesting and flexible statistical framework for testing models of epigenetic inheritance in the context of human family studies. Future work should endeavor to replicate our findings and advance these methods to more robustly describe epigenetic inheritance patterns in human populations.

14.

Background

Using the dataset provided for Genetic Analysis Workshop 14 by the Collaborative Study on the Genetics of Alcoholism, we performed genome-wide linkage analysis of age at onset of alcoholism to compare the utility of microsatellites and single-nucleotide polymorphisms (SNPs) in genetic linkage studies.

Methods

A multipoint nonparametric variance component linkage analysis method was applied to the survival distribution function obtained from a semiparametric proportional hazards model of the age at onset phenotype of alcoholism. Three separate linkage analyses were carried out using 315 microsatellites, 2,467 SNPs, and 9,467 SNPs, spanning the 22 autosomal chromosomes.

Results

Heritability of age at onset was estimated to be approximately 12% (p < 0.001). We observed weak correlation, both in trend and strength, of genome-wide linkage signals between microsatellites and SNPs. Results from SNPs revealed more and stronger linkage signals across the genome compared with those from microsatellites. The only suggestive evidence of linkage from microsatellites was on chromosome 1 (LOD of 1.43). Differences in map densities between the two sets of SNPs used in this study did not appear to confer an advantage in terms of strength of linkage signals.

Conclusion

Our study provided support for better performance of dense SNP maps compared with the sparse microsatellite maps currently available for linkage analysis of quantitative traits. This better performance could be attributable to the precise definition and high map resolution achievable with dense SNP maps, thus resulting in increased power to detect possible loci affecting a given trait or disease.

15.
16.

Background

Longitudinal data and repeated measurements in epigenome-wide association studies (EWAS) provide a rich resource for understanding epigenetics. We summarize 7 analytical approaches to the GAW20 data sets that addressed challenges and potential applications of phenotypic and epigenetic data. All contributions used the GAW20 real data set and employed either linear mixed effect (LME) models or marginal models through generalized estimating equations (GEE). These contributions were subdivided into 3 categories: (a) quality control (QC) methods for DNA methylation data; (b) heritability estimates pretreatment and posttreatment with fenofibrate; and (c) impact of drug response pretreatment and posttreatment with fenofibrate on DNA methylation and blood lipids.

Results

Two contributions addressed QC and identified large statistical differences with pretreatment and posttreatment DNA methylation, possibly a result of batch effects. Two contributions compared epigenome-wide heritability estimates pretreatment and posttreatment, with one employing a Bayesian LME and the other using a variance-component LME. Density curves comparing these studies indicated these heritability estimates were similar. Another contribution used a variance-component LME to depict the proportion of heritability resulting from a genetic and shared environment. By including environmental exposures as random effects, the authors found heritability estimates became more stable but not significantly different. Two contributions investigated treatment response. One estimated drug-associated methylation effects on triglyceride levels as the response, and identified 11 significant cytosine-phosphate-guanine (CpG) sites with or without adjusting for high-density lipoprotein. The second contribution performed weighted gene coexpression network analysis and identified 6 significant modules of at least 30 CpG sites, including 3 modules with topological differences pretreatment and posttreatment.

Conclusions

Four conclusions from this GAW20 working group are: (a) QC measures are an important consideration for EWAS studies that are investigating multiple time points or repeated measurements; (b) application of heritability estimates between time points for individual CpG sites is a useful QC measure for DNA methylation studies; (c) drug intervention demonstrated strong epigenome-wide DNA methylation patterns across the 2 time points; and (d) new statistical methods are required to account for the environmental contributions of DNA methylation across time. These contributions demonstrate numerous opportunities exist for the analysis of longitudinal data in future epigenetic studies.

17.

Background

The GAW20 group formed on the theme of methods for association analyses of repeated measures comprised 4 sets of investigators. The provided “real” data set included genotypes obtained from a human whole-genome association study based on longitudinal measurements of triglycerides (TGs) and high-density lipoprotein in addition to methylation levels before and after administration of fenofibrate. The simulated data set contained 200 replications of methylation levels and posttreatment TGs, mimicking the real data set.

Results

The different investigators in the group focused on the statistical challenges unique to family-based association analyses of phenotypes measured longitudinally and applied a wide spectrum of statistical methods such as linear mixed models, generalized estimating equations, and quasi-likelihood–based regression models. This article discusses the varying strategies explored by the group’s investigators with the common goal of improving the power to detect association with repeated measures of a phenotype.

Conclusions

Although it is difficult to identify a common message emanating from the different contributions because of the diversity in the issues addressed, the unifying theme of the contributions lies in the search for novel analytic strategies to circumvent the limitations of existing methodologies to detect genetic association.

18.

Background

In recent years, both single-nucleotide polymorphism (SNP) arrays and functional magnetic resonance imaging (fMRI) have been widely used for the study of schizophrenia (SCZ). In addition, a few studies integrating both SNP data and fMRI data for comprehensive analysis have been reported.

Methods

In this study, a novel sparse representation based variable selection (SRVS) method has been proposed and tested on a simulation data set to demonstrate its multi-resolution properties. Then the SRVS method was applied to an integrative analysis of two different SCZ data sets, a single-nucleotide polymorphism (SNP) data set and a functional magnetic resonance imaging (fMRI) data set, including 92 cases and 116 controls. Biomarkers for the disease were identified and validated with a multivariate classification approach followed by leave-one-out (LOO) cross-validation. Then we compared the results with those of a previously reported sparse representation based feature selection method.

Results

Results showed that biomarkers from our proposed SRVS method gave significantly higher classification accuracy in discriminating SCZ patients from healthy controls than those from the previously reported sparse representation method. Furthermore, using biomarkers from both data sets led to better classification accuracy than using a single type of biomarker, which suggests the advantage of integrative analysis of different types of data.

Conclusions

The proposed SRVS algorithm is effective in identifying significant biomarkers for complicated diseases such as SCZ. Integrating different types of data (e.g. SNP and fMRI data) may identify complementary biomarkers that benefit the diagnostic accuracy for the disease.

19.

Background

Metabolic syndrome is a risk factor for type 2 diabetes and cardiovascular disease. We identified common genetic variants that alter the risk for metabolic syndrome in the Korean population. To isolate these variants, we conducted a multiple-genotype and multiple-phenotype genome-wide association analysis using the family-based quasi-likelihood score (MFQLS) test. For this analysis, we used 7211 and 2838 genotyped study subjects for discovery and replication, respectively. We also performed a multiple-genotype and multiple-phenotype analysis of a gene-based single-nucleotide polymorphism (SNP) set.

Results

We found an association between metabolic syndrome and an intronic SNP pair, rs7107152 and rs1242229, in the SIDT2 gene at 11q23.3. Both SNPs correlate with the expression of SIDT2 and TAGLN, whose products promote insulin secretion and lipid metabolism, respectively. This SNP pair showed statistical significance at the replication stage.

Conclusions

Our findings provide insight into an underlying mechanism that contributes to metabolic syndrome.

20.

Background

This paper summarizes the contributions from the Genome-wide Association Study group (GWAS group) of the GAW20. The GWAS group contributions focused on topics such as association tests, phenotype imputation, and application of empirical kinships. The goals of the GWAS group contributions were varied. A real or a simulated data set based on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study was employed by different methods. Different outcomes and covariates were considered, and quality control procedures varied throughout the contributions.

Results

The consideration of heritability and family structure played a major role in some contributions. The inclusion of family information and adaptive weights based on data were found to improve power in genome-wide association studies. It was proven that gene-level approaches are more powerful than single-marker analysis. Other contributions focused on the comparison between pedigree-based kinship and empirical kinship matrices, and found similar results in heritability estimation, association mapping, and genomic prediction. A new approach for linkage mapping of triglyceride levels was able to identify a novel linkage signal.

Conclusions

This summary paper reports on promising statistical approaches and findings of the members of the GWAS group applied to real and simulated data, which encompass the current topics of epigenetics and pharmacogenomics.
