Found 20 similar articles (search time: 0 ms)
1.
2.
Genome-wide association studies (GWAS) have evolved over the last ten years into a powerful tool for investigating the genetic architecture of human disease. In this work, we review the key concepts underlying GWAS, including the architecture of common diseases, the structure of common human genetic variation, technologies for capturing genetic information, study designs, and the statistical methods used for data analysis. We also look forward to the future beyond GWAS.
What to Learn in This Chapter
- Basic genetic concepts that drive genome-wide association studies
- Genotyping technologies and common study designs
- Statistical concepts for GWAS analysis
- Replication, interpretation, and follow-up of association results
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
3.
The improvement of meat quality and production traits has high priority in the pork industry. Many of these traits show a low to moderate heritability and are difficult and expensive to measure. Their improvement by targeted breeding programs is challenging and requires knowledge of the genetic and molecular background. For this study we genotyped 192 artificial insemination boars of a commercial line derived from the Swiss Large White breed using the PorcineSNP60 BeadChip with 62,163 evenly spaced SNPs across the pig genome. We obtained 26 estimated breeding values (EBVs) for various traits including exterior, meat quality, reproduction, and production. The subsequent genome-wide association analysis allowed us to identify four QTL with suggestive significance for three of these traits (p-values ranging from 4.99×10^−6 to 2.73×10^−5). Single QTL for the EBVs pH one hour post mortem (pH1) and carcass length were on pig chromosome (SSC) 14 and SSC 2, respectively. Two QTL for the EBV rear view hind legs were on SSC 10 and SSC 16.
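For context on why p-values of this size are called suggestive rather than genome-wide significant, here is a minimal sketch of a Bonferroni-style cutoff. It assumes a family-wise error rate of 0.05 and treats all 62,163 SNPs on the chip as independent tests. The abstract does not state which threshold the authors used or how many SNPs remained after quality control, so both values are assumptions made for illustration.

```python
# Assumed values: ALPHA and the use of the full chip size are illustrative,
# not taken from the study's methods.
ALPHA = 0.05
N_SNPS = 62_163  # SNPs on the PorcineSNP60 BeadChip, per the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_SNPS)
reported = [4.99e-6, 2.73e-5]  # range of p-values quoted in the abstract
passes = [p < threshold for p in reported]
print(f"threshold = {threshold:.2e}, passes = {passes}")
```

Under these assumptions neither end of the reported range falls below the cutoff, which is consistent with the abstract's wording of "suggestive significance".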
4.
5.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically with the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.
6.
Genome-wide association studies (GWAS) aim to identify genetic variants related to diseases by examining the associations between phenotypes and hundreds of thousands of genotyped markers. Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. Although a number of model selection methods have been proposed in the literature, including marginal search, exhaustive search, and forward search, their relative performance has only been evaluated through limited simulations due to the lack of an analytical approach to calculating the power of these methods. This article develops a novel statistical approach for power calculation, derives accurate formulas for the power of different model selection strategies, and then uses the formulas to evaluate and compare these strategies in genetic model spaces. In contrast to previous studies, our theoretical framework allows for random genotypes, correlations among test statistics, and a false-positive control based on GWAS practice. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy. An example is provided for the application of our approach to empirical research. The statistical approach used in our derivations is general and can be employed to address the model selection problems in other random predictor settings. 
We have developed an R package markerSearchPower to implement our formulas, which can be downloaded from the Comprehensive R Archive Network (CRAN) or http://bioinformatics.med.yale.edu/group/.
7.
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al. (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor.
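To make the idea of combining univariate p-values into one trait-based p-value concrete, the sketch below implements the classical Simes procedure, on which the "Extended Simes" name builds. It is a simplification and not the TATES method itself: the abstract says TATES corrects for correlations between trait components, and that correction is not reproduced here. The function name `simes_p` and the example p-values are hypothetical.

```python
def simes_p(p_values):
    """Combine p-values with the classical Simes procedure.

    Sort the m p-values in ascending order and return the minimum over
    ranks j of m * p_(j) / j, capped at 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    m = len(p_values)
    ordered = sorted(p_values)
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(combined, 1.0)


# Hypothetical univariate p-values for one SNP across four trait components.
component_p = [0.04, 0.001, 0.30, 0.02]
print(simes_p(component_p))
```

With a single p-value the function returns it unchanged, so the univariate case is a special case of the combination.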
8.
The primary goal of genome-wide association studies (GWAS) is to discover variants that could lead, in isolation or in combination, to a particular trait or disease. Standard approaches to GWAS, however, are usually based on univariate hypothesis tests and therefore can account neither for correlations due to linkage disequilibrium nor for combinations of several markers. To discover and leverage such potential multivariate interactions, we propose in this work an extension of the Random Forest algorithm tailored for structured GWAS data. In terms of risk prediction, we show empirically on several GWAS datasets that the proposed T-Trees method significantly outperforms both the original Random Forest algorithm and standard linear models, thereby suggesting the actual existence of multivariate non-linear effects due to the combinations of several SNPs. We also demonstrate that variable importances as derived from our method can help identify relevant loci. Finally, we highlight the strong impact that quality control procedures may have, both in terms of predictive power and loci identification. Variable importance results and T-Trees source code are all available at www.montefiore.ulg.ac.be/~botta/ttrees/ and github.com/0asa/TTree-source respectively.
9.
Emily R. Davenport Darren A. Cusanovich Katelyn Michelini Luis B. Barreiro Carole Ober Yoav Gilad 《PloS one》2015,10(11)
The bacterial composition of the human fecal microbiome is influenced by many lifestyle factors, notably diet. It is less clear, however, what role host genetics plays in dictating the composition of bacteria living in the gut. In this study, we examined the association of ~200K host genotypes with the relative abundance of fecal bacterial taxa in a founder population, the Hutterites, during two seasons (n = 91 summer, n = 93 winter, n = 57 individuals collected in both). These individuals live and eat communally, minimizing variation due to environmental exposures, including diet, which could potentially mask small genetic effects. Using a GWAS approach that takes into account the relatedness between subjects, we identified at least 8 bacterial taxa whose abundances were associated with single nucleotide polymorphisms in the host genome in each season (at genome-wide FDR of 20%). For example, we identified an association between a taxon known to affect obesity (genus Akkermansia) and a variant near PLD1, a gene previously associated with body mass index. Moreover, we replicate a previously reported association from a quantitative trait locus (QTL) mapping study of fecal microbiome abundance in mice (genus Lactococcus, rs3747113, P = 3.13 × 10^−7). Finally, based on the significance distribution of the associated microbiome QTLs in our study with respect to chromatin accessibility profiles, we identified tissues in which host genetic variation may be acting to influence bacterial abundance in the gut.
10.
Jizhun Zhang Kewei Jiang Liang Lv Hui Wang Zhanlong Shen Zhidong Gao Bo Wang Yang Yang Yingjiang Ye Shan Wang 《PloS one》2015,10(3)
Although genome-wide association studies have identified many risk loci associated with colorectal cancer, the molecular basis of these associations is still unclear. We aimed to infer biological insights and highlight candidate genes of interest within GWAS risk loci. We used an in silico pipeline based on functional annotation, quantitative trait loci mapping of cis-acting genes, PubMed text-mining, protein-protein interaction studies, genetic overlaps with cancer somatic mutations and knockout mouse phenotypes, and functional enrichment analysis to prioritize the candidate genes at the colorectal cancer risk loci. Based on these analyses, we observed that these genes were the targets of approved therapies for colorectal cancer, and suggested that drugs approved for other indications may be repurposed for the treatment of colorectal cancer. This study highlights the use of publicly available data as a cost-effective solution to derive biological insights, and provides empirical evidence that the molecular basis of colorectal cancer can provide important leads for the discovery of new drugs.
11.
Li Zhang Jiasen Liu Fuping Zhao Hangxing Ren Lingyang Xu Jian Lu Shifang Zhang Xiaoning Zhang Caihong Wei Guobin Lu Youmin Zheng Lixin Du 《PloS one》2013,8(6)
Background
Growth and meat production traits are economically important traits in sheep. The aim of the study was to identify candidate genes affecting growth and meat production traits at the genome level with high-throughput single nucleotide polymorphism (SNP) genotyping technologies.
Methodology and Results
Using the Illumina OvineSNP50 BeadChip, we performed a GWA study in 329 purebred sheep for 11 growth and meat production traits (birth weight, weaning weight, 6-month weight, eye muscle area, fat thickness, pre-weaning gain, post-weaning gain, daily weight gain, height at withers, chest girth, and shin circumference). After quality control, 319 sheep and 48,198 SNPs were analyzed with the TASSEL program in a mixed linear model (MLM). Thirty-six significant SNPs were identified for 7 traits, and 10 of them reached the genome-wise significance level for post-weaning gain. Gene annotation was implemented with the latest sheep genome Ovis_aries_v3.1 (released October 2012). More than one-third of the SNPs (14 out of 36) were located within ovine genes; the others were located close to ovine genes (878 bp to 398,165 bp away). The strongest new finding is that 5 genes were thought to be the most crucial candidate genes associated with post-weaning gain: s58995.1 was located within the ovine genes MEF2B and RFXANK, and OAR3_84073899.1, OAR3_115712045.1, and OAR9_91721507.1 were located within CAMKMT, TRHDE, and RIPK2, respectively. GRM1, POL, MBD5, UBR2, RPL7, and SMC2 were also thought to be important candidate genes affecting post-weaning gain. Additionally, 25 genes at the chromosome-wise significance level were predicted to be promising genes influencing sheep growth and meat production traits.
Conclusions
The results will contribute to similar studies and facilitate the potential utilization of genes involved in growth and meat production traits in sheep in the future.
12.
K. Bodi A. G. Perera P. S. Adams D. Bintzler K. Dewar D. S. Grove J. Kieleczawa R. H. Lyons T. A. Neubert A. C. Noll S. Singh R. Steen M. Zianni 《Journal of biomolecular techniques》2013,24(2):73-86
Isolating high-priority segments of genomes greatly enhances the efficiency of next-generation sequencing (NGS) by allowing researchers to focus on their regions of interest. For the 2010–11 DNA Sequencing Research Group (DSRG) study, we compared outcomes from two leading companies, Agilent Technologies (Santa Clara, CA, USA) and Roche NimbleGen (Madison, WI, USA), which offer custom-targeted genomic enrichment methods. Both companies were provided with the same genomic sample and challenged to capture identical genomic locations for DNA NGS. The target region totaled 3.5 Mb and included 31 individual genes and a 2-Mb contiguous interval. Each company was asked to design its best assay, perform the capture in replicates, and return the captured material to the DSRG-participating laboratories. Sequencing was performed in two different laboratories on Genome Analyzer IIx systems (Illumina, San Diego, CA, USA). Sequencing data were analyzed for sensitivity, specificity, and coverage of the desired regions. The success of the enrichment was highly dependent on the design of the capture probes. Overall, coverage variability was higher for the Agilent samples. As variant discovery is the ultimate goal for a typical targeted sequencing project, we compared samples for their ability to sequence single-nucleotide polymorphisms (SNPs) as a test of the ability to capture both chromosomes from the sample. In the targeted regions, we detected 2546 SNPs with the NimbleGen samples and 2071 with Agilent's. When limited to the regions that both companies included as baits, the number of SNPs was ∼1000 for each, with Agilent and NimbleGen finding a small number of unique SNPs not found by the other.
13.
Evaluation of Two Commercially Available Media for Detection of Bacteremia (total citations: 13; self-citations: 3; citations by others: 10)
Analysis of the results of 13,162 blood cultures during a 9-month interval has shown that Pseudomonas aeruginosa was recovered significantly more frequently from Trypticase soy broth (TSB) than from Thioglycollate-135C and that contaminants, including Staphylococcus epidermidis and aerobic and anaerobic Corynebacterium species, were isolated with statistically greater frequency from Thioglycollate-135C than from TSB. No other statistically significant differences were found.
14.
Comparison of Two Commercially Available Media for Detection of Bacteremia (total citations: 18; self-citations: 8; citations by others: 18)
An analysis of 3,795 positive blood cultures obtained from 1,718 patients in a 2.5-year evaluation of Tryptic Soy Broth (TSB) and Thiol Broth is reported. Isolation rates of Actinobacillus and Pseudomonas were significantly greater in TSB, whereas isolation rates of Streptococcus and Corynebacterium (aerobic and anaerobic) were significantly greater in Thiol. Otherwise, the two media were similar. Disregarding contaminants, anaerobic bacteria represented 11% of positive cultures and 20% of patients with bacteremia. Eleven per cent of the patients had polymicrobial bacteremia.
15.
Cristina T. Vicente Stacey L. Edwards Kristine M. Hillman Susanne Kaufmann Hayley Mitchell Lisa Bain Dylan M. Glubb Jason S. Lee Juliet D. French Manuel A.R. Ferreira 《American journal of human genetics》2015,96(2):329-339
In recent years, a number of large-scale genome-wide association studies have been published for human traits adjusted for other correlated traits with a genetic basis. In most studies, the motivation for such an adjustment is to discover genetic variants associated with the primary outcome independently of the correlated trait. In this report, we contend that this objective is fulfilled when the tested variants have no effect on the covariate or when the correlation between the covariate and the outcome is fully explained by a direct effect of the covariate on the outcome. For all other scenarios, an unintended bias is introduced with respect to the primary outcome as a result of the adjustment, and this bias might lead to false positives. Here, we illustrate this point by providing examples from published genome-wide association studies, including large meta-analyses of waist-to-hip ratio and waist circumference adjusted for body mass index (BMI), where genetic effects might be biased as a result of adjustment for BMI. Using both theory and simulations, we explore this phenomenon in detail and discuss the ramifications for future genome-wide association studies of correlated traits and diseases.
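The bias described in this abstract can be illustrated with a small simulation. This is a minimal sketch under assumed parameters, not the authors' simulation design: a variant affects the covariate but has no effect on the outcome, and the covariate and outcome are correlated only through a shared non-genetic factor. All names, effect sizes, the allele frequency, and the sample size are hypothetical. Only the Python standard library is used.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def slope_unadjusted(g, y):
    """OLS slope of y on g alone."""
    return cov(g, y) / cov(g, g)


def slope_adjusted(g, c, y):
    """OLS coefficient of g when y is regressed on g and c jointly."""
    sgg, scc, sgc = cov(g, g), cov(c, c), cov(g, c)
    sgy, scy = cov(g, y), cov(c, y)
    # Solve the 2x2 normal equations by Cramer's rule.
    return (sgy * scc - scy * sgc) / (sgg * scc - sgc ** 2)


def simulate(n=20_000, maf=0.3, effect_on_covariate=0.5, seed=1):
    """Draw genotypes, a covariate the variant affects, and an outcome it does not."""
    rng = random.Random(seed)
    g, c, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < maf) + (rng.random() < maf)  # allele count 0, 1 or 2
        shared = rng.gauss(0, 1)  # non-genetic factor behind both traits
        g.append(gi)
        c.append(effect_on_covariate * gi + shared + rng.gauss(0, 1))
        y.append(shared + rng.gauss(0, 1))  # the variant has no effect on y
    return g, c, y


g, c, y = simulate()
print("unadjusted:", round(slope_unadjusted(g, y), 3))
print("adjusted:  ", round(slope_adjusted(g, c, y), 3))
```

In this setup the unadjusted estimate should sit near zero, while the covariate-adjusted estimate is pulled away from zero (toward roughly minus half the variant's effect on the covariate, given the unit variances assumed here), even though the variant has no effect on the outcome.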
16.
Toshiyuki Masuzawa Toru Kurita Hiroki Kawabata Harumi Suzuki Yasutake Yanagihara 《Microbiology and immunology》1994,38(4):263-268
Outbred ddY mice inoculated with live cells of Borrelia burgdorferi strain 297 in the hind footpad displayed swelling of the footpad at days 7 to 11 after inoculation. Histopathological examination of footpads inoculated with live Borrelia cells showed marked neutrophilic infiltration in the subcutaneous tissue and in part of the bone tissue, which was partially destroyed; the synovial layer of the articular capsule was thickened and protruded into the joint space. The inflammation peaked at day 7, and B. burgdorferi was cultured from the bladder and heart of the mice at day 14 after inoculation. Mice inoculated with cells heat-inactivated at 56 °C for 30 min did not show any significant histopathological change. In this mouse model, untreated littermates were not infected through contact with infected littermates during the 14-day experimental period. The outbred ddY mouse model is useful for evaluating the effectiveness of vaccination against Lyme disease.
17.
Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies
Genome-wide association studies (GWASs) have recently revealed many genetic associations that are shared between different diseases. We propose a method, disPCA, for genome-wide characterization of shared and distinct risk factors between and within disease classes. It flips the conventional GWAS paradigm by analyzing the diseases themselves, across GWAS datasets, to explore their “shared pathogenetics”. The method applies principal component analysis (PCA) to gene-level significance scores across all genes and across GWASs, thereby revealing shared pathogenetics between diseases in an unsupervised fashion. Importantly, it adjusts for potential sources of heterogeneity present between GWASs, which can confound investigation of shared disease etiology. We applied disPCA to 31 GWASs, including autoimmune diseases, cancers, psychiatric disorders, and neurological disorders. The leading principal components separate these disease classes, as well as inflammatory bowel diseases from other autoimmune diseases. Generally, distinct diseases from the same class tend to be less separated, which is in line with their increased shared etiology. Enrichment analysis of genes contributing to leading principal components revealed pathways that are implicated in the immune system, while also pointing to pathways that have yet to be explored in this context. Our results point to the potential of disPCA in going beyond epidemiological findings of the co-occurrence of distinct diseases, to highlighting novel genes and pathways that unsupervised learning suggests to be key players in the variability across diseases.
18.
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of a heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. A simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods.
19.
Genome-wide association studies (GWAS) are designed to identify the portion of single-nucleotide polymorphisms (SNPs) in genome sequences associated with a complex trait. Strategies based on the gene list enrichment concept are currently applied for the functional analysis of GWAS, according to which a significant overrepresentation of candidate genes associated with a biological pathway is used as a proxy to infer overrepresentation of candidate SNPs in the pathway. Here we show that such inference is not always valid and introduce the program SNP2GO, which implements a new method to properly test for the overrepresentation of candidate SNPs in biological pathways.
20.
It is widely acknowledged that genome-wide association studies (GWAS) of complex human disease fail to explain a large portion of heritability, primarily due to lack of statistical power, a problem that is exacerbated when seeking detection of interactions of multiple genomic loci. An untapped source of information that is already widely available, and that is expected to grow in coming years, is population samples. Such samples contain genetic marker data for additional individuals, but not their relevant phenotypes. In this article we develop a highly efficient testing framework based on a constrained maximum-likelihood estimate in a case–control–population setting. We leverage the available population data and optional modeling assumptions, such as Hardy–Weinberg equilibrium (HWE) in the population and linkage equilibrium (LE) between distal loci, to substantially improve power of association and interaction tests. We demonstrate, via simulation and application to actual GWAS data sets, that our approach is substantially more powerful and robust than standard testing approaches that ignore or make naive use of the population sample. We report several novel and credible pairwise interactions, in bipolar disorder, coronary artery disease, Crohn’s disease, and rheumatoid arthritis.
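The Hardy–Weinberg equilibrium assumption named in this abstract ties genotype frequencies to allele frequencies. The sketch below shows what that constraint implies for a biallelic marker and how a standard one-degree-of-freedom chi-square statistic measures departure from it. It is not the authors' constrained maximum-likelihood framework, only an illustration of the assumption; the function names and genotype counts are hypothetical.

```python
def hwe_expected_counts(n_aa, n_ab, n_bb):
    """Expected genotype counts under HWE, given observed counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    return (n * p * p, 2 * n * p * q, n * q * q)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 degree of freedom) for departure from HWE."""
    observed = (n_aa, n_ab, n_bb)
    expected = hwe_expected_counts(n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts for one marker in a population sample.
counts = (500, 400, 100)
print(hwe_expected_counts(*counts))
print(round(hwe_chi_square(*counts), 3))
```

A statistic below 3.84, the 5% critical value of the chi-square distribution with one degree of freedom, gives no evidence against HWE at that level.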