1.

Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs

Zheng-Zheng Tang Dan-Yu Lin 《American journal of human genetics》2015,97(1):35-53

There is heightened interest in using next-generation sequencing technologies to identify rare variants that influence complex human diseases and traits. Meta-analysis is essential to this endeavor because large sample sizes are required for detecting associations with rare variants. In this article, we provide a comprehensive overview of statistical methods for meta-analysis of sequencing studies for discovering rare-variant associations. Specifically, we discuss the calculation of relevant summary statistics from participating studies, the construction of gene-level association tests, the choice of transformation for quantitative traits, the use of fixed-effects versus random-effects models, and the removal of shadow association signals through conditional analysis. We also show that meta-analysis based on properly calculated summary statistics is as powerful as joint analysis of individual-participant data. In addition, we demonstrate the performance of different meta-analysis methods by using both simulated and empirical data. We then compare four major software packages for meta-analysis of rare-variant associations (MASS, RAREMETAL, MetaSKAT, and seqMeta) in terms of the underlying statistical methodology, analysis pipeline, and software interface. Finally, we present PreMeta, a software interface that integrates the four meta-analysis packages and allows a consortium to combine otherwise incompatible summary statistics.

2.

A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer

Debashree Ray Nilanjan Chatterjee 《PLoS genetics》2020,16(12)

There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236).
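To make the product-of-Z idea concrete, the sketch below computes a p-value for |z1 · z2| under the simplest case only: both statistics are independent standard normals (the global null). This is an illustration, not PLACO itself. The mixture components in which one trait is associated, the variance parameters, and the handling of correlated summary statistics described in the abstract are not reproduced, and the function name `product_z_pvalue` and its integration settings are assumptions made for this example.

```python
import math


def product_z_pvalue(z1: float, z2: float, steps: int = 20000, upper: float = 10.0) -> float:
    """P(|Z1 * Z2| >= |z1 * z2|) for independent standard normal Z1, Z2.

    Uses P(|Z1 Z2| > t) = 2 * integral_0^inf phi(z) * P(|Z2| > t / z) dz,
    evaluated with a midpoint rule on (0, upper).
    """
    t = abs(z1 * z2)
    if t == 0.0:
        return 1.0
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoint of the i-th sub-interval
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        tail = math.erfc(t / (z * math.sqrt(2.0)))  # P(|Z2| > t / z)
        total += density * tail
    return min(1.0, 2.0 * total * h)
```

A large product can arise from one very large Z and one unremarkable Z, which is why the composite null in the abstract needs the additional mixture components that this sketch leaves out.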

3.

Meta-analysis of Correlated Traits via Summary Statistics from GWASs with an Application in Hypertension

Xiaofeng Zhu Tao Feng Bamidele O. Tayo Jingjing Liang J. Hunter Young Nora Franceschini Jennifer A. Smith Lisa R. Yanek Yan V. Sun Todd L. Edwards Wei Chen Mike Nalls Ervin Fox Michele Sale Erwin Bottinger Charles Rotimi The COGENT BP Consortium Yongmei Liu Barbara McKnight Kiang Liu Donna K. Arnett Aravinda Chakravarti Richard S. Cooper Susan Redline 《American journal of human genetics》2015,96(1):21-36

Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systematically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10⁻⁸) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10⁻⁷) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes.

4.

A Flexible Approach for the Analysis of Rare Variants Allowing for a Mixture of Effects on Binary or Quantitative Traits

Geraldine M. Clarke Manuel A. Rivas Andrew P. Morris 《PLoS genetics》2013,9(8)

Multiple rare variants either within or across genes have been hypothesised to collectively influence complex human traits. The increasing availability of high throughput sequencing technologies offers the opportunity to study the effect of rare variants on these traits. However, appropriate and computationally efficient analytical methods are required to account for collections of rare variants that display a combination of protective, deleterious and null effects on the trait. We have developed a novel method for the analysis of rare genetic variation in a gene, region or pathway that, by simply aggregating summary statistics at each variant, can: (i) test for the presence of a mixture of effects on a trait; (ii) be applied to both binary and quantitative traits in population-based and family-based data; (iii) adjust for covariates to allow for non-genetic risk factors; and (iv) incorporate imputed genetic variation. In addition, for preliminary identification of promising genes, the method can be applied to association summary statistics, available from meta-analysis of published data, for example, without the need for individual level genotype data. Through simulation, we show that our method is immune to the presence of bi-directional effects, with no apparent loss in power across a range of different mixtures, and can achieve greater power than existing approaches as long as summary statistics at each variant are robust. We apply our method to investigate association of type-1 diabetes with imputed rare variants within genes in the major histocompatibility complex using genotype data from the Wellcome Trust Case Control Consortium.

5.

Leveraging auxiliary data from arbitrary distributions to boost GWAS discovery with Flexible cFDR

Anna Hutchinson Guillermo Reales Thomas Willis Chris Wallace 《PLoS genetics》2021,17(10)

Genome-wide association studies (GWAS) have identified thousands of genetic variants that are associated with complex traits. However, a stringent significance threshold is required to identify robust genetic associations. Leveraging relevant auxiliary covariates has the potential to boost statistical power to exceed the significance threshold. Particularly, abundant pleiotropy and the non-random distribution of SNPs across various functional categories suggest that leveraging GWAS test statistics from related traits and/or functional genomic data may boost GWAS discovery. While type 1 error rate control has become standard in GWAS, control of the false discovery rate can be a more powerful approach. The conditional false discovery rate (cFDR) extends the standard FDR framework by conditioning on auxiliary data to call significant associations, but current implementations are restricted to auxiliary data satisfying specific parametric distributions, typically GWAS p-values for related traits. We relax these distributional assumptions, enabling an extension of the cFDR framework that supports auxiliary covariates from arbitrary continuous distributions (“Flexible cFDR”). Our method can be applied iteratively, thereby supporting multi-dimensional covariate data. Through simulations we show that Flexible cFDR increases sensitivity whilst controlling FDR after one or several iterations. We further demonstrate its practical potential through application to an asthma GWAS, leveraging various functional genomic data to find additional genetic associations for asthma, which we validate in the larger, independent, UK Biobank data resource.
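For orientation, the sketch below shows one simple empirical estimate of a conditional FDR when the auxiliary data are themselves p-values: the principal p-value cutoff divided by the observed fraction of variants passing it among those that pass the auxiliary cutoff. The form of this estimator is an assumption made for illustration and is not taken from the abstract. Flexible cFDR is described as replacing this kind of p-value-only construction with one that supports arbitrary continuous covariates, which is not reproduced here. The name `empirical_cfdr` is hypothetical.

```python
def empirical_cfdr(p_values, aux_values, p_cut, aux_cut):
    """Illustrative empirical cFDR estimate at cutoffs (p_cut, aux_cut).

    Estimate = p_cut * #{aux <= aux_cut} / #{p <= p_cut and aux <= aux_cut},
    capped at 1. Returns 1.0 when no variant passes both cutoffs.
    """
    n_aux = sum(1 for q in aux_values if q <= aux_cut)
    n_both = sum(
        1 for p, q in zip(p_values, aux_values) if p <= p_cut and q <= aux_cut
    )
    if n_both == 0:
        return 1.0
    return min(1.0, p_cut * n_aux / n_both)
```

When small principal p-values are enriched among variants with small auxiliary values, the denominator grows relative to `n_aux` and the estimate falls below `p_cut`-level expectations, which is the source of the power gain the abstract refers to.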

6.

Genome survey on invasive veined rapa whelk (Rapana venosa) and development of microsatellite loci on large scale

Hao Song Mei-jie Yang Jing-chun Sun Tao Zhang Hai-Yan Wang 《Journal of genetics》2018,97(1):79-85

Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are suited to detecting common variants associated with multiple traits. However, because of the low minor allele frequency of rare variants, these methods are not optimal for rare-variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.

7.

Pleiotropy informed adaptive association test of multiple traits using genome‐wide association study summary data

Maria Masotti Bin Guo Baolin Wu 《Biometrics》2019,75(4):1076-1085

Genetic variants associated with disease outcomes can be used to develop personalized treatment. To reach this precision medicine goal, hundreds of large‐scale genome‐wide association studies (GWAS) have been conducted in the past decade to search for promising genetic variants associated with various traits. They have successfully identified tens of thousands of disease‐related variants. However, in total these identified variants explain only part of the variation for most complex traits. There remain many genetic variants with small effect sizes to be discovered, which calls for the development of (a) GWAS with more samples and more comprehensively genotyped variants, for example, the NHLBI Trans‐Omics for Precision Medicine (TOPMed) Program is planning to conduct whole genome sequencing on over 100 000 individuals; and (b) novel and more powerful statistical analysis methods. The current dominating GWAS analysis approach is the “single trait” association test, despite the fact that many GWAS are conducted in deeply phenotyped cohorts including many correlated and well‐characterized outcomes, which can help improve the power to detect novel variants if properly analyzed, as suggested by increasing evidence that pleiotropy, where a genetic variant affects multiple traits, is the norm in genome‐phenome associations. We aim to develop pleiotropy informed powerful association test methods across multiple traits for GWAS. Since it is generally very hard to access individual‐level GWAS phenotype and genotype data for those existing GWAS, due to privacy concerns and various logistical considerations, we develop rigorous statistical methods for pleiotropy informed adaptive multitrait association test methods that need only summary association statistics publicly available from most GWAS. We first develop a pleiotropy test, which has powerful performance for truly pleiotropic variants but is sensitive to the pleiotropy assumption. 
We then develop a pleiotropy informed adaptive test that has robust and powerful performance under various genetic models. We develop accurate and efficient numerical algorithms to compute the analytical P‐value for the proposed adaptive test without the need for resampling or permutation. We illustrate the performance of proposed methods through application to joint association test of GWAS meta‐analysis summary data for several glycemic traits. Our proposed adaptive test identified several novel loci missed by individual trait based GWAS meta‐analysis. All the proposed methods are implemented in a publicly available R package.

8.

General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

Seunggeun Lee Tanya M. Teslovich Michael Boehnke Xihong Lin 《American journal of human genetics》2013,93(1):42-53

We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by instead calculating score statistics, which require only fitting the null model for each study, and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
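The aggregation step described above can be sketched for the simplest case, a fixed-effects burden test. Each study contributes a vector of per-variant score statistics and a between-variant covariance matrix; these are summed across studies and collapsed with variant weights into a one-degree-of-freedom chi-square statistic. This is a minimal sketch assuming homogeneous effects. The function name `meta_burden_test` and the input layout are hypothetical, and the variance-component tests and heterogeneity models mentioned in the abstract are not covered.

```python
import math


def meta_burden_test(scores, covariances, weights):
    """Fixed-effects burden test from per-study summary statistics.

    scores:      list over studies of score vectors U_k (length m)
    covariances: list over studies of m x m covariance matrices V_k
    weights:     variant weights w (length m)
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(weights)
    # Sum the per-study summaries: U = sum_k U_k, V = sum_k V_k.
    u = [sum(s[j] for s in scores) for j in range(m)]
    v = [[sum(c[j][k] for c in covariances) for k in range(m)] for j in range(m)]
    # Burden statistic: (w'U)^2 / (w'Vw).
    numerator = sum(weights[j] * u[j] for j in range(m))
    variance = sum(
        weights[j] * v[j][k] * weights[k] for j in range(m) for k in range(m)
    )
    stat = numerator * numerator / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    return stat, p_value
```

Only the null model is fitted in each study to obtain the score vector and covariance matrix, so no per-variant regression coefficient is ever estimated.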

9.

So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests (total citations: 3; self-citations: 1; citations by others: 2)

Conneely KN Boehnke M 《American journal of human genetics》2007,81(6):1158-1168

Contemporary genetic association studies may test hundreds of thousands of genetic variants for association, often with multiple binary and continuous traits or under more than one model of inheritance. Many of these association tests may be correlated with one another because of linkage disequilibrium between nearby markers and correlation between traits and models. Permutation tests and simulation-based methods are often employed to adjust groups of correlated tests for multiple testing, since conventional methods such as Bonferroni correction are overly conservative when tests are correlated. We present here a method of computing P values adjusted for correlated tests (P_ACT) that attains the accuracy of permutation or simulation-based tests in much less computation time, and we show that our method applies to many common association tests that are based on multiple traits, markers, and genetic models. Simulation demonstrates that P_ACT attains the power of permutation testing and provides a valid adjustment for hundreds of correlated association tests. In data analyzed as part of the Finland–United States Investigation of NIDDM Genetics (FUSION) study, we observe a near one-to-one relationship (r² > 0.999) between P_ACT and the corresponding permutation-based P values, achieving the same precision as permutation testing but thousands of times faster.

10.

Testing Hardy-Weinberg proportions in a frequency-matched case-control genetic association study

Wang J Shete S 《PloS one》2011,6(11):e27642

In case-control genetic association studies, cases are subjects with the disease and controls are subjects without the disease. At the time of case-control data collection, information about secondary phenotypes is also collected. In addition to studies of primary diseases, there has been some interest in studying genetic variants associated with secondary phenotypes. In genetic association studies, the deviation from Hardy-Weinberg proportion (HWP) of each genetic marker is assessed as an initial quality check to identify questionable genotypes. Generally, HWP tests are performed based on the controls for the primary disease or secondary phenotype. However, when the disease or phenotype of interest is common, the controls do not represent the general population. Therefore, using only controls for testing HWP can result in a highly inflated type I error rate for the disease- and/or phenotype-associated variants. Recently, two approaches, the likelihood ratio test (LRT) approach and the mixture HWP (mHWP) exact test were proposed for testing HWP in samples from case-control studies. Here, we show that these two approaches result in inflated type I error rates and could lead to the removal from further analysis of potential causal genetic variants associated with the primary disease and/or secondary phenotype when the study of primary disease is frequency-matched on the secondary phenotype. Therefore, we proposed alternative approaches, which extend the LRT and mHWP approaches, for assessing HWP that account for frequency matching. The goal was to maintain more (possible causative) single-nucleotide polymorphisms in the sample for further analysis. Our simulation results showed that both extended approaches could control type I error probabilities. 
We also applied the proposed approaches to test HWP for SNPs from a genome-wide association study of lung cancer that was frequency-matched on smoking status and found that the proposed approaches can keep more genetic variants for association studies.
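As background for the quality-control step discussed above, the sketch below implements the textbook Pearson chi-square test of Hardy-Weinberg proportions from genotype counts in a single sample. It is not the LRT or mHWP approach, nor the frequency-matched extensions proposed by the authors. The function name `hwe_chi_square` is hypothetical, and the code assumes both alleles are observed so that every expected count is positive.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Pearson goodness-of-fit test of Hardy-Weinberg proportions (1 df).

    Returns (chi-square statistic, p-value). Assumes both alleles occur.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Applying such a test to controls alone is the practice the abstract cautions against when the phenotype is common, because the controls then no longer represent the general population.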

11.

Contribution of Large Region Joint Associations to Complex Traits Genetics

Guillaume Paré Senay Asma Wei Q. Deng 《PLoS genetics》2015,11(4)

A polygenic model of inheritance, whereby hundreds or thousands of weakly associated variants contribute to a trait’s heritability, has been proposed to underlie the genetic architecture of complex traits. However, relatively few genetic variants have been positively identified so far and they collectively explain only a small fraction of the predicted heritability. We hypothesized that joint association of multiple weakly associated variants over large chromosomal regions contributes to complex trait variance. Confirmation of such regional associations can help identify new loci and lead to a better understanding of known ones. To test this hypothesis, we first characterized the ability of commonly used genetic association models to identify large region joint associations. Through theoretical derivation and simulation, we showed that multivariate linear models where multiple SNPs are included as independent predictors have the most favorable association profile. Based on these results, we tested for large region association with height in 3,740 European participants from the Health and Retirement Study (HRS). Adjusting for SNPs with known association with height, we demonstrated clustering of weak associations (p = 2 × 10⁻⁴) in regions extending up to 433.0 kb from known height loci. The contribution of regional associations to phenotypic variance was estimated at 0.172 (95% CI 0.063-0.279; p < 0.001), which compared favorably to 0.129 explained by known height variants. Conversely, we showed that suggestively associated regions are enriched for known height loci. To extend our findings to other traits, we also tested BMI, HDLc and CRP for large region associations, with consistent results for CRP. Our results demonstrate the presence of large region joint associations and suggest these can be used to pinpoint weakly associated SNPs.

12.

An Efficient Stepwise Statistical Test to Identify Multiple Linked Human Genetic Variants Associated with Specific Phenotypic Traits

Iksoo Huh Min-Seok Kwon Taesung Park 《PloS one》2015,10(9)

Recent advances in genotyping methodologies have allowed genome-wide association studies (GWAS) to accurately identify genetic variants that associate with common or pathological complex traits. Although most GWAS have focused on associations with single genetic variants, joint identification of multiple genetic variants, and how they interact, is essential for understanding the genetic architecture of complex phenotypic traits. Here, we propose an efficient stepwise method based on the Cochran-Mantel-Haenszel test (for stratified categorical data) to identify causal joint multiple genetic variants in GWAS. This method combines the CMH statistic with a stepwise procedure to detect multiple genetic variants associated with specific categorical traits, using a series of associated I × J contingency tables and a null hypothesis of no phenotype association. Through a new stratification scheme based on the sum of minor allele count criteria, we make the method more feasible for GWAS data having sample sizes of several thousand. We also examine the properties of the proposed stepwise method via simulation studies, and show that the stepwise CMH test performs better than other existing methods (e.g., logistic regression and detection of associations by Markov blanket) for identifying multiple genetic variants. Finally, we apply the proposed approach to two genomic sequencing datasets to detect linked genetic variants associated with bipolar disorder and obesity, respectively.
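The building block of the method above, the Cochran-Mantel-Haenszel statistic, can be sketched for the special case of 2 × 2 tables across K strata. The abstract works with general I × J tables, a stepwise search, and a stratification scheme based on minor allele counts, none of which are reproduced here. The function name `cmh_test` and the table layout are assumptions for illustration, and no continuity correction is applied.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test for a list of 2 x 2 tables.

    Each table is (a, b, c, d): row 1 = (a, b), row 2 = (c, d).
    Returns (chi-square statistic with 1 df, p-value).
    Strata with fewer than two observations are skipped.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n  # observed minus expected count in cell a
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = diff * diff / var
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Stratifying before pooling is what lets the statistic test association with one variant while holding the genotype at previously selected variants fixed.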

13.

Lipoprotein lipase gene sequencing and plasma lipid profile

Dilek Pirim Xingbin Wang Zaheda H. Radwan Vipavee Niemsiri John E. Hokanson Richard F. Hamman M. Michael Barmada F. Yesim Demirci M. Ilyas Kamboh 《Journal of lipid research》2014,55(1):85-93

Lipoprotein lipase (LPL) plays a crucial role in lipid metabolism by hydrolyzing triglyceride (TG)-rich particles and affecting HDL cholesterol (HDL-C) levels. In this study, the entire LPL gene plus flanking regions were resequenced in individuals with extreme HDL-C/TG levels (n = 95), selected from a population-based sample of 623 US non-Hispanic White (NHW) individuals. A total of 176 sequencing variants were identified, including 28 novel variants. A subset of 64 variants [common tag single nucleotide polymorphisms (tagSNP) and selected rare variants] were genotyped in the total sample, followed by association analyses with major lipid traits. A gene-based association test including all genotyped variants revealed significant association with HDL-C (P = 0.024) and TG (P = 0.006). Our single-site analysis revealed seven independent signals (P < 0.05; r² < 0.40) with either HDL-C or TG. The most significant association was for the SNP rs295 exerting opposite effects on TG and HDL-C levels with P values of 7.5 × 10⁻⁴ and 0.002, respectively. Our work highlights some common variants and haplotypes in LPL with significant associations with lipid traits; however, the analysis of rare variants using burden tests and the SKAT-O method revealed negligible effects on lipid traits. Comprehensive resequencing of LPL in larger samples is warranted to further test the role of rare variants in affecting plasma lipid levels.

14.

Detection of parent-of-origin effects for quantitative traits using general pedigree data

HAI-QIANG HE WEI-GAO MAO DONGDONG PAN JI-YUAN ZHOU PING-YAN CHEN WING KAM FUNG 《Journal of genetics》2014,93(2):339-347

Genomic imprinting is a genetic phenomenon in which certain alleles are differentially expressed in a parent-of-origin-specific manner, and plays an important role in the study of complex traits. For a diallelic marker locus in humans, the parental-asymmetry tests Q-PAT(c) with any constant c were developed to detect parent-of-origin effects for quantitative traits. However, these methods can only be applied to deal with nuclear families and thus are not suitable for extended pedigrees. In this study, by making no assumption about the distribution of the quantitative trait, we first propose the pedigree parental-asymmetry tests Q-PPAT(c) with any constant c for quantitative traits to test for parent-of-origin effects based on nuclear families with complete information from general pedigree data, in the presence of association between marker alleles under study and quantitative traits. When there are any genotypes missing in pedigrees, we utilize Monte Carlo (MC) sampling and estimation and develop the Q-MCPPAT(c) statistics to test for parent-of-origin effects. Various simulation studies are conducted to assess the performance of the proposed methods, for different sample sizes, genotype missing rates, degrees of imprinting effects and population models. Simulation results show that the proposed methods control the size well under the null hypothesis of no parent-of-origin effects and Q-PPAT(c) are robust to population stratification. In addition, the power comparison demonstrates that Q-PPAT(c) and Q-MCPPAT(c) for pedigree data are much more powerful than Q-PAT(c) only using two-generation nuclear families selected from extended pedigrees.

15.

Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method

Qi Yan Daniel E. Weeks Juan C. Celedón Hemant K. Tiwari Bingshan Li Xiaojing Wang Wan-Yu Lin Xiang-Yang Lou Guimin Gao Wei Chen Nianjun Liu 《Genetics》2015,201(4):1329-1339

The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.

16.

Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants

Kinnamon DD Hershberger RE Martin ER 《PloS one》2012,7(2):e30238

Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed methods for locus-wide inference using nonnegative single-variant test statistics based on unrealistic comparisons. However, such methods are robust to the inclusion of neutral and protective variants and therefore may be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics to two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than another existing method closely related to some recently proposed methods. 
We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests in sequence data with rare variants.
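To illustrate the kind of single-variant statistic discussed above, the sketch below computes the Cochran-Armitage trend chi-square for each variant in a locus from case and control genotype counts, takes the maximum, and applies a Bonferroni correction over the variants. The Bonferroni step is a simple stand-in chosen for this example. The abstract does not say how the locus-wide p-value for the maximum statistic is obtained, and both function names are hypothetical.

```python
import math


def trend_chi_square(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) for one variant.

    cases, controls: genotype counts ordered by number of minor alleles.
    Assumes the variant is polymorphic in the combined sample.
    """
    r = sum(cases)
    s = sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    sw_r = sum(w * c for w, c in zip(weights, cases))
    sw_n = sum(w * t for w, t in zip(weights, totals))
    sw2_n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sw_r - r * sw_n) ** 2
    denominator = r * s * (n * sw2_n - sw_n ** 2)
    return numerator / denominator


def max_trend_bonferroni(variants):
    """Largest trend statistic in a locus and its Bonferroni-adjusted p-value.

    variants: list of (cases, controls) genotype-count pairs.
    """
    stats = [trend_chi_square(cases, controls) for cases, controls in variants]
    best = max(stats)
    p_single = math.erfc(math.sqrt(best / 2.0))
    return best, min(1.0, p_single * len(stats))
```

Because each statistic is nonnegative, a neutral or protective variant elsewhere in the locus cannot cancel the signal from a risk variant, which is the robustness property the abstract emphasizes.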

17.

Meta-analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics

Yi-Juan Hu Sonja I. Berndt Stefan Gustafsson Andrea Ganna Genetic Investigation of ANthropometric Traits Consortium Joel Hirschhorn Kari E. North Erik Ingelsson Dan-Yu Lin 《American journal of human genetics》2013,93(2):236-248

Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying “causal” rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available.
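The recovery step described above can be sketched as follows, under the working approximation that each variant's information is 1/se², so that its score statistic is roughly beta/se² and the covariance between two variants is their test-statistic correlation divided by the product of their standard errors. These approximations, the function name `gene_level_burden`, and the choice of a burden test as the gene-level test are assumptions for illustration and should not be read as the authors' software.

```python
import math


def gene_level_burden(betas, ses, corr, weights):
    """Burden test rebuilt from single-variant effect estimates.

    betas, ses: per-variant effect estimates and standard errors
    corr:       m x m correlation matrix of the single-variant statistics
    weights:    variant weights
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(betas)
    # Approximate score statistics: U_j = beta_j / se_j^2.
    u = [b / (se * se) for b, se in zip(betas, ses)]
    # Approximate covariance: V_jk = r_jk / (se_j * se_k).
    v = [[corr[j][k] / (ses[j] * ses[k]) for k in range(m)] for j in range(m)]
    numerator = sum(w * x for w, x in zip(weights, u))
    variance = sum(
        weights[j] * v[j][k] * weights[k] for j in range(m) for k in range(m)
    )
    stat = numerator * numerator / variance
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

With a single variant the statistic reduces to (beta/se)², the usual squared Wald statistic, which is a quick sanity check on the reconstruction.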

18.

Underestimated effect sizes in GWAS: fundamental limitations of single SNP analysis for dichotomous phenotypes

Stringer S Wray NR Kahn RS Derks EM 《PloS one》2011,6(11):e27964

Complex diseases are often highly heritable. However, for many complex traits only a small proportion of the heritability can be explained by observed genetic variants in traditional genome-wide association (GWA) studies. Moreover, for some of those traits few significant SNPs have been identified. Single SNP association methods test for association at a single SNP, ignoring the effect of other SNPs. We show using a simple multi-locus odds model of complex disease that moderate to large effect sizes of causal variants may be estimated as relatively small effect sizes in single SNP association testing. This underestimation effect is most severe for diseases influenced by numerous risk variants. We relate the underestimation effect to the concept of non-collapsibility found in the statistics literature. As described, continuous phenotypes generated with linear genetic models are not affected by this underestimation effect. Since many GWA studies apply single SNP analysis to dichotomous phenotypes, previously reported results potentially underestimate true effect sizes, thereby impeding identification of true effect SNPs. Therefore, when a multi-locus model of disease risk is assumed, a multi SNP analysis may be more appropriate.
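The non-collapsibility effect mentioned in this abstract can be seen in a small numerical sketch. It assumes a two-locus logistic model with independent binary risk factors, which is a simplification and not necessarily the authors' multi-locus odds model. The function names and parameter values are illustrative assumptions.

```python
import math


def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_odds_ratio(intercept, log_or_1, log_or_2, freq_2=0.5):
    """Odds ratio for locus 1 when locus 2 is ignored.

    Disease risk follows logit(P) = intercept + log_or_1*G1 + log_or_2*G2,
    with G1 and G2 independent binary indicators (an assumed toy model).
    freq_2 is the frequency of G2 = 1.
    """
    def risk(g1):
        # Average the disease risk over the unobserved second locus.
        return ((1.0 - freq_2) * expit(intercept + log_or_1 * g1)
                + freq_2 * expit(intercept + log_or_1 * g1 + log_or_2))

    p1, p0 = risk(1), risk(0)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical values: locus 1 has a conditional odds ratio of 2.
conditional_or = 2.0
single_snp_or = marginal_odds_ratio(-2.0, math.log(conditional_or), math.log(10.0))
# single_snp_or falls between 1 and 2: the single-locus estimate is attenuated.
```

When the second locus has no effect (log odds ratio of 0), the marginal and conditional odds ratios coincide, so the attenuation comes entirely from averaging over other risk variants.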

19.

A Comprehensive Genetic Approach for Improving Prediction of Skin Cancer Risk in Humans

Ana I. Vazquez Gustavo de los Campos Yann C. Klimentidis Guilherme J. M. Rosa Daniel Gianola Nengjun Yi David B. Allison 《Genetics》2012,192(4):1493-1502

Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.

20.

Filtering genetic variants and placing informative priors based on putative biological function

Stefanie Friedrichs Dörthe Malzahn Elizabeth W. Pugh Marcio Almeida Xiao Qing Liu Julia N. Bailey 《BMC genetics》2016,17(Z2):S8

High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.