Similar documents
1.
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variant tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by instead calculating score statistics, which only require fitting the null model for each study, and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis that directly pools individual-level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
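A minimal sketch of the score-based burden meta-analysis idea described above, assuming homogeneous effects across studies (so the per-study score vectors U_k and covariance matrices V_k are simply summed) and user-supplied variant weights w. The function name `meta_burden_test` is hypothetical; this is not the authors' implementation, which also handles heterogeneous effects and variance component tests.

```python
import math


def meta_burden_test(scores_by_study, cov_by_study, weights):
    """Score-based burden meta-analysis (illustrative sketch).

    scores_by_study: per-study score vectors U_k, one entry per variant
    cov_by_study:    per-study covariance (LD-type) matrices V_k
    weights:         per-variant weights w
    Returns (Q, p), where Q = (w'U)^2 / (w'Vw) is referred to a chi-square(1).
    """
    m = len(weights)
    # Homogeneous-effect assumption: pool by summing scores and covariances
    U = [sum(study[j] for study in scores_by_study) for j in range(m)]
    V = [[sum(cov[i][j] for cov in cov_by_study) for j in range(m)]
         for i in range(m)]
    w_u = sum(w * u for w, u in zip(weights, U))
    w_v_w = sum(weights[i] * V[i][j] * weights[j]
                for i in range(m) for j in range(m))
    q = w_u ** 2 / w_v_w
    # Upper tail of chi-square with 1 df: P(X > q) = erfc(sqrt(q / 2))
    p = math.erfc(math.sqrt(q / 2))
    return q, p


q, p = meta_burden_test(
    scores_by_study=[[1.0, 2.0], [0.5, 1.5]],
    cov_by_study=[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    weights=[1.0, 1.0],
)
print(q, p)  # q = 6.25
```

Only summary statistics (U_k, V_k) leave each study, which is what makes this kind of pooling possible without individual-level genotypes.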

2.
Genotype imputation has the potential to assess human genetic variation at a lower cost than assaying the variants using laboratory techniques. The performance of imputation for rare variants has not been comprehensively studied. We used 8865 human samples with high-depth resequencing data for the exons and flanking regions of 202 genes, together with genome-wide association study (GWAS) data, to characterize the performance of genotype imputation for rare variants. We evaluated reference sets ranging from 100 to 3713 subjects for imputing into samples typed on the Affymetrix (500K and 6.0) and Illumina 550K GWAS panels. The proportion of variants that could be well imputed (true r² > 0.7) with a reference panel of 3713 individuals was 31% (Illumina 550K) or 25% (Affymetrix 500K) with MAF (minor allele frequency) less than or equal to 0.001, 48% or 35% with 0.0010.05. The performance for common SNPs (MAF > 0.05) within exons and flanking regions is comparable to imputation of more uniformly distributed SNPs. The performance for rare SNPs (0.01
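The "true r²" quality measure above can be illustrated as the squared Pearson correlation between sequenced (true) genotypes and imputed dosages. A minimal sketch, assuming genotypes coded as 0/1/2 minor-allele counts and the 0.7 cutoff from the abstract; the helper names are hypothetical.

```python
def imputation_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(dosages) / n
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_genotypes, dosages))
    s_tt = sum((t - mean_t) ** 2 for t in true_genotypes)
    s_dd = sum((d - mean_d) ** 2 for d in dosages)
    if s_tt == 0 or s_dd == 0:
        # undefined if either vector is constant (e.g. variant monomorphic in the sample)
        return float("nan")
    return s_td * s_td / (s_tt * s_dd)


def fraction_well_imputed(variants, threshold=0.7):
    """variants: list of (true_genotypes, dosages); counts variants with r^2 > threshold."""
    scores = [imputation_r2(t, d) for t, d in variants]
    # NaN compares False, so an undefined r^2 counts as not well imputed
    return sum(1 for s in scores if s > threshold) / len(scores)


example = [
    ([0, 1, 2], [0.0, 1.0, 2.0]),          # perfectly imputed: r^2 = 1
    ([0, 0, 1, 1], [0.0, 1.0, 0.0, 1.0]),  # uncorrelated: r^2 = 0
]
print(fraction_well_imputed(example))  # 0.5
```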

3.
Genetic association studies require that the genotype data from a given person can be correctly linked to the phenotype data from the same person. However, sample misidentification errors sometimes occur, whereby the link becomes invalid for some of the subjects in a study. This can substantially reduce the power to detect truly associated variants. In family-based studies, Mendelian inconsistencies can be used to detect sample misidentification. Genome-wide association studies (GWAS), however, typically use unrelated individuals, making error detection more problematic. Here we present a method for identifying potential sample misidentifications in GWAS and other genetic association studies, building on ideas from forensic science. A widely used ad hoc method for error detection is to check whether the sex of an individual matches its X-linked genotype. We generalize this idea to less stringent associations between known genotypes and phenotypes, and show that if several known associations are combined, the power to detect misidentifications increases substantially. Individuals with an unlikely set of phenotypes given their genotypes are flagged as potential errors. We provide analytical and simulation results comparing the odds that the genotype and phenotype come from the same individual for different numbers of available genotype-phenotype associations and for different information content of the associations. Our method has good sensitivity and specificity with as few as ten moderately informative genotype-phenotype associations. We apply the method to GWAS data from the Danish National Birth Cohort.
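Combining several genotype-phenotype associations can be sketched as a sum of log likelihood ratios, assuming the associations are independent and that, for each one, both the probability of the observed phenotype given the observed genotype and its population frequency are known. This is an illustrative simplification, not the published method; the function names and the default threshold of 0 are hypothetical.

```python
import math


def log_lr_same_individual(observations):
    """Sum of log likelihood ratios over independent genotype-phenotype associations.

    observations: list of (p_pheno_given_geno, p_pheno_marginal) pairs, i.e. the
    probability of the observed phenotype given the observed genotype, and its
    population frequency (the expectation if genotype and phenotype come from
    different people). Positive totals favour a correct sample link.
    """
    return sum(math.log(p_cond / p_marg) for p_cond, p_marg in observations)


def flag_possible_swaps(samples, threshold=0.0):
    """Return sample ids whose combined log LR falls below the (hypothetical) threshold."""
    return sorted(sid for sid, obs in samples.items()
                  if log_lr_same_individual(obs) < threshold)


samples = {
    # X-linked genotype says male and recorded sex is male; one weaker trait association
    "A": [(0.99, 0.5), (0.6, 0.4)],
    # recorded phenotypes are unlikely given the genotypes
    "B": [(0.01, 0.5), (0.2, 0.4)],
}
print(flag_possible_swaps(samples))  # ['B']
```

Each extra moderately informative association adds another term to the sum, which is why combining many of them sharpens the separation between correct and swapped samples.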

4.
Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotyping arrays offers a cost-efficient strategy to achieve the sample sizes required for adequate statistical power. To estimate the performance of rare variant imputation, we imputed 153 individuals, each of whom was genotyped on 3 different arrays (317k, 610k and 1 million single nucleotide polymorphisms (SNPs)), against two reference panels, HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot), using IMPUTE version 2. We found that more than 94% and 84% of all SNPs achieved acceptable accuracy (info > 0.4) in HapMap2- and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) < 5%), the proportion of well-imputed SNPs increased as MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. The proportion of well-imputed SNPs with MAF from 0.3% to 5% was 69%, 60% and 49% for the 1M, 610k and 317k arrays, respectively. None of the very rare variants (MAF < 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with the density of genome-wide genotyping arrays when the reference panel is small. Variants with lower MAF are more difficult to impute. These findings have important implications for the design and replication of large-scale sequencing studies.
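Tabulating the proportion of well-imputed SNPs by MAF bin, as in the results above, can be sketched as follows, assuming each SNP is summarized by its MAF and an IMPUTE2-style info score, and using the info > 0.4 cutoff from the abstract. The bin edges and SNP values are illustrative; the function name is hypothetical.

```python
def well_imputed_by_maf_bin(snps, bins, info_threshold=0.4):
    """snps: list of (maf, info) pairs; bins: list of (low, high) MAF edges, low <= maf < high.

    Returns {bin: proportion of SNPs with info > info_threshold}, or None for empty bins.
    """
    result = {}
    for low, high in bins:
        infos = [info for maf, info in snps if low <= maf < high]
        result[(low, high)] = (sum(info > info_threshold for info in infos) / len(infos)
                               if infos else None)
    return result


snps = [(0.001, 0.2), (0.01, 0.5), (0.02, 0.3), (0.04, 0.9)]
bins = [(0.0, 0.003), (0.003, 0.05)]  # very rare vs. 0.3%-5%, as in the abstract
# very rare bin: 0 of 1 well imputed; 0.3%-5% bin: 2 of 3 well imputed
print(well_imputed_by_maf_bin(snps, bins))
```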

5.
6.

Background

In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs in disease traits. However, the SNPs identified by these GWAS are not necessarily the functional variants. Therefore, the next phase of GWAS will involve refining these putative loci.

Methodology

A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals and then apply genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation with a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project.

Conclusions

Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many “anchor” markers as possible.

7.
Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low-frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, to estimate the performance of low-frequency and rare variant imputation, we imputed 153 individuals, each of whom had genotype data from 3 different arrays (317k, 610k and 1 million SNPs), against three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), using IMPUTE version 2. These three releases of the 1000 Genomes data differ in sample size, ancestry diversity, number of variants and frequency spectrum. We found that both the reference panel and GWAS chip density affect the imputation of low-frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info > 0.4) and a higher mean info score in each MAF bin. Similarly, the 1M array outperformed the 610K and 317K arrays. However, for very rare variants (MAF ≤ 0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low-frequency and rare variants improves with larger reference panels and higher-density genome-wide genotyping arrays. Yet, despite a large reference panel and dense genotyping, very rare variants remain difficult to impute.
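Concordance, one of the metrics compared above, can be sketched as the fraction of individuals whose best-guess imputed genotype (the genotype with the highest posterior probability) matches the true genotype. A minimal sketch, assuming posteriors are given as (P(0), P(1), P(2)) triples; the helper names are hypothetical. For rare variants, concordance is dominated by the common homozygotes, which is one reason info scores are reported alongside it.

```python
def best_guess(probs):
    """Genotype (0, 1 or 2 copies of the alternative allele) with the highest posterior."""
    return max(range(3), key=lambda g: probs[g])


def concordance_rate(true_genotypes, posteriors):
    """Fraction of individuals whose best-guess imputed genotype equals the true genotype."""
    matches = sum(best_guess(p) == g for g, p in zip(true_genotypes, posteriors))
    return matches / len(true_genotypes)


true_g = [0, 1, 2, 0]
post = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.1, 0.5, 0.4), (0.6, 0.3, 0.1)]
print(concordance_rate(true_g, post))  # 0.75
```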

8.
9.
Conventional genome-wide association studies (GWAS) have proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and traditional family studies. This “missing heritability” has been attributed to a lack of studies focusing on epistasis, also called gene–gene interactions, because individual studies have often had insufficient sample sizes. Meta-analysis is a common method for increasing statistical power. However, sufficiently detailed information is difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called “Epistasis Test in Meta-Analysis” (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis. We defined a series of conditions to generate simulation data and tested the power and type I error rates of ETMA, individual data analysis and the conventional meta-regression-based method. ETMA not only successfully facilitated consistency of evidence but also yielded acceptable type I error and higher power than conventional meta-regression. We applied ETMA to three real meta-analysis data sets. We found significant gene–gene interactions in the renin–angiotensin system and the polycyclic aromatic hydrocarbon metabolism pathway, with strong supporting evidence. In addition, glutathione S-transferase (GST) mu 1 and theta 1 were confirmed to exert independent effects on cancer. We concluded that the application of ETMA to real meta-analysis data was successful. Finally, we developed an R package, etma, for the detection of epistasis in meta-analysis [etma is available via the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/etma/index.html].

10.
ABSTRACT: BACKGROUND: Single nucleotide polymorphism (SNP) genotyping assays normally produce a certain percentage of no-calls; the problem becomes severe when the target organism, such as cattle, does not have a high-resolution genomic sequence. Missing SNP genotypes, when related to target traits, would confound downstream data analyses such as genome-wide association studies (GWAS). Existing methods for recovering the missing values are successful to some extent: they are either accurate but not fast enough, or fast but not accurate enough. RESULTS: For a target missing genotype, we include only the SNP loci within a genetic-distance vicinity and only the samples within a similarity vicinity in our local imputation process. For missing genotype imputation, comparative performance evaluations through extensive simulation studies using real human and cattle genotype datasets demonstrated that our nearest-neighbor-based local imputation method was one of the most efficient methods and outperformed existing methods except the time-consuming fastPHASE; for missing haplotype allele imputation, comparative performance evaluations using real mouse haplotype datasets demonstrated that our method was not only one of the most efficient methods but also one of the most accurate. CONCLUSIONS: Given that fastPHASE requires a long imputation time on medium- to high-density datasets, and that our nearest-neighbor-based local imputation method performed only slightly worse than fastPHASE yet better than all other methods, one might want to adopt our method as an alternative for missing SNP genotype or missing haplotype allele imputation.
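The local nearest-neighbor strategy described above can be sketched as follows. Assumptions not taken from the paper: SNP index distance stands in for genetic distance, similarity is the mean absolute genotype difference over shared non-missing loci in the window, and the imputed value is a majority vote among the k closest samples. The function name and the `window` and `k` parameters are hypothetical.

```python
def knn_local_impute(genotypes, sample, snp, window=2, k=3):
    """Impute genotypes[sample][snp] (0/1/2, None = missing) from local neighbours.

    Only SNPs within `window` positions of the target SNP are compared, and only
    the k most similar samples with an observed genotype at `snp` vote.
    """
    n_snps = len(genotypes[0])
    local = [j for j in range(max(0, snp - window), min(n_snps, snp + window + 1))
             if j != snp]
    target = genotypes[sample]
    candidates = []
    for s, row in enumerate(genotypes):
        if s == sample or row[snp] is None:
            continue
        shared = [j for j in local if target[j] is not None and row[j] is not None]
        if not shared:
            continue
        # mean absolute genotype difference over shared non-missing loci
        dist = sum(abs(target[j] - row[j]) for j in shared) / len(shared)
        candidates.append((dist, row[snp]))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0])  # stable: ties keep sample order
    neighbours = [g for _, g in candidates[:k]]
    # majority vote; ties go to the genotype of the closest tied neighbour
    top = max(neighbours.count(g) for g in neighbours)
    return next(g for g in neighbours if neighbours.count(g) == top)


genotypes = [
    [0, 1, None, 1, 0],  # sample to impute
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 1],
    [2, 0, 0, 0, 2],
    [0, 1, 1, 1, 0],
]
print(knn_local_impute(genotypes, sample=0, snp=2))  # 2
```

Restricting both the loci and the samples is what keeps the cost per missing genotype small compared with a full haplotype model such as fastPHASE.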

11.
An increasing number of field studies have shown that the phenotype of an individual plant depends not only on its own genotype but also on those of neighboring plants; however, this is not taken into consideration in genome-wide association studies (GWAS). Based on the Ising model of ferromagnetism, we incorporated neighbor genotypic identity into a regression model, named “Neighbor GWAS”. Our simulations showed that the effective range of neighbor effects could be estimated from an observed phenotype as the spatial scale at which the proportion of phenotypic variation explained (PVE) by neighbor effects peaked. The spatial scale of the first nearest neighbors gave the maximum power to detect the causal variants responsible for neighbor effects, unless their effective range was too broad. However, if the effective range of the neighbor effects was broad and minor allele frequencies were low, there was collinearity between the self and neighbor effects. To suppress false-positive detection of neighbor effects, the fixed effect and variance components involved in the neighbor effects should be tested in comparison with a standard GWAS model. We applied neighbor GWAS to field herbivory data from 199 accessions of Arabidopsis thaliana and found that neighbor effects explained 8% more of the PVE of the observed damage than standard GWAS. The neighbor GWAS method provides a novel tool that could facilitate the analysis of complex traits in spatially structured environments and is available as an R package at CRAN (https://cran.r-project.org/package=rNeighborGWAS). Subject terms: Quantitative trait, Plant ecology, Ecological genetics
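The neighbor genotypic identity term can be sketched in the Ising-model spirit described above: with genotypes coded -1/+1, each plant gets its own genotype x_i multiplied by the mean genotype of its first nearest neighbors. The sketch assumes plants on a rectangular grid with four (rook) neighbors; rNeighborGWAS itself may define the neighborhood and scaling differently, and the function name is hypothetical. In a regression, this column would sit alongside the plant's own genotype.

```python
def neighbor_identity(grid):
    """grid: 2-D list of genotypes coded -1/+1, one plant per lattice cell.

    Returns x_i * (mean genotype of the first nearest neighbours) for every plant:
    +1 if all neighbours share the focal genotype, -1 if none do.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # rook (4-cell) neighbourhood, truncated at the plot edge
            nbrs = [grid[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols]
            out[r][c] = grid[r][c] * sum(nbrs) / len(nbrs) if nbrs else 0.0
    return out


print(neighbor_identity([[1, 1], [1, -1]]))  # [[1.0, 0.0], [0.0, -1.0]]
```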

12.
Browning SR. Human Genetics, 2008, 124(5): 439-450
Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance.

13.
ABSTRACT: BACKGROUND: Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) verify results. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions before the time-consuming imputation runs are finished and polished. RESULTS: Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from the 1000 Genomes/HapMap projects to find proxy-SNPs, together with in-phase alleles, for SNPs missing in at least one study. MA is conducted by combining the association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naive MA on the p-value level: for 17 out of 23 we observe an improvement in the p-value by a factor of more than two, and a maximum improvement by a factor of 2127.
CONCLUSIONS: YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy-SNPs for missing markers to avoid unnecessary power loss. MA with YAMAS can be readily conducted, as YAMAS provides a generic parser for the heterogeneous tabulated file formats within the GWAS field and avoids cumbersome setups. In this way, it supplements the meta-analysis process.
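Substituting a proxy-SNP and combining estimates can be sketched with a standard fixed-effect inverse-variance meta-analysis. Assumptions: biallelic SNPs, and a proxy in near-perfect LD with the missing SNP, so its effect is used unscaled after mapping its effect allele to the in-phase target allele. YAMAS's actual weighting is not described in the abstract, and the function names are hypothetical.

```python
import math


def proxy_on_target(beta, se, proxy_effect_allele, in_phase):
    """Express a proxy-SNP result on the missing target SNP.

    in_phase maps each proxy allele to the target allele it travels with
    (from reference LD data). Assumes r^2 close to 1, so beta is used unscaled.
    """
    return beta, se, in_phase[proxy_effect_allele]


def ivw_meta(results, effect_allele):
    """Fixed-effect inverse-variance meta-analysis.

    results: (beta, se, allele) tuples; betas are flipped so that all refer to effect_allele.
    Returns (beta, se, two-sided p).
    """
    num = den = 0.0
    for beta, se, allele in results:
        b = beta if allele == effect_allele else -beta
        w = 1.0 / se ** 2
        num += w * b
        den += w
    beta_meta = num / den
    se_meta = math.sqrt(1.0 / den)
    p = math.erfc(abs(beta_meta / se_meta) / math.sqrt(2))
    return beta_meta, se_meta, p


study1 = (0.2, 0.1, "A")  # target SNP genotyped directly, effect allele A (other allele T)
# study 2 lacks the SNP; proxy allele G is in phase with target allele T
study2 = proxy_on_target(-0.2, 0.1, "G", {"G": "T", "C": "A"})
print(ivw_meta([study1, study2], effect_allele="A"))
```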

14.
15.
Maria Masotti, Bin Guo, Baolin Wu. Biometrics, 2019, 75(4): 1076-1085
Genetic variants associated with disease outcomes can be used to develop personalized treatment. To reach this precision medicine goal, hundreds of large-scale genome-wide association studies (GWAS) have been conducted in the past decade to search for promising genetic variants associated with various traits. They have successfully identified tens of thousands of disease-related variants. However, in total these identified variants explain only part of the variation for most complex traits. Many genetic variants with small effect sizes remain to be discovered, which calls for the development of (a) GWAS with more samples and more comprehensively genotyped variants (for example, the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program is planning to conduct whole-genome sequencing on over 100 000 individuals) and (b) novel and more powerful statistical analysis methods. The current dominant GWAS analysis approach is the “single trait” association test, even though many GWAS are conducted in deeply phenotyped cohorts with many correlated and well-characterized outcomes. If properly analyzed, these outcomes can help improve the power to detect novel variants, as suggested by increasing evidence that pleiotropy, where a genetic variant affects multiple traits, is the norm in genome-phenome associations. We aim to develop powerful pleiotropy-informed association tests across multiple traits for GWAS. Because it is generally very hard to access individual-level phenotype and genotype data from existing GWAS, owing to privacy concerns and various logistical considerations, we develop rigorous statistical methods for pleiotropy-informed adaptive multitrait association tests that need only the summary association statistics publicly available from most GWAS. We first develop a pleiotropy test, which performs powerfully for truly pleiotropic variants but is sensitive to the pleiotropy assumption.
We then develop a pleiotropy-informed adaptive test that has robust and powerful performance under various genetic models. We develop accurate and efficient numerical algorithms to compute the analytical p-value for the proposed adaptive test without the need for resampling or permutation. We illustrate the performance of the proposed methods through application to a joint association test of GWAS meta-analysis summary data for several glycemic traits. Our proposed adaptive test identified several novel loci missed by individual-trait-based GWAS meta-analysis. All the proposed methods are implemented in a publicly available R package.
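For contrast with the adaptive test, a simple summary-statistic multitrait test can be sketched: sum the per-trait z-scores of one SNP and standardize by the trait correlation matrix R (assumed known, for example estimated from null SNPs). This is not the authors' adaptive test; it only illustrates a pleiotropy-style statistic that is powerful when effects share a direction and loses power when they do not.

```python
import math


def sum_z_test(z, R):
    """Sum-of-z-scores multitrait test for one SNP (illustrative, not the adaptive test).

    z: per-trait z-scores; R: correlation matrix of the z-scores under the null.
    Under H0, sum(z) has variance 1'R1, so the statistic is chi-square with 1 df.
    """
    k = len(z)
    stat = sum(z) ** 2 / sum(R[i][j] for i in range(k) for j in range(k))
    return stat, math.erfc(math.sqrt(stat / 2))


R = [[1.0, 0.5], [0.5, 1.0]]
print(sum_z_test([2.0, 2.0], R))   # concordant effects: large statistic
print(sum_z_test([2.0, -2.0], R))  # opposite directions: statistic is 0
```

The second call shows the sensitivity to the pleiotropy assumption mentioned in the abstract, which is the gap an adaptive test is meant to close.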

16.
Progress in genome-wide association studies of essential hypertension
Xu RW, Yan WL. Yi Chuan (Hereditas), 2012, 34(7): 793-809
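Essential hypertension is a complex disease caused by the combined action of genetic and environmental factors, and it shows a high degree of genetic heterogeneity. Since the first genome-wide association study (GWAS) of hypertension was reported in 2007, many further GWAS have been carried out. This article first summarizes the results of 24 GWAS of blood pressure/hypertension susceptibility genes reported between January 2007 and September 2011, organized by ethnicity and chromosomal location; the loci rs17249754, rs1378942 and rs11191548 were reported most frequently. It then describes methodological advances in GWAS, including the selection of high-quality quantitative phenotypes and the use of multistage study designs to increase the chance of detecting positive associations. On the statistical side, in addition to emphasizing previously discussed issues such as multiple comparisons and replication (validation) studies, the article describes in-depth mining of GWAS data through meta-analysis, and the use of genotype imputation to fill in missing data, which can increase the genome-wide coverage of genetic markers. Although GWAS have revealed many previously unknown associations between genes and disease phenotypes and provided further clues to the pathogenesis of hypertension, most of the blood pressure/hypertension-associated variants found so far are common variants with extremely weak effects on population blood pressure. Future research could therefore strengthen in-depth functional studies for fine-mapping of susceptibility genes and the application of exome sequencing, and combine GWAS findings with bioinformatic pathway analysis and studies of epigenetic mechanisms, to progressively reveal the genetic mechanisms of hypertension.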

17.
Meta-analysis is an important tool in linkage analysis. Pooling results across primary linkage studies allows greater statistical power to detect quantitative-trait loci (QTLs) and more precise estimation of their genetic effects and, hence, yields conclusions that are stronger than those of individual studies. Previous methods for the meta-analysis of linkage studies have been proposed, and although some methods address the problem of between-study heterogeneity, most still require linkage analysis at the same marker or set of markers across studies, whereas others do not yield an estimate of genetic variance. In this study, we present a meta-analytic procedure to evaluate evidence from several studies that report Haseman-Elston statistics for linkage to a QTL at multiple, possibly distinct, markers on a chromosome. This technique accounts for between-study heterogeneity and estimates both the location of the QTL and the magnitude of the genetic effect more precisely than an individual study does. We also provide standard errors for the genetic effect and for the location (in cM) of the QTL, using a resampling method. The approach can be applied under other conditions, provided that the various studies use the same linkage statistic.
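The per-study building block, the original Haseman-Elston regression of squared sib-pair trait differences on estimated identity-by-descent (IBD) sharing at a marker, can be sketched as an ordinary least-squares fit; a negative slope suggests linkage. This sketch covers only a single study at a single marker, not the multi-marker meta-analysis described above, and the function name is hypothetical.

```python
def haseman_elston(trait_pairs, ibd):
    """Original Haseman-Elston regression for one marker.

    trait_pairs: (y1, y2) trait values for each sib pair
    ibd:         estimated proportion of alleles shared IBD at the marker (0, 0.5 or 1)
    Regresses squared trait differences on IBD sharing; returns (intercept, slope).
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(y)
    mean_x = sum(ibd) / n
    mean_y = sum(y) / n
    s_xy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ibd, y))
    s_xx = sum((x - mean_x) ** 2 for x in ibd)
    slope = s_xy / s_xx  # expected to be negative under linkage
    return mean_y - slope * mean_x, slope


pairs = [(0.0, 3.0), (1.0, 4.0), (0.0, 1.0), (2.0, 3.0)]
ibd = [0.0, 0.0, 1.0, 1.0]
print(haseman_elston(pairs, ibd))  # (9.0, -8.0)
```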

18.
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and even after meta-analysis of large cohorts a large fraction of the expected heritability remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples to test disease association, including rare variants. We have developed a test statistic that allows association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base-pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification of sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single-locus results.
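The EM step for genotype probabilities from reads can be sketched for a single site, assuming Hardy-Weinberg equilibrium, a known per-base error rate applied symmetrically to reference and alternative reads, and read counts summarized as (reference, alternative) per individual. The function name, default error rate and starting frequency are hypothetical; the authors' full model and the SumStat step are not reproduced here.

```python
def genotype_posteriors(reads, error=0.01, n_iter=50, start_freq=0.1):
    """EM for one site from per-individual (ref_count, alt_count) read counts.

    Assumes Hardy-Weinberg equilibrium and a per-base error rate `error` that turns a
    true allele into the other one. Returns (alt_allele_freq, posteriors) where
    posteriors[i] = [P(g=0), P(g=1), P(g=2)] for individual i.
    """
    def likelihood(g, ref, alt):
        # probability that a single read shows the alt allele, given g alt copies
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        return p_alt ** alt * (1 - p_alt) ** ref  # binomial coefficient cancels

    freq = start_freq
    post = []
    for _ in range(n_iter):
        prior = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
        # E-step: posterior genotype probabilities for each individual
        post = []
        for ref, alt in reads:
            w = [prior[g] * likelihood(g, ref, alt) for g in range(3)]
            total = sum(w)
            post.append([x / total for x in w])
        # M-step: expected number of alt alleles over 2N chromosomes
        freq = sum(p[1] + 2 * p[2] for p in post) / (2 * len(reads))
    return freq, post


reads = [(10, 0)] * 8 + [(5, 5), (0, 10)]
freq, post = genotype_posteriors(reads)
print(round(freq, 3), [round(x, 3) for x in post[-1]])
```

Posterior genotype probabilities (or the expected dosage p[1] + 2 * p[2]) can then feed a per-locus association statistic instead of hard genotype calls.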

19.
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, the genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that a large number of common variants remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS) that uses summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available.

20.
Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. Current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which analyze the collective frequency differences of multiple variants between cases and controls, have shifted the analysis paradigm from the variant-by-variant testing used for common variants in GWAS to collective testing of multiple variants in rare variant association analysis. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed novel genome-information-content-based statistics for testing the association of the entire allele frequency spectrum of genomic variation with diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on the whole-genome low-coverage pilot data of the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information-content-based statistic, the generalized T², the collapsing method, the combined multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset of the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information-content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
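The collapsing idea behind group tests such as CMC can be sketched as follows: mark each individual as a carrier if they have any minor allele at the designated rare variants, then compare carrier counts between cases and controls with a Pearson chi-square test on the 2x2 table. This covers only the collapsing step (CMC also combines collapsed rare variants with common variants in a multivariate test), and the function names are hypothetical.

```python
import math


def carriers(genotypes, rare_idx):
    """1 if an individual carries a minor allele at any of the rare variants, else 0."""
    return [int(any(row[j] > 0 for j in rare_idx)) for row in genotypes]


def collapsed_chi2(case_geno, control_geno, rare_idx):
    """Pearson chi-square (1 df) on the 2x2 table of carrier status by case/control."""
    a = sum(carriers(case_geno, rare_idx))
    b = len(case_geno) - a
    c = sum(carriers(control_geno, rare_idx))
    d = len(control_geno) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # no variation in carrier status (or an empty group)
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


cases = [[1, 1, 0], [0, 0, 1], [2, 0, 2], [1, 0, 0]]
controls = [[0, 1, 0], [1, 0, 0], [2, 0, 0], [0, 0, 0]]
print(collapsed_chi2(cases, controls, rare_idx=[1, 2]))  # statistic 2.0
```

Because every rare variant contributes the same carrier indicator, this kind of test implicitly assumes similar effects across variants, which is the limitation of group tests that the abstract points out.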


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417