首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
罗旭红刘志芳  董长征 《遗传》2013,35(9):1065-1071
全基因组关联研究(Genome wide association study, GWAS)已经在国内外的医学遗传学研究中得到广泛应用, 但是GWAS数据中所蕴含的与多基因复杂性状疾病机制相关的丰富信息尚未得到深度挖掘。近年来, 研究者采用生物网络分析和生物通路分析等生物信息学和生物统计学手段分析GWAS数据, 并探索潜在的疾病机制。生物网络分析和生物通路分析主要是以基因为单位进行的, 因此必须在分析前将基因上全部或者部分单个单核苷酸多态性(Single nucleotide polymorphism, SNP)的遗传关联结果综合起来, 即基因水平的关联分析。基因水平的关联分析需要考虑单个SNP的遗传关联、基因上SNP数量和SNP之间的连锁不平衡结构等多种因素, 因此不仅在遗传学的概念上也在统计方法方面具有一定的复杂性和挑战性。文章对基因水平的关联分析的研究进展、原理和应用进行了综述。  相似文献   

2.
Gene-based association analysis is an effective gene-mapping tool. Many gene-based methods have been proposed recently. However, their power depends on the underlying genetic architecture, which is rarely known in complex traits, and so it is likely that a combination of such methods could serve as a universal approach. Several frameworks combining different gene-based methods have been developed. However, they all imply a fixed set of methods, weights and functional annotations. Moreover, most of them use individual phenotypes and genotypes as input data. Here, we introduce sumSTAAR, a framework for gene-based association analysis using summary statistics obtained from genome-wide association studies (GWAS). It is an extended and modified version of STAAR framework proposed by Li and colleagues in 2020. The sumSTAAR framework offers a wider range of gene-based methods to combine. It allows the user to arbitrarily define a set of these methods, weighting functions and probabilities of genetic variants being causal. The methods used in the framework were adapted to analyse genes with large number of SNPs to decrease the running time. The framework includes the polygene pruning procedure to guard against the influence of the strong GWAS signals outside the gene. We also present new improved matrices of correlations between the genotypes of variants within genes. These matrices estimated on a sample of 265,000 individuals are a state-of-the-art replacement of widely used matrices based on the 1000 Genomes Project data.  相似文献   

3.
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster‐based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single‐ and multilocus models that can efficiently conduct the association tests on such high‐dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.  相似文献   

4.
Genome-wide association studies (GWAS) using family data involve association analyses between hundreds of thousands of markers and a trait for a large number of related individuals. The correlations among relatives bring statistical and computational challenges when performing these large-scale association analyses. Recently, several rapid methods accounting for both within- and between-family variation have been proposed. However, these techniques mostly model the phenotypic similarities in terms of genetic relatedness. The familial resemblances in many family-based studies such as twin studies are not only due to the genetic relatedness, but also derive from shared environmental effects and assortative mating. In this paper, we propose 2 generalized least squares (GLS) models for rapid association analysis of family-based GWAS, which accommodate both genetic and environmental contributions to familial resemblance. In our first model, we estimated the joint genetic and environmental variations. In our second model, we estimated the genetic and environmental components separately. Through simulation studies, we demonstrated that our proposed approaches are more powerful and computationally efficient than a number of existing methods are. We show that estimating the residual variance-covariance matrix in the GLS models without SNP effects does not lead to an appreciable bias in the p values as long as the SNP effect is small (i.e. accounting for no more than 1% of trait variance).  相似文献   

5.
Genome-wide association studies (GWAS) are designed to identify the portion of single-nucleotide polymorphisms (SNPs) in genome sequences associated with a complex trait. Strategies based on the gene list enrichment concept are currently applied for the functional analysis of GWAS, according to which a significant overrepresentation of candidate genes associated with a biological pathway is used as a proxy to infer overrepresentation of candidate SNPs in the pathway. Here we show that such inference is not always valid and introduce the program SNP2GO, which implements a new method to properly test for the overrepresentation of candidate SNPs in biological pathways.  相似文献   

6.
Genome-wide association studies (GWAS) have successfully identified many genetic variants associated with complex diseases and traits. However, functional consequence of genetic variants studied in GWAS is not yet fully investigated, which would hinder the application of GWAS. We therefore performed a systematic functional analysis of HapMap SNPs, which have been most commonly used as the reference panel for GWAS. Our study highlights several characteristics of HapMap SNPs and identifies subsets of genetic variants with interesting functional implication. The results show that HapMap SNPs have good coverage within RefSeq genes, especially within known disease-related genes. On the other hand, only a small percentage of SNPs are non-synonymous SNPs while many SNPs are actually located at gene deserts. Moreover, many functionally important variants are not yet still interrogated. A redesigned SNP reference panel with additional functionally important variants would be useful to identify disease-causal variants in the future genome-wide studies.  相似文献   

7.
Lehne B  Lewis CM  Schlitt T 《PloS one》2011,6(6):e20133
Interpreting Genome-Wide Association Studies (GWAS) at a gene level is an important step towards understanding the molecular processes that lead to disease. In order to incorporate prior biological knowledge such as pathways and protein interactions in the analysis of GWAS data it is necessary to derive one measure of association for each gene. We compare three different methods to obtain gene-wide test statistics from Single Nucleotide Polymorphism (SNP) based association data: choosing the test statistic from the most significant SNP; the mean test statistics of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's Disease and Type 1 Diabetes we identified new potential disease genes.  相似文献   

8.
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.  相似文献   

9.
Ma L  Han S  Yang J  Da Y 《PloS one》2010,5(11):e15006
Complex diseases or phenotypes may involve multiple genetic variants and interactions between genetic, environmental and other factors. Current genome-wide association studies (GWAS) mostly used single-locus analysis and had identified genetic effects with multiple confirmations. Such confirmed single-nucleotide polymorphism (SNP) effects were likely to be true genetic effects and ignoring this information in testing new effects of the same phenotype results in decreased statistical power due to increased residual variance that has a component of the omitted effects. In this study, a multi-locus association test (MLT) was proposed for GWAS analysis conditional on SNPs with confirmed effects to improve statistical power. Analytical formulae for statistical power were derived and were verified by simulation for MLT accounting for confirmed SNPs and for single-locus test (SLT) without accounting for confirmed SNPs. Statistical power of the two methods was compared by case studies with simulated and the Framingham Heart Study (FHS) GWAS data. Results showed that the MLT method had increased statistical power over SLT. In the GWAS case study on four cholesterol phenotypes and serum metabolites, the MLT method improved statistical power by 5% to 38% depending on the number and effect sizes of the conditional SNPs. For the analysis of HDL cholesterol (HDL-C) and total cholesterol (TC) of the FHS data, the MLT method conditional on confirmed SNPs from GWAS catalog and NCBI had considerably more significant results than SLT.  相似文献   

10.

Background

In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) has increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci.

Methodology

A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawbacks of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project.

Conclusions

Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many “anchor” markers as possible.  相似文献   

11.
GCTA: a tool for genome-wide complex trait analysis   总被引:7,自引:0,他引:7  
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.  相似文献   

12.

Background  

Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can only explain a small fraction of the genetic risks associated with the complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is the pathway analysis, which considers the genetic variations underlying a biological pathway, rather than separately as in the traditional GWAS studies. A critical challenge in the pathway analysis is how to combine evidences of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs.  相似文献   

13.
Genome-wide association studies (GWAS) examine the entire human genome with the goal of identifying genetic variants (usually single nucleotide polymorphisms (SNPs)) that are associated with phenotypic traits such as disease status and drug response. The discordance of significantly associated SNPs for the same disease identified from different GWAS indicates that false associations exist in such results. In addition to the possible sources of spurious associations that have been investigated and discussed intensively, such as sample size and population stratification, an accurate and reproducible genotype calling algorithm is required for concordant GWAS results from different studies. However, variations of genotype calling of an algorithm and their effects on significantly associated SNPs identified in downstream association analyses have not been systematically investigated. In this paper, the variations of genotype calling using the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm and the resulting influence on the lists of significantly associated SNPs were evaluated using the raw data of 270 HapMap samples analysed with the Affymetrix Human Mapping 500K Array Set (Affy500K) by changing algorithmic parameters. Modified were the Dynamic Model (DM) call confidence threshold (threshold) and the number of randomly selected SNPs (size). Comparative analysis of the calling results and the corresponding lists of significantly associated SNPs identified through association analysis revealed that algorithmic parameters used in BRLMM affected the genotype calls and the significantly associated SNPs. Both the threshold and the size affected the called genotypes and the lists of significantly associated SNPs in association analysis. The effect of the threshold was much larger than the effect of the size. Moreover, the heterozygous calls had lower consistency compared to the homozygous calls.  相似文献   

14.
Liu LY  Schaub MA  Sirota M  Butte AJ 《Human genetics》2012,131(3):353-364
Men and women differ in susceptibility to many diseases and in responses to treatment. Recent advances in genome-wide association studies (GWAS) provide a wealth of data for associating genetic profiles with disease risk; however, in general, these data have not been systematically probed for sex differences in gene-disease associations. Incorporating sex into the analysis of GWAS results can elucidate new relationships between single nucleotide polymorphisms (SNPs) and human disease. In this study, we performed a sex-differentiated analysis on significant SNPs from GWAS data of the seven common diseases studied by the Wellcome Trust Case Control Consortium. We employed and compared three methods: logistic regression, Woolf’s test of heterogeneity, and a novel statistical metric that we developed called permutation method to assess sex effects (PMASE). After correction for false discovery, PMASE finds SNPs that are significantly associated with disease in only one sex. These sexually dimorphic SNP-disease associations occur in Coronary Artery Disease and Crohn’s Disease. GWAS analyses that fail to consider sex-specific effects may miss discovering sexual dimorphism in SNP-disease associations that give new insights into differences in disease mechanism between men and women.  相似文献   

15.
Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn''s disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.  相似文献   

16.
Kuo CL  Zaykin DV 《Genetics》2011,189(1):329-340
In recent years, genome-wide association studies (GWAS) have uncovered a large number of susceptibility variants. Nevertheless, GWAS findings provide only tentative evidence of association, and replication studies are required to establish their validity. Due to this uncertainty, researchers often focus on top-ranking SNPs, instead of considering strict significance thresholds to guide replication efforts. The number of SNPs for replication is often determined ad hoc. We show how the rank-based approach can be used for sample size allocation in GWAS as well as for deciding on a number of SNPs for replication. The basis of this approach is the "ranking probability": chances that at least j true associations will rank among top u SNPs, when SNPs are sorted by P-value. By employing simple but accurate approximations for ranking probabilities, we accommodate linkage disequilibrium (LD) and evaluate consequences of ignoring LD. Further, we relate ranking probabilities to the proportion of false discoveries among top u SNPs. A study-specific proportion can be estimated from P-values, and its expected value can be predicted for study design applications.  相似文献   

17.
Long non-coding RNAs are a new class of non-coding RNAs that are at the crosshairs in many human diseases such as cancers, cardiovascular disorders, inflammatory and autoimmune disease like Inflammatory Bowel Disease (IBD) and Type 1 Diabetes (T1D). Nearly 90% of the phenotype-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) lie outside of the protein coding regions, and map to the non-coding intervals. However, the relationship between phenotype-associated loci and the non-coding regions including the long non-coding RNAs (lncRNAs) is poorly understood. Here, we systemically identified all annotated IBD and T1D loci-associated lncRNAs, and mapped nominally significant GWAS/ImmunoChip SNPs for IBD and T1D within these lncRNAs. Additionally, we identified tissue-specific cis-eQTLs, and strong linkage disequilibrium (LD) signals associated with these SNPs. We explored sequence and structure based attributes of these lncRNAs, and also predicted the structural effects of mapped SNPs within them. We also identified lncRNAs in IBD and T1D that are under recent positive selection. Our analysis identified putative lncRNA secondary structure-disruptive SNPs within and in close proximity (+/−5 kb flanking regions) of IBD and T1D loci-associated candidate genes, suggesting that these RNA conformation-altering polymorphisms might be associated with diseased-phenotype. Disruption of lncRNA secondary structure due to presence of GWAS SNPs provides valuable information that could be potentially useful for future structure-function studies on lncRNAs.  相似文献   

18.
全基因组关联分析的进展与反思   总被引:1,自引:0,他引:1  
Tu X  Shi LS  Wang F  Wang Q 《生理科学进展》2010,41(2):87-94
全基因组关联分析(genomewide association study,GWAS)是应用人类基因组中数以百万计的单核苷酸多态性(single nucleotide polymorphism,SNP)为标记进行病例-对照关联分析,以期发现影响复杂性疾病发生的遗传特征的一种新策略。近年来,随着人类基因组计划和基因组单倍体图谱计划的实施,人们已通过GWAS方法发现并鉴定了大量与人类性状或复杂性疾病关联的遗传变异,为进一步了解控制人类复杂性疾病发生的遗传特征提供了重要的线索。然而,由于造成复杂性疾病/性状的因素较多,而且GWAS研究系统较为复杂,因此目前GWAS本身亦存在诸多的问题。本文将从研究方式、研究对象、遗传标记,以及统计分析等方面,探讨GWAS的研究现状以及存在的潜在问题,并展望GWAS今后的发展方向。  相似文献   

19.
GWAS has facilitated greatly the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNP individually and are limited by low power and reproducibility since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and principal components analysis based approach (PCA) using simulated datasets under the scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with kernel of linear or identical-by-state function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.  相似文献   

20.
Casto AM  Feldman MW 《PLoS genetics》2011,7(1):e1001266
Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号