Similar literature
20 similar articles found
1.
Family-based study designs will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, unlike population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect with single-locus tests. Therefore, burden tests and non-burden tests have been developed that combine signals from multiple variants in a chromosomal region or functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of these methods. To guard against the noise caused by neutral variants, we propose an ‘adaptive combination of P-values’ method (abbreviated as ‘ADA’). This method combines the per-site P-values of variants that are more likely to be causal; variants with large P-values (which are more likely to be neutral) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, in which real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants, a merit that matters especially when dichotomous traits are analyzed. However, ADA has some limitations. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We show that, for family-based studies, the application of ADA is limited to dichotomous-trait analyses with full pedigree information.
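A minimal Python sketch of the core idea follows, assuming per-site P-values have already been computed both for the observed data and for each permutation replicate. The truncation thresholds are illustrative assumptions, and the published ADA procedure additionally calibrates the minimum over thresholds with a further permutation layer, which this sketch omits.

```python
import math
from typing import Sequence


def truncated_log_sum(pvalues: Sequence[float], theta: float) -> float:
    """Sum of -log(p) over sites whose P-value is at or below theta.

    Sites with larger P-values (more likely neutral) contribute nothing.
    """
    return sum(-math.log(p) for p in pvalues if p <= theta)


def min_p_over_thresholds(
    observed: Sequence[float],
    permuted: Sequence[Sequence[float]],
    thresholds: Sequence[float] = (0.05, 0.10, 0.20),  # hypothetical choices
) -> float:
    """Smallest empirical P-value of the truncated statistic across thresholds.

    observed: per-site P-values from the real data.
    permuted: per-site P-values recomputed on each permutation replicate
              (assumed here to come from a pedigree-aware permutation scheme).
    """
    best = 1.0
    for theta in thresholds:
        s_obs = truncated_log_sum(observed, theta)
        exceed = sum(
            1 for perm in permuted if truncated_log_sum(perm, theta) >= s_obs
        )
        # Add-one correction keeps the empirical P-value strictly positive.
        best = min(best, (exceed + 1) / (len(permuted) + 1))
    return best
```

Dropping sites above each threshold is what keeps neutral variants from diluting the sum; scanning several thresholds avoids committing to a single cut-off in advance.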

2.
Untranslated gene regions (UTRs) play an important role in controlling gene expression. 3′-UTRs are the primary targets of microRNA (miRNA) molecules, which form complex gene regulatory networks. Cancer genomes are replete with non-coding mutations, many of which are linked to the changes in tumor gene expression that accompany cancer development and are associated with resistance to therapy. Therefore, 3′-UTR variants that arise during cancer progression should be analyzed to predict their phenotypic effect on gene expression, e.g., by evaluating their impact on miRNA target sites. Here, we analyze 3′-UTR variants in the DICER1 and DROSHA genes in the context of myelodysplastic syndrome (MDS) development. Key features of this analysis include an assessment of both “canonical” and “non-canonical” types of mRNA–miRNA binding and tissue-specific profiling of miRNA interactions with wild-type and mutated genes. As a result, we obtained a list of DICER1 and DROSHA variants that likely alter miRNA sites and may therefore lead to the observed tissue-specific gene downregulation. All identified variants have low population frequencies, consistent with their potential association with disease progression.
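To illustrate what a “canonical” site check involves, here is a small Python sketch that finds 7mer-m8 seed matches (UTR sequence complementary to miRNA nucleotides 2–8) and compares wild-type and mutated UTRs. It handles only this one canonical site type; the non-canonical binding and tissue-specific expression profiling described above are not modeled, and the UTR sequences in the tests are made-up toy inputs (the let-7a sequence is used only as an example miRNA).

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written with A, C, G, U."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Start positions of canonical 7mer-m8 sites for `mirna` in a 3'-UTR.

    A 7mer-m8 site is the UTR sequence complementary to miRNA nucleotides 2-8.
    Non-canonical (e.g. bulged or centered) sites are not detected here.
    """
    site = reverse_complement_rna(mirna[1:8])
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


def compare_alleles(mirna: str, wt_utr: str, mut_utr: str) -> tuple[int, int]:
    """(number of sites in the wild-type UTR, number in the mutated UTR)."""
    return len(seed_sites(mirna, wt_utr)), len(seed_sites(mirna, mut_utr))
```

A variant that changes the count between the two alleles is a candidate for gaining or losing regulation by that miRNA.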

3.
The HOXB13 gene has been implicated in prostate cancer (PrCa) susceptibility. We performed a high-resolution fine-mapping analysis to comprehensively evaluate the association between common genetic variation across the HOXB genetic locus at 17q21 and PrCa risk. This involved genotyping 700 SNPs using a custom Illumina iSelect array (iCOGS), followed by imputation of 3195 SNPs, in 20,440 PrCa cases and 21,469 controls from the PRACTICAL consortium. We identified a cluster of highly correlated common variants situated within or closely upstream of HOXB13 that were significantly associated with PrCa risk, best represented by rs117576373 (OR 1.30, P = 2.62 × 10^−14). Additional genotyping, conditional regression, and haplotype analyses indicated that the newly identified common variants tag a rare, partially correlated coding variant in HOXB13 (G84E, rs138213197), which was recently identified as a moderate-penetrance PrCa susceptibility allele. It has long been proposed that GWAS associations detected through common SNPs may be driven by rare causal variants with higher relative risks; however, to our knowledge, this is the first experimental evidence for this phenomenon of synthetic association contributing to cancer susceptibility.

4.
The obesity epidemic imposes a substantial economic burden in developed countries and is a major risk factor for type 2 diabetes and cardiovascular disease. Obesity results not only from several environmental risk factors but also from genetic predisposition. To take advantage of recent advances in gene-mapping technology, we carried out a genome-wide association scan to identify genetic variants associated with obesity-related quantitative traits in the genetically isolated population of Sardinia. Initial analysis suggested that several SNPs in the FTO and PFKP genes were associated with increased BMI, hip circumference, and weight. Within the FTO gene, rs9930506 showed the strongest association with BMI (p = 8.6 × 10^−7), hip circumference (p = 3.4 × 10^−8), and weight (p = 9.1 × 10^−7). In Sardinia, homozygotes for the minor “G” allele of this SNP (minor allele frequency = 0.46) were 1.3 BMI units heavier than homozygotes for the common “A” allele. Within the PFKP gene, rs6602024 showed very strong association with BMI (p = 4.9 × 10^−6). Homozygotes for the minor “A” allele of this SNP (minor allele frequency = 0.12) were 1.8 BMI units heavier than homozygotes for the common “G” allele. To replicate our findings, we genotyped these two SNPs in the GenNet study. In European Americans (N = 1,496) and Hispanic Americans (N = 839), we replicated the significant association between rs9930506 in the FTO gene and BMI (meta-analysis of the European American and Hispanic American follow-up samples, p = 0.001), weight (p = 0.001), and hip circumference (p = 0.0005). We did not replicate the association between rs6602024 and obesity-related traits in the GenNet sample, although in European Americans, Hispanic Americans, and African Americans, homozygotes for the minor “A” allele were, on average, 1.0–3.0 BMI units heavier than homozygotes for the more common “G” allele.
In summary, we have completed a genome-wide association scan for three obesity-related quantitative traits and report that common genetic variants in the FTO gene are associated with substantial changes in BMI, hip circumference, and body weight. These changes could have a significant impact on the risk of obesity-related morbidity in the general population.

6.
To date, genome-wide association studies have identified thousands of statistically significant associations between genetic variants and phenotypes related to a myriad of traits and diseases. A key goal of human-genetics research is to translate these associations into functional mechanisms. Popular gene-set analysis tools, such as MAGMA, map variants to the genes they might affect and then integrate genome-wide association study data (that is, variant-level associations with a phenotype) to score genes for association with that phenotype. Gene scores are subsequently used in competitive gene-set analyses to identify biological processes that are enriched for phenotype association. By default, variants are mapped to genes in their proximity. However, many variants that affect phenotypes are thought to act at regulatory elements, which can lie hundreds of kilobases away from their target genes. We therefore explored augmenting a proximity-based mapping scheme with publicly available datasets of regulatory interactions. We used MAGMA to analyze genome-wide association study data for ten different phenotypes and evaluated the effects of augmentation by comparing the numbers and identities of genes and gene sets detected as statistically significant under each mapping. We detected several pitfalls and confounders of such “augmented analyses” and introduced ways to control for them. Using these controls, we demonstrated that augmentation with datasets of regulatory interactions only occasionally strengthened the enrichment for phenotype association among (biologically relevant) gene sets for different phenotypes. Still, in such cases, the genes and regulatory elements responsible for the improvement could be pinpointed. For instance, using brain regulatory interactions for augmentation, we were able to implicate two acetylcholine receptor subunits involved in post-synaptic chemical transmission, namely CHRNB2 and CHRNE, in schizophrenia.
Collectively, our study presents a critical approach for integrating regulatory interactions into gene-set analyses of genome-wide association study data, introducing various controls to distinguish genuine results from spurious discoveries.
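The difference between proximity-based and augmented mapping can be sketched in a few lines of Python. This does not reproduce MAGMA's own annotation step; the gene names, coordinates, window size, and the `(chrom, start, end, gene)` format for regulatory interactions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def map_variants(variants, genes, window=0, interactions=()):
    """Map variants to genes by proximity, optionally augmented with regulatory links.

    variants:     dict of variant id -> (chrom, position)
    genes:        iterable of Gene records
    window:       bases added on each side of a gene body (proximity mapping)
    interactions: iterable of (chrom, element_start, element_end, gene_name)
                  tuples linking a regulatory element to a target gene
    Returns a dict of gene name -> set of variant ids.
    """
    genes = list(genes)
    mapping = {g.name: set() for g in genes}
    for vid, (chrom, pos) in variants.items():
        # Proximity: variant lies within the gene body plus the flanking window.
        for g in genes:
            if g.chrom == chrom and g.start - window <= pos <= g.end + window:
                mapping[g.name].add(vid)
        # Augmentation: variant lies inside a regulatory element linked to a gene.
        for ichrom, estart, eend, target in interactions:
            if ichrom == chrom and estart <= pos <= eend and target in mapping:
                mapping[target].add(vid)
    return mapping
```

Running gene scoring once on each output and comparing the significant genes and gene sets mirrors the comparison between mappings described above.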

8.
Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear to be associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site that is correlated with the marker, which has led to extensive resequencing efforts to find causal sites. We propose, as an alternative explanation, that variants much less common than the associated one may create “synthetic associations” by occurring, stochastically, more often in association with one allele at the common site than with the other. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations arise and how they may be recognized. We show that they are not only possible but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the signals recently reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of “blocks” of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow-up of GWAS signals.
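The mechanism can be reproduced with a toy simulation in Python. This is a sketch under simplifying assumptions, not the authors' simulation code: the common marker has no effect on risk, each rare causal variant sits on haplotypes carrying a single common allele, and disease risk depends only on carrying any causal variant. All parameter values are hypothetical.

```python
import random


def synthetic_association(n_people=10_000, carriers_per_variant=40,
                          backgrounds=None, n_rare=10,
                          penetrance=(0.05, 0.5), seed=1):
    """Return (case, control) frequencies of common-marker allele 1 (True).

    The common marker itself has no effect on risk. Each rare causal variant
    is placed on `carriers_per_variant` haplotypes that all carry the same
    common allele (its "background"). If `backgrounds` is None, each of the
    `n_rare` variants picks its background at random.
    """
    rng = random.Random(seed)
    n_hap = 2 * n_people
    common = [rng.random() < 0.5 for _ in range(n_hap)]
    if backgrounds is None:
        backgrounds = [rng.random() < 0.5 for _ in range(n_rare)]
    causal = [False] * n_hap
    for bg in backgrounds:
        eligible = [h for h in range(n_hap) if common[h] == bg]
        for h in rng.sample(eligible, carriers_per_variant):
            causal[h] = True
    case_alleles, control_alleles = [], []
    for person in range(n_people):
        h1, h2 = 2 * person, 2 * person + 1
        carrier = causal[h1] or causal[h2]
        risk = penetrance[1] if carrier else penetrance[0]
        bucket = case_alleles if rng.random() < risk else control_alleles
        bucket.extend((common[h1], common[h2]))
    return (sum(case_alleles) / len(case_alleles),
            sum(control_alleles) / len(control_alleles))
```

With `backgrounds=[True] * 10`, every causal variant sits on allele 1, so cases show a higher allele-1 frequency than controls even though the common site is neutral. With random backgrounds, the imbalance arises only by chance, which is the stochastic effect described above.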

9.

Background

Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson’s disease remains unexplained, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies as a way to explain missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and combine variants within a region. These methods may lose power in situations with many neutral variants or with causal variants that have opposite effects.

Results

We propose a method that scans genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system and uses aggregate scores to identify the region associated with disease. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare it with three commonly used methods. We use a Parkinson’s disease case–control dataset as a model to demonstrate the application of our method. Under simulation based on 1000 Genomes sequencing data, our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson’s disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson’s disease.
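The scoring system itself is not specified in this abstract, so the following Python sketch assumes a hypothetical per-variant score of −log10(p) minus an offset, so that likely-neutral variants count against a region. It then finds the contiguous run of variants with the largest summed score (a maximum-subarray scan). Assessing the significance of the chosen region would require permutation, which is not shown.

```python
import math


def best_region(positions, pvalues, offset=1.0):
    """Contiguous run of variants with the largest summed score.

    score_i = -log10(p_i) - offset, so variants with p > 10**-offset
    (likely neutral) lower a region's total. `positions` must be sorted.
    Returns (first_position, last_position, summed_score).
    """
    best = (None, None, float("-inf"))
    run_start, run_sum = 0, 0.0
    for i, p in enumerate(pvalues):
        score = -math.log10(p) - offset
        if run_sum <= 0:
            # Restart: a non-positive running total can only lower the sum.
            run_start, run_sum = i, score
        else:
            run_sum += score
        if run_sum > best[2]:
            best = (positions[run_start], positions[i], run_sum)
    return best
```

Because negative scores from neutral variants shrink the total, the scan tends to stop at region boundaries where signal gives way to noise. This is one simple way to make an aggregate less sensitive to neutral variants.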

Conclusions

Our method is time-efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease-associated region. Using Parkinson’s disease as a model, our method not only confirms the association for a known gene but also identifies two genes previously found by other studies. Although many methods already exist, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.

Electronic supplementary material

The online version of this article (doi:10.1186/s12929-014-0088-9) contains supplementary material, which is available to authorized users.

11.
Major depressive disorder (MDD) is a psychiatric disorder characterized by periods of low mood lasting more than two weeks, loss of interest in normally enjoyable activities, and behavioral changes. MDD is a complex disorder without a single genetic cause. In 2009, a genome-wide association study (GWAS) was performed on the Dutch GAIN-MDD cohort. Many of the top signals of this GWAS mapped to a region spanning the gene PCLO, and the non-synonymous coding single nucleotide polymorphism (SNP) rs2522833 in PCLO became genome-wide significant after post-hoc analysis. We resequenced PCLO, GRM7, and SLC6A4 in 50 control samples from the GAIN-MDD cohort to detect new genomic variants. Subsequently, we genotyped these variants in the entire GAIN-MDD cohort and performed association analysis to investigate whether rs2522833 is the causal variant or is simply in linkage disequilibrium with a more strongly associated variant. GRM7 and SLC6A4 are both candidate genes for MDD from the literature. We aimed to gather more evidence that rs2522833 is indeed the causal variant in the GAIN-MDD cohort, or to find a previously undetected common variant in PCLO, GRM7, or SLC6A4 with a stronger association in this cohort. After next-generation sequencing and association analysis, we excluded the possibility that an undetected common variant is more strongly associated. For neither PCLO nor GRM7 did we find a more strongly associated variant. For SLC6A4, we found a new SNP with a lower P-value (P = 0.07) than that observed in the GAIN-MDD GWAS (P = 0.09); however, no evidence of genome-wide significance was found. Although we did not take rare variants into account, we conclude that our results provide further support for the hypothesis that the non-synonymous coding SNP rs2522833 in PCLO is likely to be the causal variant in the GAIN-MDD cohort.

12.
Sequencing studies have been discovering large numbers of rare variants, making it possible to identify the effects of rare variants on disease susceptibility. To increase the statistical power of rare-variant studies, several groupwise association tests have been proposed that group rare variants within genes and detect associations between genes and diseases. One major challenge for these methods is determining which variants in a group are causal; to overcome this challenge, previous methods used prior information that specifies how likely each variant is to be causal. Another source of information for determining causal variants is the observed data, because case individuals are likely to carry more causal variants than control individuals. In this article, we introduce a likelihood ratio test (LRT) that uses both the data and prior information to infer which variants are causal, and uses this inference to determine whether a group of variants is involved in a disease. We demonstrate through simulations that LRT achieves higher power than previous methods. We also evaluate our method on mutation screening data for the ataxia telangiectasia susceptibility gene and show that LRT can detect an association in real data. To increase computational speed, we show how the computation of LRT can be decomposed, and we propose an efficient permutation test. With this optimization, we can efficiently compute an LRT statistic and its significance at a genome-wide level. The software for our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.

13.
Recently, growing evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare-variant association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and the aggregated genotype. One limitation of existing methods is that they examine only the marginal effects of rare variants and do not systematically account for effects due to interactions among rare variants or between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust, model-free method designed specifically to detect both marginal effects and effects due to gene-gene (G×G) and gene-environment (G×E) interactions in rare-variant association studies. SPA has three advantages. First, it accounts for interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Second, it does not sacrifice marginal detection power; when rare variants have only marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions, so practitioners and scientists can tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.

14.
Lipoprotein lipase (LPL) plays a crucial role in lipid metabolism by hydrolyzing triglyceride (TG)-rich particles and affecting HDL cholesterol (HDL-C) levels. In this study, the entire LPL gene plus flanking regions were resequenced in individuals with extreme HDL-C/TG levels (n = 95), selected from a population-based sample of 623 US non-Hispanic White (NHW) individuals. A total of 176 sequence variants were identified, including 28 novel variants. A subset of 64 variants [common tag single nucleotide polymorphisms (tagSNPs) and selected rare variants] was genotyped in the total sample, followed by association analyses with major lipid traits. A gene-based association test including all genotyped variants revealed significant associations with HDL-C (P = 0.024) and TG (P = 0.006). Our single-site analysis revealed seven independent signals (P < 0.05; r² < 0.40) for either HDL-C or TG. The most significant association was for the SNP rs295, which exerted opposite effects on TG and HDL-C levels, with P values of 7.5 × 10^−4 and 0.002, respectively. Our work highlights several common variants and haplotypes in LPL with significant associations with lipid traits; however, analysis of rare variants using burden tests and the SKAT-O method revealed negligible effects on lipid traits. Comprehensive resequencing of LPL in larger samples is warranted to further test the role of rare variants in affecting plasma lipid levels.

15.

Background

Vulnerabilities to dependence on addictive substances are substantially heritable complex disorders whose underlying genetic architecture is likely to be polygenic, with modest contributions from variants in many individual genes. “Nontemplate” genome-wide association (GWA) approaches can identify groups of chromosomal regions and genes that, taken together, are much more likely than expected by chance to contain allelic variants that alter vulnerability to substance dependence.

Methodology/Principal Findings

We report pooled “nontemplate” genome-wide association studies of two independent samples of substance-dependent vs control research volunteers (n = 1620), one European-American and the other African-American, using 1 million SNP (single nucleotide polymorphism) Affymetrix genotyping arrays. We assess convergence between the results from these two samples using two related methods that seek clustering of nominally positive results and assess significance levels with Monte Carlo and permutation approaches. Both “converge then cluster” and “cluster then converge” analyses document convergence between the results obtained from these two independent datasets in ways that are virtually never found by chance. The genes identified in this fashion are also identified by individually genotyped dbGaP data that compare allele frequencies in cocaine-dependent vs control individuals.

Conclusions/Significance

These overlapping results identify small chromosomal regions that are also identified, to extents much greater than chance, by genome-wide data from studies of other relevant samples. These chromosomal regions contain more genes related to “cell adhesion” processes than expected by chance. They also contain a number of genes that encode potential targets for anti-addiction pharmacotherapeutics. “Nontemplate” GWA approaches that seek chromosomal regions in which nominally positive associations are found in multiple independent samples are likely to complement classical “template” GWA approaches, in which “genome-wide” levels of significance are sought for SNP data from single case vs control comparisons.
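The idea of asking whether two independent samples converge “more than chance” can be illustrated with a simple Monte Carlo overlap test in Python. This is a sketch of the general idea, not the authors' “converge then cluster” or “cluster then converge” procedures. It assumes the genome is divided into `n_windows` exchangeable windows, which ignores linkage disequilibrium and uneven gene density, and the window indices in the test are hypothetical.

```python
import random


def overlap_monte_carlo(pos_a, pos_b, n_windows, n_sim=10_000, seed=0):
    """Monte Carlo P-value for the overlap of two sets of flagged windows.

    pos_a, pos_b: sets of window indices that are nominally positive in
                  sample A and sample B, respectively.
    Returns (observed overlap, P-value), where the null hypothesis places the
    same numbers of positive windows uniformly at random in each sample.
    """
    observed = len(pos_a & pos_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sim_a = set(rng.sample(range(n_windows), len(pos_a)))
        sim_b = set(rng.sample(range(n_windows), len(pos_b)))
        if len(sim_a & sim_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)
```

Real analyses would need a null that preserves the local correlation structure of each sample, for example by permuting phenotype labels before re-running the association scan.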

16.
Genome-wide association studies (GWAS) have become a standard approach for exploring the genetic basis of phenotypic variation. However, correlation is not causation, and only a tiny fraction of all associations have been experimentally confirmed. One practical problem is that a peak of association does not always pinpoint a causal gene but may instead tag multiple causal variants. In this study, we reanalyze a previously reported peak associated with flowering-time traits in a Swedish Arabidopsis thaliana population. The peak appeared to pinpoint the AOP2/AOP3 cluster of glucosinolate biosynthesis genes, which is known to be responsible for natural variation in herbivore resistance. Here we propose an alternative hypothesis, demonstrating that the AOP2/AOP3 flowering association can be wholly accounted for by allelic variation in two flanking genes with clear roles in regulating flowering: NDX1, a regulator of the main flowering-time controller FLC, and GA1, which plays a central role in gibberellin synthesis and is required for flowering under some conditions. In other words, we propose that the AOP2/AOP3 flowering-time association may be yet another example of a spurious, “synthetic” association arising from fitting a single-locus model in the presence of two statistically associated causative loci. We conclude that caution is needed when using GWAS for fine-mapping.
Subject terms: Genome-wide association studies, Quantitative trait

17.

Background

Multiple studies have provided compelling evidence that FTO gene variants are associated with obesity measures. The objective of this study was to investigate whether FTO variants are associated with a broad range of obesity-related anthropometric traits in an island population.

Methodology/Principal Findings

We examined the genetic association between 29 FTO SNPs and a comprehensive set of anthropometric traits in 843 unrelated individuals from an island population on the eastern Adriatic coast of Croatia. The traits include 11 anthropometric measures (height, weight, waist circumference, hip circumference, bicondylar upper arm width, upper arm circumference, and biceps, triceps, subscapular, suprailiac, and abdominal skin-fold thicknesses) and two derived measures (BMI and WHR). Using single-locus score tests, 15 common SNPs were found to be significantly associated with “body fatness” measures such as weight, BMI, and hip and waist circumferences, with P-values ranging from 0.0004 to 0.01. Similar but less significant associations were also observed between these markers and bicondylar upper arm width and upper arm circumference. Most of these significant findings could be explained by a mediating effect of “body fatness”. However, one unique association signal, between upper arm width and rs16952517 (P-value = 0.00156), could not be explained by this mediating effect. In addition, using principal component analysis and conditional association tests adjusted for “body fatness”, two novel association signals were identified between upper arm circumference and rs11075986 (P-value = 0.00211) and rs16945088 (P-value = 0.00203).

Conclusions/Significance

The current study confirmed the association of common variants of the FTO gene with “body fatness” measures in an isolated island population. We also observed evidence of pleiotropic effects of the FTO gene on fat-free mass, such as frame size and muscle mass, assessed by bicondylar upper arm width and upper arm circumference, respectively; these pleiotropic effects might be influenced by variants different from those associated with “body fatness”.

18.
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in and around a gene would benefit genetic association analyses and would likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango’s statistic, to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios.
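Tango's index has the quadratic form C = (r − p)ᵀ A (r − p), where r and p are observed and expected distributions and A is a closeness matrix. The Python sketch below shows one plausible adaptation to variant positions along a gene. The choice of r (each site's share of case minor alleles), p (each site's share of all minor alleles), and the exponential kernel scale are assumptions, not necessarily the authors' “Kernel Distance” definitions, and the scaled χ² approximation for the p value is not implemented.

```python
import math


def kernel_distance_stat(positions, case_counts, control_counts, scale=1000.0):
    """Tango-style clustering index C = (r - p)^T A (r - p).

    r_i: share of all case minor alleles found at site i
    p_i: share of all (case + control) minor alleles found at site i
    A_ij = exp(-|x_i - x_j| / scale), where x holds base-pair positions
    """
    total_case = sum(case_counts)
    total_all = total_case + sum(control_counts)
    diff = [c / total_case - (c + k) / total_all
            for c, k in zip(case_counts, control_counts)]
    stat = 0.0
    for i, xi in enumerate(positions):
        for j, xj in enumerate(positions):
            stat += diff[i] * math.exp(-abs(xi - xj) / scale) * diff[j]
    return stat
```

The kernel rewards excess case alleles at nearby sites more than the same excess scattered far apart. The statistic is zero when case and control alleles are distributed proportionally across sites.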

20.
Sul JH, Han B, He D, Eskin E. Genetics. 2011;188(1):181-188.
The advent of next-generation sequencing technologies allows one to discover nearly all rare variants in a genomic region of interest. This technological development increases the need for an effective statistical method for testing the aggregated effect of rare variants in a gene on disease susceptibility. The idea behind this approach is that if a certain gene is involved in a disease, many rare variants within the gene will disrupt its function and be associated with the disease. In this article, we present the rare variant weighted aggregate statistic (RWAS), a method that groups rare variants and computes a weighted sum of differences between case and control mutation counts. We show that our method outperforms the groupwise association test of Madsen and Browning under a disease-risk model that assumes that each variant makes an equally small contribution to disease risk. In addition, our method can incorporate prior information about which variants are likely to be causal. Using simulated data and real mutation screening data for the ataxia telangiectasia susceptibility gene, we demonstrate that prior information has a substantial influence on the statistical power of association studies. Our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.
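The general shape of a weighted aggregate statistic with a permutation P-value can be sketched in Python as follows. This is not the published RWAS definition. The pooled-variance standardization, the use of per-individual carrier counts, and the one-sided test are assumptions made for illustration, and the per-site weights stand in for whatever prior information is available.

```python
import math
import random


def weighted_sum_stat(case_counts, control_counts, n_case, n_control, weights):
    """Weighted sum of standardized case-control differences in carrier frequency."""
    total = 0.0
    for a, b, w in zip(case_counts, control_counts, weights):
        pooled = (a + b) / (n_case + n_control)
        if pooled in (0.0, 1.0):
            continue  # no variation at this site: contributes nothing
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_control))
        total += w * (a / n_case - b / n_control) / se
    return total


def permutation_test(carriers, is_case, weights, n_perm=999, seed=0):
    """One-sided permutation P-value for the weighted sum statistic.

    carriers: one 0/1 list per individual (1 = carries the variant at that site)
    is_case:  one bool per individual
    weights:  one weight per site (e.g. a prior probability of being causal)
    """
    n_sites = len(weights)
    n_case = sum(is_case)
    n_control = len(is_case) - n_case

    def stat(labels):
        case = [0] * n_sites
        control = [0] * n_sites
        for row, lab in zip(carriers, labels):
            target = case if lab else control
            for j in range(n_sites):
                target[j] += row[j]
        return weighted_sum_stat(case, control, n_case, n_control, weights)

    observed = stat(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control labels, keep genotypes fixed
        if stat(labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Raising a site's weight increases its influence on the sum. This is how prior information about likely causal variants can change power, in line with the effect described above.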


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417