Similar articles
 20 similar articles found (search time: 15 ms)
1.
Murphy A  Weiss ST  Lange C 《PLoS genetics》2008,4(9):e1000197
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
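
The weighted Bonferroni step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each SNP's significance level is made proportional to its conditional power estimate; the abstract does not state the exact weighting scheme, and the function names `weighted_bonferroni_levels` and `rejected_markers` are hypothetical. The only property the sketch relies on is that the per-SNP levels sum to the overall level, which is what keeps the family-wise error rate controlled.

```python
def weighted_bonferroni_levels(power_estimates, alpha=0.05):
    """Split an overall level alpha across markers in proportion to their weights.

    Assumption: the weights are the conditional power estimates themselves.
    Because the returned levels sum to alpha, the union bound still applies.
    """
    total = sum(power_estimates)
    if total <= 0:
        raise ValueError("power estimates must have a positive sum")
    return [alpha * p / total for p in power_estimates]


def rejected_markers(p_values, levels):
    """Return indices of markers whose P value is at or below its own level."""
    return [i for i, (p, a) in enumerate(zip(p_values, levels)) if p <= a]
```

With uniform power estimates this reduces to the standard Bonferroni correction, so the weighting only helps to the extent that the first-stage power estimates are informative.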

2.
The power of genome-wide SNP association studies is limited, among other things, by the large number of false positive test results. To provide a remedy, we combined SNP association analysis with the pathway-driven gene set enrichment analysis (GSEA), recently developed to facilitate handling of genome-wide gene expression data. The resulting GSEA-SNP method rests on the assumption that SNPs underlying a disease phenotype are enriched in genes constituting a signaling pathway or those with a common regulation. Besides improving power for association mapping, GSEA-SNP may facilitate the identification of disease-associated SNPs and pathways, as well as the understanding of the underlying biological mechanisms. GSEA-SNP may also help to identify markers with weak effects, undetectable in association studies without pathway consideration. The program is freely available and can be downloaded from our website.

3.
MOTIVATION: Using simulation studies for quantitative trait loci (QTL), we evaluate the prediction quality of regression models that include as covariates single-nucleotide polymorphism (SNP) genetic markers which did not achieve genome-wide significance in the original genome-wide association study, but were among the SNPs with the smallest P-values for the selected association test. We compare the results of such regression models to the standard approach, which is to include only SNPs that achieve genome-wide significance. Using mean square prediction error as the model metric, our simulation results suggest that by using the coefficient of determination (R²) value as a guideline to increase or reduce the number of SNPs included in the regression model, we can achieve better prediction quality than the standard approach. However, important parameters such as trait heritability, the approximate number of QTLs, etc. have to be determined from previous studies or have to be estimated accurately.

4.
Sha Q  Zhang Z  Zhang S 《PloS one》2011,6(7):e21957
In family-based data, association information can be partitioned into the between-family information and the within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683-691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs which performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, we learn from the results of case-control studies (Skol et al. Nature Genetics. 2006, 209-213) that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family information and within-family information is used in the second stage to test association at the selected markers. By extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.

5.
《PloS one》2012,7(12)
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straightforward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function, while allele-specific gene expression (ASE) analysis has not yet gained widespread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers.

6.
R Abo  GD Jenkins  L Wang  BL Fridley 《PloS one》2012,7(8):e43301
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set - expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.

7.
For genetic association studies with multiple phenotypes, we propose a new strategy for multiple testing with family-based association tests (FBATs). The strategy increases the power by both using all available family data and reducing the number of hypotheses tested while being robust against population admixture and stratification. By use of conditional power calculations, the approach screens all possible null hypotheses without biasing the nominal significance level, and it identifies the subset of phenotypes that has optimal power when tested for association by either univariate or multivariate FBATs. An application of our strategy to an asthma study shows the practical relevance of the proposed methodology. In simulation studies, we compare our testing strategy with standard methodology for family studies. Furthermore, the proposed principle of using all data without biasing the nominal significance in an analysis prior to the computation of the test statistic has broad and powerful applications in many areas of family-based association studies.

8.
Technological developments allow increasing numbers of markers to be deployed in case-control studies searching for genetic factors that influence disease susceptibility. However, with vast numbers of markers, true 'hits' may become lost in a sea of false positives. This problem may be particularly acute for infectious diseases, where the control group may contain unexposed individuals with susceptible genotypes. To explore this effect, we used a series of stochastic simulations to model a scenario based loosely on bovine tuberculosis. We find that a candidate gene approach tends to have greater statistical power than studies that use large numbers of single nucleotide polymorphisms (SNPs) in genome-wide association tests, almost regardless of the number of SNPs deployed. With modest sample sizes (250 each of cases and controls), both approaches struggle to detect genetic effects when these are weak or when an appreciable proportion of individuals are unexposed to the disease, but these issues are largely mitigated if sample sizes can be increased to 2000 or more of each class. We conclude that the power of any genotype-phenotype association test will be improved if the sampling strategy takes account of exposure heterogeneity, though this is not necessarily easy to do.

9.
The availability of a large number of dense SNPs, high-throughput genotyping and computation methods promotes the application of family-based association tests. While most of the current family-based analyses focus only on individual traits, joint analyses of correlated traits can extract more information and potentially improve the statistical power. However, current TDT-based methods are low-powered. Here, we develop a method for tests of association for bivariate quantitative traits in families. In particular, we correct for population stratification by the use of an integration of principal component analysis and TDT. A score test statistic in the variance-components model is proposed. Extensive simulation studies indicate that the proposed method not only outperforms approaches limited to individual traits when a pleiotropic effect is present, but also surpasses the power of two popular bivariate association tests, FBAT-GEE and FBAT-PC, while correcting for population stratification. When applied to the GAW16 datasets, the proposed method successfully identifies at the genome-wide level the two SNPs that present pleiotropic effects on HDL and TG traits.

10.
We carried out a genome-wide association study (GWAS) for general cognitive ability (GCA) plus three other analyses of GWAS data that aggregate the effects of multiple single-nucleotide polymorphisms (SNPs) in various ways. Our multigenerational sample comprised 7,100 Caucasian participants, drawn from two longitudinal family studies, who had been assessed with an age-appropriate IQ test and had provided DNA samples passing quality screens. We conducted the GWAS across ∼2.5 million SNPs (both typed and imputed), using a generalized least-squares method appropriate for the different family structures present in our sample, and subsequently conducted gene-based association tests. We also conducted polygenic prediction analyses under five-fold cross-validation, using two different schemes of weighting SNPs. Using parametric bootstrapping, we assessed the performance of this prediction procedure under the null. Finally, we estimated the proportion of variance attributable to all genotyped SNPs as random effects with the software GCTA. The study is limited chiefly by its power to detect realistic single-SNP or single-gene effects, none of which reached genome-wide significance, though some genomic inflation was evident from the GWAS. Unit SNP weights performed about as well as least-squares regression weights under cross-validation, but the performance of both increased as more SNPs were included in calculating the polygenic score. Estimates from GCTA were 35% of phenotypic variance at the recommended biological-relatedness ceiling. Taken together, our results concur with other recent studies: they support a substantial heritability of GCA, arising from a very large number of causal SNPs, each of very small effect. We place our study in the context of the literature (both contemporary and historical) and provide accessible explication of our statistical methods.
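
The two SNP-weighting schemes contrasted in this abstract can be sketched as follows. This is an illustrative reading only: it assumes a polygenic score is the weighted sum of allele counts, and that "unit weights" means keeping just the sign of each estimated effect. Both assumptions, and the function names `polygenic_score` and `unit_weights`, are hypothetical and not taken from the study.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of per-SNP allele counts (0, 1 or 2) for one individual."""
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(w * g for w, g in zip(weights, allele_counts))


def unit_weights(effect_estimates):
    """Keep only the direction of each estimated effect: +1, -1 or 0.

    Assumption: this is one plausible meaning of 'unit SNP weights'.
    """
    return [1.0 if b > 0 else -1.0 if b < 0 else 0.0 for b in effect_estimates]
```

Under this reading, regression weights use the estimated effect sizes directly, whereas unit weights discard their magnitudes, which is why similar predictive performance for the two schemes is informative about how noisy the individual effect estimates are.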

11.
Cancer patients show large individual variation in their response to chemotherapeutic agents. Gemcitabine (dFdC) and AraC, two cytidine analogues, have shown significant activity against a variety of tumors. We previously used expression data from a lymphoblastoid cell line-based model system to identify genes that might be important for the cytotoxicity of the two drugs. In the present study, we used that same model system to perform a genome-wide association (GWA) study to test the hypothesis that common genetic variation might influence both gene expression and response to the two drugs. Specifically, genome-wide single nucleotide polymorphisms (SNPs) and mRNA expression data were obtained using the Illumina 550K® HumanHap550 SNP Chip and Affymetrix U133 Plus 2.0 GeneChip, respectively, for 174 ethnically-defined “Human Variation Panel” lymphoblastoid cell lines. Gemcitabine and AraC cytotoxicity assays were performed to obtain IC50 values for the cell lines. We then performed GWA studies with SNPs, gene expression and IC50 of these two drugs. This approach identified SNPs that were associated with gemcitabine or AraC IC50 values and with the expression regulation for 29 genes or 30 genes, respectively. One SNP in IQGAP2 (rs3797418) was significantly associated with variation in both the expression of multiple genes and gemcitabine and AraC IC50. A second SNP in TGM3 (rs6082527) was also significantly associated with multiple gene expression and gemcitabine IC50. To confirm the association results, we performed siRNA knockdown of selected genes with expression that was associated with rs3797418 and rs6082527 in tumor cells, and the knockdown altered gemcitabine or AraC sensitivity, confirming our association study results. These results suggest that the application of GWA approaches using cell-based model systems, when combined with complementary functional validation, can provide insights into mechanisms responsible for variation in cytidine analogue response.

12.
Ma L  Han S  Yang J  Da Y 《PloS one》2010,5(11):e15006
Complex diseases or phenotypes may involve multiple genetic variants and interactions between genetic, environmental and other factors. Current genome-wide association studies (GWAS) have mostly used single-locus analysis and have identified genetic effects with multiple confirmations. Such confirmed single-nucleotide polymorphism (SNP) effects are likely to be true genetic effects, and ignoring this information when testing new effects on the same phenotype decreases statistical power because the residual variance then includes a component due to the omitted effects. In this study, a multi-locus association test (MLT) was proposed for GWAS analysis conditional on SNPs with confirmed effects to improve statistical power. Analytical formulae for statistical power were derived and were verified by simulation for MLT accounting for confirmed SNPs and for single-locus test (SLT) without accounting for confirmed SNPs. Statistical power of the two methods was compared by case studies with simulated data and the Framingham Heart Study (FHS) GWAS data. Results showed that the MLT method had increased statistical power over SLT. In the GWAS case study on four cholesterol phenotypes and serum metabolites, the MLT method improved statistical power by 5% to 38% depending on the number and effect sizes of the conditional SNPs. For the analysis of HDL cholesterol (HDL-C) and total cholesterol (TC) of the FHS data, the MLT method conditional on confirmed SNPs from GWAS catalog and NCBI had considerably more significant results than SLT.

13.
A great majority of genetic markers discovered in recent genome-wide association studies have small effect sizes, and they explain only a small fraction of the genetic contribution to the diseases. How many more variants can we expect to discover and what study sizes are needed? We derive the connection between the cumulative risk of the SNP variants and the latent genetic risk model and heritability of the disease. We determine the sample size required for case-control studies in order to achieve a certain expected number of discoveries in a collection of most significant SNPs. Assuming similar allele frequencies and effect sizes of the currently validated SNPs, complex phenotypes such as type-2 diabetes would need approximately 800 variants to explain its 40% heritability. Much smaller numbers of variants are needed if we assume rare-variant but higher-penetrance models. We estimate that up to 50,000 cases and an equal number of controls are needed to discover 800 common low-penetrant variants among the top 5000 SNPs. Under common and rare low-penetrance models, the very large studies required to discover the numerous variants are probably at the limit of practical feasibility. Under rare-variant with medium- to high-penetrance models (odds-ratios between 1.6 and 4.0), studies comparable in size to many existing studies are adequate provided the genotyping technology can interrogate more and rarer variants.

14.
《PloS one》2015,10(6)
Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al Nat Genet 2010), and estimated an additive heritability of 44.7% (se: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al in their analyses, we remain concerned about two related issues: (1) whether in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R² as a function of the number of SNPs used in the analysis.
These simulations help to address the important question about whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height, and by implication other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however there is evidence in our simulation that a very large number of causal variants (many thousands) each with very small effect on phenotypic variance will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious.

15.
In spite of the success of genome-wide association studies (GWASs), only a small proportion of heritability for each complex trait has been explained by identified genetic variants, mainly SNPs. Likely reasons include genetic heterogeneity (i.e., multiple causal genetic variants) and small effect sizes of causal variants, for which pathway analysis has been proposed as a promising alternative to the standard single-SNP-based analysis. A pathway contains a set of functionally related genes, each of which includes multiple SNPs. Here we propose a pathway-based test that is adaptive at both the gene and SNP levels, thus maintaining high power across a wide range of situations with varying numbers of the genes and SNPs associated with a trait. The proposed method is applicable to both common variants and rare variants and can incorporate biological knowledge on SNPs and genes to boost statistical power. We use extensively simulated data and a WTCCC GWAS dataset to compare our proposal with several existing pathway-based and SNP-set-based tests, demonstrating its promising performance and its potential use in practice.

16.
We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers. Regions showing the strongest evidence of linkage to alcoholism susceptibility genes were identified. Haplotype analyses using a sliding-window approach for SNPs in these regions were performed. In addition, we performed a genome-wide association scan using SNP data. SNPs in these regions with evidence of association (P

17.
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. 
Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.

18.
For genomewide association (GWA) studies in family-based designs, we propose a novel two-stage strategy that weighs the association P values with the use of independently estimated weights. The association information contained in the family sample is partitioned into two orthogonal components, namely, the between-family information and the within-family information. The between-family component is used in the first (i.e., screening) stage to obtain a relative ranking of all the markers. The within-family component is used in the second (i.e., testing) stage in the framework of the standard family-based association test, and the resulting P values are weighted using the estimated marker ranking from the screening step. The approach is appealing, in that it ensures that all the markers are tested in the testing step and, at the same time, also uses information from the screening step. Through simulation studies, we show that testing all the markers is more powerful than testing only the most promising ones from the screening step, which was the method suggested by Van Steen et al. A comparison with a population-based approach shows that the approach achieves comparable power. In the presence of a reasonable level of population stratification, our approach is only slightly affected in terms of power and, since it is a family-based method, is completely robust to spurious effects. An application to a 100K scan in the Framingham Heart Study illustrates the practical advantages of our approach. The proposed method is of general applicability; it extends to any setting in which prior, independent ranking of hypotheses is available.

19.
We performed linkage and linkage disequilibrium (LD) mapping analyses to compare the power between microsatellite and single nucleotide polymorphism (SNP) markers. Chromosome-wide analyses were performed for a quantitative electrophysiological phenotype, ttth1, on chromosome 7. Multipoint analysis of microsatellite markers using the variance component (VC) method showed the highest LOD score of 4.20 at 162 cM, near D7S509 (163.7 cM). Two-point analysis of SNPs using the VC method yielded the highest LOD score of 3.98 in the Illumina SNP data and 3.45 in the Affymetrix SNP data around 152-153 cM. In family-based single SNP and SNP haplotype LD analysis, we identified seven SNPs associated with ttth1. We searched for any potential candidate genes in the location of the seven SNPs. The SNPs rs1476640 and rs768055 are located in the FLJ40852 gene (a hypothetical protein), and SNP rs1859646 is located in the TAS2R5 gene (a taste receptor). The other four SNPs are not located in any known or annotated genes. We found the high density SNP scan to be superior to microsatellites because it is effective in downstream fine mapping due to a better defined linkage region. Our study proves the utility of high density SNP in genome-wide mapping studies.

20.
Endurance training-induced changes in hemodynamic traits are heritable. However, few genes associated with heart rate training responses have been identified. The purpose of our study was to perform a genome-wide association study to uncover DNA sequence variants associated with submaximal exercise heart rate training responses in the HERITAGE Family Study. Heart rate was measured during steady-state exercise at 50 W (HR50) on 2 separate days before and after a 20-wk endurance training program in 483 white subjects from 99 families. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. After quality control procedures, 320,000 single-nucleotide polymorphisms (SNPs) were available for the genome-wide association study analyses, which were performed using the MERLIN software package (single-SNP analyses and conditional heritability tests) and standard regression models (multivariate analyses). The strongest associations for HR50 training response adjusted for age, sex, body mass index, and baseline HR50 were detected with SNPs at the YWHAQ locus on chromosome 2p25 (P = 8.1 × 10⁻⁷), the RBPMS locus on chromosome 8p12 (P = 3.8 × 10⁻⁶), and the CREB1 locus on chromosome 2q34 (P = 1.6 × 10⁻⁵). In addition, 37 other SNPs showed P values <9.9 × 10⁻⁵. After removal of redundant SNPs, the 10 most significant SNPs explained 35.9% of the ΔHR50 variance in a multivariate regression model. Conditional heritability tests showed that nine of these SNPs (all intragenic) accounted for 100% of the ΔHR50 heritability. Our results indicate that SNPs in nine genes related to cardiomyocyte and neuronal functions, as well as cardiac memory formation, fully account for the heritability of the submaximal heart rate training response.
