Similar Articles
1.

Background

Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of individual gene or SNP is modest, a method considering the joint effects of multiple SNPs can be more powerful than testing individual SNPs. The multi-SNP analysis aims to test association based on a SNP set, usually defined based on biological knowledge such as gene or pathway, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for the multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set.

Results

We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value-based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10⁻⁶).

Conclusions

Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1620-3) contains supplementary material, which is available to authorized users.
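The OPTPDT abstract describes two steps without giving formulas. First, an optimal p-value threshold selects a SNP subset. Second, significance is assessed by permutation. The sketch below is a generic, population-based analogue under stated assumptions. It is not the OPTPDT itself, which is a pedigree disequilibrium test for nuclear families. Each SNP gets an Armitage-style trend-test p-value computed from T = N·r². The set statistic at threshold t is the sum of −log p over SNPs with p ≤ t. The minimum per-threshold permutation p-value is then calibrated against the same permutations. The function names and the threshold grid are hypothetical.

```python
import math
import random
from bisect import bisect_left


def trend_test_p(genotypes, phenotype):
    """Trend-test p-value from T = N * r^2, referred to chi-square with 1 df."""
    n = len(phenotype)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or constant phenotype: no information
    t = n * sxy * sxy / (sxx * syy)
    # Chi-square(1) upper tail is erfc(sqrt(t / 2)); the floor avoids log(0).
    return max(math.erfc(math.sqrt(t / 2.0)), 1e-300)


def truncated_sums(pvalues, thresholds):
    """S(t) = sum of -log(p) over SNPs with p <= t, one value per threshold."""
    return [sum(-math.log(p) for p in pvalues if p <= t) for t in thresholds]


def adaptive_threshold_test(snps, phenotype, thresholds, n_perm=199, seed=1):
    """Return (permutation p-value for the SNP set, selected threshold)."""
    rng = random.Random(seed)
    y = list(phenotype)
    rows = []  # rows[0]: observed data; rows[1:]: permuted phenotypes
    for b in range(n_perm + 1):
        if b > 0:
            rng.shuffle(y)  # permuting labels keeps LD among the SNPs intact
        pvals = [trend_test_p(g, y) for g in snps]
        rows.append(truncated_sums(pvals, thresholds))
    total = len(rows)
    columns = [sorted(row[k] for row in rows) for k in range(len(thresholds))]

    def per_threshold_p(row):
        # Fraction of all rows whose S(t) is at least this row's S(t).
        return [(total - bisect_left(columns[k], row[k])) / total
                for k in range(len(thresholds))]

    observed = per_threshold_p(rows[0])
    best_threshold = thresholds[observed.index(min(observed))]
    # Scoring every row the same way gives the null distribution of the
    # minimum over thresholds, which pays for having chosen the best one.
    min_ps = [min(per_threshold_p(row)) for row in rows]
    p_value = sum(1 for m in min_ps if m <= min_ps[0]) / total
    return p_value, best_threshold


# Toy data: one SNP drives the phenotype, nine SNPs are noise.
rng = random.Random(42)
n = 200
causal = [rng.choice([0, 1, 2]) for _ in range(n)]
phenotype = [1 if g >= 1 else 0 for g in causal]
noise = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(9)]
p, t = adaptive_threshold_test([causal] + noise, phenotype,
                               [0.001, 0.01, 0.05, 0.2, 1.0])
print(p, t)
```

The same set of permutations supplies both the per-threshold p-values and the null distribution of their minimum. This avoids a nested permutation loop. The cost is that the final p-value can never fall below 1/(n_perm + 1).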

2.
GCTA: a tool for genome-wide complex trait analysis
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which implements a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait, rather than testing the association of any particular SNP with the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
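One of the GCTA functions listed above is estimating genetic relationships from SNPs. Below is a minimal sketch of the standardized-genotype estimator, A_jk = (1/N) Σ_s (x_sj − 2p_s)(x_sk − 2p_s) / [2p_s(1 − p_s)]. Here x_sj is the reference-allele count of individual j at SNP s, p_s is the sample allele frequency, and N is the number of polymorphic SNPs. It is plain Python for illustration only. GCTA itself uses an adjusted formula for the diagonal and is far more efficient, so treat this as an approximation of its output. The function name is hypothetical.

```python
import math


def genetic_relationship_matrix(genotypes):
    """GRM from standardized genotypes, averaged over polymorphic SNPs.

    genotypes[i][s] is the count (0, 1 or 2) of the reference allele carried
    by individual i at SNP s.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    used = 0
    for s in range(n_snp):
        column = [row[s] for row in genotypes]
        p = sum(column) / (2.0 * n_ind)  # sample frequency of the reference allele
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNPs cannot be standardized
        scale = math.sqrt(2.0 * p * (1.0 - p))
        z = [(x - 2.0 * p) / scale for x in column]
        for a in range(n_ind):
            for b in range(n_ind):
                grm[a][b] += z[a] * z[b]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[value / used for value in row] for row in grm]


# Two individuals, one SNP: opposite homozygotes.
A = genetic_relationship_matrix([[0], [2]])
print(A)
```

In the two-person example, opposite homozygotes at a single SNP get an off-diagonal value of about −2. This shows why many SNPs are needed before the matrix approximates genome-wide relatedness.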

3.
The success of genome-wide association studies (GWAS) in identifying risk loci for complex diseases is now well established. One persistent major hurdle is the cost of these studies, which puts them beyond the reach of most research groups. Performing GWAS on pools of DNA samples may be an effective strategy to reduce these costs. In this study, we performed pooling-based GWAS with more than 550,000 SNPs in two case-control cohorts consisting of patients with type II diabetes (T2DM) and with chronic rhinosinusitis (CRS). In the T2DM study, the results of the pooling experiment were compared with individual genotypes obtained from a previously published GWAS. TCF7L2 and HHEX SNPs associated with T2DM by the traditional GWAS were among the top-ranked SNPs in the pooling experiment. This dataset was also used to refine the best strategy for correctly identifying SNPs that will remain significant based on individual genotyping. In the CRS study, the top hits from the pooling-based GWAS located within ten kilobases of known genes were validated by individual genotyping of 1,536 SNPs. Forty-one percent (598 of the 1,457 SNPs that passed quality control) were associated with CRS at a nominal P value of 0.05, confirming the potential of pooling-based GWAS to identify SNPs that differ in allele frequencies between two groups of subjects. Overall, our results demonstrate that a pooling experiment on high-density genotyping arrays can accurately determine minor allele frequency compared with individual genotyping, and can produce a list of top-ranked SNPs that captures genuine allelic differences between a group of cases and controls. The low cost associated with a pooling-based GWAS clearly justifies its use in screening for genetic determinants of complex diseases. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

4.
PloS one, 2016, 11(3)

Background

Data are limited on genome-wide association studies (GWAS) for incident coronary heart disease (CHD). Moreover, it is not known whether genetic variants identified to date also associate with risk of CHD in a prospective setting.

Methods

We performed a two-stage GWAS analysis of incident myocardial infarction (MI) and CHD in a total of 64,297 individuals (including 3898 MI cases, 5465 CHD cases). SNPs that passed an arbitrary threshold of 5×10⁻⁶ in Stage I were taken to Stage II for further discovery. Furthermore, in an analysis of prognosis, we studied whether known SNPs from former GWAS were associated with total mortality in individuals who experienced MI during follow-up.

Results

In Stage I, 15 loci passed the threshold of 5×10⁻⁶ (8 loci for MI and 8 loci for CHD, with one locus overlapping); none had been reported in previous GWAS meta-analyses. We took 60 SNPs representing these 15 loci to Stage II of discovery. Four SNPs near QKI showed nominally significant association with MI (p-value < 8.8×10⁻³), and three exceeded the genome-wide significance threshold when Stage I and Stage II results were combined (top SNP rs6941513: p = 6.2×10⁻⁹). Despite excellent power, the 9p21 locus SNP (rs1333049) was only modestly associated with MI (HR = 1.09, p-value = 0.02) and marginally with CHD (HR = 1.06, p-value = 0.08). Among an inception cohort of those who experienced MI during follow-up, the risk allele of rs1333049 was associated with a decreased risk of subsequent mortality (HR = 0.90, p-value = 3.2×10⁻³).

Conclusions

QKI represents a novel locus that may serve as a predictor of incident CHD in prospective studies. The association of the 9p21 locus both with increased risk of first myocardial infarction and longer survival after MI highlights the importance of study design in investigating genetic determinants of complex disorders.
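The abstract reports genome-wide significance only after Stage I and Stage II results were combined, and it does not state the combination method. As an assumption, the sketch below uses a sample-size-weighted Stouffer Z, which is one common choice. For Cox models, an inverse-variance meta-analysis of log hazard ratios would be the closer analogue. The function name and example numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def combine_stages(z1, z2, n1, n2):
    """Sample-size-weighted Stouffer combination of two stages.

    z1 and z2 must be signed toward the same (risk) allele in both stages.
    Returns the combined Z and its two-sided p-value.
    """
    w1, w2 = n1 ** 0.5, n2 ** 0.5
    z = (w1 * z1 + w2 * z2) / (w1 ** 2 + w2 ** 2) ** 0.5
    return z, 2.0 * STD_NORMAL.cdf(-abs(z))


z, p = combine_stages(2.0, 2.0, 100, 100)
print(round(z, 3), p)
```

Suppose each stage gives Z = 2 (two-sided p ≈ 0.046). With equal weights the two combine to Z ≈ 2.83. This is how SNPs that are only nominally significant in Stage II can exceed a threshold in the combined analysis.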

5.

Background

Independent genome-wide association studies (GWAS) showed an obesogenic effect of two single nucleotide polymorphisms (SNPs; rs12970134 and rs17782313) more than 150 kb downstream of the melanocortin 4 receptor gene (MC4R). It is unclear whether the SNPs directly influence MC4R function or expression, or whether they lie on a haplotype that predisposes to obesity or carries functionally relevant genetic variation (synthetic association). MC4R has functionally relevant mutations and polymorphisms in its coding region, and it also shows a robust association signal downstream of the gene. Because both exist, MC4R is an ideal model for exploring synthetic association.

Methodology/Principal Findings

We analyzed a genomic region (364.9 kb) encompassing the MC4R in GWAS data of 424 obesity trios (extremely obese child/adolescent and both parents). SNP rs12970134 showed the lowest p-value (p = 0.004; relative risk for the obesity effect allele: 1.37); conditional analyses on this SNP revealed that 7 of 78 analyzed SNPs provided independent signals (p≤0.05). These 8 SNPs were used to derive two-marker haplotypes. The three best (according to p-value) haplotype combinations were chosen for confirmation in 363 independent obesity trios. The confirmed obesity effect haplotype includes SNPs 3′ and 5′ of the MC4R. Including MC4R coding variants in a joint model had almost no impact on the effect size estimators expected under synthetic association.

Conclusions/Significance

A haplotype reaching from a region 5′ of the MC4R to a region at least 150 kb from the 3′ end of the gene showed a stronger association with obesity than single SNPs. Synthetic association analyses revealed that MC4R coding variants had almost no impact on the association signal. Carriers of the haplotype should be enriched for relevant mutations outside the MC4R coding region and could thus be used for re-sequencing approaches. Our data also underscore the difficulty of identifying the relevant mutations tagged by GWAS-derived SNPs.
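The MC4R analysis uses obesity trios and reports a relative risk for the effect allele, but the abstract does not name the test. The classic family-based test for trios is the transmission disequilibrium test (TDT), sketched here as an assumption. Among heterozygous parents, b transmit the effect allele and c transmit the other allele. The statistic (b − c)²/(b + c) is referred to a chi-square distribution with 1 df, and b/c estimates the allelic relative risk. The counts in the example are made up.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test for case-parent trios.

    b: heterozygous parents who transmitted the effect allele
    c: heterozygous parents who transmitted the other allele
    Returns (chi-square with 1 df, p-value, b / c relative-risk estimate).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1) upper tail
    rr = b / c if c else float("inf")
    return chi2, p, rr


chi2, p, rr = tdt(60, 40)  # hypothetical transmission counts
print(chi2, round(p, 4), rr)
```

With 60 versus 40 transmissions, the statistic is 4.0 (p ≈ 0.046) and the relative-risk estimate is 1.5.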

6.
We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers. Regions showing the strongest evidence of linkage to alcoholism susceptibility genes were identified. Haplotype analyses using a sliding-window approach for SNPs in these regions were performed. In addition, we performed a genome-wide association scan using SNP data. SNPs in these regions with evidence of association (P

7.
Integrating evidence from multiple domains is useful in prioritizing disease candidate genes for subsequent testing. We ranked all known human genes (n = 3819) under linkage peaks in the Irish Study of High-Density Schizophrenia Families using three different evidence domains: 1) a meta-analysis of microarray gene expression results using the Stanley Brain collection, 2) a schizophrenia protein-protein interaction network, and 3) a systematic literature search. Each gene was assigned a domain-specific p-value and ranked after evaluating the evidence within each domain. For comparison with this ranking process, a large-scale candidate gene hypothesis was also tested by including genes with Gene Ontology terms related to neurodevelopment. Subsequently, genotypes of 3725 SNPs in 167 genes from a custom Illumina iSelect array were used to evaluate the top-ranked vs. hypothesis-selected genes. Seventy-three genes were both highly ranked and involved in neurodevelopment (category 1), while 42 and 52 genes were exclusive to neurodevelopment (category 2) or highly ranked (category 3), respectively. The most significant associations were observed in genes PRKG1, PRKCE, and CNTN4, but no individual SNPs were significant after correction for multiple testing. Comparison of the approaches showed an excess of significant tests using the hypothesis-driven neurodevelopment category. Random selection of similarly sized genes from two independent genome-wide association studies (GWAS) of schizophrenia showed the excess was unlikely to be due to chance. In a further meta-analysis of three GWAS datasets, four candidate SNPs reached nominal significance. Although gene ranking using integrated sources of prior information did not enrich for significant results in the current experiment, gene selection using an a priori hypothesis (neurodevelopment) was superior to random selection.
As such, further development of gene ranking strategies using more carefully selected sources of information is warranted.
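Each gene in this study receives a domain-specific p-value from each of three evidence domains. The abstract does not say how those were merged into a ranking. Fisher's method is one standard way to combine independent p-values, shown here as an assumption. X = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its tail has a closed form. Evidence domains may not be independent in practice. Gene names in the example are placeholders.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method for k independent p-values in (0, 1]."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival for even df = 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Placeholder genes with p-values from three evidence domains.
domain_p = {"gene_A": [0.01, 0.20, 0.03], "gene_B": [0.50, 0.40, 0.90]}
ranked = sorted(domain_p, key=lambda g: fisher_combined_p(domain_p[g]))
print(ranked)
```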

8.
Sugarcane is an economically important crop for both the food and biofuel industries. Marker-assisted breeding in sugarcane is becoming a reality with the recent development and deployment of markers linked to disease resistance genes. Extensive linkage disequilibrium in sugarcane makes genome-wide association studies (GWAS) a better alternative to biparental mapping for identifying markers associated with agronomic traits. GWAS was conducted on a Louisiana core collection to identify marker-trait associations (MTAs) for 11 cane yield and sucrose traits using single nucleotide polymorphism (SNP) and insertion-deletion (Indel) markers. Significant (P < .05) MTAs were identified for all traits, with the top-ranked markers explaining up to 15% of the total phenotypic variation. High correlations (0.732 to 0.999) were observed among sucrose traits, and 56 markers were consistent across multiple traits. Following validation in more diverse populations, these markers could be used for marker-assisted selection of clones in sugarcane breeding programs in Louisiana and elsewhere.

9.
Primary open angle glaucoma (POAG) is a complex disease and one of the leading causes of blindness worldwide. Genome-wide association studies have successfully identified several common variants associated with glaucoma; however, most of these variants explain only a small proportion of the genetic risk. Beyond the standard approach of identifying main effects of variants across the genome, gene-gene interactions are thought to help explain part of the missing heritability, because testing interactions between genetic variants better mimics the complex nature of biology. To investigate the etiology of glaucoma, we first performed a genome-wide association study (GWAS) on glaucoma case-control samples obtained from electronic medical records (EMR) to establish the utility of EMR data in detecting non-spurious and relevant associations; this analysis aimed to confirm known associations with glaucoma and to validate the EMR-derived glaucoma phenotype. Our GWAS findings show consistent evidence for several known associations in POAG. We then performed an interaction analysis of variants marginally associated with glaucoma (SNPs with main-effect p-value < 0.01) and observed interesting findings in the electronic MEdical Records and GEnomics (eMERGE) network dataset. Genes from the top epistatic interactions in the eMERGE data (likelihood ratio test [LRT] p-value < 1×10⁻⁵) were then tested for replication in the NEIGHBOR consortium dataset. To replicate our findings, we performed a gene-based SNP-SNP interaction analysis in NEIGHBOR and observed significant gene-gene interactions (p-value < 0.001) among the top 17 gene-gene models identified in the discovery phase.
Variants from the gene-gene interaction analysis that we found to be associated with POAG explained 3.5% of additional genetic variance in the eMERGE dataset. This is beyond the variance explained by SNPs in genes replicated from previous GWAS, which was only 2.1% in eMERGE. In the NEIGHBOR dataset, adding replicated SNPs from the gene-gene interaction analysis explained 3.4% of total variance, whereas GWAS SNPs alone explained only 2.8%. Gene-gene interactions may provide additional insight into many complex traits when examined in properly designed and adequately powered association studies.

10.
Genome-wide association studies (GWAS) have revealed many single nucleotide polymorphisms (SNPs) associated with complex traits. Although these studies frequently fail to identify statistically significant associations, the top association signals from GWAS may be enriched for true associations. We therefore investigated the association of alcohol dependence with 43 SNPs selected from association signals in the first two published GWAS of alcoholism. Our analysis of 808 alcohol-dependent cases and 1,248 controls provided evidence of association of alcohol dependence with SNP rs1614972 in the ADH1C gene (unadjusted p = 0.0017). Because the GWAS study that originally reported association of alcohol dependence with this SNP [1] included only men, we also performed analyses in sex-specific strata. The results suggest that this SNP has a similar effect in both sexes (men: OR (95%CI) = 0.80 (0.66, 0.95); women: OR (95%CI) = 0.83 (0.66, 1.03)). We also observed marginal evidence of association of the rs1614972 minor allele with lower alcohol consumption in the non-alcoholic controls (p = 0.081), and independently in the alcohol-dependent cases (p = 0.046). Despite a number of potential differences between the samples investigated by the prior GWAS and the current study, data presented here provide additional support for the association of SNP rs1614972 in ADH1C with alcohol dependence and extend this finding by demonstrating association with consumption levels in both non-alcoholic and alcohol-dependent populations. Further studies should investigate the association of other polymorphisms in this gene with alcohol dependence and related alcohol-use phenotypes.
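The sex-stratified results above are odds ratios with 95% confidence intervals. The abstract's estimates presumably come from regression models. The sketch below shows only the simplest version: the Woolf (log-scale) interval for a 2×2 table of allele counts, using hypothetical counts. For a Wald-type interval like this one, an interval that includes 1 corresponds to a two-sided test that is not significant at the 5% level. The women's stratum (0.66, 1.03) is an example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: minor-allele count in cases      b: other-allele count in cases
    c: minor-allele count in controls   d: other-allele count in controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)  # hypothetical allele counts
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

The interval is symmetric on the log scale, so the product of its bounds equals the squared odds ratio.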

11.

Background

A recent ovarian cancer genome-wide association study (GWAS) identified a locus on 9p22 associated with reduced ovarian cancer risk. The single nucleotide polymorphism (SNP) markers localize to the BNC2 gene, which has been associated with ovarian development.

Methods

We analyzed the association of 9p22 SNPs with transvaginal ultrasound (TVU) screening results and CA-125 blood levels from participants without ovarian cancer in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO); 1,106 women with adequate ultrasound screening results and available genotyping information were included in the study.

Results

We observed a significantly increased risk of abnormal suspicious TVU results for seven SNPs on 9p22, with odds ratios between 1.68 (95% CI: 1.04–2.72) for rs4961501 and 2.10 (95% CI: 1.31–3.38) for rs12379183. Associations were restricted to abnormal suspicious findings at the first TVU screen. We did not observe an association between 9p22 SNPs and CA-125 levels.

Conclusions

Our findings suggest that 9p22 SNPs, which were found to be associated with decreased risk of ovarian cancer in a recent GWAS, are associated with sonographically detectable ovarian abnormalities. Our results corroborate the relevance of the 9p22 locus for ovarian biology. Further studies are required to understand the complex relationship between screening abnormalities and ovarian carcinogenesis and to evaluate whether this locus can influence the risk stratification of ovarian cancer screening.

12.

Background  

Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS.

13.
The aim of this study was to identify candidate causal single nucleotide polymorphisms (SNPs) and candidate causal mechanisms that contribute to bone mineral density (BMD), and to generate SNP-to-gene-to-pathway hypotheses using an analytical pathway-based approach. We used hip BMD GWAS data comprising the genotypes of 301,019 SNPs in 5,715 Europeans. ICSNPathway (identify candidate causal SNPs and pathways) analysis was applied to the BMD GWAS dataset. The first stage involved the pre-selection of candidate causal SNPs by linkage disequilibrium analysis and functional SNP annotation of the most significant SNPs. The second stage involved the annotation of biological mechanisms for the pre-selected candidate causal SNPs using improved gene set enrichment analysis. ICSNPathway analysis identified seven candidate SNPs, eight candidate pathways, and seven hypothetical biological mechanisms. The eight pathways were: gamma-hexachlorocyclohexane degradation (nominal p-value < 0.001, false discovery rate (FDR) < 0.001), regulation of the smoothened signaling pathway (nominal p-value < 0.001, FDR = 0.016), TACI and BCMA stimulation of B cell immune response (nominal p-value < 0.001, FDR = 0.021), endonuclease activity (nominal p-value = 0.001, FDR = 0.026), regulation of defense response to virus (nominal p-value = 0.001, FDR = 0.028), serine_type_endopeptidase_inhibitor_activity (nominal p-value = 0.001, FDR = 0.044), endoribonuclease activity (nominal p-value = 0.002, FDR = 0.045), and myeloid leukocyte differentiation (nominal p-value = 0.001, FDR = 0.050). The most significant causal pathway was gamma-hexachlorocyclohexane degradation. CYP3A5, PON2, PON3, CMBL, PON1, ALPL, CYP3A43, CYP3A7, ACP6, ACPP, and ALPI (p < 0.05) are involved in the gamma-hexachlorocyclohexane degradation pathway. Further examination of the gene contents revealed that DBR1, DICER1, EXO1, FEN1, POP1, POP4, RPP30, and RPP38 were involved in 2 of the 8 pathways (p < 0.05).
By applying ICSNPathway analysis to the BMD GWAS data, we identified seven candidate SNPs and eight pathways, including gamma-hexachlorocyclohexane degradation, that may contribute to low BMD.
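The pathway results are reported as nominal p-values with false discovery rates. Those FDR values come from the enrichment analysis itself. The Benjamini-Hochberg procedure sketched below is a different, simpler way to turn a list of p-values into FDR-adjusted values. It is included only to illustrate what an FDR-adjusted value means. The function name and inputs are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03])
print(q)
```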

14.
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster‐based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single‐ and multilocus models that can efficiently conduct the association tests on such high‐dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
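The cluster-based approach first groups SNPs within a window into clusters connected by high LD, then summarizes each cluster with principal components. Below is a minimal sketch of the grouping step only. It assumes LD is measured as the squared genotype correlation r², and that two SNPs are linked when r² reaches a hypothetical cutoff of 0.8. Clusters are the connected components of that graph, found with a union-find structure. The PCA step and the association models are not shown.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is in LD with nothing
    return sxy * sxy / (sxx * syy)


def ld_clusters(snps, r2_threshold=0.8):
    """Connected components of the LD graph for the SNPs in one window."""
    parent = list(range(len(snps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            if r_squared(snps[i], snps[j]) >= r2_threshold:
                parent[find(i)] = find(j)  # merge the two clusters
    groups = {}
    for i in range(len(snps)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# SNP 1 copies SNP 0, SNP 2 is its mirror image, SNP 3 is uncorrelated.
snps = [[0, 1, 2, 1, 0], [0, 1, 2, 1, 0], [2, 1, 0, 1, 2], [0, 0, 1, 2, 2]]
print(ld_clusters(snps))
```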

15.

Background

In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs in disease traits. However, the SNPs identified by these GWAS are not necessarily the functional variants. Therefore, the next phase of GWAS will involve refining these putative loci.

Methodology

A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project.

Conclusions

Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many “anchor” markers as possible.

16.
Single-nucleotide polymorphisms (SNPs) determined from SNP arrays by the International HapMap Consortium (HapMap) and the genetic variants detected in the 1000 Genomes Project (1KGP) can serve as two references for genome-wide association studies (GWAS). We conducted comparative analyses to provide a means of assessing concerns about SNP array-based GWAS findings and of realistically bounding expectations for next-generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transition-to-transversion ratio, minor allele frequency and heterozygosity rate for SNPs from HapMap and 1KGP in the 622 individuals common to both. We analysed genotype discordance between HapMap and 1KGP to assess the consistency of SNPs in the two references. Of the 36,817,799 SNPs detected in 1KGP, 90.58% were not measured in HapMap. More SNPs with minor allele frequencies below 0.01 were found in 1KGP than in HapMap. The two references have low discordance (generally below 0.02) in genotypes of common SNPs, with most discordance arising from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although only a small portion of genetic variance was explained. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP array-based GWAS.
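Two of the quantities compared between HapMap and 1KGP are easy to state in code. The first is the transition-to-transversion (Ti/Tv) ratio: transitions are A↔G and C↔T substitutions, and all other single-base changes are transversions. The second is the genotype discordance rate between two call sets. The sketch assumes genotypes are coded as alternate-allele counts, with None for a missing call. The function names are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def ti_tv_ratio(snvs):
    """snvs: iterable of (ref, alt) single-base pairs, e.g. ("A", "G")."""
    ti = tv = 0
    for ref, alt in snvs:
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1  # purine <-> pyrimidine change
    return ti / tv if tv else float("inf")


def genotype_discordance(calls_a, calls_b):
    """Fraction of sites called in both references whose genotypes differ."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)


print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
print(genotype_discordance([0, 1, 2, 1], [0, 2, 2, 1]))
```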

17.

Background  

Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can explain only a small fraction of the genetic risk associated with complex diseases. New strategies and statistical approaches are needed to address this gap. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than separately as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as its representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs.
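The bias described in the last sentence follows from a simple fact. Under the null hypothesis, the smallest of n independent p-values has expected value 1/(n + 1). Genes with more SNPs therefore tend to have smaller minimum p-values by chance alone. A Šidák-style correction, sketched below, is one common adjustment. It assumes independent SNPs, so under LD it is conservative. This is an illustration, not the method proposed in the work summarized above.

```python
def sidak_gene_p(snp_pvalues):
    """Gene-level p-value from the best SNP, Sidak-corrected for the SNP count.

    Assumes the gene's SNPs are independent; with LD the result is conservative.
    """
    n = len(snp_pvalues)
    return 1.0 - (1.0 - min(snp_pvalues)) ** n


# The same best-SNP p-value is far less convincing in a gene with 10 SNPs.
print(sidak_gene_p([0.01]), sidak_gene_p([0.01] + [0.5] * 9))
```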

18.
Lehne B, Lewis CM, Schlitt T. PloS one, 2011, 6(6): e20133
Interpreting Genome-Wide Association Studies (GWAS) at a gene level is an important step towards understanding the molecular processes that lead to disease. In order to incorporate prior biological knowledge such as pathways and protein interactions in the analysis of GWAS data, it is necessary to derive one measure of association for each gene. We compare three different methods to obtain gene-wide test statistics from Single Nucleotide Polymorphism (SNP) based association data: choosing the test statistic from the most significant SNP; the mean test statistic of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's Disease and Type 1 Diabetes, we identified new potential disease genes.
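The three gene-wide statistics compared above can be written in a few lines: the maximum, the mean, and the mean of the top quartile of per-SNP test statistics. The abstract does not say how the top quartile is rounded for genes whose SNP count is not a multiple of four. Rounding up, as below, is an assumption. The correction for the number of SNPs per gene is not shown, and the function name is hypothetical.

```python
import math


def gene_wide_statistics(snp_stats):
    """Three gene-level summaries of per-SNP test statistics (larger = stronger)."""
    ranked = sorted(snp_stats, reverse=True)
    top_n = max(1, math.ceil(len(ranked) / 4))  # top quartile, at least one SNP
    return {
        "max": ranked[0],
        "mean": sum(ranked) / len(ranked),
        "top_quartile_mean": sum(ranked[:top_n]) / top_n,
    }


print(gene_wide_statistics([1, 2, 3, 4, 5, 6, 7, 8]))
```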

19.

Background

Correlations between Educational Attainment (EA) and measures of cognitive performance are as high as 0.8. This makes EA an attractive alternative phenotype for studies wishing to map genes affecting cognition due to the ease of collecting EA data compared to other cognitive phenotypes such as IQ.

Methodology

In an Australian family sample of 9538 individuals we performed a genome-wide association scan (GWAS) using the imputed genotypes of ∼2.4 million single nucleotide polymorphisms (SNP) for a 6-point scale measure of EA. Top hits were checked for replication in an independent sample of 968 individuals. A gene-based test of association was then applied to the GWAS results. Additionally we performed prediction analyses using the GWAS results from our discovery sample to assess the percentage of EA and full scale IQ variance explained by the predicted scores.

Results

The best SNP fell short of genome-wide significance (p = 9.77×10⁻⁷). In our independent replication sample, six SNPs among the top 50 hits pruned for linkage disequilibrium (r² < 0.8) had a p-value < 0.05, but only one of these SNPs survived correction for multiple testing. That SNP, rs7106258 (p = 9.7×10⁻⁴), is located in an intergenic region of chromosome 11q14.1. The gene-based test results were non-significant, and our prediction analyses show that the predicted scores explained little variance in EA in our replication sample.

Conclusion

While we have identified a polymorphism on chromosome 11q14.1 associated with EA, further replication is warranted. Overall, the absence of genome-wide significant p-values in our large discovery sample confirmed the highly polygenic architecture of EA. Only the assembly of large samples or meta-analytic efforts will be able to assess the role of common DNA polymorphisms in the etiology of EA.
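The prediction analyses use GWAS results from the discovery sample to score individuals in another sample. Below is a minimal sketch of the usual additive score. It assumes each individual's score is the sum of discovery effect sizes (beta) times allele dosages over the selected SNPs. Variance explained is taken as the squared correlation between score and phenotype. SNP identifiers and values in the example are placeholders. SNP selection, for example by p-value or LD pruning, is not shown.

```python
def polygenic_scores(dosages, weights):
    """Additive score per individual: sum of beta * dosage over weighted SNPs.

    dosages: {individual_id: {snp_id: allele dosage between 0 and 2}}
    weights: {snp_id: effect size (beta) from the discovery GWAS}
    SNPs absent from an individual's dosages are skipped.
    """
    return {
        ind: sum(beta * snp_dosage[snp]
                 for snp, beta in weights.items() if snp in snp_dosage)
        for ind, snp_dosage in dosages.items()
    }


def variance_explained(scores, phenotype):
    """Squared Pearson correlation (R^2) between scores and phenotype values."""
    n = len(scores)
    ms, mp = sum(scores) / n, sum(phenotype) / n
    sxy = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotype))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((p - mp) ** 2 for p in phenotype)
    return sxy * sxy / (sxx * syy)


weights = {"rs_a": 0.5, "rs_b": -0.2}  # placeholder SNP IDs and betas
dosages = {"p1": {"rs_a": 2, "rs_b": 1}, "p2": {"rs_a": 0, "rs_b": 2}}
scores = polygenic_scores(dosages, weights)
print(scores)
```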

20.
Kuo CL, Zaykin DV. Genetics, 2011, 189(1): 329-340
In recent years, genome-wide association studies (GWAS) have uncovered a large number of susceptibility variants. Nevertheless, GWAS findings provide only tentative evidence of association, and replication studies are required to establish their validity. Due to this uncertainty, researchers often focus on top-ranking SNPs rather than on strict significance thresholds to guide replication efforts. The number of SNPs for replication is often determined ad hoc. We show how the rank-based approach can be used for sample size allocation in GWAS as well as for deciding on the number of SNPs for replication. The basis of this approach is the "ranking probability": the probability that at least j true associations will rank among the top u SNPs when SNPs are sorted by P-value. By employing simple but accurate approximations for ranking probabilities, we accommodate linkage disequilibrium (LD) and evaluate the consequences of ignoring LD. Further, we relate ranking probabilities to the proportion of false discoveries among the top u SNPs. A study-specific proportion can be estimated from P-values, and its expected value can be predicted for study design applications.
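The ranking probability is defined above as the chance that at least j true associations rank among the top u SNPs. The abstract describes analytical approximations. The sketch below instead estimates the same quantity by Monte Carlo simulation, under simplifying assumptions. SNPs are independent (no LD), null z-statistics are drawn from N(0, 1), and true-association z-statistics are drawn from N(effect, 1). Ranking by |z| is equivalent to ranking by two-sided P-value. Function and parameter names are hypothetical.

```python
import random


def ranking_probability(n_null, effects, j, u, n_sim=1000, seed=7):
    """Monte Carlo estimate of P(at least j true associations rank in the top u).

    n_null:  number of null SNPs (z ~ N(0, 1))
    effects: mean z-statistic of each true association (z ~ N(effect, 1))
    SNPs are simulated as independent, so LD is ignored.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        scored = [(abs(rng.gauss(mu, 1.0)), True) for mu in effects]
        scored += [(abs(rng.gauss(0.0, 1.0)), False) for _ in range(n_null)]
        scored.sort(key=lambda item: item[0], reverse=True)  # smallest P first
        n_true_in_top = sum(1 for _, is_true in scored[:u] if is_true)
        if n_true_in_top >= j:
            hits += 1
    return hits / n_sim


estimate = ranking_probability(n_null=2000, effects=[3.0] * 10, j=5, u=50,
                               n_sim=300)
print(estimate)
```

Simulation is slower than a closed-form approximation, but it makes the definition concrete. Relaxing the independence assumption would require simulating correlated z-statistics.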


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417