Similar articles
20 similar articles found.
1.
Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or on both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.
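The abstract does not reproduce its equations. As an illustration of how such a power calculation works, the sketch below uses the standard non-centrality approximation for a 1-df test of a quantitative trait, NCP = N·q²/(1 − q²), where q² is the proportion of phenotypic variance explained by the SNP. This is not the authors' unified framework. The function name and the default α = 5 × 10⁻⁸ are illustrative assumptions.

```python
from statistics import NormalDist


def gwas_power_quantitative(n, q2, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    Assumption: the SNP explains a fraction q2 of phenotypic variance, so the
    non-centrality parameter is NCP = n * q2 / (1 - q2).
    """
    if not 0 <= q2 < 1:
        raise ValueError("q2 must be in [0, 1)")
    ncp = n * q2 / (1 - q2)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = ncp ** 0.5                    # chi2(1, ncp) is the square of N(sqrt(ncp), 1)
    # Power = P(|Z + sqrt(ncp)| > z_crit) for standard normal Z.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for n in (5_000, 20_000, 100_000):
    print(n, round(gwas_power_quantitative(n, q2=0.001), 3))
```

With q² = 0 the function returns α, the Type I error rate, which is a quick sanity check on the formula.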

2.
To detect association between a genetic marker and a disease in case–control studies, the Cochran–Armitage trend test is typically used. The trend test is locally optimal when the genetic model is correctly specified. However, in practice, the underlying genetic model, and hence the optimal trend test, are usually unknown. In this case, Pearson's chi-squared test, the maximum of three trend test statistics (optimal for the recessive, additive, and dominant models), and the test based on genetic model selection (GMS) are useful. In this article, we first modify the existing GMS method so that it can be used when the risk allele is unknown. Then we propose a new approach that excludes a genetic model not supported by the data. Using either model selection or model exclusion, the alternative space is reduced conditional on the observed data, and hence the power to detect a true association can be increased. Simulation results are reported, and the proposed methods are applied to the genetic markers identified by the genome-wide association studies conducted by the Wellcome Trust Case–Control Consortium. The results demonstrate that the genetic model exclusion approach usually performs better than existing methods in terms of worst-case performance across the scientifically plausible genetic models we considered.
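The Cochran–Armitage trend statistic and the maximum of the three model-specific statistics (often called MAX3) can be sketched as follows. The inputs are genotype counts for cases and controls, indexed by the number of risk alleles (0, 1, 2), and the commonly used scores are (0, 0, 1) recessive, (0, ½, 1) additive and (0, 1, 1) dominant. This is a minimal illustration, not the GMS or model-exclusion procedures proposed in the article. The variance uses the usual asymptotic (R·S/N) form.

```python
import math

# Genotype scores; the genotype index is the number of risk alleles (0, 1, 2).
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for a 2 x 3 genotype table."""
    r, s = cases, controls
    n = [ri + si for ri, si in zip(r, s)]  # genotype column totals
    R, S = sum(r), sum(s)
    N = R + S
    t = sum(x * (S * ri - R * si) for x, ri, si in zip(scores, r, s))
    var = (R * S / N) * (
        N * sum(x * x * ni for x, ni in zip(scores, n))
        - sum(x * ni for x, ni in zip(scores, n)) ** 2
    )
    return t / math.sqrt(var) if var > 0 else 0.0


def max3(cases, controls):
    """Largest |Z| over the recessive, additive and dominant scores."""
    return max(abs(catt_z(cases, controls, sc)) for sc in SCORES.values())


cases, controls = [10, 20, 30], [30, 20, 10]
print({name: round(catt_z(cases, controls, sc), 3) for name, sc in SCORES.items()})
print("MAX3 =", round(max3(cases, controls), 3))
```

The trend statistic is unchanged by linear rescaling of the scores, so (0, 1, 2) and (0, ½, 1) give the same additive test. A p-value for MAX3 needs the joint null distribution of the three correlated statistics, which this sketch does not compute.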

3.
In genomic research, phenotype transformations are commonly used as a straightforward way to make the model outcome approximately normal. Many researchers still believe this is necessary for proper inference. Using regression simulations, we show that phenotype transformations are typically not needed and, when applied to heteroscedastic phenotypes, result in inflated Type I error rates. We further explain that what matters is addressing the combination of rare-variant genotypes and heteroscedasticity. Incorrectly estimated parameter variability, or an incorrect choice of the distribution of the underlying test statistic, leads to spurious detection of associations. We conclude that it is the combination of heteroscedasticity, minor allele frequency, sample size, and, to a much lesser extent, the error distribution that matters for proper statistical inference.
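The abstract names "incorrectly estimated parameter variability" as one source of spurious associations. To show what that means for a single-SNP regression, the sketch below compares the classical OLS standard error of the genotype slope with the HC0 (White) sandwich standard error, which does not assume equal residual variance across genotype groups. The abstract does not say which remedy the authors recommend; the sandwich estimator is used here only as an illustration. HC0 itself can be unreliable when the minor-allele group is very small, which is why small-sample variants such as HC3 exist.

```python
import math


def slope_standard_errors(g, y):
    """OLS slope of y on genotype g with classical and HC0 standard errors."""
    n = len(g)
    g_bar = sum(g) / n
    y_bar = sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    resid = [yi - intercept - slope * gi for gi, yi in zip(g, y)]
    # Classical SE: one common residual variance for every genotype group.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # HC0 sandwich SE: each observation contributes its own squared residual.
    se_hc0 = math.sqrt(sum((gi - g_bar) ** 2 * e * e for gi, e in zip(g, resid))) / sxx
    return slope, se_classical, se_hc0


# Toy data: equal means, but residual spread is larger in genotype group 1.
print(slope_standard_errors([0, 0, 1, 1], [-1, 1, -3, 3]))
```

When the residual variance differs between genotype groups and the groups are unbalanced, as with a rare variant, the two standard errors can differ substantially. This difference is what drives the Type I error behaviour the abstract describes.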

4.

Background

The timing of associations between common genetic variants and changes in growth patterns over childhood may provide insight into the development of obesity in later life. To address this question, it is important to define appropriate statistical models to allow for the detection of genetic effects influencing longitudinal childhood growth.

Methods and Results

Children from The Western Australian Pregnancy Cohort (Raine; n = 1,506) Study were genotyped at 17 genetic loci shown to be associated with childhood obesity (FTO, MC4R, TMEM18, GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3, SH2B1, MRSA) and an obesity-risk-allele-score was calculated as the total number of ‘risk alleles’ possessed by each individual. To determine the statistical method that fits these data and has the ability to detect genetic differences in BMI growth profile, four methods were investigated: linear mixed effects model, linear mixed effects model with skew-t random errors, semi-parametric linear mixed models and a non-linear mixed effects model. Of the four methods, the semi-parametric linear mixed model method was the most efficient for modelling childhood growth to detect modest genetic effects in this cohort. Using this method, three of the 17 loci were significantly associated with BMI intercept or trajectory in females and four in males. Additionally, the obesity-risk-allele score was associated with increased average BMI (female: β = 0.0049, P = 0.0181; male: β = 0.0071, P = 0.0001) and rate of growth (female: β = 0.0012, P = 0.0006; male: β = 0.0008, P = 0.0068) throughout childhood.

Conclusions

Using statistical models appropriate for detecting genetic effects, we found that variants in adult obesity genes were associated with childhood growth. There were also differences between males and females. This study provides evidence of genetic effects that may identify, early in life, individuals who are more likely to increase their BMI rapidly through childhood, and it offers some insight into the biology of childhood growth.
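The obesity-risk-allele score above is the total number of risk alleles an individual carries across the genotyped loci. A minimal sketch follows, assuming genotypes are already coded as risk-allele counts (0, 1 or 2) per locus. The treatment of missing genotypes (skipped here) is a hypothetical choice; the abstract does not state how missing data were handled.

```python
def risk_allele_score(genotypes):
    """Sum risk-allele counts (0, 1 or 2 per locus) for one individual.

    None marks a missing genotype and is skipped (an illustrative
    convention, not necessarily the one used in the study).
    """
    total = 0
    for count in genotypes:
        if count is None:
            continue
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count: {count}")
        total += count
    return total


print(risk_allele_score([0, 1, 2, None, 1]))
```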

5.
The Kallikrein-related peptidase, KLK4, has been shown to be significantly overexpressed in prostate tumours in numerous studies and is suggested to be a potential biomarker for prostate cancer. KLK4 may also play a role in prostate cancer progression through its involvement in epithelial-mesenchymal transition, a more aggressive phenotype, and metastases to bone. It is well known that genetic variation has the potential to affect gene expression and/or various protein characteristics, and hence we sought to investigate the possible role of single nucleotide polymorphisms (SNPs) in the KLK4 gene in prostate cancer. Assessment of 61 SNPs in the KLK4 locus (±10 kb) in approximately 1300 prostate cancer cases and 1300 male controls for associations with prostate cancer risk and/or prostate tumour aggressiveness (Gleason score <7 versus ≥7) revealed 7 SNPs to be associated with a decreased risk of prostate cancer at the P(trend) < 0.05 significance level. Three of these SNPs, rs268923, rs56112930 and the HapMap tagSNP rs7248321, are located several kb upstream of KLK4; rs1654551 encodes a non-synonymous serine to alanine substitution at position 22 of the long isoform of the KLK4 protein; and the remaining 3 risk-associated SNPs, rs1701927, rs1090649 and rs806019, are located downstream of KLK4 and are in high linkage disequilibrium with each other (r² ≥ 0.98). Our findings provide suggestive evidence of a role for genetic variation in the KLK4 locus in prostate cancer predisposition.
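The linkage-disequilibrium measure r² quoted above (r² ≥ 0.98) is computed from the frequency of one two-locus haplotype and the two allele frequencies: D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]. The sketch below shows this definition. The function name and example frequencies are illustrative, and estimating haplotype frequencies from unphased genotypes (for example by EM) is outside its scope.

```python
def ld_r2(p_ab, p_a, p_b):
    """LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    return d * d / denom


print(ld_r2(0.3, 0.3, 0.3))   # alleles always co-occur: perfect LD
print(ld_r2(0.06, 0.2, 0.3))  # haplotype frequency = product: no LD
```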

6.
Genotype-imputation methods provide an essential technique for high-resolution genome-wide association (GWA) studies with millions of single-nucleotide polymorphisms. For optimal design and interpretation of imputation-based GWA studies, it is important to understand the connection between imputation error and power to detect associations at imputed markers. Here, using a 2 × 3 chi-square test, we describe a relationship between genotype-imputation error rates and the sample-size inflation required for achieving statistical power at an imputed marker equal to that obtained if genotypes at the marker were known with certainty. Surprisingly, typical imputation error rates (∼2%–6%) lead to a large increase in the required sample size (∼10%–60%), and in some African populations whose genotypes are particularly difficult to impute, the required sample-size increase is as high as ∼30%–150%. In most populations, each 1% increase in imputation error leads to an increase of ∼5%–13% in the sample size required for maintaining power. These results imply that in GWA sample-size calculations investigators will need to account for a potentially considerable loss of power from even low levels of imputation error and that development of additional genomic resources that decrease imputation error will translate into substantial reduction in the sample sizes needed for imputation-based detection of the variants that underlie complex human diseases.
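The abstract relates imputation error rates to sample-size inflation through a 2 × 3 chi-square test, and it does not give the formula. A related and widely used rule of thumb works with the squared correlation r² between imputed and true genotypes rather than the error rate: the effective sample size shrinks to roughly N·r², so matching the power of known genotypes needs about N/r² samples. The sketch below implements only that r²-based approximation. It is not the authors' error-rate derivation, and the two quantities are not interchangeable.

```python
def required_sample_size(n_true_genotypes, r2):
    """Approximate sample size at an imputed marker that matches the power of
    n_true_genotypes samples with known genotypes.

    Assumption: imputation with squared correlation r2 to the true genotype
    reduces the effective sample size to about n * r2.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_true_genotypes / r2


n_needed = required_sample_size(10_000, 0.8)
print(f"{n_needed:.0f} samples ({n_needed / 10_000 - 1:.0%} inflation)")
```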

7.

Background and Purpose

Rates and extent of recovery after stroke vary considerably between individuals, and genetic factors are thought to contribute to post-stroke outcome. Brain-derived neurotrophic factor (BDNF) plays important roles in brain plasticity and repair and has been shown to be involved in stroke severity, recovery, and outcome in animal models. Few clinical studies of BDNF genotypes in relation to ischemic stroke have been performed. The present study therefore aims to investigate whether genetic variation at the BDNF locus is associated with initial stroke severity, recovery and/or short-term and long-term functional outcome after ischemic stroke.

Methods

Four BDNF tagSNPs were analyzed in the Sahlgrenska Academy Study on Ischemic Stroke (SAHLSIS; 600 patients and 600 controls, all aged 18–70 years). Stroke severity was assessed using the NIH Stroke Scale (NIHSS). Stroke recovery was defined as the change in NIHSS over a 3-month period. Short- and long-term functional outcome post-stroke was assessed using the modified Rankin Scale at 3 months and at 2 and 7 years after stroke, respectively.

Results

No SNP was associated with stroke severity or recovery at 3 months, and no SNP had an impact on short-term outcome. However, rs11030119 was independently associated with poor functional outcome 7 years after stroke (OR 0.66, 95% CI 0.46–0.92; P = 0.006).

Conclusions

BDNF gene variants were not major contributors to ischemic stroke severity, recovery, or short-term functional outcome. However, this study suggests that variants in the BDNF gene may contribute to poor long-term functional outcome after ischemic stroke.
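Odds ratios with 95% confidence intervals, like the one reported for rs11030119, summarize a genotype–outcome association. The reported estimate comes from an adjusted analysis whose model the abstract does not detail. The sketch below computes only the unadjusted odds ratio from a 2 × 2 table, with a Woolf (log-scale Wald) interval, to show how such an interval is formed. The table layout and counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    Hypothetical 2 x 2 layout:
                    poor outcome   good outcome
    carrier              a              b
    non-carrier          c              d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


print(odds_ratio_ci(20, 10, 10, 20))
```

The interval is symmetric on the log scale, so the point estimate is the geometric mean of the two limits.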

8.
Genome-wide association studies (GWAS) aim to identify genetic variants related to diseases by examining the associations between phenotypes and hundreds of thousands of genotyped markers. Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. Although a number of model selection methods have been proposed in the literature, including marginal search, exhaustive search, and forward search, their relative performance has only been evaluated through limited simulations due to the lack of an analytical approach to calculating the power of these methods. This article develops a novel statistical approach for power calculation, derives accurate formulas for the power of different model selection strategies, and then uses the formulas to evaluate and compare these strategies in genetic model spaces. In contrast to previous studies, our theoretical framework allows for random genotypes, correlations among test statistics, and a false-positive control based on GWAS practice. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy. An example is provided for the application of our approach to empirical research. The statistical approach used in our derivations is general and can be employed to address the model selection problems in other random predictor settings. 
We have developed an R package markerSearchPower to implement our formulas, which can be downloaded from the Comprehensive R Archive Network (CRAN) or http://bioinformatics.med.yale.edu/group/.
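Part of the power difference between marginal and exhaustive search comes from the number of tests each strategy must correct for. The sketch below compares per-test Bonferroni thresholds for a marginal scan of m markers with an exhaustive scan that also tests all m(m − 1)/2 pairs. It is a simplified illustration under an assumed Bonferroni control. It is not the article's power formulas or its GWAS-practice false-positive control, and it does not use the markerSearchPower package.

```python
def bonferroni_thresholds(m, alpha=0.05):
    """Per-test Bonferroni thresholds for two search strategies over m markers.

    Assumption: marginal search runs m tests; exhaustive search runs the m
    marginal tests plus all m * (m - 1) / 2 pairwise tests.
    """
    n_marginal = m
    n_pairwise = m * (m - 1) // 2
    return alpha / n_marginal, alpha / (n_marginal + n_pairwise)


marginal, exhaustive = bonferroni_thresholds(500_000)
print(f"marginal: {marginal:.1e}, exhaustive: {exhaustive:.1e}")
```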

9.
Intracerebral hemorrhage (ICH) is the stroke subtype with the worst prognosis and has no established acute treatment. ICH is classified as lobar or nonlobar based on the location of ruptured blood vessels within the brain. These different locations also signal different underlying vascular pathologies. Heritability estimates indicate a substantial genetic contribution to risk of ICH in both locations. We report a genome-wide association study of this condition that meta-analyzed data from six studies that enrolled individuals of European ancestry. Case subjects were ascertained by neurologists blinded to genotype data and classified as lobar or nonlobar based on brain computed tomography. ICH-free control subjects were sampled from ambulatory clinics or random digit dialing. Replication of signals identified in the discovery cohort with p < 1 × 10⁻⁶ was pursued in an independent multiethnic sample utilizing both direct and genome-wide genotyping. The discovery phase included a case cohort of 1,545 individuals (664 lobar and 881 nonlobar cases) and a control cohort of 1,481 individuals and identified two susceptibility loci: for lobar ICH, chromosomal region 12q21.1 (rs11179580, odds ratio [OR] = 1.56, p = 7.0 × 10⁻⁸); and for nonlobar ICH, chromosomal region 1q22 (rs2984613, OR = 1.44, p = 1.6 × 10⁻⁸). The replication included a case cohort of 1,681 individuals (484 lobar and 1,194 nonlobar cases) and a control cohort of 2,261 individuals and corroborated the association for 1q22 (p = 6.5 × 10⁻⁴; meta-analysis p = 2.2 × 10⁻¹⁰) but not for 12q21.1 (p = 0.55; meta-analysis p = 2.6 × 10⁻⁵). These results demonstrate biological heterogeneity across ICH subtypes and highlight the importance of ascertaining ICH cases accordingly.
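The study combines per-study results by meta-analysis, but the abstract does not state the model. A common choice for GWAS is the fixed-effect inverse-variance method, in which each study's effect estimate (for example a log odds ratio) is weighted by 1/SE². The sketch below implements that method under this assumption, with hypothetical inputs.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns the pooled estimate, its standard error and the z statistic.
    """
    weights = [1 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1 / w_sum)
    return beta, se, beta / se


# Hypothetical discovery and replication log-ORs.
print(inverse_variance_meta([0.36, 0.30], [0.07, 0.09]))
```

More precise studies get more weight, and the pooled standard error is smaller than that of any single study.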

10.
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene level instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of the molecular mechanisms and functions of genetic factors in disease. Because of the extreme rarity of mutations and the high dimensionality, the significance of causal variants does not easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as Bonferroni and false discovery rate (FDR) control, are often impractical to apply, as most causal variants can only be identified together with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine which variants can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and adapts to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas Bonferroni and FDR control are overly conservative for rare-variant association studies. In the analysis of the CoLaus dataset, the AFNC identified the individual variants most responsible for gene-level significance.
Moreover, single-variant results from the AFNC were successfully used, together with annotation information, to infer related genes.
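The AFNC procedure itself is not specified in the abstract, so it is not sketched here. The two standard procedures it is compared against can be written in a few lines. Bonferroni rejects p ≤ α/m. Benjamini–Hochberg sorts the p-values, finds the largest rank k with p₍ₖ₎ ≤ k·α/m, and rejects the k smallest. Function names are illustrative.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Bonferroni correction."""
    m = len(pvalues)
    return sorted(i for i, p in enumerate(pvalues) if p <= alpha / m)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up FDR procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


p = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(p))
print("BH:", benjamini_hochberg(p))
```

In this toy example Bonferroni rejects only the two smallest p-values, while BH rejects all four. Both procedures still require a causal variant's p-value to stand out from the noncausal ones, which is the limitation the abstract highlights for very rare variants.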

11.
The availability of high-density single nucleotide polymorphism (SNP) data has made it possible for human genetic association studies to identify common and rare variants underlying complex diseases on a genome-wide scale. A handful of novel genetic variants have been identified, which offers much hope for the future of genetic association studies. In this process, statistical and computational methods play key roles, among which information-based association tests have gained considerable popularity. This paper is intended to give a comprehensive review of the current literature on genetic association analysis cast in the framework of information theory. We focus our review on the following topics: (1) information-theoretic approaches in genetic linkage and association studies; (2) entropy-based strategies for optimal SNP subset selection; and (3) the use of information-theoretic criteria in gene clustering and gene regulatory network construction.
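The basic quantity behind entropy-based association tests is the mutual information between phenotype and genotype, estimated from a contingency table as MI = Σ pᵢⱼ ln[pᵢⱼ / (pᵢ·pⱼ)]. The sketch below computes it for a case/control × genotype count table. It illustrates the general idea only, not any specific method covered by the review.

```python
import math


def mutual_information(table):
    """Mutual information (nats) between rows (phenotype) and columns (genotype)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p_ij = count / n
                # p_ij / (p_i * p_j) = count * n / (row_tot * col_tot)
                mi += p_ij * math.log(p_ij * n * n / (row_tot[i] * col_tot[j]))
    return mi


cases, controls = [50, 0, 0], [0, 0, 50]
print(mutual_information([cases, controls]))  # maximal for two equal groups
```

For the same table, the likelihood-ratio (G) statistic equals 2·N·MI when MI is measured in nats. An MI-based test of a single SNP is therefore equivalent to the G-test of independence.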

12.
The thoracic-to-hip circumference ratio (THR) is an anthropometric marker recently described as a predictor of type 2 diabetes. In this study, we performed a genome-wide association study (GWAS) followed by confirmatory analyses to identify genetic markers associated with THR. A total of 7,240 Korean subjects (4,988 for the discovery stage and 2,252 for the confirmatory analyses) were recruited for this study, and genome-wide single nucleotide polymorphism (SNP) genotyping of the initial 4,988 individuals was performed using the Affymetrix Human SNP array 5.0. Linear regression analysis was then performed to adjust for the effects of age, sex, and current diabetes medication status on the THR of the study subjects. In the initial discovery stage, there was a nominal association between minor alleles of SNP markers on chromosomes 4, 8, 10, and 12 and THR changes (p < 5.0 × 10⁻⁶). The subsequent confirmatory analyses of these markers, however, only detected a significant association between two SNPs in the HECTD4 gene and decreased THRs. Notably, this association was detected in male (rs11066280: p = 1.14 × 10⁻²; rs2074356: p = 1.10 × 10⁻²) but not in female subjects. Meanwhile, the combined results from the two analyses (initial and confirmatory) indicated that minor alleles of these two intronic variants exhibited a genome-wide significant association with decreased THR in the male subjects (n = 3,155; rs11066280: effect size = −0.008624, p = 6.19 × 10⁻⁹; rs2074356: effect size = −0.008762, p = 1.89 × 10⁻⁸). Furthermore, minor alleles of these two SNPs exhibited protective effects against the risk of developing type 2 diabetes. In conclusion, we have identified two genetic variations in HECTD4 that are associated with THR, particularly in men.

13.
We consider the feasibility of reusing existing control data obtained in genetic association studies in order to reduce costs for new studies. We discuss controlling for the population differences between cases and controls that are implicit in studies utilizing external control data. We give theoretical calculations of the statistical power of a test due to Bourgain et al (Am J Human Genet 2003), applied to the problem of dealing with case-control differences in genetic ancestry related to population isolation or population admixture. Theoretical results show that there may exist bounds on the non-centrality parameter for a test of association that place limits on study power even if sample sizes can grow arbitrarily large. We apply this method to data from a multi-center, geographically diverse, genome-wide association study of breast cancer in African-American women. Our analysis of these data shows that admixture proportions differ by center, with the average fraction of European admixture ranging from approximately 20% for participants from study sites in the Eastern United States to 25% for participants from West Coast sites. However, these differences in average admixture fraction between sites are largely counterbalanced by considerable diversity in individual admixture proportion within each study site. Our results suggest that statistical correction for admixture differences is feasible for future studies of African-Americans utilizing the existing controls from the African-American Breast Cancer study, even if case ascertainment for the future studies is not balanced over the same centers or regions that supplied the controls for the current study.

14.
15.
Electronic medical records (EMRs) are being widely implemented for use in genetic and genomic studies. As a phenotype-rich resource, EMRs provide researchers with the opportunity to identify disease cohorts and perform genotype-phenotype association studies. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study, has genotyped more than 15,000 individuals of diverse genetic ancestry in BioVU, the Vanderbilt University Medical Center's biorepository linked to a de-identified version of the EMR (EAGLE BioVU). Here we develop and deploy an algorithm utilizing data mining techniques to identify primary open-angle glaucoma (POAG) in African Americans from EAGLE BioVU for genetic association studies. The algorithm described here was designed using a combination of diagnostic codes, Current Procedural Terminology billing codes, and free-text searches to identify POAG status in situations where gold-standard digital photography cannot be accessed. The case algorithm identified 267 potential POAG subjects but underperformed after manual review, with a positive predictive value of 51.6% and an accuracy of 76.3%. The control algorithm identified controls with a negative predictive value of 98.3%. Although the case algorithm requires more downstream manual review for use in large-scale studies, it provides a basis by which to extract a specific clinical subtype of glaucoma from EMRs in the absence of digital photographs.
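The performance figures above (positive predictive value, negative predictive value, accuracy) come from comparing algorithm calls with manual chart review. In the abstract, PPV and accuracy describe the case algorithm and NPV describes the separate control algorithm. The sketch below shows the definitions for a single confusion table. The counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """PPV, NPV and accuracy of an algorithm against manual chart review."""
    ppv = tp / (tp + fp)                         # flagged cases that are true cases
    npv = tn / (tn + fn)                         # flagged controls that are true controls
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # all correct calls
    return ppv, npv, accuracy


# Hypothetical counts for illustration only.
print(classification_metrics(tp=50, fp=50, tn=90, fn=10))
```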

16.
PLoS ONE 2014, 9(5)
Endometrial cancer (EC) contributes substantially to the total burden of cancer morbidity and mortality in the United States. Family history is a known risk factor for EC; thus, genetic factors may play a role in EC pathogenesis. Three previous genome-wide association studies (GWAS) have found only one locus associated with EC, suggesting that common variants with large effects may not contribute greatly to EC risk. Alternatively, we hypothesize that rare variants may contribute to EC risk. We conducted an exome-wide association study (EXWAS) of EC using the Infinium HumanExome BeadChip in order to identify rare variants associated with EC risk. We successfully genotyped 177,139 variants in a multiethnic population of 1,055 cases and 1,778 controls from four studies that were part of the Epidemiology of Endometrial Cancer Consortium (E2C2). No variants reached global significance in the study, suggesting that more power is needed to detect modest associations between rare genetic variants and risk of EC.
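The abstract does not state the threshold used for global significance. For a sense of scale, suppose a Bonferroni correction at α = 0.05 over all 177,139 successfully genotyped variants. This is an assumption, not the study's stated method. The per-variant threshold would then be:

```latex
\alpha_{\text{per-variant}} = \frac{0.05}{177{,}139} \approx 2.8 \times 10^{-7}
```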

17.

Background

Existing studies indicate a significant genetic component for sudden cardiac arrest (SCA) and genome-wide association studies (GWAS) provide an unbiased approach for identification of novel genes. We performed a GWAS to identify genetic determinants of SCA.

Methodology/Principal Findings

We used a case-control design within the ongoing Oregon Sudden Unexpected Death Study (Oregon-SUDS). Cases (n = 424) were SCAs with coronary artery disease (CAD) among residents of Portland, OR (2002–07, population ∼1,000,000), and controls (n = 226) were residents with CAD but no history of SCA. All subjects were of White-European ancestry, and GWAS was performed using Affymetrix 500K/5.0 and 6.0 arrays. High-signal markers were genotyped in SCA cases (n = 521) identified from the Atherosclerosis Risk in Communities Study (ARIC) and the Cardiovascular Health Study (CHS) (combined n = 19,611). No SNPs reached genome-wide significance (p < 5 × 10⁻⁸). SNPs at 6 loci were prioritized for follow-up primarily based on significance of p < 10⁻⁴ and proximity to a known gene (CSMD2, GPR37L1, LIN9, B4GALNT3, GPC5, and ZNF592). The minor allele of GPC5 (GLYPICAN 5, rs3864180) was associated with a lower risk of SCA in Oregon-SUDS, an effect that was also observed in ARIC/CHS whites (p < 0.05) and blacks (p < 0.04). In a combined Cox proportional hazards model analysis that adjusted for race, the minor allele exhibited a hazard ratio of 0.85 (95% CI 0.74 to 0.98; p < 0.01).

Conclusions/Significance

A novel genetic locus for SCA, GPC5, was identified from Oregon-SUDS and successfully validated in the ARIC and CHS cohorts. Three other members of the Glypican family have been previously implicated in human disease, including cardiac conditions. The mechanism of this specific association requires further study.

18.
Genetic association studies routinely involve massive numbers of statistical tests accompanied by P-values. Whole-genome sequencing technologies have increased the potential number of tested variants to tens of millions. The more tests are performed, the smaller a P-value must be to be deemed significant. However, a small P-value is not equivalent to a small chance of a spurious finding, and significance thresholds may fail to serve as efficient filters against false results. While the Bayesian approach can provide a direct assessment of the probability that a finding is spurious, its adoption in association studies has been slow, due in part to the ubiquity of P-values and the automated way they are, as a rule, produced by software packages. Attempts to design simple ways to convert an association P-value into the probability that a finding is spurious have been met with difficulties. The False Positive Report Probability (FPRP) method has gained increasing popularity. However, FPRP is not designed to estimate the probability for a particular finding, because it is defined for an entire region of hypothetical findings with P-values at least as small as the one observed for that finding. Here we propose a method that lets researchers extract the probability that a finding is spurious directly from a P-value. Considering the counterpart of that probability, we term this method POFIG: the Probability that a Finding is Genuine. Our approach shares FPRP's simplicity but gives a valid probability that a finding is spurious given a P-value. In addition to straightforward interpretation, POFIG has desirable statistical properties. The POFIG average across a set of tentative associations provides an estimated proportion of false discoveries in that set. POFIGs are easily combined across studies and are immune to multiple testing and selection bias. We illustrate an application of the POFIG method via analysis of GWAS associations with Crohn's disease.
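The abstract does not give POFIG's formula, so it is not sketched here. The FPRP it is contrasted with is easy to compute (Wacholder et al., 2004). Given a significance level α (often the observed P-value), the power 1 − β at that level, and a prior probability π that the association is real, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The sketch below implements that formula. Because α defines a rejection region, the result refers to all findings at least that extreme, which is exactly the limitation the abstract points out.

```python
def fprp(alpha, power, prior):
    """False Positive Report Probability (Wacholder et al., 2004).

    alpha: significance level (or observed P-value) defining the rejection region
    power: probability of reaching that level if the association is real
    prior: prior probability that the association is real
    """
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


print(round(fprp(0.05, 0.8, 0.5), 4))
print(round(fprp(1e-4, 0.8, 1e-5), 4))
```

The second call returns about 0.93. Even a P-value of 10⁻⁴ leaves a high probability of a false report when the prior is very small, in line with the abstract's point that a small P-value does not by itself mean a small chance of a spurious finding.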

19.
20.
There is strong evidence that rare variants are involved in complex disease etiology. The first step in implicating rare variants in disease etiology is their identification through sequencing in both randomly ascertained samples (e.g., the 1000 Genomes Project) and samples ascertained according to disease status. We investigated to what extent rare variants will be observed across the genome and in candidate genes in randomly ascertained samples, the magnitude of variant enrichment in diseased individuals, and biases that can occur due to how variants are discovered. Although sequencing cases can enrich for causal variants, when a gene or genes are not involved in disease etiology, limiting variant discovery to cases can lead to association studies with dramatically inflated false positive rates.
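A simple calculation shows how likely a rare variant is to be observed in a randomly ascertained sample. Treat the 2n chromosomes of n diploid individuals as independent draws from a population where the variant has allele frequency f. The probability of seeing it at least once is then 1 − (1 − f)^(2n). This assumes random sampling and error-free genotyping, and it ignores disease-based ascertainment. It is an illustration, not the authors' model.

```python
def prob_variant_observed(freq, n_individuals):
    """Probability that a variant with allele frequency `freq` is seen at least
    once among n_individuals randomly sampled diploid genomes.

    Assumption: the 2 * n_individuals chromosomes are independent draws and
    genotyping is error-free.
    """
    return 1 - (1 - freq) ** (2 * n_individuals)


for f in (0.01, 0.001, 0.0001):
    print(f, round(prob_variant_observed(f, 500), 3))
```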
