Similar articles
20 similar articles found
1.
The detrimental effects of the winner’s curse, including overestimation of the genetic effects of associated variants and underestimation of the sample sizes needed for replication studies, are well recognized in genome-wide association studies (GWAS). These effects can be expected to worsen as the field moves from GWAS to whole-genome sequencing. To date, few studies have reported statistical adjustments to the naive estimates, owing to the lack of suitable statistical methods and computational tools. We have developed an efficient genome-wide non-parametric method that explicitly accounts for the threshold, ranking, and allele frequency effects in whole-genome scans. Here, we implement the method to provide bias-reduced estimates via bootstrap re-sampling (BR-squared) for association studies of both disease status and quantitative traits, and we report the results of applying BR-squared to GWAS of psoriasis and HbA1c. We observed reductions of over 50% in the estimated genetic effect sizes of many associated SNPs. This translates into a greater than fourfold increase in the sample sizes required for successful replication studies, which partly explains some of the apparent failures to replicate the original signals. Our analysis suggests that adjusting for the winner’s curse is critical for interpreting findings from whole-genome scans, for planning replication and meta-GWAS studies, and for translating findings into the clinical setting.
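
A simplified sketch of bootstrap bias reduction for the winner's curse may help make the idea concrete: in each bootstrap replicate, SNPs are detected and ranked in the bootstrap sample, the same SNPs are re-estimated in the out-of-bag individuals, and the average rank-specific difference is subtracted from the naive estimate at that rank. This is an illustration written under stated assumptions, not the authors' BR-squared software: the function names, the per-SNP linear regression, ranking by z-statistic, and the simulated data are all hypothetical.

```python
import math
import random
import statistics


def regress(x, y):
    """Simple linear regression of y on x with intercept; returns (slope, z-statistic)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, 0.0  # monomorphic in this subsample: no information
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, (b / se if se > 0 else math.copysign(math.inf, b))


def scan(genos, y, idx):
    """Genome scan restricted to the individuals in idx: one (slope, z) per SNP."""
    ys = [y[i] for i in idx]
    return [regress([g[i] for i in idx], ys) for g in genos]


def br_squared_sketch(genos, y, top_k=3, n_boot=30, seed=1):
    """Bias-reduced effect sizes for the top_k SNPs via bootstrap detection/estimation splits."""
    rng = random.Random(seed)
    n = len(y)
    naive = scan(genos, y, range(n))
    top = sorted(range(len(genos)), key=lambda j: -abs(naive[j][1]))[:top_k]
    bias = [0.0] * top_k
    used = 0
    for _ in range(n_boot):
        boot = [rng.randrange(n) for _ in range(n)]      # detection sample
        drawn = set(boot)
        oob = [i for i in range(n) if i not in drawn]    # out-of-bag estimation sample
        if len(oob) < 3:
            continue
        det = scan(genos, y, boot)
        ranked = sorted(range(len(genos)), key=lambda j: -abs(det[j][1]))[:top_k]
        for k, j in enumerate(ranked):
            b_det = det[j][0]
            b_est = regress([genos[j][i] for i in oob], [y[i] for i in oob])[0]
            # Selection bias on the magnitude scale for the SNP ranked k in this replicate
            bias[k] += abs(b_det) - math.copysign(1.0, b_det) * b_est
        used += 1
    bias = [b / max(used, 1) for b in bias]
    # Shrink the naive estimate at each original rank by that rank's average bias
    return {j: (naive[j][0], math.copysign(1.0, naive[j][0]) * (abs(naive[j][0]) - bias[k]))
            for k, j in enumerate(top)}


# Hypothetical simulated data: 300 individuals, 30 SNPs (MAF 0.3), only SNP 0 is causal
rng = random.Random(7)
n, m = 300, 30
genos = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)] for _ in range(m)]
y = [0.25 * genos[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
result = br_squared_sketch(genos, y)
for snp, (naive_b, adjusted_b) in result.items():
    print(f"SNP {snp}: naive {naive_b:+.3f}, bias-reduced {adjusted_b:+.3f}")
```

The key design choice is that the bias is estimated per rank, not per SNP, because the winner's curse depends on where a SNP lands in the ranking rather than on its identity.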

2.
So HC  Yip BH  Sham PC 《PloS one》2010,5(11):e13898
Recently, genome-wide association studies (GWAS) have identified numerous susceptibility variants for complex diseases. In this study we proposed several approaches to estimate the total number of variants underlying these diseases. We assume that the variance explained by genetic markers (Vg) follows an exponential distribution, an assumption justified by previous studies on theories of adaptation. Our aim is to fit the observed distribution of Vg from GWAS to its theoretical distribution. The number of variants is obtained by dividing the heritability by the estimated mean of the exponential distribution. In practice, limited sample sizes leave insufficient power to detect variants with small effects, so power was taken into account in the fitting. Besides considering the most significant variants, we also relaxed the significance threshold, allowing more markers to be fitted. The effects of false-positive variants were removed by considering local false discovery rates. In addition, we developed an alternative approach that directly fits the z-statistics from GWAS to their theoretical distribution. In all cases, the "winner's curse" effect was corrected analytically, and confidence intervals were derived. Simulations were performed to compare and verify the performance of the different estimators (which incorporate various means of winner's curse correction) and the coverage of the proposed analytic confidence intervals. Our methodology requires only summary statistics and can handle both binary and continuous traits. Finally, we applied the methods to a few real disease examples (lipid traits, type 2 diabetes and Crohn's disease) and estimated that hundreds to nearly a thousand variants underlie these traits.
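
The "heritability divided by the mean of the exponential distribution" step can be sketched as follows. This is a deliberately simplified stand-in: it assumes a hard detection threshold on Vg instead of the power-weighted fitting the authors describe, and it does not correct the detected Vg values for the winner's curse. Under that assumption the memoryless property of the exponential distribution makes the maximum-likelihood estimate of the mean equal to the mean excess over the threshold. The function name and input values are hypothetical.

```python
import statistics


def estimate_n_variants(vg_detected, threshold, heritability):
    """Estimate (number of variants, mean Vg) from Vg values of detected variants.

    Assumes Vg ~ Exponential(mean mu) and that only variants with Vg > threshold are
    observed. By memorylessness, Vg - threshold is again Exponential(mu), so the MLE of
    mu is the mean excess over the threshold.
    """
    excess = [v - threshold for v in vg_detected if v > threshold]
    if not excess:
        raise ValueError("no variants above the detection threshold")
    mu = statistics.fmean(excess)
    return heritability / mu, mu


# Hypothetical inputs: Vg of four detected variants, a hard threshold, heritability 0.3
n_hat, mu_hat = estimate_n_variants([0.0012, 0.0018, 0.0030, 0.0015], 0.001, 0.3)
print(f"mean Vg = {mu_hat:.6f}, estimated number of variants = {n_hat:.0f}")
```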

3.
Kuo CL  Zaykin DV 《Genetics》2011,189(1):329-340
In recent years, genome-wide association studies (GWAS) have uncovered a large number of susceptibility variants. Nevertheless, GWAS findings provide only tentative evidence of association, and replication studies are required to establish their validity. Because of this uncertainty, researchers often focus on top-ranking SNPs rather than on strict significance thresholds to guide replication efforts, and the number of SNPs taken forward for replication is often determined ad hoc. We show how the rank-based approach can be used for sample size allocation in GWAS as well as for deciding on the number of SNPs for replication. The basis of this approach is the "ranking probability": the probability that at least j true associations will rank among the top u SNPs when SNPs are sorted by P-value. By employing simple but accurate approximations for ranking probabilities, we accommodate linkage disequilibrium (LD) and evaluate the consequences of ignoring LD. Further, we relate ranking probabilities to the proportion of false discoveries among the top u SNPs. A study-specific proportion can be estimated from P-values, and its expected value can be predicted for study design applications.
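
The ranking probability defined above can be estimated by brute-force Monte Carlo, which is a useful sanity check even though the authors use analytic approximations. This sketch assumes independent SNPs (no LD), normally distributed z-statistics with a common mean shift for the true associations, and ranking by |z|; the function name and parameter values are hypothetical.

```python
import heapq
import random


def ranking_probability(m, k, shift, u, j, reps=500, seed=1):
    """Monte Carlo estimate of P(at least j of k true associations are in the top u of m SNPs).

    Assumes independent SNPs (no LD): true associations have z ~ N(shift, 1),
    null SNPs have z ~ N(0, 1), and SNPs are ranked by |z| (i.e. by two-sided P-value).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        stats = [(abs(rng.gauss(shift, 1.0)), True) for _ in range(k)]
        stats += [(abs(rng.gauss(0.0, 1.0)), False) for _ in range(m - k)]
        top_u = heapq.nlargest(u, stats, key=lambda s: s[0])  # the u smallest P-values
        if sum(is_true for _, is_true in top_u) >= j:
            hits += 1
    return hits / reps


# Hypothetical design: 1,000 SNPs, 10 true associations, keep the top 50, want at least 5
print(ranking_probability(m=1000, k=10, shift=4.0, u=50, j=5))
```

Scanning u for the smallest value that reaches a target probability gives a rank-based rule for how many SNPs to carry into replication.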

4.
Li MX  Sham PC  Cherny SS  Song YQ 《PloS one》2010,5(12):e14480

Background

We are moving to a second wave of analysis of genome-wide association studies (GWAS), characterized by comprehensive bioinformatic and statistical evaluation of genetic associations. Existing biological knowledge is very valuable for GWAS and may help improve their detection power, particularly for disease susceptibility loci of moderate effect size. However, a challenging question is how to use the available, highly heterogeneous resources to quantitatively evaluate statistical significance.

Methodology/Principal Findings

We present a novel knowledge-based weighting framework to boost the power of GWAS and strengthen their exploratory performance for follow-up replication and deep sequencing. Built on diverse integrated biological knowledge, the framework directly models both prior functional information and the association significance emerging from GWAS to optimally prioritize single nucleotide polymorphisms (SNPs) for subsequent replication. In theoretical calculations and computer simulations, it showed the potential to gain more than 15% additional power to identify an association signal of moderate strength, or to reach similar power with hundreds fewer whole-genome subjects. In a proof-of-principle case study on late-onset Alzheimer disease (LOAD), it highlighted several genes that showed positive association with LOAD in previous independent studies, as well as two important LOAD-related pathways. These genes and pathways might otherwise have been overlooked because the SNPs involved had only moderate association significance.

Conclusions/Significance

Implemented in a user-friendly, open-source Java package, this framework will provide an important complementary approach for identifying more true susceptibility loci with modest or even small effect sizes in current GWAS of complex diseases.
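
The general principle of letting prior knowledge re-weight GWAS P-values can be illustrated with standard p-value weighting (weighted Bonferroni): prior scores are rescaled so the weights average 1, and SNP i is declared significant when p_i ≤ α·w_i/m, which keeps the family-wise error rate at α. This is not the authors' framework, whose way of combining heterogeneous knowledge is not described here; the prior scores and P-values below are hypothetical.

```python
def normalize_weights(prior_scores):
    """Rescale non-negative prior-knowledge scores so the weights average 1."""
    total = sum(prior_scores)
    m = len(prior_scores)
    return [s * m / total for s in prior_scores]


def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / m; weights averaging 1 keep FWER <= alpha."""
    m = len(pvalues)
    return [p <= alpha * w / m for p, w in zip(pvalues, weights)]


# Hypothetical: SNP 0 lies in a gene with strong prior functional support
weights = normalize_weights([3.0, 1.0, 1.0, 1.0])
pvals = [0.02, 0.01, 0.03, 0.5]
print(weighted_bonferroni(pvals, weights))    # → [True, False, False, False]
print(weighted_bonferroni(pvals, [1.0] * 4))  # → [False, True, False, False]
```

The example shows the trade-off: a moderately significant SNP with strong prior support is promoted, at the cost of a stricter threshold for SNPs without such support.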

5.
Background

Genome-wide association studies (GWAS) are a major method for studying the genetics of complex diseases. Finding all the sequence variants that fully explain the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants and hence increase the power of a study. While pathway analysis has previously been performed within GWAS only, it can now be extended to rare variants, other “-omics” data and interaction data.

Scope of review

1. Factors to consider in choosing software for GWAS pathway analysis.
2. Examples of how pathway analysis is used to analyse rare variants, other “-omics” data and interaction data.

Major conclusions

When choosing software tools, factors to consider include covariate compatibility, the null hypothesis, whether a one- or two-step analysis is required, the curation method for gene sets, pathway size, and the size of the flanking regions used to define gene boundaries. For rare variants, analysis performance depends on the consistency between the assumed and actual effect distributions of the variants. Integrating other “-omics” and interaction data can better explain gene functions.

General significance

Pathway analysis methods will be more readily used to integrate multiple sources of data and will enable more accurate prediction of phenotypes.
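
As one concrete example of a two-step, competitive pathway test, the sketch below computes a one-sided hypergeometric over-representation P-value for a gene set. It assumes gene-level significance calls have already been made (for example after mapping SNPs to genes with chosen flanking regions); self-contained tests and rank-based methods behave differently. The gene counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(total_genes, pathway_size, significant_genes, overlap):
    """One-sided hypergeometric P(X >= overlap): the chance of seeing at least `overlap`
    significant genes in a pathway of `pathway_size` genes when `significant_genes`
    of `total_genes` are called significant at random."""
    top = min(pathway_size, significant_genes)
    numerator = sum(
        comb(pathway_size, x) * comb(total_genes - pathway_size, significant_genes - x)
        for x in range(overlap, top + 1)
    )
    return numerator / comb(total_genes, significant_genes)


# Hypothetical: 20,000 genes, 500 called significant, a 100-gene pathway containing 8 of them
print(enrichment_pvalue(20_000, 100, 500, 8))
```

Because the test conditions on the gene-level calls, choices such as flanking-region size and pathway size (both listed above) directly change its inputs.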

6.
Measurement error in a phenotypic trait reduces the power to detect genetic associations. We examined the impact of sample size, allele frequency and effect size in the presence of measurement error for quantitative traits. The statistical power to detect genetic association with phenotype mean and variability was investigated analytically. The non-centrality parameter of a non-central F distribution was derived and verified using computer simulations. We obtained equivalent formulas for the cost of phenotype measurement error. The effects of differences in measurement were examined in a genome-wide association study (GWAS) of two grading scales for cataract and in a replication study of genetic variants influencing blood pressure. The mean absolute difference between analytic and simulated power for comparisons of phenotypic means and variances was less than 0.005, and the maximum absolute difference did not exceed 0.02. To maintain the same power, a measurement error of one standard deviation (SD) for a standard normally distributed trait required a one-fold increase in sample size (i.e., a doubling) for comparison of means, and a three-fold increase (i.e., a quadrupling) for comparison of variances. The GWAS results revealed almost no overlap in the significant SNPs (p < 10⁻⁵) between the two cataract grading scales, whereas the replication results for blood pressure variants showed no significant differences between averaged and single blood pressure measurements. We have developed a framework for researchers to quantify power in the presence of measurement error, which will be applicable to studies of phenotypes whose measurement is highly variable.
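
The "one-fold increase in sample size" for comparing means can be reproduced with a simple power approximation. This sketch uses a 1-df additive test with a normal approximation to the chi-square, rather than the non-central F derivation in the paper, and covers only the comparison of means. It assumes Hardy-Weinberg genotypes, a small effect (so residual variance ≈ trait variance), and classical measurement error that inflates phenotypic variance without biasing the effect; all parameter values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_additive(n, maf, beta, trait_var=1.0, error_var=0.0, alpha=5e-8):
    """Approximate power of a 1-df additive test of a quantitative trait.

    Assumptions: Hardy-Weinberg genotypes (Var(g) = 2*maf*(1-maf)), a small effect so that
    the residual variance is approximately trait_var, and classical measurement error that
    adds error_var to the phenotypic variance without changing beta.
    """
    var_g = 2 * maf * (1 - maf)
    ncp = n * beta ** 2 * var_g / (trait_var + error_var)  # chi-square noncentrality
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Measurement error of 1 SD on a unit-variance trait: doubling n restores the original power
print(power_additive(5000, 0.3, 0.1))
print(power_additive(5000, 0.3, 0.1, error_var=1.0))
print(power_additive(10000, 0.3, 0.1, error_var=1.0))
```

Since the noncentrality is proportional to n/(σ² + σₑ²), an error variance equal to the trait variance is exactly offset by doubling n.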

7.

Background

Peptide identification from tandem mass spectrometry (MS/MS) data is one of the most important problems in computational proteomics. It relies heavily on accurate assessment of the quality of peptide-spectrum matches (PSMs). However, current MS technology and PSM scoring algorithms are far from perfect, leading to incorrect peptide-spectrum pairs. Thus, it is critical to develop new post-processing techniques that can effectively distinguish true identifications from false ones.

Results

In this paper, we present a consistency-based PSM re-ranking method to improve the initial identification results. The method makes one additional assumption: that two peptides belonging to the same protein should be correlated with each other. We formulate an optimization problem that embraces two objectives through regularization: smoothing consistency among the scores of correlated peptides, and fitting consistency between the new scores and the initial scores. This optimization problem can be solved analytically. Experiments on several real MS/MS data sets show that this re-ranking method improves identification performance.

Conclusions

The score regularization method can be used as a general post-processing step for improving peptide identifications. Source codes and data sets are available at: http://bioinformatics.ust.hk/SRPI.rar.
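
One plausible way to write the two objectives is a graph-regularized least-squares problem: minimize Σᵢ(fᵢ − yᵢ)² + λ Σ₍ᵢ,ⱼ₎ wᵢⱼ(fᵢ − fⱼ)², where y are the initial PSM scores and edges join peptides from the same protein. Setting the gradient to zero gives the closed form (I + λL)f = y, with L the graph Laplacian. This formulation, the edge weights and λ are assumptions for illustration; the exact objective in the authors' code may differ.

```python
def regularize_scores(scores, edges, lam=1.0):
    """Solve (I + lam * L) f = scores, where L is the Laplacian of the weighted peptide graph.

    edges: iterable of (i, j, weight) linking correlated peptides (e.g. same protein).
    """
    n = len(scores)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j, w in edges:  # add lam * w * (e_i - e_j)(e_i - e_j)^T
        a[i][i] += lam * w
        a[j][j] += lam * w
        a[i][j] -= lam * w
        a[j][i] -= lam * w
    b = list(scores)
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    f = [0.0] * n
    for r in range(n - 1, -1, -1):
        f[r] = (b[r] - sum(a[r][c] * f[c] for c in range(r + 1, n))) / a[r][r]
    return f


# Hypothetical: PSMs 0 and 1 map to the same protein (edge weight 1); PSM 2 is unconnected
print(regularize_scores([1.0, 0.0, 0.5], [(0, 1, 1.0)], lam=1.0))  # ≈ [0.667, 0.333, 0.5]
```

Because Laplacian rows sum to zero, the regularized scores keep the same total as the initial scores; λ controls how strongly connected peptides are pulled toward each other.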

8.
Introduction

Stroke is a multifactorial, heterogeneous and heritable disorder and is considered one of the major diseases. Previous reports have used various designs, such as genome-wide association studies (GWAS), replication, case-control, cross-sectional and meta-analysis studies, yet a diagnostic marker is still lacking worldwide. Few studies have been carried out in the Saudi population, and we aimed to investigate the association of single nucleotide polymorphisms (SNPs) identified through GWAS and meta-analysis studies with stroke in the Saudi population.

Methods

In this case-control study, we selected 207 cases and 207 controls with an equal gender distribution at King Saud University Hospital in the capital city of Saudi Arabia. Peripheral blood (5 mL) was collected into two different vacutainers: 3 mL of coagulated blood was used for lipid analysis (biochemical tests) and 2 mL for DNA analysis (molecular tests). Genomic DNA was extracted from the collected blood samples, specific primers were designed for the selected SNPs (SORT1 rs646218 and OLR1 rs11053646), PCR-RFLP was performed, and DNA sequencing of randomly selected samples was carried out to cross-check the results.

Results

The rs646218 and rs11053646 polymorphisms were significantly associated with stroke under the allelic, genotypic and dominant models, using both crude odds ratios (ORs) and multiple logistic regression analysis (p < 0.05). Correlation analysis between lipid profiles and genotypes confirmed a significant relationship between triglycerides and the rs646218 and rs11053646 polymorphisms. In addition, the rs11053646 polymorphism was correlated with HDL-C (p = 0.04). When genotypes were compared separately between male cases and male controls and between female cases and female controls, rs11053646 was associated with stroke in male subjects under the dominant model and for heterozygous genotypes (p < 0.05).

Conclusion

The results of the current study confirmed that the SORT1 and OLR1 SNPs are associated with stroke in the Saudi population, consistent with previous findings from GWAS and meta-analyses. However, studies in other ethnic populations should be performed to confirm the role of these variants in hereditary disease.
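
Crude odds ratios of the kind reported above are usually computed from a 2×2 table with a Woolf (log-scale) confidence interval. The sketch below shows that calculation for an allelic model; a dominant model uses the same function with carrier versus non-carrier counts. The counts are hypothetical, not the study's data, and zero cells would need a continuity correction that is not included here.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Crude odds ratio with a Woolf 95% CI.

    a, b: risk-allele and other-allele counts in cases; c, d: the same in controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical allele counts (not from the study): risk allele vs other allele
print(odds_ratio_ci(a=150, b=264, c=110, d=304))
```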

9.
Ma L  Han S  Yang J  Da Y 《PloS one》2010,5(11):e15006
Complex diseases or phenotypes may involve multiple genetic variants and interactions between genetic, environmental and other factors. Current genome-wide association studies (GWAS) have mostly used single-locus analysis and have identified genetic effects with multiple confirmations. Such confirmed single-nucleotide polymorphism (SNP) effects are likely to be true genetic effects, and ignoring this information when testing new effects for the same phenotype decreases statistical power, because the omitted effects inflate the residual variance. In this study, a multi-locus association test (MLT) conditional on SNPs with confirmed effects was proposed for GWAS analysis to improve statistical power. Analytical formulae for statistical power were derived and verified by simulation for the MLT accounting for confirmed SNPs and for the single-locus test (SLT) without accounting for confirmed SNPs. The statistical power of the two methods was compared in case studies using simulated data and the Framingham Heart Study (FHS) GWAS data. The MLT method had greater statistical power than the SLT. In the GWAS case study on four cholesterol phenotypes and serum metabolites, the MLT method improved statistical power by 5% to 38%, depending on the number and effect sizes of the conditional SNPs. For the analysis of HDL cholesterol (HDL-C) and total cholesterol (TC) in the FHS data, the MLT method conditional on confirmed SNPs from the GWAS catalog and NCBI yielded considerably more significant results than the SLT.
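
The power gain from conditioning on a confirmed SNP can be seen in a small simulation. By the Frisch–Waugh–Lovell theorem, the t statistic for the test SNP in y ~ g1 + g2 equals the slope t statistic of residualized y on residualized g2 (with n − 3 degrees of freedom), so the conditional test can be written with plain regressions. The effect sizes, allele frequencies and helper names are hypothetical, and this is a two-SNP illustration rather than the authors' MLT implementation.

```python
import math
import random
import statistics


def center(v):
    m = statistics.fmean(v)
    return [x - m for x in v]


def residualize(x, y):
    """Residuals of y after least-squares regression on x with an intercept."""
    xc, yc = center(x), center(y)
    b = sum(a * c for a, c in zip(xc, yc)) / sum(a * a for a in xc)
    return [c - b * a for a, c in zip(xc, yc)]


def slope_t(x, y, df):
    """t statistic of the no-intercept slope of centred y on centred x."""
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    rss = sum((c - b * a) ** 2 for a, c in zip(x, y))
    return b / math.sqrt(rss / df / sxx)


# Hypothetical simulation: g1 is a confirmed SNP with a large effect, g2 is the SNP under test
rng = random.Random(42)
n = 5000
g1 = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
g2 = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
y = [2.0 * a + 0.3 * b + rng.gauss(0.0, 1.0) for a, b in zip(g1, g2)]

slt = slope_t(center(g2), center(y), n - 2)                    # single-locus test
mlt = slope_t(residualize(g1, g2), residualize(g1, y), n - 3)  # conditional on g1
print(f"SLT t = {slt:.1f}, MLT t = {mlt:.1f}")
```

The larger MLT statistic comes entirely from removing g1's contribution to the residual variance, which is the mechanism described in the abstract.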

10.
Colorectal cancer is the second leading cause of cancer death in developed countries. Genome-wide association studies (GWAS) have successfully identified novel susceptibility loci for colorectal cancer. To follow up on these findings and to try to identify novel colorectal cancer susceptibility loci, we present results for GWAS of colorectal cancer (2,906 cases, 3,416 controls) whose main associations have not previously been published. Specifically, we calculated odds ratios and 95% confidence intervals using log-additive models for each study. To improve our power to detect novel colorectal cancer susceptibility loci, we performed a meta-analysis combining the results across studies. We selected the most statistically significant single nucleotide polymorphisms (SNPs) for replication using ten independent studies (8,161 cases and 9,101 controls). We again used meta-analysis to summarize results for the replication studies alone and for a combined analysis of GWAS and replication studies. We measured ten SNPs previously identified in colorectal cancer susceptibility loci and found eight to be associated with colorectal cancer (p value range 0.02 to 1.8 × 10⁻⁸). When we excluded studies that had previously published on these SNPs, five SNPs remained significant at p …
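
The abstract does not say which meta-analysis model was used; a common choice for combining per-study log-additive results is the fixed-effect inverse-variance method sketched below. The per-study log odds ratios and standard errors are hypothetical, and a random-effects model would add a between-study variance term.

```python
import math


def ivw_meta(log_ors, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns (pooled OR, SE of the pooled log OR, two-sided P-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    se_pooled = math.sqrt(1.0 / total)
    z = pooled / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P, accurate for large |z|
    return math.exp(pooled), se_pooled, p


# Hypothetical per-study estimates for one SNP
print(ivw_meta([0.12, 0.08, 0.15], [0.05, 0.04, 0.07]))
```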

11.
Association signals in GWAS are usually prioritized solely by p values. Here, we attempt to improve the power of GWAS by using a weighted false discovery rate control procedure to detect associations of low-frequency variants with effect sizes similar to or even larger than those of common variants. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to test for association with fasting glucose levels in the Atherosclerosis Risk in Communities Study (ARIC) population. In addition to finding several previously identified sequence variations, we identified a low-frequency variant (rs1209523; minor allele frequency = 0.043) near FOXA2 that was associated with fasting glucose levels in European Americans (EAs) (n = 7428, p value = 1.3 × 10⁻⁵). The association between rs1209523 and glucose levels was also significant in African Americans (AAs) (n = 2029, p value = 6.7 × 10⁻³) of the ARIC and was confirmed by replication in both EAs and AAs of the Dallas Heart Study (n = 963 and 1571, respectively; p values = 5.3 × 10⁻³ and 5.8 × 10⁻⁴, respectively) and in EAs of the Cooper Center Longitudinal Study (n = 2862; p value = 1.6 × 10⁻²). A meta-analysis of these five populations yielded an estimated effect size of −1.31 mg/dl per minor allele (p value = 2.2 × 10⁻¹¹). This study reveals that there is a cache of less-frequent variants in GWAS arrays that can be identified via analytical approaches accounting for allele frequencies.
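
A weighted false discovery rate procedure of the general kind mentioned above applies the Benjamini–Hochberg step-up rule to weighted P-values pᵢ/wᵢ, with weights averaging 1. How the study derived its weights (for example from allele frequency) is not specified here, so the weights and P-values below are hypothetical.

```python
def weighted_bh(pvalues, weights, q=0.05):
    """Benjamini-Hochberg step-up on weighted P-values p_i / w_i (weights should average 1)."""
    m = len(pvalues)
    adjusted = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= q * rank / m:
            cutoff = rank  # step-up: keep the largest rank that passes
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(m)]


pvals = [0.001, 0.02, 0.04, 0.5]
print(weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0]))  # → [True, True, False, False]
# Hypothetical up-weighting of SNP 2 (e.g. a low-frequency variant of interest)
print(weighted_bh(pvals, [0.5, 0.5, 2.5, 0.5]))  # → [True, False, True, False]
```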

12.
The level of population structure and the extent of linkage disequilibrium (LD) can have large impacts on the power, resolution, and design of genome-wide association studies (GWAS) in plants. Until recently, LD and population structure had not been explored in oat because of the lack of a high-throughput, high-density marker system. The objectives of this research were to survey the level of population structure and the extent of LD in oat germplasm and to determine their implications for GWAS. In total, 1,205 lines and 402 Diversity Arrays Technology (DArT) markers were used to explore population structure. Principal component analysis and model-based cluster analysis indicated that relatively weak population structure exists among the lines used in this study. To explore LD decay, map distances of 2,225 linked DArT marker pairs were compared with LD (estimated as r²). LD between linked markers decayed rapidly, reaching r² = 0.2 for marker pairs with a map distance of 1.0 centimorgan (cM). For GWAS, we suggest a minimum of one marker per cM, although higher marker densities should increase marker–QTL association and therefore detection power. Additionally, LD was relatively consistent across the majority of germplasm clusters. These findings suggest that GWAS in oat can include germplasm with diverse origins and backgrounds, and the results demonstrate the feasibility of GWAS and related analyses in oat.
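
For largely homozygous lines scored with presence/absence markers such as DArT, LD between two markers is commonly estimated as the squared Pearson correlation of their 0/1 scores. The sketch below assumes complete data and no heterozygotes; the marker vectors are hypothetical.

```python
def ld_r2(marker_a, marker_b):
    """Squared Pearson correlation between two 0/1 marker score vectors (inbred lines)."""
    n = len(marker_a)
    ma = sum(marker_a) / n
    mb = sum(marker_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(marker_a, marker_b))
    va = sum((a - ma) ** 2 for a in marker_a)
    vb = sum((b - mb) ** 2 for b in marker_b)
    if va == 0 or vb == 0:
        raise ValueError("monomorphic marker: r^2 is undefined")
    return cov * cov / (va * vb)


# Hypothetical scores for two linked markers across six lines
print(ld_r2([0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 0, 0]))
```

Plotting r² for many marker pairs against their map distance, and finding where it drops to 0.2, gives the kind of LD-decay summary reported above.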

13.
Progress in genome-wide association studies of essential hypertension   Times cited: 2 (self-citations: 0; citations by others: 2)
Xu RW  Yan WL 《Hereditas (Beijing)》2012,34(7):793-809
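Essential hypertension is a complex disease caused jointly by genetic and environmental factors, with a high degree of genetic heterogeneity. Since the first genome-wide association study (GWAS) of hypertension was reported in 2007, many GWAS have been carried out. This article first summarizes, by ethnicity and chromosomal position, the results of 24 GWAS of blood pressure/hypertension susceptibility genes reported between January 2007 and September 2011; the loci rs17249754, rs1378942 and rs11191548 were reported most frequently. It then introduces methodological advances in GWAS, including choosing high-quality quantitative phenotypes and multi-stage study designs to increase the chance of detecting positive associations. On the statistical side, in addition to emphasizing previously discussed issues such as multiple comparisons and replication (validation) studies, the article describes in-depth mining of GWAS data through meta-analysis, and the use of genotype imputation to fill in missing data and increase genome-wide marker coverage. Although GWAS have revealed many previously unknown associations between genes and disease phenotypes and provided more clues to the pathogenesis of hypertension, the blood pressure/hypertension-associated variants found by GWAS so far are mostly common variants with extremely weak effects on population blood pressure. Future research could therefore strengthen in-depth functional studies for fine-mapping susceptibility genes, apply exome sequencing, and, building on GWAS results, conduct bioinformatic pathway analyses and studies of epigenetic mechanisms, so as to gradually reveal the genetic mechanisms of hypertension.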

14.
In genome-wide association studies (GWAS), studying multiple diseases with shared controls is one of the case–control designs in use. If the resulting data are appropriately analyzed, this design has several advantages, such as improving statistical power to detect associations and reducing the time and cost of data collection. In this paper, we propose a GWAS design that involves multiple diseases but no controls, together with a corresponding statistical analysis strategy. Through a simulation study, we show that the association test under the proposed design is more powerful than the test for a single disease sharing common controls, and that it has power comparable to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodology and the advantages of the proposed design. Some possible limitations of this design and testing method, and their solutions, are also discussed. Our findings indicate that the proposed design and analysis strategy could be more efficient than the usual case–control GWAS as well as those with shared controls.

15.
Capsule: Pairs of White-throated Dippers Cinclus cinclus which defended winter territories bred earlier than non-territorial individuals, but there was no difference in reproductive success.

Aims: The effect of winter territoriality on breeding ecology has rarely been studied in resident birds. We carried out a preliminary investigation of whether winter territorial behaviour and territory size affect the timing of reproduction, breeding territory size and reproductive success in a riverine bird, the White-throated Dipper.

Methods: We monitored an individually marked population of White-throated Dippers in the UK. Wintering individuals were classified as either territorial or ‘floaters’ according to their patterns of occurrence and behaviour, and their nesting attempts were closely monitored in the subsequent months. Winter and breeding territory sizes were measured by gently ‘pushing’ birds along the river and recording the point at which they turned back.

Results: All birds defending winter territories did so in pairs, but some individuals changed partners before breeding. Territorial pairs that were together throughout the study laid eggs significantly earlier than pairs containing floaters and those comprising territorial birds that changed partners. However, there were no significant differences in clutch size, nestling mass or the number of chicks fledged. There was no relationship between winter territory length and lay date or any measure of reproductive success, although sample sizes were small. Winter territories were found to be significantly shorter than breeding territories.

Conclusion: Winter territoriality may be advantageous because breeding earlier increases the likelihood that pairs will raise a second brood, but further study is needed. Territories are shorter in winter because altitudinal migrants from upland streams increase population density on rivers, although this may also reflect seasonal changes in nutritional and energetic demands.

16.
Background: Genome-wide association studies (GWAS) have been the primary tool for unbiased study of the genetic background of coronary artery disease (CAD) and have identified a list of single-nucleotide polymorphisms (SNPs) associated with CAD. In this study, we aimed to replicate the association of two GWAS-identified SNPs, rs2954029 and rs6982502, with CAD in an Iranian population.

Methods: The sample comprised 285 subjects undergoing coronary angiography, including 134 CAD patients and 151 healthy controls. Genotyping of the rs2954029 and rs6982502 SNPs was performed using high-resolution melting (HRM) analysis.

Results: The TT genotypes of rs2954029 (p = 0.009) and rs6982502 (p < 0.001) were significantly more frequent in CAD patients than in controls. Binary logistic regression showed that rs6982502 and rs2954029 increase the risk of CAD (2.470-fold, p = 0.011, 95% CI = [1.219–4.751], and 2.174-fold, p = 0.033, 95% CI = [1.066–4.433], respectively). After adjusting for confounders, rs6982502 and rs2954029 were significantly associated with CAD risk.

Conclusion: These data show that the TT genotypes of rs2954029 and rs6982502 are associated with the risk of CAD in a hospital-based sample of the Iranian population, replicating the results of recent GWAS.

Key Words: Coronary Artery Disease (CAD), Genome-Wide Association Studies (GWAS), High-Resolution Melting (HRM), Single-Nucleotide Polymorphisms (SNP)

17.
Capsule: Population sizes of Common Guillemots Uria aalge, Razorbills Alca torda and Lesser Black‐backed Gulls Larus fuscus were associated with prey abundance but not with prey quality.

Aims: To examine how the abundance and quality of prey fish affect seabird population size and to test the ‘junk‐food’ or nutritional stress hypothesis.

Methods: Analysis of long‐term seabird population size data and Sprat Sprattus sprattus biomass and age‐related weight data using a correlative approach.

Results: De‐trended seabird and Sprat population data showed that the abundance of Sprat, the main prey species, was associated with the abundance of seabirds, whereas no effect of the age‐related size of prey on seabird population size was found.

Conclusion: As the Sprat population increased, so did the seabird populations, regardless of decreases in the ‘quality’ of Sprats, implying that more prey fish simply means more food in this marine ecosystem. No support for the ‘junk‐food’ hypothesis was found, and the results contradict suggestions from earlier studies that prey quality is important to top predators in the Baltic Sea.
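
Correlating time series that both trend over the years can produce a strong correlation driven only by the shared trend, which is why de-trending precedes the correlative analysis. The abstract does not state the de-trending method; this sketch assumes linear de-trending against the year index, and the two series are hypothetical.

```python
def detrend(series):
    """Residuals from an ordinary least-squares straight-line fit against the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / stt
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical yearly series that share an upward trend but fluctuate in opposite directions
prey = [2 * t + (-1) ** t for t in range(10)]
birds = [2 * t - (-1) ** t for t in range(10)]
print(f"raw r = {pearson(prey, birds):.2f}, detrended r = {pearson(detrend(prey), detrend(birds)):.2f}")
```

Here the raw correlation is strongly positive only because of the shared trend; after de-trending, the year-to-year fluctuations turn out to be perfectly opposed.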

18.
Ghosh S  Fardo DW 《BMC genetics》2018,19(1):127-131
Background

The GAW20 group formed around the theme of methods for association analyses of repeated measures comprised four sets of investigators. The provided “real” data set included genotypes from a human whole-genome association study with longitudinal measurements of triglycerides (TGs) and high-density lipoprotein, in addition to methylation levels before and after administration of fenofibrate. The simulated data set contained 200 replications of methylation levels and posttreatment TGs, mimicking the real data set.

Results

The different investigators in the group focused on the statistical challenges unique to family-based association analyses of phenotypes measured longitudinally and applied a wide spectrum of statistical methods such as linear mixed models, generalized estimating equations, and quasi-likelihood–based regression models. This article discusses the varying strategies explored by the group’s investigators with the common goal of improving the power to detect association with repeated measures of a phenotype.

Conclusions

Although it is difficult to identify a common message emanating from the different contributions because of the diversity of the issues addressed, the unifying theme of the contributions lies in the search for novel analytic strategies to circumvent the limitations of existing methods for detecting genetic association.


19.
ABSTRACT

Background: Discrepancies in the shape of the productivity–diversity relationship may arise from differences in spatial scale. We hypothesised that there is a grain size effect on the productivity–diversity relationship.

Aims: To determine the effect of three sampling grain sizes on the productivity–diversity relationship.

Methods: We applied generalised linear mixed-effects models to community data from 735 vegetation plots in the Taleghan rangelands, Iran, sampled at three grain sizes (0.25, 1 and 2 m²), to ascertain plant productivity–diversity patterns while accounting for the effects of site, plant community type, disturbance, and life form.

Results: Overall, relationships between biomass and plant species richness were unimodal at grain sizes of 0.25 and 1 m², and asymptotic at 2 m². The chance occurrence of a single large shrub may overwhelm a small sampling unit, resulting in a high estimate of the sample’s biomass relative to its species richness. At larger grain sizes, by contrast, the relationship between biomass and species richness is more likely to reach an asymptote.

Conclusions: Shrubs are partly responsible for driving the relationship between plant biomass and species richness. Because shrub frequency is highly variable among small plots but not among large plots, their presence may result in unimodal productivity–diversity relationships at small but not at large grain sizes.

20.
Single nucleotide polymorphisms (SNPs) in the vitamin D pathway genes have been implicated in cutaneous melanoma (CM) risk, but their role in CM disease‐specific survival (DSS) remains obscure. We comprehensively analyzed the prognostic roles of 2669 common SNPs in the vitamin D pathway genes using data from a published genome‐wide association study (GWAS) at The University of Texas M.D. Anderson Cancer Center (MDACC), and then validated the SNPs of interest in another GWAS from the Nurses’ Health Study and Health Professionals Follow‐up Study. Of the 2669 SNPs, 203 were significantly associated with DSS in the MDACC dataset (P < 0.05 and false‐positive report probability < 0.2), of which 18 were tag SNPs. In the replication, two of these 18 SNPs showed nominal significance: VDBP rs12512631 T > C was associated with better DSS [combined hazard ratio (HR) = 0.66], as was RXRA rs7850212 C > A (combined HR = 0.38); both associations were further confirmed by the Fine and Gray competing‐risks regression model. Further bioinformatics analyses indicated that these loci may modulate the methylation status of the corresponding genes.
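
The false-positive report probability used as a filter above is usually computed with the Wacholder et al. (2004) formula, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P-value, 1 − β the power to detect an assumed effect size, and π the prior probability of a true association. The power and prior in the example are hypothetical; the abstract does not report the values the authors assumed.

```python
def fprp(p_value, power, prior):
    """False-positive report probability: Pr(no true association | a result this significant),
    given the power at an assumed effect size and the prior probability of a real association."""
    false_pos = p_value * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


# Hypothetical: P = 0.01, 80% power at the assumed hazard ratio, prior probability 0.1
print(f"FPRP = {fprp(0.01, 0.8, 0.1):.3f}")
```

With a lower prior (say 0.01) the same P-value gives an FPRP well above 0.2, which is why the threshold is sensitive to the prior chosen.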
