1.
Power for genetic association studies with random allele frequencies and genotype distributions
One of the first and most important steps in planning a genetic association study is the accurate estimation of the statistical power under a proposed study design and sample size. In association studies for candidate genes or in fine-mapping applications, allele and genotype frequencies are often assumed to be known when, in fact, they are unknown (i.e., random variables from some distribution). For example, if we consider a diallelic marker with allele frequencies of 0.5 and 0.5 and Hardy-Weinberg proportions, the three genotype frequencies are often assumed to be 0.25, 0.50, and 0.25, and the statistical power is calculated. Unfortunately, ignoring this source of variation can inflate the estimated power of the study. In the present article, we propose averaging the estimates of power over the distribution of the genotype frequencies to calculate the true estimate of power for a fixed allele frequency. For the usual situation, in which allele frequencies in a population are not known, we propose placing a prior distribution on the allele frequency, taking advantage of any available genotype information. This Bayesian approach provides a more accurate estimate of power. We present examples for quantitative and qualitative traits in cohort studies of unrelated individuals and results from an extensive series of examples that show that ignoring the uncertainty in allele frequencies can inflate the estimated power of the study. We also present the results from case-control studies and show that standard methods may also overestimate power. As discussed in this article, the approach of fixing allele frequencies even if they are not known is the common approach to power calculations. We show that ignoring the sources of variation in allele frequencies tends to result in overestimates of power and, consequently, in studies that are underpowered. Software in C is available at http://www.ambrosius.net/Power/.
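The averaging idea in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' C software: the function names, the additive z-test power approximation for a quantitative trait, and the Beta prior parameters are all assumptions chosen for illustration. Genotype counts are drawn from a multinomial with Hardy-Weinberg proportions, power is computed for each draw, and the results are averaged.

```python
import random
from statistics import NormalDist

_NORMAL = NormalDist()

def hwe_genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for allele frequency p of A."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def power_given_counts(counts, beta, sigma, alpha=0.05):
    """Approximate two-sided z-test power for an additive effect (assumed model).

    counts = (n_AA, n_Aa, n_aa); the genotype score is the number of A alleles.
    """
    n = sum(counts)
    scores = (2, 1, 0)
    mean = sum(s * c for s, c in zip(scores, counts)) / n
    sxx = sum(c * (s - mean) ** 2 for s, c in zip(scores, counts))
    ncp = beta * sxx ** 0.5 / sigma  # noncentrality of the slope z statistic
    z = _NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _NORMAL.cdf(ncp - z) + _NORMAL.cdf(-ncp - z)

def averaged_power(n, beta, sigma, draw_p, reps=2000, seed=0):
    """Average power over random genotype counts (and a random allele frequency)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        p = draw_p(rng)  # fixed value or a draw from a prior
        freqs = hwe_genotype_freqs(p)
        draws = rng.choices((0, 1, 2), weights=freqs, k=n)  # multinomial sample
        counts = tuple(draws.count(g) for g in (0, 1, 2))
        total += power_given_counts(counts, beta, sigma)
    return total / reps

# Fixed allele frequency 0.5 versus a hypothetical Beta(5, 5) prior on it.
fixed = averaged_power(100, 0.3, 1.0, lambda rng: 0.5)
prior = averaged_power(100, 0.3, 1.0, lambda rng: rng.betavariate(5, 5))
```

Comparing `fixed` and `prior` with the power computed at the expected genotype counts gives a rough feel for how much the usual fixed-frequency calculation may differ from the averaged one under these assumptions.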
2.
Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies has not been fully characterized. We propose a new approach of selecting tSNPs based on determining the set of SNPs with the highest power to detect association. Two-locus genotype frequencies are used in the power calculations. To show utility, we applied this power method to a large number of SNPs that had been genotyped in Caucasian samples. We demonstrate that a significant reduction in genotyping efforts can be achieved although the reduction depends on genotypic relative risk, inheritance mode and the prevalence of disease in the human population. The tSNP sets identified by our method are remarkably robust to changes in the disease model when small relative risk and additive mode of inheritance are employed. We have also evaluated the ability of the method to detect unidentified SNPs. Our findings have important implications in applying tSNPs from different data sources in association studies.
3.
Power calculations for matched case-control studies (total citations: 4; self-citations: 0; citations by others: 4)
W D Dupont 《Biometrics》1988,44(4):1157-1168
Power calculations are derived for matched case-control studies in terms of the probability po of exposure among the control patients, the correlation coefficient phi for exposure between matched case and control patients, and the odds ratio psi for exposure in case and control patients. For given Type I and Type II error probabilities alpha and beta, the odds ratio that can be detected with a given sample size is derived as well as the sample size needed to detect a specified value of the odds ratio. Graphs are presented for paired designs that show the relationship between sample size and power for alpha = .05, beta = .2, and different values of po, phi, and psi. The sample size needed for designs involving M matched control patients can be derived from these graphs by means of a simple equation. These results quantify the loss of power associated with increasing correlation between the exposure status of matched case and control patients. Sample size requirements are also greatly increased for values of po near 0 or 1. The relationship between sample size, psi, phi, and po is discussed and illustrated by examples.
4.
Brown BW 《Genetical research》2004,83(2):133-141
The transmission/disequilibrium test (TDT) and the affected sib pair test (ASP) both test for the association of a marker allele with some condition. Here, we present methods for calculating the probability of detecting the association (power) for a study examining a fixed number of families for suitability for the study and for calculating the number of such families to be examined. Both calculations use a genetic model for the association. The model considered posits a bi-allelic marker locus that is linked to a bi-allelic disease locus with a possibly nonzero recombination fraction between the loci. The penetrance of the disease is an increasing function of the number of disease alleles. The TDT tests whether the transmission by a heterozygous parent of a particular allele at a marker locus to an affected offspring occurs with probability greater than 0.5. The ASP tests whether transmission of the same allele to two affected sibs occurs with probability greater than 0.5. In either case, evidence that the probability is greater than 0.5 is evidence for association between the marker and the disease. Study inclusion criteria (IC) can greatly affect the necessary sample size of a TDT or ASP study. The IC we consider require a randomly selected parent, at least one parent, or both parents to be heterozygous. They also allow a specified minimum number of affected offspring to be required (TDT only). We use elementary probability calculations rather than complex mathematical manipulations or asymptotic methods (large sample size approximations) to compute power and requisite sample size for a proposed study. The advantages of these methods are simplicity and generality.
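The core TDT question (is an allele transmitted from heterozygous parents with probability greater than 0.5?) can be sketched with an exact one-sided binomial calculation. This is a simplified illustration only: it assumes a fixed number of informative transmissions and a known true transmission probability `tau`, and it ignores the genetic model and inclusion criteria that the abstract's method accounts for. The function names are hypothetical.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def tdt_exact_power(n, tau, alpha=0.05):
    """Exact one-sided power with n informative transmissions.

    Returns (c, power), where c is the smallest count of transmissions whose
    upper-tail probability under tau = 0.5 does not exceed alpha.
    """
    c = next(c for c in range(n + 2) if binom_upper_tail(c, n, 0.5) <= alpha)
    return c, binom_upper_tail(c, n, tau)

# Example call: 10 informative transmissions, assumed true probability 0.8.
critical_value, power = tdt_exact_power(10, 0.8)
```

Because the binomial is discrete, the attained significance level is usually below the nominal `alpha`, which is one reason exact calculations and large-sample approximations can disagree for small studies.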
5.
We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. However, the test depends on the scores based on the underlying genetic model and thus it may have substantial loss of power when the model is misspecified. Since the mode of inheritance will be unknown for complex diseases, we have developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power for a class of possible genetic models. The trend tests and robust trend tests were applied to a dataset of Genetic Analysis Workshop 14 from the Collaborative Study on the Genetics of Alcoholism.
6.
Probabilistic graphical models have been widely recognized as a powerful formalism in the bioinformatics field, especially in gene expression studies and linkage analysis. Although less well known in association genetics, many successful methods have recently emerged to dissect the genetic architecture of complex diseases. In this review article, we cover the applications of these models in the context of population association studies, such as linkage disequilibrium modeling, fine mapping and candidate gene studies, and genome-scale association studies. Significant breakthroughs of the corresponding methods are highlighted, but emphasis is also given to their current limitations, in particular, to the issue of scalability. Finally, we give promising directions for future research in this field.
7.
Genotyping technology now allows the rapid and affordable generation of million-SNP profiles for humans, leading to considerable activity in association mapping. Similar activity is anticipated for many plant species, including Brassica. These plant association mapping activities will require the same care in quality control and quality assurance as for humans. The subsequent analyses may draw upon the same body of theory that is described here in the language of quantitative genetics.
8.
König IR 《Briefings in bioinformatics》2011,12(3):253-258
Validation of genetic associations is understood to be a cornerstone for the scientific credibility of the results. To approach this topic, the general concept of genetic association studies is introduced briefly, followed by how the term 'validation' is used in the context of genetic association studies. As a central issue, reasons for the importance of validation and for failure of validation will be described.
9.
Li H 《Human genetics》2012,131(9):1395-1401
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and multiple rare and common variants at some of the causative loci contributing to the risk of these diseases. Data from the genome-wide association studies (GWAS) and metadata such as known gene functions and pathways provide the possibility of identifying genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants only explain very small percentages of the heritabilities. To account for the locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link the genomic similarity to phenotype similarity, have proved to be very useful for testing the associations between a set of single nucleotide polymorphisms and the phenotypes. Compared to single marker analysis, the advantage afforded by the U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links of these statistics with other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and rare variant association studies are discussed.
10.
Meta-analysis of genetic association studies (total citations: 11; self-citations: 0; citations by others: 11)
Meta-analysis, a statistical tool for combining results across studies, is becoming popular as a method for resolving discrepancies in genetic association studies. Persistent difficulties in obtaining robust, replicable results in genetic association studies are almost certainly because genetic effects are small, requiring studies with many thousands of subjects to be detected. In this article, we describe how meta-analysis works and consider whether it will solve the problem of underpowered studies or whether it is another affliction visited by statisticians on geneticists. We show that meta-analysis has been successful in revealing unexpected sources of heterogeneity, such as publication bias. If heterogeneity is adequately recognized and taken into account, meta-analysis can confirm the involvement of a genetic variant, but it is not a substitute for an adequately powered primary study.
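As a rough illustration of how results are combined across studies, the sketch below implements a standard fixed-effect (inverse-variance) pooled estimate together with Cochran's Q heterogeneity statistic. The abstract does not say which model the article uses, so this choice, the function name, and the example numbers are assumptions.

```python
def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance pooled estimate, its standard error, and Cochran's Q.

    estimates  : per-study effect estimates (for example, log odds ratios)
    std_errors : their standard errors
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    weight_sum = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / weight_sum
    pooled_se = weight_sum ** -0.5
    # Q is compared with a chi-square on (number of studies - 1) degrees of freedom.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q

# Hypothetical log odds ratios from three studies.
pooled, pooled_se, q = fixed_effect_meta([0.20, 0.35, 0.10], [0.10, 0.15, 0.08])
```

A large Q relative to its degrees of freedom is one signal of the heterogeneity that the abstract says must be recognized before a pooled estimate is trusted.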
11.
Background
Infectious disease of livestock continues to be a cause of substantial economic loss and has adverse welfare consequences in both the developing and developed world. New solutions to control disease are needed and research focused on the genetic loci determining variation in immune-related traits has the potential to deliver solutions. However, identifying selectable markers and the causal genes involved in disease resistance and vaccine response is not straightforward. The aims of this study were to locate regions of the bovine genome that control the immune response post immunisation. 195 F2 and backcross Holstein Charolais cattle were immunised with a 40-mer peptide derived from foot-and-mouth disease virus (FMDV). T cell and antibody (IgG1 and IgG2) responses were measured at several time points post immunisation. All experimental animals (F0, F1 and F2, n = 982) were genotyped with 165 microsatellite markers for the genome scan.
Results
Considerable variability in the immune responses across time was observed and sire, dam and age had significant effects on responses at specific time points. There were significant correlations within traits across time, and between IgG1 and IgG2 traits; some weak correlations were also detected between T cell and IgG2 responses. The whole genome scan detected 77 quantitative trait loci (QTL), on 22 chromosomes, including clusters of QTL on BTA 4, 5, 6, 20, 23 and 25. Two QTL reached 5% genome wide significance (on BTA 6 and 24) and one on BTA 20 reached 1% genome wide significance.
Conclusions
A proportion of the variance in the T cell and antibody response post immunisation with an FMDV peptide has a genetic component. Even though the antigen was relatively simple, the humoral and cell mediated responses were clearly under complex genetic control, with the majority of QTL located outside the MHC locus. The results suggest that there may be specific genes or loci that impact on variation in both the primary and secondary immune responses, whereas other loci may be specifically important for early or later phases of the immune response. Future fine mapping of the QTL clusters identified has the potential to reveal the causal variations underlying the variation in immune response observed.
12.
13.
14.
Stephen P Peters 《Respiratory research》2009,10(1):109
Genetic association studies have become an important part of our scientific landscape. This commentary discusses some basic scientific issues which should be considered when reporting and evaluating such studies including SNP Discovery, Genotyping and Haplotype Analysis; Population Size, Matching of Cases and Controls, and Population Stratification; Phenotype Definition and Multiple Related Phenotypes; Multiple Testing; Replication; Genome-wide Association Studies (GWAS); and the Role of Functional Studies. All of these elements are important in evaluating such studies and should be carefully considered when these studies are conceived and carried out.
15.
Although inbred mouse strains have been the premier model organism used in biomedical research, multiple studies and analyses have indicated that genome-wide association studies (GWAS) cannot be productively performed using inbred mouse strains. However, there is one type of GWAS in mice that has successfully identified the genetic basis for many biomedical traits of interest: haplotype-based computational genetic mapping (HBCGM). Here, we describe how the methodological basis for an HBCGM study significantly differs from that of a conventional murine GWAS, and how an integrative analysis of its output within the context of other 'omic' information can enable genetic discovery. Consideration of these factors will substantially improve the prognosis for the utility of murine genetic association studies for biomedical discovery.
16.
Hinds DA Stokowski RP Patil N Konvicka K Kershenobich D Cox DR Ballinger DG 《American journal of human genetics》2004,74(2):317-325
Association studies in populations that are genetically heterogeneous can yield large numbers of spurious associations if population subgroups are unequally represented among cases and controls. This problem is particularly acute for studies involving pooled genotyping of very large numbers of single-nucleotide-polymorphism (SNP) markers, because most methods for analysis of association in structured populations require individual genotyping data. In this study, we present several strategies for matching case and control pools to have similar genetic compositions, based on ancestry information inferred from genotype data for approximately 300 SNPs tiled on an oligonucleotide-based genotyping array. We also discuss methods for measuring the impact of population stratification on an association study. Results for an admixed population and a phenotype strongly confounded with ancestry show that these simple matching strategies can effectively mitigate the impact of population stratification.
17.
Case-control studies are commonly used to study whether a candidate allele and a disease are associated. However, spurious association can arise due to population substructure or cryptic relatedness, which cause the variance of the trend test to increase. Devlin and Roeder derived the appropriate variance inflation factor (VIF) for the trend test and proposed a novel genomic control (GC) approach to estimate VIF and adjust the test statistic. Their results were derived assuming an additive genetic model and the corresponding VIF is independent of the candidate allele frequency. We determine the appropriate VIFs for recessive and dominant models. Unlike the additive test, the VIFs for the optimal tests for these two models depend on the candidate allele frequency. Simulation results show that, when the null loci used to estimate the VIF have allele frequencies similar to that of the candidate gene, the GC tests derived for recessive and dominant models remain optimal. When the underlying genetic model is unknown or the null loci and candidate gene have quite different allele frequencies, the GC tests derived for the recessive or dominant models cannot be used while the GC test derived for the additive model can be.
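The genomic control adjustment for the additive trend test can be sketched as follows, assuming the commonly used median-based estimator of the inflation factor: the median of the 1-df chi-square statistics at null loci divided by the median of a chi-square(1) distribution, about 0.4549. The function names are hypothetical, and the sketch does not cover the frequency-dependent VIFs for recessive and dominant models that the abstract derives.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def gc_inflation(null_stats):
    """Median-based variance inflation factor, floored at 1 (no deflation)."""
    return max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)

def gc_adjust(candidate_stat, null_stats):
    """Divide the candidate's trend chi-square by the estimated inflation factor."""
    return candidate_stat / gc_inflation(null_stats)

# Hypothetical trend statistics at null loci and at the candidate locus.
null_loci = [0.3, 0.9, 1.4, 0.7, 2.1]
adjusted = gc_adjust(8.0, null_loci)
```

The adjusted statistic is then referred to a chi-square distribution with one degree of freedom, as in the unadjusted additive test.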
18.
19.
A power calculation is crucial in planning genetic studies. In genetic association studies, the power is often calculated using the expected number of individuals with each genotype calculated from an assumed allele frequency under Hardy-Weinberg equilibrium. Since the allele frequency is often unknown, the number of individuals with each genotype is random and so a power calculation assuming a known allele frequency may be incorrect. Ambrosius et al. recently showed that the power ignoring this randomness may lead to studies with insufficient power and proposed averaging the power due to the randomness. We extend the method of averaging power in two directions. First, for testing association in case-control studies, we use the Cochran-Armitage trend test and find that the time needed for calculating the averaged power is much reduced compared to the chi-square test with two degrees of freedom studied by Ambrosius et al. A real study is used for illustration of the method. Second, we extend the method to linkage analysis, where the number of identical-by-descent alleles shared by siblings is random. The distribution of identical-by-descent numbers depends on the underlying genetic model rather than the allele frequency. The robust test for linkage analysis is also examined using the averaged powers. We also recommend a sensitivity analysis when the true allele frequency or the number of identical-by-descent alleles is unknown.
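For reference, the Cochran-Armitage trend test named in this abstract can be computed from a 2 x 3 table of genotype counts as in the sketch below. It uses the standard 1-df chi-square form with additive scores (0, 1, 2) as the default; the function name and the example counts are illustrative assumptions, and the averaged-power procedure itself is not reproduced here.

```python
from math import erfc, sqrt

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) and its p-value.

    cases, controls : genotype counts ordered by number of risk alleles
    scores          : genotype scores; (0, 1, 2) corresponds to an additive model
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    numerator = n * (n * sum_xr - n_cases * sum_xn) ** 2
    denominator = n_cases * n_controls * (n * sum_x2n - sum_xn ** 2)
    chi2 = numerator / denominator
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical counts for genotypes with 0, 1 and 2 risk alleles.
chi2, p_value = cochran_armitage(cases=(20, 30, 50), controls=(50, 30, 20))
```

Changing `scores` to (0, 0, 1) or (0, 1, 1) gives the recessive and dominant versions of the test, which is where the choice of genetic model enters.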
20.