Similar articles
 20 similar articles found
1.
To avoid problems related to unknown population substructure, association studies may be conducted in founder populations. In such populations, however, the relatedness among individuals may be considerable. Neglecting such correlations among individuals can lead to seriously spurious associations. Here, we propose a method for case-control association studies of binary traits that is suitable for any set of related individuals, provided that their genealogy is known. Although we focus here on large inbred pedigrees, this method may also be used in outbred populations for case-control studies in which some individuals are relatives. We base inference on a quasi-likelihood score (QLS) function and construct a QLS test for allelic association. This approach can be used even when the pedigree structure is far too complex to use an exact-likelihood calculation. We also present an alternative approach to this test, in which we use the known genealogy to derive a correction factor for the case-control association χ² test. We perform analytical power calculations for each of the two tests by deriving their respective noncentrality parameters. The QLS test is more powerful than the corrected χ² test in every situation considered. Indeed, under certain regularity conditions, the QLS test is asymptotically the locally most powerful test in a general class of linear tests that includes the corrected χ² test. The two methods are used to test for associations between three asthma-associated phenotypes and 48 SNPs in 35 candidate genes in the Hutterites. We report a highly significant novel association (P = 2 × 10⁻⁶) between atopy and an amino acid polymorphism in the P-selectin gene, detected with the QLS test and also, but less significantly (P = .0014), with the transmission/disequilibrium test.

2.
In genome-wide association studies, only a subset of all genomic variants are typed by current, high-throughput, SNP-genotyping platforms. However, many of the untyped variants can be well predicted from typed variants, with linkage disequilibrium (LD) information among typed and untyped variants available from an external reference panel such as HapMap. Incorporation of such external information can allow one to perform tests of association between untyped variants and phenotype, thereby making more efficient use of the available genotype data. When related individuals are included in case-control samples, the dependence among their genotypes must be properly addressed for valid association testing. In the context of testing untyped variants, an additional analytical challenge is that the dependence, across related individuals, of the partial information on untyped-SNP genotypes must also be assessed and incorporated into the analysis for valid inference. We address this challenge with ATRIUM, a method for case-control association testing with untyped SNPs, based on genome screen data in samples in which some individuals are related. ATRIUM uses LD information from an external reference panel to specify a one-degree-of-freedom test of association with an untyped SNP. It properly accounts for dependence in the partial information on untyped-SNP genotypes across related individuals. We demonstrate that ATRIUM is robust in that it maintains the nominal type I error rate even when the external reference panel is not well matched to the case-control sample. We apply the method to detect association between type 2 diabetes and variants on chromosome 10 in the Framingham SHARe data.

3.
4.
Zang Y, Zhang H, Yang Y, Zheng G. Human Heredity, 2007, 63(3-4):187-195
The population-based case-control design is a powerful approach for detecting susceptibility markers of a complex disease. However, this approach may lead to spurious association when there is population substructure: population stratification (PS) or cryptic relatedness (CR). Two simple approaches to correct for the population substructure are genomic control (GC) and delta centralization (DC). GC uses the variance inflation factor to correct for the variance distortion of a test statistic, and DC centralizes the non-central chi-square distribution of the test statistic. Both GC and DC have been studied for case-control association studies mainly under a specific genetic model (e.g. recessive, additive or dominant), under which an optimal trend test is available. The genetic model is usually unknown for many complex diseases. In this situation, we study the performance of three robust tests based on the GC and DC corrections in the presence of the population substructure. Our results show that, when the genetic model is unknown, the DC- (or GC-) corrected maximum and Pearson's association tests are robust and have good control of Type I error and high power relative to the optimal trend tests in the presence of PS (or CR).
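
The genomic control step described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used median-based estimator of the inflation factor (median of the 1-df statistics at null markers divided by the median of a 1-df chi-square distribution), which the abstract itself does not specify. The function name `genomic_control` and the input values are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(null_stats, test_stats):
    """Estimate the inflation factor from null-marker statistics and
    divide each candidate test statistic by it (assumed estimator)."""
    # The factor is floored at 1 so that statistics are never inflated.
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)
    return lam, [t / lam for t in test_stats]

# Made-up 1-df chi-square statistics at presumed null markers.
null_stats = [0.2, 0.5, 0.91, 1.4, 3.0]
lam, corrected = genomic_control(null_stats, [8.0])
print(f"lambda = {lam:.3f}, corrected statistic = {corrected[0]:.3f}")
```

Dividing by the inflation factor addresses variance distortion only; the delta centralization approach mentioned above targets a shift in the distribution instead and is not shown here.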

5.
We consider the problem of genomewide association testing of a binary trait when some sampled individuals are related, with known relationships. This commonly arises when families sampled for a linkage study are included in an association study. Furthermore, power to detect association with complex traits can be increased when affected individuals with affected relatives are sampled, because they are more likely to carry disease alleles than are randomly sampled affected individuals. With related individuals, correlations among relatives must be taken into account, to ensure validity of the test, and consideration of these correlations can also improve power. We provide new insight into the use of pedigree-based weights to improve power, and we propose a novel test, the MQLS test, which, as we demonstrate, represents an overall, and in many cases, substantial, improvement in power over previous tests, while retaining a computational simplicity that makes it useful in genomewide association studies in arbitrary pedigrees. Other features of the MQLS are as follows: (1) it is applicable to completely general combinations of family and case-control designs, (2) it can incorporate both unaffected controls and controls of unknown phenotype into the same analysis, and (3) it can incorporate phenotype data about relatives with missing genotype data. The methods are applied to data from the Genetic Analysis Workshop 14 Collaborative Study of the Genetics of Alcoholism, where the MQLS detects genomewide significant association (after Bonferroni correction) with an alcoholism-related phenotype for four different single-nucleotide polymorphisms: tsc1177811 (P = 5.9 × 10⁻⁷), tsc1750530 (P = 4.0 × 10⁻⁷), tsc0046696 (P = 4.7 × 10⁻⁷), and tsc0057290 (P = 5.2 × 10⁻⁷) on chromosomes 1, 16, 18, and 18, respectively. Three of these four significant associations were not detected in previous studies analyzing these data.

6.
Becker T, Knapp M. Human Heredity, 2005, 59(4):185-189
In the context of haplotype association analysis of unphased genotype data, methods based on Monte-Carlo simulations are often used to compensate for missing or inappropriate asymptotic theory. Moreover, such methods are an indispensable means to deal with multiple testing problems. We want to call attention to a potential trap in this usually useful approach: the simulation approach may lead to strongly inflated type I errors in the presence of different missing rates between cases and controls, depending on the chosen test statistic. Here, we consider four different testing strategies for haplotype analysis of case-control data. We recommend interpreting results for data sets with non-comparable distributions of missing genotypes with special caution when the test statistic is based on inferred haplotypes per individual. Moreover, our results are important for the conduct and interpretation of genome-wide association studies.

7.
Association studies are one of the major strategies for identifying genetic factors underlying complex traits. In samples of related individuals, conventional statistical procedures are not valid for testing association, and maximum likelihood (ML) methods have to be used, but they are computationally demanding and are not necessarily robust to violations of their assumptions. Estimating equations (EE) offer an alternative to ML methods, for estimating association parameters in correlated data. We studied through simulations the behavior of EE in a large range of practical situations, including samples of nuclear families of varying sizes and mixtures of related and unrelated individuals. For a quantitative phenotype, the power of the EE test was comparable to that of a conventional ML test and close to the power expected in a sample of unrelated individuals. For a binary phenotype, the power of the EE test decreased with the degree of clustering, as did the power of the ML test. This result might be partly explained by a modeling of the correlations between responses that is less efficient than that in the quantitative case. In small samples (< 50 families), the variance of the EE association parameter tended to be underestimated, leading to an inflation of the type I error. The heterogeneity of cluster size induced a slight loss of efficiency of the EE estimator, by comparison with balanced samples. The major advantages of the EE technique are its computational simplicity and its great flexibility, easily allowing investigation of gene-gene and gene-environment interactions. It constitutes a powerful tool for testing genotype-phenotype association in related individuals.

8.
Haplotype-based risk models can lead to powerful methods for detecting the association of a disease with a genomic region of interest. In population-based studies of unrelated individuals, however, the haplotype status of some subjects may not be discernible without ambiguity from available locus-specific genotype data. A score test for detecting haplotype-based association using genotype data has been developed in the context of generalized linear models for analysis of data from cross-sectional and retrospective studies. In this article, we develop a test for association using genotype data from cohort and nested case-control studies where subjects are prospectively followed until disease incidence or censoring (end of follow-up) occurs. Assuming a proportional hazard model for the haplotype effects, we derive an induced hazard function of the disease given the genotype data, and hence propose a test statistic based on the associated partial likelihood. The proposed test procedure can account for differential follow-up of subjects, can adjust for possibly time-dependent environmental co-factors and can make efficient use of valuable age-at-onset information that is available on cases. We provide an algorithm for computing the test statistic using readily available statistical software. Utilizing simulated data in the context of two genomic regions GPX1 and GPX3, we evaluate the validity of the proposed test for small sample sizes and study its power in the presence and absence of missing genotype data.

9.
In population-based case-control association studies, the regular χ² test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias will not go away with increasing sample size. On the contrary, the false-positive rate will be much larger when the sample size is increased. The usual family-based designs are robust against population stratification, but they are sensitive to genotype error. In this article, we propose a novel method of simultaneously correcting for the bias arising from population stratification and/or for the genotyping error in case-control studies. The appropriate corrections depend on sample odds ratios of the standard 2 × 3 tables of genotype by case and control from null loci. Therefore, the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can be further used to estimate the effect of the genetic factor. We considered a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with correction and those without correction tends to be more extreme as the magnitude of the bias becomes larger. Therefore, the bias-correction method proposed in this article should be useful for the genetic analysis of complex traits.
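
For orientation, the uncorrected test that this abstract takes as its starting point can be sketched as a Pearson chi-square statistic on a 2 × 3 table of genotype counts by case/control status. This is only the standard textbook test, not the bias-corrected procedure the authors propose. The counts and function names are made up for illustration.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Rows: cases, controls. Columns: genotype counts AA, Aa, aa (made-up values).
table = [[30, 50, 20],
         [20, 50, 30]]
stat = pearson_chi2(table)
print(f"chi-square = {stat:.3f}, p = {p_value_2df(stat):.4f}")
```

A 2 × 3 table has (2 - 1)(3 - 1) = 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.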

10.
Association studies are the most powerful method available for identifying modest gene effects in complex disorders, but they often produce inconsistent results. With the rapidly growing SNP databases, haplotype maps and high throughput genotyping, the use of association studies is expected to increase; therefore, it is critical and timely that the problems with study design are identified and fixed. We questioned if unrecognized allele and genotype frequency variations in controls could be responsible for some of the inconsistent association findings. We performed a population genetic study of apolipoprotein E (APOE) and cytochrome P450 2D6 (CYP2D6) in 1,748 individuals ranging in age from newborns to centenarians. Although APOE and CYP2D6 are two of the most commonly used candidate genes, this is the first study to examine age- and gender-specific frequency distributions over the entire age spectrum, using a large, ethnically and geographically uniform population. We found significant, previously unrecognized variations in APOE allele frequencies, and deviations from Hardy-Weinberg expectations in CYP2D6 genotype frequencies starting at birth. The allele frequency variations within controls were larger than some reported case-control differences. We demonstrate that unrecognized frequency fluctuations in controls are a serious and potentially common confounder whose impact on association studies has not been appreciated, and one that can be addressed with proper study design. We recommend that population genetic studies be performed on commonly used candidate markers and that rigorous standards be applied for case-control matching.

11.
For mapping complex disease traits, linkage studies are often followed by a case-control association strategy in order to identify disease-associated genes/single-nucleotide polymorphisms (SNPs). Substantial efforts are required in selecting the most informative cases from a large collection of affected individuals in order to maximize the power of the study, while taking into consideration study cost. In this article, we applied and extended three case-selection strategies that use allele-sharing information for families with multiple affected offspring, selecting the most informative cases using additional information on disease severity. Our results revealed that the most significant associations, as measured by the lowest p-values, were obtained from a strategy that selected a case with the most allele sharing with other affected sibs from linked families ("linked-best"), despite the reduction in sample size resulting from discarding unlinked families. Moreover, information on disease severity appears to be useful for improving the ability to detect associations between markers and disease loci.

12.
The DD genotype of the angiotensin converting enzyme (ACE) polymorphism has been associated with myocardial infarction (MI). However, sample sizes of many case-control studies showing positive association were small and data were inconsistent. Furthermore, no family-based study is available. In a case-control study frequencies of the ACE genotypes were compared in 1319 unrelated patients with previous MI before 60 years of age (616 from the MONICA Augsburg region and 703 from rehabilitation centers in south Germany) and in 2381 population controls from the MONICA Augsburg study region. Furthermore, linkage and association of the ACE I/D polymorphism with MI were tested in 246 informative families using the sib-transmission/disequilibrium test (S-TDT). Overall, no excess of the D allele was found in MI patients (frequency 0.53 versus 0.57 in the general population; P=0.2). The ACE DD genotype was even slightly less frequent in groups with MI compared to the general population controls (0.26 versus 0.33 in women and 0.28 versus 0.33 in men). Similar results were also obtained in 247 men with low cardiovascular risk. In the family-based study, the frequency of the D allele was not different in siblings with or without previous MI (0.53 versus 0.50, respectively; S-TDT P=0.15) indicating no linkage or association of the D allele with MI. In a case-control study of MI patients and controls from the general population as well as a family study neither association nor linkage of the ACE D allele with MI was detected despite sample sizes that were among the largest samples studied so far.

13.
The demonstration of association between common genetic variants and chronic human diseases such as obesity could have profound implications for the prediction, prevention, and treatment of these conditions. Unequivocal proof of such an association, however, requires independent replication of initial positive findings. Recently, three (-243 A>G, +61450 C>A, and +83897 T>A) single nucleotide polymorphisms (SNPs) within glutamate decarboxylase 2 (GAD2) were found to be associated with class III obesity (body mass index > 40 kg/m²). The association was observed among 188 families (612 individuals) segregating the condition, and a case-control study of 575 cases and 646 lean controls. Functional data supporting a pathophysiological role for one of the SNPs (-243 A>G) were also presented. The gene GAD2 encodes the 65-kDa subunit of glutamic acid decarboxylase-GAD65. In the present study, we attempted to replicate this association in larger groups of individuals, and to extend the functional studies of the -243 A>G SNP. Among 2,359 individuals comprising 693 German nuclear families with severe, early-onset obesity, we found no evidence for a relationship between the three GAD2 SNPs and obesity, whether SNPs were studied individually or as haplotypes. In two independent case-control studies (a total of 680 class III obesity cases and 1,186 lean controls), there was no significant relationship between the -243 A>G SNP and obesity (OR = 0.99, 95% CI 0.83-1.18, p = 0.89) in the pooled sample. These negative findings were recapitulated in a meta-analysis, incorporating all published data for the association between the -243G allele and class III obesity, which yielded an OR of 1.11 (95% CI 0.90-1.36, p = 0.28) in a total sample of 1,252 class III obese cases and 1,800 lean controls. Moreover, analysis of common haplotypes encompassing the GAD2 locus revealed no association with severe obesity in families with the condition. We also obtained functional data for the -243 A>G SNP that do not support a pathophysiological role for this variant in obesity. Potential confounding variables in association studies involving common variants and complex diseases (low power to detect modest genetic effects, overinterpretation of marginal data, population stratification, and biological plausibility) are also discussed in the context of GAD2 and severe obesity.
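
The odds ratios and 95% confidence intervals quoted in this abstract are the kind of summary that can be computed from a 2 × 2 table of counts. The sketch below uses the standard log-odds (Woolf) interval, which is an assumption: the abstract does not say how its intervals were obtained. The counts are made up and are not the GAD2 data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2 x 2 table.

    a, b: exposed and unexposed counts among cases
    c, d: exposed and unexposed counts among controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Made-up counts for illustration only.
estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that contains 1, as in both results reported above, is consistent with no association at the 5% level.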

14.
Family-based tests of linkage disequilibrium typically are based on nuclear-family data including affected individuals and their parents or their unaffected siblings. A limitation of such tests is that they generally are not valid tests of association when data from related nuclear families from larger pedigrees are used. Standard methods require selection of a single nuclear family from any extended pedigrees when testing for linkage disequilibrium. Often data are available for larger pedigrees, and it would be desirable to have a valid test of linkage disequilibrium that can use all potentially informative data. In this study, we present the pedigree disequilibrium test (PDT) for analysis of linkage disequilibrium in general pedigrees. The PDT can use data from related nuclear families from extended pedigrees and is valid even when there is population substructure. Using computer simulations, we demonstrated validity of the test when the asymptotic distribution is used to assess the significance, and examined statistical power. Power simulations demonstrate that, when extended pedigree data are available, substantial gains in power can be attained by use of the PDT rather than existing methods that use only a subset of the data. Furthermore, the PDT remains more powerful even when there is misclassification of unaffected individuals. Our simulations suggest that there may be advantages to using the PDT even if the data consist of independent families without extended family information. Thus, the PDT provides a general test of linkage disequilibrium that can be widely applied to different data structures.

15.
We present a general approach to perform association analyses in pedigrees of arbitrary size and structure, which also allows for a mixture of pedigree members and independent individuals to be analyzed together, to test genetic markers and qualitative or quantitative traits. Our software, PedGenie, uses Monte Carlo significance testing to provide a valid test for related individuals that can be applied to any test statistic, including transmission disequilibrium statistics. Single locus at a time, composite genotype tests, and haplotype analyses may all be performed. We illustrate the validity and functionality of PedGenie using simulated and real data sets. For the real data set, we evaluated the role of two tagging-single nucleotide polymorphisms (tSNPs) in the DNA repair gene, NBS1, and their association with female breast cancer in 462 cases and 572 controls selected to be BRCA1/2 mutation negative from 139 high-risk Utah breast cancer families.
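
The final step of Monte Carlo significance testing, turning simulated null statistics into an empirical p-value, can be sketched as follows. This is a generic illustration and not PedGenie's implementation: how the null replicates are generated for a pedigree is not shown, and the add-one convention and the function name are assumptions.

```python
def empirical_p_value(observed, null_stats):
    """Empirical p-value: share of null replicates at least as extreme as
    the observed statistic, with one added to numerator and denominator
    so that the estimate is never exactly zero."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)

# Made-up statistics from four simulated null data sets.
p = empirical_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
print(p)
```

Because the comparison uses only the ordering of the statistics, the same function applies to any test statistic for which larger values indicate stronger evidence.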

16.
Wang J, Shete S. PLoS ONE, 2011, 6(11):e27642
In case-control genetic association studies, cases are subjects with the disease and controls are subjects without the disease. At the time of case-control data collection, information about secondary phenotypes is also collected. In addition to studies of primary diseases, there has been some interest in studying genetic variants associated with secondary phenotypes. In genetic association studies, the deviation from Hardy-Weinberg proportion (HWP) of each genetic marker is assessed as an initial quality check to identify questionable genotypes. Generally, HWP tests are performed based on the controls for the primary disease or secondary phenotype. However, when the disease or phenotype of interest is common, the controls do not represent the general population. Therefore, using only controls for testing HWP can result in a highly inflated type I error rate for the disease- and/or phenotype-associated variants. Recently, two approaches, the likelihood ratio test (LRT) approach and the mixture HWP (mHWP) exact test were proposed for testing HWP in samples from case-control studies. Here, we show that these two approaches result in inflated type I error rates and could lead to the removal from further analysis of potential causal genetic variants associated with the primary disease and/or secondary phenotype when the study of primary disease is frequency-matched on the secondary phenotype. Therefore, we proposed alternative approaches, which extend the LRT and mHWP approaches, for assessing HWP that account for frequency matching. The goal was to maintain more (possible causative) single-nucleotide polymorphisms in the sample for further analysis. Our simulation results showed that both extended approaches could control type I error probabilities. We also applied the proposed approaches to test HWP for SNPs from a genome-wide association study of lung cancer that was frequency-matched on smoking status and found that the proposed approaches can keep more genetic variants for association studies.
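
As background for the quality check discussed in this abstract, the basic one-sample test of Hardy-Weinberg proportions can be sketched as a 1-df chi-square goodness-of-fit test on genotype counts. This is the textbook test, not the LRT or mHWP approaches, nor the frequency-matching extensions the authors propose. The counts and function names are hypothetical.

```python
import math

def hwp_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) comparing observed genotype counts with
    those expected under Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Made-up genotype counts with a deficit of heterozygotes.
stat = hwp_chi2(30, 40, 30)
print(f"chi-square = {stat:.3f}, p = {p_value_1df(stat):.4f}")
```

The sketch assumes both alleles are observed in the sample, since a monomorphic marker would give expected counts of zero.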

17.
An exome-sequencing study of families with multiple breast-cancer-affected individuals identified two families with XRCC2 mutations, one with a protein-truncating mutation and one with a probably deleterious missense mutation. We performed a population-based case-control mutation-screening study that identified six probably pathogenic coding variants in 1,308 cases with early-onset breast cancer and no variants in 1,120 controls (the severity grading was p < 0.02). We also performed additional mutation screening in 689 multiple-case families. We identified ten breast-cancer-affected families with protein-truncating or probably deleterious rare missense variants in XRCC2. Our identification of XRCC2 as a breast cancer susceptibility gene thus increases the proportion of breast cancers that are associated with homologous recombination-DNA-repair dysfunction and Fanconi anemia and could therefore benefit from specific targeted treatments such as PARP (poly ADP ribose polymerase) inhibitors. This study demonstrates the power of massively parallel sequencing for discovering susceptibility genes for common, complex diseases.

18.
The CHRM2 gene is thought to be involved in neuronal excitability, synaptic plasticity and feedback regulation of acetylcholine release and has previously been implicated in higher cognitive processing. In a sample of 667 individuals from 304 families, we genotyped three single-nucleotide polymorphisms (SNPs) in the CHRM2 gene on 7q31-35. From all individuals, standardized intelligence measures were available. Using a test of within-family association, which controls for the possible effects of population stratification, a highly significant association was found between the CHRM2 gene and intelligence. The strongest association was between rs324650 and performance IQ (PIQ), where the T allele was associated with an increase of 4.6 PIQ points. In parallel with a large family-based association, we observed an attenuated - although still significant - population-based association, illustrating that population stratification may decrease our chances of detecting allele-trait associations. Such a mechanism has been predicted earlier, and this article is one of the first to empirically show that family-based association methods are not only needed to guard against false positives, but are also invaluable in guarding against false negatives.

19.
Family-based association tests for genomewide association scans
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals. In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.

20.
In many case-control genetic association studies, a set of correlated secondary phenotypes that may share common genetic factors with disease status are collected. Examination of these secondary phenotypes can yield valuable insights about the disease etiology and supplement the main studies. However, due to unequal sampling probabilities between cases and controls, standard regression analysis that assesses the effect of SNPs (single nucleotide polymorphisms) on secondary phenotypes using cases only, controls only, or combined samples of cases and controls can yield inflated type I error rates when the test SNP is associated with the disease. To solve this issue, we propose a Gaussian copula-based approach that efficiently models the dependence between disease status and secondary phenotypes. Through simulations, we show that our method yields correct type I error rates for the analysis of secondary phenotypes under a wide range of situations. To illustrate the effectiveness of our method in the analysis of real data, we applied our method to a genome-wide association study on high-density lipoprotein cholesterol (HDL-C), where "cases" are defined as individuals with extremely high HDL-C level and "controls" are defined as those with low HDL-C level. We treated 4 quantitative traits with varying degrees of correlation with HDL-C as secondary phenotypes and tested for association with SNPs in LIPG, a gene that is well known to be associated with HDL-C. We show that when the correlation between the primary and secondary phenotypes is >0.2, the P values from case-control combined unadjusted analysis are much more significant than methods that aim to correct for ascertainment bias. Our results suggest that to avoid false-positive associations, it is important to appropriately model secondary phenotypes in case-control genetic association studies.
