Similar articles
 20 similar articles found (search time: 31 ms)
1.
Asymptotic distribution for epistatic tests in case-control studies   Total citations: 1 (self-citations: 0, citations by others: 1)
Liu T  Thalamuthu A  Liu JJ  Chen C  Wang Z  Wu R 《Genomics》2011,98(2):145-151
We propose a statistical model for dissecting a multilocus genotypic value into its main (additive and dominant) effects and epistatic effects between different loci in a case-control association study. The model can discern four different kinds of epistasis: additive × additive, additive × dominant, dominant × additive, and dominant × dominant interactions. To test each kind of epistasis, a χ² test statistic was computed for a two-by-two contingency table derived from combined genotypes in both case and control groups. We derived an analytical approach for estimating the asymptotic distribution of the χ² test statistic for epistatic tests under the null hypothesis, with the result being consistent with that from Monte Carlo simulations. The new model was used to analyze a case-control data set for candidate gene studies of stroke, leading to the identification of several significant interactions between causal SNPs for this disease.
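The χ² test on a two-by-two table mentioned above can be sketched as follows. This is a minimal illustration only: the cell counts are hypothetical, the abstract does not say how the combined genotypes are collapsed into the table, and the 1-degree-of-freedom reference distribution used for the p-value is the textbook choice, which may differ from the asymptotic distribution the authors derive.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Product of the two row totals and the two column totals.
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / margins


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows are cases/controls, columns are two genotype classes.
stat = chi_square_2x2(10, 20, 20, 10)
p = p_value_1df(stat)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

Only the standard library is used here, so the sketch runs without SciPy; a statistics package would normally be preferred for real analyses.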

2.
An entropy-based statistic for genomewide association studies   Total citations: 8 (self-citations: 0, citations by others: 8)
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard χ² statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard χ² statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard χ² statistic.

3.
Here, we describe the results from the first variance heterogeneity Genome Wide Association Study (VGWAS) on yeast expression data. Using this forward genetics approach, we show that the genetic regulation of gene expression in the budding yeast, Saccharomyces cerevisiae, includes mechanisms that can lead to variance heterogeneity in the expression between genotypes. Additionally, we performed a mean effect association study (GWAS). Comparing the mean and variance heterogeneity analyses, we find that the mean expression level is under genetic regulation from a larger absolute number of loci but that a higher proportion of the variance-controlling loci were trans-regulated. Both mean and variance regulating loci cluster in regulatory hotspots that affect a large number of phenotypes; a single variance-controlling locus, mapping close to DIA2, was found to be involved in more than 10% of the significant associations. It has been suggested in the literature that variance heterogeneity between the genotypes might be due to genetic interactions. We therefore screened the multi-locus genotype-phenotype maps for several traits where multiple associations were found, for indications of epistasis. Several examples of two- and three-locus genetic interactions were found to involve variance-controlling loci, with reports from the literature corroborating the functional connections between the loci. By using a new analytical approach to re-analyze a powerful existing dataset, we are thus able both to provide novel insights into the genetic mechanisms involved in the regulation of gene expression in budding yeast and to experimentally validate epistasis as an important mechanism underlying genetic variance heterogeneity between genotypes.

4.
Identification of genetic loci in complex traits has focused largely on one-dimensional genome scans to search for associations between single markers and the phenotype. There is mounting evidence that locus interactions, or epistasis, are a crucial component of the genetic architecture of biologically relevant traits. However, epistasis is often viewed as a nuisance factor that reduces power for locus detection. Counter to expectations, recent work shows that fitting full models, instead of testing marker main effect and interaction components separately, in exhaustive multi-locus genome scans can have higher power to detect loci when epistasis is present than single-locus scans, an improvement that comes despite a much larger multiple testing alpha-adjustment in such searches. We demonstrate, both theoretically and via simulation, that the expected power to detect loci when fitting full models is often larger when these loci act epistatically than when they act additively. Additionally, we show that the power for single-locus detection may be improved in cases of epistasis compared to the additive model. Our exploration of a two-step model selection procedure shows that identifying the true model is difficult. However, this difficulty is certainly not exacerbated by the presence of epistasis; on the contrary, in some cases the presence of epistasis can aid in model selection. The impact of allele frequencies on both power and model selection is dramatic.

5.
Aylor DL  Zeng ZB 《PLoS genetics》2008,4(3):e1000029
Gene expression data has been used in lieu of phenotype in both classical and quantitative genetic settings. These two disciplines have separate approaches to measuring and interpreting epistasis, which is the interaction between alleles at different loci. We propose a framework for estimating and interpreting epistasis from a classical experiment that combines the strengths of each approach. A regression analysis step accommodates the quantitative nature of expression measurements by estimating the effect of gene deletions plus any interaction. Effects are selected by significance such that a reduced model describes each expression trait. We show how the resulting models correspond to specific hierarchical relationships between two regulator genes and a target gene. These relationships are the basic units of genetic pathways and genomic system diagrams. Our approach can be extended to analyze data from a variety of experiments, multiple loci, and multiple environments.

6.
Biological functions typically involve complex interacting molecular networks, with numerous feedback and regulation loops. How the properties of the system are affected when one or several of its parts are modified is a question of fundamental interest, with numerous implications for the way we study and understand biological processes and treat diseases. This question can be rephrased in terms of relating genotypes to phenotypes: to what extent does the effect of a genetic variation at one locus depend on genetic variation at all other loci? Systematic quantitative measurements of epistasis (the deviation from additivity in the effect of alleles at different loci) on a given quantitative trait remain a major challenge. Here, we take a complementary approach of studying theoretically the effect of varying multiple parameters in a validated model of molecular signal transduction. To connect with the genotype/phenotype mapping, we interpret parameters of the model as different loci with discrete choices of these parameters as alleles, which allows us to systematically examine the dependence of the signaling output, a quantitative trait, on the set of possible allelic combinations. We show quite generally that quantitative traits behave approximately additively (weak epistasis) when alleles correspond to small changes of parameters; epistasis appears as a result of large differences between alleles. When epistasis is relatively strong, it is concentrated in a sparse subset of loci and in low-order (e.g. pairwise) interactions. We find that focusing on interactions between loci that exhibit strong additive effects is an efficient way of identifying most of the epistasis. Our model study defines a theoretical framework for interpretation of experimental data and provides statistical predictions for the structure of genetic interaction expected for moderately complex biological circuits.
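The definition of epistasis as a deviation from additivity, and the claim that it is weak for small parameter changes and stronger for large ones, can be illustrated with a toy calculation. This is a sketch under stated assumptions: `toy_trait` is a hypothetical saturating function chosen only because it is nonlinear, not the validated signal-transduction model the abstract refers to, and the two "loci" are simply its two parameters.

```python
def pairwise_epistasis(f00, f10, f01, f11):
    """Deviation from additivity for two loci with two alleles each.

    f00 is the trait for the reference alleles at both loci, f10 and f01 are
    the single substitutions, and f11 is the double substitution.
    """
    return f11 - f10 - f01 + f00


def toy_trait(x, y):
    """Hypothetical saturating output of two parameters (illustration only)."""
    return x * y / (1.0 + x * y)


def epistasis_for_step(step, base=1.0):
    """Epistasis when each 'allele' multiplies its parameter by `step`."""
    f00 = toy_trait(base, base)
    f10 = toy_trait(base * step, base)
    f01 = toy_trait(base, base * step)
    f11 = toy_trait(base * step, base * step)
    return pairwise_epistasis(f00, f10, f01, f11)


small = epistasis_for_step(1.1)   # alleles differ by a small parameter change
large = epistasis_for_step(10.0)  # alleles differ by a large parameter change
print(f"small step: {small:.5f}, large step: {large:.5f}")
```

For a strictly additive trait the same formula returns zero, which is why it serves as a simple pairwise measure of interaction.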

7.
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango’s statistic, to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios.

8.
Improving yield is a major objective for cotton breeding schemes, and lint yield and its three component traits (boll number, boll weight and lint percentage) are complex traits controlled by multiple genes and various environments. Association mapping was performed to detect markers associated with these four traits using 651 simple sequence repeats (SSRs). A mixed linear model including epistasis and environmental interaction was used to screen the loci associated with these four yield traits in 323 accessions of Gossypium hirsutum L. evaluated in nine different environments. A total of 251 significant loci were found to be associated with lint yield and its three components, including 69 loci with individual effects, and all were involved in epistatic interactions. These significant loci explain ~62.05% of the phenotypic variance (ranging from 49.06% to 72.29% for these four traits). The high contribution of environmental interaction to the phenotypic variance for lint yield and boll number indicated that the genetic effects of SSR loci were susceptible to environmental factors. Shared loci were also observed among these four traits, which may be used for simultaneous improvement in cotton breeding for yield traits. Furthermore, consistent and elite loci were screened with −log10(P-value) > 8.0 based on predicted effects of loci detected in different environments. There were one locus and 6 epistatic pairs for lint yield, 4 loci and 10 epistatic pairs for boll number, 15 loci and 2 epistatic pairs for boll weight, and 2 loci and 5 epistatic pairs for lint percentage. These results provide insights into the genetic basis of lint yield and its components and may be useful for marker-assisted breeding to improve cotton production.

9.

Background

The deployment of Genome-wide association studies (GWASs) requires genomic information of a large population to produce reliable results. This raises significant privacy concerns, making people hesitate to contribute their genetic information to such studies.

Results

We propose two provably secure solutions to address this challenge: (1) a somewhat homomorphic encryption (HE) approach, and (2) a secure multiparty computation (MPC) approach. Unlike previous work, our approach does not rely on adding noise to the input data, nor does it reveal any information about the patients. Our protocols aim to prevent data breaches by calculating the χ2 statistic in a privacy-preserving manner, without revealing any information other than whether the statistic is significant or not. Specifically, our protocols compute the χ2 statistic, but only return a yes/no answer, indicating significance. By not revealing the statistic value itself but only the significance, our approach thwarts attacks exploiting statistic values. We significantly increased the efficiency of our HE protocols by introducing a new masking technique to perform the secure comparison that is necessary for determining significance.

Conclusions

We show that full-scale privacy-preserving GWAS is practical, as long as the statistics can be computed by low degree polynomials. Our implementations demonstrated that both approaches are efficient. The secure multiparty computation technique completes its execution in approximately 2 ms for data contributed by one million subjects.
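The idea of returning only a yes/no significance answer, and the remark about low-degree polynomials, can be sketched in plaintext as follows. This is not the authors' protocol and involves no encryption or multiparty computation. It only shows, as an assumption-laden illustration, that the comparison "χ² > critical value" for a two-by-two table can be rewritten without division, so that both sides are polynomials in the counts. The critical value 3.841 is the approximate 0.05 cutoff for 1 degree of freedom, and the counts are hypothetical.

```python
from fractions import Fraction


def is_significant(a, b, c, d, critical_value=Fraction(3841, 1000)):
    """Return True if the 2x2 Pearson chi-square exceeds `critical_value`.

    The statistic n * (a*d - b*c)**2 / margins is never formed. Instead the
    inequality is cross-multiplied so that only integer products are compared.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    threshold = Fraction(critical_value)
    # chi2 > p/q  <=>  q * n * (ad - bc)^2 > p * margins  (margins >= 0, q > 0)
    left = threshold.denominator * n * (a * d - b * c) ** 2
    right = threshold.numerator * margins
    return left > right


# Hypothetical allele counts; only the boolean would be released.
print(is_significant(10, 20, 20, 10))
print(is_significant(12, 18, 18, 12))
```

In a secure setting the final comparison would itself have to be carried out on protected values, which is the step the abstract says the masking technique addresses.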

10.
Libraries of near-isogenic lines (NILs) are a powerful plant genetic resource to map quantitative trait loci (QTL). Nevertheless, QTL mapping with NILs is mostly restricted to genetic main effects. Here we propose a two-step procedure to map additive-by-additive digenic epistasis with NILs. In the first step, a generation means analysis of parents, their F1 hybrid, and one-segment NILs and their triple testcross (TTC) progenies is used to identify in a one-dimensional scan loci exhibiting QTL-by-background interactions. In a second step, one-segment NILs with significant additive-by-additive background interactions are used to produce particular two-segment NILs to test for digenic epistatic interactions between these segments. We evaluated our approach by analyzing a random subset of a genomewide Arabidopsis thaliana NIL library for growth-related traits. The results of our experimental study illustrated the potential of the presented two-step procedure to map additive-by-additive digenic epistasis with NILs. Furthermore, our findings suggested that additive main effects as well as additive-by-additive digenic epistasis strongly influence the genetic architecture underlying growth-related traits of A. thaliana.

11.
Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10⁻⁸) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10⁻⁷) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes.

12.
Zhao J  Boerwinkle E  Xiong M 《Human genetics》2007,121(3-4):357-367
Availability of a large collection of single nucleotide polymorphisms (SNPs) and efficient genotyping methods enables the extension of linkage and association studies for complex diseases from small genomic regions to the whole genome. Establishing global significance for linkage or association requires small P-values of the test. The original TDT statistic compares the difference in linear functions of the number of transmitted and nontransmitted alleles or haplotypes. In this report, we introduce a novel TDT statistic, which uses Shannon entropy as a nonlinear transformation of the frequencies of the transmitted or nontransmitted alleles (or haplotypes), to amplify the difference in the number of transmitted and nontransmitted alleles or haplotypes in order to increase statistical power with a large number of marker loci. The null distribution of the entropy-based TDT statistic and the type I error rates in both homogeneous and admixture populations are validated using a series of simulation studies. By analytical methods, we show that the power of the entropy-based TDT statistic is higher than that of the original TDT, and this difference increases with the number of marker loci. Finally, the new entropy-based TDT statistic is applied to two real data sets to test the association of the RET gene with Hirschsprung disease and the Fcγ receptor genes with systemic lupus erythematosus. Results show that the entropy-based TDT statistic can reach P-values that are small enough to establish genome-wide linkage or association.

13.
The seeds of flowering plants develop from double fertilization and play a vital role in reproduction and supplying human and animal food. The genetic variation of seed traits is influenced by multiple genetic systems, e.g., maternal, embryo, and/or endosperm genomes. Understanding the genetic architecture of seed traits is a major challenge because of this complex mechanism of multiple genetic systems, especially the epistasis within or between different genomes and their interactions with the environment. In this study, a statistical model was proposed for mapping QTL with epistasis and QTL-by-environment (QE) interactions underlying endosperm and embryo traits. Our model integrates the maternal and the offspring genomes into one mapping framework and can accurately analyze maternal additive and dominant effects, endosperm/embryo additive and dominant effects, and epistatic effects of two loci in the same or two different genomes, as well as interaction effects of each genetic component of QTL with environment. Intensive simulations under different sampling strategies, heritabilities, and model parameters were performed to investigate the statistical properties of the model. A set of real cottonseed data was analyzed to demonstrate our methods. A software package, QTLNetwork-Seed-1.0.exe, was developed for QTL analysis of seed traits.

14.
A central goal of evolutionary genetics is to understand, at the molecular level, how organisms adapt to their environments. For a given trait, the answer often involves the acquisition of variants at unlinked sites across the genome. Genomic methods have achieved landmark successes in pinpointing these adaptive loci. To figure out how a suite of adaptive alleles work together, and to what extent they can reconstitute the phenotype of interest, requires their transfer into an exogenous background. We studied the joint effect of adaptive, gain-of-function thermotolerance alleles at eight unlinked genes from Saccharomyces cerevisiae, when introduced into a thermosensitive sister species, S. paradoxus. Although the loci damped each other’s beneficial impact (that is, they were subject to negative epistasis), most boosted high-temperature growth alone and in combination, and none was deleterious. The complete set of eight genes was sufficient to confer ~15% of the S. cerevisiae thermotolerance phenotype in the S. paradoxus background. The same loci also contributed to a heretofore unknown advantage in cold growth by S. paradoxus. Together, our data establish temperature resistance in yeasts as a model case of a genetically complex evolutionary tradeoff, which can be partly reconstituted from the sequential assembly of unlinked underlying loci.

15.
Detecting gene-gene interaction in complex diseases has become an important priority for common disease genetics, but most current approaches to detecting interaction start with disease-marker associations. These approaches are based on population allele frequency correlations, not genetic inheritance, and therefore cannot exploit the rich information about inheritance contained within families. They are also hampered by issues of rigorous phenotype definition, multiple test correction, and allelic and locus heterogeneity. We recently developed, tested, and published a powerful gene-gene interaction detection strategy based on conditioning family data on a known disease-causing allele or a disease-associated marker allele. We successfully applied the method to disease data and used computer simulation to exhaustively test the method for some epistatic models. We knew that the statistic we developed to indicate interaction was less reliable when applied to more-complex interaction models. Here, we improve the statistic and expand the testing procedure. We computer-simulated multipoint linkage data for a disease caused by two interacting loci. We examined epistatic as well as additive models and compared them with heterogeneity models. In all our models, the at-risk genotypes are “major” in the sense that among affected individuals, a substantial proportion has a disease-related genotype. One of the loci (A) has a known disease-related allele (as would have been determined from a previous analysis). We removed (pruned) family members who did not carry this allele; the resultant dataset is referred to as “stratified.” This elimination step has the effect of raising the “penetrance” and detectability at the second locus (B). We used the lod scores for the stratified and unstratified data sets to calculate a statistic that either indicated the presence of interaction or indicated that no interaction was detectable.
We show that the new method is robust and reliable for a wide range of parameters. Our statistic performs well both with the epistatic models (false negative rates, i.e., failing to detect interaction, ranging from 0 to 2.5%) and with the heterogeneity models (false positive rates, i.e., falsely detecting interaction, ≤1%). It works well with the additive model except when allele frequencies at the two loci differ widely. We explore those features of the additive model that make detecting interaction more difficult. All testing of this method suggests that it provides a reliable approach to detecting gene-gene interaction.

16.
A simulation study is conducted to compare several methods that test the common log odds ratio in multiple 2 × 2 tables when the data are correlated within clusters. Allowing cluster size to vary within each table, we evaluate the unadjusted Mantel-Haenszel chi-square statistic (χ²_MH), the adjusted Mantel-Haenszel chi-square statistics of Rao and Scott using both an unpooled design effect (χ²_RSN) and a pooled design effect (χ²_RSP), the adjusted Mantel-Haenszel chi-square statistic of Donald and Donner (χ²_DD), the chi-square statistic using the GEE approach (χ²_GEE), the adjusted Mantel-Haenszel chi-square statistic of Begg (χ²_B), the Wald (χ²_W), the robust Wald (χ²_RW), the score (χ²_S), the robust score (χ²_RS), and the adjusted Mantel-Haenszel chi-square statistics of Zhang and Boos (χ²_ZBP and χ²_ZBN). The test statistics above are compared in terms of empirical significance levels and empirical power levels. The robust score statistic χ²_RS and the adjusted Mantel-Haenszel chi-square statistics of Zhang and Boos (χ²_ZBP and χ²_ZBN) generally have empirical significance levels closer to the nominal value than the other statistics. These three statistics have similar empirical power levels when the intracluster correlation is zero or the cluster sizes are balanced. χ²_RS performs better in terms of empirical power levels when a positive intracluster correlation exists in the unbalanced setting.
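For orientation, the unadjusted Mantel-Haenszel chi-square statistic that serves as the baseline above can be sketched as follows. This is the standard textbook form without continuity correction, written under the assumption that each stratum is supplied as a tuple of four cell counts. It ignores within-cluster correlation entirely, which is the shortcoming the adjusted statistics in the abstract are meant to address, and none of those adjusted versions is implemented here.

```python
def mantel_haenszel_chi_square(tables):
    """Unadjusted Mantel-Haenszel chi-square across several 2x2 tables.

    `tables` is an iterable of (a, b, c, d) cell counts, one tuple per
    stratum, laid out as [[a, b], [c, d]]. No continuity correction is applied.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two observations adds nothing
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        # Hypergeometric mean and variance of the top-left cell under the null.
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata")
    return (observed - expected) ** 2 / variance


# Hypothetical strata with identical counts.
single = mantel_haenszel_chi_square([(10, 20, 20, 10)])
double = mantel_haenszel_chi_square([(10, 20, 20, 10), (10, 20, 20, 10)])
print(f"one stratum: {single:.4f}, two strata: {double:.4f}")
```

The resulting value is usually referred to a chi-square distribution with 1 degree of freedom when observations are independent.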

17.
Geographic isolation interrupts gene flow between populations, leading to population differentiation over long evolutionary periods. In this paper, 33 colonies from Damen Island and 100 colonies from the adjacent mainland populations, Juxi and Chixi, were analyzed with both mitochondrial tRNAleu-COII sequences and five microsatellite loci. The results showed that the Apis cerana cerana population from Damen Island was significantly differentiated from its adjacent mainland populations. In addition, the Damen Island population showed a lower level of genetic diversity in terms of the number of mitochondrial haplotypes, while both island and mainland populations showed a low level of genetic diversity in the multilocus analysis. The divergent small island population of A. c. cerana has probably suffered inbreeding and genetic drift as well as limited gene flow across the strait. Our data provide useful information for the management and preservation of the Damen Island population.

18.
19.
20.
The detection of epistatic interactive effects of multiple genetic variants on the susceptibility of human complex diseases is a great challenge in genome-wide association studies (GWAS). Although methods have been proposed to identify such interactions, the lack of an explicit definition of epistatic effects, together with computational difficulties, makes the development of new methods indispensable. In this paper, we introduce epistatic modules to describe epistatic interactive effects of multiple loci on diseases. On the basis of this notion, we put forward a Bayesian marker partition model to explain observed case-control data, and we develop a Gibbs sampling strategy to facilitate the detection of epistatic modules. Comparisons of the proposed approach with three existing methods on seven simulated disease models demonstrate the superior performance of our approach. When applied to a genome-wide case-control data set for Age-related Macular Degeneration (AMD), the proposed approach successfully identifies two known susceptible loci and suggests that a combination of two other loci (one in the gene SGCD and the other in SCAPER) is associated with the disease. Further functional analysis supports the speculation that the interaction of these two genetic variants may be responsible for the susceptibility of AMD. When applied to a genome-wide case-control data set for Parkinson's disease, the proposed method identifies seven suspicious loci that may contribute independently to the disease.
