Similar literature
Found 20 similar articles (search time: 265 ms)
1.
Widespread multifactor interactions present a significant challenge in determining risk factors of complex diseases. Several combinatorial approaches, such as the multifactor dimensionality reduction (MDR) method, have emerged as promising tools for better detecting gene-gene (G x G) and gene-environment (G x E) interactions. We recently developed a general combinatorial approach, namely the generalized multifactor dimensionality reduction (GMDR) method, which can handle both qualitative and quantitative phenotypes and allows for both discrete and continuous covariates to detect G x G and G x E interactions in a sample of unrelated individuals. In this article, we report the development of an algorithm that can be used to study G x G and G x E interactions for family-based designs, called pedigree-based GMDR (PGMDR). Compared to the available method, our proposed method has several major improvements, including allowing for covariate adjustments and being applicable to arbitrary phenotypes, arbitrary pedigree structures, and arbitrary patterns of missing marker genotypes. Our Monte Carlo simulations provide evidence that the PGMDR method outperforms the MDR-pedigree disequilibrium test (PDT) in identifying epistatic loci. Finally, we applied our proposed approach to a genetic data set on tobacco dependence and found a significant interaction between two taste receptor genes (i.e., TAS2R16 and TAS2R38) in affecting nicotine dependence.

2.
Chen GB, Xu Y, Xu HM, Li MD, Zhu J, Lou XY. PLoS One 2011, 6(2): e16981
Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We developed the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50 to 0.65) reported in the literature. The GMDR with covariate adjustment had a power of >80% in a case-control design with a sample size of ≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was <0.56, a sample size of ≥4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56 to 0.62 for a sample size of 1000 to 2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was significantly larger. We conclude that with adjustment of a covariate, GMDR performs better than MDR, and a sample size of 1000 to 2000 is reasonably large for detecting gene-gene interactions in the range of effect size reported by the current literature, whereas a larger sample size is required for more subtle interactions with accuracy <0.56.

3.
Epistasis or gene-gene interaction is a fundamental component of the genetic architecture of complex traits such as disease susceptibility. Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free method to detect epistasis when there are no significant marginal genetic effects. However, in many studies of complex disease, other covariates like age of onset and smoking status could have a strong main effect and may potentially interfere with MDR's ability to achieve its goal. In this paper, we present a simple and computationally efficient sampling method to adjust for covariate effects in MDR. We use simulation to show that after adjustment, MDR has sufficient power to detect true gene-gene interactions. We also compare our method with the state-of-the-art technique in covariate adjustment. The results suggest that our proposed method performs similarly, but is more computationally efficient. We then apply this new method to an analysis of a population-based bladder cancer study in New Hampshire.

4.
Neuhaus JM, Scott AJ, Wild CJ. Biometrics 2006, 62(2): 488-494
Case-control studies augmented by the values of responses and covariates from family members allow investigators to study the association between the response and genetics and environment by relating differences in the response directly to within-family differences in covariates. However, existing approaches for case-control family data parameterize covariate effects in terms of the marginal probability of response, the same effects that one estimates from standard case-control studies. This article focuses on the estimation of family-specific covariate effects and develops efficient methods to fit family-specific models such as binary mixed-effects models. We also extend the approach to cover any setting where one has a fully specified model for the vector of responses in a family. We illustrate our approach using data from a case-control family study of brain cancer and consider the use of weighted and conditional likelihood methods as alternatives.

5.
Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on the identification of single variants. It has been shown that most of the variants that were discovered by GWAS could only partially explain disease heritability. The explanation for this missing heritability is generally believed to be gene-gene (GG) or gene-environment (GE) interactions and other structural variants. Generalized multifactor dimensionality reduction (GMDR) has been proven to be reasonably powerful in detecting GG and GE interactions; however, its performance has been found to decline when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method (based on the L-estimator and M-estimator estimation methods) in an attempt to reduce the effects caused by outlying traits. A comparison of robust GMDR with the original MDR based on simulation studies showed the former method to outperform the latter. The performance of robust GMDR is illustrated through a real GWA example consisting of 8,577 samples from the Korean population using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as a phenotype. Robust GMDR identified the KCNH1 gene to have strong interaction effects with other genes on the function of insulin secretion.

6.
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in searching for determinants involved in human complex diseases. The dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes required to be measured in a cluster of clinical traits with varying correlations and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed for the purposes of handling a multifaceted phenotype and longitudinal data, as well as improving statistical power for multiple significance testing via a two-stage testing procedure that involves a multivariate analysis for grouped phenotypes followed by univariate analysis for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood and generalized estimating equations models. Simulations and real data analysis for the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.

7.
There has been increased interest in discovering combinations of single-nucleotide polymorphisms (SNPs) that are strongly associated with a phenotype even if each SNP has little individual effect. Efficient approaches have been proposed for searching two-locus combinations from genome-wide datasets. However, for high-order combinations, existing methods either adopt a brute-force search which only handles a small number of SNPs (up to a few hundred), or use heuristic search that may miss informative combinations. In addition, existing approaches lack statistical power because of the use of statistics with high degrees-of-freedom and the huge number of hypotheses tested during combinatorial search. Due to these challenges, functional interactions in high-order combinations have not been systematically explored. We leverage discriminative-pattern-mining algorithms from the data-mining community to search for high-order combinations in case-control datasets. The substantially improved efficiency and scalability demonstrated on synthetic and real datasets with several thousands of SNPs allow the study of several important mathematical and statistical properties of SNP combinations with order as high as eleven. We further explore functional interactions in high-order combinations and reveal a general connection between the increase in discriminative power of a combination over its subsets and the functional coherence among the genes comprising the combination, supported by multiple datasets. Finally, we study several significant high-order combinations discovered from a lung-cancer dataset and a kidney-transplant-rejection dataset in detail to provide novel insights on the complex diseases. Interestingly, many of these associations involve combinations of common variations that occur in small fractions of the population. Thus, our approach is an alternative methodology for exploring the genetics of rare diseases for which the current focus is on individually rare variations.

8.
Complex diseases such as cardiovascular disease are likely due to the effects of high-order interactions among multiple genes and demographic factors. Therefore, in order to understand their underlying biological mechanisms, we need to consider simultaneously the effects of genotypes across multiple loci. Statistical methods such as multifactor dimensionality reduction (MDR), the combinatorial partitioning method (CPM), recursive partitioning (RP), and patterning and recursive partitioning (PRP) are designed to uncover complex relationships without relying on a specific model for the interaction, and are therefore well-suited to this data setting. However, the theoretical overlap among these methods and their relative merits have not been well characterized. In this paper we demonstrate mathematically that MDR is a special case of RP in which (1) patterns are used as predictors (PRP), (2) tree growth is restricted to a single split, and (3) misclassification error is used as the measure of impurity. Both approaches are applied to a case-control study assessing the effect of eleven single nucleotide polymorphisms on coronary artery calcification in people at risk for cardiovascular disease.

9.
Gene-gene interactions may play an important role in the genetics of a complex disease. Detection and characterization of gene-gene interactions is a challenging issue that has stimulated the development of various statistical methods to address it. In this study, we introduce a method to measure gene interactions using entropy-based statistics from a contingency table of trait and genotype combinations. We also developed an exploration procedure by using graphs. We propose a standardized relative information gain (RIG) measure to evaluate the interactions between single nucleotide polymorphism (SNP) combinations. To identify the kth-order interactions, contingency tables of trait and genotype combinations of k SNPs are constructed, with which RIGs are calculated. The RIGs are standardized using the mean and standard deviation from the permuted datasets. SNP combinations yielding high standardized RIG are chosen for gene-gene interactions. Detection of high-order interactions and comparison of interaction strengths between different orders are made possible by using standardized RIG. We have applied the proposed standardized entropy-based method to two types of data sets from a simulation study and a real genetic association study. We have compared our method and the multifactor dimensionality reduction (MDR) method through power analysis of eight different genetic models with varying penetrance rates, number of SNPs, and sample sizes. Our method shows successful identification of genetic associations and gene-gene interactions both in simulation and real genetic data. Simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and is superior to the MDR method in most cases. The proposed method is well suited for detecting interactions without main effects as well as for models including main effects.
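
The standardization step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the code assumes RIG = (H(trait) - H(trait | genotype combination)) / H(trait); the function names, the number of permutations, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def relative_information_gain(trait, genotypes):
    """Assumed definition: (H(trait) - H(trait | genotype)) / H(trait)."""
    n = len(trait)
    h_trait = entropy(trait)
    if h_trait == 0:
        raise ValueError("trait must take at least two distinct values")
    # Group trait values by genotype combination (one table row per group).
    groups = {}
    for y, g in zip(trait, genotypes):
        groups.setdefault(g, []).append(y)
    h_cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return (h_trait - h_cond) / h_trait


def standardized_rig(trait, genotypes, n_perm=200, seed=0):
    """Standardize RIG with the mean and SD of RIG over permuted traits."""
    rng = random.Random(seed)
    observed = relative_information_gain(trait, genotypes)
    shuffled = list(trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(relative_information_gain(shuffled, genotypes))
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Toy data: an XOR-like two-SNP pattern in which neither SNP has a main effect.
snps = [(a, b) for a in (0, 1) for b in (0, 1)] * 25
trait = [a ^ b for a, b in snps]
first_snp = [a for a, _ in snps]

print(relative_information_gain(trait, snps))       # two-locus table
print(relative_information_gain(trait, first_snp))  # first SNP alone
print(standardized_rig(trait, snps))
```

In this toy example the two-locus table removes all uncertainty about the trait while the single SNP removes none, which is the "interaction without main effects" situation the abstract mentions.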

10.
We developed a computationally efficient algorithm, AMBIENCE, for identifying the informative variables involved in gene-gene (GGI) and gene-environment interactions (GEI) that are associated with disease phenotypes. The AMBIENCE algorithm uses a novel information theoretic metric called phenotype-associated information (PAI) to search for combinations of genetic variants and environmental variables associated with the disease phenotype. The PAI-based AMBIENCE algorithm effectively and efficiently detected GEI in simulated data sets of varying size and complexity, including the 10K simulated rheumatoid arthritis data set from Genetic Analysis Workshop 15. The method was also successfully used to detect GGI in a Crohn's disease data set. The performance of the AMBIENCE algorithm was compared to the multifactor dimensionality reduction (MDR), generalized MDR (GMDR), and pedigree disequilibrium test (PDT) methods. Furthermore, we assessed the computational speed of AMBIENCE for detecting GGI and GEI for data sets varying in size from 100 to 10^5 variables. Our results demonstrate that the AMBIENCE information theoretic algorithm is useful for analyzing a diverse range of epidemiologic data sets containing evidence for GGI and GEI.

11.
Sabbagh A, Darlu P. Human Heredity 2006, 62(3): 119-134
OBJECTIVES: Selecting a maximally informative subset of polymorphisms to predict a clinical outcome, such as drug response, requires appropriate search methods due to the increased dimensionality associated with looking at multiple genotypes. In this study, we investigated the ability of several pattern recognition methods to identify the most informative markers in the CYP2D6 gene for the prediction of CYP2D6 metabolizer status. METHODS: Four data-mining tools were explored: decision trees, random forests, artificial neural networks, and the multifactor dimensionality reduction (MDR) method. Marker selection was performed separately in eight population samples of different ethnic origin to evaluate to what extent the most informative markers differ across ethnic groups. RESULTS: Our results show that the number of polymorphisms required to predict CYP2D6 metabolic phenotype with a high accuracy can be dramatically reduced owing to the strong haplotype block structure observed at CYP2D6. MDR and neural networks provided nearly identical results and performed the best. CONCLUSION: Data-mining methods, such as MDR and neural networks, appear as promising tools to improve the efficiency of genotyping tests in pharmacogenetics with the ultimate goal of pre-screening patients for individual therapy selection with minimum genotyping effort.

12.
MOTIVATION: The identification and characterization of genes that increase the susceptibility to common complex multifactorial diseases is a challenging task in genetic association studies. The multifactor dimensionality reduction (MDR) method has been proposed and implemented by Ritchie et al. (2001) to identify the combinations of multilocus genotypes and discrete environmental factors that are associated with a particular disease. However, the original MDR method classifies the combination of multilocus genotypes into high-risk and low-risk groups in an ad hoc manner based on a simple comparison of the ratios of the number of cases and controls. Hence, the MDR approach is prone to false positive and negative errors when the ratio of the number of cases and controls in a combination of genotypes is similar to that in the entire data, or when the numbers of cases and controls are both small. We therefore propose the odds ratio based multifactor dimensionality reduction (OR MDR) method, which uses the odds ratio as a new quantitative measure of disease risk. RESULTS: While the original MDR method provides a simple binary measure of risk, the OR MDR method provides not only the odds ratio as a quantitative measure of risk but also the ordering of the multilocus combinations from the highest risk to lowest risk groups. Furthermore, the OR MDR method provides a confidence interval for the odds ratio for each multilocus combination, which is extremely informative in judging its importance as a risk factor. The proposed OR MDR method is illustrated using the dataset obtained from the CDC Chronic Fatigue Syndrome Research Group. AVAILABILITY: The program written in R is available.
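
A small sketch may help contrast the binary MDR rule with an odds-ratio summary of the kind this abstract describes. It is not the authors' R program. The choice of "all other cells" as the reference group, the 0.5 added to every count, the Wald-type interval on the log scale, and the cell counts are assumptions made only for illustration.

```python
import math


def mdr_label(cases, controls, total_cases, total_controls):
    """Binary MDR rule: high risk if the cell's case:control ratio exceeds
    the ratio in the whole data set (compared by cross-multiplication)."""
    return "high" if cases * total_controls > controls * total_cases else "low"


def cell_odds_ratio(cases, controls, total_cases, total_controls, z=1.96):
    """Odds ratio of one genotype cell versus all other cells, with a
    Wald-type confidence interval computed on the log scale.

    Assumptions (not from the abstract): the reference group is "all other
    cells", and 0.5 is added to every count so sparse cells stay finite.
    """
    a = cases + 0.5                      # cases inside the cell
    b = controls + 0.5                   # controls inside the cell
    c = total_cases - cases + 0.5        # cases outside the cell
    d = total_controls - controls + 0.5  # controls outside the cell
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical (cases, controls) counts for three two-locus genotype cells.
cells = {"AA/BB": (40, 20), "Aa/Bb": (51, 50), "aa/bb": (2, 1)}
total_cases, total_controls = 200, 200

# Order the cells from highest to lowest estimated odds ratio.
ranked = sorted(
    cells.items(),
    key=lambda kv: cell_odds_ratio(*kv[1], total_cases, total_controls)[0],
    reverse=True,
)
for name, (cases, controls) in ranked:
    label = mdr_label(cases, controls, total_cases, total_controls)
    est, lower, upper = cell_odds_ratio(cases, controls, total_cases, total_controls)
    print(f"{name}: {label} risk, OR = {est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

All three hypothetical cells receive the same "high" label from the ratio rule, but only the first has an interval that excludes 1. The second cell has a ratio close to the overall ratio and the third has very few observations, which are the two failure cases the abstract names.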

13.
X-Y Lou. Heredity 2015, 114(3): 255-261
Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is the primary topic of interest in recent genetics studies but presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data generation mechanisms that cannot be appropriately modeled by a dichotomous model; the subjects in a study may be recruited according to its own analytical goals, research strategies and resources available, not only consisting of homogeneous unrelated individuals. Although several modifications and extensions of MDR have in part addressed the practical problems, they are still limited in statistical analyses of diverse phenotypes, multivariate phenotypes and correlated observations, correcting for potential population stratification and unifying both unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as unified generalized MDR (UGMDR), for systematic extension of MDR. The proposed approach is quite versatile, not only allowing for covariate adjustment, being suitable for analyzing almost any trait type, for example, binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate and others, as well as combinations of those, but also being applicable to various study designs, including homogeneous and admixed unrelated-subject and family as well as mixtures of them. The proposed UGMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.

14.
Roy J, Lin X. Biometrics 2005, 61(3): 837-846
We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.
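
For readers unfamiliar with the first two naive approaches named in this abstract, the following sketch shows what they do to a single subject's covariate history. The function names and the values are hypothetical. The sketch only illustrates the fill rules that the article reports as yielding inconsistent estimators, and it does not implement the proposed selection/transition model.

```python
def baseline_only(history):
    """Naive approach (1): use the baseline value at every visit."""
    return [history[0]] * len(history)


def carry_forward(history):
    """Naive approach (2): replace each missing value (None) with the most
    recent observed value."""
    filled, last = [], None
    for value in history:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical time-varying covariate for a subject who drops out after
# the third visit; None marks visits with no measurement.
history = [410, 380, 350, None, None]
print(baseline_only(history))
print(carry_forward(history))
```

Both rules freeze a covariate that was visibly trending downward before dropout, which gives some intuition for why such fills can bias estimates when dropout is informative.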

15.
We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of a quantitative trait. The proposed Quantitative MDR (QMDR) method handles continuous data by modifying MDR’s constructive induction algorithm to use a T-test. QMDR replaces the balanced accuracy metric with a T-test statistic as the score to determine the best interaction model. We used a simulation to identify the empirical distribution of QMDR’s testing score. We then applied QMDR to genetic data from the ongoing prospective Prevention of Renal and Vascular End-Stage Disease (PREVEND) study.
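
The idea of scoring an interaction model with a T-test statistic can be sketched as follows. The abstract does not spell out how cells are labeled, so this sketch assumes that a genotype cell is "high" when its mean trait value exceeds the overall mean and that a pooled-variance Student t statistic is used. The function name and toy data are hypothetical.

```python
import math
import statistics


def qmdr_t_statistic(genotypes, trait):
    """Label each genotype cell high/low by its mean trait value, then return
    the two-sample t statistic between the pooled high and low groups.

    Assumptions (not from the abstract): a cell is "high" when its mean
    exceeds the overall mean, and the pooled-variance Student t is used.
    """
    overall_mean = statistics.mean(trait)
    cells = {}
    for g, y in zip(genotypes, trait):
        cells.setdefault(g, []).append(y)

    high, low = [], []
    for values in cells.values():
        target = high if statistics.mean(values) > overall_mean else low
        target.extend(values)
    if len(high) < 2 or len(low) < 2:
        raise ValueError("need at least two observations in each group")

    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * statistics.variance(high)
                  + (n2 - 1) * statistics.variance(low)) / (n1 + n2 - 2)
    return (statistics.mean(high) - statistics.mean(low)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))


# Toy two-SNP data in which the trait is raised only when the SNPs differ.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
trait = [1.0, 1.2, 2.0, 2.2, 2.1, 1.9, 1.1, 0.9]
print(qmdr_t_statistic(genotypes, trait))
```

A model search would compute this score for every candidate SNP combination and keep the largest. Because the labels are chosen from the same data, the statistic does not follow the usual t distribution, which is presumably why the authors estimate its distribution by simulation.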

16.
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods, namely Gini, absolute probability difference (APD), and entropy, to develop two novel summary scores, the principal component score (PCS) and the Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and the multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type I error rates (< 0.05) compared to GS, APDS, ES (> 0.05). A real data study using the Genetic Analysis Workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions.

17.
MOTIVATION: The identification and characterization of susceptibility genes that influence the risk of common and complex diseases remains a statistical and computational challenge in genetic association studies. This is partly because the effect of any single genetic variant for a common and complex disease may be dependent on other genetic variants (gene-gene interaction) and environmental factors (gene-environment interaction). To address this problem, the multifactor dimensionality reduction (MDR) method has been proposed by Ritchie et al. to detect gene-gene interactions or gene-environment interactions. The MDR method identifies polymorphism combinations associated with the common and complex multifactorial diseases by collapsing high-dimensional genetic factors into a single dimension. That is, the MDR method classifies the combination of multilocus genotypes into high-risk and low-risk groups based on a comparison of the ratios of the numbers of cases and controls. When a high-order interaction model is considered with multi-dimensional factors, however, there may be many sparse or empty cells in the contingency tables. The MDR method cannot classify an empty cell as high risk or low risk and leaves it as undetermined. RESULTS: In this article, we propose the log-linear model-based multifactor dimensionality reduction (LM MDR) method to improve the MDR in classifying sparse or empty cells. The LM MDR method estimates frequencies for empty cells from a parsimonious log-linear model so that they can be assigned to high- and low-risk groups. In addition, LM MDR includes MDR as a special case when the saturated log-linear model is fitted. Simulation studies show that the LM MDR method has greater power and smaller error rates than the MDR method. The LM MDR method is also compared with the MDR method using sporadic Alzheimer's disease as an example.

18.
Geometric estimates of heritability in biological shape (cited 3 times: 0 self-citations, 3 citations by others)
The recently developed geometric morphometrics methods represent an important contribution of statistics and geometry to the study of biological shapes. We propose simple protocols using shape distances that incorporate geometric techniques into linear quantitative genetic models that should provide insights into the contribution of genetics to shape variation in organisms. The geometric approaches use Procrustes distances in a curved shape space and distances in tangent spaces within and among families to estimate shape heritability. We illustrate the protocols with an example of wing shape variation in the honeybee, Apis mellifera. The heritability of overall shape variation was small, but some localized components depicting shape changes on distal wing regions showed medium to large heritabilities. The genetic variance-covariance matrix of the geometric shape variables was significantly correlated with the phenotypic shape variance-covariance matrix. A comparison of the results of geometric methods with the traditional multivariate analysis of interlandmark distances indicated that even with a larger dimensionality, the interlandmark distances were not as rich in shape information as the landmark coordinates. Quantitative genetics studies of shape should greatly benefit from the application of geometric methods.

19.
The multifactor dimensionality reduction (MDR) method is a model-free approach that can identify gene x gene or gene x environment effects in a case-control study. Here we explore several modifications of the MDR method. We extended MDR to provide model selection without cross-validation, and use a chi-square statistic as an alternative to prediction error (PE). We also modified the permutation test to provide different levels of stringency. The extended MDR (EMDR) includes three permutation tests (fixed, non-fixed, and omnibus) to obtain p-values of multilocus models. The goal of this study was to compare the different approaches implemented in the EMDR method and evaluate the ability to identify genetic effects in the Genetic Analysis Workshop 14 simulated data. We used three replicates from the simulated family data, generating matched pairs from family triads. The results showed: 1) chi-square and PE statistics give nearly consistent results; 2) results of EMDR without cross-validation matched those of EMDR with 10-fold cross-validation; 3) the fixed permutation test reports false-positive results in data from loci unrelated to the disease, but the non-fixed and omnibus permutation tests perform well in preventing false positives, with the omnibus test being the most conservative. We conclude that the non-cross-validation test can provide accurate results with the advantage of high efficiency compared to 10-fold cross-validation, and the non-fixed permutation test provides a good compromise between power and false-positive rate.
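
As a rough illustration of scoring an MDR-style model with a chi-square statistic and obtaining a permutation p-value, consider the generic sketch below. It is not a reconstruction of the fixed, non-fixed, or omnibus tests, whose definitions the abstract does not give. Relabeling the cells inside every permutation, the function names, and the toy data are assumptions of this sketch.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns 0.0 when any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mdr_chi_square(cells, is_case):
    """Label each genotype cell high risk if its case:control ratio exceeds
    the overall ratio, then return the chi-square of the high/low label
    versus case status."""
    n = len(is_case)
    n_case = sum(is_case)
    counts = {}
    for g, y in zip(cells, is_case):
        pair = counts.setdefault(g, [0, 0])  # [cases, controls]
        pair[0 if y else 1] += 1
    a = b = c = d = 0
    for cases, controls in counts.values():
        if cases * (n - n_case) > controls * n_case:
            a += cases
            b += controls
        else:
            c += cases
            d += controls
    return chi_square_2x2(a, b, c, d)


def permutation_p_value(cells, is_case, n_perm=199, seed=0):
    """Share of label permutations whose statistic is at least as large as
    the observed one; cells are relabeled within every permutation."""
    rng = random.Random(seed)
    observed = mdr_chi_square(cells, is_case)
    shuffled = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mdr_chi_square(cells, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: two genotype cells with clearly different case fractions.
cells = ["g1"] * 30 + ["g2"] * 30
is_case = [True] * 25 + [False] * 5 + [True] * 5 + [False] * 25
print(mdr_chi_square(cells, is_case))
print(permutation_p_value(cells, is_case))
```

Repeating the high/low labeling on each permuted data set matters because the labels are themselves fitted to the data. Comparing against permutations scored with the original labels would tend to understate the p-value.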

20.
In this paper, we propose a frequentist model averaging method for quantile regression with high-dimensional covariates. Although research on these subjects has proliferated as separate approaches, no study has considered them in conjunction. Our method entails reducing the covariate dimensions through ranking the covariates based on marginal quantile utilities. The second step of our method implements model averaging on the models containing the covariates that survive the screening of the first step. We use a delete-one cross-validation method to select the model weights, and prove that the resultant estimator possesses an optimal asymptotic property uniformly over any compact (0,1) subset of the quantile indices. Our proof, which relies on empirical process theory, is arguably more challenging than proofs of similar results in other contexts owing to the high-dimensional nature of the problem and our relaxation of the conventional assumption of the weights summing to one. Our investigation of finite-sample performance demonstrates that the proposed method exhibits very favorable properties compared to the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalized regression methods. The method is applied to a microarray gene expression data set.
