Similar Articles
20 similar articles found (search time: 19 ms)
1.
To date, most genetic analyses of phenotypes have focused on single traits or have analyzed each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain largely unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple-quantitative-trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and with single-trait interaction analysis by a single-variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that joint interaction analysis of multiple phenotypes has much higher power to detect interaction than interaction analysis of a single trait, and it may open a new direction toward fully uncovering the genetic structure of multiple phenotypes.

2.
Reese, Randall; Fu, Guifang; Zhao, Geran; Dai, Xiaotian; Li, Xiaotian; Chiu, Kenneth. Statistics in Biosciences (2022) 14(3): 514–532
Statistics in Biosciences - Selecting influential non-linear interactive features from ultrahigh-dimensional data has been an important task in various fields. However, statistical accuracy and...

3.
The maximal linear predictable combination of a set of dependent variables is defined as that linear combination maximizing the multiple correlation coefficient with the predictor set. It allows the relative importance of a number of factors to be evaluated for the joint response, rather than for the response of each dependent variable in turn. The procedure is illustrated by an example. AMS subject classification: major 62J10, 62H20; minor 62H25.
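The combination described above can be computed as the first canonical variate of the dependent set with the predictor set. A minimal numpy sketch on simulated data (all variable names, effect sizes, and the two-response setup are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Simulated data: three predictors, two responses; the first response
# is well predicted by X, the second is mostly noise.
rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 3))
Y = np.column_stack([X @ [1.0, -0.5, 0.8] + 0.3 * rng.normal(size=n),
                     0.1 * X[:, 0] + rng.normal(size=n)])

# Centered sample covariance blocks.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx = Xc.T @ Xc / n
Syy = Yc.T @ Yc / n
Sxy = Xc.T @ Yc / n

# The maximal linear predictable combination a'Y solves the
# eigenproblem of Syy^{-1} Syx Sxx^{-1} Sxy; its eigenvalue is the
# squared multiple correlation with the predictor set.
M = np.linalg.solve(Syy, Sxy.T) @ np.linalg.solve(Sxx, Sxy)
eigvals, eigvecs = np.linalg.eig(M)
order = np.argsort(eigvals.real)[::-1]
a = eigvecs[:, order[0]].real              # weights on the Y columns
r_max = np.sqrt(eigvals.real[order[0]])    # maximal multiple correlation
print(f"maximal multiple correlation = {r_max:.3f}")
```

Because the first simulated response is strongly driven by X, the maximal combination leans on it and its multiple correlation is close to 1.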

4.
In this paper some new, exactly distribution-free tests are offered for the hypothesis about the slope parameter in one-sample, two-sample, and several-sample simple linear regression problems. Asymptotic relative efficiencies of these test procedures are also studied.

5.
6.
Percentage is widely used to describe different results in food microbiology, e.g., the probability of microbial growth, the percentage inactivated, and the percentage of positive samples. Four sets of percentage data, percent-growth-positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least 78% of the observations in all four data sets. In all cases, the deviation of the logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight into determining the key experimental factors.
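As a rough illustration of the comparison described, the sketch below fits a straight line and a logistic curve to synthetic percentage data and compares their sums of squared errors. The logistic fit here is a simple logit-scale least-squares approximation, an assumed simplification, not the paper's fitting procedure, and the data are simulated, not the four experimental data sets:

```python
import numpy as np

# Synthetic percentage data following a sigmoidal trend, e.g. the
# probability of microbial growth as a function of some condition x.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
true_p = 1.0 / (1.0 + np.exp(-(x - 5.0)))
p_obs = np.clip(true_p + rng.normal(0, 0.03, x.size), 0.01, 0.99)

X = np.column_stack([np.ones_like(x), x])

# Linear model: p = b0 + b1 * x (can leave the [0, 1] range).
b_lin, *_ = np.linalg.lstsq(X, p_obs, rcond=None)
p_lin = X @ b_lin

# Logistic model fitted on the logit scale: logit(p) = b0 + b1 * x.
logit = np.log(p_obs / (1.0 - p_obs))
b_log, *_ = np.linalg.lstsq(X, logit, rcond=None)
p_log = 1.0 / (1.0 + np.exp(-(X @ b_log)))

sse_lin = np.sum((p_obs - p_lin) ** 2)
sse_log = np.sum((p_obs - p_log) ** 2)
print(f"linear SSE = {sse_lin:.4f}, logistic SSE = {sse_log:.4f}")
```

On sigmoidal data the logistic curve both fits more closely and, unlike the line, keeps every prediction inside (0, 1).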

7.
In genomic research, phenotype transformations are commonly used as a straightforward way to achieve normality of the model outcome, and many researchers still believe them to be necessary for proper inference. Using regression simulations, we show that phenotype transformations are typically not needed and, when applied to phenotypes with heteroscedasticity, result in inflated Type I error rates. We further explain that what matters is addressing the combination of rare-variant genotypes and heteroscedasticity: incorrectly estimated parameter variability, or an incorrect choice of the distribution of the underlying test statistic, produces spurious detections of association. We conclude that it is the combination of heteroscedasticity, minor allele frequency, and sample size, and to a much lesser extent the error distribution, that matters for proper statistical inference.
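A small simulation sketch of the phenomenon described: with a rare variant whose carriers have noisier phenotypes, the usual homoscedastic test rejects a true null far more often than the nominal 5%. The sample size, allele frequency, and error-variance ratio below are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, maf = 1000, 2000, 0.01
rejections = 0
for _ in range(reps):
    g = rng.binomial(2, maf, n).astype(float)   # rare-variant genotype
    carrier = g > 0
    # Null model (no genetic effect), but carriers have 2x the error SD.
    y = rng.normal(0, np.where(carrier, 2.0, 1.0))
    # Ordinary least squares with the usual homoscedastic variance estimate.
    X = np.column_stack([np.ones(n), g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    z = beta[1] / np.sqrt(cov[1, 1])
    rejections += abs(z) > 1.96                 # nominal 5% two-sided test
type1 = rejections / reps
print(f"empirical Type I error at nominal 0.05: {type1:.3f}")
```

The naive standard error averages the two error variances over a sample dominated by non-carriers, so the test statistic for the rare variant is badly over-dispersed under the null.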

8.
In this article, we consider two families of predictors for the simultaneous prediction of the actual and average values of the study variable in a linear regression model when a set of stochastic linear constraints binding the regression coefficients is available. These families arise from the method of mixed regression estimation. The performance properties of these families are analyzed when the objective is to predict values outside the sample and within the sample.

9.
10.
Regression tree analysis, a non-parametric method, was undertaken to identify predictors of the serum concentration of polychlorinated biphenyls (sum of marker PCBs 138, 153, and 180) in humans. This method was applied to biomonitoring data of the Flemish Environment and Health study (2002–2006), covering 1679 adolescents and 1583 adults. Potential predictor variables were collected via a self-administered questionnaire assessing lifestyle, food intake, use of tobacco and alcohol, residence history, health, education, hobbies, and occupation. Relevant predictors of human PCB exposure were identified with regression tree analysis of the ln-transformed sum of PCBs, separately in adolescents and adults, and the results were compared with those from a standard linear regression approach. The non-parametric analysis confirmed the selection of the covariates in the multiple regression models. In both analyses, blood fat, gender, age, body-mass index (BMI) or change in body weight, former breast-feeding, and a number of nutritional factors were identified as statistically significant predictors of the serum PCB concentration, in adolescents, in adults, or in both. Regression trees can thus be used as an exploratory analysis in combination with multiple linear regression models, in which the relationships between the determinants and the biomarkers can be quantified.
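The core building block of a regression tree is choosing, at each node, the single split that most reduces squared error. The sketch below implements one such split ("stump") in plain numpy on simulated data in which age is deliberately made the dominant driver of the ln-transformed outcome; the variable names mirror the study, but the values and effect sizes are invented for illustration, not the Flemish biomonitoring data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
age = rng.uniform(14, 80, n)
bmi = rng.normal(25, 4, n)
blood_fat = rng.normal(6, 1, n)
# Simulated ln-transformed sum of PCBs: age dominates by construction.
ln_pcb = 0.03 * age + 0.01 * bmi + 0.05 * blood_fat + rng.normal(0, 0.2, n)

def best_split_sse(x, y):
    """Smallest total SSE over candidate thresholds on one feature."""
    best = np.inf
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = min(best, sse)
    return best

features = {"age": age, "bmi": bmi, "blood_fat": blood_fat}
scores = {name: best_split_sse(x, ln_pcb) for name, x in features.items()}
best_feature = min(scores, key=scores.get)
print("first split on:", best_feature)
```

A full tree recurses this step within each half; because no linearity is assumed, the tree can pick up threshold effects that a linear model would smooth over.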

11.
When the explanatory variables of a linear model are split into two groups, two notions of collinearity can be defined: a collinearity among the variables of each group, whose mean is called the residual collinearity, and a collinearity between the two groups, called the explained collinearity. Canonical correlation analysis provides information about both: large canonical correlation coefficients correspond to some small eigenvalues and eigenvectors of the correlation matrix and characterize the explained collinearity, while the other small eigenvalues of this matrix correspond to the residual collinearity. A selection of predictors can be performed from the canonical correlation variables according to their partial correlation coefficients with the explained variable. In the proposed application, the results obtained by the selection of canonical variables are better than those given by classical regression and by principal component regression.

12.
A major goal of human genetics is to elucidate the genetic architecture of human disease, with the aim of improving diagnosis and the understanding of disease pathogenesis. The degree to which epistasis, or non-additive effects of risk alleles at different loci, accounts for common disease traits is hotly debated, in part because the conditions under which epistasis evolves are not well understood. Using both theory and evolutionary simulation, we show that the occurrence of common diseases (i.e. unfit phenotypes with frequencies on the order of 1%) can, under the right circumstances, be expected to be driven primarily by synergistic epistatic interactions. Conditions that are collectively necessary for this outcome include a strongly non-linear phenotypic landscape; strong (but not too strong) selection against the disease phenotype; and "noise" in the genotype-phenotype map that is both environmental (extrinsic, time-correlated) and developmental (intrinsic, uncorrelated) and, in both cases, neither too little nor too great. These results suggest ways in which geneticists might identify, a priori, those disease traits for which an "epistatic explanation" should be sought, and in the process better focus ongoing searches for risk alleles.

13.
Many human diseases are attributable to complex interactions among genetic and environmental factors. Statistical tools capable of modeling such complex interactions are necessary to improve the identification of genetic factors that increase a patient's risk of disease. Logic Forest (LF), a bagging ensemble algorithm based on logic regression (LR), is able to discover interactions among binary variables that are predictive of response, such as the biologic interactions that predispose individuals to disease. However, LF's ability to recover interactions degrades for more infrequently occurring interactions. A rare genetic interaction may occur if, for example, the interaction increases disease risk only in a patient subpopulation that represents a small proportion of the overall patient population. We present an alternative ensemble adaptation of LR based on boosting rather than bagging, called LBoost. We compare the ability of LBoost and LF to identify variable interactions in simulation studies. The results indicate that LBoost is superior to LF for identifying genetic interactions associated with disease that are infrequent in the population. We apply LBoost to a subset of single nucleotide polymorphisms on the PRDX genes from the Cancer Genetic Markers of Susceptibility Breast Cancer Scan to investigate genetic risk for breast cancer. LBoost is publicly available on CRAN as part of the LogicForest package, http://cran.r-project.org/.

14.
Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold-standard technique for studying methylation: it converts cytosines to uracils while 5-methylcytosines remain nonreactive, so that during PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines are amplified as thymine. To detect methylation levels, bisulfite-treated reads must be aligned against a reference genome. Mapping these reads represents a significant computational challenge, mainly because of the increased search space and the loss of information introduced by the treatment. To deal with this challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units, hardware accelerators that are increasingly being used to accelerate general-purpose scientific applications. GPU-BSM maps bisulfite-treated reads from both whole-genome and reduced-representation bisulfite sequencing and estimates methylation levels. Owing to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of uniquely mapped reads.

15.
Inbreeding depression is a topic of major interest in experimental and domestic species, although previous studies have simplified this genetically complex effect to the linear (or quadratic) regression coefficient on the inbreeding coefficient of each individual or, more recently, on founder-specific inbreeding coefficients. Going beyond these traditional scenarios, our research focused on the analysis of gene-by-gene interactions leading to epistasis for inbreeding depression effects. In a Bayesian context, inbreeding depression effects were evaluated for weaning weight (WW) in a commercial rabbit population founded from 4 bucks and 1 doe (the MARET population). Founder-specific inbreeding depression effects for the 4 bucks ranged between -81.1 and 38.3 g for each 1% of inbreeding. More interestingly, 2 epistatic interactions between the partial inbreeding coefficients of 2 bucks were also significant and negative, showing reductions of 1.9 and 1.0 g in WW. These results provide the first evidence of epistatic inbreeding depression in domestic species, emphasizing the complexity of the genetic architecture in mammals.
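The regression structure behind this idea can be sketched in a few lines: founder-specific partial inbreeding coefficients enter the model for weaning weight linearly, and their pairwise product supplies the epistatic interaction term. The sketch below uses ordinary least squares on simulated data (the paper works in a Bayesian framework; all effect sizes here are arbitrary illustrations, not the MARET estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
f1 = rng.uniform(0, 0.1, n)   # partial inbreeding w.r.t. founder buck 1
f2 = rng.uniform(0, 0.1, n)   # partial inbreeding w.r.t. founder buck 2

# Simulated weaning weight (g): founder-specific depression effects
# plus a negative epistatic (f1 x f2) interaction.
ww = 800 - 500 * f1 - 300 * f2 - 2000 * f1 * f2 + rng.normal(0, 5, n)

# Design: intercept, two main inbreeding effects, one interaction.
X = np.column_stack([np.ones(n), f1, f2, f1 * f2])
beta, *_ = np.linalg.lstsq(X, ww, rcond=None)
print("intercept, f1, f2, f1*f2:", np.round(beta, 1))
```

With the interaction column included, the fit recovers negative main effects and a negative interaction, mirroring the sign pattern the abstract reports for the two significant buck pairs.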

16.
In the classical linear regression model for the analysis of medical data concerning hepatic extraction of insulin and C-peptide, the fundamental assumption is that the subjects involved are of a similar nature. If this assumption is violated, the precision of the results is questionable. This paper suggests a robust alternative to overcome this problem. It is observed that robustification may be the better option for determining an optimal quantity of insulin that minimizes the risk of damage associated with diabetic treatments.

17.
The widespread availability of high-throughput genotyping technology has opened the door to the era of personal genetics, which brings to consumers the promise of using genetic variation to predict individual susceptibility to common diseases. Despite easy access to commercial personal genetics services, our knowledge of the genetic architecture of common diseases is still very limited and has not yet fulfilled the promise of accurately identifying most people at risk. This is partly because of the complexity of the mapping between genotype and phenotype that results from epistasis (gene-gene interaction) and other phenomena such as gene-environment interaction and locus heterogeneity. Unfortunately, these aspects of genetic architecture have not been addressed in most of the genetic association studies that provide the knowledge base for interpreting large-scale genetic association results. We provide here an introductory review of how epistasis can affect human health and disease and how it can be detected in population-based studies. We offer some thoughts on the implications of epistasis for personal genetics and some recommendations for improving personal genetics in light of this complexity.

18.
Contemporary genetic studies are revealing the genetic complexity of many traits in humans and model organisms. Two hallmarks of this complexity are epistasis, meaning gene-gene interaction, and pleiotropy, in which one gene affects multiple phenotypes. Understanding the genetic architecture of complex traits requires addressing these phenomena, but interpreting the biological significance of epistasis and pleiotropy is often difficult. While epistasis reveals dependencies between genetic variants, it is often unclear how the activity of one variant is specifically modifying the other. Epistasis found in one phenotypic context may disappear in another context, rendering the genetic interaction ambiguous. Pleiotropy can suggest either redundant phenotype measures or gene variants that affect multiple biological processes. Here we present an R package, R/cape, which addresses these interpretation ambiguities by implementing a novel method to generate predictive and interpretable genetic networks that influence quantitative phenotypes. R/cape integrates information from multiple related phenotypes to constrain models of epistasis, thereby enhancing the detection of interactions that simultaneously describe all phenotypes. The networks inferred by R/cape are readily interpretable in terms of directed influences that indicate suppressive and enhancing effects of individual genetic variants on other variants, which in turn account for the variance in quantitative traits. We demonstrate the utility of R/cape by analyzing a mouse backcross, thereby discovering novel epistatic interactions influencing phenotypes related to obesity and diabetes. R/cape is an easy-to-use, platform-independent R package and can be applied to data from both genetic screens and a variety of segregating populations including backcrosses, intercrosses, and natural populations. The package is freely available under the GPL-3 license at http://cran.r-project.org/web/packages/cape.
This is a PLOS Computational Biology Software Article

19.
In this study, we are interested in estimating the parameters of a nonlinear regression model when the error terms are correlated. Throughout this work we restrict ourselves to the special case in which the error terms follow a pth-order stationary autoregressive model (AR(p)). Following the idea of Lawton and Sylvestre (1971) and Gallant and Goebel (1976), a parameter-elimination method is proposed. It has the advantages of being insensitive to initial values, and convergence of the procedure may be more stable because of the reduced dimension of the problem. The parameter-elimination method is compared with the methods of Gallant and Goebel (1976) and Glasbey (1980) by Monte Carlo simulation, and the results of applying the first two methods to real data obtained from the Environmental Protection Administration of the Executive Yuan of the Republic of China are presented.

20.
The weights used in iterative weighted least squares (IWLS) regression are usually estimated parametrically using a working model for the error variance. When the variance function is misspecified, the IWLS estimates of the regression coefficients β are still asymptotically consistent, but there is some loss of efficiency. Since second moments can be quite hard to model, it makes sense to estimate the error variances nonparametrically and to employ weights inversely proportional to the estimated variances in computing the WLS estimate of β. Surprisingly, this approach has not received much attention in the literature. The aim of this note is to demonstrate that such a procedure can be implemented easily in S-Plus using standard functions with default options, making it suitable for routine applications. The particular smoothing method we use is local polynomial regression applied to the logarithm of the squared residuals, but other smoothers can be tried as well. The proposed procedure is applied to data on the use of two different assay methods for a hormone. Efficiency calculations based on the estimated model show that the nonparametric IWLS estimates are more efficient than the parametric IWLS estimates based on three different plausible working models for the variance function. The proposed estimators also perform well in a simulation study using both parametric and nonparametric variance functions as well as normal and gamma errors.
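The procedure described (the note uses S-Plus) can be sketched in Python/numpy: fit OLS, smooth the log squared residuals against x with a local-linear Gaussian-kernel smoother, then refit with weights inversely proportional to the estimated variances. The data, bandwidth, and single-pass (rather than fully iterated) scheme are simplifying assumptions for illustration:

```python
import numpy as np

# Synthetic heteroscedastic data: error SD grows linearly with x.
rng = np.random.default_rng(3)
n = 200
x = np.sort(rng.uniform(0, 10, n))
sigma = 0.2 + 0.3 * x
y = 1.0 + 2.0 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), x])

def wls(X, y, w):
    """Weighted least squares via rescaled ordinary least squares."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

beta_ols = wls(X, y, np.ones(n))
resid = y - X @ beta_ols

# Local-linear smooth of log(residual^2) at every design point.
log_r2 = np.log(resid ** 2 + 1e-12)
bandwidth = 1.5                    # arbitrary illustrative choice
log_var_hat = np.empty(n)
for i in range(n):
    w = np.exp(-0.5 * ((x - x[i]) / bandwidth) ** 2)
    log_var_hat[i] = wls(X, log_r2, w) @ X[i]

# Reweight by the inverse of the estimated variance and refit.
beta_iwls = wls(X, y, 1.0 / np.exp(log_var_hat))
print("OLS:", np.round(beta_ols, 2), "IWLS:", np.round(beta_iwls, 2))
```

Exponentiating the smoothed log squared residuals underestimates the variance by a constant factor, but since WLS is invariant to rescaling all weights, that bias does not affect the coefficient estimates.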


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号