Similar articles
 20 similar articles found
1.
Variance-component (VC) methods are flexible and powerful procedures for the mapping of genes that influence quantitative traits. However, traditional VC methods make the critical assumption that the quantitative-trait data within a family either follow or can be transformed to follow a multivariate normal distribution. Violation of the multivariate normality assumption can occur if trait data are censored at some threshold value. Trait censoring can arise in a variety of ways, including assay limitation or confounding due to medication. Valid linkage analyses of censored data require the development of a modified VC method that directly models the censoring event. Here, we present such a model, which we call the "tobit VC method." Using simulation studies, we compare and contrast the performance of the traditional and tobit VC methods for linkage analysis of censored trait data. For the simulation settings that we considered, our results suggest that (1) analyses of censored data by using the traditional VC method lead to severe bias in parameter estimates and a modest increase in false-positive linkage findings, (2) analyses with the tobit VC method lead to unbiased parameter estimates and type I error rates that reflect nominal levels, and (3) the tobit VC method has a modest increase in linkage power as compared with the traditional VC method. We also apply the tobit VC method to censored data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics study and provide two examples in which the tobit VC method yields noticeably different results as compared with the traditional method.
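
As a rough illustration of what "directly modeling the censoring event" can mean, the sketch below computes a tobit-style log-likelihood for independent, left-censored normal observations. It is a simplified univariate stand-in and not the pedigree-based tobit VC model described in the abstract; the function name `tobit_loglik` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def tobit_loglik(values, censored, mu, sigma):
    """Log-likelihood of independent normal data with left-censoring.

    values   -- observed trait values (for a censored entry, the threshold)
    censored -- booleans, True where the entry is censored at its value
    mu, sigma -- mean and standard deviation of the latent normal trait
    (Illustrative assumption: observations are independent, unlike
    relatives in a pedigree.)
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for y, is_censored in zip(values, censored):
        if is_censored:
            # We only know the latent value lies at or below the threshold,
            # so the contribution is the normal CDF at the threshold.
            total += math.log(dist.cdf(y))
        else:
            # Fully observed values contribute the usual normal density.
            total += math.log(dist.pdf(y))
    return total
```

Treating a censored value as if it were an exact observation would replace the CDF term with a density term, which is one way to see why the traditional normal likelihood can give biased estimates on such data.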

2.
Detection of linkage to genes for quantitative traits remains a challenging task. Recently, variance components (VC) techniques have emerged as among the more powerful of available methods. As often implemented, such techniques require assumptions about the phenotypic distribution. Usually, multivariate normality is assumed. However, several factors may lead to markedly nonnormal phenotypic data, including (a) the presence of a major gene (not necessarily linked to the markers under study), (b) some types of gene x environment interaction, (c) use of a dichotomous phenotype (i.e., affected vs. unaffected), (d) nonnormality of the population within-genotype (residual) distribution, and (e) selective (extreme) sampling. Using simulation, we have investigated, for sib-pair studies, the robustness of the likelihood-ratio test for a VC quantitative-trait locus-detection procedure to violations of normality that are due to these factors. Results showed (a) that some types of nonnormality, such as leptokurtosis, produced type I error rates in excess of the nominal, or alpha, levels whereas others did not; and (b) that the degree of type I error-rate inflation appears to be directly related to the residual sibling correlation. Potential solutions to this problem are discussed. Investigators contemplating use of this VC procedure are encouraged to provide evidence that their trait data are normally distributed, to employ a procedure that allows for nonnormal data, or to consider implementation of permutation tests.

3.
Variance-component methods are popular and flexible analytic tools for elucidating the genetic mechanisms of complex quantitative traits from pedigree data. However, variance-component methods typically assume that the trait of interest follows a multivariate normal distribution within a pedigree. Studies have shown that violation of this normality assumption can lead to biased parameter estimates and inflated type I error. This limits the application of variance-component methods to more general trait outcomes, whether continuous or categorical in nature. In this paper, we develop and apply a general variance-component framework for pedigree analysis of continuous and categorical outcomes. We develop appropriate models using generalized-linear mixed model theory and fit such models using approximate maximum-likelihood procedures. Using our proposed method, we demonstrate that one can perform variance-component pedigree analysis on outcomes that follow any exponential-family distribution. We also show how one can modify the method to perform pedigree analysis of ordinal outcomes, and we discuss extensions of our variance-component framework to accommodate pedigrees ascertained based on trait outcome. We demonstrate the feasibility of our method using both simulated data and data from a genetic study of ovarian insufficiency.

4.
Effects of censoring on parameter estimates and power in genetic modeling.   (Cited 5 times in total: 0 self-citations, 5 citations by others)
Genetic and environmental influences on variance in phenotypic traits may be estimated with normal theory Maximum Likelihood (ML). However, when the assumption of multivariate normality is not met, this method may result in biased parameter estimates and incorrect likelihood ratio tests. We simulated multivariate normally distributed twin data under the assumption of three different genetic models. Genetic model fitting was performed in six data sets: multivariate normal data, discrete uncensored data, censored data, square root transformed censored data, normal scores of censored data, and categorical data. Estimates were obtained with normal theory ML (data sets 1-5) and with categorical data analysis (data set 6). Statistical power was examined by fitting reduced models to the data. When fitting an ACE model to censored data, an unbiased estimate of the additive genetic effect was obtained. However, the common environmental effect was underestimated and the unique environmental effect was overestimated. Transformations did not remove this bias. When fitting an ADE model, the additive genetic effect was underestimated while the dominant and unique environmental effects were overestimated. In all models, the correct parameter estimates were recovered with categorical data analysis. However, with categorical data analysis, the statistical power decreased. The analysis of L-shaped distributed data with normal theory ML results in biased parameter estimates. Unbiased parameter estimates are obtained with categorical data analysis, but the power decreases.

5.
Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance components-based likelihood ratio testing may be affected when phenotypic data are non-normally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality. Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.

6.
Wu C, Li G, Zhu J, Cui Y. PLoS ONE 2011, 6(9): e24902
Functional mapping has been a powerful tool in mapping quantitative trait loci (QTL) underlying dynamic traits of agricultural or biomedical interest. In functional mapping, multivariate normality is often assumed for the underlying data distribution, partially due to the ease of parameter estimation. The normality assumption, however, can easily be violated in real applications for various reasons, such as heavy tails or extreme observations. Departure from normality has a negative effect on testing power and inference for QTL identification. In this work, we relax the normality assumption and propose a robust multivariate t-distribution mapping framework for QTL identification in functional mapping. Simulation studies show increased mapping power and precision with the t distribution compared with a normal distribution. The utility of the method is demonstrated through a real data analysis.

7.
Z Li, J Möttönen, M J Sillanpää. Heredity 2015, 115(6): 556-564
Linear regression-based quantitative trait loci/association mapping methods such as least squares commonly assume normality of residuals. In genetics studies of plants or animals, some quantitative traits may not follow a normal distribution because the data include outlying observations or are collected from multiple sources, and in such cases the normal regression methods may lose some statistical power to detect quantitative trait loci. In this work, we propose a robust multiple-locus regression approach for analyzing multiple quantitative traits without a normality assumption. In our method, the objective function is least absolute deviation (LAD), which corresponds to the assumption of multivariate Laplace distributed residual errors. This distribution has heavier tails than the normal distribution. In addition, we adopt a group LASSO penalty to produce shrinkage estimation of the marker effects and to describe the genetic correlation among phenotypes. Our LAD-LASSO approach is less sensitive to outliers and is more appropriate for the analysis of data with skewed phenotype distributions. Another application of our robust approach is to the missing-phenotype problem in multiple-trait analysis, where the missing phenotype items can simply be filled with some extreme values and be treated as outliers. The efficiency of the LAD-LASSO approach is illustrated on both simulated and real data sets.
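
To make the contrast between least squares and least absolute deviation concrete, here is a minimal sketch for the simplest possible case, estimating a single location parameter. It is not the multiple-locus LAD-LASSO method of the abstract (there is no penalty and no marker data); the helper names `ls_loss` and `lad_loss` are hypothetical.

```python
from statistics import mean, median


def ls_loss(values, center):
    """Least-squares objective: sum of squared deviations from `center`."""
    return sum((v - center) ** 2 for v in values)


def lad_loss(values, center):
    """LAD objective: sum of absolute deviations from `center`."""
    return sum(abs(v - center) for v in values)


# Illustrative data with one outlying observation.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# The mean minimizes the least-squares objective and is pulled toward the
# outlier; the median minimizes the LAD objective and stays with the bulk.
ls_estimate = mean(data)     # 22.0
lad_estimate = median(data)  # 3.0
```

The same reasoning carries over to regression coefficients: absolute-value loss grows linearly rather than quadratically in the residual, so a single extreme phenotype has less influence on the fit.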

8.
Species distributional or trait data based on range map (extent‐of‐occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species’ distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modeling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls of species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix.

9.
Evaluating trait correlations across species within a lineage via phylogenetic regression is fundamental to comparative evolutionary biology, but when traits of interest are derived from two sets of lineages that coevolve with one another, methods for evaluating such patterns in a dual‐phylogenetic context remain underdeveloped. Here, we extend multivariate permutation‐based phylogenetic regression to evaluate trait correlations in two sets of interacting species while accounting for their respective phylogenies. This extension is appropriate for both univariate and multivariate response data, and may use one or more independent variables, including environmental covariates. Imperfect correspondence between species in the interacting lineages can also be accommodated, such as when species in one lineage associate with multiple species in the other, or when there are unmatched taxa in one or both lineages. For both univariate and multivariate data, the method displays appropriate type I error, and statistical power increases with the strength of the trait covariation and the number of species in the phylogeny. These properties are retained even when there is not a 1:1 correspondence between lineages. Finally, we demonstrate the approach by evaluating the evolutionary correlation between traits in fig species and traits in their agaonid wasp pollinators. R computer code is provided.

10.
Path analysis is one of several methods available for quantitative genetic analysis, providing for both tests of hypotheses and estimates of relevant parameters. Central to the theory is the assumption that the observations follow a multivariate normal distribution within families. The purpose of the present investigation is to assess the effects of a certain type of departure from multivariate normality using quantitative family data on lipid and lipoprotein levels. The results show that even large departures produce reasonably unbiased parameter estimates. Whereas moderate departures lead to few inferential errors in hypothesis testing, gross departures from multivariate normality may have considerable effects on likelihood ratio tests.

11.
Quantitative traits analyzed in Genome-Wide Association Studies (GWAS) are often nonnormally distributed. For such traits, association tests based on standard linear regression are subject to reduced power and inflated type I error in finite samples. Applying the rank-based inverse normal transformation (INT) to nonnormally distributed traits has become common practice in GWAS. However, the different variations on INT-based association testing have not been formally defined, and guidance is lacking on when to use which approach. In this paper, we formally define and systematically compare the direct (D-INT) and indirect (I-INT) INT-based association tests. We discuss their assumptions, underlying generative models, and connections. We demonstrate that the relative powers of D-INT and I-INT depend on the underlying data generating process. Since neither approach is uniformly most powerful, we combine them into an adaptive omnibus test (O-INT). O-INT is robust to model misspecification, protects the type I error, and is well powered against a wide range of nonnormally distributed traits. Extensive simulations were conducted to examine the finite sample operating characteristics of these tests. Our results demonstrate that, for nonnormally distributed traits, INT-based tests outperform the standard untransformed association test, both in terms of power and type I error rate control. We apply the proposed methods to GWAS of spirometry traits in the UK Biobank. O-INT has been implemented in the R package RNOmni, which is available on CRAN.
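
The rank-based inverse normal transformation itself is short enough to sketch. The version below replaces each value by the standard normal quantile of its offset rank, (rank - c) / (n - 2c + 1), with the Blom offset c = 3/8 and average ranks for ties. Both choices are assumptions made for illustration, and the function `rank_int` is hypothetical; this is not the API of the RNOmni package, nor the D-INT, I-INT, or O-INT association tests.

```python
from statistics import NormalDist


def rank_int(values, c=0.375):
    """Rank-based inverse normal transformation of a list of numbers.

    c is the rank offset (0.375 is the Blom offset, assumed here).
    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of values tied with values[order[start]].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

Because only the ranks enter the formula, a strongly skewed input such as `[1, 10, 1000]` is mapped to the same scores as `[1, 2, 3]`, which is what makes the transformed trait look normal regardless of the original scale.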

12.
We describe a variance-components method for multipoint linkage analysis that allows joint consideration of a discrete trait and a correlated continuous biological marker (e.g., a disease precursor or associated risk factor) in pedigrees of arbitrary size and complexity. The continuous trait is assumed to be multivariate normally distributed within pedigrees, and the discrete trait is modeled by a threshold process acting on an underlying multivariate normal liability distribution. The liability is allowed to be correlated with the quantitative trait, and the liability and quantitative phenotype may each include covariate effects. Bivariate discrete-continuous observations will be common, but the method easily accommodates qualitative and quantitative phenotypes that are themselves multivariate. Formal likelihood-based tests are described for coincident linkage (i.e., linkage of the traits to distinct quantitative-trait loci [QTLs] that happen to be linked) and pleiotropy (i.e., the same QTL influences both discrete-trait status and the correlated continuous phenotype). The properties of the method are demonstrated by use of simulated data from Genetic Analysis Workshop 10. In a companion paper, the method is applied to data from the Collaborative Study on the Genetics of Alcoholism, in a bivariate linkage analysis of alcoholism diagnoses and P300 amplitude of event-related brain potentials.

13.
We present a new method of quantitative-trait linkage analysis that combines the simplicity and robustness of regression-based methods and the generality and greater power of variance-components models. The new method is based on a regression of estimated identity-by-descent (IBD) sharing between relative pairs on the squared sums and squared differences of trait values of the relative pairs. The method is applicable to pedigrees of arbitrary structure and to pedigrees selected on the basis of trait value, provided that population parameters of the trait distribution can be correctly specified. Ambiguous IBD sharing (due to incomplete marker information) can be accommodated in the method by appropriate specification of the variance-covariance matrix of IBD sharing between relative pairs. We have implemented this regression-based method and have performed simulation studies to assess, under a range of conditions, estimation accuracy, type I error rate, and power. For normally distributed traits and in large samples, the method is found to give the correct type I error rate and an unbiased estimate of the proportion of trait variance accounted for by the additive effects of the locus, although, in cases where asymptotic theory is doubtful, significance levels should be checked by simulations. In large sibships, the new method is slightly more powerful than variance-components models. The proposed method provides a practical and powerful tool for the linkage analysis of quantitative traits.

14.
Modeling the joint distribution of a binary trait (disease) within families is a tedious challenge, owing to the lack of a general statistical model with desirable properties such as the multivariate Gaussian model for a quantitative trait. Models have been proposed that either assume the existence of an underlying liability variable, the reality of which cannot be checked, or provide estimates of aggregation parameters that are dependent on the ordering of family members and on family size. We describe how a class of copula models for the analysis of exchangeable categorical data can be incorporated into a familial framework. In this class of models, the joint distribution of binary outcomes is characterized by a function of the given marginals. This function, referred to as a "copula," depends on an aggregation parameter that is weakly dependent on the marginal distributions. We propose to decompose a nuclear family into two sets of equicorrelated data (parents and offspring), each of which is characterized by an aggregation parameter (alphaFM and alphaSS, respectively). The marginal probabilities are modeled through a logistic representation. The advantage of this model is that it provides estimates of the aggregation parameters that are independent of family size and does not require any arbitrary ordering of sibs. It can be incorporated easily into segregation or combined segregation-linkage analysis and does not require extensive computer time. As an illustration, we applied this model to a combined segregation-linkage analysis of levels of plasma angiotensin I-converting enzyme (ACE) dichotomized into two classes according to the median. The conclusions of this analysis were very similar to those we had reported in an earlier familial analysis of quantitative ACE levels.

15.
BACKGROUND: Although many experiments have measurements on multiple traits, most studies have mapped quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method for estimating genome-wide significance level of putative QTL effects suitable for the MTMIM model. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software. RESULTS: Throughout the paper, we provide compelling empirical evidence that: (1) the score-based threshold maintains proper type I error rate and tends to keep false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than single trait multiple interval mapping method; (3) an analysis of Drosophila dataset illustrates how the MTMIM method can better extract information from datasets with measurements in multiple traits. CONCLUSIONS: The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL and of QTL by environment interaction, and to estimate the total genotypic variance-covariance matrix between traits and decompose it in terms of QTL-specific variance-covariance matrices, thereby providing more details on the genetic architecture of complex traits.

16.
Feenstra B, Skovgaard IM, Broman KW. Genetics 2006, 173(4): 2269-2282
The Haley-Knott (HK) regression method continues to be a popular approximation to standard interval mapping (IM) of quantitative trait loci (QTL) in experimental crosses. The HK method is favored for its dramatic reduction in computation time compared to the IM method, something that is particularly important in simultaneous searches for multiple interacting QTL. While the HK method often approximates the IM method well in estimating QTL effects and in power to detect QTL, it may perform poorly if, for example, there is strong epistasis between QTL or if QTL are linked. Also, it is well known that the estimation of the residual variance by the HK method is biased. Here, we present an extension of the HK method that uses estimating equations based on both means and variances. For normally distributed phenotypes this estimating equation (EE) method is more efficient than the HK method. Furthermore, computer simulations show that the EE method performs well for very different genetic models and data set structures, including nonnormal phenotype distributions, nonrandom missing data patterns, varying degrees of epistasis, and varying degrees of linkage between QTL. The EE method retains key qualities of the HK method such as computational speed and robustness against nonnormal phenotype distributions, while approximating the IM method better in terms of accuracy and precision of parameter estimates and power to detect QTL.

17.
Studies of evolutionary correlations commonly use phylogenetic regression (i.e., independent contrasts and phylogenetic generalized least squares) to assess trait covariation in a phylogenetic context. However, while this approach is appropriate for evaluating trends in one or a few traits, it is incapable of assessing patterns in highly multivariate data, as the large number of variables relative to sample size prohibits parametric test statistics from being computed. This poses serious limitations for comparative biologists, who must either simplify how they quantify phenotypic traits, or alter the biological hypotheses they wish to examine. In this article, I propose a new statistical procedure for performing ANOVA and regression models in a phylogenetic context that can accommodate high‐dimensional datasets. The approach is derived from the statistical equivalency between parametric methods using covariance matrices and methods based on distance matrices. Using simulations under Brownian motion, I show that the method displays appropriate Type I error rates and statistical power, whereas standard parametric procedures have decreasing power as data dimensionality increases. As such, the new procedure provides a useful means of assessing trait covariation across a set of taxa related by a phylogeny, enabling macroevolutionary biologists to test hypotheses of adaptation, and phenotypic change in high‐dimensional datasets.

18.
The variance-components model is the method of choice for mapping quantitative trait loci in general human pedigrees. This model assumes normally distributed trait values and includes a major gene effect, random polygenic and environmental effects, and covariate effects. Violation of the normality assumption has detrimental effects on the type I error and power. One possible way of achieving normality is to transform trait values. The true transformation is unknown in practice, and different transformations may yield conflicting results. In addition, the commonly used transformations are ineffective in dealing with outlying trait values. We propose a novel extension of the variance-components model that allows the true transformation function to be completely unspecified. We present efficient likelihood-based procedures to estimate variance components and to test for genetic linkage. Simulation studies demonstrated that the new method is as powerful as the existing variance-components methods when the normality assumption holds; when the normality assumption fails, the new method still provides accurate control of type I error and is substantially more powerful than the existing methods. We performed a genomewide scan of monoamine oxidase B for the Collaborative Study on the Genetics of Alcoholism. In that study, the results that are based on the existing variance-components method changed dramatically when three outlying trait values were excluded from the analysis, whereas our method yielded essentially the same answers with or without those three outliers. The computer program that implements the new method is freely available.

19.
The Haseman-Elston (HE) regression method offers a mathematically and computationally simpler alternative to variance-components (VC) models for the linkage analysis of quantitative traits. However, current versions of HE regression and VC models are not optimised for binary traits. Here, we present a modified HE regression and a liability-threshold VC model for binary traits. The new HE method is based on the regression of a linear combination of the trait squares and the trait cross-product on the proportion of alleles identical by descent (IBD) at the putative locus, for sibling pairs. We have implemented the new HE regression-based method and have performed analytic and simulation studies to assess its type I error rate and power under a range of conditions. These studies showed that the new HE method is well-behaved under the null hypothesis in large samples, is more powerful than both the original and the revisited HE methods, and is approximately equivalent in power to the liability-threshold VC model.
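
For orientation, the original HE idea can be sketched in a few lines: regress the squared trait difference of each sibling pair on the pair's IBD proportion, and read a negative slope as evidence for linkage. The sketch below shows only that original form, not the modified method of the abstract, whose exact linear combination of trait squares and cross-product is not given here. The function name `he_slope` and the input layout are hypothetical.

```python
def he_slope(pairs):
    """Ordinary least-squares slope of squared sib-pair trait difference
    on IBD proportion (the original Haseman-Elston regression).

    pairs -- iterable of (trait_sib1, trait_sib2, ibd_proportion) tuples,
             with ibd_proportion in [0, 1] (assumed input layout)
    """
    pairs = list(pairs)
    x = [ibd for _, _, ibd in pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in pairs]
    n = len(pairs)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Pairs sharing more alleles IBD at a trait locus should be more
    # alike, so linkage shows up as a negative slope.
    return sxy / sxx
```

For example, `he_slope([(0.0, 2.0, 0.0), (0.0, 1.0, 1.0), (3.0, 1.0, 0.0)])` gives a slope of about -3, because the pair sharing both alleles IBD differs least. A real analysis would also need a standard error and a one-sided test of the slope.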

20.
Yue Wei, Yi Liu, Tao Sun, Wei Chen, Ying Ding. Biometrics 2020, 76(2): 619-629
Several gene-based association tests for time-to-event traits have been proposed recently to detect whether a gene region (containing multiple variants), as a set, is associated with the survival outcome. However, for bivariate survival outcomes, to the best of our knowledge, there is no statistical method that can be directly applied for gene-based association analysis. Motivated by a genetic study to discover the gene regions associated with the progression of a bilateral eye disease, age-related macular degeneration (AMD), we implement a novel functional regression (FR) method under the copula framework. Specifically, the effects of variants within a gene region are modeled through a functional linear model, which then contributes to the marginal survival functions within the copula. Generalized score test statistics are derived to test for the association between bivariate survival traits and the genetic region. Extensive simulation studies are conducted to evaluate the type I error control and power performance of the proposed approach, with comparisons to several existing methods for a single survival trait, as well as the marginal Cox FR model using the robust sandwich estimator for bivariate survival traits. Finally, we apply our method to a large AMD study, the Age-related Eye Disease Study, and identify the gene regions that are associated with AMD progression.
