Similar literature
 20 similar articles found (search time: 468 ms)
1.
MIXED MODEL APPROACHES FOR ESTIMATING GENETIC VARIANCES AND COVARIANCES   Total citations: 62 (self-citations: 4, citations by others: 58)
The limitations of analysis of variance (ANOVA) methods in estimating genetic variances are discussed. Among the three methods for mixed linear models (maximum likelihood, ML; restricted maximum likelihood, REML; and minimum norm quadratic unbiased estimation, MINQUE), the MINQUE method is presented with formulae for estimating variance components and covariance components and for predicting genetic effects. Several genetic models that cannot be appropriately analyzed by ANOVA methods are introduced in the form of mixed linear models. Genetic models with independent random effects can be analyzed by the MINQUE(1) method, which is a MINQUE method with all prior values set to 1. The MINQUE(1) method can give unbiased estimates of variance components and covariance components, and linear unbiased prediction (LUP) of genetic effects. There are more complicated genetic models for plant seeds that involve correlated random effects. The MINQUE(0/1) method, which is a MINQUE method with all prior covariances set to 0 and all prior variances set to 1, is suitable for estimating variance and covariance components in these models. Mixed model approaches have an advantage over ANOVA methods in their capacity to analyze unbalanced data and complicated models. Some problems concerning estimation and hypothesis testing with the MINQUE method are discussed.

2.
A Forcina. Biometrics, 1992, 48(3): 743-750
For linear models, assuming a within-experimental-units covariance structure that incorporates errors of measurement, serial correlation, and variation between units, results on explicit estimation of regression parameters are used to simplify maximum likelihood estimation of covariance parameters. The use of an analysis of variance table as a simpler alternative to likelihood inference is illustrated with two examples.

3.
Analyses of craniodental measurement data from 15 wild-collected population samples of the Neotropical muroid rodent genus Zygodontomys reveal consistent patterns of relative variability and correlation that suggest a common latent structure. Eigenanalysis of each sample covariance matrix of logarithms yields a first principal component that accounts for a large fraction of the total variance. Variances of subsequent sample principal components are much smaller, and the results of bootstrap resampling together with asymptotic statistics suggest that characteristic roots of the covariance matrix after the first are seldom distinct. The coefficients of normalized first principal components are strikingly similar from sample to sample: inner products of these vectors reveal an average between-sample correlation of 0.989, and the mean angle of divergence is only about eight degrees. Since first principal component coefficients identify the same contrasts among variables as comparisons of relative variability and correlation, we conclude that a single factor accounts for most of the common latent determination of these sample dispersions. Analyses of variance based on toothwear (a coarse index of age) and sex in the wild-collected samples, and on known age and sex in a captive-bred population, reveal that specimen scores on sample first principal components are age- and sex-dependent; residual sample dispersion, however, is essentially unaffected by age, sex, or age × sex interaction. The sample first principal component therefore reflects the covariance among measured dimensions induced by general growth, and its coefficients are interpretable as exponents of postnatal growth allometry. Path-analytic models that incorporate prior knowledge of the equivalent allometric effects of general growth within these samples can be used to decompose the between-sample variance by factors corresponding to other ontogenetic mechanisms of form change. 
The genetic or environmental determinants of differences in sample mean phenotypes induced by such mechanisms, however, can be demonstrated only by experiment.
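
Note: the two figures quoted in this abstract (an average between-sample correlation of 0.989 and a mean divergence of about eight degrees) can be related through the arccosine of the inner product of two unit vectors. The following is a minimal sketch of that relation only; the function name `angle_between_degrees` is an assumption made for illustration and does not come from the paper. The mean of several angles need not equal the angle implied by the mean correlation, so the two figures are expected to agree only approximately.

```python
import math

def angle_between_degrees(u, v):
    """Angle in degrees between two vectors, via the normalized inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Angle implied by an inner product of 0.989 between two unit vectors
# (the average correlation quoted in the abstract).
angle_for_reported_mean = math.degrees(math.acos(0.989))
```

The resulting angle lies between 8 and 9 degrees, which is in line with the "about eight degrees" reported in the abstract.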

4.
Several methods have been proposed to estimate the variance in disease liability explained by large sets of genetic markers. However, current methods do not scale up well to large sample sizes. Linear mixed models require solving high-dimensional matrix equations, and methods that use polygenic scores are very computationally intensive. Here we propose a fast analytic method that uses polygenic scores, based on the formula for the non-centrality parameter of the association test of the score. We estimate model parameters from the results of multiple polygenic score tests based on markers with p values in different intervals. We estimate parameters by maximum likelihood and use profile likelihood to compute confidence intervals. We compare various options for constructing polygenic scores, based on nested or disjoint intervals of p values, weighted or unweighted effect sizes, and different numbers of intervals, in estimating the variance explained by a set of markers, the proportion of markers with effects, and the genetic covariance between a pair of traits. Our method provides nearly unbiased estimates and confidence intervals with good coverage, although estimation of the variance is less reliable when jointly estimated with the covariance. We find that disjoint p value intervals perform better than nested intervals, but the weighting did not affect our results. A particular advantage of our method is that it can be applied to summary statistics from single markers, and so can be quickly applied to large consortium datasets. Our method, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), is implemented in R software.

5.
Growing interest in adaptive evolution in natural populations has spurred efforts to infer genetic components of variance and covariance of quantitative characters. Here, I review difficulties inherent in the usual least-squares methods of estimation. A useful alternative approach is that of maximum likelihood (ML). Its particular advantage over least squares is that estimation and testing procedures are well defined, regardless of the design of the data. A modified version of ML, REML, eliminates the bias of ML estimates of variance components. Expressions for the expected bias and variance of estimates obtained from balanced, fully hierarchical designs are presented for ML and REML. Analyses of data simulated from balanced, hierarchical designs reveal differences in the properties of ML, REML, and F-ratio tests of significance. A second simulation study compares properties of REML estimates obtained from a balanced, fully hierarchical design (within-generation analysis) with those from a sampling design including phenotypic data on parents and multiple progeny. It also illustrates the effects of imposing nonnegativity constraints on the estimates. Finally, it reveals that predictions of the behavior of significance tests based on asymptotic theory are not accurate when sample size is small and that constraining the estimates seriously affects properties of the tests. Because of their great flexibility, likelihood methods can serve as a useful tool for estimation of quantitative-genetic parameters in natural populations. Difficulties involved in hypothesis testing remain to be solved.

6.

Background

Estimation of genetic covariance matrices for multivariate problems comprising more than a few traits is inherently problematic, since sampling variation increases dramatically with the number of traits. This paper investigates the efficacy of regularized estimation of covariance components in a maximum likelihood framework, imposing a penalty on the likelihood designed to reduce sampling variation. In particular, penalties that "borrow strength" from the phenotypic covariance matrix are considered.

Methods

An extensive simulation study was carried out to investigate the reduction in average "loss", i.e. the deviation in estimated matrices from the population values, and the accompanying bias for a range of parameter values and sample sizes. A number of penalties are examined, penalizing either the canonical eigenvalues or the genetic covariance or correlation matrices. In addition, several strategies to determine the amount of penalization to be applied, i.e. to estimate the appropriate tuning factor, are explored.

Results

It is shown that substantial reductions in loss for estimates of genetic covariance can be achieved for small to moderate sample sizes. While no penalty performed best overall, penalizing the variance among the estimated canonical eigenvalues on the logarithmic scale or shrinking the genetic towards the phenotypic correlation matrix appeared most advantageous. Estimating the tuning factor using cross-validation resulted in a loss reduction 10 to 15% less than that obtained if population values were known. Applying a mild penalty, chosen so that the deviation in likelihood from the maximum was non-significant, performed as well if not better than cross-validation and can be recommended as a pragmatic strategy.

Conclusions

Penalized maximum likelihood estimation provides the means to "make the most" of limited and precious data and facilitates more stable estimation for multi-dimensional analyses. It should become part of our everyday toolkit for multivariate estimation in quantitative genetics.

7.
Summary Do morphogenetic processes cause common patterns of phenotypic covariation, and do those patterns evolve over microevolutionary timescales? Evolution of molar shape variance–covariance (P) matrixes was studied in five populations of the common shrew, Sorex araneus. P matrix evolution was assessed using matrix correlation, matrix disparity, and common principal component analysis (CPCA). Significant changes in covariance structure were found among the populations, but the differences were small. A computer model was used to estimate the theoretical covariance introduced into the phenotype by developmental interactions. Molar developmental processes explained some of the covariance in the shrew samples, especially as measured by matrix correlation, but the proportion was relatively small. Developmental principal components (PCs) were only infrequently associable with common principal components. The results suggest that molar shape P matrixes can evolve quickly in a manner only loosely constrained by development, and that their shared covariance is probably dominated by factors more proximate than development. Rarefaction showed that sample size severely affected P comparisons when n < 15 for matrix correlation and disparity, and when n < 30 for CPCA. Among CPCA evaluation criteria, Akaike Information Criterion performed better than jump‐up at n < 30, but worse at n > 30.

8.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.

9.
Variance component (VC) approaches based on restricted maximum likelihood (REML) have been used as an attractive method for positioning of quantitative trait loci (QTL). Linkage disequilibrium (LD) information can be easily implemented in the covariance structure among QTL effects (e.g. genotype relationship matrix) and mapping resolution appears to be high. Because of the use of LD information, the covariance structure becomes much richer and denser compared to the use of linkage information alone. This makes an average information (AI) REML algorithm based on mixed model equations and sparse matrix techniques less useful. In addition, (near-) singularity problems often occur with high marker densities, which is common in fine-mapping, causing numerical problems in AIREML based on mixed model equations. The present study investigates the direct use of the variance covariance matrix of all observations in AIREML for LD mapping with a general complex pedigree. The method presented is more efficient than the usual approach based on mixed model equations and robust to numerical problems caused by near-singularity due to closely linked markers. It is also feasible to fit multiple QTL simultaneously in the proposed method whereas this would drastically increase computing time when using mixed model equation-based methods.

10.
Principal component analysis is a widely used "dimension reduction" technique, albeit generally at a phenotypic level. It is shown that we can estimate genetic principal components directly through a simple reparameterisation of the usual linear, mixed model. This is applicable to any analysis fitting multiple, correlated genetic effects, whether effects for individual traits or sets of random regression coefficients to model trajectories. Depending on the magnitude of genetic correlation, a subset of the principal components generally suffices to capture the bulk of genetic variation. Corresponding estimates of genetic covariance matrices are more parsimonious, have reduced rank and are smoothed, with the number of parameters required to model the dispersion structure reduced from k(k + 1)/2 to m(2k - m + 1)/2 for k effects and m principal components. Estimation of these parameters, the largest eigenvalues and pertaining eigenvectors of the genetic covariance matrix, via restricted maximum likelihood using derivatives of the likelihood, is described. It is shown that reduced rank estimation can reduce computational requirements of multivariate analyses substantially. An application to the analysis of eight traits recorded via live ultrasound scanning of beef cattle is given.
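
Note: the parameter counts quoted in this abstract, k(k + 1)/2 for a full covariance matrix and m(2k - m + 1)/2 for a rank-m fit, can be tabulated directly. The sketch below is illustrative; the function names are assumptions, and the choice of m = 3 is a hypothetical value, since the abstract gives only k = 8 traits for the beef cattle application.

```python
def full_rank_params(k):
    """Number of distinct elements in an unstructured k x k covariance matrix."""
    return k * (k + 1) // 2

def reduced_rank_params(k, m):
    """Number of parameters when only m principal components are fitted for k effects."""
    if not 1 <= m <= k:
        raise ValueError("m must satisfy 1 <= m <= k")
    # m * (2k - m + 1) is always even, so integer division is exact.
    return m * (2 * k - m + 1) // 2

# k = 8 traits as in the abstract; m = 3 is a hypothetical number of components.
k, m = 8, 3
n_full = full_rank_params(k)           # 36 parameters
n_reduced = reduced_rank_params(k, m)  # 21 parameters
```

When m equals k, the two formulas coincide, so the reduced-rank parameterisation contains the full-rank model as a special case.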

11.
Benthic invertebrate data from thirty-nine lakes in south-central Ontario were analyzed to determine the effect of choosing particular data standardizations, resemblance measures, and ordination methods on the resultant multivariate summaries. Logarithmic-transformed, 0–1 scaled, and ranked data were used as standardized variables with resemblance measures of Bray-Curtis, Euclidean distance, cosine distance, correlation, covariance and chi-squared distance. Combinations of these measures and standardizations were used in principal components analysis, principal coordinates analysis, non-metric multidimensional scaling, correspondence analysis, and detrended correspondence analysis. Correspondence analysis and principal components analysis using a correlation coefficient provided the most consistent results irrespective of the choice in data standardization. Other approaches using detrended correspondence analysis, principal components analysis, principal coordinates analysis, and non-metric multidimensional scaling provided less consistent results. These latter three methods produced similar results when the abundance data were replaced with ranks or standardized to a 0–1 range. The log-transformed data produced the least consistent results, whereas ranked data were most consistent. Resemblance measures such as the Bray-Curtis and correlation coefficient provided more consistent solutions than measures such as Euclidean distance or the covariance matrix when different data standardizations were used. The cosine distance based on standardized data provided results comparable to the CA and DCA solutions. Overall, CA proved most robust as it demonstrated high consistency irrespective of the data standardizations. The strong influence of data standardization on the other ordination methods emphasizes the importance of this frequently neglected stage of data analysis.

12.
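A covariance-matrix principal component analysis method for the genetic structure of human populations   Total citations: 3 (self-citations: 0, citations by others: 3)
Objective: To examine the applicability and soundness of principal component analysis (PCA) based on the covariance matrix of a centered (or mean-normalized) gene frequency matrix in studies of human population genetic structure. Methods: Starting from the structural characteristics of the gene frequency matrix, the differences between PCA of the centered or mean-normalized covariance matrix and PCA of the standardized correlation matrix were analyzed with respect to eigenvalues, eigenvectors, and dimension-reduction performance, and a worked example was used to compare how reasonably each method explains the features of population genetic structure. Results: The principal components of the centered (or mean-normalized) covariance matrix reflect both the "variance information weight", which measures the degree of variation in each gene, and the "correlation information weight", which measures the degree of mutual influence among genes. The principal components of the standardized correlation matrix reflect only the "correlation information weight" and exclude the "variance information weight". A comparison of the two PCA results (centered covariance matrix and standardized correlation matrix) for the HLA-A locus in 26 Chinese Han populations confirmed that PCA of the centered covariance matrix is superior to PCA of the standardized correlation matrix in terms of eigenvalues and eigenvectors, the number of principal components retained, and the soundness of the population-genetic interpretation of the principal components. Conclusion: When PCA is applied to population genetic structure, a centering (or mean-normalizing) transformation should be used to remove the effect of differing magnitudes in the gene frequency matrix, and the principal components should then be extracted from its covariance matrix.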

13.
Meyer K, Kirkpatrick M. Genetics, 2008, 180(2): 1153-1166
Eigenvalues and eigenvectors of covariance matrices are important statistics for multivariate problems in many applications, including quantitative genetics. Estimates of these quantities are subject to different types of bias. This article reviews and extends the existing theory on these biases, considering a balanced one-way classification and restricted maximum-likelihood estimation. Biases are due to the spread of sample roots and arise from ignoring selected principal components when imposing constraints on the parameter space, to ensure positive semidefinite estimates or to estimate covariance matrices of chosen, reduced rank. In addition, it is shown that reduced-rank estimators that consider only the leading eigenvalues and -vectors of the "between-group" covariance matrix may be biased due to selecting the wrong subset of principal components. In a genetic context, with groups representing families, this bias is inversely proportional to the degree of genetic relationship among family members, but is independent of sample size. Theoretical results are supplemented by a simulation study, demonstrating close agreement between predicted and observed bias for large samples. It is emphasized that the rank of the genetic covariance matrix should be chosen sufficiently large to accommodate all important genetic principal components, even though, paradoxically, this may require including a number of components with negligible eigenvalues. A strategy for rank selection in practical analyses is outlined.

14.
D Gianola, R L Fernando, S Im, J L Foulley. Génome, 1989, 31(2): 768-777
Conceptual aspects of estimation of genetic components of variance and covariance under selection are discussed, with special attention to likelihood methods. Certain selection processes are described and alternative likelihoods that can be used for analysis are specified. There is a mathematical relationship between the likelihoods that permits comparing the relative amount of information contained in them. Theoretical arguments and evidence indicate that point inferences made from likelihood functions are not affected by some forms of selection.

15.
Robust estimation of multivariate covariance components   Total citations: 1 (self-citations: 0, citations by others: 1)
Dueck A, Lohr S. Biometrics, 2005, 61(1): 162-169
In many settings, such as interlaboratory testing, small area estimation in sample surveys, and heritability studies, investigators are interested in estimating covariance components for multivariate measurements. However, the presence of outliers can seriously distort estimates obtained using standard procedures such as maximum likelihood. We propose a procedure based on M-estimation for robustly estimating multivariate covariance components in the presence of outliers; the procedure applies to balanced and unbalanced data. We present an algorithm for computing the robust estimates and examine the performance of the estimator through a simulation study. The estimator is used to find covariance components and identify outliers in a study of variability of egg length and breadth measurements of American coots.

16.
We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance–covariance matrices (G). Large‐sample theory shows that maximum‐likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML‐MVN method. This has been implemented in the mixed‐model program WOMBAT. Estimates of sampling variances from REML‐MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20‐dimensional data set for Drosophila wings. REML‐MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best‐estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML‐MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
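
Note: the sampling idea described in this abstract can be sketched in a few lines. The example below is a toy illustration under stated assumptions and is not the WOMBAT implementation: the 2 x 2 matrix G, its estimated elements `g_hat`, and the sampling covariance `sampling_cov` are all hypothetical numbers, and the helper names are invented for this sketch. It draws the distinct elements of G from a multivariate normal distribution centred on the estimate and uses the draws to approximate the standard error of one function of G, its leading eigenvalue.

```python
import math
import random
import statistics

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_mvn(mean, cov, n_draws, rng):
    """Draw n_draws vectors from N(mean, cov) using the Cholesky factor of cov."""
    low = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draws.append([mu + sum(low[i][p] * z[p] for p in range(i + 1))
                      for i, mu in enumerate(mean)])
    return draws

def leading_eigenvalue_2x2(g11, g12, g22):
    """Largest eigenvalue of the symmetric matrix [[g11, g12], [g12, g22]]."""
    half_trace = (g11 + g22) / 2.0
    return half_trace + math.sqrt(((g11 - g22) / 2.0) ** 2 + g12 ** 2)

# Hypothetical REML output: distinct elements (g11, g12, g22) of a 2 x 2 G and
# their sampling covariance, standing in for the inverse information matrix.
g_hat = [1.0, 0.3, 0.8]
sampling_cov = [[0.04, 0.01, 0.00],
                [0.01, 0.02, 0.01],
                [0.00, 0.01, 0.03]]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
draws = sample_mvn(g_hat, sampling_cov, 2000, rng)
eigen_draws = [leading_eigenvalue_2x2(*d) for d in draws]
se_leading_eigenvalue = statistics.stdev(eigen_draws)
```

Any other function of G could be substituted for the leading eigenvalue; the standard deviation of its values across draws then serves as the approximate standard error.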

17.
A method to estimate genetic variance components in populations partially pedigreed by DNA fingerprinting is presented. The focus is on aquaculture, where breeding procedures may produce thousands of individuals. In aquaculture populations the individuals available for measurement will often be selected, i.e. will come from the upper tail of a size‐at‐age distribution, or the lower tail of an age‐at‐maturity distribution etc. Selection typically occurs by size grading during grow‐out and/or choice of superior fish as broodstock. The method presented in this paper enables us to estimate genetic variance components when only a small proportion of individuals, those with extreme phenotypes, have been identified by DNA fingerprinting. We replace the usual normal density by appropriate robust least favourable densities to ensure the robustness of our estimates. Standard analysis of variance or maximum likelihood estimation cannot be used when only the extreme progeny have been pedigreed because of the biased nature of the estimates. In our model‐based procedure a full robust likelihood function is defined, in which the missing information about non‐extreme progeny has been taken into account. This robust likelihood function is transformed into a computable function which is maximized to get the estimates. The estimates of sire and dam additive variance components are significantly and uniformly more accurate than those obtained by any of the standard methods when tested on simulated population data and have desirable robustness properties.

18.
Lou XY, Yang MC. Genetica, 2006, 128(1-3): 471-484
A genetic model is developed with additive and dominance effects of a single gene and polygenes as well as general and specific reciprocal effects for the progeny from a diallel mating design. The methods of ANOVA, minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood estimation (REML), and maximum likelihood estimation (ML) are suggested for estimating variance components, and the methods of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects, while best linear unbiased prediction, linear unbiased prediction (LUP), and adjusted unbiased prediction are suggested for analyzing random effects. Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of statistical methods involving two diallel designs with commonly used sample sizes, 6 and 8 parents, without and with missing crosses, respectively. Simulation results show that GLS and OLS are almost equally efficient for estimation of fixed effects, while MINQUE(1) and REML are better estimators of the variance components and LUP is the most practical method for prediction of random effects. Data from a Drosophila melanogaster experiment (Gilbert 1985a, Theor appl Genet 69:625–629) were used as a working example to demonstrate the statistical analysis. The new methodology is also applicable to screening candidate gene(s) and to other mating designs with multiple parents, such as nested (NC Design I) and factorial (NC Design II) designs. Moreover, this methodology can serve as a guide to develop new methods for detecting indiscernible major genes and mapping quantitative trait loci based on mixture distribution theory. The computer program for the methods suggested in this article is freely available from the authors.

19.
The rhesus macaque displays an extensive polymorphism at the transferrin locus. A principal components analysis describes the variance and covariance of alleles at the transferrin locus in eight widely dispersed sample populations. Using an eigenvectorial representation of the covariance matrix and systematically approximated geographical locations the distribution of populations and transferrin alleles is compared. Alleles with high variance prove to be the determining factor in the placement of populations in a "genetic map" and provide a means for interpreting the low congruence of genetics and geography found.

20.
The problem of testing the separability of a covariance matrix against an unstructured variance‐covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first‐order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ² distribution. The tests are implemented on a real dataset from medical studies.
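
Note: a separable covariance structure is conventionally written as the Kronecker product of two smaller matrices, for example an AR(1) correlation matrix across time points and an unstructured covariance matrix across variables. The sketch below builds such a matrix to make the null hypothesis concrete. It is illustrative only: the function names, the ordering of the two factors, and the values of `rho` and `variable_cov` are assumptions, since the abstract does not specify them.

```python
def ar1_correlation(p, rho):
    """p x p first-order autoregressive correlation matrix with entries rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]

# Hypothetical setting: 3 time points with AR(1) correlation 0.6,
# and 2 variables with an unstructured covariance matrix.
time_corr = ar1_correlation(3, 0.6)
variable_cov = [[2.0, 0.5],
                [0.5, 1.0]]
separable_cov = kronecker(time_corr, variable_cov)  # 6 x 6 matrix
```

Under this structure the 6 x 6 matrix is described by 4 free parameters (one correlation and three covariance elements) instead of the 21 distinct elements of an unstructured matrix, which is why the test remains feasible for small samples.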
