Similar Articles
 20 similar articles found (search time: 265 ms)
1.
Robust estimation of multivariate covariance components   Citations: 1 total, 0 self-citations, 1 by others
Dueck A, Lohr S. Biometrics, 2005, 61(1): 162-169
In many settings, such as interlaboratory testing, small area estimation in sample surveys, and heritability studies, investigators are interested in estimating covariance components for multivariate measurements. However, the presence of outliers can seriously distort estimates obtained using standard procedures such as maximum likelihood. We propose a procedure based on M-estimation for robustly estimating multivariate covariance components in the presence of outliers; the procedure applies to balanced and unbalanced data. We present an algorithm for computing the robust estimates and examine the performance of the estimator through a simulation study. The estimator is used to find covariance components and identify outliers in a study of variability of egg length and breadth measurements of American coots.

2.
Mantel tests of matrix correspondence have been widely used in population genetics to examine microevolutionary processes, such as isolation-by-distance (IBD). We used partial and multiple Mantel tests to simultaneously test long-term historical effects and current divergence and equilibrium processes, such as IBD. We used these procedures to calculate genetic divergence among Eugenia dysenterica (Myrtaceae) populations in Central Brazil. Nei's genetic distances between pairs of local populations were strongly correlated with geographic distances, suggesting an IBD process, but field observations and the geographic distribution of the samples suggest that populations may have been subjected to more complex evolutionary processes of genetic divergence. Partial Mantel regression was used to partition the effects of geographic structure and long-term divergence associated with a possible historical barrier. The R² of the model with both effects was 73.3%, and after the partition 21.9% of the variation in the genetic distances could be attributed to long-term historical divergence alone, whereas only 1.5% of the variation in genetic distances could be attributed to IBD. As expected, there was a large overlap between these processes when explaining genetic divergence, so it was not possible to entirely partition divergence between historical and contemporary processes.
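Assuming these figures follow the usual variation-partitioning scheme (fraction [a] unique to historical divergence, [c] unique to IBD, [b] shared between the two), the size of the overlap can be worked out directly from the numbers quoted above:

```latex
\begin{aligned}
R^2_{\text{total}} &= [a] + [b] + [c] = 73.3\% \\
[a]\ (\text{historical divergence only}) &= 21.9\% \\
[c]\ (\text{IBD only}) &= 1.5\% \\
[b]\ (\text{shared}) &= 73.3\% - 21.9\% - 1.5\% = 49.9\%
\end{aligned}
```

A shared fraction of roughly half the total explained variation is consistent with the statement that the two processes could not be fully separated.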

3.
The Mantel test is widely used to test the linear or monotonic independence of the elements in two distance matrices. It is one of the few appropriate tests when the hypothesis under study can only be formulated in terms of distances; this is often the case with genetic data. In particular, the Mantel test has been widely used to test for spatial relationship between genetic data and spatial layout of the sampling locations. We describe the domain of application of the Mantel test and derived forms. Formula development demonstrates that the sum-of-squares (SS) partitioned in Mantel tests and regression on distance matrices differs from the SS partitioned in linear correlation, regression and canonical analysis. Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data. Examples of difference in power are given for the detection of spatial gradients. Furthermore, the Mantel test does not correctly estimate the proportion of the original data variation explained by spatial structures. The Mantel test should not be used as a general method for the investigation of linear relationships or spatial structures in univariate or multivariate data. Its use should be restricted to tests of hypotheses that can only be formulated in terms of distances.

4.
Individual‐based landscape genetic methods have become increasingly popular for quantifying fine‐scale landscape influences on gene flow. One complication for individual‐based methods is that gene flow and landscape variables are often correlated with geography. Partial statistics, particularly Mantel tests, are often employed to control for these inherent correlations by removing the effects of geography while simultaneously correlating measures of genetic differentiation and landscape variables of interest. Concerns about the reliability of Mantel tests prompted this study, in which we use simulated landscapes to evaluate the performance of partial Mantel tests and two ordination methods, distance‐based redundancy analysis (dbRDA) and redundancy analysis (RDA), for detecting isolation by distance (IBD) and isolation by landscape resistance (IBR). Specifically, we described the effects of suitable habitat amount, fragmentation and resistance strength on metrics of accuracy (frequency of correct results, type I/II errors and strength of IBR according to underlying landscape and resistance strength) for each test using realistic individual‐based gene flow simulations. Mantel tests were very effective for detecting IBD, but exhibited higher error rates when detecting IBR. Ordination methods were overall more accurate in detecting IBR, but had high type I errors compared to partial Mantel tests. Thus, no one test outperformed another completely. A combination of statistical tests, for example partial Mantel tests to detect IBD paired with appropriate ordination techniques for IBR detection, provides the best characterization of fine‐scale landscape genetic structure. Realistic simulations of empirical data sets will further increase power to distinguish among putative mechanisms of differentiation.

5.
Fitness is expected to decrease with inbreeding in proportion to the amount of deleterious genetic variation present in a population. The effect of inbreeding on survivorship is usually modeled as a negative exponential relationship, and this model has been widely used to estimate the amount of deleterious genetic variation in populations. Linear regression has traditionally been used to estimate the parameters of the model, including the number of lethal equivalents. This article describes an alternative method for estimating parameters and their confidence limits: the maximum likelihood approach. The accuracy of regression and maximum likelihood estimates of the number of lethal equivalents is compared through simulation. The maximum likelihood approach is found to be both median unbiased and capable of estimating confidence limits with nearly the stated degree of accuracy, while the linear regression approach is found to be median biased. The implications of this for previous estimates of inbreeding depression are discussed. Zoo Biol 17:481–497, 1998. © 1998 Wiley-Liss, Inc.
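The negative exponential model is usually written S = exp(-(A + BF)), so ln S is linear in the inbreeding coefficient F. A minimal sketch of the traditional regression approach described above follows, assuming group-level survival proportions and unweighted least squares; it is not the maximum likelihood method the article proposes, and the function name and data are hypothetical.

```python
import math


def lethal_equivalents_ls(f_values, survival):
    """Ordinary least-squares fit of ln(S) = -A - B*F.

    f_values: inbreeding coefficient F of each group.
    survival: proportion surviving in each group (must be > 0, since ln(0)
              is undefined; this is one practical weakness of the approach).
    Returns (A, B, 2B): B is read as lethal equivalents per gamete and
    2B as lethal equivalents per diploid individual.
    """
    ys = [math.log(s) for s in survival]
    n = len(f_values)
    mf, my = sum(f_values) / n, sum(ys) / n
    sfy = sum((f - mf) * (y - my) for f, y in zip(f_values, ys))
    sff = sum((f - mf) ** 2 for f in f_values)
    slope = sfy / sff
    intercept = my - slope * mf
    a, b = -intercept, -slope
    return a, b, 2 * b


# Hypothetical noise-free data generated with A = 0.1, B = 1.5.
F = [0.0, 0.125, 0.25, 0.5]
S = [math.exp(-(0.1 + 1.5 * f)) for f in F]
print(lethal_equivalents_ls(F, S))  # approximately (0.1, 1.5, 3.0)
```

With real data, binomial sampling error in each group's survival is ignored here, which is part of why a likelihood treatment can behave differently.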

6.
The aim of the present study was to investigate the daily measured traits milk yield, water intake and dry matter intake with fixed and random regression models extended with different error covariance structures. It was analysed whether these models fit the data better than conventional fixed and random regression models. Furthermore, possible autocorrelation between repeated measures was investigated, and the effect of model choice on statistical inference was tested. Data were recorded on the Futterkamp dairy research farm of the Chamber of Agriculture of Schleswig-Holstein. A dataset of about 21 000 observations from 178 Holstein cows was used. Average milk yield, water intake and dry matter intake were 34.9, 82.4 and 19.8 kg, respectively. Statistical analysis was performed using different linear mixed models. Lactation number, test day and the parameters modelling the function of lactation day were included as fixed effects. Different structures were tested for the residuals and compared for goodness of fit using the likelihood ratio test and Akaike's and the Bayesian information criteria. Different autocorrelation patterns were found. Adjacent repeated measures of daily milk yield were the most highly correlated (ρ1 = 0.32) compared with measures further apart, whereas for water intake and dry matter intake, measurements at a lag of two units had the highest correlations (ρ2 = 0.11 and 0.12). The TOEPLITZ covariance structure was the most suitable for describing the dependencies among repeated measures for all traits. In general, the most complex model, random regression with the additional covariance structure TOEPLITZ(4), yielded the lowest information criteria. Furthermore, the model choice influenced the significance of one fixed effect and therefore the general inference of the data analysis.
Thus, the random regression + TOEPLITZ(4) model is recommended for the analysis of equally spaced datasets of milk yield, water intake and dry matter intake.
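To make the TOEPLITZ(q) structure concrete, here is a minimal sketch that builds a banded Toeplitz covariance matrix and the two information criteria used for model comparison. It assumes the convention of SAS PROC MIXED, in which TOEP(q) estimates the first q bands (lags 0 to q-1) and sets longer lags to zero; the function names and numbers are hypothetical, and how n enters BIC under REML differs between software packages.

```python
import math


def banded_toeplitz(n, sigmas):
    """n x n banded Toeplitz covariance: entry (i, j) is sigmas[|i - j|] when
    |i - j| < len(sigmas) and 0 otherwise (lags beyond the band are
    assumed uncorrelated). Positive definiteness is not checked."""
    q = len(sigmas)
    return [[sigmas[abs(i - j)] if abs(i - j) < q else 0.0 for j in range(n)]
            for i in range(n)]


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion; n = number of observations."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical 4-band structure (lags 0-3) for 6 equally spaced days.
cov = banded_toeplitz(6, [4.0, 1.2, 0.5, 0.2])
for row in cov:
    print(row)
```

Lower AIC or BIC indicates the preferred structure; the k penalty is what keeps a richer band count from winning automatically.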

7.
1. Observations of different organisms can often be used to infer environmental conditions at a site. These inferences may be useful for diagnosing the causes of degradation in streams and rivers. 2. When used for diagnosis, biological inferences must not only provide accurate, unbiased predictions of environmental conditions, but also pairs of inferred environmental variables must covary no more strongly than actual measurements of those same environmental variables. 3. Mathematical analysis of the relationship between the measured and inferred values of different environmental variables provides an approach for comparing the covariance between measurements with the covariance between inferences. Then, simulated and field‐collected data are used to assess the performance of weighted average and maximum likelihood inference methods. 4. Weighted average inferences became less accurate as covariance in the calibration data increased, whereas maximum likelihood inferences were unaffected by covariance in the calibration data. In contrast, the accuracy of weighted average inferences was unaffected by changes in measurement error, whilst the accuracy of maximum likelihood inferences decreased as measurement error increased. Weighted average inferences artificially increased the covariance of environmental variables beyond what was expected from measurements, whereas maximum likelihood inference methods more accurately reproduced the expected covariances. 5. Multivariate maximum likelihood inference methods can potentially provide more useful diagnostic information than single variable inference models.
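A minimal single-variable sketch of weighted-average (WA) inference, assuming the textbook two-step form (estimate each taxon's optimum from calibration sites, then average the optima of the taxa present at a new site) and omitting the deshrinking step most WA calibrations add. Names and data are hypothetical; the article's maximum likelihood and multivariate methods are not shown.

```python
def wa_optima(abundance, env):
    """Weighted-average optimum of each taxon: the abundance-weighted mean of
    the environmental variable over calibration sites.
    abundance[s][t] = abundance of taxon t at site s; env[s] = measured value.
    Assumes every taxon occurs at least once in the calibration data."""
    n_taxa = len(abundance[0])
    optima = []
    for t in range(n_taxa):
        weights = [site[t] for site in abundance]
        optima.append(sum(w * e for w, e in zip(weights, env)) / sum(weights))
    return optima


def wa_infer(site_abundance, optima):
    """Inferred environmental value at a new site: the abundance-weighted mean
    of the optima of the taxa observed there (no deshrinking step)."""
    total = sum(site_abundance)
    return sum(a * u for a, u in zip(site_abundance, optima)) / total


# Hypothetical calibration data: 3 sites, 2 taxa.
env = [2.0, 4.0, 6.0]
abundance = [[3, 0], [1, 1], [0, 3]]
optima = wa_optima(abundance, env)      # [2.5, 5.5]
print(optima, wa_infer([1, 1], optima))  # 4.0 for a site with equal abundances
```

Running two such single-variable models on the same taxa for two different environmental variables is one way correlated taxon optima can inflate the covariance between inferences, the effect the abstract reports.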

8.
9.
Genetic variation of teak (Tectona grandis Linn. f.) in 16 populations in Myanmar was investigated using ten nuclear microsatellite markers. Eight population pairs from two main regions in the north and the south of Myanmar were sampled. Each population pair consisted of an unlogged and a recently logged forest, each represented by 50 adult trees and 50 seedlings from the natural regeneration. For comparison, two land races from Benin (West Africa) were included. The major objectives of the study were to characterize the patterns of genetic variation of teak in natural populations, to examine genetic differentiation between adult trees and natural regeneration, and to investigate the impact of selective logging on genetic structures of teak. Genetic variation was high in all investigated populations. Slightly elevated levels of inbreeding were observed in the regeneration in comparison to the adults. Populations from the northern and the southern regions were strongly differentiated, but the differentiation between adults and natural regeneration and between unlogged and logged forests was low and not significant. Mantel tests indicated isolation by distance (IBD) within the northern and the southern regions. High genetic diversity was also observed within the land races from Benin, which grouped with the southern populations. We failed to detect effects of logging on genetic diversity patterns or inbreeding in adults and regeneration, suggesting that high genetic diversity can be sampled and maintained even in disturbed forests. The observation of significant IBD and high differentiation between the populations of the north and the south of Myanmar suggests that populations from widely separated forests should be included in conservation programs and that provenance regions should be delineated for the harvest and transfer of teak seeds and seedlings.

10.
We applied a new approach based on Mantel statistics to analyze the Genetic Analysis Workshop 14 simulated data with prior knowledge of the answers. The method was developed to improve the power of haplotype sharing analysis for gene mapping in complex diseases. The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes from case-control studies. Genetic similarity is measured as the shared length between haplotype pairs around a genetic marker. Phenotypic similarity is measured as the mean-corrected cross-product of the respective phenotypes. Cases with phenotype P1 and unrelated controls were drawn from the population of Danacaa. Power to detect main effects was compared with that of the χ² test for association based on 3-marker haplotypes and with that of a global permutation test for haplotype association. Power to detect gene × gene interaction was compared with that of unconditional logistic regression. The results suggest that the Mantel statistics might be more powerful than alternative tests.
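A minimal sketch of the two similarity measures described above and a Mantel-type statistic combining them. It assumes shared length is counted in loci (not map distance), the statistic is the unnormalised sum of products over haplotype pairs, and significance comes from permuting phenotypes. All names and data are hypothetical; this is an illustration, not the authors' implementation.

```python
import random


def shared_length(h1, h2, marker):
    """Number of consecutive loci around index `marker` (extending both ways)
    at which haplotypes h1 and h2 carry identical alleles; 0 if they differ at
    the marker itself. Length is counted in loci, not in map distance."""
    if h1[marker] != h2[marker]:
        return 0
    left = marker
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = marker
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def sharing_statistic(haplotypes, phenotypes, marker):
    """Sum over haplotype pairs of genetic similarity (shared length) times
    phenotypic similarity (mean-corrected cross-product)."""
    mean = sum(phenotypes) / len(phenotypes)
    total = 0.0
    for i in range(len(haplotypes)):
        for j in range(i + 1, len(haplotypes)):
            s = shared_length(haplotypes[i], haplotypes[j], marker)
            total += s * (phenotypes[i] - mean) * (phenotypes[j] - mean)
    return total


def sharing_test(haplotypes, phenotypes, marker, permutations=999, seed=0):
    """Observed statistic and one-sided permutation p-value (phenotypes shuffled)."""
    obs = sharing_statistic(haplotypes, phenotypes, marker)
    rng = random.Random(seed)
    ys = list(phenotypes)
    hits = 1
    for _ in range(permutations):
        rng.shuffle(ys)
        if sharing_statistic(haplotypes, ys, marker) >= obs:
            hits += 1
    return obs, hits / (permutations + 1)


# Hypothetical toy data: 4 haplotypes over 4 loci, marker at index 2,
# phenotype 1 = case, 0 = control.
haps = [[0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1]]
phen = [1.0, 1.0, 0.0, 0.0]
print(sharing_test(haps, phen, marker=2))
```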

11.
The inference of population divergence times and branching patterns is of fundamental importance in many population genetic analyses. Many methods have been developed for estimating population divergence times, and recently, there has been particular attention towards genome-wide single-nucleotide polymorphisms (SNP) data. However, most SNP data have been affected by an ascertainment bias caused by the SNP selection and discovery protocols. Here, we present a modification of an existing maximum likelihood method that will allow approximately unbiased inferences when ascertainment is based on a set of outgroup populations. We also present a method for estimating trees from the asymmetric dissimilarity measures arising from pairwise divergence time estimation in population genetics. We evaluate the methods by simulations and by applying them to a large SNP data set of seven East Asian populations.

12.
Generalized linear model analyses of repeated measurements typically rely on simplifying mathematical models of the error covariance structure for testing the significance of differences in patterns of change across time. The robustness of the tests of significance depends not only on the degree of agreement between the specified mathematical model and the actual population data structure, but also on the precision and robustness of the computational criteria for fitting the specified covariance structure to the data. Generalized estimating equation (GEE) solutions utilizing the robust empirical sandwich estimator for modeling of the error structure were compared with general linear mixed model (GLMM) solutions that utilized the commonly employed restricted maximum likelihood (REML) procedure. Under the conditions considered, the GEE and GLMM procedures were identical in assuming that the data are normally distributed and that the variance‐covariance structure of the data is the one specified by the user. The question addressed in this article concerns the relative sensitivity of tests of significance for treatment effects to varying degrees of misspecification of the error covariance structure model when fitted by the alternative procedures. Simulated data that were subjected to Monte Carlo evaluation of actual Type I error and power of tests of the equal slopes hypothesis conformed to assumptions of ordinary linear model ANOVA for repeated measures except for autoregressive covariance structures and missing data due to dropouts. The actual within‐groups correlation structures of the simulated repeated measurements ranged from AR(1) to compound symmetry in graded steps, whereas the GEE and GLMM formulations restricted the respective error structure models to be either AR(1), compound symmetry (CS), or unstructured (UN).
The GEE‐based tests utilizing empirical sandwich estimator criteria were documented to be relatively insensitive to misspecification of the covariance structure models, whereas GLMM tests that relied on restricted maximum likelihood (REML) were highly sensitive to relatively modest misspecification of the error correlation structure even though normality, variance homogeneity, and linearity were not an issue in the simulated data. Goodness‐of‐fit statistics were of little utility in identifying cases in which relatively minor misspecification of the GLMM error structure model resulted in inadequate alpha protection for tests of the equal slopes hypothesis. Both GEE and GLMM formulations that relied on unstructured (UN) error model specification produced nonconservative results regardless of the actual correlation structure of the repeated measurements. A random coefficients model produced robust tests with competitive power across all conditions examined. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
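The covariance models compared in this article can be written down directly. The sketch below, with hypothetical function names, builds AR(1) and compound symmetry (CS) correlation matrices, counts the parameters of an unstructured (UN) matrix, and blends AR(1) with CS as one plausible way to produce "graded" intermediate structures; the article's actual grading scheme is not described here, so the blend is an assumption.

```python
def ar1(n, rho):
    """AR(1) correlation: corr(y_i, y_j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def compound_symmetry(n, rho):
    """Compound symmetry (CS): the same correlation rho for every pair of occasions."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def blend(n, rho, w):
    """Hypothetical 'graded' structure: (1 - w) * AR(1) + w * CS, 0 <= w <= 1.
    A convex combination of two correlation matrices is itself a valid
    correlation matrix (unit diagonal, positive semidefinite)."""
    a, c = ar1(n, rho), compound_symmetry(n, rho)
    return [[(1 - w) * a[i][j] + w * c[i][j] for j in range(n)] for i in range(n)]


def unstructured_parameters(n):
    """Free parameters in an unstructured (UN) n x n covariance matrix."""
    return n * (n + 1) // 2


for w in (0.0, 0.5, 1.0):
    print(w, [round(v, 3) for v in blend(5, 0.5, w)[0]])
print("UN parameters for 5 occasions:", unstructured_parameters(5))
```

The parameter count grows quadratically for UN, against one correlation parameter for AR(1) or CS, which is one reason UN fits can be unstable with dropouts.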

13.
One of the primary goals of macroevolutionary biology has been to explain general trends in long‐term diversity patterns, including whether such patterns correspond to an upscaling of processes occurring at lower scales. Reconstructed phylogenies often show decelerated lineage accumulation over time. This pattern has often been interpreted as the result of diversity‐dependent (DD) diversification, where the accumulation of species causes diversification to decrease through niche filling. However, other processes can also produce such a slowdown, including time dependence without diversity dependence. To test whether phylogenetic branching patterns can be used to distinguish these two mechanisms, we formulated a time‐dependent, but diversity‐independent model that matches the expected diversity through time of a DD model. We simulated phylogenies under each model and studied how well likelihood methods could recover the true diversification mode. Standard model selection criteria always recovered diversity dependence, even when it was not present. We correct for this bias by using a bootstrap method and find that neither model is decisively supported. This implies that the branching pattern of reconstructed trees contains insufficient information to detect the presence or absence of diversity dependence. We advocate that tests encompassing additional data, for example, traits or range distributions, are needed to evaluate how diversity drives macroevolutionary trends.

14.
Extreme discordant sibling-pair (EDSP) designs have been shown in theory to be very powerful for mapping quantitative-trait loci (QTLs) in humans. However, their practical applicability has been somewhat limited by the need to phenotype very large populations to find enough pairs that are extremely discordant. In this paper, we demonstrate that there is also substantial power in pairs that are only moderately discordant, and that designs using moderately discordant pairs can yield a more practical balance between phenotyping and genotyping efforts. The power we demonstrate for moderately discordant pairs stems from a new statistical result. Statistical analysis in discordant-pair studies is generally done by testing for reduced identity by descent (IBD) sharing in the pairs. By contrast, the most commonly-used statistical methods for more standard QTL mapping are Haseman-Elston regression and variance-components analysis. Both of these use statistics that are functions of the trait values given IBD information for the pedigree. We show that IBD sharing statistics and "trait value given IBD" statistics contribute complementary rather than redundant information, and thus that statistics of the two types can be combined to form more powerful tests of linkage. We propose a simple composite statistic, and test it with simulation studies. The simulation results show that our composite statistic increases power only minimally for extremely discordant pairs. However, it boosts the power of moderately discordant pairs substantially and makes them a very practical alternative. Our composite statistic is straightforward to calculate with existing software; we give a practical example of its use by applying it to a Genetic Analysis Workshop (GAW) data set.
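For readers unfamiliar with the "trait value given IBD" statistics mentioned above, a minimal sketch of the original Haseman-Elston regression follows: the squared sib-pair trait difference is regressed on the estimated proportion of alleles shared IBD, and linkage predicts a negative slope. This is not the composite statistic proposed in the article; the function name and data are hypothetical, and the significance test of the slope is omitted.

```python
def haseman_elston(trait1, trait2, ibd_share):
    """Original Haseman-Elston regression for sib pairs.

    trait1, trait2: trait values of sib 1 and sib 2 in each pair.
    ibd_share: estimated proportion of alleles shared IBD at the locus.
    Returns (intercept, slope) of the OLS regression of the squared trait
    difference on IBD sharing; linkage predicts a negative slope.
    """
    ys = [(a - b) ** 2 for a, b in zip(trait1, trait2)]
    xs = list(ibd_share)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical sib pairs: higher IBD sharing goes with smaller trait differences.
sib1 = [2.0, 1.0, 0.0, 2.0, 1.0, 0.0]
sib2 = [0.0] * 6
ibd = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
print(haseman_elston(sib1, sib2, ibd))  # slope is negative
```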

15.
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance–covariance parameters in hierarchically structured data. Although hierarchical models have occasionally been used in the analysis of ecological data, their full potential to describe scales of association, diagnose variance explained, and partition uncertainty has not been exploited. In this paper we argue that the use of the HLM framework can enable significantly improved inference about ecological processes across levels of organization. After briefly describing the principles behind HLM, we give two examples that demonstrate a protocol for building hierarchical models and answering questions about the relationships between variables at multiple scales. The first example employs maximum likelihood methods to construct a two-level linear model predicting herbivore damage to a perennial plant at the individual- and patch-scale; the second example uses Bayesian estimation techniques to develop a three-level logistic model of plant flowering probability across individual plants, microsites and populations. HLM model development and diagnostics illustrate the importance of incorporating scale when modelling associations in ecological systems and offer a sophisticated yet accessible method for studies of populations, communities and ecosystems. We suggest that a greater coupling of hierarchical study designs and hierarchical analysis will yield significant insights on how ecological processes operate across scales.

16.
Since the seminal work of Prentice and Pyke, the prospective logistic likelihood has become the standard method of analysis for retrospectively collected case‐control data, in particular for testing the association between a single genetic marker and a disease outcome in genetic case‐control studies. In the study of multiple genetic markers with relatively small effects, especially those with rare variants, various aggregated approaches based on the same prospective likelihood have been developed to integrate subtle association evidence among all the markers considered. Many of the commonly used tests are derived from the prospective likelihood under a common‐random‐effect assumption, which assumes a common random effect for all subjects. We develop the locally most powerful aggregation test based on the retrospective likelihood under an independent‐random‐effect assumption, which allows the genetic effect to vary among subjects. In contrast to the fact that disease prevalence information cannot be used to improve efficiency for the estimation of odds ratio parameters in logistic regression models, we show that it can be utilized to enhance the testing power in genetic association studies. Extensive simulations demonstrate the advantages of the proposed method over the existing ones. A real genome‐wide association study is analyzed for illustration.

17.
Tests for a monotonic trend between an ordered categorical exposure and disease status are routinely carried out on case‐control data using the Mantel‐extension trend test or the asymptotically equivalent Cochran‐Armitage test. In this study, we considered two alternative tests based on isotonic regression, namely an order‐restricted likelihood ratio test and an isotonic modification of the Mantel‐extension test, extending the recent proposal by Mancuso, Ahn and Chen (2001) to case‐control data. Furthermore, we considered three tests based on contrasts, namely a single contrast (SC) test based on Schaafsma's coefficients, the Dosemeci and Benichou (DB) test, and a multiple contrast (MC) test based on the Helmert, reverse‐Helmert and linear contrasts, and we derived their case‐control versions. Using simulations, we compared the statistical properties of these five alternative tests with those of the Mantel‐extension test under various patterns, including no relationship as well as monotonic and non‐monotonic relationships between exposure and disease status. In the case of no relationship, all tests had close to nominal type I error except in situations combining a very unbalanced exposure distribution and a small sample size, where the asymptotic versions of the three tests based on contrasts were highly anticonservative. Using bootstrap instead of asymptotic versions corrected this anticonservatism. For monotonic patterns, all tests had similar power. For non‐monotonic patterns, the DB test showed the most favourable results, as it was the least powerful test. The two tests based on isotonic regression were the most powerful tests, and the Mantel‐extension, SC and MC tests had intermediate power. The six tests were applied to data from a case‐control study investigating the relationship between alcohol consumption and risk of laryngeal cancer in Turkey.
In situations with no evidence of a monotonic relationship between exposure and disease status, the three tests based on contrasts did not conclude in favour of a significant trend, whereas all the other tests did. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
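The Mantel-extension and Cochran-Armitage statistics can be computed from a 2 x k table in a few lines. The sketch below (hypothetical function name and data) uses the asymptotic forms only; the isotonic, contrast-based and bootstrap versions studied in the article are not shown. Both statistics are multiples of r², the squared Pearson correlation between exposure score and case status over the N subjects: Cochran-Armitage equals N r² and Mantel-extension equals (N - 1) r², which is why they are asymptotically equivalent.

```python
import math


def trend_tests(cases, totals, scores):
    """Cochran-Armitage and Mantel-extension trend statistics for a 2 x k table.

    cases[i]  : number of cases at exposure level i
    totals[i] : number of subjects (cases + controls) at level i
    scores[i] : score assigned to level i (e.g. 0, 1, 2, ...)
    Returns (z, chi2_ca, chi2_mantel, p_two_sided).
    """
    n_total = sum(totals)
    p_case = sum(cases) / n_total
    # Score-weighted sum of observed minus expected cases.
    t = sum(x * (r - n * p_case) for x, r, n in zip(scores, cases, totals))
    sum_nx = sum(n * x for x, n in zip(scores, totals))
    sum_nxx = sum(n * x * x for x, n in zip(scores, totals))
    var_t = p_case * (1 - p_case) * (sum_nxx - sum_nx ** 2 / n_total)
    z = t / math.sqrt(var_t)
    chi2_ca = z * z                                  # equals N * r^2
    chi2_mantel = chi2_ca * (n_total - 1) / n_total  # equals (N - 1) * r^2
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # 1-df chi-square tail of chi2_ca
    return z, chi2_ca, chi2_mantel, p_two_sided


# Hypothetical 3-level exposure with a rising case fraction.
print(trend_tests(cases=[10, 20, 30], totals=[50, 50, 50], scores=[0, 1, 2]))
```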

18.
Are partial Mantel tests adequate?   Citations: 7 total, 0 self-citations, 7 by others
Partial Mantel tests were designed to test for correlation among three matrices of pairwise distances. We show through an example that these tests may be inadequate, because the associated P-value is not indicative of the type I error.
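For reference, the partial Mantel test examined here can be sketched as follows, assuming the partial correlation r_AB.C of the off-diagonal entries and one common permutation scheme (jointly permuting rows and columns of A while B and C stay fixed); other schemes, such as permuting residuals, also exist. Function names and data are hypothetical. Given the abstract's conclusion, the p-value this produces should not be assumed to control the type I error rate.

```python
import math
import random


def _pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def _upper(d, order):
    """Upper-triangle entries of square matrix d after relabelling objects by order."""
    n = len(d)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def partial_mantel(a, b, c, permutations=999, seed=0):
    """Partial Mantel statistic r_AB.C (partial correlation of the off-diagonal
    entries of A and B, controlling for C) with a one-sided p-value from jointly
    permuting the rows and columns of A while B and C stay fixed."""
    ident = list(range(len(a)))
    bv, cv = _upper(b, ident), _upper(c, ident)
    r_bc = _pearson(bv, cv)

    def stat(order):
        av = _upper(a, order)
        r_ab, r_ac = _pearson(av, bv), _pearson(av, cv)
        return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

    r_obs = stat(ident)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = ident[:]
        rng.shuffle(order)
        if stat(order) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical data: A = genetic, B = environmental, C = geographic distances.
x = [0, 1, 2, 3, 4, 5]
env = [0.0, 2.0, 1.0, 3.0, 5.0, 4.0]
geo = [[abs(u - v) for v in x] for u in x]
envd = [[abs(u - v) for v in env] for u in env]
gen = [[0.0 if i == j else envd[i][j] + 0.5 * geo[i][j] + 0.1 * ((i * j) % 3)
        for j in range(6)] for i in range(6)]
r, p = partial_mantel(gen, envd, geo)
print(f"partial Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```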

19.
Likelihood analysis of ongoing gene flow and historical association   Citations: 3 total, 0 self-citations, 3 by others
We develop a Monte Carlo-based likelihood method for estimating migration rates and population divergence times from data at unlinked loci at which mutation rates are sufficiently low that, in the recent past, the effects of mutation can be ignored. The method is applicable to restriction fragment length polymorphisms (RFLPs) and single nucleotide polymorphisms (SNPs) sampled from a subdivided population. The method produces joint maximum-likelihood estimates of the migration rate and the time of population divergence, both scaled by population size, and provides a framework in which to test either for no ongoing gene flow or for population divergence in the distant past. We show the method performs well and provides reasonably accurate estimates of parameters even when the assumptions under which those estimates are obtained are not completely satisfied. Furthermore, we show that, provided that the number of polymorphic loci is sufficiently large, there is some power to distinguish between ongoing gene flow and historical association as causes of genetic similarity between pairs of populations.

20.
Bryant SH, Lawrence CE. Proteins, 1991, 9(2): 108-119
A statistical analysis of ion pairs in protein crystal structures shows that their abundance with respect to uncharged controls is accurately predicted by a Boltzmann-like function of electrostatic potential. It appears that the mechanisms of protein folding and/or evolution combine to produce a "thermal" distribution of local nonbonded interactions, as has been suggested by statistical-mechanical theories. Using this relationship, we develop a maximum likelihood methodology for estimation of apparent energetic parameters from the data base of known structures, and we derive electrostatic potential functions that lead to optimal agreement of observed and predicted ion-pair frequencies. These are similar to potentials of mean force derived from electrostatic theory, but departure from Coulombic behavior is less than has been suggested.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417