Similar articles
 Found 20 similar articles (search time: 46 ms)
1.

Background  

The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.
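
To make the WLS simplification in this abstract concrete, the sketch below computes a weighted sum of squared differences between observed and patristic distances, treating the distances as uncorrelated. The function name, the argument layout, and the example numbers are assumptions made for illustration; the abstract does not give the exact form of the test statistic.

```python
def wls_statistic(observed, patristic, variances):
    """Weighted sum of squared residuals, ignoring covariances between distances.

    observed   -- observed pairwise distances between taxa
    patristic  -- distances measured on the candidate tree, in the same order
    variances  -- assumed variance of each observed distance
    """
    if not (len(observed) == len(patristic) == len(variances)):
        raise ValueError("all inputs must have the same length")
    return sum(
        (d - p) ** 2 / v
        for d, p, v in zip(observed, patristic, variances)
    )


# Hypothetical pairwise distances for three taxa (pairs AB, AC, BC).
observed = [0.30, 0.45, 0.50]
patristic = [0.28, 0.47, 0.49]
variances = [0.0004, 0.0008, 0.0010]
print(wls_statistic(observed, patristic, variances))
```

A GLS version would replace the per-distance division by a quadratic form in the inverse covariance matrix, which is the costly step the abstract refers to.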

2.
A confidence region for topologies is a data-dependent set of topologies that, with high probability, can be expected to contain the true topology. Because of the connection between confidence regions and hypothesis tests, implicitly or explicitly, the construction of confidence regions for topologies is a component of many phylogenetic studies. Existing methods for constructing confidence regions, however, often give conflicting results. The Shimodaira-Hasegawa test seems too conservative, including too many topologies, whereas the other commonly used method, the Swofford-Olsen-Waddell-Hillis test, tends to give confidence regions with too few topologies. Confidence regions are constructed here based on a generalized least squares test statistic. The methodology described is computationally inexpensive and broadly applicable to maximum likelihood distances. Assuming the model used to construct the distances is correct, the coverage probabilities are correct with large numbers of sites.

3.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
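
The product rule mentioned at the end of this abstract can be sketched in a few lines: if the branch length estimates are treated as uncorrelated, the probability that all interior branches are positive is approximated by the product of the per-branch probabilities. The function name and the example values are hypothetical.

```python
import math


def simultaneous_confidence(branch_probabilities):
    """Approximate probability that every interior branch is positive.

    Assumes the per-branch confidence probabilities can be multiplied,
    i.e. that covariances between branch length estimates are zero.
    """
    for p in branch_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return math.prod(branch_probabilities)


# Hypothetical confidence probabilities for three interior branches.
print(simultaneous_confidence([0.99, 0.95, 0.90]))
```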

4.
We present fast new algorithms for evaluating trees with respect to least squares and minimum evolution (ME), the most commonly used criteria for inferring phylogenetic trees from distance data. The new algorithms include an optimal O(N^2) time algorithm for calculating the edge (branch or internode) lengths on a tree according to ordinary or unweighted least squares (OLS); an O(N^3) time algorithm for edge lengths under weighted least squares (WLS) including the Fitch-Margoliash method; and an optimal O(N^4) time algorithm for generalized least-squares (GLS) edge lengths (where N is the number of taxa in the tree). The ME criterion is based on the sum of edge lengths. Consequently, the edge lengths algorithms presented here lead directly to O(N^2), O(N^3), and O(N^4) time algorithms for ME under OLS, WLS, and GLS, respectively. All of these algorithms are as fast as or faster than any of those previously published, and the algorithms for OLS and GLS are the fastest possible (with respect to order of computational complexity). A major advantage of our new methods is that they are as well adapted to multifurcating trees as they are to binary trees. An optimal algorithm for determining path lengths from a tree with given edge lengths is also developed. This leads to an optimal O(N^2) algorithm for OLS sums of squares evaluation and corresponding O(N^3) and O(N^4) time algorithms for WLS and GLS sums of squares, respectively. The GLS algorithm is time-optimal if the covariance matrix is already inverted. The speed of each algorithm is assessed analytically; the speed increases we calculate are confirmed by the dramatic speed increases resulting from their implementation in PAUP* 4.0. The new algorithms enable far more extensive tree searches and statistical evaluations (e.g., bootstrap, parametric bootstrap, or jackknife) in the same amount of time. Hopefully, the fast algorithms for WLS and GLS will encourage the use of these criteria for evaluating trees and their edge lengths (e.g., for approximate divergence time estimates), since they should be more statistically efficient than OLS.
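
For orientation, the sketch below shows what OLS edge lengths are, not how the fast algorithms of this abstract compute them. It solves the normal equations directly for a hypothetical four-taxon tree ((A,B),(C,D)); this naive approach is an assumption chosen for readability and does not achieve the O(N^2) bound. All names and numbers are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_edge_lengths(design, distances):
    """Minimise the unweighted sum of squares via the normal equations."""
    rows, cols = len(design), len(design[0])
    xtx = [[sum(design[k][i] * design[k][j] for k in range(rows))
            for j in range(cols)] for i in range(cols)]
    xty = [sum(design[k][i] * distances[k] for k in range(rows))
           for i in range(cols)]
    return solve_linear(xtx, xty)


# Rows: pairs AB, AC, AD, BC, BD, CD. Columns: edges to A, B, C, D, internal edge.
# An entry is 1 when the edge lies on the path between the two taxa.
QUARTET_DESIGN = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]

# Hypothetical observed distances in the same pair order.
print(ols_edge_lengths(QUARTET_DESIGN, [0.31, 0.92, 1.01, 0.98, 1.12, 0.69]))
```

When the distances are exactly additive on the tree, the fit reproduces the generating edge lengths, which is a convenient sanity check for the design matrix.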

5.
Proportionality of phenotypic and genetic distance is of crucial importance to adequately focus on population history and structure, and it depends on the proportionality of genetic and phenotypic covariance. Constancy of phenotypic covariances is unlikely without constancy of genetic covariation if the latter is a substantial component of the former. If phenotypic patterns are found to be relatively stable, the most probable explanation is that genetic covariance matrices are also stable. Factors like morphological integration account for such stability. Morphological integration can be studied by analyzing the relationships among morphological traits. We present here a comparison of phenotypic correlation and covariance structure among worldwide human populations. Correlation and covariance matrices between 47 cranial traits were obtained for 28 populations, and compared with design matrices representing functional and developmental constraints. Among-population differences in patterns of correlation and covariation were tested for association with matrices of genetic distances (obtained from an examination of 10 Alu insertions) and with Mahalanobis distances (computed from craniometric traits). All matrix correlations were estimated by means of Mantel tests. Results indicate that correlation and covariance structure in our species is stable, and that among-group correlation/covariance similarity is not related to genetic or phenotypic distance. Conversely, genetic and morphological distance matrices were highly correlated. Correlation and covariation patterns were largely associated with functional and developmental factors, which probably account for the stability of covariance patterns.
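
The Mantel test used for the matrix correlations in this abstract can be sketched as a permutation test on the off-diagonal elements of two symmetric matrices. The version below is a minimal one-sided variant written for illustration; the function names, the number of permutations, and the example matrices are assumptions and are not taken from the study.

```python
import random


def _upper(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(a, b, permutations=999, seed=0):
    """Return (matrix correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    fixed = _upper(a, identity)
    observed = _pearson(fixed, _upper(b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of b together
        if _pearson(fixed, _upper(b, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Two hypothetical symmetric distance matrices for four populations.
genetic = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
morphological = [[0, 2, 3, 5], [2, 0, 7, 9], [3, 7, 0, 11], [5, 9, 11, 0]]
r, p = mantel_test(genetic, morphological)
print(r, p)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test would not be appropriate here.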

6.
Coordinated variation among positions in amino acid sequence alignments can reveal genetic dependencies at noncontiguous positions, but methods to assess these interactions are incompletely developed. Previously, we found genome-wide networks of covarying residue positions in the hepatitis C virus genome (R. Aurora, M. J. Donlin, N. A. Cannon, and J. E. Tavis, J. Clin. Invest. 119:225-236, 2009). Here, we asked whether such networks are present in a diverse set of viruses and, if so, what they may imply about viral biology. Viral sequences were obtained for 16 viruses in 13 species from 9 families. The entire viral coding potential for each virus was aligned, all possible amino acid covariances were identified using the observed-minus-expected-squared algorithm at a false-discovery rate of ≤1%, and networks of covariances were assessed using standard methods. Covariances that spanned the viral coding potential were common in all viruses. In all cases, the covariances formed a single network that contained essentially all of the covariances. The hepatitis C virus networks had hub-and-spoke topologies, but all other networks had random topologies with an unusually large number of highly connected nodes. These results indicate that genome-wide networks of genetic associations and the coordinated evolution they imply are very common in viral genomes, that the networks rarely have the hub-and-spoke topology that dominates other biological networks, and that network topologies can vary substantially even within a given viral group. Five examples with hepatitis B virus and poliovirus are presented to illustrate how covariance network analysis can lead to inferences about viral biology.

7.
Scheffé's confidence intervals for linear functions of some subvectors of a vector of parameters are presented. The subvectors considered are such that the covariance matrices of their estimators are known non-negative definite matrices multiplied by unknown positive constants. This property is characteristic of the least squares estimators of vectors of main and interaction effects in the analysis of covariance models of the following experimental designs: split-block, split-plot, completely randomized two-factor design and randomized complete block design. The formulas for confidence intervals for linear functions of vectors of main or interaction effects in the designs mentioned above are given in the paper. A practical example is given as an illustration.

8.
Efficiency of regression estimates for clustered data   Cited by: 1 (self-citations: 0, citations by others: 1)
Mancl LA, Leroux BG. Biometrics, 1996, 52(2): 500-511
Statistical methods for clustered data, such as generalized estimating equations (GEE) and generalized least squares (GLS), require selecting a correlation or covariance structure to specify the dependence between observations within a cluster. Valid regression estimates can be obtained that do not depend on correct specification of the true correlation, but inappropriate specifications can result in a loss of efficiency. We derive general expressions for the asymptotic relative efficiency of GEE and GLS estimators under nested correlation structures. Efficiency is shown to depend on the covariate distribution, the cluster sizes, the response variable correlation, and the regression parameters. The results demonstrate that efficiency is quite sensitive to the between- and within-cluster variation of the covariates, and provide useful characterizations of models for which upper and lower efficiency bounds are attained. Efficiency losses for simple working correlation matrices, such as independence, can be large even for small to moderate correlations and cluster sizes.

9.
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
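
The equivalence stated in this abstract can be checked numerically on a small case. The sketch below hard-codes a hypothetical three-taxon tree ((A:1,B:1):1,C:2) and made-up trait values, computes the GLS slope under the Brownian motion covariance implied by that tree, and compares it with the slope of the PIC regression through the origin. It is a single numerical check under these assumptions, not the proof given in the paper.

```python
import math

# Brownian motion covariance for ((A:1,B:1):1,C:2) is the shared path length:
# C = [[2, 1, 0], [1, 2, 0], [0, 0, 2]]. Its inverse is written out below.
C_INV = [
    [2 / 3, -1 / 3, 0.0],
    [-1 / 3, 2 / 3, 0.0],
    [0.0, 0.0, 0.5],
]


def quad(u, v):
    """Quadratic form u' C^-1 v."""
    return sum(u[i] * C_INV[i][j] * v[j] for i in range(3) for j in range(3))


def gls_slope(x, y):
    """GLS slope of y on x with an intercept, profiling the intercept out."""
    ones = [1.0, 1.0, 1.0]
    s11, s1x, s1y = quad(ones, ones), quad(ones, x), quad(ones, y)
    return (quad(x, y) - s1x * s1y / s11) / (quad(x, x) - s1x * s1x / s11)


def contrasts(values):
    """Standardised independent contrasts for the hard-coded tree."""
    a, b, c = values
    first = (a - b) / math.sqrt(1.0 + 1.0)
    node = (a + b) / 2.0  # equal branch lengths give a simple mean
    node_branch = 1.0 + (1.0 * 1.0) / (1.0 + 1.0)  # lengthened stem branch
    second = (node - c) / math.sqrt(node_branch + 2.0)
    return [first, second]


def pic_slope(x, y):
    """OLS slope of the y contrasts on the x contrasts through the origin."""
    cx, cy = contrasts(x), contrasts(y)
    return sum(p * q for p, q in zip(cx, cy)) / sum(p * p for p in cx)


x = [1.0, 2.0, 4.0]  # hypothetical explanatory trait for A, B, C
y = [2.0, 3.0, 7.0]  # hypothetical response trait for A, B, C
print(gls_slope(x, y), pic_slope(x, y))
```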

10.
Aim  In their recent paper, Kissling & Carl (2008) recommended the spatial error simultaneous autoregressive model (SARerr) over ordinary least squares (OLS) for modelling species distribution. We compared these models with the generalized least squares model (GLS) and a variant of SAR (SARvario). GLS and SARvario are superior to standard implementations of SAR because the spatial covariance structure is described by a semivariogram model.
Innovation  We used the complete datasets employed by Kissling & Carl (2008), with strong spatial autocorrelation, and two datasets in which the spatial structure was degraded by sample reduction and grid coarsening. GLS performed consistently better than OLS, SARerr and SARvario in all datasets, especially in terms of goodness of fit. SARvario was marginally better than SARerr in the degraded datasets.
Main conclusions  GLS was more reliable than SAR-based models, so its use is recommended when dealing with spatially autocorrelated data.

11.
A procedure is presented for constructing an exact confidence interval for the ratio of the two variance components in a possibly unbalanced mixed linear model that contains a single set of m random effects. This procedure can be used in animal and plant breeding problems to obtain an exact confidence interval for a heritability. The confidence interval can be defined in terms of the output of a least squares analysis. It can be computed by a graphical or iterative technique requiring the diagonalization of an m × m matrix or, alternatively, the inversion of a number of m × m matrices. Confidence intervals that are approximate can be obtained with much less computational burden, using either of two approaches. The various confidence interval procedures can be extended to some problems in which the mixed linear model contains more than one set of random effects. Corresponding to each interval procedure is a significance test and one or more estimators.

12.
We have developed four asymptotic interval estimators in closed forms for the gamma correlation under stratified random sampling, including the confidence interval based on the most commonly used weighted-least-squares (WLS) approach (CIWLS), the confidence interval calculated from the Mantel-Haenszel (MH) type estimator with the Fisher-type transformation (CIMHT), the confidence interval using the fundamental idea of Fieller's Theorem (CIFT) and the confidence interval derived from a monotonic function of the WLS estimator of Agresti's α with the logarithmic transformation (MWLSLR). To evaluate the finite-sample performance of these four interval estimators and note the possible loss of accuracy in application of both Wald's confidence interval and MWLSLR using pooled data without accounting for stratification, we employ Monte Carlo simulation. We use the data taken from a general social survey studying the association between the income level and job satisfaction with strata formed by genders in black Americans published elsewhere to illustrate the practical use of these interval estimators.

13.
Comparative methods are widely used in ecology and evolution. The most frequently used comparative methods are based on an explicit evolutionary model. However, recent approaches have been popularized that are without an evolutionary basis or an underlying null model. Here we highlight the limitations of such techniques in comparative analyses by using simulations to compare two commonly used comparative methods with and without evolutionary basis, respectively: generalized least squares (GLS) and phylogenetic eigenvector regression (PVR). We find that GLS methods are more efficient at estimating model parameters and produce lower variance in parameter estimates, lower phylogenetic signal in residuals, and lower Type I error rates than PVR methods. These results can very likely be generalized to eigenvector methods that control for space and both space and phylogeny. We highlight that GLS methods can be adapted in numerous ways and that the variance structure used in these models can be flexibly optimized to each data set.

14.
We have two sample covariance matrices of size p × p, where p is the number of variables denoting measurements of arterial blood pressure recorded in various positions and by various methods for women and men. The hypothesis of equality of the two covariance matrices was rejected, and we want to find out why this happened and in which covariances the two matrices differ. In § 2 we explore the primary (observed) data, looking for outliers and deviations from normality by drawing χ² plots. In § 3 we look for common principal components using Flury's method. We find that the data for systolic blood pressure might have a common principal axes system (P ≈ 0.0866), while for diastolic blood pressure this is dubious (P ≈ 0.0112). The variances of the first principal component found for women are a little larger than those for men, and this fact explains the rejection of the classical test Σ1 = Σ2 of the equality of the two covariance matrices. After applying a common rotation to the two covariance matrices we reduce them to a nearly diagonal form. Looking at the off-diagonal elements we see which covariances do not fit the model. In § 4 we repeat the calculations from § 3 for covariance matrices calculated from the same data using robust methods.

15.
Genetic and environmental covariances between pairs of complex traits are important quantitative measurements that characterize their shared genetic and environmental architectures. Accurate estimation of genetic and environmental covariances in genome-wide association studies (GWASs) can help us identify common genetic and environmental factors associated with both traits and facilitate the investigation of their causal relationship. Genetic and environmental covariances are often modeled through multivariate linear mixed models. Existing algorithms for covariance estimation include the traditional restricted maximum likelihood (REML) method and the recent method of moments (MoM). Compared to REML, MoM approaches are computationally efficient and require only GWAS summary statistics. However, MoM approaches can be statistically inefficient, often yielding inaccurate covariance estimates. In addition, existing MoM approaches have so far focused on estimating genetic covariance and have largely ignored environmental covariance estimation. Here we introduce a new computational method, GECKO, for estimating both genetic and environmental covariances, that improves the estimation accuracy of MoM while keeping computation in check. GECKO is based on composite likelihood, relies on only summary statistics for scalable computation, provides accurate genetic and environmental covariance estimates across a range of scenarios, and can accommodate SNP annotation stratified covariance estimation. We illustrate the benefits of GECKO through simulations and applications on analyzing 22 traits from five large-scale GWASs. In the real data applications, GECKO identified 50 significant genetic covariances among analyzed trait pairs, resulting in a twofold power gain compared to the previous MoM method LDSC. In addition, GECKO identified 20 significant environmental covariances. 
The ability of GECKO to estimate environmental covariance in addition to genetic covariance helps us reveal strong positive correlation between the genetic and environmental covariance estimates across trait pairs, suggesting that common pathways may underlie the shared genetic and environmental architectures between traits.

16.
Sensitivity and specificity are common measures used to evaluate the performance of a diagnostic test. A diagnostic test is often administered at a subunit level, e.g. at the level of vessel, ear or eye of a patient so that the treatment can be targeted at the specific subunit. Therefore, it is essential to evaluate the diagnostic test at the subunit level. Often patients with more negative subunit test results are less likely to receive the gold standard tests than patients with more positive subunit test results. To account for this type of missing data and correlation between subunit test results, we proposed a weighted generalized estimating equations (WGEE) approach to evaluate subunit sensitivities and specificities. A simulation study was conducted to evaluate the performance of the WGEE estimators and the weighted least squares (WLS) estimators (Barnhart and Kosinski, 2003) under a missing at random assumption. The results suggested that the WGEE estimator is consistent under various scenarios of percentage of missing data and sample size, while the WLS approach could yield biased estimators due to a misspecified missing data mechanism. We illustrate the methodology with a cardiology example.

17.
The definition of covariances of half- and full sibs, and hence that of variances of general and specific combining ability with regard to a quantitative character, is extended to take into account the respective covariances between a pair of characters. The interpretation of the dispersion and correlation matrices of general and specific combining ability is discussed by considering a set of single, three- and four-way crosses, made using diallel and line × tester mating systems in Pennisetum typhoides. The general implications of the concept of covariance of combining ability in plant breeding are discussed.

18.
The method of generalized least squares (GLS) is used to assess the variance function for isothermal titration calorimetry (ITC) data collected for the 1:1 complexation of Ba²⁺ with 18-crown-6 ether. In the GLS method, the least squares (LS) residuals from the data fit are themselves fitted to a variance function, with iterative adjustment of the weighting function in the data analysis to produce consistency. The data are treated in a pooled fashion, providing 321 fitted residuals from 35 data sets in the final analysis. Heteroscedasticity (nonconstant variance) is clearly indicated. Data error terms proportional to q_i and q_i/v are well defined statistically, where q_i is the heat from the ith injection of titrant and v is the injected volume. The statistical significance of the variance function parameters is confirmed through Monte Carlo calculations that mimic the actual data set. For the data in question, which fall mostly in the range of q_i = 100-2000 μcal, the contributions to the data variance from the terms in q_i² typically exceed the background constant term for q_i > 300 μcal and v < 10 μL. Conversely, this means that in reactions with q_i much less than this, heteroscedasticity is not a significant problem. Accordingly, in such cases the standard unweighted fitting procedures provide reliable results for the key parameters, K and ΔH°, and their statistical errors. These results also support an important earlier finding: in most ITC work on 1:1 binding processes, the optimal number of injections is 7-10, which is a factor of 3 smaller than the current norm. For high-q reactions, where weighting is needed for optimal LS analysis, tips are given for using the weighting option in the commercial software commonly employed to process ITC data.

19.
The genetic covariance and correlation matrices for five morphological traits were estimated from four populations of fruit flies, Drosophila melanogaster, to measure the extent of change in genetic covariances as a result of directional selection. Two of the populations were derived from lines that had undergone selection for large or small thorax length over the preceding 23 generations. A third population was constituted using flies from control lines that were maintained with equivalent population sizes as the selected lines. The fourth population contained flies from the original cage population from which the selected and control lines had been started. Tests of the homogeneity of covariance matrices using maximum likelihood techniques revealed significant changes in covariance structure among the selected lines. Prediction of base population trait means from selected line means under the assumption of constant genetic covariances indicated that genetic covariances for the small population differed more from the base population than did the covariances for the large population. The predicted small population means diverged farther from the expected means because the additive genetic variance associated with several traits increased in value and most of the genetic covariances associated with one trait changed in sign. These results illustrate that genetic covariances may remain nearly constant in some situations while changing markedly in others. Possible developmental reasons for the genetic changes are discussed.

20.
A graphical method for detecting recombination in phylogenetic data sets   Cited by: 9 (self-citations: 3, citations by others: 6)
Current phylogenetic tree reconstruction methods assume that there is a single underlying tree topology for all sites along the sequence. The presence of mosaic sequences due to recombination violates this assumption and will cause phylogenetic methods to give misleading results due to the imposition of a single tree topology on all sites. The detection of mosaic sequences caused by recombination is therefore an important first step in phylogenetic analysis. A graphical method for the detection of recombination, based on the least squares method of phylogenetic estimation, is presented here. This method locates putative recombination breakpoints by moving a window along the sequence. The performance of the method is assessed by simulation and by its application to a real data set.
