Similar literature
 20 similar articles found (search time: 31 ms)
1.
We present fast new algorithms for evaluating trees with respect to least squares and minimum evolution (ME), the most commonly used criteria for inferring phylogenetic trees from distance data. The new algorithms include an optimal $O(N^2)$ time algorithm for calculating the edge (branch or internode) lengths on a tree according to ordinary or unweighted least squares (OLS); an $O(N^3)$ time algorithm for edge lengths under weighted least squares (WLS), including the Fitch-Margoliash method; and an optimal $O(N^4)$ time algorithm for generalized least-squares (GLS) edge lengths (where $N$ is the number of taxa in the tree). The ME criterion is based on the sum of edge lengths. Consequently, the edge length algorithms presented here lead directly to $O(N^2)$, $O(N^3)$, and $O(N^4)$ time algorithms for ME under OLS, WLS, and GLS, respectively. All of these algorithms are as fast as or faster than any of those previously published, and the algorithms for OLS and GLS are the fastest possible (with respect to order of computational complexity). A major advantage of our new methods is that they are as well adapted to multifurcating trees as they are to binary trees. An optimal algorithm for determining path lengths from a tree with given edge lengths is also developed. This leads to an optimal $O(N^2)$ algorithm for OLS sums of squares evaluation and corresponding $O(N^3)$ and $O(N^4)$ time algorithms for WLS and GLS sums of squares, respectively. The GLS algorithm is time-optimal if the covariance matrix is already inverted. The speed of each algorithm is assessed analytically; the speed increases we calculate are confirmed by the dramatic speed increases resulting from their implementation in PAUP* 4.0. The new algorithms enable far more extensive tree searches and statistical evaluations (e.g., bootstrap, parametric bootstrap, or jackknife) in the same amount of time. Hopefully, the fast algorithms for WLS and GLS will encourage the use of these criteria for evaluating trees and their edge lengths (e.g., for approximate divergence time estimates), since they should be more statistically efficient than OLS.
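The OLS edge lengths and the ME score described in this abstract can be illustrated with a small sketch. It is not the fast $O(N^2)$ algorithm of the paper: it solves the ordinary normal equations by brute force, which is enough only for tiny trees. The four-taxon tree ((A,B),(C,D)), the edge numbering, the distances, and the names `solve_linear`, `ols_edge_lengths`, `PATHS`, and `DIST` are all hypothetical choices made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_edge_lengths(paths, dist):
    """OLS edge lengths from the normal equations (X^T X) b = X^T y.

    paths[pair] is the set of edge indices on the path between the two taxa;
    dist[pair] is the observed distance for that pair.
    """
    pairs = sorted(paths)
    n_edges = 1 + max(e for p in paths.values() for e in p)
    # Design matrix: one row per taxon pair, one column per edge.
    x = [[1.0 if e in paths[p] else 0.0 for e in range(n_edges)] for p in pairs]
    y = [dist[p] for p in pairs]
    rows = range(len(pairs))
    xtx = [[sum(x[k][i] * x[k][j] for k in rows) for j in range(n_edges)]
           for i in range(n_edges)]
    xty = [sum(x[k][i] * y[k] for k in rows) for i in range(n_edges)]
    return solve_linear(xtx, xty)


# Hypothetical tree ((A,B),(C,D)): edges 0..3 lead to A, B, C, D; edge 4 is internal.
PATHS = {
    ("A", "B"): {0, 1}, ("A", "C"): {0, 4, 2}, ("A", "D"): {0, 4, 3},
    ("B", "C"): {1, 4, 2}, ("B", "D"): {1, 4, 3}, ("C", "D"): {2, 3},
}
# Hypothetical distances, additive on the tree with edge lengths 1, 2, 3, 4, 0.5.
DIST = {
    ("A", "B"): 3.0, ("A", "C"): 4.5, ("A", "D"): 5.5,
    ("B", "C"): 5.5, ("B", "D"): 6.5, ("C", "D"): 7.0,
}

edge_lengths = ols_edge_lengths(PATHS, DIST)
me_score = sum(edge_lengths)  # the ME criterion is the sum of edge lengths
```

Because the example distances are exactly additive on the tree, the OLS estimates reproduce the edge lengths used to generate them, and `me_score` is the total tree length. With noisy distances the same code returns the least-squares fit instead.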

2.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA-genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
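The product rule mentioned at the end of this abstract can be sketched in a few lines: if the branch length estimates are (asymptotically) uncorrelated, the probability that all interior branches are positive is approximated by the product of the per-branch probabilities. The function name and the three probabilities below are hypothetical values chosen for illustration, not results from the paper.

```python
from math import prod


def simultaneous_confidence(branch_probabilities):
    """Approximate joint probability that every interior branch is positive.

    Assumes the per-branch estimates are uncorrelated, so the joint
    probability is the product of the individual confidence probabilities.
    """
    for p in branch_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return prod(branch_probabilities)


# Hypothetical confidence probabilities for three interior branches.
joint = simultaneous_confidence([0.99, 0.95, 0.98])
```

The joint value is always at most the smallest individual probability, which is why a tree with many moderately supported interior branches can have low simultaneous support.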

3.
The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.

4.
Branch length estimates play a central role in maximum-likelihood (ML) and minimum-evolution (ME) methods of phylogenetic inference. For various reasons, branch length estimates are not statistically independent under ML or ME. We studied the response of correlations among branch length estimates to the degree of among-branch length heterogeneity (BLH) in the model (true) tree. The frequency and magnitude of (especially negative) correlations among branch length estimates were both shown to increase as BLH increases under simulation and analytically. For ML, we used the correct model (Jukes–Cantor). For ME, we employed ordinary least-squares (OLS) branch lengths estimated under both simple p-distances and Jukes–Cantor distances, analyzed with and without an among-site rate heterogeneity parameter. The efficiency of ME and ML was also shown to decrease in response to increased BLH. We note that the shape of the true tree will in part determine BLH and represents a critical factor in the probability of recovering the correct topology. An important finding suggests that researchers cannot expect that different branches that were in fact the same length will have the same probability of being accurately reconstructed when BLH exists in the overall tree. We conclude that methods designed to minimize the interdependencies of branch length estimates (BLEs) may (1) reduce both the variance and the covariance associated with the estimates and (2) increase the efficiency of model-based optimality criteria. We speculate on possible ways to reduce the nonindependence of BLEs under OLS and ML. Received: 9 March 1999 / Accepted: 4 May 1999

5.
A genetic model was proposed to simultaneously investigate genetic effects of both polygenes and several single genes for quantitative traits of diploid plants and animals. Mixed linear model approaches were employed for statistical analysis. Based on two mating designs, a full diallel cross and a modified diallel cross including F2, Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of the estimation of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects and of minimum norm quadratic unbiased estimation (MINQUE) and Henderson III for variance components. Estimates of MINQUE (1) were unbiased and efficient in both reduced and full genetic models. Henderson III could have a large bias when used to analyze the full genetic model. Simulation results also showed that GLS and OLS were good methods to estimate fixed effects in the genetic models. Data on Drosophila melanogaster from Gilbert were used as a worked example to demonstrate the parameter estimation. Received: 11 November 2000 / Accepted: 2 May 2001

6.

Background  

The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.
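The WLS simplification described here can be sketched as follows: once the covariances between distances are ignored, the goodness-of-fit statistic reduces to a sum of squared residuals between observed and patristic distances, each divided by the variance of that distance. The abstract does not give the exact statistic, so the form below is a generic weighted sum of squares, and the function name and all numbers are hypothetical.

```python
def wls_statistic(observed, patristic, variances):
    """Weighted sum of squared residuals, ignoring covariances.

    observed, patristic and variances are dicts keyed by taxon pair.
    Setting every variance to 1 gives the unweighted (OLS) sum of squares.
    """
    total = 0.0
    for pair, d_obs in observed.items():
        if variances[pair] <= 0:
            raise ValueError("variances must be positive")
        residual = d_obs - patristic[pair]
        total += residual * residual / variances[pair]
    return total


# Hypothetical observed distances, tree (patristic) distances and variances.
observed = {("A", "B"): 3.0, ("A", "C"): 4.0}
patristic = {("A", "B"): 2.0, ("A", "C"): 5.0}
variances = {("A", "B"): 0.5, ("A", "C"): 2.0}
statistic = wls_statistic(observed, patristic, variances)
```

The full GLS statistic would replace the per-pair division with a quadratic form in the inverse covariance matrix, which is the expensive step that WLS avoids.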

7.
Minimum evolution is the guiding principle of an important class of distance-based phylogeny reconstruction methods, including neighbor-joining (NJ), which is the most cited tree inference algorithm to date. The minimum evolution principle involves searching for the tree with minimum length, where the length is estimated using various least-squares criteria. Since evolutionary distances cannot be known precisely but only estimated, it is important to investigate the robustness of phylogenetic reconstruction to imprecise estimates for these distances. The safety radius is a measure of this robustness: it consists of the maximum relative deviation that the input distances can have from the correct distances, without compromising the reconstruction of the correct tree structure. Answering some open questions, we here derive the safety radius of two popular minimum evolution criteria: balanced minimum evolution (BME) and minimum evolution based on ordinary least squares (OLS + ME). Whereas BME has a radius of $\frac{1}{2}$, which is the best achievable, OLS + ME has a radius tending to 0 as the number of taxa increases. This difference may explain the gap in reconstruction accuracy observed in practice between OLS + ME and BME (which forms the basis of popular programs such as NJ and FastME).

8.
The method of minimum evolution reconstructs a phylogenetic tree $T$ for $n$ taxa given dissimilarity data $d$. In principle, for every tree $W$ with these $n$ leaves an estimate for the total length of $W$ is made, and $T$ is selected as the $W$ that yields the minimum total length. Suppose that the ordinary least-squares formula $S_W(d)$ is used to estimate the total length of $W$. A theorem of Rzhetsky and Nei shows that when $d$ is positively additive on a completely resolved tree $T$, then for all $W \neq T$ it will be true that $S_W(d) > S_T(d)$. The same will be true if $d$ is merely sufficiently close to an additive dissimilarity function. This paper proves that as $n$ grows large, even if the shortest branch length in the true tree $T$ remains constant and $d$ is additive on $T$, then the difference $S_W(d) - S_T(d)$ can go to zero. It is also proved that, as $n$ grows large, there is a tree $T$ with $n$ leaves, an additive distance function $d_T$ on $T$ with shortest edge $\varepsilon$, a distance function $d$, and a tree $W$ with the same $n$ leaves such that $d$ differs from $d_T$ by only approximately $\varepsilon/4$, yet minimum evolution incorrectly selects the tree $W$ over the tree $T$. This result contrasts with the method of neighbor-joining, for which Atteson showed that incorrect selection of $W$ required a deviation at least $\varepsilon/2$. It follows that, for large $n$, minimum evolution with ordinary least-squares can be only half as robust as neighbor-joining.

9.
Lou XY, Yang MC. Genetica, 2006, 128(1-3): 471-484
A genetic model is developed with additive and dominance effects of a single gene and polygenes as well as general and specific reciprocal effects for the progeny from a diallel mating design. The methods of ANOVA, minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood estimation (REML), and maximum likelihood estimation (ML) are suggested for estimating variance components, and the methods of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects, while best linear unbiased prediction, linear unbiased prediction (LUP), and adjusted unbiased prediction are suggested for analyzing random effects. Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of statistical methods involving two diallel designs with commonly used sample sizes, 6 and 8 parents, with no and missing crosses, respectively. Simulation results show that GLS and OLS are almost equally efficient for estimation of fixed effects, while MINQUE (1) and REML are better estimators of the variance components and LUP is the most practical method for prediction of random effects. Data from a Drosophila melanogaster experiment (Gilbert 1985a, Theor appl Genet 69:625–629) were used as a working example to demonstrate the statistical analysis. The new methodology is also applicable to screening candidate gene(s) and to other mating designs with multiple parents, such as nested (NC Design I) and factorial (NC Design II) designs. Moreover, this methodology can serve as a guide to develop new methods for detecting indiscernible major genes and mapping quantitative trait loci based on mixture distribution theory. The computer program for the methods suggested in this article is freely available from the authors.

10.
Direct Calculation of a Tree Length Using a Distance Matrix   Cited by: 8 (self-citations: 0; citations by others: 8)
Comparative studies of tree-building methods have shown minimum evolution to be in general an accurate criterion for selecting a true tree. To improve the use of this criterion, this paper proposes a method for rapidly and directly calculating the length of a dichotomous tree without having to resort to branch length calculations. This direct calculation (DC) method applies to the complete final topology, giving equal importance to each branch after a dichotomy. According to this method, the tree length $S_{DC}$ is $S_{DC} = \sum_{i \neq j} D_{ij} / 2^{B_{ij}} = \left( \sum_{i<j} D_{ij} \, 2^{B_{\max} - B_{ij}} \right) / 2^{B_{\max} - 1}$, where $D_{ij}$ is the observed distance between taxa $i$ and $j$, $B_{ij}$ is the number of branches connecting $i$ and $j$, $B_{\max}$ is the greatest $B_{ij}$ in the tree, and the powers of two are due to the dichotomy of the tree. This tree length expression may be used as a rapid method for selecting the shortest tree from a set of hypothetical or suboptimal trees. Received: 2 March 2000 / Accepted: 24 March 2000
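A minimal sketch of the direct calculation, reading the formula in its second form, $S_{DC} = \left( \sum_{i<j} D_{ij} \, 2^{B_{\max} - B_{ij}} \right) / 2^{B_{\max} - 1}$. The function name, the four-taxon tree ((A,B),(C,D)), and the distances are hypothetical; the distances were chosen to be additive on a tree with edge lengths 1, 2, 3, 4 and an internal edge of 0.5, so the expected total length is known in advance.

```python
def direct_tree_length(dist, branches):
    """Tree length from distances and path branch counts, no edge lengths needed.

    dist[pair] is the observed distance D_ij for each unordered pair (i < j);
    branches[pair] is B_ij, the number of branches on the path between i and j.
    """
    b_max = max(branches.values())
    weighted = sum(d * 2 ** (b_max - branches[pair]) for pair, d in dist.items())
    return weighted / 2 ** (b_max - 1)


# Hypothetical distances, additive on ((A,B),(C,D)).
DIST = {
    ("A", "B"): 3.0, ("A", "C"): 4.5, ("A", "D"): 5.5,
    ("B", "C"): 5.5, ("B", "D"): 6.5, ("C", "D"): 7.0,
}
# Sister taxa are 2 branches apart; all other pairs cross the internal branch.
BRANCHES = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 3,
    ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 2,
}

s_dc = direct_tree_length(DIST, BRANCHES)
```

For these additive distances the result equals the sum of the generating edge lengths (1 + 2 + 3 + 4 + 0.5), which illustrates why the expression can stand in for an explicit branch length calculation when comparing candidate topologies.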

11.
Efficient determination of evolutionary distances is important for the correct reconstruction of phylogenetic trees. The performance of the pooled distance required for reconstructing a phylogenetic tree can be improved by applying large weights to appropriate distances for reconstructing phylogenetic trees and small weights to inappropriate distances. We developed two weighting methods, the modified Tajima–Takezaki method and the modified least-squares method, for reconstructing phylogenetic trees from multiple loci. By computer simulations, we found that both of the new methods were more efficient in reconstructing correct topologies than the no-weight method. Hence, we reconstructed hominoid phylogenetic trees from mitochondrial DNA using our new methods, and found that the levels of bootstrap support were significantly increased by the modified Tajima–Takezaki and by the modified least-squares method.

12.
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires $O(n^3)$ time, where $n$ is the number of taxa, while the current implementations of the second are in $O(p n^3)$ or more, where $p$ is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in $O(n^2)$ time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing $p$ NNIs is $O(n^2 + pn)$, i.e., $O(n^2)$ in practice because $p$ is always much smaller than $n$. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring $O(n^2 \cdot \mathrm{diam}(T))$ operations to build the starting tree and $O(pn \cdot \mathrm{diam}(T))$ to perform the NNIs, where $\mathrm{diam}(T)$ is the topological diameter of the output tree. In the usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in $\log(n)$, so our algorithms are in practice faster than NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.

13.
Cranston and Humphries (1988) expose Sæther's (1976) revision of the Hydrobaenus group of genera (Chironomidae, Diptera) to the vagaries of quantitative phyletics. In the process they have clearly shown why at least their method is not in accordance with the view of Hennig. In the qualitative Hennigian method the parsimony criterion is used when choosing among alternative hypotheses of explanation of single character distribution. The selection and interpretation equals the cladogenetic analysis. In neocladistic methods the parsimony criterion is used in order to find the tree implying the fewest evolutionary gains and losses with the fewest lines. The explanation of characters enters as an afterthought. The differences between the methods are shown by analyzing a theoretical data matrix as well as by reassessment of the results obtained by Cranston and Humphries. Their data critique is met point by point, their data matrix, which is to a large extent erroneous, is corrected, and their data reanalyzed using their and alternative outgroups. The tree topologies remain similar to each other as well as to the original qualitative analysis since there is little inside homoplasy, but the changes proposed by Cranston and Humphries are shown to be invalid.

14.
Consider the model $y_{ijk} = \mu + a_i + b_i + c_{ij} + e_{ijk}$ ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, n_{ij}$), where $\mu$ is a constant and $a_i$, $b_i$, $c_{ij}$ are distributed independently and normally with zero means and variances Δ2 Δ2/bdij and δ2 respectively. It is assumed that the $d_i$'s and $d_{ij}$'s are known (positive) constants (for all $i$ and $j$). In this paper procedures for estimating the variance components (Δ2, Δ2b and Δ2a) and for testing the hypothesis Hoc2c2 = y3 and Hoa2b2 = y4 (where y2, y3, and y4, are specified constants) are presented. A generalization for the mixed model case is discussed in the last section.

15.
The most commonly used measure of evolutionary distance in molecular phylogenetics is the number of nucleotide substitutions per site. However, this number is not necessarily most efficient for reconstructing a phylogenetic tree. In order to evaluate the accuracy of evolutionary distance, $D(t)$, for obtaining the correct tree topology, an accuracy index, $A(t)$, was proposed. This index is defined as $D'(t)/\sqrt{V[D(t)]}$, where $D'(t)$ is the first derivative of $D(t)$ with respect to evolutionary time and $V[D(t)]$ is the sampling variance of evolutionary distance. Using $A(t)$, namely, finding the condition under which $A(t)$ gives the maximum value, we can obtain an evolutionary distance which is efficient for obtaining the correct topology. Under the assumption that the transversional changes do not occur as frequently as the transitional changes, we obtained the evolutionary distances which are expected to give the correct topology more often than are the other distances.

16.
Consider the model $Y_{ijk} = \mu + a_i + b_{ij} + e_{ijk}$ ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, B_i$; $k = 1, 2, \ldots, n_{ij}$), where $\mu$ is a constant and $a_i$, $b_{ij}$ and $e_{ijk}$ are distributed independently and normally with zero means and variances $\sigma_a^2 d_i$, $\sigma_b^2 d_{ij}$ and $\sigma^2$, respectively, where it is assumed that the $d_i$'s and $d_{ij}$'s are known. In this paper procedures for estimating the variance components ($\sigma^2$, $\sigma_a^2$ and $\sigma_b^2$) and for testing the hypotheses $\sigma_b^2 = 0$ and $\sigma_a^2 = 0$ are presented. In the last section the mixed model for $y_{ijk}$, where the $x_{ijkm}$ are known constants and the $\beta_m$'s are unknown fixed effects ($m = 1, 2, \ldots, p$), is transformed to a fixed effect model with equal variances so that least squares theory can be used to draw inferences about the $\beta_m$'s.

17.
Comparative methods are widely used in ecology and evolution. The most frequently used comparative methods are based on an explicit evolutionary model. However, recent approaches have been popularized that are without an evolutionary basis or an underlying null model. Here we highlight the limitations of such techniques in comparative analyses by using simulations to compare two commonly used comparative methods with and without evolutionary basis, respectively: generalized least squares (GLS) and phylogenetic eigenvector regression (PVR). We find that GLS methods are more efficient at estimating model parameters and produce lower variance in parameter estimates, lower phylogenetic signal in residuals, and lower Type I error rates than PVR methods. These results can very likely be generalized to eigenvector methods that control for space and both space and phylogeny. We highlight that GLS methods can be adapted in numerous ways and that the variance structure used in these models can be flexibly optimized to each data set.

18.
A stepwise algorithm for finding minimum evolution trees   Cited by: 7 (self-citations: 6; citations by others: 1)
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen and then its true neighbor searched iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.
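To give a feel for how much the stepwise search restricts the tree space, the bound of (m-1)!/2 trees stated in the abstract can be compared with the total number of unrooted bifurcating trees on m taxa. That total, (2m-5)!!, is a standard combinatorial result and is not taken from the abstract; both function names are hypothetical.

```python
from math import factorial


def stepwise_bound(m):
    """Upper bound (m-1)!/2 on the trees examined by the stepwise algorithm."""
    if m < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(m - 1) // 2


def unrooted_binary_tree_count(m):
    """Number of unrooted bifurcating trees on m labeled taxa: (2m-5)!!."""
    if m < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * m - 4, 2):  # odd factors 3, 5, ..., 2m-5
        count *= k
    return count


# Fraction of the full tree space covered by the bound for 10 taxa.
fraction = stepwise_bound(10) / unrooted_binary_tree_count(10)
```

For 10 taxa the bound covers well under a tenth of all possible topologies, and the gap widens as m grows. The bound is a worst case; the abstract does not say how many trees are examined in typical runs.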

19.
Helical conformations of infinite polymer chains may be described by the helical parameters, $d$ and $\theta$ (the translation along the helix axis and the angle of rotation about the axis per repeat unit), $\rho_i$ (the distance of the $i$th atom from the axis), and $d_{ij}$ and $\theta_{ij}$ (the translation along the axis and the angle of rotation, respectively, on passing from the $i$th atom to the $j$th). A general method has been worked out for calculating all those helical parameters from the bond lengths, bond angles, and internal-rotation angles. The positions of the main chain and side chain atoms with respect to the axis may also be calculated. All the equations are applicable to any helical polymer chain and are readily programmed for electronic computers. A method is also presented for calculating the partial derivatives of helical parameters with respect to molecular parameters.

20.
Pollen and seed dispersal are key processes affecting the demographic and evolutionary dynamics of plant species and are also important considerations for the sustainable management of timber trees. Through direct and indirect genetic analyses, we studied the mating system and the extent of pollen and seed dispersal in an economically important timber species, Entandrophragma cylindricum (Meliaceae). We genotyped adult trees, seeds and saplings from a 400‐ha study plot in a natural forest from East Cameroon using eight nuclear microsatellite markers. The species is mainly outcrossed (= 0.92), but seeds from the same fruit are often pollinated by the same father (correlated paternity, rp = 0.77). An average of 4.76 effective pollen donors (Nep) per seed tree contributes to the pollination. Seed dispersal was as extensive as pollen dispersal, with a mean dispersal distance in the study plot approaching 600 m, and immigration rates from outside the plot to the central part of the plot reaching 40% for both pollen and seeds. Extensive pollen‐ and seed‐mediated gene flow is further supported by the weak, fine‐scale spatial genetic structure (Sp statistic = 0.0058), corresponding to historical gene dispersal distances (σg) reaching approximately 1,500 m. Using an original approach, we showed that the relatedness between mating individuals (Fij = 0.06) was higher than expected by chance, given the extent of pollen dispersal distances (expected Fij = 0.02 according to simulations). This remarkable pattern of assortative mating could be a phenomenon of potentially consequential evolutionary and management significance that deserves to be studied in other plant populations.
