首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Accuracy of estimated phylogenetic trees from molecular data   总被引:27,自引:0,他引:27  
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f theta, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.  相似文献   

2.
Accuracy of phylogenetic trees estimated from DNA sequence data   总被引:4,自引:1,他引:3  
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, thd distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence.(ABSTRACT TRUNCATED AT 250 WORDS)   相似文献   

3.
Summary The statistical properties of three molecular tree construction methods—the unweighted pair-group arithmetic average clustering (UPG), Farris, and modified Farris methods—are examined under the neutral mutation model of evolution. The methods are compared for accuracy in construction of the topology and estimation of the branch lengths, using statistics of these two aspects. The distribution of the statistic concerning topological construction is shown to be as important as its mean and variance for the comparison.Of the three methods, the UPG method constructs the tree topology with the least variation. The modified Farris method, however, gives the best performance when the two aspects are considered simultaneously. It is also shown that a topology based on two genes is much more accurate than that based on one gene.There is a tendency to accept published molecular trees, but uncritical acceptance may lead one to spurious conclusions. It should always be kept in mind that a tree is a statistical result that is affected strongly by the stochastic error of nucleotide substitution and the error intrinsic to the tree construction method itself.  相似文献   

4.
The neighbor-joining method: a new method for reconstructing phylogenetic trees   总被引:702,自引:29,他引:673  
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.   相似文献   

5.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.  相似文献   

6.
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis.  相似文献   

7.
Summary The methods of Fitch and Margoliash and of Farris for the construction of phylogenetic trees were compared. A phenetic clustering technique - the UPGMA method — was also considered.The three methods were applied to difference matrices obtained from comparison of macromolecules by immunological, DNA hybridization, electrophoretic, and amino acid sequencing techniques. To evaluate the results, we used the goodness-of-fit criterion. In some instances, the F-M and Farris methods gave a comparably good fit of the output to the input data, though in most cases the F-M procedure gave a much better fit. By the fit criterion, the UPGMA procedure was on the average better than the Farris method but not as good as the F-M procedure.On the basis of the results given in this report and the goodness-of-fit criterion, it is suggested that where input data are likely to include overestimates as well as true estimates and underestimates of the actual distances between taxonomic units, the F-M method is the most reasonable to use for constructing phylogenies from distance matrices. Immunological, DNA hybridization, and electrophoretic data fall into this category. By contrast, where it is known that each input datum is indeed either a true estimate or an underestimate of the actual distance between 2 taxonomic units, the Farris procedure appears, on theoretical grounds, to be the matrix method of choice. Amino acid and nucleotide sequence data are in this category.The following abbreviations are used in this work F-M Fitch-Margoliash - UPGMA unweighted pair-group method using arithmetic averages - SD percent standard deviation  相似文献   

8.
Summary The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is some theoretical problem in the comparison of ML values conditional for each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.  相似文献   

9.
The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.  相似文献   

10.
The nucleotide substitution matrix inferred from avian data sets using cytochrome b differs considerably from the models commonly used in phylogenetic analyses. To analyze the possible effects of this particular pattern of change in phylogeny estimation we performed a computer simulation in which we started with a real sequence and used the inferred model of change to produce a tree of 10 species. Maximum parsimony (MP), maximum likelihood (ML), and various distance methods were then used to recover the topology and the branch lengths. We used two kinds of data with varying levels of variation. In addition, we tested with the removal of third positions and different weighting schemes. At low levels of variation, MP was outstanding in recovering the topology (90% correct), while unweighted pair-group method, arithmetic average (UPGMA), regardless of distances used, was poor (40%). At the higher level, most methods had a chance of around 40%-58% of finding the true tree. However, in most cases, the trees found were only slightly wrong, with only one or a few branches misplaced. On the other hand, the use of a "wrong" model had serious effects on the estimation of branch lengths (distances). Although precision was high, accuracy was poor with most methods, giving branch lengths that were biased downward. When seeded with the true distance matrix, Fitch and NJ always found the true tree, while UPGMA frequently failed to do so. The effect of removing third positions was dramatic at low levels of variation, because only one MP program was able to find a true tree at all, albeit rarely, while none of the others ever did so. At higher levels, the situation was better, but still much worse than with the whole data set.  相似文献   

11.
Summary The effects of temporal (among different branches of a phylogeny) and spatial (among different nucleotide sites within a gene) nonuniformities of nucleotide substitution rates on the construction of phylogenetic trees from nucleotide sequences are addressed. Spatial nonuniformity may be estimated by using Shannon's (1948) entropy formula to measure the Relative Nucleotide Variability (RNV) at each nucleotide site in an aligned set of sequences; this is demonstrated by a comparative analysis of 5S rRNAs. New methods of constructing phylogenetic trees are proposed that augment the Unweighted Pair-Group Using Arithmetic Averages (UPGMA) algorithm by estimating and compensating for both spatial and temporal nonuniformity in substitution rates. These methods are evaluated by computer simulations of 5S rRNA evolution that include both kinds of nonuniformities. It was found that the proposed Reference Ratio Method improved both the ability to reconstruct the correct topology of a tree and also the estimation of branch lengths as compared to UPGMA. A previous method (Farris et al. 1970; Klotz et al. 1979; Li 1981) was found to be less successful in reconstructing topologies when there is high probability of multiple mutations at some sites. Phylogenetic analyses of 5S rRNA sequences support the endosymbiotic origins of both chloroplasts and mitochondria, even though the latter exhibit an accelerated rate of nucleotide substitution. Phylogenetic trees also reveal an adaptive radiation within the eubacteria and another within the eukaryotes for the origins of most major phyla within each group during the Precambrian era.  相似文献   

12.
We examine whether phylogenetic methods provide biased estimates of tree shape with respect to the random branching model. We investigate the performance of five commonly used phylogenetic methods using computer simulation: (1) maximum parsimony; (2) neighbor joining; (3) UPGMA with an outgroup taxon; (4) UPGMA without an outgroup taxon; and (5) maximum likelihood. All methods provide estimates of tree shape that are, on average, more asymmetrical than the true tree, especially when rates of evolution are high. We suggest a simple explanation for the bias and propose a modified test of tree shape that corrects for it.  相似文献   

13.
The relative efficiencies of the maximum-likelihood (ML), neighbor- joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two- parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. (5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is < or = 5% and when > or = 1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths.(ABSTRACT TRUNCATED AT 400 WORDS)   相似文献   

14.
Summary In the maximum likelihood (ML) method for estimating a molecular phylogenetic tree, the pattern of nucleotide substitutions for computing likelihood values is assumed to be simpler than that of the actual evolutionary process, simply because the process, considered to be quite devious, is unknown. The problem, however, is that there has been no guarantee to endorse the simplification.To study this problem, we first evaluated the robustness of the ML method in the estimation of molecular trees against different nucleotide substitution patterns, including Jukes and Cantor's, the simplest ever proposed. Namely, we conducted computer simulations in which we could set up various evolutionary models of a hypothetical gene, and define a true tree to which an estimated tree by the ML method was to be compared. The results show that topology estimation by the ML method is considerably robust against different ratios of transitions to transversions and different GC contents, but branch length estimation is not so. The ML tree estimation based on Jukes and Cantor's model is also revealed to be resistant to GC content, but rather sensitive to the ratio of transitions to transversions.We then applied the ML method with different substitution patterns to nucleotide sequence data ontax gene from T-cell leukemia viruses whose evolutionary process must have been more complicated than that of the hypothetical gene. The results are in accordance with those from the simulation study, showing that Jukes and Cantor's model is as useful as a more complicated one for making inferences about molecular phylogeny of the viruses.  相似文献   

15.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) was evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd41 showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.   相似文献   

16.
Phylogenetic analysis using parsimony and likelihood methods   总被引:1,自引:0,他引:1  
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981,J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorates. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.  相似文献   

17.
蚊科三十八个已知属的系统发育数值分析   总被引:6,自引:0,他引:6  
瞿逢伊  钱国正 《昆虫学报》1993,36(1):103-109
  相似文献   

18.
The most commonly used measure of evolutionary distance in molecular phylogenetics is the number of nucleotide substitutions per site. However, this number is not necessarily most efficient for reconstructing a phylogenetic tree. In order to evaluate the accuracy of evolutionary distance, D(t), for obtaining the correct tree topology, an accuracy index, A(t), was proposed. This index is defined as D'(t)/square root of[D(t)], where D'(t) is the first derivative of D(t) with respect to evolutionary time and V[D(t)] is the sampling variance of evolutionary distance. Using A(t), namely, finding the condition under which A(t) gives the maximum value, we can obtain an evolutionary distance which is efficient for obtaining the correct topology. Under the assumption that the transversional changes do not occur as frequently as the transitional changes, we obtained the evolutionary distances which are expected to give the correct topology more often than are the other distances.   相似文献   

19.
Summary Statistical properties of the ordinary least-squares (OLS), generalized least-squares (GLS), and minimum-evolution (ME) methods of phylogenetic inference were studied by considering the case of four DNA sequences. Analytical study has shown that all three methods are statistically consistent in the sense that as the number of nucleotides examined (m) increases they tend to choose the true tree as long as the evolutionary distances used are unbiased. When evolutionary distances (dij's) are large and sequences under study are not very long, however, the OLS criterion is often biased and may choose an incorrect tree more often than expected under random choice. It is also shown that the variance-covariance matrix of dij's becomes singular as dij's approach zero and thus the GLS may not be applicable when dij's are small. The ME method suffers from neither of these problems, and the ME criterion is statistically unbiased. Computer simulation has shown that the ME method is more efficient in obtaining the true tree than the OLS and GLS methods and that the OLS is more efficient than the GLS when dij's are small, but otherwise the GLS is more efficient.Offprint requests to: M. Nei  相似文献   

20.
Summary Operator metrics are explicity designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch.In the method, lengths (operator metrics) corresponding to each of the branches of an unrooted tree are calculated. The metric length of a branch reconstructs the number of (transversion) differences between sequences at a tip and a node (or between nodes) of a tree. The theory is general and is fundamentally independent of differences in substitution rates among the organisms being compared. Mathematically, the independence has been obtained becuase the metrics are eigen vectors of fundamental equations which describe the evolution of all unrooted trees.Even under conditions when both the distance matrix method or a simple parsimony length method are show to indicate lengths than are an order of magnitude too large or too small, the operator metrics are accurate. Examples, using data calculated with evolutionary rates and branchings designed to confuse the measurement of branch lengths and to camouflage the topology of the true tree, demonstrate the validity of operator metrics. The method is robust. Operator metric distances are easy to calculated, can be extended to any number of taxa, and provide a statistical estimate of their variances.The utility of the method is demonstrated by using it to analyze the origins and evolutionary of chloroplasts, mitochondria, and eubacteria.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号