首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 546 毫秒
1.
The method of evolutionary parsimony--or operator invariants--is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa. The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree. In both parsimony and evolutionary parsimony, three patterns of nucleotide substitution are associated one-to-one with the three topologically linked trees for four taxa. In evolutionary parsimony, the three quantities are operator invariants. These invariants are the remnants of substitutions that have occurred in the interior branch of the tree and are analogous to the substitutions assigned to the central branch by parsimony. The two invariants associated with the incorrect trees must equal zero (statistically), whereas only the correct tree can have a nonzero invariant. The chi 2-test is used to ascertain the nonzero invariant and the statistically favored tree. Examples, obtained using data calculated with evolutionary rates and branchings designed to camouflage the true tree, show that the method accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge. The method is robust and easy to use.   相似文献   

2.
Yang Z 《Systematic biology》1998,47(1):125-133
The effect of the evolutionary rate of a gene on the accuracy of phylogeny reconstruction was examined by computer stimulation. The evolutionary rate is measured by the tree length, that is, the expected total number of nucleotide substitutions per site on the phylogeny. DNA sequence data were simulated using both fixed trees with specified branch lengths and random trees with branch lengths generated from a model of cladogenesis. The parsimony and likelihood methods were used for phylogeny reconstruction, and the proportion of correctly recovered branch partitions by each method was estimated. Phylogenetic methods including parsimony appear quite tolerant of multiple substitutions at the same site. The optimum levels of sequence divergence were even higher than upper limits previously suggested for saturation of substitutions, indicating that the problem of saturation may have been exaggerated. Instead, the lack of information at low levels of divergence should be seriously considered in evaluation of a gene's phylogenetic utility, especially when the gene sequence is short. The performance of parsimony, relative to that of likelihood, does not necessarily decrease with the increase of the evolutionary rate.  相似文献   

3.
The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.   相似文献   

4.
The relative efficiencies of the maximum-likelihood (ML), neighbor- joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two- parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. (5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is < or = 5% and when > or = 1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths.(ABSTRACT TRUNCATED AT 400 WORDS)   相似文献   

5.
Accuracy of estimated phylogenetic trees from molecular data   总被引:2,自引:0,他引:2  
Summary The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.  相似文献   

6.
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis.  相似文献   

7.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals—each with many genes—splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.  相似文献   

8.
载脂蛋白多基因家族分子进化的研究   总被引:2,自引:2,他引:0  
王乐  柴建华 《遗传学报》1994,21(2):81-95
与脂质运输有关的载脂蛋白基因构成一个复杂的多基因家族。为探讨这种演化时间长的基因家族的进化规律,本文首先建立了一种在非均衡进化速率条件下计算系统发生树中任意分支长度的简易方法,并可在此基础上算出无根分支系统树中分歧年代的期望值。进一步对本文科10个种属共26种载脂蛋白的系统演作作了实际分析,结果提示:①ApoA-I'ApoA-IV,ApoE及ApoA-II的共同祖先可能在奥陶纪水生脊椎动物中就已存  相似文献   

9.
Lake's evolutionary parsimony (EP) method of constructing a phylogenetic tree is primarily applied to four DNA sequences. In this method, three quantities--X, Y, and Z--that correspond to three possible unrooted trees are computed, and an invariance property of these quantities is used for choosing the best tree. However, Lake's method depends on a number of unrealistic assumptions. We therefore examined the theoretical basis of his method and reached the following conclusions: (1) When the rates of two transversional changes from a nucleotide are unequal, his invariance property breaks down. (2) Even if the rates of two transversional changes are equal, the invariance property requires some additional conditions. (3) When Kimura's two- parameter model of nucleotide substitution applies and the rate of nucleotide substitution varies greatly with branch, the EP method is generally better than the standard maximum-parsimony (MP) method in recovering the correct tree but is inferior to the neighbor-joining (NJ) and a few other distance matrix methods. (4) When the rate of nucleotide substitution is the same or nearly the same for all branches, the EP method is inferior to the MP method even if the proportion of transitional changes is high. (5) When Lake's assumptions fail, his chi2 test may identify an erroneous tree as the correct tree. This happens because the test is not for comparing different trees. (6) As long as a proper distance measure is used, the NJ method is better than the EP and MP methods whether there is a transition/transversion bias or whether there is variation in substitution rate among different nucleotide sites.   相似文献   

10.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p- distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree- making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitutions rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.   相似文献   

11.
Functional evolution is often driven by positive natural selection. Although it is thought to be rare in evolution at the molecular level, its effects may be observed as the accelerated evolutionary rates. Therefore one of the effective ways to identify functional evolution is to identify accelerated evolution. Many methods have been developed to test the statistical significance of the accelerated evolutionary rate by comparison with the appropriate reference rate. The rates of synonymous substitution are one of the most useful and popular references, especially for large-scale analyses. On the other hand, these rates are applicable only to a limited evolutionary time period because they saturate quickly--i.e., multiple substitutions happen frequently because of the lower functional constraint. The relative rate test is an alternative method. This technique has an advantage in terms of the saturation effect but is not sufficiently powerful when the evolutionary rate differs considerably among phylogenetic lineages. For the aim to provide a universal reference tree, we propose a method to construct a standardized tree which serves as the reference for accelerated evolutionary rate. The method is based upon multiple molecular phylogenies of single genes with the aim of providing higher reliability. The tree has averaged and normalized branch lengths with standard deviations for statistical neutrality limits. The standard deviation also suggests the reliability level of the branch order. The resulting tree serves as a reference tree for the reliability level of the branch order and the test of evolutionary rate acceleration even when some of the species lineages show an accelerated evolutionary rate for most of their genes due to bottlenecking and other effects.  相似文献   

12.
The neighbor-joining method: a new method for reconstructing phylogenetic trees   总被引:702,自引:29,他引:673  
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.   相似文献   

13.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA-genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.  相似文献   

14.
Summary A procedure is presented that forms an unrooted tree-like structure from a matrix of pairwise differences. The tree is not formed a portion at a time, as methods now in use generally do, but is formed en toto without intervening estimates of branch lengths. The method is based on a relaxed additivity (four-point metric) constraint. From the tree, a classification may be formed.  相似文献   

15.
Using simulated data, we compared five methods of phylogenetic tree estimation: parsimony, compatibility, maximum likelihood, Fitch- Margoliash, and neighbor joining. For each combination of substitution rates and sequence length, 100 data sets were generated for each of 50 trees, for a total of 5,000 replications per condition. Accuracy was measured by two measures of the distance between the true tree and the estimate of the tree, one measure sensitive to accuracy of branch lengths and the other not. The distance-matrix methods (Fitch- Margoliash and neighbor joining) performed best when they were constrained from estimating negative branch lengths; all comparisons with other methods used this constraint. Parsimony and compatibility had similar results, with compatibility generally inferior; Fitch- Margoliash and neighbor joining had similar results, with neighbor joining generally slightly inferior. Maximum likelihood was the most successful method overall, although for short sequences Fitch- Margoliash and neighbor joining were sometimes better. Bias of the estimates was inferred by measuring whether the independent estimates of a tree for different data sets were closer to the true tree than to each other. Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches. When rates of evolution varied among different sites, all methods showed signs of inaccuracy and bias.   相似文献   

16.
Quartet-mapping, a generalization of the likelihood-mapping procedure.   总被引:5,自引:0,他引:5  
Likelihood-mapping (LM) was suggested as a method of displaying the phylogenetic content of an alignment. However, statistical properties of the method have not been studied. Here we analyze the special case of a four-species tree generated under a range of evolution models and compare the results with those of a natural extension of the likelihood-mapping approach, geometry-mapping (GM), which is based on the method of statistical geometry in sequence space. The methods are compared in their abilities to indicate the correct topology. The performance of both methods in detecting the star topology is especially explored. Our results show that LM tends to reject a star tree more often than GM. When assumptions about the evolutionary model of the maximum-likelihood reconstruction are not matched by the true process of evolution, then LM shows a tendency to favor one tree, whereas GM correctly detects the star tree except for very short outer branch lengths with a statistical significance of >0.95 for all models. LM, on the other hand, reconstructs the correct bifurcating tree with a probability of >0.95 for most branch length combinations even under models with varying substitution rates. The parameter domain for which GM recovers the true tree is much smaller. When the exterior branch lengths are larger than a (analytically derived) threshold value depending on the tree shape (rather than the evolutionary model), GM reconstructs a star tree rather than the true tree. We suggest a combined approach of LM and GM for the evaluation of starlike trees. This approach offers the possibility of testing for significant positive interior branch lengths without extensive statistical and computational efforts.  相似文献   

17.
Accuracy of phylogenetic trees estimated from DNA sequence data   总被引:4,自引:1,他引:3  
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, thd distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence.(ABSTRACT TRUNCATED AT 250 WORDS)   相似文献   

18.
We have developed a rapid parsimony method for reconstructing ancestral nucleotide states that allows calculation of initial branch lengths that are good approximations to optimal maximum-likelihood estimates under several commonly used substitution models. Use of these approximate branch lengths (rather than fixed arbitrary values) as starting points significantly reduces the time required for iteration to a solution that maximizes the likelihood of a tree. These branch lengths are close enough to the optimal values that they can be used without further iteration to calculate approximate maximum-likelihood scores that are very close to the "exact" scores found by iteration. Several strategies are described for using these approximate scores to substantially reduce times needed for maximum-likelihood tree searches.  相似文献   

19.
Recent years have seen an increasing effort to incorporate phylogenetic hypotheses to the study of community assembly processes. The incorporation of such evolutionary information has been eased by the emergence of specialized software for the automatic estimation of partially resolved supertrees based on published phylogenies. Despite this growing interest in the use of phylogenies in ecological research, very few studies have attempted to quantify the potential biases related to the use of partially resolved phylogenies and to branch length accuracy, and no work has examined how tree shape may affect inference of community phylogenetic metrics. In this study, we tested the influence of phylogenetic resolution and branch length information on the quantification of phylogenetic structure, and also explored the impact of tree shape (stemminess) on the loss of accuracy in phylogenetic structure quantification due to phylogenetic resolution. For this purpose, we used 9 sets of phylogenetic hypotheses of varying resolution and branch lengths to calculate three indices of phylogenetic structure: the mean phylogenetic distance (NRI), the mean nearest taxon distance (NTI) and phylogenetic diversity (stdPD) metrics. The NRI metric was the less sensitive to phylogenetic resolution, stdPD showed an intermediate sensitivity, and NTI was the most sensitive one; NRI was also less sensitive to branch length accuracy than NTI and stdPD, the degree of sensitivity being strongly dependent on the dating method and the sample size. Directional biases were generally towards type II errors. Interestingly, we detected that tree shape influenced the accuracy loss derived from the lack of phylogenetic resolution, particularly for NRI and stdPD. We conclude that well‐resolved molecular phylogenies with accurate branch length information are needed to identify the underlying phylogenetic structure of communities, and also that sensitivity of phylogenetic structure measures to low phylogenetic resolution can strongly vary depending on phylogenetic tree shape.  相似文献   

20.
Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore, do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution unit to time unit using calibration points, but none is universally accepted as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a nonconvex optimization problem where the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号