Similar articles
 20 similar articles found; search took 31 ms
1.
The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.
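As a rough illustration of how restriction-site and restriction-fragment data can be derived from a sequence, the sketch below scans a DNA string for six-base recognition sequences and converts the site positions into fragment lengths. The function names, the toy sequence, and the choice of cutting at the first base of each recognition sequence are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
def restriction_sites(sequence, recognition_seqs):
    """Return the 0-based start positions of each recognition sequence."""
    sites = {}
    for rec in recognition_seqs:
        positions = []
        start = sequence.find(rec)
        while start != -1:
            positions.append(start)
            # Search again from the next base so overlapping sites are found.
            start = sequence.find(rec, start + 1)
        sites[rec] = positions
    return sites


def fragment_lengths(seq_len, cut_positions):
    """Lengths of the fragments of a linear molecule cut at the given positions.

    Assumption: each enzyme cuts at the first base of its recognition site.
    """
    cuts = [0] + sorted(cut_positions) + [seq_len]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


# Toy example with two well-known six-base recognition sequences
# (GAATTC for EcoRI, GGATCC for BamHI); the sequence itself is made up.
seq = "AAGAATTCGGGGATCCTTGAATTCAA"
sites = restriction_sites(seq, ["GAATTC", "GGATCC"])
all_cuts = [pos for positions in sites.values() for pos in positions]
print(sites)
print(fragment_lengths(len(seq), all_cuts))
```

Site data record which sites are present or absent in each sequence, whereas fragment data record only the resulting lengths, which is one plausible reason the latter carry less information.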

2.
We explore the maximum parsimony (MP) and ancestral maximum likelihood (AML) criteria in phylogenetic tree reconstruction. Both problems are NP-hard, so we seek approximate solutions. We formulate the two problems as Steiner tree problems under appropriate distances. The gist of our approach is the succinct characterization of Steiner trees for a small number of leaves for the two distances. This enables the use of known Steiner tree approximation algorithms. The approach leads to a 16/9 approximation ratio for AML and asymptotically to a 1.55 approximation ratio for MP.

3.
Summary: The problem of determining the minimal phylogenetic tree is discussed in relation to graph theory. It is shown that this problem is an example of the Steiner problem in graphs which is to connect a set of points by a minimal length network where new points can be added. There is no reported method of solving realistically sized Steiner problems in reasonable computing time. A heuristic method of approaching the phylogenetic problem is presented, together with a worked example with 7 mammalian cytochrome c sequences. It is shown in this case that the method develops a phylogenetic tree that has the smallest possible number of amino acid replacements. The potential and limitations of the method are discussed. It is stressed that objective methods must be used for comparing different trees. In particular it should be determined how close a given tree is to a mathematically determined lower bound. A theorem is proved which is used to establish a lower bound on the length of any tree and if a tree is found with a length equal to the lower bound, then no shorter tree can exist.

4.
Phylogenetic analysis using parsimony and likelihood methods   Total citations: 1, self-citations: 0, citations by others: 1
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.

5.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.
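The remark that MP does not use information from singular sites can be made concrete with a small sketch that classifies alignment columns. It uses the standard textbook rule that a site is parsimony-informative only if at least two character states each occur in at least two sequences; the function names and the toy alignment are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def classify_site(column):
    """Classify one alignment column for parsimony purposes."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Informative: at least two states, each present in two or more sequences.
    states_seen_twice = sum(1 for c in counts.values() if c >= 2)
    if states_seen_twice >= 2:
        return "informative"
    # Variable but uninformative; this class includes singleton sites.
    return "variable-uninformative"


def site_summary(alignment):
    """Count site classes over an alignment given as equal-length strings."""
    return Counter(classify_site(col) for col in zip(*alignment))


# Hypothetical four-sequence alignment, four sites long.
alignment = ["AAAA", "AAAC", "AACC", "ACCG"]
print(site_summary(alignment))
```

Distance methods still count the differences at variable but uninformative sites, whereas such sites add the same number of steps to every topology under parsimony and so cannot discriminate between trees.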

6.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

7.
Lake's evolutionary parsimony (EP) method of constructing a phylogenetic tree is primarily applied to four DNA sequences. In this method, three quantities (X, Y, and Z) that correspond to three possible unrooted trees are computed, and an invariance property of these quantities is used for choosing the best tree. However, Lake's method depends on a number of unrealistic assumptions. We therefore examined the theoretical basis of his method and reached the following conclusions: (1) When the rates of two transversional changes from a nucleotide are unequal, his invariance property breaks down. (2) Even if the rates of two transversional changes are equal, the invariance property requires some additional conditions. (3) When Kimura's two-parameter model of nucleotide substitution applies and the rate of nucleotide substitution varies greatly with branch, the EP method is generally better than the standard maximum-parsimony (MP) method in recovering the correct tree but is inferior to the neighbor-joining (NJ) and a few other distance matrix methods. (4) When the rate of nucleotide substitution is the same or nearly the same for all branches, the EP method is inferior to the MP method even if the proportion of transitional changes is high. (5) When Lake's assumptions fail, his χ² test may identify an erroneous tree as the correct tree. This happens because the test is not for comparing different trees. (6) As long as a proper distance measure is used, the NJ method is better than the EP and MP methods whether there is a transition/transversion bias or whether there is variation in substitution rate among different nucleotide sites.

8.
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point after which MP achieves a success probability of at least 0.9 is roughly around 128 characters. The skewness test is found to perform well on simulated data and the study extends its scope to up to twelve taxa.
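To give a sense of the scale of the exhaustive enumeration mentioned above, the following sketch evaluates the double factorial (2n − 5)!!, the number of unrooted binary trees on n labelled taxa. The function name is an illustrative choice.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the product is empty for n = 3).
    for k in range(3, 2 * n - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 8, 12):
    print(n, num_unrooted_binary_trees(n))
# For twelve taxa the count is 654,729,075.
```

The rapid growth of this count is why evaluating every candidate tree is feasible only for small numbers of taxa such as the twelve considered in the study.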

9.
Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP‐EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well‐supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree‐search routines and limitations of the MP‐EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP‐EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.

10.
The common intervals of two permutations on n elements are the subsets of terms contiguous in both permutations. They constitute the most basic representation of conserved local order. We use d, the size of the symmetric difference (the complement of the common intervals) of the two subsets of $2^{\{1,\ldots,n\}}$ thus determined by two permutations, as an evolutionary distance between the gene orders represented by the permutations. We consider the Steiner Tree problem in the space $(2^{\{1,\ldots,n\}}, d)$ as the basis for constructing phylogenetic trees, including ancestral gene orders. We extend this to genomes with unequal gene content and to genomes containing gene families. Applied to streptophyte phylogeny, our method does not support the positioning of the complex algae Charales as a sister group to the land plants. Simulations show that the method, though unmotivated by any specific model of genome rearrangement, accurately reconstructs a tree from artificial genome data generated by random inversions deriving each genome from its ancestor on this tree.
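One reading of the distance d described above can be sketched as follows: each permutation determines the family of element sets that are contiguous in it, and d is the size of the symmetric difference of the two families. This is a brute-force illustration under that interpretation, with hypothetical function names; it is not the authors' algorithm and does not cover unequal gene content or gene families.

```python
def interval_sets(perm):
    """All sets of elements that occupy contiguous positions in perm."""
    n = len(perm)
    return {frozenset(perm[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def common_interval_distance(p, q):
    """Size of the symmetric difference of the two interval families."""
    # Intervals shared by both permutations (the common intervals) cancel out.
    return len(interval_sets(p) ^ interval_sets(q))


print(common_interval_distance([1, 2, 3, 4], [1, 2, 3, 4]))  # identical orders
print(common_interval_distance([1, 2, 3, 4], [4, 3, 2, 1]))  # full reversal
print(common_interval_distance([1, 2, 3], [1, 3, 2]))
```

Under this reading a full reversal has distance zero, because a reversed gene order has exactly the same contiguous sets; the enumeration is quadratic in the number of intervals and is meant only for small examples.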

11.
This paper describes the inferential method, an approach for reconstructing protein and nucleotide sequences of ancestral species, starting from known, homologous, contemporary sequences. The method requires knowledge of the topology of the phylogenetic tree, whose nodes are the species to which the reconstructed sequences belong. The method has been tested by computer simulation of speciation and nucleotide substitutions, starting from a single ancestral sequence, and by subsequent reconstruction of nodal sequences. Results have shown that reconstructions obtained by the inferential method are affected by limited error frequencies, which (1) are proportional to the squares of nucleotide substitution rates and of internodal distances, and (2) are little influenced by non-uniformity of transformation rates of nucleotides. Furthermore, good agreement of the results has been obtained by comparing protein-sequence reconstructions carried out with the inferential method with those obtained using the maximum parsimony method in two different cases: e.g., a reconstruction of simulated sequences and a reconstruction of mammalian ribonuclease sequences.
Abbreviations used: MP, maximum parsimony method; ML, maximum likelihood method; IM, inferential method; MY, millions of years; N-tree, natural-like phylogenetic tree; E-tree, equibranched phylogenetic tree; EA, percentage number of erroneous amino acids in a reconstructed sequence; EC, percentage number of erroneous codons in a reconstructed sequence; t_n, time interval between a P- and its F-sequence. Nucleotides and amino acids are indicated by their I.U.B. codes (N.C.-I.U.B., 1985).
Correspondence to: A. Di Donato

12.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, the simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as the extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model of phylogenetic inference generally shows a better performance than the HKY model even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that at least in the present case, selection of a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often shows a better performance than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.

13.
The problem of determining an optimal phylogenetic tree from a set of data is an example of the Steiner problem in graphs. There is no efficient algorithm for solving this problem with reasonably large data sets. In the present paper an approach is described that proves in some cases that a given tree is optimal without testing all possible trees. The method first uses a previously described heuristic algorithm to find a tree of relatively small total length. The second part of the method independently analyses subsets of sites to determine a lower bound on the length of any tree. We simultaneously attempt to reduce the total length of the tree and increase the lower bound. When these are equal it is not possible to make a shorter tree with a given data set and given criterion. An example is given where the only two possible minimal trees are found for twelve different mammalian cytochrome c sequences. The criterion of finding the smallest number of minimum base changes was used. However, there is no general method of guaranteeing that a solution will be found in all cases and in particular better methods of improving the estimate of the lower bound need to be developed.
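The idea of certifying a tree by matching its length against a lower bound can be illustrated with the simplest bound of this kind: a site showing k distinct states needs at least k − 1 changes on any tree. This per-site bound is a deliberately weak stand-in chosen for illustration; the paper's bound comes from analysing subsets of sites and is not reproduced here. The function names and the toy alignment are assumptions.

```python
def simple_lower_bound(alignment):
    """Sum over sites of (number of distinct states - 1).

    Any tree needs at least this many changes, so it bounds the length of
    every tree from below; it is usually weaker than bounds that combine sites.
    """
    return sum(len(set(column)) - 1 for column in zip(*alignment))


def is_provably_minimal(tree_length, lower_bound):
    """A tree whose length equals a valid lower bound cannot be improved."""
    return tree_length == lower_bound


# Hypothetical alignment: three sequences, three sites.
alignment = ["AAC", "ACC", "AGT"]
bound = simple_lower_bound(alignment)
print(bound)
print(is_provably_minimal(3, bound))
```

When a heuristic tree's length exceeds the bound, nothing is proven either way, which matches the abstract's caveat that a solution is not guaranteed in all cases.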

14.
15.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 2, self-citations: 0, citations by others: 2
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

16.

Background  

Neighbor-Net is a novel method for phylogenetic analysis that is currently being widely used in areas such as virology, bacteriology, and plant evolution. Given an input distance matrix, Neighbor-Net produces a phylogenetic network, a generalization of an evolutionary or phylogenetic tree which allows the graphical representation of conflicting phylogenetic signals.

17.
The cladistic literature does not always specify the kind of multistate character treatment that is applied for an analysis. Characters can be treated either as unordered transformation series or as rooted [three‐item analysis (3ia)] or unrooted state trees (ordered characters). We aimed to measure the impact of these character treatments on phylogenetic inference. Discrete characters can be represented either as rows or columns in matrices (e.g. for parsimony) or as hierarchies for 3ia. In the present study, we use simulated and empirical examples to assess the relative merits of each method considering both the character treatment and representation. We measure two parameters (resolving power and artefactual resolution) using a new tree comparison metric, ITRI (inter‐tree retention index). Our results suggest that the hierarchical character representation not only results (with our simulation settings) in the greatest resolving power, but also in the highest artefactual resolution. Our empirical examples provide equivocal results. Parsimony unordered states yield less resolving power and more artefactual resolutions than parsimony ordered states, both with our simulated and empirical data. Relationships between three operational taxonomic units (OTUs), irrespective of their relationships with other OTUs, are called three‐item statements (3is). We compare the intersection tree (which reconstructs a single tree from all of the common 3is of source trees) with the traditional strict consensus and show that the intersection tree retains more of the information contained in the source trees. © 2013 The Linnean Society of London, Biological Journal of the Linnean Society, 2013, 110, 914–930.

18.
A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process.
Correspondence to: Z. Yang

19.
Accuracy of phylogenetic trees estimated from DNA sequence data   Total citations: 4, self-citations: 1, citations by others: 3
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
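The distinction between uncorrected (p) and corrected (d) distances can be sketched as follows. The abstract does not say which correction was applied; the Jukes-Cantor formula, d = −(3/4) ln(1 − 4p/3), is used here only as one common choice, and the function names and toy sequences are illustrative assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from a p distance."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
d = jukes_cantor_distance(p)
print(p, d)
```

The corrected value always exceeds p for p > 0 because it allows for multiple substitutions at the same site, and the gap widens as divergence grows.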

20.
In phylogenetic inference, an evolutionary model describes the substitution processes along each edge of a phylogenetic tree. Misspecification of the model has important implications for the analysis of phylogenetic data. Conventionally, however, the selection of a suitable evolutionary model is based on heuristics or relies on the choice of an approximate input tree. We introduce a method for model Selection in Phylogenetics based on linear INvariants (SPIn), which uses recent insights on linear invariants to characterize a model of nucleotide evolution for phylogenetic mixtures on any number of components. Linear invariants are constraints among the joint probabilities of the bases in the operational taxonomic units that hold irrespective of the tree topologies appearing in the mixtures. SPIn therefore requires no input tree and is designed to deal with nonhomogeneous phylogenetic data consisting of multiple sequence alignments showing different patterns of evolution, for example, concatenated genes, exons, and/or introns. Here, we report on the results of the proposed method evaluated on multiple sequence alignments simulated under a variety of single-tree and mixture settings for both continuous- and discrete-time models. In the simulations, SPIn successfully recovers the underlying evolutionary model and is shown to perform better than existing approaches.
