Similar articles
Found 20 similar articles (search time: 31 ms)
1.
2.
Summary We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree, since no tree can exist with a length less than the lower bound. The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 13,113,070,457,687,988,603,440,625 topologically distinct trees that can be constructed from this data set. The method described in this paper identifies 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It may not always be possible to prove a tree minimal, but the method will give an upper and a lower bound on the length of minimal trees.
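The partitioning idea can be illustrated with a small sketch. It is simplified to binary characters and is not the pairing criterion of the paper, which the abstract does not spell out. The assumptions are: each character alone needs at least (number of states - 1) changes on any tree, and a pair of binary characters that shows all four state combinations needs at least 3 changes rather than 2. The function names and the greedy matching are hypothetical choices made for brevity; any matching gives a valid bound, and a maximum matching would give the best bound of this kind.

```python
from itertools import combinations

def single_bound(column):
    """Fewest changes one character needs on any tree: distinct states minus one."""
    return len(set(column)) - 1

def incompatible(col_a, col_b):
    """True if two binary characters show all four state combinations."""
    both_binary = len(set(col_a)) == 2 and len(set(col_b)) == 2
    return both_binary and len(set(zip(col_a, col_b))) == 4

def partition_lower_bound(columns):
    """Lower bound on tree length from a partition into single characters and pairs.

    Greedy matching is an assumption of this sketch, not the paper's algorithm.
    """
    bound = sum(single_bound(c) for c in columns)
    used = set()
    for i, j in combinations(range(len(columns)), 2):
        if i in used or j in used:
            continue
        if incompatible(columns[i], columns[j]):
            used.update((i, j))
            bound += 1  # the pair needs at least 3 changes instead of 2
    return bound

# Each string is one character scored across four taxa.
print(partition_lower_bound(["0011", "0101", "0001"]))
```

If a tree of exactly this length is found, no shorter tree can exist for the data.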

3.
A branch and bound algorithm is described for searching rapidly for minimal length trees from biological data. The algorithm adds characters one at a time, rather than adding taxa, as in previous branch and bound methods. The algorithm has been programmed and is available from the authors. A worked example is given with 33 characters and 15 taxa. About 8 × 10^12 binary trees are possible with 15 taxa but the branch and bound program finds the minimal tree in <5 min on an IBM PC. Received on January 15, 1987; accepted on February 23, 1987
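The figure of about 8 × 10^12 agrees with the standard count of unrooted binary trees on n labelled taxa, (2n - 5)!!. The short sketch below computes that count; the function name is hypothetical, and the assumption that the abstract counts unrooted trees is inferred from the number itself.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

print(count_unrooted_binary_trees(15))  # 7905853580625, about 8 x 10^12
```

The rapid growth of this count is why exhaustive search is impractical and bounding methods are needed.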

4.
Majority-rule reduced consensus trees and their use in bootstrapping   Cited by: 3 (self-citations: 0; cited by others: 3)
Bootstrap analyses are usually summarized with majority-rule component consensus trees. This consensus method is based on replicated components and, like all component consensus methods, it is insensitive to other kinds of agreement between trees. Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesised phylogenetic relationships among multiple trees. The new methods are "strict" in the sense that they require agreement among all the trees being compared for any relationships to be represented in a consensus tree. Majority-rule reduced consensus methods are described and their use in bootstrap analyses is illustrated with a hypothetical and a real example. The new methods provide summaries of the bootstrap proportions of all n-taxon statements/partitions and facilitate the identification of hypotheses of relationships that are supported by high bootstrap proportions, in spite of a lack of support for particular components or clades. In practice majority-rule reduced consensus profiles may contain many trees. The size of the profile can be reduced by constraints on minimal bootstrap proportions and/or cardinality of the included trees. Majority-rule reduced consensus trees can also be selected a posteriori from the profile. Surrogates to the majority-rule reduced consensus methods using partition tables or tree pruning options provided by widely used phylogenetic inference software are also described. The methods are designed to produce more informative summaries of bootstrap analyses and thereby foster more informed assessment of the strengths and weaknesses of complex phylogenetic hypotheses.
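For orientation, the conventional majority-rule component consensus that the abstract takes as its starting point can be sketched as follows. This is not the reduced consensus method itself. Representing each bootstrap tree as a collection of clades (sets of taxon names) and the function name are assumptions made for illustration.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Return clades found in more than `threshold` of the trees, with their proportions.

    Each tree is a collection of clades; each clade is a set of taxon names.
    """
    counts = Counter()
    for tree in trees:
        # Count each clade at most once per tree.
        counts.update({frozenset(clade) for clade in tree})
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

bootstrap_trees = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A", "C"}, {"A", "B", "C"}],
]
for clade, proportion in majority_rule_clades(bootstrap_trees).items():
    print(sorted(clade), round(proportion, 2))
```

Only whole clades are counted here, which is the insensitivity to other kinds of agreement that the reduced methods are meant to address.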

5.
The problem of determining an optimal phylogenetic tree from a set of data is an example of the Steiner problem in graphs. There is no efficient algorithm for solving this problem with reasonably large data sets. In the present paper an approach is described that proves in some cases that a given tree is optimal without testing all possible trees. The method first uses a previously described heuristic algorithm to find a tree of relatively small total length. The second part of the method independently analyses subsets of sites to determine a lower bound on the length of any tree. We simultaneously attempt to reduce the total length of the tree and increase the lower bound. When these are equal it is not possible to make a shorter tree with a given data set and given criterion. An example is given where the only two possible minimal trees are found for twelve different mammalian cytochrome c sequences. The criterion of finding the smallest number of minimum base changes was used. However, there is no general method of guaranteeing that a solution will be found in all cases and in particular better methods of improving the estimate of the lower bound need to be developed.
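The upper bound in this scheme is simply the length of some concrete tree. A minimal sketch of how such a length can be computed uses Fitch's algorithm for unordered characters; this is a standard technique swapped in for illustration, not necessarily the procedure used in the paper. The nested-tuple tree encoding and the function names are hypothetical.

```python
def fitch_length(tree, states):
    """Return (state_set, changes) for one character on a rooted binary tree.

    `tree` is a nested 2-tuple of taxon names; `states` maps taxon -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_length(tree[0], states)
    right_set, right_cost = fitch_length(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment):
    """Sum Fitch counts over all sites; `alignment` maps taxon -> sequence string."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {t: seq[i] for t, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
print(tree_length(tree, alignment))
```

When this length equals a lower bound that holds for every tree, the tree is proven minimal.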

6.
Pompei S, Loreto V, Tria F. PLoS ONE, 2011, 6(6): e20109
Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information is typically lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.
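The abstract does not say which standard tree distances are generalized. One widely used candidate is the Robinson-Foulds distance, sketched below on the assumption that each tree is summarized by its set of clades. The generalizations introduced in the paper are not reproduced here, and the function name is hypothetical.

```python
def robinson_foulds(clades_a, clades_b):
    """Count the clades present in exactly one of the two trees."""
    a = {frozenset(c) for c in clades_a}
    b = {frozenset(c) for c in clades_b}
    return len(a ^ b)  # size of the symmetric difference

inferred = [{"A", "B"}, {"C", "D"}]
expert = [{"A", "B"}, {"A", "C"}]
print(robinson_foulds(inferred, expert))
```

A distance of zero means the inferred tree and the expert classification contain the same groupings.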

7.
The strengths and weaknesses of phylogenetic analysis using computers are reviewed from the viewpoint of understanding crustacean evolution. Computerized methods require the explicit presentation of characters and character state homologies. New techniques allow investigators to design evolutionary models into a character data matrix, or to use evolutionary models that make minimal a priori assumptions. The computer analysis relieves the investigator from the highly repetitious testing of trees, allows the concentration on the character state data, and provides objective methods for comparing trees, primarily their length. These are regarded as the strengths of computerized methods. The weaknesses of these methods include the relatively inscrutable nature of the character data matrix compared with the overall ‘gestalt’ of resulting trees, the difficulties of defining discrete homologies within the Crustacea, especially for counts of segmentation, the lack of clear intermediate character states in some multistate segmental characters, and the inability to define evolutionary polarity. These difficulties may be overcome by analysing the data using the minimal assumption models of character evolution, and by a recognition that the trees are a result of the input data, and therefore the data should be criticized, rather than the trees themselves. A ‘consensus’ character data set, including most extant major groups of the Crustacea as well as several key fossils, was assembled and revised by the participants in the workshop. An artificial taxon, ‘ur-crustacean characters’, was introduced to root the tree. Three observations may be made from parsimony analyses using several weighting and tree rooting methods. (1) The currently accepted large scale phylogeny and classification of the Crustacea is not corroborated. (2) The number of supposed plesiomorphic traits possessed by a taxon is not a good index for early derivation in crustacean evolution. 
(3) The taxon Maxillopoda is not supported by the arrangement of any of the trees.

8.
Summary In this paper we present an iterative character weighting method for the construction of phyletic trees. An initial tree is used to calculate the character weights, which are the number of mutations normalized so that the possible range is corrected for. The weights obtained are used to adjust the tree; this process is iterated until a stable tree is found. Using data generated according to a model tree, we show that the trees constructed by the iterative character weighting method converge to the true underlying tree. Using biological data, the trees become closer to the systematic classification of the species concerned, and patterns conflicting with the phylogenetic pattern can be singled out. The method involves a combination of minimal length methods and similarity methods, whereby the strict parsimony criterion is relaxed.

9.
Synecological analyses are usually based on typological, phenetic and cladistic methods. The disadvantages of these techniques are shown. The application of the Wagner parsimony method to synecology is considered. All the methods need some prerequisites, viz. definitions of localities and characters (the most simple one being the presence/absence of taxa); the choice of taxonomic level of taxa; their autochthony. The application of Wagner parsimony needs a new terminology. The congruence of any environmental condition, including freshwater monitoring indices, can be tested on parsimonious trees. The Wagner parsimony method not only provides various indices (tree length, CI, HI, RC, RI) which allow the comparison of trees but also minimal trees which are direct tools in synecology.
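The indices named above have standard definitions in the parsimony literature, which the sketch below assumes: CI = m/s, HI = 1 - CI, RI = (g - s)/(g - m) and RC = CI × RI, where m is the minimum possible number of steps, s the number of steps on the tree, and g the maximum possible number of steps. The function and argument names are hypothetical.

```python
def parsimony_indices(min_steps, tree_steps, max_steps):
    """Compute CI, HI, RI and RC; requires max_steps > min_steps and tree_steps > 0."""
    ci = min_steps / tree_steps
    ri = (max_steps - tree_steps) / (max_steps - min_steps)
    return {"CI": ci, "HI": 1 - ci, "RI": ri, "RC": ci * ri}

print(parsimony_indices(min_steps=10, tree_steps=20, max_steps=30))
```

All four indices lie between 0 and 1, which is what makes trees from different data sets comparable.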

10.
THE EFFECT OF ORDERED CHARACTERS ON PHYLOGENETIC RECONSTRUCTION   Cited by: 2 (self-citations: 0; cited by others: 2)
Abstract Morphological structures are likely to undergo more than a single change during the course of evolution. As a result, multistate characters are common in systematic studies and must be dealt with. Particularly interesting is the question of whether or not multistate characters should be treated as ordered (additive) or unordered (non-additive). In accepting a particular hypothesis of order, numerous others are necessarily rejected. We review some of the criteria often used to order character states and the underlying assumptions inherent in these criteria.
The effects that ordered multistate characters can have on phylogenetic reconstruction are examined using 27 data sets. It has been suggested that hypotheses of character state order are more informative than hypotheses of unorder and may restrict the number of equally parsimonious trees as well as increase tree resolution. Our results indicate that ordered characters can produce more, the same number of, or fewer equally parsimonious trees and can increase, decrease or have no effect on tree resolution. The effect on tree resolution can be a simple gain in resolution or a dramatic change in sister-taxa relationships. In cases where several outgroups are included in the data matrix, hypotheses of order can change character polarities by altering outgroup topology. Ordered characters result in a different topology from unordered characters only when the hierarchy of the cladogram disagrees with the investigator's a priori hypothesis of order. If the best criterion for assessing character evolution is congruence with other characters, the practice of ordering multistate characters is inappropriate.
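The difference between the two treatments comes down to how a change between two states is costed. The sketch below assumes integer-coded states, a linear order for the additive case, and a hypothetical function name.

```python
def step_cost(state_a, state_b, ordered):
    """Steps between two integer-coded states.

    Ordered (additive): a change must pass through intermediate states.
    Unordered (non-additive): any change between distinct states costs one step.
    """
    if ordered:
        return abs(state_a - state_b)
    return 0 if state_a == state_b else 1

print(step_cost(0, 2, ordered=True), step_cost(0, 2, ordered=False))
```

Because a 0-to-2 change costs two steps only under ordering, the same matrix can favor different topologies under the two codings.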

11.
The contribution of J. S. L. Gilmour to numerical taxonomy is reviewed. His important concept of natural classification, as being general-purpose classifications with high predictivity, led to the development of ideas of information content, unit characters and equal character-weighting. The concept of predictivity is extended to taxonomic trees (phenograms or cladograms). Under certain assumptions of random sampling of characters it is shown that the probability of recovering the correct tree topology or tree-form may be small if characters are few. There may be very many topologies or tree-forms, every one of which has individually a low probability. It is, however, possible to estimate the aggregate probability of trees which have more than some specified resemblance to the correct tree. The practical prospects of estimating the distribution of tree probabilities are discussed. Dedicated to the memory of John S. L. Gilmour.

12.
We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.

13.
Phylogenetic systematics is a relatively new formal technique that increases the precision with which one can make direct estimates of the history of phylogenetic descent. These estimates are made in the form of phylogenetic trees, or cladograms. Cladograms may be converted directly into classifications or they may be used to test various hypotheses about the evolutionary process. More than 20 phylogenetic analyses of helminth groups have been published already, and these have been used to investigate evolutionary questions in developmental biology, biogeography, speciation, coevolution, and evolutionary ecology.

14.
The construction and interpretation of gene trees is fundamental in molecular systematics. If the gene is defined in a historical (coalescent) sense, there can be multiple gene trees within the single contiguous set of nucleotides, and attempts to construct a single tree for such a sequence must deal with homoplasy created by conflict among divergent histories. On a larger scale, incongruence is expected among gene tree topologies at different loci of individuals within sexually reproducing species, and it has been suggested that this discordance can be used to delimit species. A practical concern for such topological methods is that polymorphisms may be maintained through numerous cladogenic events; this polymorphism problem is less of a concern for nontopological approaches to species delimitation using molecular data. Although a central theoretical concern in molecular systematics is discordance between a given gene tree and the true "species tree," the primary empirical problem faced in reconstructing taxic phylogeny is incongruence among the trees inferred from different sequences. Linkage relationships limit character independence and thus have important implications for handling multiple data sets in phylogenetic analysis, particularly at the species level, where incongruence among different historically associated loci is expected. Gene trees can also be reconstructed for loci that influence phenotypic characters, but there is at best a tenuous relationship between phenotypic homoplasy and homoplasy in such gene trees. Nevertheless, expression patterns and orthology relationships of genes involved in the expression of phenotypes can in theory provide criteria for homology assessment of morphological characters.

15.
Summary The problem of determining the minimal phylogenetic tree is discussed in relation to graph theory. It is shown that this problem is an example of the Steiner problem in graphs which is to connect a set of points by a minimal length network where new points can be added. There is no reported method of solving realistically-sized Steiner problems in reasonable computing time. A heuristic method of approaching the phylogenetic problem is presented, together with a worked example with 7 mammalian cytochrome c sequences. It is shown in this case that the method develops a phylogenetic tree that has the smallest possible number of amino acid replacements. The potential and limitations of the method are discussed. It is stressed that objective methods must be used for comparing different trees. In particular it should be determined how close a given tree is to a mathematically determined lower bound. A theorem is proved which is used to establish a lower bound on the length of any tree and if a tree is found with a length equal to the lower bound, then no shorter tree can exist.

16.
A portion of mitochondrial 12S rDNA sequences (337-355 base pairs) and 63 morphological characters of 36 hard-tick species belonging to 7 genera were analyzed to determine the phylogenetic relationships among groups and species of Rhipicephalus and between the genera Rhipicephalus and Boophilus. Molecular and morphological data sets were first examined separately. The molecular data were analyzed by maximum parsimony (MP), maximum likelihood, and neighbor-joining distance methods; the morphological data were analyzed by MP. After their level of congruence was evaluated by a partition homogeneity test, all characters were combined and analyzed by MP. The branches of the tree obtained by combining the data sets were better resolved than those of the trees inferred from the separate analyses. Boophilus is monophyletic and arose within Rhipicephalus. Boophilus species clustered with species of the Rhipicephalus evertsi group. Most of the clustering within Rhipicephalus was, however, consistent with previous classifications based on morphological data. Morphological characters were traced on the molecular reconstruction in order to identify characters diagnostic for monophyletic clades. Within the Rhipicephalus sanguineus complex, the sequences of specimens morphologically identified as Rhipicephalus turanicus were characterized by a high level of variability, indicating that R. turanicus-like morphology may cover a spectrum of distinct species.

17.
Parsimony methods infer phylogenetic trees by minimizing the number of character changes required to explain observed character states. From the perspective of applicability of parsimony methods, it is important to assess whether the characters used to infer phylogeny are likely to provide a correct tree. We introduce a graph theoretical characterization that helps to assess whether a given set of characters is appropriate to use with parsimony methods. Given a set of characters and a set of taxa, we construct a network called the character overlap graph. We show that the character overlap graph for characters that are appropriate to use in parsimony methods is characterized by significant under-representation of subnetworks known as holes, and provide a validation for this observation. This characterization explains success in constructing evolutionary trees using parsimony methods for some characters (e.g., protein domains) and lack of such success for other characters (e.g., introns). In the latter case, the understanding of obstacles to applying parsimony methods in a direct way has led us to a new approach for detecting inconsistent and/or noisy data. Namely, we introduce the concept of stable characters, which is similar to but less restrictive than the well-known concept of pairwise compatible characters. Application of this approach to introns produces the evolutionary tree consistent with the Coelomata hypothesis.
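The abstract does not define the character overlap graph. The sketch below assumes one plausible reading: characters are nodes, and two characters are joined when at least one taxon possesses both. The input format and function name are hypothetical, and the detection of holes (chordless cycles) is not attempted.

```python
from itertools import combinations

def character_overlap_graph(taxa_characters):
    """Return edges between characters that co-occur in at least one taxon.

    `taxa_characters` maps taxon name -> set of characters present in that taxon.
    Edges are returned as sorted (character, character) tuples.
    """
    edges = set()
    for present in taxa_characters.values():
        for a, b in combinations(sorted(present), 2):
            edges.add((a, b))
    return edges

taxa = {"t1": {"x", "y"}, "t2": {"y", "z"}, "t3": {"z"}}
print(sorted(character_overlap_graph(taxa)))
```

Under this reading, the graph depends only on presence/absence data, so it can be built before any tree is inferred.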

18.

Background

We analyze phylogenetic tree building methods from molecular sequences (PTMS). These are methods which base their construction solely on sequences, coding DNA or amino acids.

Results

Our first result is a statistically significant evaluation of 176 PTMSs done by comparing trees derived from 193138 orthologous groups of proteins using a new measure of quality between trees. This new measure, called the Intra measure, is very consistent between different groups of species and strong in the sense that it separates the methods with high confidence. The second result is the comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and their derived topologies as the most accepted biological consensus on phylogenies, which are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously.

Conclusions

The big surprise of the evaluation is that the maximum likelihood methods do not score well; minimal evolution distance methods over MSA-induced alignments score consistently better. This comparison also allows us to rank different components of the tree building methods, like MSAs, substitution matrices, ML tree builders, distance methods, etc. It is also clear that there is a difference between Metazoa and the rest, which points to evolution leaving different molecular traces. We also think that these measures of quality of trees will motivate the design of new PTMSs as it is now easier to evaluate them with certainty.

19.
The maximum-likelihood (ML) solution to a simple phylogenetic estimation problem is obtained analytically. The problem is estimation of the rooted tree for three species using binary characters with a symmetrical rate of substitution under the molecular clock. ML estimates of branch lengths and log-likelihood scores are obtained analytically for each of the three rooted binary trees. Estimation of the tree topology is equivalent to partitioning the sample space (space of possible data outcomes) into subspaces, within each of which one of the three binary trees is the ML tree. Distance-based least squares and parsimony-like methods produce essentially the same estimate of the tree topology, although differences exist among methods even under this simple model. This seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogeny estimation. The solution to this real phylogeny estimation problem will be useful for studying the problem of significance evaluation.

20.
Comprehensive phylogenetic trees are essential tools to better understand evolutionary processes. For many groups of organisms or projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa. For the tree of all life this task rises to a highly conservative 13 million. Here, we assessed the performances of methods to reconstruct large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices, containing between 141 and 567 taxa. For each data set, parameters of the HKY85+G model were estimated and used to simulate 20 new matrices for sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of the two methods increased very quickly with the addition of characters before reaching a plateau around 1000 nucleotides for any sizes of trees simulated. An increase in the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the increase in the complexity of tree space. Moreover, the distribution of branch lengths rather than the rate of evolution was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera and the efficiency of phylogenetic reconstructions was tested with simulated matrices containing an increasing number of nucleotides up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly.
