Similar Articles
20 similar articles found (search time: 31 ms)
1.
Nye TM. Systematic Biology, 2008, 57(5): 785-794
Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree." A meta-tree clusters phylogenies with similar topologies together in the same way that a phylogeny clusters species with similar DNA sequences. Leaf nodes on a meta-tree correspond to the original set of phylogenies given by some analysis, whereas interior nodes correspond to certain consensus topologies. The construction of meta-trees is motivated by analogy with construction of a most parsimonious tree for DNA data, but instead of using DNA letters, in a meta-tree the characters are partitions or splits of the set of taxa. An efficient algorithm for meta-tree construction is described that makes use of a known relationship between the majority consensus and parsimony in terms of gain and loss of splits. To illustrate these ideas meta-trees are constructed for two datasets: a set of gene trees for species of yeast and trees from a bootstrap analysis of a set of gene trees in ray-finned fish. A software tool for constructing meta-trees and comparing alternative phylogenies is available online, and the source code can be obtained from the author.
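
Because a meta-tree treats splits as characters, the first step is to turn each input phylogeny into its set of splits. The Python sketch below assumes an illustrative encoding that does not come from the article: each tree is a binary tree written as nested tuples of taxon names, and each split is stored as the side that excludes an arbitrary reference taxon. It builds the tree-by-split presence/absence matrix and the majority-rule consensus splits (those present in more than half of the trees). It does not reproduce the meta-tree construction algorithm itself.

```python
from collections import Counter


def leaves(node):
    """Leaf set of a tree written as nested 2-tuples of taxon names."""
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])


def splits(tree):
    """Non-trivial splits of a tree, each stored as the side without the reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)  # arbitrary but fixed reference taxon
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = walk(node[0]) | walk(node[1])
        side = taxa - cluster if ref in cluster else cluster
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial (single-leaf) splits
            found.add(side)
        return cluster

    walk(tree)
    return found


def split_matrix(trees):
    """Rows are trees, columns are splits (1 = split present): the meta-tree 'characters'."""
    per_tree = [splits(t) for t in trees]
    columns = sorted(set().union(*per_tree), key=sorted)
    return columns, [[int(s in tree_splits) for s in columns] for tree_splits in per_tree]


def majority_splits(trees):
    """Splits found in more than half of the trees (the majority-rule consensus)."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, n in counts.items() if n > len(trees) / 2}


trees = [  # hypothetical alternative phylogenies for five taxa
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "B"), ("D", ("C", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
]
print(majority_splits(trees))
```

With these three trees, the splits {D, E} and {C, D, E} each occur in two of the three trees and so form the majority-rule consensus.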

2.
The cladistic literature does not always specify the kind of multistate character treatment that is applied in an analysis. Characters can be treated either as unordered transformation series or as rooted [three‐item analysis (3ia)] or unrooted state trees (ordered characters). We aimed to measure the impact of these character treatments on phylogenetic inference. Discrete characters can be represented either as rows or columns in matrices (e.g. for parsimony) or as hierarchies for 3ia. In the present study, we use simulated and empirical examples to assess the relative merits of each method, considering both the character treatment and its representation. We measure two parameters (resolving power and artefactual resolution) using a new tree comparison metric, ITRI (inter‐tree retention index). Our results suggest that, with our simulation settings, the hierarchical character representation yields not only the greatest resolving power but also the highest artefactual resolution. Our empirical examples provide equivocal results. Parsimony with unordered states yields less resolving power and more artefactual resolution than parsimony with ordered states, with both our simulated and our empirical data. Relationships between three operational taxonomic units (OTUs), irrespective of their relationships with other OTUs, are called three‐item statements (3is). We compare the intersection tree (which reconstructs a single tree from all of the common 3is of source trees) with the traditional strict consensus and show that the intersection tree retains more of the information contained in the source trees. © 2013 The Linnean Society of London, Biological Journal of the Linnean Society, 2013, 110, 914–930.
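
To make the notion of a three-item statement concrete, the Python sketch below lists the 3is implied by a rooted tree: ab|c holds when some clade contains a and b but not c. It assumes rooted trees encoded as nested tuples of taxon names, which is an illustrative choice. Intersecting the 3is of several source trees gives the statements common to all of them, the input from which an intersection tree is built. Reconstructing the tree from that set is not shown.

```python
from itertools import combinations


def clades(tree):
    """Leaf sets of every internal node of a rooted tree given as nested 2-tuples."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = walk(node[0]) | walk(node[1])
        found.append(clade)
        return clade

    walk(tree)
    return found


def three_item_statements(tree):
    """Set of (pair, outgroup) tuples, one for each resolved triplet ab|c in the tree."""
    cl = clades(tree)
    taxa = max(cl, key=len)  # the root clade contains every taxon
    statements = set()
    for triple in combinations(sorted(taxa), 3):
        for out in triple:
            pair = frozenset(triple) - {out}
            if any(pair <= c and out not in c for c in cl):
                statements.add((pair, out))
    return statements


t1 = ((("A", "B"), "C"), "D")  # hypothetical source trees
t2 = ((("A", "C"), "B"), "D")
common = three_item_statements(t1) & three_item_statements(t2)
print(sorted((sorted(pair), out) for pair, out in common))
```

The two source trees disagree on how A, B and C are resolved, so only the three statements that place D outside the other taxa are shared.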

3.
Entomopathogenic nematodes of the genus Steinernema are lethal parasites of insects that are used as biological control agents of several lepidopteran, dipteran and coleopteran pests. Phylogenetic relationships among 25 Steinernema species were estimated using nucleotide sequences from three genes and 22 morphological characters. Parsimony analysis of 28S (LSU) sequences yielded a well-resolved phylogenetic hypothesis with reliable bootstrap support for 13 clades. Parsimony analysis of mitochondrial DNA sequences (12S rDNA and cox1 genes) yielded phylogenetic trees with a lower consistency index than the LSU trees and with fewer reliably supported clades. Combined phylogenetic analysis of the three-gene dataset by parsimony and Bayesian methods yielded well-resolved and highly similar trees. Bayesian posterior probabilities were high for most clades; bootstrap (parsimony) support was reliable for approximately half of the internal nodes. Parsimony analysis of the morphological dataset yielded a poorly resolved tree, whereas total evidence analysis (molecular plus morphological data) yielded a phylogenetic hypothesis consistent with, but less resolved than, the trees inferred from the combined molecular data. Parsimony mapping of morphological characters onto the three-gene trees showed that most structural features of steinernematids are highly homoplastic. The distribution of nematode foraging strategies on these trees predicts that S. hermaphroditum, S. diaprepesi and S. longicaudum (US isolate) are cruise foragers.
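
The consistency index mentioned above compares the minimum conceivable number of changes for each character with the number actually required on a tree. The Python sketch below computes the ensemble CI. It assumes unordered characters scored with Fitch's algorithm and trees encoded as nested tuples; these are illustrative choices rather than details from the study, and the toy matrix is made up.

```python
def fitch_changes(tree, column):
    """Minimum number of state changes for one unordered character (Fitch's algorithm)."""
    def walk(node):
        if isinstance(node, str):
            return {column[node]}, 0
        (s1, c1), (s2, c2) = walk(node[0]), walk(node[1])
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)
    return walk(tree)[1]


def consistency_index(tree, matrix):
    """Ensemble CI: sum of minimum possible changes / sum of changes on the tree."""
    nchar = len(next(iter(matrix.values())))
    minimum = observed = 0
    for i in range(nchar):
        column = {taxon: seq[i] for taxon, seq in matrix.items()}
        minimum += len(set(column.values())) - 1  # every extra state needs >= 1 change
        observed += fitch_changes(tree, column)
    if observed == 0:
        raise ValueError("CI is undefined when no character changes on the tree")
    return minimum / observed


# Hypothetical toy matrix: character 1 groups A+B, character 2 groups D+E.
matrix = {"A": "10", "B": "10", "C": "00", "D": "01", "E": "01"}
print(consistency_index((("A", "B"), ("C", ("D", "E"))), matrix))  # 1.0
print(consistency_index((("A", "D"), ("B", ("C", "E"))), matrix))  # 0.5
```

A tree that conflicts with the character groupings needs extra (homoplastic) changes, which lowers the index.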

5.
The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis (total citations: 26; self-citations: 2; citations by others: 26)
The Parsimony Ratchet [1] is presented as a new method for the analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps:
(1) Generate a starting tree (e.g., a “Wagner” tree, followed or not by some level of branch swapping).
(2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character).
(3) Perform branch swapping (e.g., “branch-breaking” or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees.
(4) Reset all character weights to the “original” weights (typically, equal weights).
(5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3), holding one (or a few) trees.
(6) Return to step 2.
Steps 2–6 constitute one iteration; typically, 50–200 or more iterations are performed. The number of characters sampled for reweighting in step 2 is set by the user; I have found that sampling between 5 and 25% of the characters gives good results in most cases. The performance of the ratchet on large data sets is outstanding, and the results of analyses of the 500-taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa, demonstrating the same extraordinary power, will be presented elsewhere (Soltis et al., in preparation). With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro.
These analyses indicate efficiency increases of 20×–80× over “traditional methods” such as randomly varying taxon order and holding few trees, followed by more complete analyses of the best trees found, and speed-ups of thousands of times over nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the “true” consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.
[1] This method, the Parsimony Ratchet, was originally presented at the Numerical Cladistics Symposium at the American Museum of Natural History, New York, in May 1998 (see Horovitz, 1999) and at the Meeting of the Willi Hennig Society (Hennig XVII) in September 1998 in São Paulo, Brazil.
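
The six ratchet steps can be sketched in Python as a toy, self-contained illustration. This is not the DADA, Winclada or NONA implementation, and several parts are assumptions. Trees are binary and encoded as nested tuples. Length is Fitch parsimony with integer character weights. Simple nearest-neighbour-interchange (NNI) hill climbing stands in for the TBR or branch-breaking swapping described above. The caller supplies the starting tree instead of building it by Wagner addition. The function names and the default 15% reweighting fraction are illustrative.

```python
import random


def fitch_length(tree, matrix, weights):
    """Weighted Fitch length of a binary tree given as nested 2-tuples of taxon names."""
    def walk(node, i):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (s1, c1), (s2, c2) = walk(node[0], i), walk(node[1], i)
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)
    return sum(w * walk(tree, i)[1] for i, w in enumerate(weights))


def nni_neighbors(node, is_root=True):
    """Trees one nearest-neighbour interchange away (the root is treated as unrooted)."""
    if isinstance(node, str):
        return []
    a, b = node
    out = []
    if is_root:
        # The two root edges form a single unrooted edge: swap across it.
        if not isinstance(a, str) and not isinstance(b, str):
            (a1, a2), (b1, b2) = a, b
            out += [((a1, b1), (a2, b2)), ((a1, b2), (a2, b1))]
    else:
        # Swap the sibling with either child of an internal child.
        for child, sibling in ((a, b), (b, a)):
            if not isinstance(child, str):
                x, y = child
                out += [((x, sibling), y), ((y, sibling), x)]
    out += [(n, b) for n in nni_neighbors(a, False)]
    out += [(a, n) for n in nni_neighbors(b, False)]
    return out


def hill_climb(tree, matrix, weights):
    """Accept the first strictly shorter NNI neighbour until none exists (keeps one tree)."""
    best, best_len = tree, fitch_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(best):
            length = fitch_length(candidate, matrix, weights)
            if length < best_len:
                best, best_len, improved = candidate, length, True
                break
    return best, best_len


def parsimony_ratchet(start, matrix, iterations=50, fraction=0.15, seed=0):
    """Toy ratchet: alternate searches under perturbed and original weights."""
    rng = random.Random(seed)
    nchar = len(next(iter(matrix.values())))
    original = [1] * nchar                                     # equal weights
    current, _ = hill_climb(start, matrix, original)            # step 1: start + swapping
    best, best_len = current, fitch_length(current, matrix, original)
    k = max(1, round(fraction * nchar))
    for _ in range(iterations):                                # steps 2-6
        boosted = original[:]
        for i in rng.sample(range(nchar), k):                  # step 2: upweight a subset
            boosted[i] += 1
        current, _ = hill_climb(current, matrix, boosted)       # step 3
        current, length = hill_climb(current, matrix, original)  # steps 4-5
        if length < best_len:
            best, best_len = current, length
    return best, best_len


matrix = {"A": "1001", "B": "1000", "C": "0010", "D": "0110", "E": "0110"}  # toy data
start = (("A", "D"), ("B", ("C", "E")))
print(parsimony_ratchet(start, matrix))
```

Because this toy search keeps a single tree and accepts only strict improvements, it can stall on plateaus that TBR-based implementations holding a few trees are better able to cross.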

6.
A numerical cladistic analysis of the conodont family Palmatolepidae has been undertaken to determine the applicability of the technique to group-wide systematic revision. Results suggest a new hypothesis of relationships that is considerably more parsimonious than trees compatible with existing hypotheses of relationships, or than trees that are even loosely constrained stratigraphically. This may occur because the fossil record is incomplete, because taxon sampling for the cladistic analysis is low, or because the most parsimonious trees approximate the true tree less well than stratigraphically constrained trees do (or because of a combination of these factors). Although more taxa and more characters would be preferable in choosing between these possibilities, the tree derived solely from morphological data is adopted. Thus, stratigraphic data can be used to test hypotheses of relationships and construct phylogenies, and hypotheses of relationships can be used to test the completeness of the conodont fossil record. Existing schemes of classification within the Palmatolepidae are rejected because most groups within them are either polyphyletic or paraphyletic, and a new scheme is presented. Character changes suggest correlated, progressive and mosaic evolution within the Palmatolepidae. Parsimony analysis of partitioned datasets indicates that more phylogenetic information can be recovered from S element positions than from P or M element positions, although data from all three positional groups are preferable to data from just one. Thus, multielement taxonomy is essential to the resolution of conodont interrelationships.

7.
In this paper, we investigate a conjecture by Arndt von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published in 2007 by the Newton Institute in Cambridge in a list of open phylogenetic problems. The conjecture concerns whether Maximum Parsimony trees are hereditary: it suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes that are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on 5 taxa, but also show how to construct examples for which Maximum Parsimony trees are not hereditary. Apart from showing that a most parsimonious tree cannot generally be reduced to a most parsimonious tree on fewer taxa, we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models.
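
On very small examples the hereditary property can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not the authors' method. It enumerates every unrooted binary tree by stepwise addition and scores each tree with equal-weight Fitch parsimony. It then tests whether pruning a most parsimonious tree down to each subset of taxa of a given size leaves a tree that is most parsimonious for the corresponding subalignment. The toy alignment is made up and, being free of homoplasy, is trivially hereditary; the article's counterexamples are not reproduced. Exhaustive enumeration is feasible only for a handful of taxa.

```python
from itertools import combinations


def fitch_length(tree, matrix):
    """Equal-weight Fitch length of a binary tree given as nested 2-tuples."""
    def walk(node, i):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (s1, c1), (s2, c2) = walk(node[0], i), walk(node[1], i)
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)
    nchar = len(next(iter(matrix.values())))
    return sum(walk(tree, i)[1] for i in range(nchar))


def insert_everywhere(node, taxon):
    """Every way of attaching `taxon` to an edge of `node` (including above its root)."""
    yield (node, taxon)
    if not isinstance(node, str):
        left, right = node
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def unrooted_trees(taxa):
    """All unrooted binary trees on `taxa`, each written with taxa[0] beside the root."""
    trees = [taxa[1]]
    for taxon in taxa[2:]:
        trees = [t for tree in trees for t in insert_everywhere(tree, taxon)]
    return [(taxa[0], tree) for tree in trees]


def prune(node, keep):
    """Restrict a tree to the taxa in `keep`, suppressing nodes left with one child."""
    if isinstance(node, str):
        return node if node in keep else None
    left, right = prune(node[0], keep), prune(node[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def mp_length(taxa, matrix):
    """Length of the most parsimonious tree(s), found by exhaustive search."""
    return min(fitch_length(t, matrix) for t in unrooted_trees(taxa))


def is_hereditary(tree, taxa, matrix, size):
    """True if `tree` pruned to every `size`-subset is MP for that subalignment."""
    for subset in combinations(taxa, size):
        sub = {t: matrix[t] for t in subset}
        if fitch_length(prune(tree, set(subset)), sub) != mp_length(list(subset), sub):
            return False
    return True


matrix = {"A": "10", "B": "10", "C": "00", "D": "01", "E": "01"}  # toy data
taxa = sorted(matrix)
best = min(unrooted_trees(taxa), key=lambda t: fitch_length(t, matrix))
print(len(unrooted_trees(taxa)), fitch_length(best, matrix))
print(all(is_hereditary(best, taxa, matrix, k) for k in (3, 4)))
```

Swapping in a homoplastic alignment lets the same functions search small cases for trees that fail the check.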

8.
Using a simple example and simulations, we explore the impact of input tree shape upon a broad range of supertree methods. We find that input tree shape can affect how conflict is resolved by several supertree methods and that input tree shape effects may be substantial. Standard and irreversible matrix representation with parsimony (MRP), MinFlip, duplication-only Gene Tree Parsimony (GTP), and an implementation of the average consensus method have a tendency to resolve conflict in favor of relationships in unbalanced trees. Purvis MRP and the average dendrogram method appear to have an opposite tendency. Biases with respect to tree shape are correlated with objective functions that are based upon unusual asymmetric tree-to-tree distance or fit measures. Split, quartet, and triplet fit, most similar supertree, and MinCut methods (provided the latter are interpreted as Adams consensus-like supertrees) are not revealed to have any bias with respect to tree shape by our example, but whether this holds more generally is an open problem. Future development and evaluation of supertree methods should consider explicitly the undesirable biases and other properties that we highlight. In the meantime, use of a single, arbitrarily chosen supertree method is discouraged. Use of multiple methods and/or weighting schemes may allow practical assessment of the extent to which inferences from real data depend upon methodological biases with respect to input tree shape or size.
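
All the MRP variants named above start by recoding the input trees as a binary matrix. The Python sketch below shows only that step. It assumes rooted input trees encoded as nested tuples and uses the standard (Baum and Ragan) coding: each non-root clade of each input tree becomes one binary character, scored 1 for taxa inside the clade, 0 for the other taxa in that tree, and ? for taxa absent from it. The irreversible and Purvis codings, the all-zero outgroup sometimes added for rooted trees, and the subsequent parsimony search are not shown.

```python
def clades_without_root(tree):
    """Leaf set of the whole tree and of every internal node except the root."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = walk(node[0]) | walk(node[1])
        found.append(clade)
        return clade

    root = walk(tree)
    return root, found[:-1]  # post-order: the root clade is appended last


def mrp_matrix(trees):
    """Standard MRP coding: one binary character per non-root clade of each input tree."""
    coded = [clades_without_root(t) for t in trees]
    taxa = sorted(set().union(*(present for present, _ in coded)))
    rows = {taxon: [] for taxon in taxa}
    for present, clades in coded:
        for clade in clades:
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in present:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")  # taxon missing from this input tree
    return {taxon: "".join(states) for taxon, states in rows.items()}


source_trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]  # hypothetical inputs
for taxon, row in mrp_matrix(source_trees).items():
    print(taxon, row)
```

The resulting matrix is then analysed with an ordinary parsimony program, and the most parsimonious trees serve as the supertree estimates.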

9.
Although long-branch attraction, the incorrect grouping of long lineages in a phylogeny because of systematic error, has been identified as a potential source of error in phylogenetic analysis for almost two decades, no empirical examples of the phenomenon exist. Here, I outline several criteria for identifying long-branch attraction and apply these criteria to 18S ribosomal DNA (rDNA) sequence data for 13 insects. Parsimony and minimum evolution with p distances group the two longest branches together (those leading to Strepsiptera and Diptera). Simulation studies show that the long branches are long enough to attract. When a tree is assumed in which Strepsiptera and Diptera are separated and many data sets are simulated for that tree (using the parameter estimates for that tree for the original data), parsimony analysis of the simulated data consistently groups Strepsiptera and Diptera. Analyses of the 18S rDNA sequences using methods that are less sensitive to the problem of long-branch attraction estimate trees in which the long branches are separate.

10.
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs of SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the history of the SERA gene family, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationships among the P. falciparum SERAs, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion, the implied mutational pattern at two key active sites in the SERA proteins, were also found to be useful supplements to standard "bootstrap" analysis for inferred topologies.
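
The abstract does not spell out how reconciliation cost is computed. The Python sketch below assumes a common approach. Each gene-tree node is mapped to the lowest common ancestor (LCA) of its species in the species tree. A node counts as a duplication when it maps to the same species-tree node as one of its children. Implied losses are counted from depth differences along each gene-tree edge. The gene tree/species tree pair with the lowest duplication-plus-loss cost would then be preferred. Trees are nested tuples, and all gene and species names are hypothetical.

```python
def species_clades(stree):
    """Leaf sets of all nodes (leaves included) of a rooted species tree."""
    found = []

    def walk(node):
        clade = frozenset([node]) if isinstance(node, str) else walk(node[0]) | walk(node[1])
        found.append(clade)
        return clade

    walk(stree)
    return found


def reconcile(gtree, stree, species_of):
    """Return (duplications, losses) under LCA mapping of gtree into stree."""
    clades = species_clades(stree)

    def lca(species):
        return min((c for c in clades if species <= c), key=len)

    def depth(clade):
        return sum(1 for c in clades if clade < c)  # number of proper ancestors

    counts = {"dup": 0, "loss": 0}

    def walk(node):
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        m1, m2 = walk(node[0]), walk(node[1])
        m = lca(m1 | m2)
        duplication = m in (m1, m2)
        counts["dup"] += duplication
        for child in (m1, m2):
            # Species-tree nodes skipped along this gene-tree edge imply losses.
            counts["loss"] += depth(child) - depth(m) - (0 if duplication else 1)
        return m

    walk(gtree)
    return counts["dup"], counts["loss"]


species_tree = (("human", "mouse"), "chicken")
species_of = {"h1": "human", "h2": "human", "m1": "mouse", "c1": "chicken"}
print(reconcile((("h1", "m1"), "c1"), species_tree, species_of))          # (0, 0)
print(reconcile((("h1", "m1"), ("h2", "c1")), species_tree, species_of))  # (1, 2)
```

In the second gene tree, one duplication at the root is followed by the loss of one copy in chicken and of the other copy in mouse.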

11.
Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1000 genes) comparable to those of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent "fourth domain" of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.

13.
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. More generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees, and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem in which a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense: for all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not due to the use of ML as an optimization criterion for choosing the best tree/alignment pair, but rather to the particular divide-and-conquer realignment techniques employed.

14.
The construction and interpretation of gene trees is fundamental in molecular systematics. If the gene is defined in a historical (coalescent) sense, there can be multiple gene trees within a single contiguous stretch of nucleotides, and attempts to construct a single tree for such a sequence must deal with homoplasy created by conflict among divergent histories. On a larger scale, incongruence is expected among gene tree topologies at different loci of individuals within sexually reproducing species, and it has been suggested that this discordance can be used to delimit species. A practical concern for such topological methods is that polymorphisms may be maintained through numerous cladogenic events; this polymorphism problem is less of a concern for nontopological approaches to species delimitation using molecular data. Although a central theoretical concern in molecular systematics is discordance between a given gene tree and the true "species tree," the primary empirical problem faced in reconstructing taxic phylogeny is incongruence among the trees inferred from different sequences. Linkage relationships limit character independence and thus have important implications for handling multiple data sets in phylogenetic analysis, particularly at the species level, where incongruence among different historically associated loci is expected. Gene trees can also be reconstructed for loci that influence phenotypic characters, but there is at best a tenuous relationship between phenotypic homoplasy and homoplasy in such gene trees. Nevertheless, expression patterns and orthology relationships of genes involved in the expression of phenotypes can, in theory, provide criteria for homology assessment of morphological characters.

15.
We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole-genome data, using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes comprising 832 proteins in total. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.
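
The core idea can be sketched with NumPy. This is a minimal illustration, not the authors' pipeline. It assumes peptide 2-mers as the "character strings" (the string length used in the study is not stated here) and builds a k-mer-by-protein frequency matrix. Each protein is projected onto the leading singular vectors, and proteins are compared by the cosine of the angle between their vectors. Summing a species' protein vectors gives a species vector, as described above. The sequences are made up.

```python
import numpy as np


def kmer_matrix(seqs, k=2):
    """Rows: k-mers; columns: sequences; entries: occurrence counts."""
    kmers = sorted({s[i:i + k] for s in seqs.values() for i in range(len(s) - k + 1)})
    row = {kmer: r for r, kmer in enumerate(kmers)}
    names = list(seqs)
    counts = np.zeros((len(kmers), len(names)))
    for col, name in enumerate(names):
        s = seqs[name]
        for i in range(len(s) - k + 1):
            counts[row[s[i:i + k]], col] += 1
    return names, counts


def svd_vectors(counts, rank):
    """Coordinates of each sequence on the first `rank` right singular vectors."""
    _, sing, vt = np.linalg.svd(counts, full_matrices=False)
    return (np.diag(sing[:rank]) @ vt[:rank]).T  # one row per sequence


def cosine(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


seqs = {"p1": "MKVLAAGMKV", "p2": "MKVLAAGMKV", "p3": "WWPQRSTYWW"}  # toy sequences
names, counts = kmer_matrix(seqs)
vec = svd_vectors(counts, rank=2)
print(cosine(vec[0], vec[1]), cosine(vec[0], vec[2]))
species_vector = vec[0] + vec[2]  # e.g. if p1 and p3 came from the same genome
```

Identical proteins point in the same direction (cosine close to 1), while proteins sharing no 2-mers are orthogonal (cosine close to 0). Truncating to fewer axes than the matrix rank trades exact similarities for a smaller set of motif-like axes.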

16.
Phylogenetic studies based on different types and treatments of data provide substantially conflicting hypotheses of relationships among seed plants. We conducted phylogenetic analyses of sequences of two highly conserved chloroplast genes, psaA and psbB, for a comprehensive taxonomic sample of seed plants and land plants. Parsimony analyses of two different codon position partitions resulted in well-supported, but significantly conflicting, phylogenetic trees. First and second codon positions place angiosperms and gymnosperms as sister clades and Gnetales as sister to Pinaceae. Third positions place Gnetales as sister to all other seed plants. Maximum likelihood trees for the two partitions are also in conflict. Relationships among the main seed plant clades according to first and second positions are similar to those found in parsimony analysis of the same data, but the third-position maximum likelihood tree is substantially different from the corresponding parsimony tree, although it agrees partially with the first- and second-position trees in placing Gnetales as the sister group of Pinaceae. Our results document high rate heterogeneity among lineages, which, together with the greater average rate of substitution at third positions, may reduce phylogenetic signal due to long-branch attraction in parsimony reconstructions. Although relationships among the major seed plant clades remain unresolved, this study provides increased support for relationships within the major seed plant clades.

17.
Partitioned Bremer support (PBS) is a valuable means of assessing congruence in combined data sets, but some aspects require clarification. When more than one equally parsimonious tree is found during the constrained search for trees lacking the node of interest, averaging PBS for each data set across these trees can conceal conflict, and PBS should ideally be examined for each constrained tree. Similarly, when multiple most parsimonious trees (MPTs) are generated during analysis of the combined data, PBS is usually calculated on the consensus tree. However, extra information can be obtained if PBS is calculated on each of the MPTs or even suboptimal trees.
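
A small worked example shows how averaging can hide conflict. All numbers below are hypothetical, not from the article. There are two data partitions, one most parsimonious tree (MPT), and two equally parsimonious constrained trees that lack the node of interest. The PBS for a partition is its length on the constrained tree minus its length on the MPT, so the PBS values for one constrained tree sum to that node's Bremer support.

```python
def partitioned_bremer(mpt, constrained):
    """PBS per partition: partition length on constrained tree minus length on the MPT."""
    return {part: constrained[part] - mpt[part] for part in mpt}


# Hypothetical tree lengths per data partition.
mpt = {"morphology": 100, "sequence": 200}
constrained_trees = [
    {"morphology": 103, "sequence": 199},
    {"morphology": 99, "sequence": 203},
]

per_tree = [partitioned_bremer(mpt, c) for c in constrained_trees]
averaged = {part: sum(p[part] for p in per_tree) / len(per_tree) for part in mpt}
print(per_tree)   # [{'morphology': 3, 'sequence': -1}, {'morphology': -1, 'sequence': 3}]
print(averaged)   # {'morphology': 1.0, 'sequence': 1.0}
```

Each constrained tree shows one partition in conflict with the node (a negative PBS), yet both averaged values are positive and suggest no conflict at all. The Bremer support is 2 in every case.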

18.
The relationship between species is usually represented as a bifurcating tree with the branching points representing speciation events. The ancestry of genes taken from these species can also be represented as a tree, with the branching points representing ancestral genes. The time back to the branching points, and even the branching order, can be different between the two trees. This possibility is widely recognized, but the discrepancies are often thought to be small. A different picture is emerging from new empirical evidence, particularly that based on multiple loci or on surveys with a wide geographical scope. The discrepancies must be taken into account when estimating the timing of speciation events, especially the more recent branches. On the positive side, the different timings at different loci provide information about the ancestral populations.
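
A standard coalescent result, not derived in the article, shows when the discrepancy in branching order is large. Take three species with species tree ((A,B),C) and let t be the length of the internal branch in coalescent units (generations divided by 2N for a diploid ancestral population of size N). The lineages from A and B fail to coalesce within that branch with probability e^{-t}. In that case all three lineages enter the ancestral population, where each of the three topologies is equally likely, so

```latex
P(\text{gene tree} = \text{species tree})
  = \left(1 - e^{-t}\right) + \tfrac{1}{3}e^{-t}
  = 1 - \tfrac{2}{3}e^{-t}.
```

For example, an internal branch of t = 0.5 gives 1 - (2/3)e^{-0.5} ≈ 0.60, so roughly 40% of loci would be expected to show a discordant topology.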

19.
To tree or not to tree (total citations: 2; self-citations: 1; citations by others: 1)
The practice of tracking geographical divergence along a phylogenetic tree has added an evolutionary perspective to biogeographic analysis within single species. In spite of the popularity of phylogeography, there is an emerging problem. Recurrent mutation and recombination both create homoplasy, multiple evolutionary occurrences of the same character that are identical in state but not identical by descent. Homoplasic molecular data are phylogenetically ambiguous. Converting homoplasic molecular data into a tree represents an extrapolation, and there can be myriad candidate trees among which to choose. Derivative biogeographic analyses of 'the tree' are analyses of that extrapolation, and the results depend on the tree chosen. I explore the informational aspects of converting a multicharacter data set into a phylogenetic tree, and then explore what happens when that tree is used for population analysis. Three conclusions follow: (i) some trees are better than others; good trees are true to the data, whereas bad trees are not; (ii) for biogeographic analysis, we should use only good trees, which yield the same biogeographic inference as the phenetic data, but little more; and (iii) the reliable biogeographic inference is inherent in the phenetic data, not the trees.

20.
Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not necessarily correspond to the species phylogeny. One commonly accepted strategy for coping with this problem is to sequence many genes; how to analyze the resulting collection of genes is more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences among the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method, which can circumvent this problem by generating a supernetwork rather than a supertree. Here we present an alternative way of generating supernetworks, called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa to gene trees. The resulting trees are subsequently used to generate consensus networks, which generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation with the matrix representation with parsimony (MRP) supertree method and with Z-closure, and demonstrate that it provides a useful complementary tool.
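
The consensus-network step can be sketched in Python. This is an illustration under assumptions, not the Q-imputation method. It assumes every gene tree already covers the full taxon set, which is what the quartet-based imputation is meant to achieve. Trees are encoded as nested tuples, and the sketch keeps every split that occurs in more than a chosen proportion of the trees. A threshold of 0.5 gives the majority-rule consensus splits, which are always pairwise compatible. Lower thresholds can admit incompatible splits, which is why the result must be drawn as a network rather than a tree.

```python
from collections import Counter
from itertools import combinations


def splits(tree):
    """Non-trivial splits of a nested-tuple tree, stored as the side without min(taxa)."""
    def walk(node):
        if isinstance(node, str):
            return frozenset([node]), []
        (left, s1), (right, s2) = walk(node[0]), walk(node[1])
        return left | right, s1 + s2 + [left | right]

    taxa, clusters = walk(tree)
    ref = min(taxa)
    sides = {taxa - c if ref in c else c for c in clusters}
    return {s for s in sides if 2 <= len(s) <= len(taxa) - 2}


def consensus_splits(trees, threshold):
    """Splits present in more than `threshold` (a proportion) of the trees."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, n in counts.items() if n / len(trees) > threshold}


def compatible(s1, s2, taxa):
    """Two splits are compatible if one of the four side intersections is empty."""
    return any(not (a & b) for a in (s1, taxa - s1) for b in (s2, taxa - s2))


gene_trees = [  # hypothetical gene trees on a common taxon set
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "B"), ("D", ("C", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
]
taxa = frozenset("ABCDE")
for threshold in (0.5, 0.3):
    kept = consensus_splits(gene_trees, threshold)
    conflicts = [p for p in combinations(kept, 2) if not compatible(*p, taxa)]
    print(threshold, len(kept), "splits,", len(conflicts), "incompatible pairs")
```

At 0.5 the two retained splits form an ordinary tree. At 0.3 all four observed splits are kept, three pairs of which conflict, so a network is needed to display them.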
