Similar literature
 20 similar documents found (search time: 62 ms)
1.
A recent article published in Cladistics is critical of a number of heuristic methods for phylogenetic inference based on parsimony scores. One of my papers is among those criticized, and I would appreciate the opportunity to make a public response. The specific criticism is that I have re‐invented an algorithm for economizing parsimony calculations on trees that differ by a subtree pruning and regrafting (SPR) rearrangement. This criticism is justified, and I apologize for incorrectly claiming originality for my presentation of this algorithm. However, I would like to clarify the intent of my paper, if I can do so without detracting from the sincerity of my apology. My paper is not about that algorithm, nor even primarily about parsimony. Rather, it is about a novel strategy for Markov chain Monte Carlo (MCMC) sampling in a state space consisting of trees. The sampler involves drawing from conditional distributions over sets of trees: a Gibbs‐like strategy that had not previously been used to sample tree‐space. I would like to see this technique incorporated into MCMC samplers for phylogenetics, as it may have advantages over commonly used Metropolis‐like strategies. I have recently used it to sample phylogenies of a biological invasion, and I am finding many applications for it in agent‐based Bayesian ecological modelling. It is thus my contention that my 2005 paper retains substantial value.  相似文献   

2.
Two commonly used heuristic approaches to the generalized tree alignment problem are compared in the context of phylogenetic analysis of DNA sequence data. These approaches, multiple sequence alignment + phylogenetic tree reconstruction (MSA+TR) and direct optimization (DO), are alternative heuristic procedures used to approach the nested NP‐Hard optimizations presented by the phylogenetic analysis of unaligned sequences under maximum parsimony. Multiple MSA+TR implementations and DO were compared in terms of optimality score (phylogenetic tree cost) over multiple empirical and simulated datasets with differing levels of heuristic intensity. In all cases examined, DO outperformed MSA+TR with average improvement in parsimony score of 14.78% (5.64–52.59%).

3.
A new parsimony analysis of 27 complete mitochondrial genomic sequences is conducted to investigate the phylogenetic relationships of plethodontid salamanders. This analysis focuses on the amount of character conflict between phylogenetic trees recovered from newly conducted parsimony searches and the Bayesian and maximum likelihood topology reported by Mueller et al. (2004; PNAS, 101, 13820–13825). Strong support for Hemidactylium as the sister taxon to all other plethodontids is recovered from parsimony analyses. Plotting area relationships on the most parsimonious phylogenetic tree suggests that eastern North America is the origin of the family Plethodontidae, supporting the “Out of Appalachia” hypothesis. A new taxonomy that recognizes clades recovered from phylogenetic analyses is proposed. © The Willi Hennig Society 2005.

4.
We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.  相似文献   

5.
We have examined the molecular-phylogenetic relationships between nonmulberry and mulberry silkworm species that belong to the families Saturniidae, Bombycidae and Lasiocampidae using 16S ribosomal RNA (16S rRNA) and cytochrome oxidase subunit I (coxI) gene sequences. Aligned nucleotide sequences of 16S rRNA and coxI from 14 silk-producing species were used for construction of phylogenetic trees by maximum likelihood and maximum parsimony methods. The tree topology on the basis of 16S rRNA supports monophyly for members of Saturniidae and Bombycidae. Weighted parsimony analysis weighted towards transversions relative to transitions (ts, tv4) for coxI resulted in more robust bootstrap support over unweighted parsimony and favours the 16S rRNA tree topology. Combined analysis reflected a clear biogeographic pattern and agrees with morphological and cytological data.

6.
The small parsimony problem is studied for reconstructing recombination networks from sequence data. The small parsimony problem is polynomial-time solvable for phylogenetic trees. However, the problem is proved NP-hard even for galled recombination networks. A dynamic programming algorithm is also developed to solve the small parsimony problem. It takes O(dn2^(3h)) time on an input recombination network over length-d sequences in which there are h recombination nodes and n - h tree nodes.

7.
Summary: The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is some theoretical problem in the comparison of ML values conditional for each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.

8.
The maximum likelihood (ML) method of phylogenetic tree construction is not as widely used as other tree construction methods (e.g., parsimony, neighbor-joining) because of the prohibitive amount of time required to find the ML tree when the number of sequences under consideration is large. To overcome this difficulty, we propose a stochastic search strategy for estimation of the ML tree that is based on a simulated annealing algorithm. The algorithm works by moving through tree space by way of a "local rearrangement" strategy so that topologies that improve the likelihood are always accepted, whereas those that decrease the likelihood are accepted with a probability that is related to the proportionate decrease in likelihood. Besides greatly reducing the time required to estimate the ML tree, the stochastic search strategy is less likely to become trapped in local optima than are existing algorithms for ML tree estimation. We demonstrate the success of the modified simulated annealing algorithm by comparing it with two existing algorithms (Swofford's PAUP* and Felsenstein's DNAMLK) for several theoretical and real data examples.  相似文献   
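The acceptance step described in this abstract can be illustrated with a minimal sketch. It assumes the standard simulated-annealing (Metropolis-style) rule applied to log-likelihoods; the abstract does not give the authors' exact formula, and the names `acceptance_probability`, `accept_move`, and `temperature` are hypothetical.

```python
import math
import random


def acceptance_probability(current_loglik: float,
                           proposed_loglik: float,
                           temperature: float) -> float:
    """Probability of accepting a proposed topology.

    Assumption: the usual annealing rule exp(delta / T) on log-likelihoods,
    not necessarily the rule used in the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    delta = proposed_loglik - current_loglik
    if delta >= 0:
        # Moves that improve (or keep) the likelihood are always accepted.
        return 1.0
    # Worse moves are accepted with a probability that shrinks as the
    # likelihood drop grows or as the temperature falls.
    return math.exp(delta / temperature)


def accept_move(current_loglik: float,
                proposed_loglik: float,
                temperature: float,
                rng: random.Random) -> bool:
    """Draw a uniform number and compare it with the acceptance probability."""
    prob = acceptance_probability(current_loglik, proposed_loglik, temperature)
    return rng.random() < prob
```

Lowering `temperature` over the course of the search makes the sampler progressively less willing to accept worse topologies, which is what lets it leave local optima early on and settle later.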

9.
Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speeds. The heuristics function by allowing the efficient exploration of large numbers of trees through novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and its performance is compared to other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs and the likelihoods of the tree estimates are compared. Trees estimated using Leaphy are found to have equal to or better likelihoods than trees estimated using other phylogenetic programs in 4004 (98.6%) families and provide a unique best tree that no other program found in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.  相似文献   

10.
The Channichthyidae is a lineage of 16 species in the Notothenioidei, a clade of fishes that dominate Antarctic near-shore marine ecosystems with respect to both diversity and biomass. Among four published studies investigating channichthyid phylogeny, no two have produced the same tree topology, and no published study has investigated the degree of phylogenetic incongruence between existing molecular and morphological datasets. In this investigation we present an analysis of channichthyid phylogeny using complete gene sequences from two mitochondrial genes (ND2 and 16S) sampled from all recognized species in the clade. In addition, we have scored all 58 unique morphological characters used in three previous analyses of channichthyid phylogenetic relationships. Data partitions were analyzed separately to assess the amount of phylogenetic resolution provided by each dataset, and phylogenetic incongruence among data partitions was investigated using incongruence length difference (ILD) tests. We utilized a parsimony-based version of the Shimodaira-Hasegawa test to determine if alternative tree topologies are significantly different from trees resulting from maximum parsimony analysis of the combined partition dataset. Our results demonstrate that the greatest phylogenetic resolution is achieved when all molecular and morphological data partitions are combined into a single maximum parsimony analysis. Also, marginal to insignificant incongruence was detected among data partitions using the ILD. Maximum parsimony analysis of all data partitions combined results in a single tree, and is a unique hypothesis of phylogenetic relationships in the Channichthyidae. In particular, this hypothesis resolves the phylogenetic relationships of at least two species (Channichthys rhinoceratus and Chaenocephalus aceratus), for which there was no consensus among the previous phylogenetic hypotheses. 
The combined data partition dataset provides substantial statistical power to discriminate among alternative hypotheses of channichthyid relationships. These findings suggest the optimal strategy for investigating the phylogenetic relationships of channichthyids is one that uses all available phylogenetic data in analyses of combined data partitions.  相似文献   

11.
We report that for population data, where sequences are very similar to one another, it is often possible to use a two-pronged (MinMax Squeeze) approach to prove that a tree is the shortest possible under the parsimony criterion. Such population data can be in a range where parsimony is a maximum likelihood estimator. This is in sharp contrast to the case with species data, where sequences are much further apart and the problem of guaranteeing an optimal phylogenetic tree is known to be computationally prohibitive for realistic numbers of species, irrespective of whether likelihood or parsimony is the optimality criterion. The Squeeze uses both an upper bound (the length of the shortest tree known) and a lower bound derived from partitions of the columns (the length of the shortest tree possible). If the two bounds meet, the shortest known tree is thus proven to be a shortest possible tree. The implementation is first tested on simulated data sets and then applied to 53 complete human mitochondrial genomes. The shortest possible trees for those data have several significant improvements from the published tree. Namely, a pair of Australian lineages comes deeper in the tree (in agreement with archaeological data), and the non-African part of the tree shows greater agreement with the geographical distribution of lineages.  相似文献   
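The squeeze between an upper and a lower bound can be sketched as follows. This is a simplified illustration: the lower bound used here is the elementary per-column bound (distinct states minus one), which is weaker than the partition-based bound the abstract refers to. The function names are hypothetical, and gaps and ambiguity codes are treated as ordinary states.

```python
from typing import List


def column_lower_bound(alignment: List[str]) -> int:
    """Lower bound on tree length under parsimony.

    A column showing k distinct states needs at least k - 1 changes on
    any tree, so the sum over columns bounds the length of every tree.
    """
    return sum(len(set(column)) - 1 for column in zip(*alignment))


def is_proven_shortest(alignment: List[str], best_known_length: int) -> bool:
    """Return True when the upper and lower bounds meet.

    best_known_length is the upper bound: the length of the shortest
    tree found so far by any search.
    """
    lower = column_lower_bound(alignment)
    if best_known_length < lower:
        raise ValueError("a tree cannot be shorter than the lower bound")
    return best_known_length == lower
```

For example, the alignment `["AAC", "AAT", "GAT"]` has column bounds 1, 0, and 1, so any tree of length 2 for these sequences is a shortest possible tree.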

12.
In this paper we investigate mathematical questions concerning the reliability (reconstruction accuracy) of Fitch's maximum parsimony algorithm for reconstructing the ancestral state given a phylogenetic tree and a character. In particular, we consider the question whether the maximum parsimony method applied to a subset of taxa can reconstruct the ancestral state of the root more accurately than when applied to all taxa, and we give an example showing that this indeed is possible. A surprising feature of our example is that ignoring a taxon closer to the root improves the reliability of the method. On the other hand, in the case of the two-state symmetric substitution model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that under a molecular clock the probability that the state at a single taxon is a correct guess of the ancestral state is a lower bound on the reconstruction accuracy of Fitch's method applied to all taxa.  相似文献   
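For readers unfamiliar with the method under discussion, the bottom-up pass of Fitch's algorithm for a single character on a rooted binary tree can be sketched as below. The tree encoding (nested tuples with leaf names as strings) and the function name `fitch` are assumptions made for illustration, not taken from the paper.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf name or a pair (left subtree, right subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (set of most parsimonious root states, minimum number of changes)."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return frozenset([states[tree]]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1
```

When the returned root set contains more than one state, the ancestral state is ambiguous under parsimony; the reconstruction accuracy studied in the abstract concerns how often the true root state is recovered from such sets.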

13.
Characters derived from advertisement calls, morphology, allozymes, and the sequences of the small subunit of the mitochondrial ribosomal gene (12S) and the cytochrome oxidase I (COI) mitochondrial gene were used to estimate the phylogeny of frogs of the Physalaemus pustulosus group (Leptodactylidae). The combinability of these data partitions was assessed in several ways: measures of phylogenetic signal, character support for trees, congruence of tree topologies, compatibility of data partitions with suboptimal trees, and homogeneity of data partitions. Combined parsimony analysis of all data equally weighted yielded the same tree as the 12S partition analyzed under parsimony and maximum likelihood. The COI, allozyme, and morphology partitions were generally congruent and compatible with the tree derived from combined data. The call data were significantly different from all other partitions, whether considered in terms of tree topology alone, partition homogeneity, or compatibility of data with trees derived from other partitions. The lack of effect of the call data on the topology of the combined tree is probably due to the small number of call characters. The general incongruence of the call data with other data partitions is consistent with the idea that the advertisement calls of this group of frogs are under strong sexual selection.  相似文献   

14.
Gai YH, Song DX, Sun HY, Zhou KY. Zoological Science, 2006, 23(12): 1101-1108
Myriapods occupy a pivotal position in the arthropod phylogenetic tree. The monophyly of Myriapoda and its internal relationships have been difficult to resolve. This study combined nearly complete 28S and 18S ribosomal RNA gene sequences (3,826 nt in total) to estimate the phylogenetic position of Myriapoda and phylogenetic relationships among four myriapod classes. Our data set consists of six new myriapod sequences and homologous sequences for 18 additional species available in GenBank. Among the six new myriapod sequences, those of the one pauropod and two symphylans are very important additions because they were such difficult taxa to classify in past molecular-phylogenetic studies. Phylogenetic trees were constructed with maximum parsimony, maximum likelihood, and Bayesian analyses. All methods yielded moderate to strong support for the monophyly of Myriapoda. Symphyla grouped strongly with Pauropoda under all analytical conditions. The KH test rejected the traditional view of Dignatha and Progoneata, and the topology obtained here, though not significantly supported, was Diplopoda versus ((Symphyla + Pauropoda) + Chilopoda).

15.
Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well established, but highly unresolved phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.  相似文献   

16.
Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.  相似文献   

17.
Clitellata (earthworms, leeches, and allies) is a clade of segmented annelid worms that comprises more than 5000 species found worldwide in many aquatic and terrestrial habitats. According to current views, the first clitellates were either aquatic (marine or freshwater) or terrestrial. To address this question further, we assessed the phylogenetic relationships among clitellates using parsimony, maximum likelihood and Bayesian analyses of 175 annelid 18S ribosomal DNA sequences. We then defined two ecological characters (Habitat and Aquatic‐environment preferences) and mapped those characters on the trees from the three analyses, using parsimony character‐state reconstruction (i.e. Fitch optimization). We accommodated phylogenetic uncertainty in the character mapping by reconstructing character evolution on all the trees resulting from parsimony and maximum likelihood bootstrap analyses and, in the Bayesian inference, on the trees sampled using the Markov chain Monte Carlo algorithm. Our analyses revealed that an ‘aquatic’ ancestral state for clitellates is a robust result. By using alterations of coding characters and constrained analyses, we also demonstrated that the hypothesis for a terrestrial origin of clitellates is not supported. Our analyses also suggest that the most recent ancestor of clitellates originated from a freshwater environment. However, we stress the importance of adding sequences of some rare marine taxa to more rigorously assess the freshwater origin of Clitellata. © 2008 The Linnean Society of London, Biological Journal of the Linnean Society, 2008, 95, 447–464.

18.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.

19.
The codon-degeneracy model (CDM) predicts relative frequencies of substitution for any set of homologous protein-coding DNA sequences based on patterns of nucleotide degeneracy, codon composition, and the assumption of selective neutrality. However, at present, the CDM is reliant on outside estimates of transition bias. A new method by which the power of the CDM can be used to find a synonymous transition bias that is optimal for any given phylogenetic tree topology is presented. An example is illustrated that utilizes optimized transition biases to generate CDM GF-scores for every possible phylogenetic tree for pocket gophers of the genus Orthogeomys. The resulting distribution of CDM GF-scores is compared and contrasted with the results of maximum parsimony and maximum likelihood methods. Although convergence on a single tree topology by the CDM and another method indicates greater support for that particular tree, the value of CDM GF-score as the sole optimality criterion for phylogeny reconstruction remains to be determined. It is clear, however, that the a priori estimation of an optimum transition bias from codon composition has a direct application to differentiating between alternative trees. Received: 13 October 1999 / Accepted: 28 April 2000  相似文献   

20.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.  相似文献   

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号