Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
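The growth of tree space mentioned in this abstract can be made concrete with the standard double-factorial count, (2n-5)!!, for unrooted, fully bifurcating labeled trees on n taxa. The sketch below is only an illustration and is not taken from the paper; the function name is hypothetical. Under this formula, 14 taxa give roughly 3.2 × 10^11 unrooted trees, and a figure above seven trillion is reached at 15 taxa (equivalently, for rooted trees on 14 taxa).

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating labeled trees: (2n - 5)!!.

    Illustrative helper (name is an assumption, not from the abstract).
    """
    if n_taxa < 3:
        return 1  # zero, one, or two taxa admit a single trivial tree
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (4, 10, 14, 15):
        print(n, num_unrooted_trees(n))
```

The count of rooted trees on n taxa equals the count of unrooted trees on n + 1 taxa, because rooting a tree amounts to attaching one extra leaf to any branch.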

2.
Supertree methods are used to assemble separate phylogenetic trees with shared taxa into larger trees (supertrees) in an effort to construct more comprehensive phylogenetic hypotheses. In spite of much recent interest in supertrees, there are still few methods for supertree construction. The flip supertree problem is an error correction approach that seeks to find a minimum number of changes (flips) to the matrix representation of the set of input trees to resolve their incompatibilities. A previous flip supertree algorithm was limited to finding exact solutions and was only feasible for small input trees. We developed a heuristic algorithm for the flip supertree problem suitable for much larger input trees. We used a series of 48- and 96-taxon simulations to compare supertrees constructed with the flip supertree heuristic algorithm with supertrees constructed using other approaches, including MinCut (MC), modified MC (MMC), and matrix representation with parsimony (MRP). Flip supertrees are generally far more accurate than supertrees constructed using MC or MMC algorithms and are at least as accurate as supertrees built with MRP. The flip supertree method is therefore a viable alternative to other supertree methods when the number of taxa is large.

3.
We investigated the usefulness of a parallel genetic algorithm for phylogenetic inference under the maximum-likelihood (ML) optimality criterion. Parallelization was accomplished by assigning each "individual" in the genetic algorithm "population" to a separate processor so that the number of processors used was equal to the size of the evolving population (plus one additional processor for the control of operations). The genetic algorithm incorporated branch-length and topological mutation, recombination, selection on the ML score, and (in some cases) migration and recombination among subpopulations. We tested this parallel genetic algorithm with large (228 taxa) data sets of both empirically observed DNA sequence data (for angiosperms) and simulated DNA sequence data. For both observed and simulated data, search-time improvement was nearly linear with respect to the number of processors, so the parallelization strategy appears to be highly effective at improving computation time for large phylogenetic problems using the genetic algorithm. We also explored various ways of optimizing and tuning the parameters of the genetic algorithm. Under the conditions of our analyses, we did not find the best-known solution using the genetic algorithm approach before terminating each run. We discuss some possible limitations of the current implementation of this genetic algorithm as well as avenues for its future improvement.

4.
Reconstructing the duplication history of tandemly repeated genes   Total citations: 4, self-citations: 0, citations by others: 4
We present a novel approach to deal with the problem of reconstructing the duplication history of tandemly repeated genes that are supposed to have arisen from unequal recombination. We first describe the mathematical model of evolution by tandem duplication and introduce duplication histories and duplication trees. We then provide a simple recursive algorithm which determines whether or not a given rooted phylogeny can be a duplication history and another algorithm that simulates the unequal recombination process and searches for the best duplication trees according to the maximum parsimony criterion. We use real data sets of human immunoglobulins and T-cell receptors to validate our methods and algorithms. Identity between most parsimonious duplication trees and most parsimonious phylogenies for the same data, combined with the agreement with additional knowledge about the sequences, such as the presence of polymorphisms, provides strong evidence that our reconstruction procedure gives good insights into the duplication histories of these loci.

5.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
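The duplication cost that GTP minimizes can be illustrated with the classical least-common-ancestor (LCA) mapping: a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. The sketch below is a minimal illustration under stated assumptions, not the software used in the study. It assumes rooted trees written as nested tuples, gene-tree leaves labeled by the species they were sampled from, and every such species present in the species tree; all function names are hypothetical.

```python
def clusters(tree):
    """Return every clade (as a frozenset of leaf names) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def lca_cluster(species_clusters, species_below):
    """Smallest species-tree clade that contains all of species_below (the LCA)."""
    return min((c for c in species_clusters if species_below <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes whose LCA mapping equals that of a child."""
    species_clusters = clusters(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return frozenset([node])
        child_species = [walk(child) for child in node]
        below = frozenset().union(*child_species)
        here = lca_cluster(species_clusters, below)
        if any(lca_cluster(species_clusters, s) == here for s in child_species):
            duplications += 1
        return below

    walk(gene_tree)
    return duplications


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(count_duplications((("A", "C"), "B"), species))
```

A GTP score for a candidate species tree would then be the sum of `count_duplications` over all gene trees; searching over species trees is the hard part and is not shown here.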

6.
Comprehensive phylogenetic trees are essential tools to better understand evolutionary processes. For many groups of organisms or projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa. For the tree of all life this task rises to a highly conservative 13 million. Here, we assessed the performance of methods to reconstruct large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices, containing between 141 and 567 taxa. For each data set, parameters of the HKY85+G model were estimated and used to simulate 20 new matrices for sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of the two methods increased very quickly with the addition of characters before reaching a plateau around 1000 nucleotides for all tree sizes simulated. An increase in the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the increase in the complexity of tree space. Moreover, the distribution of branch lengths rather than the rate of evolution was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera and the efficiency of phylogenetic reconstructions was tested with simulated matrices containing an increasing number of nucleotides up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly.
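The accuracy measure described here, the share of model-tree nodes that also appear in the inferred tree, can be sketched as a comparison of clades. This is a simplified illustration, not the authors' code: it assumes both trees are rooted, written as nested tuples over the same taxa, and it treats a node as recovered when its exact leaf set occurs in the inferred tree. An unrooted comparison would use bipartitions instead. Function names are hypothetical.

```python
def clades(tree):
    """Leaf sets of all internal nodes except the root, for a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root_leaves = walk(tree)
    found.discard(root_leaves)  # the root clade is trivially shared
    return found


def node_accuracy(model_tree, inferred_tree):
    """Proportion of model-tree clades that are present in the inferred tree."""
    model = clades(model_tree)
    if not model:
        return 1.0
    return len(model & clades(inferred_tree)) / len(model)


if __name__ == "__main__":
    model = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(node_accuracy(model, inferred))
```

In this toy case the inferred tree recovers the clades {A, B, C} and {D, E} but not {A, B}, so two of the three model nodes are counted as correct.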

7.
The complete mitochondrial cytochrome oxidase II gene was sequenced from 17 black flies, representing 13 putative species, and used to infer phylogenetic relationships. A midge (Paratanytarsus sp.) and three mosquitoes (Aedes aegypti, Anopheles quadrimaculatus, and Culex quinquefasciatus) were used as outgroup taxa. All outgroup taxa were highly divergent from black flies. Phylogenetic trees based on weighted parsimony (a priori and a posteriori), maximum likelihood, and neighbor-joining (log-determinant distances) differed topologically, with deeper nodes being the least well-supported. All analyses supported current classification into species groups but relationships among those groups were poorly resolved. The majority of phylogenetic signal came from closely related sister taxa. The CO-II gene may be useful for exploring relationships at or below the subgeneric level, but is of questionable value at higher taxonomic levels. The weighting method employed gave phylogenetic results similar to those reported by other authors for other insect CO-II data sets. A best estimate of phylogenetic relationships based on the CO-II gene is presented and discussed in relation to current black fly classification.

8.
Model‐based approaches (e.g. maximum likelihood, Bayesian inference) are widely used with molecular data, where they might be more appropriate than maximum parsimony for estimating phylogenies under various models of molecular evolution. Recently, there has been an increase in the application of model‐based approaches with morphological (mainly fossil) data; however, there is some doubt as to the effectiveness of the model of morphological evolution. The input parameters (prior probabilities) for the model are unclear, particularly when concerned with unobserved character states. Despite this, some systematists are suggesting superiority of these model‐based methods over maximum parsimony based on, for example, increased resolution or, in the current study, the preferred phylogenetic placement of an iconic taxon. Here, we revisit a recently published analysis implying such superiority and document the discrepancies between parsimony‐based and model‐based approaches to phylogeny estimation. We find that although some taxa are shifted back to their “traditional” phylogenetic placement, other clades are disturbed. The model‐based phylogenies are better resolved; however, due to the lack of an appropriate model of morphological evolution, the increase in resolving power is probably not meaningful. Similarly, some of the preferred phylogenetic positions of taxa, particularly of labile taxa such as Archaeopteryx, are based solely on analyses employing maximum parsimony as the optimality criterion. Poor resolution and labile taxa indicate a need for further examination of the morphology and not a change in method.

9.
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model, the no-common-mechanism model, has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We are able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.

10.
The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a four-taxon tree in the "Felsenstein zone," representing a difficult phylogenetic problem with an extreme situation of long branch attraction. Taxa were added sequentially to this tree in a manner specifically designed to break up the long branches, and for each tree data matrices of different sizes were simulated. Phylogenetic trees were reconstructed from these data using the criteria of parsimony and maximum likelihood. Phylogenetic accuracy was measured in three ways: (1) proportion of trees that are completely correct, (2) proportion of correctly reconstructed branches in all trees, and (3) proportion of trees in which the original four-taxon statement is correctly reconstructed. Accuracy improved dramatically with the addition of taxa and much more slowly with the addition of characters. If taxa can be added to break up long branches, it is far preferable to add taxa than characters.

11.
Maximum parsimony and maximum likelihood phylogenetic trees were constructed for 21 taxa of Lophozia s. str. and the related genera Schistochilopsis (5 species), Protolophozia elongata, and Obtusifolium obtusum, based on pooled nuclear ITS 1-2 and chloroplast trnL-F DNA sequences. The trees had similar topologies. The genus Lophozia s. str. was shown to be monophyletic once L. sudetica is excluded; this species warrants segregation into a distinct cryptic genus. The species distribution among the clades disagreed with the sections distinguished based on anatomical and morphological data. The relationships within the genus Schistochilopsis were consistent with the sectioning of the genus based on morphological characters. Analysis of molecular data provided a more precise definition of the systematic position of a number of taxa. Small genetic divergence of geographically distant forms was demonstrated.

12.
Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer confidence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the efficiency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wide-ranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 protein-coding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required approximately 80% less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and the paraphyly of "S. undulatus" with respect to S. belli, S. cautus, and S. woodi. Parallel evolution of ecomorphs within "S. undulatus" has masked the actual number of species within this group. This evidence, along with convincing patterns of phylogeographic differentiation, suggests "S. undulatus" represents at least four lineages that should be recognized as evolutionary species.

13.
A heuristic approach to search for the maximum-likelihood (ML) phylogenetic tree based on a genetic algorithm (GA) has been developed. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one on the basis of the likelihood criterion. These near-optimum trees are subjected to further statistical tests. This approach enables one to infer phylogenetic trees of over 20 taxa, taking account of the rate heterogeneity among sites, on practical time scales on a PC cluster. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based methods and distance-based methods, using amino acid sequence data for relatively large numbers (5–24) of taxa. The superiority of the ML method over distance-based methods increases as the condition of simulations becomes more realistic (an incorrect model is assumed or many taxa are involved). This approach was applied to the inference of the universal tree based on the concatenated amino acid sequences of vertically descendent genes that are shared among all genomes whose complete sequences have been reported. The inferred tree strongly supports that Archaea is paraphyletic and Eukarya is specifically related to Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA. Received: 4 January 2001 / Accepted: 16 May 2001

14.
The phylogenetic relationships of Acomys and Uranomys within Muridae were investigated using nuclear pancreatic ribonuclease A gene sequences. The various kinds of substitutions in the data matrix (15 taxa × 375 nucleotides) were examined for saturation, in order to apply a weighted parsimony approach. Phylogenies were derived by maximum parsimony (weighted and unweighted) and maximum likelihood procedures, using a dormouse (Gliridae) as outgroup. Maximum likelihood gave the most robust results. All analyses cluster some traditional taxa with strong support, such as three species of the genus Mus, two South-East Asian rats, and two genera in each of the gerbil and vole families. When analyzed with those of other murid rodents representing Murinae, Gerbillinae, Arvicolinae, Cricetinae, and Sigmodontinae, sequences of the ribonuclease gene suggest that Acomys and Uranomys constitute a monophyletic clade at the subfamily level, denoted "Acomyinae." The relationships between the six subfamilies of Muridae appear poorly resolved, except for a clade uniting Murinae, Acomyinae, and Gerbillinae. Within this clade, the sister group of Acomyinae could not be identified, as the branch length defining a Gerbillinae + Murinae cluster is extremely short. The poor resolution of our phylogenetic inferences is probably the result of two confounding factors, namely the limited size of the pancreatic ribonuclease sequence and the probable short time intervals during the radiation of the six murid subfamilies involved in this study.

15.
Despite the growing popularity of supertree construction for combining phylogenetic information to produce more inclusive phylogenies, large-scale performance testing of this method has not been done. Through simulation, we tested the accuracy of the most widely used supertree method, matrix representation with parsimony analysis (MRP), with respect to a (maximum parsimony) total evidence solution and a known model tree. When source trees overlap completely, MRP provided a reasonable approximation of the total evidence tree; agreement was usually > 85%. Performance improved slightly when using smaller, more numerous, or more congruent source trees, and especially when elements were weighted in proportion to the bootstrap frequencies of the nodes they represented on each source tree ("weighted MRP"). Although total evidence always estimated the model tree slightly better than nonweighted MRP methods, weighted MRP in turn usually out-performed total evidence slightly. When source studies were even moderately nonoverlapping (i.e., sharing only three-quarters of the taxa), the high proportion of missing data caused a loss in resolution that severely degraded the performance for all methods, including total evidence. In such cases, even combining more trees, which had positive effects elsewhere, did not improve accuracy. Instead, "seeding" the supertree or total evidence analyses with a single largely complete study improved performance substantially. This finding could be an important strategy for any studies that seek to combine phylogenetic information. Overall, our results suggest that MRP supertree construction provides a reasonable approximation of a total evidence solution and that weighted MRP should be used whenever possible.
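The matrix representation on which MRP rests can be sketched with the standard Baum-Ragan coding: each internal node of each source tree becomes one binary character, scored 1 for taxa inside that clade, 0 for other taxa in the same source tree, and ? for taxa the source tree does not contain. The code below is an illustrative sketch under those assumptions, with hypothetical function names and rooted source trees written as nested tuples; it builds only the unweighted matrix, and the subsequent parsimony search is not shown.

```python
def clades_postorder(tree):
    """Return (all leaves, internal non-root clades in postorder) of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    all_leaves = walk(tree)
    return all_leaves, found[:-1]  # the last clade appended is the root


def mrp_matrix(source_trees):
    """Baum-Ragan coding: one character column per internal node of each source tree."""
    parsed = [clades_postorder(tree) for tree in source_trees]
    taxa = sorted(set().union(*(leaves for leaves, _ in parsed)))
    rows = {taxon: [] for taxon in taxa}
    for present, internal_clades in parsed:
        for clade in internal_clades:
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon missing from this source tree
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


if __name__ == "__main__":
    trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

The weighted variant discussed in the abstract would attach a weight to each column, for example the bootstrap frequency of the node it encodes, before the parsimony analysis.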

16.
Parsimony methods infer phylogenetic trees by minimizing the number of character changes required to explain observed character states. From the perspective of applicability of parsimony methods, it is important to assess whether the characters used to infer phylogeny are likely to provide a correct tree. We introduce a graph theoretical characterization that helps to assess whether a given set of characters is appropriate to use with parsimony methods. Given a set of characters and a set of taxa, we construct a network called the character overlap graph. We show that the character overlap graph for characters that are appropriate to use in parsimony methods is characterized by significant under-representation of subnetworks known as holes, and provide a validation for this observation. This characterization explains success in constructing evolutionary trees using parsimony methods for some characters (e.g., protein domains) and lack of such success for other characters (e.g., introns). In the latter case, the understanding of obstacles to applying parsimony methods in a direct way has led us to a new approach for detecting inconsistent and/or noisy data. Namely, we introduce the concept of stable characters, which is similar to but less restrictive than the well known concept of pairwise compatible characters. Application of this approach to introns produces the evolutionary tree consistent with the Coelomata hypothesis.
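The quantity that parsimony methods minimize, the number of character changes on a tree, can be computed for a single character with the well-known Fitch algorithm. The sketch below is a generic illustration and is not the graph-theoretic method of this paper; it assumes a rooted binary tree written as nested tuples and a dictionary mapping each leaf to its observed state, and the function name is hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree."""
    changes = 0

    def walk(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}  # a leaf carries its observed state
        left, right = (walk(child) for child in node)
        common = left & right
        if common:
            return common  # children agree on at least one state: no change needed
        changes += 1  # disjoint state sets force one change below this node
        return left | right

    walk(tree)
    return changes


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), ("D", "E"))
    print(fitch_score(tree, {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1}))
```

The parsimony length of a tree for a whole data matrix is the sum of this score over all characters, and a parsimony search compares that sum across candidate trees.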

17.
Empirical data sets of Artiodactyla (Antilocapridae, Bovidae, Cervidae, Suidae), Carnivora (Mustelidae) and Rodentia (Sciuridae, Cricetidae, Arvicolidae, Muridae), obtained by horizontal starch gel electrophoresis of 15–34 isoenzyme systems, were used to calculate genetic distances and to construct phylogenetic trees by the following methods: Nei's D (corrected for small sample sizes) with UPGMA, FITCH, and KITSCH (from Felsenstein's PHYLIP package); Rogers' distance with the distance Wagner tree; the maximum likelihood approach (Cavalli-Sforza and Edwards); the maximum parsimony method (Wagner); and the Hennigian cladogram. The results were re-examined using the statistical methods of jackknife and bootstrap. The following problems became apparent and were studied in more detail: inconstancy of molecular evolutionary rate among taxa, non-uniformity of evolutionary rate among isoenzymes, possible convergence of alloenzymes, different evolutionary histories of taxa (radiations/bottlenecks), and methodological influences (sample sizes, rare alleles, comparability of data sets). The results show that many branches of the various phylogenetic trees are fairly constant. The ambiguous position of the remaining OTUs is due to insufficient evidence in the primary data rather than to the properties of cluster algorithms. However, since these problematic cases are also uncertain in phylogenies based on morphological characters and palaeontological results, even an increased data set may not lead to a clear decision unless additional taxa of crucial importance are examined. Molecular evolutionary rate among taxa seems to be accelerated in some cases, possibly due to random fixation of different alleles during bottlenecks, when a highly polymorphic ancestral form underwent a series of adaptive radiations. Isoenzymes can be divided into groups with different evolutionary rates. Thus, data sets are only comparable with respect to genetic variability and differentiation when they contain a similar number of representatives of each of these categories.
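Nei's D, one of the distances listed above, can be sketched from allele frequencies as D = -ln(I), where I = J_xy / sqrt(J_x · J_y) is the normalized genetic identity summed over loci. The code below implements only the uncorrected standard distance; the small-sample correction mentioned in the abstract is deliberately omitted, and the function name, input layout, and frequencies are hypothetical illustrations rather than data from the study.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance (no small-sample correction).

    Each population is a list with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    j_x = j_y = j_xy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        j_x += sum(p * p for p in freq_x.values())
        j_y += sum(q * q for q in freq_y.values())
        j_xy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())
    identity = j_xy / math.sqrt(j_x * j_y)
    if identity == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(identity)


if __name__ == "__main__":
    # Made-up frequencies for a single locus, for illustration only.
    population_1 = [{"a": 1.0}]
    population_2 = [{"a": 0.5, "b": 0.5}]
    print(nei_standard_distance(population_1, population_2))
```

A matrix of such pairwise distances is the input that clustering methods such as UPGMA operate on.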

18.
Phylogenetic trees based on gene content   Total citations: 2, self-citations: 0, citations by others: 2
Comparing gene content between species can be a useful approach for reconstructing phylogenetic trees. In this paper, we derive a maximum-likelihood estimation of evolutionary distance between species under a simple model of gene genesis and gene loss. Using simulated data on a biological tree with 107 taxa (and on a number of randomly generated trees), we compare the accuracy of tree reconstruction using this ML distance measure to an earlier ad hoc distance. We then compare these distance-based approaches to a character-based tree reconstruction method (Dollo parsimony) which seems well suited to the analysis of gene content data. To simplify simulations, we give a formal proof of the well-known 'fact' that the Dollo parsimony score is independent of the choice of root. Our results show a consistent trend, with the character-based method and ML distance measure outperforming the earlier ad hoc distance method. AVAILABILITY: http://www.ab.informatik.uni-tuebingen.de/software/genecontent/welcome_en.html

19.
The assumption that maximum parsimony in a phylogenetic tree is achieved by the reduction of the number of overall homoplasies to a minimum is questioned, and an alternative approach of using the most parsimonious distribution of taxa at each dichotomy is suggested. This approach can be put into practice using Hennigian logic to determine relationships, DNA sequence data as the character set, and standard statistical techniques to determine the significance to be placed on the resulting phylogenetic tree. The logic and robustness of such an approach are described. A software package, SYNAPO, is available to assist such analyses.

20.
Clitellata (earthworms, leeches, and allies) is a clade of segmented annelid worms that comprises more than 5000 species found worldwide in many aquatic and terrestrial habitats. According to current views, the first clitellates were either aquatic (marine or freshwater) or terrestrial. To address this question further, we assessed the phylogenetic relationships among clitellates using parsimony, maximum likelihood and Bayesian analyses of 175 annelid 18S ribosomal DNA sequences. We then defined two ecological characters (Habitat and Aquatic‐environment preferences) and mapped those characters on the trees from the three analyses, using parsimony character‐state reconstruction (i.e. Fitch optimization). We accommodated phylogenetic uncertainty in the character mapping by reconstructing character evolution on all the trees resulting from parsimony and maximum likelihood bootstrap analyses and, in the Bayesian inference, on the trees sampled using the Markov chain Monte Carlo algorithm. Our analyses revealed that an ‘aquatic’ ancestral state for clitellates is a robust result. By using alterations of coding characters and constrained analyses, we also demonstrated that the hypothesis for a terrestrial origin of clitellates is not supported. Our analyses also suggest that the most recent ancestor of clitellates originated from a freshwater environment. However, we stress the importance of adding sequences of some rare marine taxa to more rigorously assess the freshwater origin of Clitellata. © 2008 The Linnean Society of London, Biological Journal of the Linnean Society, 2008, 95, 447–464.
