20 similar documents found (search time: 0 ms)
1.
The phylogenetic tree (PT) problem has been studied by a number of researchers as an application of the Steiner tree problem, a well-known network optimisation problem. Of all the methods developed for phylogenies, the maximum parsimony (MP) method is a simple and commonly used one because it relies on directly observable changes in the input nucleotide or amino acid sequences. In this paper we show that the non-uniqueness of the evolutionary pathways in the MP method leads us to consider a new model of PTs. In this so-called probability representation model, for each site a node in a PT is modelled by a probability distribution of nucleotide or amino acid states, and hence the PT at a given site is a probability Steiner tree, i.e. a Steiner tree in a high-dimensional vector space. Despite the generality of the probability representation model, in this paper we restrict our study to constructing probability phylogenetic trees (PPT) using the parsimony criterion, and we discuss and compare our approach with the classical MP method. We show that, for a given input set, although the optimal topology and the total tree length of the PPT are the same as those of the PT constructed by the classical MP method, the inferred ancestral states and branch lengths are different, and the results given by our method provide a plausible alternative to the classical ones.
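The non-uniqueness of ancestral states under MP can be seen in Fitch's classical small-parsimony algorithm for one site on a fixed rooted binary tree. The sketch below is that textbook algorithm, not the probability representation model proposed in the paper; the nested-tuple tree encoding and the taxon names `t1` to `t4` are illustrative assumptions.

```python
from typing import Dict, Set, Tuple, Union

# Assumed encoding: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, site: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for a single site.

    Returns the candidate state set at the root of `tree` and the minimum
    number of changes (parsimony length) this site requires on `tree`.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_set, left_len = fitch(tree[0], site)
    right_set, right_len = fitch(tree[1], site)
    common = left_set & right_set
    if common:
        # The children share at least one state: no extra change is needed here.
        return common, left_len + right_len
    # Disjoint state sets: keep the union and charge one change.
    return left_set | right_set, left_len + right_len + 1


tree = (("t1", "t2"), ("t3", "t4"))
site = {"t1": "A", "t2": "A", "t3": "G", "t4": "G"}
root_states, length = fitch(tree, site)
```

Here the site costs one change, but the root state set is {A, G}: both ancestral reconstructions are equally parsimonious, which is the kind of ambiguity the abstract uses to motivate assigning probability distributions to nodes.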
2.
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. Availability: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
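Gene tree parsimony scores a candidate species tree by the reconciliation cost summed over all gene trees. The sketch below uses the standard LCA mapping to count gene duplications only, and compares an explicitly supplied list of candidate species trees instead of running a search heuristic. It is not DupTree's algorithm or interface; the nested-tuple encoding, the rule that gene-tree leaves are labeled by species name, and all function names are assumptions for illustration.

```python
from typing import FrozenSet, List, Tuple, Union

# Assumed encoding: rooted binary trees as nested pairs; leaves are species names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the leaf sets of all its nodes."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left, left_all = clusters(tree[0])
    right, right_all = clusters(tree[1])
    here = left | right
    return here, left_all + right_all + [here]


def lca(species: FrozenSet[str], all_clusters: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree cluster containing `species` (its LCA node)."""
    return min((c for c in all_clusters if species <= c), key=len)


def duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene-tree nodes that map to the same species node as a child."""
    _, all_clusters = clusters(species_tree)
    count = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal count
        if isinstance(node, str):
            return lca(frozenset([node]), all_clusters)
        m1, m2 = walk(node[0]), walk(node[1])
        m = lca(m1 | m2, all_clusters)
        if m == m1 or m == m2:
            count += 1  # duplication node under the LCA mapping
        return m

    walk(gene_tree)
    return count


def gtp_score(species_tree: Tree, gene_trees: List[Tree]) -> int:
    return sum(duplications(g, species_tree) for g in gene_trees)


def best_species_tree(candidates: List[Tree], gene_trees: List[Tree]) -> Tree:
    """Pick the candidate requiring the fewest duplications in total."""
    return min(candidates, key=lambda s: gtp_score(s, gene_trees))
```

A gene tree such as `(("A", "B"), ("A", "B"))` on the species tree `(("A", "B"), "C")` requires one duplication at its root, because both subtrees map to the same species node.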
3.
This paper studies haplotype inference by maximum parsimony using population data. We define the optimal haplotype inference (OHI) problem as follows: given a set of genotypes and a set of related haplotypes, find a minimum subset of haplotypes that can resolve all the genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic programming (IQP) problem. To solve the IQP problem, we propose an iterative semidefinite programming-based approximation algorithm called SDPHapInfer. We show that this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n is the number of genotypes. The algorithm has been implemented and tested on a variety of simulated and biological data. It was compared with three other methods: (1) HAPAR, based on a branch-and-bound algorithm; (2) HAPLOTYPER, based on the expectation-maximization algorithm; and (3) PHASE, which combines Gibbs sampling with an approximate coalescent prior. The experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error rates, whereas PHASE has lower error rates on some data but higher error rates on others. HAPAR has higher error rates than the others on biological data. In terms of efficiency, SDPHapInfer, HAPLOTYPER, and PHASE output solutions in a stable and consistent way, and they run much faster than HAPAR when the number of genotypes becomes large.
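The OHI problem statement can be made concrete with an exhaustive solver for tiny instances. This is a brute-force sketch of the problem definition, not SDPHapInfer; since OHI is NP-hard, its running time grows exponentially with the number of candidate haplotypes. The genotype encoding (0 or 1 for homozygous sites, 2 for heterozygous sites) is an assumption chosen for illustration.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Haplotype = Tuple[int, ...]


def resolves(h1: Haplotype, h2: Haplotype, genotype: Sequence[int]) -> bool:
    """True if the pair (h1, h2) explains the genotype.

    Assumed encoding: genotype value 0/1 means both haplotypes carry that
    allele; value 2 means the site is heterozygous, so the alleles differ.
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif a != g or b != g:
            return False
    return True


def min_resolving_subset(genotypes: List[Sequence[int]],
                         haplotypes: List[Haplotype]) -> Optional[List[Haplotype]]:
    """Smallest subset of `haplotypes` in which every genotype has a resolving pair."""
    n = len(haplotypes)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            chosen = [haplotypes[i] for i in idx]
            # A pair may reuse the same haplotype (homozygous genotypes).
            if all(any(resolves(h1, h2, g) for h1 in chosen for h2 in chosen)
                   for g in genotypes):
                return chosen
    return None  # no subset of the given haplotypes resolves all genotypes
```

For the genotypes (2, 2) and (0, 0) with all four two-site haplotypes as candidates, the smallest resolving set is {(0, 0), (1, 1)}: (0, 0) is forced by the homozygous genotype, and (1, 1) completes the heterozygous one.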
4.
Horizontal gene transfer (HGT) may result in genes whose evolutionary histories disagree with each other, as well as with the species tree. In this case, reconciling the species and gene trees results in a network of relationships, known as the "phylogenetic network" of the set of species. A phylogenetic network that incorporates HGT consists of an underlying species tree that captures vertical inheritance and a set of edges which model the "horizontal" transfer of genetic material. In a series of papers, Nakhleh and colleagues have recently formulated a maximum parsimony (MP) criterion for phylogenetic networks, provided an array of computationally efficient algorithms and heuristics for computing it, and demonstrated its plausibility on simulated data. In this article, we study the performance and robustness of this criterion on biological data. Our findings indicate that MP is very promising when its application is extended to the domain of phylogenetic network reconstruction and HGT detection. In all cases we investigated, the MP criterion detected the correct number of HGT events required to map the evolutionary history of a gene data set onto the species phylogeny. Furthermore, our results indicate that the criterion is robust with respect to both incomplete taxon sampling and the use of different site substitution matrices. Finally, our results show that the MP criterion is very promising in detecting HGT in chimeric genes, whose evolutionary histories are a mix of vertical and horizontal evolution. Besides the performance analysis of MP, our findings offer new insights into the evolution of four biological data sets and new possible explanations of HGT scenarios in their evolutionary history.
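One way to make an MP criterion for networks concrete is to let each site choose whichever tree displayed by the network explains it most cheaply, and sum those per-site minima. The sketch below assumes that per-site-minimum formulation (check it against the Nakhleh et al. papers before relying on it) and assumes the displayed trees have already been enumerated from the network; that enumeration step is not shown. Names and encodings are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed encoding: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Site = Dict[str, int]


def fitch_length(tree: Tree, site: Site) -> int:
    """Fitch parsimony length of one site on one rooted binary tree."""
    def walk(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {site[node]}, 0
        ls, lc = walk(node[0])
        rs, rc = walk(node[1])
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return walk(tree)[1]


def network_parsimony(displayed_trees: List[Tree], sites: List[Site]) -> int:
    """Sum over sites of the cheapest displayed tree for that site (assumed criterion)."""
    return sum(min(fitch_length(t, s) for t in displayed_trees) for s in sites)


species_tree = (("A", "B"), ("C", "D"))
hgt_tree = (("A", "C"), ("B", "D"))  # hypothetical tree displayed via a transfer edge
sites = [{"A": 0, "B": 0, "C": 1, "D": 1}, {"A": 0, "B": 1, "C": 0, "D": 1}]
```

On these two toy sites, the species tree alone costs 3 steps, while the network displaying both trees costs 2, because the second site is explained more cheaply by the transfer tree. Adding an HGT edge is therefore worthwhile under this criterion only if the saving outweighs whatever penalty is placed on extra edges.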
5.
Schulmeister S. Systematic Biology, 2004, 53(4):521-528
Felsenstein (1978, Syst. Zool. 27:401-410) showed that the method of maximum parsimony can be inconsistent, i.e., it can lead to an incorrect result given an infinite amount of data. The situation in which this inconsistency occurs is often called the "Felsenstein zone," and the phenomenon is also known as "long-branch attraction." Felsenstein derived a sufficient condition for inconsistency from a four-taxon model with only two different parameters for the probability of change on the five branches connecting the four taxa. In the present paper, his approach is used to derive the inconsistency condition of maximum parsimony from the most general four-taxon model, i.e., with five different parameters for the probabilities of change on the five branches, and, for the first time, for characters with k states (k = 2, 3, 4, 5, 6, ...). This result is used to determine the factors that can cause the inconsistency of maximum parsimony. It is shown that the probabilities of change on all five branches and the number of character states play a role in causing inconsistency.
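The five-parameter four-taxon setting can be explored numerically for the two-state case. Under a symmetric two-state model, each parsimony-informative site costs one step on the quartet split it supports and two steps on the other two splits, so with unlimited data MP chooses the split with the largest expected frequency of supporting sites. The sketch below computes those frequencies by summing over the two internal-node states. It covers k = 2 only and does not reproduce the paper's general k-state condition; the branch labeling is an assumption.

```python
from itertools import product
from typing import Dict, Tuple

# Change probabilities on the quartet ((1,2),(3,4)): p1..p4 on the terminal
# branches of taxa 1..4, p5 on the internal branch (assumed labeling).
Probs = Tuple[float, float, float, float, float]


def _step(p: float, x: int, y: int) -> float:
    """Symmetric two-state model: change with probability p, otherwise stay."""
    return p if x != y else 1.0 - p


def pattern_prob(pattern: Tuple[int, int, int, int], probs: Probs) -> float:
    """Probability of the binary site pattern (x1, x2, x3, x4)."""
    p1, p2, p3, p4, p5 = probs
    x1, x2, x3, x4 = pattern
    total = 0.0
    # u joins taxa 1 and 2 and serves as the root (uniform prior 1/2);
    # v joins taxa 3 and 4.
    for u, v in product((0, 1), repeat=2):
        total += (0.5 * _step(p5, u, v)
                  * _step(p1, u, x1) * _step(p2, u, x2)
                  * _step(p3, v, x3) * _step(p4, v, x4))
    return total


def split_support(probs: Probs) -> Dict[str, float]:
    """Expected frequency of informative sites supporting each quartet split."""
    def both(a, b):
        return pattern_prob(a, probs) + pattern_prob(b, probs)
    return {
        "12|34": both((0, 0, 1, 1), (1, 1, 0, 0)),  # the true split
        "13|24": both((0, 1, 0, 1), (1, 0, 1, 0)),
        "14|23": both((0, 1, 1, 0), (1, 0, 0, 1)),
    }


def mp_inconsistent(probs: Probs) -> bool:
    """True if, with unlimited data, MP would prefer a wrong split."""
    s = split_support(probs)
    return max(s["13|24"], s["14|23"]) > s["12|34"]
```

With long branches to the non-sister taxa 1 and 3 (change probability 0.4) and short branches elsewhere (0.05), the wrong split 13|24 is expected on more sites than the true one, which is the Felsenstein zone; with all five probabilities equal to 0.1, MP is consistent.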
6.
Ruchi Chaudhary, Mukul S Bansal, André Wehe, David Fernández-Baca, Oliver Eulenstein. BMC Bioinformatics, 2010, 11(1):574
Background
The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is complicated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.
7.
Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree (total citations: 13; self-citations: 1; citations by others: 13)
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by computer simulation. The distance-matrix methods examined are the neighbor-joining (NJ), distance-Wagner (DW), Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each set of parameters. The results indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides is used, the probability of obtaining the correct topology (P1) is generally lower for the MP method than for the distance-matrix methods. The P1 value for the MP method increases with the number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained regardless of whether the rate of nucleotide substitution was constant and whether a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method in these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rates of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.
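The remark that MP ignores information from singular sites follows from how parsimony scores sites: a column is informative for MP only if at least two states each occur in at least two sequences. Any other variable column costs the same number of steps on every topology, so it cannot discriminate between trees. The sketch below classifies alignment columns on that basis; it assumes that the abstract's "singular sites" correspond to variable but parsimony-uninformative columns, and it treats every character (including gaps) as an ordinary state.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_parsimony_informative(column: Sequence[str]) -> bool:
    """At least two different states must each appear in at least two sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def classify_sites(alignment: List[str]) -> Dict[str, List[int]]:
    """Group column indices into constant, singleton (uninformative) and informative sites."""
    result: Dict[str, List[int]] = {"constant": [], "singleton": [], "informative": []}
    for i, column in enumerate(zip(*alignment)):
        if len(set(column)) == 1:
            result["constant"].append(i)
        elif is_parsimony_informative(column):
            result["informative"].append(i)
        else:
            result["singleton"].append(i)
    return result


alignment = ["ACGT", "ACGA", "ATGA", "ATCA"]
sites = classify_sites(alignment)
```

In this toy alignment only column 1 (C, C, T, T) can favour one four-taxon tree over another; columns 2 and 3 are variable but contribute equally to every tree's length, whereas a distance method still counts them as differences.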
8.
Simmons and Freudenstein have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by the method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. In this discussion we introduce a number of recent developments in gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from … that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data.
9.
10.
The use of parsimony in testing phylogenetic hypotheses (total citations: 1; self-citations: 0; citations by others: 1)
A. L. Panchen. Zoological Journal of the Linnean Society, 1982, 74(3):305-328
With the advance of cladistic theory, differences in principle between it and other systematic techniques are few but of fundamental importance. In the mechanics of classification, they are confined to ranking and the rejection of paraphyletic taxa. In cladistic analysis, leading to cladograms, trees and phylogeny reconstruction, inconsistencies in apparent synapomorphies are said to be resolved using Popper's hypothetico-deductive method together with the principle of parsimony. However, not only do cladists not use Popper's methodology, which is inconsistent with parsimony, but their use of parsimony is invalid as a test of phylogeny. The only accepted extrinsic test of a classification is that enunciated by John Stuart Mill. It has been claimed that cladistic classifications yield the best results when judged by Mill's criteria, but this is only possibly the case with analytic classifications produced by numerical techniques. No satisfactory test exists in normal (synthetic) cladism for distinguishing synapomorphy from homoplasy. The effects of this are particularly dire in cladograms and classifications involving fossils in which a Stufenreihe arrangement is adopted.
11.
12.
MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods (total citations: 194; self-citations: 0; citations by others: 194)
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. Molecular Biology and Evolution, 2011, 28(10):2731-2739
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), a user-friendly software package for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.
13.
Barker D. Bioinformatics (Oxford, England), 2004, 20(2):274-275
The program LVB seeks parsimonious phylogenies from nucleotide alignments, using the simulated annealing heuristic. LVB runs fast and gives high quality results. AVAILABILITY: The software is available at http://www.rubic.reading.ac.uk/lvb/ Supplementary information: Supplementary information may be downloaded from http://www.rubic.reading.ac.uk/~daniel/
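At the heart of any simulated annealing search is the Metropolis acceptance rule paired with a cooling schedule. The sketch below shows only that generic core, with a geometric cooling schedule chosen as an assumption; LVB's actual move set, schedule, and parameter values are not described in the abstract and are not reproduced here. Function names are hypothetical.

```python
import math
import random
from typing import Iterator


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Decide whether to move to a neighbouring tree whose length differs by `delta`.

    Improvements (delta <= 0) are always accepted; a longer tree is accepted
    with probability exp(-delta / temperature), which shrinks as the search cools.
    """
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def geometric_schedule(t0: float, alpha: float, steps: int) -> Iterator[float]:
    """Yield temperatures t0, t0*alpha, t0*alpha**2, ... (assumed schedule)."""
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha
```

In a full search, `delta` would be the change in parsimony length (for example from a Fitch pass) after a proposed tree rearrangement. Accepting occasional uphill moves at high temperature is what lets annealing escape local optima that trap a purely greedy rearrangement search.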
14.
Two different methods of using paralogous genes for phylogenetic inference have been proposed: reconciled trees (or gene tree parsimony) and uninode coding. Gene tree parsimony suffers from 10 serious problems, including differential weighting of nucleotide and gap characters, undersampling that can be misinterpreted as synapomorphy, not all of the characters being allowed to interact, and conflict between gene trees being given equal weight regardless of branch support. These problems are largely avoided by using uninode coding. The uninode coding method is elaborated to address multiple gene duplications within a single gene tree family and to handle problems caused by lack of gene tree resolution. An example of vertebrate phylogeny inferred from nine genes is reanalyzed using uninode coding. We suggest that uninode coding be used instead of gene tree parsimony for phylogenetic inference from paralogous genes.
15.
Marietta L. Baba, Linda L. Darga, Morris Goodman, John Czelusniak. Journal of Molecular Evolution, 1981, 17(4):197-213
Summary
Rates of evolution for cytochrome c over the past one billion years were calculated from a maximum parsimony dendrogram that approximates the phylogeny of 87 lineages. Two periods of evolutionary acceleration and deceleration apparently occurred for the cytochrome c molecule. The tempo of evolutionary change indicated by this analysis was compared with the patterns of acceleration and deceleration in the ancestry of several other proteins. The synchrony of these tempos of molecular change supports the notion that rapid genetic evolution accompanied periods of major adaptive radiation. Rates of change at different times in several structural-functional areas of cytochrome c were also investigated in order to test the Darwinian hypothesis that, during periods of rapid evolution, functional sites accumulate proportionately more substitutions than areas with no known function. Rates of change in four proposed functional groupings of sites were therefore compared with rates in areas of unknown function for several different time periods. This analysis revealed a significant increase in the rate of evolution for sites associated with the regions of cytochrome c oxidase and reductase interaction during the period from the emergence of the eutherian ancestor to the emergence of the anthropoid ancestor.
16.
It is argued that both the principle of parsimony and the theory of evolution, especially that of natural selection, are essential analytical tools in phylogenetic systematics, whereas the widely used outgroup analysis is completely useless and may even be misleading. In any systematic analysis, two types of patterns of characters and character states must be distinguished, referred to as completely and incompletely resolved. In the former, all known species in which the studied characters and their states occur are represented, whereas in the latter this is not the case. Depending on its structure, a pattern of characters and their states may be explained either by a unique hypothesis of relationships or by several conflicting, equally most parsimonious ones. The so-called permutation method is introduced, which facilitates finding the conflicting, equally most parsimonious hypotheses of relationships. The utility of the principle of parsimony is limited by the uncertainty as to whether its application in systematics must refer to the minimum number of steps needed to explain a pattern of characters and their states most parsimoniously or to the minimum number of evolutionary events assumed to have caused these steps. Although these numbers may differ, the former is usually preferred for simplicity. Two types of outgroup analysis are shown to exist, termed parsimony analysis based on test samples and the cladistic type of outgroup analysis. Essentially, the former is used for analysing incompletely resolved patterns of characters and their states, the latter for analysing completely resolved ones. Both types are shown to be completely useless for rejecting even one of several conflicting, equally most parsimonious hypotheses of relationships. According to contemporary knowledge, this task can be accomplished only by employing the theory of evolution (including the theory of natural selection).
But even then, many phylogenetic-systematic problems will remain unsolved. In such cases, arbitrary algorithms like those offered by phenetics can at best offer pseudosolutions to open problems. Despite its limitations, phylogenetic systematics is superior to any kind of aphylogenetic systematics (transformed cladistics included) in approaching a (not: the) “general reference system” of organisms.
17.
Using data provided by the Collaborative Study on the Genetics of Alcoholism we studied the genetics of a quantitative trait: the maximum number of drinks consumed in a 24-hour period. A two-stage method was used. First, linkage analysis was performed, followed by association analysis in regions where linkage was detected. Additionally, the extent of linkage disequilibrium among single-nucleotide polymorphisms (SNP) associated with the phenotype was assessed. Linkage to chromosomes 2 and 7 was detected, and follow-up association analysis found multiple trait-associated SNPs in the chromosome 7 linkage region. Chromosome 4, which has been implicated in previous studies of the maximum drinks phenotype, did not pass our threshold for linkage evidence in stage 1, but secondary analyses of this chromosome indicated modest evidence for both linkage and association. The evidence suggests that chromosome 7 may harbor an additional locus influencing the maximum drinks consumption phenotype.
18.
Stochastic models of nucleotide substitution are playing an increasingly important role in phylogenetic reconstruction through such methods as maximum likelihood. Here, we examine the behaviour of a simple substitution model, and establish some links between the methods of maximum parsimony and maximum likelihood under this model.
19.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches for MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, simple MP or ML tree search algorithms, such as stepwise addition (SA) plus nearest neighbor interchange (NNI) and SA plus subtree pruning and regrafting (SPR), are as efficient as extensive search algorithms such as SA plus tree bisection-reconnection (TBR) in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model generally performs better than the HKY model, even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that, at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often performs better than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.
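The p distance mentioned above is simply the proportion of differing aligned sites, while the Jukes-Cantor (JC) distance corrects it for multiple hits at one site: d = -(3/4) ln(1 - 4p/3). The sketch below computes both for a pair of aligned sequences. It does not implement the HKY distance, and returning infinity for saturated sequences (p >= 0.75) is a convention chosen here, not something the abstract specifies.

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    diffs = sum(1 for a, b in zip(s1, s2) if a != b)
    return diffs / len(s1)


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        return math.inf  # saturation: the JC correction is undefined here
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("AAAA", "AAAT")
d = jc_distance(p)
```

For one difference in four sites, p = 0.25 while the JC estimate is about 0.304. The correction grows rapidly as p approaches 0.75, and so does its sampling variance, which is one reason a simpler distance can sometimes give a more reliable topology when n is small.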
20.