共查询到20条相似文献,搜索用时 15 毫秒
1.
have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data. 相似文献
2.
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. Availability: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree 相似文献
3.
In this paper, we propose a new method (uninode coding) for coding duplicate (paralogous) genes to infer species trees. Uninode coding incorporates data from duplicated and unduplicated gene copies in phylogenetic analyses of taxa. Uninode coding utilizes global parsimony through the inclusion of both duplicated and unduplicated gene copies, allows one to code all data sources from a taxon into a single terminal, and overcomes problems of character dependence among duplicated and unduplicated gene copies. We present an example of uninode coding using the phytochrome A and phytochrome C data from a study by Donoghue and Mathews. 相似文献
4.
5.
Background
Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated to transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n 2) in the number of states, making it impractical for large values of n. 相似文献6.
Silva AE Villanueva WJ Knidel H Bonato VC Reis SF Von Zuben FJ 《Genetics and molecular research : GMR》2005,4(3):525-534
The computationally challenging problem of reconstructing the phylogeny of a set of contemporary data, such as DNA sequences or morphological attributes, was treated by an extended version of the neighbor-joining (NJ) algorithm. The original NJ algorithm provides a single-tree topology, after a cascade of greedy pairing decisions that tries to simultaneously optimize the minimum evolution and the least squares criteria. Given that some sub-trees are more stable than others, and that the minimum evolution tree may not be achieved by the original NJ algorithm, we propose a multi-neighbor-joining (MNJ) algorithm capable of performing multiple pairing decisions at each level of the tree reconstruction, keeping various partial solutions along the recursive execution of the NJ algorithm. The main advantages of the new reconstruction procedure are: 1) as is the case for the original NJ algorithm, the MNJ algorithm is still a low-cost reconstruction method; 2) a further investigation of the alternative topologies may reveal stable and unstable sub-trees; 3) the chance of achieving the minimum evolution tree is greater; 4) tree topologies with very similar performances will be simultaneously presented at the output. When there are multiple unrooted tree topologies to be compared, a visualization tool is also proposed, using a radial layout to uniformly distribute the branches with the help of well-known metaheuristics used in computer science. 相似文献
7.
Ruchi Chaudhary Mukul S Bansal André Wehe David Fernández-Baca Oliver Eulenstein 《BMC bioinformatics》2010,11(1):574
Background
The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle. 相似文献8.
Motivation
Species tree estimation from gene trees can be complicated by gene duplication and loss, and “gene tree parsimony” (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species).Results
We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the “standard” calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.9.
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths. 相似文献
10.
A new problem in phylogenetic inference is presented, based on recent biological findings indicating a strong association between reversals (i.e., inversions) and repeats. These biological findings are formalized here in a new mathematical model, called repeat-annotated phylogenetic trees (RAPT). We show that, under RAPT, the evolutionary process--including both the tree-topology as well as internal node genome orders--is uniquely determined, a property that is of major significance both in theory and in practice. Furthermore, the repeats are employed to provide linear-time algorithms for reconstructing both the genomic orders and the phylogeny, which are NP-hard problems under the classical model of sorting by reversals (SBR). 相似文献
11.
Sridhar S Dhamdhere K Blelloch G Halperin E Ravi R Schwartz R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):561-571
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity. 相似文献
12.
Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree 总被引:13,自引:1,他引:13
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small. 相似文献
13.
We have developed a phylogenetic tree reconstruction method that detects and reports multiple topologically distant low-cost solutions. Our method is a generalization of the neighbor-joining method of Saitou and Nei and affords a more thorough sampling of the solution space by keeping track of multiple partial solutions during its execution. The scope of the solution space sampling is controlled by a pair of user-specified parameters--the total number of alternate solutions and the number of alternate solutions that are randomly selected--effecting a smooth trade-off between run time and solution quality and diversity. This method can discover topologically distinct low-cost solutions. In tests on biological and synthetic data sets using either the least-squares distance or minimum-evolution criterion, the method consistently performed as well as, or better than, both the neighbor-joining heuristic and the PHYLIP implementation of the Fitch-Margoliash distance measure. In addition, the method identified alternative tree topologies with costs within 1% or 2% of the best, but with topological distances of 9 or more partitions from the best solution (16 taxa); with 32 taxa, topologies were obtained 17 (least-squares) and 22 (minimum-evolution) partitions from the best topology when 200 partial solutions were retained. Thus, the method can find lower-cost tree topologies and near-best tree topologies that are significantly different from the best topology. 相似文献
14.
15.
The first analyses of gene sequence data indicated that the eukaryotic tree of life consisted of a long stem of microbial groups "topped" by a crown-containing plants, animals, and fungi and their microbial relatives. Although more recent multigene concatenated analyses have refined the relationships among the many branches of eukaryotes, the root of the eukaryotic tree of life has remained elusive. Inferring the root of extant eukaryotes is challenging because of the age of the group (~1.7-2.1 billion years old), tremendous heterogeneity in rates of evolution among lineages, and lack of obvious outgroups for many genes. Here, we reconstruct a rooted phylogeny of extant eukaryotes based on minimizing the number of duplications and losses among a collection of gene trees. This approach does not require outgroup sequences or assumptions of orthology among sequences. We also explore the impact of taxon and gene sampling and assess support for alternative hypotheses for the root. Using 20 gene trees from 84 diverse eukaryotic lineages, this approach recovers robust eukaryotic clades and reveals evidence for a eukaryotic root that lies between the Opisthokonta (animals, fungi and their microbial relatives) and all remaining eukaryotes. 相似文献
16.
Hollich V Milchert L Arvestad L Sonnhammer EL 《Molecular biology and evolution》2005,22(11):2257-2264
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed poorer than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed equally well as more complex methods. 相似文献
17.
棉铃虫核型多角体病毒几丁质酶基因及杆状病毒
几丁质酶基因的分子进化 总被引:9,自引:0,他引:9
用PCR方法扩增了棉铃虫Helicoverpa armigera单粒包埋型核型多角体病毒(HaSNPV)几丁质酶基因,测定了基因编码区的核苷酸全序列。基因编码区全长1.713 bp,可编码570个氨基酸残基组成的多肽,预计分子量为63.6 kD。将所推导的HaSNPV几丁质酶氨基酸序列与其它已知杆状病毒几丁质酶氨基酸序列进行联配比较,结果表明HaSNPV 与谷实夜蛾H.zea单粒包埋型核型多角体病毒(HzSNPV)的氨基酸序列非常相似,同源性高达90.7%,与苜蓿丫纹夜蛾Autographa californica多粒包埋型核型多角体病毒(AcMNPV)、家蚕Bombyx mori核型多角体病毒(BmNPV)、美国白蛾Hyphantria cunea核型多角体病毒(HcNPV)、舞毒蛾Lymantria dispar多粒包埋型核型多角体病毒(LdMNPV)、黄杉毒蛾Orgyia pseudotsugata多粒包埋型核型多角体病毒(OpMNPV)和云杉卷叶蛾Choristoneura fumiferana核型多角体病毒(CfMNPV)氨基酸序列同源性分别为64.4%、64.9%、64.2%、62.9%、66.2%和61.5%。根据氨基酸序列用PC\GENE程序绘制已知杆状病毒几丁质酶的分子系统树,并与杆状病毒中最为保守的多角体蛋白基因系统树作了比较,结果表明几丁质酶基因和多角体蛋白基因的进化速率是不尽相同的。 相似文献
18.
Inferring phylogeny is a difficult computational problem. For example, for only 13 taxa, there are more then 13 billion possible unrooted phylogenetic trees. Heuristics are necessary to minimize the time spent evaluating non-optimal trees. We describe here an approach for heuristic searching, using a genetic algorithm, that can reduce the time required for weighted maximum parsimony phylogenetic inference, especially for data sets involving a large number of taxa. It is the first implementation of a weighted maximum parsimony criterion using amino acid sequences. To validate the weighted criterion, we used an artificial data set and compared it to a number of other phylogenetic methods. Genetic algorithms mimic the natural selection's ability to solve complex problems. We have identified several parameters affecting the genetic algorithm. Methods were developed to validate these parameters, ensuring optimal performance. This approach allows the construction of phylogenetic trees with over 200 taxa in practical time on a regular PC. 相似文献
19.
Background
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed.Results
We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves.Conclusions
We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.20.
Gene duplication and divergence is a major evolutionary force. Despite the growing number of fully sequenced genomes, methods for investigating these events on a genome-wide scale are still in their infancy. Here, we present SYNERGY, a novel and scalable algorithm that uses sequence similarity and a given species phylogeny to reconstruct the underlying evolutionary history of all genes in a large group of species. In doing so, SYNERGY resolves homology relations and accurately distinguishes orthologs from paralogs. We applied our approach to a set of nine fully sequenced fungal genomes spanning 150 million years, generating a genome-wide catalog of orthologous groups and corresponding gene trees. Our results are highly accurate when compared to a manually curated gold standard, and are robust to the quality of input according to a novel jackknife confidence scoring. The reconstructed gene trees provide a comprehensive view of gene evolution on a genomic scale. Our approach can be applied to any set of sequenced eukaryotic species with a known phylogeny, and opens the way to systematic studies of the evolution of individual genes, molecular systems and whole genomes. Supplementary information: Supplementary data are available at Bioinformatics online. 相似文献