期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction

Sayyari Erfan Mirarab Siavash 《BMC genomics》2016,17(10):783-113

Background

Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed.

Results

We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves.

Conclusions

We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

相似文献

2.

Assessment of protein distance measures and tree-building methods for phylogenetic tree reconstruction

Hollich V Milchert L Arvestad L Sonnhammer EL 《Molecular biology and evolution》2005,22(11):2257-2264

Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed poorer than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed equally well as more complex methods. 相似文献

3.

RadCon: phylogenetic tree comparison and consensus

Thorley JL Page RD 《Bioinformatics (Oxford, England)》2000,16(5):486-487

SUMMARY: RadCon is a Macintosh program for manipulating and analysing phylogenetic trees. The program can determine the Cladistic Information Content of individual trees, the stability of leaves across a set of bootstrap trees, produce the strict basic Reduced Cladistic Consensus profile of a set of trees and convert a set of trees into its matrix representation for supertree construction. AVAILABILITY: The program is free and available at http://taxonomy.zoology.gla.ac.uk/ approximately jthorley/radcon/radcon.html. 相似文献

4.

A new sequence distance measure for phylogenetic tree construction 总被引：5，自引：0，他引：5

Otu HH Sayood K 《Bioinformatics (Oxford, England)》2003,19(16):2122-2130

MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors. 相似文献

5.

On the repeat-annotated phylogenetic tree reconstruction problem.

Firas Swidan Michal Ziv-Ukelson Ron Y Pinter 《Journal of computational biology》2006,13(8):1397-1418

A new problem in phylogenetic inference is presented, based on recent biological findings indicating a strong association between reversals (i.e., inversions) and repeats. These biological findings are formalized here in a new mathematical model, called repeat-annotated phylogenetic trees (RAPT). We show that, under RAPT, the evolutionary process--including both the tree-topology as well as internal node genome orders--is uniquely determined, a property that is of major significance both in theory and in practice. Furthermore, the repeats are employed to provide linear-time algorithms for reconstructing both the genomic orders and the phylogeny, which are NP-hard problems under the classical model of sorting by reversals (SBR). 相似文献

6.

Algorithms for efficient near-perfect phylogenetic tree reconstruction in theory and practice

Sridhar S Dhamdhere K Blelloch G Halperin E Ravi R Schwartz R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):561-571

We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity. 相似文献

7.

Tie trees generated by distance methods of phylogenetic reconstruction 总被引：2，自引：0，他引：2

Takezaki N 《Molecular biology and evolution》1998,15(6):727-737

In examining genetic data in recent publications, Backeljau et al. showedcases in which two or more different trees (tie trees) were constructedfrom a single data set for the neighbor-joining (NJ) method and theunweighted pair group method with arithmetic mean (UPGMA). However, it isstill unclear how often and under what conditions tie trees are generated.Therefore, I examined these problems by computer simulation. Examination ofcases in which tie trees occur shows that tie trees can appear when nosubstitutions occur along some interior branch(es) on a tree. However, evenwhen some substitutions occur along interior branches, tie trees can appearby chance if parallel or backward substitutions occur at some sites. Thesimulation results showed that tie trees occur relatively frequently forsequences with low divergence levels or with small numbers of sites. Forsuch data, UPGMA sometimes produced tie trees quite frequently, whereas tietrees for the NJ method were generally rare. In the simulation, bootstrapvalues for clusters (tie clusters) that differed among tie trees weremostly low (< 60%). With a small probability, relatively high bootstrapvalues (at most 70%-80%) appeared for tie clusters. The bias of thebootstrap values caused by an input order of sequence can be avoided if oneof the different paths in the cycles of making an NJ or UPGMA tree ischosen at random in each bootstrap replication. 相似文献

8.

Gene tree parsimony vs uninode coding for phylogenetic reconstruction

Cotton JA Page RD 《Molecular phylogenetics and evolution》2003,29(2):298-308

have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data. 相似文献

9.

Generalized neighbor-joining: more reliable phylogenetic tree reconstruction. 总被引：4，自引：0，他引：4

W R Pearson G Robins T Zhang 《Molecular biology and evolution》1999,16(6):806-816

We have developed a phylogenetic tree reconstruction method that detects and reports multiple topologically distant low-cost solutions. Our method is a generalization of the neighbor-joining method of Saitou and Nei and affords a more thorough sampling of the solution space by keeping track of multiple partial solutions during its execution. The scope of the solution space sampling is controlled by a pair of user-specified parameters--the total number of alternate solutions and the number of alternate solutions that are randomly selected--effecting a smooth trade-off between run time and solution quality and diversity. This method can discover topologically distinct low-cost solutions. In tests on biological and synthetic data sets using either the least-squares distance or minimum-evolution criterion, the method consistently performed as well as, or better than, both the neighbor-joining heuristic and the PHYLIP implementation of the Fitch-Margoliash distance measure. In addition, the method identified alternative tree topologies with costs within 1% or 2% of the best, but with topological distances of 9 or more partitions from the best solution (16 taxa); with 32 taxa, topologies were obtained 17 (least-squares) and 22 (minimum-evolution) partitions from the best topology when 200 partial solutions were retained. Thus, the method can find lower-cost tree topologies and near-best tree topologies that are significantly different from the best topology. 相似文献

10.

Disk-covering, a fast-converging method for phylogenetic tree reconstruction. 总被引：2，自引：0，他引：2

D H Huson S M Nettles T J Warnow 《Journal of computational biology》1999,6(3-4):369-386

The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths. 相似文献

11.

Genetic distances,geographic distances and migration between four villages of a single region in Slovakia

Vladimír Ferák Daniela Siváková Zuzana Kroupová 《Journal of human evolution》1980,9(7):573-581

Genetic microdifferentiation has been studied among four endogamous villages of the Bystrica Valley in the Kysuce region of northwest Slovakia, which arose from a single ancestral population about 15 generations ago. Genetic distances and sample kinship between the villages were estimated from the gene frequencies of seven serum-group and isozyme genetic markers. The genetic distance was found to correlate positively (though insignificantly) with the geographic distance, and negatively with the intensity of migration between villages; the sample kinship correlates negatively with geography as well as with the genetic distance. This pattern of genetic structure within the area indicates that the genetic variation among the villages is attributable to the genetic drift. Thus, drift has brought about a detectable differentiation in the area within a limited period of approximately 300 years, in spite of the fact that the villages were not completely genetically isolated from each other. The finding of a negative correlation between the sample kinship and geographic distance indicates that the recent breakdown of genetic isolates in Slovakia is likely to be accompanied with an overall increase of heterozygosity. 相似文献

12.

A robust distance coefficient between distribution areas incorporating geographic distances 总被引：1，自引：0，他引：1

Hennig C Hausdorf B 《Systematic biology》2006,55(1):170-175

相似文献

13.

Uninode coding vs gene tree parsimony for phylogenetic reconstruction using duplicate genes

Simmons MP Freudenstein JV 《Molecular phylogenetics and evolution》2002,23(3):481-498

Two different methods of using paralogous genes for phylogenetic inference have been proposed: reconciled trees (or gene tree parsimony) and uninode coding. Gene tree parsimony suffers from 10 serious problems, including differential weighting of nucleotide and gap characters, undersampling which can be misinterpreted as synapomorphy, all of the characters not being allowed to interact, and conflict between gene trees being given equal weight, regardless of branch support. These problems are largely avoided by using uninode coding. The uninode coding method is elaborated to address multiple gene duplications within a single gene tree family and handle problems caused by lack of gene tree resolution. An example of vertebrate phylogeny inferred from nine genes is reanalyzed using uninode coding. We suggest that uninode coding be used instead of gene tree parsimony for phylogenetic inference from paralogous genes. 相似文献

14.

A short proof that phylogenetic tree reconstruction by maximum likelihood is hard 总被引：1，自引：0，他引：1

Roch S 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(1):92-94

Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel. 相似文献

15.

Genetic distance between Jews and non-Jews of four regions

A K Roychoudhury 《Human heredity》1982,32(4):259-263

相似文献

16.

Long-range comparison between genes and languages based on syntactic distances

Colonna V Boattini A Guardiano C Dall'ara I Pettener D Longobardi G Barbujani G 《Human heredity》2010,70(4):245-254

相似文献

17.

Flavonoids and phylogenetic reconstruction

Tod F. Stuessy Daniel J. Crawford 《Plant Systematics and Evolution》1983,143(1-2):83-107

Flavonoids have been used successfully for interpreting evolutionary relationships in many groups of angiosperms. These interpretations often have been presented in narrative fashion without specific indications of the kinds of relationships expressed. In this paper a method of phylogeny reconstruction with flavonoid data showing cladistic, patristic, and phenetic relationships is presented. Such a phylogram contains maximal information about flavonoid evolution. As an example, relationships in the North American species ofCoreopsis (Compositae), containing 46 species in 11 sections, are analyzed by this approach. A phylogeny of sections of the genus from previous morphological, chromosomal and hybridization data is compared with that from data on anthochlors (chalcones and aurones). Strong correspondence of these evolutionary interpretations gives support to the hypothesized evolutionary trends within the group. 相似文献

18.

Statistical distance between texts and filtration methods in sequence comparison 总被引：1，自引：0，他引：1

Pevzner Pavel A. 《Bioinformatics (Oxford, England)》1992,8(2):121-127

Upon searching local similarities in long sequences, the necessityof a ‘rapid’ similarity search becomes acute. Quadraticcomplexity of dynamic programming algorithms forces the employmentof filtration methods that allow elimination of the sequenceswith a low similarity level. The paper is devoted to the theoreticalsubstantiations of the filtration method based on the statisticaldistance between texts. The notion of the filtration efficiencyis introduced and the efficiency of several filters is estimated.It is shown that the efficiency of the statistical l-tuple filtrationupon DNA database search is associated with a potential extensionof the original four–letter alphabet and grows exponentiallywith increasing l. The formula that allows one to estimate thefiltration parameters is presented. 相似文献

19.

Hypothesis testing and phylogenetic reconstruction

M. Cartmill 《Journal of Zoological Systematics and Evolutionary Research》1981,19(2):73-96

相似文献

20.

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

Shixiang Wan Quan Zou 《Algorithms for molecular biology : AMB》2017,12(1):25

Background

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types.

Methods

Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction.

Results

The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource.

Conclusions

THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

相似文献