期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Estimating species trees from unrooted gene trees

Liu L Yu L 《Systematic biology》2011,60(5):661-667

In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals. 相似文献

2.

SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees

Tobias Hill Karl JV Nordström Mikael Thollesson Tommy M Säfström Andreas KE Vernersson Robert Fredriksson Helgi B Schiöth 《BMC evolutionary biology》2010,10(1):42

Background

Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem can be reduced to fixed parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees. 相似文献

3.

Enumeration of compact coalescent histories for matching gene trees and species trees

Disanto Filippo Rosenberg Noah A. 《Journal of mathematical biology》2019,78(1-2):155-188

Journal of Mathematical Biology - Compact coalescent histories are combinatorial structures that describe for a given gene tree G and species tree S possibilities for the numbers of coalescences of... 相似文献

4.

BEST: Bayesian estimation of species trees under the coalescent model

Liu L 《Bioinformatics (Oxford, England)》2008,24(21):2542-2543

BEST implements a Bayesian hierarchical model to jointly estimate gene trees and the species tree from multilocus sequences. It provides a new option for estimating species phylogenies within the popular Bayesian phylogenetic program MrBayes. The technique of simulated annealing is adopted along with Metropolis coupling as performed in MrBayes to improve the convergence rate of the Markov Chain Monte Carlo algorithm. AVAILABILITY: http://www.stat.osu.edu/~dkp/BEST. 相似文献

5.

The probability distribution of ranked gene trees on a species tree

Degnan JH Rosenberg NA Stadler T 《Mathematical biosciences》2012,235(1):45-55

The properties of random gene tree topologies have recently been studied under a coalescent model that treats a species tree as a fixed parameter. Here we develop the analogous theory for random ranked gene tree topologies, in which both the topology and the sequence of coalescences for a random gene tree are considered. We derive the probability distribution of ranked gene tree topologies conditional on a fixed species tree. We then show that similar to the unranked case, ranked gene trees that do not match either the ranking or the topology of the species tree can have greater probability than the matching ranked gene tree. 相似文献

6.

Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis

Bryant D Bouckaert R Felsenstein J Rosenberg NA RoyChoudhury A 《Molecular biology and evolution》2012,29(8):1917-1932

The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove). 相似文献

7.

Genomic duplication problems for unrooted gene trees

Paszek Jaros&#;aw G&#;recki Pawe&#; 《BMC genomics》2016,17(1):165-175

Background

Discovering the location of gene duplications and multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem introduced by Guigó et al. in 1996 is to map gene duplication events from a collection of rooted, binary gene family trees onto theirs corresponding rooted binary species tree in such a way that the total number of multiple gene duplication episodes is minimized. There are several models in the literature that specify how gene duplications from gene families can be interpreted as one duplication episode. However, in all duplication episode problems gene trees are rooted. This restriction limits the applicability, since unrooted gene family trees are frequently inferred by phylogenetic methods.

Results

In this article we show the first solution to the open problem of episode clustering where the input gene family trees are unrooted. In particular, by using theoretical properties of unrooted reconciliation, we show an efficient algorithm that reduces this problem into the episode clustering problems defined for rooted trees. We show theoretical properties of the reduction algorithm and evaluation of empirical datasets.

Conclusions

We provided algorithms and tools that were successfully applied to several empirical datasets. In particular, our comparative study shows that we can improve known results on genomic duplication inference from real datasets.

相似文献

8.

A maximum pseudo-likelihood approach for estimating species trees under the coalescent model

Liang Liu Lili Yu Scott V Edwards 《BMC evolutionary biology》2010,10(1):302

Background

Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE) of the species tree (topology, branch lengths, and population sizes) from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE) of species trees, with branch lengths of the species tree in coalescent units. 相似文献

9.

Gene tree distributions under the coalescent process 总被引：10，自引：0，他引：10

Degnan JH Salter LA 《Evolution; international journal of organic evolution》2005,59(1):24-37

Under the coalescent model for population divergence, lineage sorting can cause considerable variability in gene trees generated from any given species tree. In this paper, we derive a method for computing the distribution of gene tree topologies given a bifurcating species tree for trees with an arbitrary number of taxa in the case that there is one gene sampled per species. Applications for gene tree distributions include determining exact probabilities of topological equivalence between gene trees and species trees and inferring species trees from multiple datasets. In addition, we examine the shapes of gene tree distributions and their sensitivity to changes in branch lengths, species tree shape, and tree size. The method for computing gene tree distributions is implemented in the computer program COAL. 相似文献

10.

Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions 总被引：5，自引：0，他引：5

Liu L Pearl DK 《Systematic biology》2007,56(3):504-514

The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication. 相似文献

11.

Clanistics: a multi-level perspective for harvesting unrooted gene trees

François-Joseph Lapointe Philippe Lopez Yan Boucher Jeremy Koenig Eric Bapteste 《Trends in microbiology》2010,18(8):341-347

相似文献

12.

From gene trees to species trees II: species tree inference by minimizing deep coalescence events

Zhang L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1685-1691

When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include incomplete lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. With incomplete lineage sorting, species tree inference is to find the tree minimizing extra gene lineages that had to coexist along species lineages; with gene duplication, it becomes to find the tree minimizing gene duplications and/or losses. In this paper, we present the following results: 1) The deep coalescence cost is equal to the number of gene losses minus two times the gene duplication cost in the reconciliation of a uniquely leaf labeled gene tree and a species tree. The deep coalescence cost can be computed in linear time for any arbitrary gene tree and species tree. 2) The deep coalescence cost is always not less than the gene duplication cost in the reconciliation of an arbitrary gene tree and a species tree. 3) Species tree inference by minimizing deep coalescence events is NP-hard. 相似文献

13.

AUGIST: inferring species trees while accommodating gene tree uncertainty

Oliver JC 《Bioinformatics (Oxford, England)》2008,24(24):2932-2933

SUMMARY: AUGIST (accomodating uncertainty in genealogies while inferring species tress) is a new software package for inferring species trees while accommodating uncertainty in gene genealogies. It is written for the Mesquite software system and provides sampling procedures to incorporate uncertainty in gene tree reconstruction while providing confidence estimates for inferred species trees. AVAILABILITY: http://www.lycaenid.org/augist/ 相似文献

14.

Factors affecting the concordance between orthologous gene trees and species tree in bacteria

Santiago Castillo-Ramírez Víctor González 《BMC evolutionary biology》2008,8(1):300

Background

As originally defined, orthologous genes implied a reflection of the history of the species. In recent years, many studies have examined the concordance between orthologous gene trees and species trees in bacteria. These studies have produced contradictory results that may have been influenced by orthologous gene misidentification and artefactual phylogenetic reconstructions. Here, using a method that allows the detection and exclusion of false positives during identification of orthologous genes, we address the question of whether putative orthologous genes within bacteria really reflect the history of the species. 相似文献

15.

Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood

Wu Y 《Evolution; international journal of organic evolution》2012,66(3):763-775

Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. 相似文献

16.

BIMLR: A method for constructing rooted phylogenetic networks from rooted phylogenetic trees

Juan Wang Maozu Guo Linlin Xing Kai Che Xiaoyan Liu Chunyu Wang 《Gene》2013

Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The C_ASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved C_ASS algorithm, BIMLR. We show that BIMLR is faster than C_ASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. 相似文献

17.

Correlation between Shapley values of rooted phylogenetic trees under the beta-splitting model

Fuchs Michael Paningbatan Ariel R. 《Journal of mathematical biology》2020,80(3):627-653

Journal of Mathematical Biology - In recent years, several different versions of the Shapley value have been introduced in phylogenetics for the purpose of ranking biodiversity data in order to... 相似文献

18.

Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny

Page RD 《Molecular phylogenetics and evolution》2000,14(1):89-106

Paralogy is a pervasive problem in trying to use nuclear gene sequences to infer species phylogenies. One strategy for dealing with this problem is to infer species phylogenies from gene trees using reconciled trees, rather than directly from the sequences themselves. In this approach, the optimal species tree is the tree that requires the fewest gene duplications to be invoked. Because reconciled trees can identify orthologous from paralogous sequences, there is no need to do this prior to the analysis. Multiple gene trees can be analyzed simultaneously; however, the problem of nonuniform gene sampling raises practical problems which are discussed. In this paper the technique is applied to phylogenies for nine vertebrate genes (aldolase, alpha-fetoprotein, lactate dehydrogenase, prolactin, rhodopsin, trypsinogen, tyrosinase, vassopressin, and Wnt-7). The resulting species tree shows much similarity with currently accepted vertebrate relationships. 相似文献

19.

Protein classification based on propagation of unrooted binary trees

Kocsor A Busa-Fekete R Pongor S 《Protein and peptide letters》2008,15(5):428-434

We present two efficient network propagation algorithms that operate on a binary tree, i.e., a sparse-edged substitute of an entire similarity network. TreeProp-N is based on passing increments between nodes while TreeProp-E employs propagation to the edges of the tree. Both algorithms improve protein classification efficiency. 相似文献

20.

Relationships between gene trees and species trees 总被引：49，自引：10，他引：39

Pamilo P; Nei M 《Molecular biology and evolution》1988,5(5):568-583

It is well known that a phylogenetic tree (gene tree) constructed from DNA sequences for a genetic locus does not necessarily agree with the tree that represents the actual evolutionary pathway of the species involved (species tree). One of the important factors that cause this difference is genetic polymorphism in the ancestral species. Under the assumption of neutral mutations, this problem can be studied by evaluating the probability (P) that a gene tree has the same topology as that of the species tree. When one gene (allele) is used from each of the species involved, the probability can be expressed as a simple function of Ti = ti/(2N), where ti is the evolutionary time measured in generations for the ith internodal branch of the species tree and N is the effective population size. When any of the Ti's is less than 1, the probability P becomes considerably less than 1.0. This probability cannot be substantially increased by increasing the number of alleles sampled from a locus. To increase the probability, one has to use DNA sequences from many different loci that have evolved independently of each other. 相似文献