Found 20 similar articles (search time: 15 ms)
1.
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between the two species across gene trees, that is, the average gene-tree internode distance. The method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.
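The distance matrix described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each unrooted gene tree is given as an edge list with one sampled gene per species, and it takes the internode distance between two leaves to be the number of internal nodes on the path joining them (path edges minus one). The function names `internode_distances` and `average_internode_distance` are hypothetical. Pairs are averaged only over the gene trees that contain both species, which is one simple way to accommodate missing loci.

```python
from collections import deque


def internode_distances(edges):
    """Internode distance for every leaf pair of one unrooted tree.

    Assumption: distance = number of internal nodes on the path,
    i.e. (number of edges on the path) - 1.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Leaves are the nodes with exactly one neighbour.
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for i, src in enumerate(leaves):
        # Breadth-first search gives edge counts from src to every node.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        for dst in leaves[i + 1:]:
            dist[(src, dst)] = depth[dst] - 1
    return dist


def average_internode_distance(gene_trees):
    """Average each pairwise distance over the trees containing both species."""
    totals, counts = {}, {}
    for edges in gene_trees:
        for pair, d in internode_distances(edges).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two quartet gene trees, AB|CD and AC|BD, plus a tree missing species D.
tree_ab_cd = [("A", "u1"), ("B", "u1"), ("u1", "u2"), ("C", "u2"), ("D", "u2")]
tree_ac_bd = [("A", "u1"), ("C", "u1"), ("u1", "u2"), ("B", "u2"), ("D", "u2")]
tree_abc = [("A", "u1"), ("B", "u1"), ("C", "u1")]

avg = average_internode_distance([tree_ab_cd, tree_ac_bd, tree_abc])
for pair in sorted(avg):
    print(pair, round(avg[pair], 3))
```

The resulting dictionary plays the role of the distance matrix; feeding it to a neighbor joining routine (not shown here) would complete the procedure the abstract outlines.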
2.
3.
Luo CW Chen MC Chen YC Yang RW Liu HF Chao KM 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(1):260-265
A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. The solutions to the MULTIPLE GENE DUPLICATION problems can provide useful clues for placing gene duplication events on a species tree and for exposing multiple gene duplication episodes. In this paper, we study two variations of the MULTIPLE GENE DUPLICATION problems: the EPISODE-CLUSTERING (EC) problem and the MINIMUM EPISODES (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, on the basis of the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.
4.
5.
We present two efficient network propagation algorithms that operate on a binary tree, i.e., a sparse-edged substitute of an entire similarity network. TreeProp-N is based on passing increments between nodes while TreeProp-E employs propagation to the edges of the tree. Both algorithms improve protein classification efficiency.
6.
Elizabeth S. Allman James H. Degnan John A. Rhodes 《Journal of mathematical biology》2011,62(6):833-862
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent
populations of individuals—each with many genes—splitting into new populations or species. The coalescent process, which models
ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed
species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene
trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees
are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods
are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when
there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the
unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location
of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled
per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and
all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable
for any species from which more than one gene is sampled.
7.
8.
Bertrand D Gascuel O 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(1):15-28
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch (1977). Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly to duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees allows exploring the whole duplication tree space. We use these restricted rearrangements in a local search method which improves an initial tree via successive rearrangements. This method is applied to the optimization of parsimony and minimum evolution criteria. We show through simulations that this method improves on all existing programs for both reconstructing the topology of the true tree and recovering its duplication events. We apply this approach to tandemly repeated human zinc finger genes and observe that a much better duplication tree is obtained by our method than by any other program.
9.
A.D McLachlan 《Journal of molecular biology》1976,107(2):159-174
10.
Chaudhary R Burleigh JG Fernández-Baca D 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(4):1004-1013
A Robinson-Foulds (RF) supertree for a collection of input trees is a tree containing all the species in the input trees that is at minimum total RF distance to the input trees. Thus, an RF supertree is consistent with the maximum number of splits in the input trees. Constructing RF supertrees for rooted and unrooted data is NP-hard. Nevertheless, effective local search heuristics have been developed for the restricted case where the input trees and the supertree are rooted. We describe new heuristics, based on the Edge Contract and Refine (ECR) operation, that remove this restriction, thereby expanding the utility of RF supertrees. Our experimental results on simulated and empirical data sets show that our unrooted local search algorithms yield better supertrees than those obtained from MRP and rooted RF heuristics in terms of total RF distance to the input trees and, for simulated data, in terms of RF distance to the true tree.
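The objective in this abstract, total RF distance to the input trees, can be sketched in Python for the simple case where every tree has the same leaf set. This is an illustration under stated assumptions, not the ECR heuristic from the paper. Trees are written as nested tuples of leaf names (the nesting fixes an arbitrary rooting that the split computation ignores), and `splits`, `rf_distance`, and `total_rf` are hypothetical helper names. The unrooted RF distance is taken as the size of the symmetric difference of the two trees' nontrivial split sets.

```python
def splits(tree):
    """Nontrivial bipartitions of a tree written as nested tuples of leaf names."""
    clusters = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(leaves_below(child) for child in node))
        clusters.add(cluster)
        return cluster

    all_leaves = leaves_below(tree)
    ref = min(all_leaves)  # reference leaf used to pick one side of each split
    result = set()
    for cluster in clusters:
        # Canonical form: always store the side that does NOT contain ref.
        side = all_leaves - cluster if ref in cluster else cluster
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Unrooted RF distance: number of splits found in exactly one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def total_rf(candidate, input_trees):
    """Objective for a candidate supertree: summed RF distance to the inputs."""
    return sum(rf_distance(candidate, t) for t in input_trees)


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as t1

print(rf_distance(t1, t2))
print(total_rf(t1, [t1, t2, t3]))
```

A local search of the kind the abstract mentions would repeatedly propose rearranged candidates and keep those that lower `total_rf`. Handling input trees with differing leaf sets requires restricting the supertree to each input's leaves first, which this sketch omits.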
11.
Given a gene tree and a species tree, a coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. Each pair consisting of a gene tree topology and a species tree topology has some number of possible coalescent histories. Here we show that, for each n≥7, there exist a species tree topology S and a gene tree topology G≠S, both with n leaves, for which the number of coalescent histories exceeds the corresponding number of coalescent histories when the species tree topology is S and the gene tree topology is also S. This result has the interpretation that the gene tree topology G discordant with the species tree topology S can be produced by the evolutionary process in more ways than can the gene tree topology that matches the species tree topology, providing further insight into the surprising combinatorial properties of gene trees that arise from their joint consideration with species trees.
12.
URec is a software tool based on the concept of unrooted reconciliation. It can be used to reconcile a set of unrooted gene trees with a rooted species tree or a set of rooted species trees. Moreover, it computes the detailed distribution of gene duplications and gene losses in a species tree. It can be used to infer optimal species phylogenies for a given set of gene trees. URec is implemented in C++ and can be easily compiled under Unix and Windows systems. Availability: Software is freely available for download from our website at http://bioputer.mimuw.edu.pl/~gorecki/urec. This webpage also contains Windows executables and a number of advanced examples with explanations.
13.
14.
Paul-Ludwig Lott Marvin Mundry Christoph Sassenberg Stefan Lorkowski Georg Fuellen 《BMC bioinformatics》2006,7(1):231-15
Background
In the genomic age, gene trees may contain large amounts of data, making them hard to read and understand. Therefore, an automated simplification is important.
15.
16.
Background
The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee.
Results
We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6,084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics.
Conclusions
Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1,000 genes in a few hours. These are the largest instances that have been solved to optimality to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.
17.
Relationships between gene trees and species trees (total citations: 39; self-citations: 10; citations by others: 39)
It is well known that a phylogenetic tree (gene tree) constructed from DNA
sequences for a genetic locus does not necessarily agree with the tree that
represents the actual evolutionary pathway of the species involved (species
tree). One of the important factors that cause this difference is genetic
polymorphism in the ancestral species. Under the assumption of neutral
mutations, this problem can be studied by evaluating the probability (P)
that a gene tree has the same topology as that of the species tree. When
one gene (allele) is used from each of the species involved, the
probability can be expressed as a simple function of Ti = ti/(2N), where ti
is the evolutionary time measured in generations for the ith internodal
branch of the species tree and N is the effective population size. When any
of the Ti's is less than 1, the probability P becomes considerably less
than 1.0. This probability cannot be substantially increased by increasing
the number of alleles sampled from a locus. To increase the probability,
one has to use DNA sequences from many different loci that have evolved
independently of each other.
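The abstract does not state the function itself. As a worked illustration, the sketch below uses the standard three-species result under the neutral coalescent, P = 1 - (2/3) exp(-T) with T = t/(2N), for one gene sampled per species and a single internodal branch. Treating this as the "simple function" meant here is an assumption, and the function name `concordance_probability` is hypothetical.

```python
import math


def concordance_probability(t_generations, effective_size):
    """P(gene tree topology matches species tree) for three species.

    Assumes one gene per species and the standard coalescent result
    P = 1 - (2/3) * exp(-T), where T = t / (2N).
    """
    T = t_generations / (2.0 * effective_size)
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


# With N fixed, a short internodal branch (T < 1) leaves P well below 1.
N = 10_000
for t in (0, 5_000, 20_000, 100_000):
    T = t / (2 * N)
    print(f"T = {T:.2f}  P = {concordance_probability(t, N):.3f}")
```

At T = 0 the three possible topologies are equally likely, so P is 1/3; P approaches 1 only once the internodal branch spans several multiples of 2N generations, which matches the abstract's point about short branches.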
18.
19.
Journal of Mathematical Biology - Compact coalescent histories are combinatorial structures that describe for a given gene tree G and species tree S possibilities for the numbers of coalescences of...
20.
Hughes AL 《Trends in genetics : TIG》2002,18(9):433-434
One of the two ribonuclease genes in a leaf-eating monkey has adapted to a role in the digestion of bacterial RNA. Following duplication of the ancestral ribonuclease gene, adaptation occurred through a series of changes in the amino acid sequence of the protein it encodes. This example is a good illustration of how specialization of protein function after gene duplication can be a source of novel protein functions.