首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Liu L  Yu L 《Systematic biology》2011,60(5):661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.  相似文献   

2.
3.
A fundamental problem arising in the evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on the phylogenetic information. The solutions to the MULTIPLE GENE DUPLICATION problems can provide useful clues to place the gene duplication events onto the locations of a species tree and to expose the multiple gene duplication episodes. In this paper, we study two variations of the MULTIPLE GENE DUPLICATION problems: the EPISODE-CLUSTERING (EC) problem and the MINIMUM EPISODES (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, on the basis of the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.  相似文献   

4.
We present two efficient network propagation algorithms that operate on a binary tree, i.e., a sparse-edged substitute of an entire similarity network. TreeProp-N is based on passing increments between nodes while TreeProp-E employs propagation to the edges of the tree. Both algorithms improve protein classification efficiency.  相似文献   

5.
6.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals—each with many genes—splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.  相似文献   

7.
8.
We developed a recurrence relation that counts the number of tandem duplication trees (either rooted or unrooted) that are consistent with a set of n tandemly repeated sequences generated under the standard unequal recombination (or crossover) model of tandem duplications. The number of rooted duplication trees is exactly twice the number of unrooted trees, which means that on average only two positions for a root on a duplication tree are possible. Using the recurrence, we tabulated these numbers for small values of n. We also developed an asymptotic formula that for large n provides estimates for these numbers. These numbers give a priori probabilities for phylogenies of the repeated sequences to be duplication trees. This work extends earlier studies where exhaustive counts of the numbers for small n were obtained. One application showed the significance of finding that most maximum-parsimony trees constructed from repeat sequences from human immunoglobins and T-cell receptors were tandem duplication trees. Those findings provided strong support to the proposed mechanisms of tandem gene duplication. The recurrence relation also suggests efficient algorithms to recognize duplication trees and to generate random duplication trees for simulation. We present a linear-time recognition algorithm.  相似文献   

9.
Forest trees in general are out-crossing, long-lived, and at early stages of domestication. Molecular evolution at neutral sites is very slow because of the long generation times. Transferring information between closely related conifer species is facilitated by high sequence similarity. At the nucleotide level, trees have at most intermediate levels of variation relative to other plants. Importantly, in many species linkage disequilibrium within genes declines within less than 1000 bp. In contrast to the slow rate of neutral evolution, large tree populations respond rapidly to natural selection. Detecting traces of selection may be easier in tree populations than in many other species. Association studies between genotypes and phenotypes are proving to be useful tools for functional genomics.  相似文献   

10.
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch (1977). Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly on duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees, allows exploring the whole duplication tree space. We use these restricted rearrangements in a local search method which improves an initial tree via successive rearrangements. This method is applied to the optimization of parsimony and minimum evolution criteria. We show through simulations that this method improves all existing programs for both reconstructing the topology of the true tree and recovering its duplication events. We apply this approach to tandemly repeated human Zinc finger genes and observe that a much better duplication tree is obtained by our method than using any other program.  相似文献   

11.
12.
URec is a software based on a concept of unrooted reconciliation. It can be used to reconcile a set of unrooted gene trees with a rooted species tree or a set of rooted species trees. Moreover, it computes detailed distribution of gene duplications and gene losses in a species tree. It can be used to infer optimal species phylogenies for a given set of gene trees. URec is implemented in C++ and can be easily compiled under Unix and Windows systems. Availability: Software is freely available for download from our website at http://bioputer.mimuw.edu.pl/~gorecki/urec. This webpage also contains Windows executables and a number of advanced examples with explanations.  相似文献   

13.
Given a gene tree and a species tree, a coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. Each pair consisting of a gene tree topology and a species tree topology has some number of possible coalescent histories. Here we show that, for each n≥7, there exist a species tree topology S and a gene tree topology GS, both with n leaves, for which the number of coalescent histories exceeds the corresponding number of coalescent histories when the species tree topology is S and the gene tree topology is also S. This result has the interpretation that the gene tree topology G discordant with the species tree topology S can be produced by the evolutionary process in more ways than can the gene tree topology that matches the species tree topology, providing further insight into the surprising combinatorial properties of gene trees that arise from their joint consideration with species trees.  相似文献   

14.
A Robinson-Foulds (RF) supertree for a collection of input trees is a tree containing all the species in the input trees that is at minimum total RF distance to the input trees. Thus, an RF supertree is consistent with the maximum number of splits in the input trees. Constructing RF supertrees for rooted and unrooted data is NP-hard. Nevertheless, effective local search heuristics have been developed for the restricted case where the input trees and the supertree are rooted. We describe new heuristics, based on the Edge Contract and Refine (ECR) operation, that remove this restriction, thereby expanding the utility of RF supertrees. Our experimental results on simulated and empirical data sets show that our unrooted local search algorithms yield better supertrees than those obtained from MRP and rooted RF heuristics in terms of total RF distance to the input trees and, for simulated data, in terms of RF distance to the true tree.  相似文献   

15.
16.
Gene duplication can increase an organism's ability to mask the effect of deleterious alleles present in the population, but this is typically a small effect when the source of the genetic variation is mutation. Migration can introduce orders of magnitude more deleterious alleles per generation and may therefore be an important force acting on the structure of genomes. Using formal analytical methods, we study the invasion of haplotypes containing two copies of the resident allele, assuming that a single-locus equilibrium is already established in a continent-island model of migration. Provided that the immigrant allele can be completely masked by multiple functional gene copies, a new duplication will deterministically spread so long as duplicate haplotypes are, on average, fitter than single-copy haplotypes. When fitness depends on the number of immigrant allele copies and their masking ability then the threshold for invasion depends on the rate of immigration and the rate of recombination between the gene copies. Results from several special cases, including formation of protein dimers and Dobzhansky-Muller incompatibilities, suggest that duplications can invade in a wide range of selection regimes. We hypothesize that duplication in response to gene flow may provide an explanation for the high levels of polymorphism in gene copy number observed in natural populations.  相似文献   

17.

Background  

In the genomic age, gene trees may contain large amounts of data making them hard to read and understand. Therefore, an automated simplification is important.  相似文献   

18.
19.

Background

The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee.

Results

We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6, 084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics.

Conclusions

Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1, 000 genes in a few hours. These are the largest instances that have been solved to optimally to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.
  相似文献   

20.
基因倍增研究进展   总被引:2,自引:0,他引:2  
李鸿健  谭军 《生命科学》2006,18(2):150-154
基因倍增是指DNA片段在基因组中复制出一个或更多的拷贝,这种DNA片段可以是一小段基因组序列、整条染色体,甚至是整个基因组。基因倍增是基因组进化最主要的驱动力之一,是产生具有新功能的基因和进化出新物种的主要原因之一。本文综述了脊椎动物、模式植物和酵母在进化过程中基因倍增研究领域的最新进展,并讨论了基因倍增研究的发展方向。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号