首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 421 毫秒
1.
An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called a (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.  相似文献   

2.
Application of phylogenetic networks in evolutionary studies   总被引:42,自引:0,他引:42  
The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.  相似文献   

3.
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of networks on 2- and 3-leaved networks, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known graph isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that to find a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.  相似文献   

4.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.  相似文献   

5.

Background

Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.

Results

In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores.

Conclusion

The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.  相似文献   

6.
Yu Y  Degnan JH  Nakhleh L 《PLoS genetics》2012,8(4):e1002660
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.  相似文献   

7.

Background  

Phylogenetic trees resulting from molecular phylogenetic analysis are available in Newick format from specialized databases but when it comes to phylogenetic networks, which provide an explicit representation of reticulate evolutionary events such as recombination, hybridization or lateral gene transfer, the lack of a standard format for their representation has hindered the publication of explicit phylogenetic networks in the specialized literature and their incorporation in specialized databases. Two different proposals to represent phylogenetic networks exist: as a single Newick string (where each hybrid node is splitted once for each parent) or as a set of Newick strings (one for each hybrid node plus another one for the phylogenetic network).  相似文献   

8.
Until recently, rigorously reconstructing the many hybrid speciation events in plants has not been practical because of the limited number of molecular markers available for plant phylogenetic reconstruction and the lack of good, biologically based methods for inferring reticulation (network) events. This situation should change rapidly with the development of multiple nuclear markers for phylogenetic reconstruction and new methods for reconstructing reticulate evolution. These developments will necessitate a much greater incorporation of population genetics into phylogenetic reconstruction than has been common. Population genetic events such as gene duplication coupled with lineage sorting and meiotic and sexual recombination have always had the potential to affect phylogenetic inference. For tree reconstruction, these problems are usually minimized by using uniparental markers and nuclear markers that undergo rapid concerted evolution. Because reconstruction of reticulate speciation events will require nuclear markers that lack these characteristics, effects of population genetics on phylogenetic inference will need to be addressed directly. Current models and methods that allow hybrid speciation to be detected and reconstructed are discussed, with a focus on how lineage sorting and meiotic and sexual recombination affect network reconstruction. Approaches that would allow inference of phylogenetic networks in their presence are suggested.  相似文献   

9.
Phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets, respectively. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i.e. uniquely describe) the network that induces it? For the large class of level-1 (phylogenetic) networks we characterize those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. In addition, we show that three known distance measures for comparing phylogenetic networks are in fact metrics on the resulting subclass and give the diameter for two of them. Finally, we investigate the related concept of indistinguishability and also show that many properties enjoyed by level-1 networks are not satisfied by networks of higher level.  相似文献   

10.
系统发育网络目前主要应用于表达种上复杂的网状进化关系、种内个体及群体之间的关系以及相互矛盾数据集的系统发育分析结果3个方面。本文在综述网状进化现象的基础上,介绍了目前用于构建系统发育网络的几种常用方法,包括T-Rex软件包的网状图法,Tcs软件中的统计简约法,Splitstrees4的中间网法、二分裂法和邻接网法。  相似文献   

11.
We present new methods for reconstructing reticulate evolution of species due to events such as horizontal transfer or hybrid speciation; both methods are based upon extensions of Wayne Maddison's approach in his seminal 1997 paper. Our first method is a polynomial time algorithm for constructing phylogenetic networks from two gene trees contained inside the network.We allow the network to have an arbitrary number of reticulations, but we limit the reticulation in the network so that the cycles in the network are node-disjoint ("galled"). Our second method is a polynomial time algorithm for constructing networks with one reticulation, where we allow for errors in the estimated gene trees. Using simulations, we demonstrate improved performance of this method over both NeighborNet and Maddison's method.  相似文献   

12.
13.

Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree T, called a base tree, and adding arcs between arcs of T. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network N, we characterise the support trees of N, that is, the tree-based embeddings of N. We use this characterisation to give an explicit formula for the number of support trees of N when N is binary. This formula is written in terms of the components of a bipartite graph.

  相似文献   

14.
Phylogenetic networks aim to represent the evolutionary history of taxa. Within these, reticulate networks are explicitly able to accommodate evolutionary events like recombination, hybridization, or lateral gene transfer. Although several metrics exist to compare phylogenetic networks, they make several assumptions regarding the nature of the networks that are not likely to be fulfilled by the evolutionary process. In order to characterize the potential disagreement between the algorithms and the biology, we have used the coalescent with recombination to build the type of networks produced by reticulate evolution and classified them as regular, tree sibling, tree child, or galled trees. We show that, as expected, the complexity of these reticulate networks is a function of the population recombination rate. At small recombination rates, most of the networks produced are already more complex than regular or tree sibling networks, whereas with moderate and large recombination rates, no network fit into any of the standard classes. We conclude that new metrics still need to be devised in order to properly compare two phylogenetic networks that have arisen from reticulating evolutionary process.  相似文献   

15.
Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to develop methods to construct rooted phylogenetic networks from triplets, that is rooted trees on three species. However, although triplets determine or encode rooted phylogenetic trees, they do not in general encode rooted phylogenetic networks, which is a potential issue for any such method. Motivated by this fact, Huber and Moulton recently introduced trinets as a natural extension of rooted triplets to networks. In particular, they showed that $\text{ level-1 }$ phylogenetic networks are encoded by their trinets, and also conjectured that all “recoverable” rooted phylogenetic networks are encoded by their trinets. Here we prove that recoverable binary level-2 networks and binary tree-child networks are also encoded by their trinets. To do this we prove two decomposition theorems based on trinets which hold for all recoverable binary rooted phylogenetic networks. Our results provide some additional evidence in support of the conjecture that trinets encode all recoverable rooted phylogenetic networks, and could also lead to new approaches to construct phylogenetic networks from trinets.  相似文献   

16.
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations.  相似文献   

17.
Mardulyn P 《Molecular ecology》2012,21(14):3385-3390
Phylogenetic trees and networks are both used in the scientific literature to display DNA sequence variation at the intraspecific level. Should we rather use trees or networks? I argue that the process of inferring the most parsimonious genealogical relationships among a set of DNA sequences should be dissociated from the problem of displaying this information in a graph. A network graph is probably more appropriate than a strict consensus tree if many alternative, equally most parsimonious, genealogies are to be included. Within the maximum parsimony framework, current phylogenetic inference and network‐building algorithms are both unable to guarantee the finding of all most parsimonious (MP) connections. In fact, each approach can find MP connections that the other does not. Although it should be possible to improve at least the maximum parsimony approach, current implementations of these algorithms are such that it is advisable to use both approaches to increase the probability of finding all possible MP connections among a set of DNA sequences.  相似文献   

18.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.  相似文献   

19.
The evolutionary history of a collection of species is usually represented by a phylogenetic tree. Sometimes, phylogenetic networks are used as a means of representing reticulate evolution or of showing uncertainty and incompatibilities in evolutionary datasets. This is often done using unrooted phylogenetic networks such as split networks, due in part, to the availability of software (SplitsTree) for their computation and visualization. In this paper we discuss the problem of drawing rooted phylogenetic networks as cladograms or phylograms in a number of different views that are commonly used for rooted trees. Implementations of the algorithms are available in new releases of the Dendroscope and SplitsTree programs.  相似文献   

20.
Pedigrees illustrate the genealogical relationships among individuals, and phylogenies do the same for groups of organisms (such as species, genera, etc.). Here, I provide a brief survey of current concepts and methods for calculating and displaying genealogical relationships. These relationships have long been recognized to be reticulating, rather than strictly divergent, and so both pedigrees and phylogenies are correctly treated as networks rather than trees. However, currently most pedigrees are instead presented as “family trees”, and most phylogenies are presented as phylogenetic trees. Nevertheless, the historical development of concepts shows that networks pre-dated trees in most fields of biology, including the study of pedigrees, biology theory, and biology practice, as well as in historical linguistics in the social sciences. Trees were actually introduced in order to provide a simpler conceptual model for historical relationships, since trees are a specific type of simple network. Computationally, trees and networks are a part of graph theory, consisting of nodes connected by edges. In this mathematical context they differ solely in the absence or presence of reticulation nodes, respectively. There are two types of graphs that can be called phylogenetic networks: (1) rooted evolutionary networks, and (2) unrooted affinity networks. There are quite a few computational methods for unrooted networks, which have two main roles in phylogenetics: (a) they act as a generic form of multivariate data display; and (b) they are used specifically to represent haplotype networks. Evolutionary networks are more difficult to infer and analyse, as there is no mathematical algorithm for reconstructing unique historical events. There is thus currently no coherent analytical framework for computing such networks.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号