首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Many different phylogenetic clustering techniques are used currently. One approach is to first determine the topology with a common clustering method and then calculate the branch lengths of the tree. If the resulting tree is not optimal exchanging tree branches can make some local changes in the tree topology. The whole process can be iterated until a satisfactory result has been obtained. The efficiency of this method fully depends on the initially generated tree. Although local changes are made, the optimal tree will never be found if the initial tree is poorly chosen. In this article, genetic algorithms are applied such that the optimal tree can be found even with a bad initial tree topology. This tree generating method is tested by comparing its results with the results of the FITCH program in the PHYLIP software package. Two simulated data sets and a real data set are used.  相似文献   

2.
Placement of the mitochondrial branch on the tree of life has been problematic. Sparse sampling, the uncertainty of how lateral gene transfer might overwrite phylogenetic signals, and the uncertainty of phylogenetic inference have all contributed to the issue. Here we address this issue using a supertree approach and completed genomic sequences. We first determine that a sensible alpha-proteobacterial phylogenetic tree exists and that it can confidently be inferred using orthologous genes. We show that congruence across these orthologous gene trees is significantly better than might be expected by random chance. There is some evidence of horizontal gene transfer within the alpha-proteobacteria, but it appears to be restricted to a minority of genes ( approximately 23%) most of whom ( approximately 74%) can be categorized as operational. This means that placement of the mitochondrion should not be excessively hampered by interspecies gene transfer. We then show that there is a consistently strong signal for placement of the mitochondrion on this tree and that this placement is relatively insensitive to methodological approach or data set. A concatenated alignment was created consisting of 15 mitochondrion-encoded proteins that are unlikely to have undergone any lateral gene transfer in the timeline under consideration. This alignment infers that the sister group of the mitochondria, for the taxa that have been sampled, is the order Rickettsiales.  相似文献   

3.
A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. In a seminal paper, Wang et al.(1) studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on galled-tree. Unfortunately, the algorithm by Wang et al.(1) is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.(8) PowerPoint slides of the conference talk can be found at our website.(7).  相似文献   

4.
The field of microbial phylogenetics has questioned the feasibility of using a tree‐like structure to the describe microbial evolution. This debate centres on two main points. First, because microorganisms are able to transfer genes from one to another in zero generations (horizontal gene transfer, or HGT), the use of molecular characters to perform phylogenetic analyses will yield an erroneous topology and HGT clearly makes the evolution of microorganisms non tree‐like. Second, the use of concatenated gene sequences in a total evidence approach to phylogenetic systematics is a verificationist endeavour, the aim of which is to bolster support. However, the goal of the total evidence approach to phylogenetic research is based in the idea of increasing explanatory power over background knowledge through test and corroboration, rather than to bolster support for nodes in a tree. In this context, the testing of phylogenetic data is a falsificationist endeavour that includes the possibility of not rejecting the null hypothesis that there is no tree‐like structure in molecular phylogenetic data. We discuss several tests that aim to test rigorously the hypothesis that a tree of life exists for microorganisms. We also discuss the philosophical ramifications of background knowledge and corroboration in microbial studies that need to be considered when suggesting that HGT confounds the tree of life. © The Willi Hennig Society 2009.  相似文献   

5.
Phenotypic behavior of a group of organisms can be studied using a range of molecular evolutionary tools that help to determine evolutionary relationships. Traditionally a gene or a set of gene sequences was used for generating phylogenetic trees. Incomplete evolutionary information in few selected genes causes problems in phylogenetic tree construction. Whole genomes are used as remedy. Now, the task is to identify the suitable parameters to extract the hidden information from whole genome sequences that truly represent evolutionary information. In this study we explored a random anchor (a stretch of 100 nucleotides) based approach (ABWGP) for finding distance between any two genomes, and used the distance estimates to compute evolutionary trees. A number of strains and species of Mycobacteria were used for this study. Anchor-derived parameters, such as cumulative normalized score, anchor order and indels were computed in a pair-wise manner, and the scores were used to compute distance/phylogenetic trees. The strength of branching was determined by bootstrap analysis. The terminal branches are clearly discernable using the distance estimates described here. In general, different measures gave similar trees except the trees based on indels. Overall the tree topology reflected the known biology of the organisms. This was also true for different strains of Escherichia coli. A new whole genome-based approach has been described here for studying evolutionary relationships among bacterial strains and species.  相似文献   

6.

Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree T, called a base tree, and adding arcs between arcs of T. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network N, we characterise the support trees of N, that is, the tree-based embeddings of N. We use this characterisation to give an explicit formula for the number of support trees of N when N is binary. This formula is written in terms of the components of a bipartite graph.

  相似文献   

7.
Many phylogenetic algorithms search the space of possible trees using topological rearrangements and some optimality criterion. FastME is such an approach that uses the {em balanced minimum evolution (BME)} principle, which computer studies have demonstrated to have high accuracy. FastME includes two variants: {em balanced subtree prune and regraft (BSPR)} and {em balanced nearest neighbor interchange (BNNI)}. These algorithms take as input a distance matrix and a putative phylogenetic tree. The tree is modified using SPR or NNI operations, respectively, to reduce the BME length relative to the distance matrix, until a tree with (locally) shortest BME length is found. Following computer simulations, it has been conjectured that BSPR and BNNI are consistent, i.e. for an input distance that is a tree-metric, they converge to the corresponding tree. We prove that the BSPR algorithm is consistent. Moreover, even if the input contains small errors relative to a tree-metric, we show that the BSPR algorithm still returns the corresponding tree. Whether BNNI is consistent remains open.  相似文献   

8.
Phenotypic behavior of a group of organisms can be studied using a range of molecular evolutionary tools that help to determine evolutionary relationships. Traditionally a gene or a set of gene sequences was used for generating phylogenetic trees. Incomplete evolutionary information in few selected genes causes problems in phylogenetic tree construction. Whole genomes are used as remedy. Now, the task is to identify the suitable parameters to extract the hidden information from whole genome sequences that truly represent evolutionary information. In this study we explored a random anchor (a stretch of 100 nucleotides) based approach (ABWGP) for finding distance between any two genomes, and used the distance estimates to compute evolutionary trees. A number of strains and species of Mycobacteria were used for this study. Anchor-derived parameters, such as cumulative normalized score, anchor order and indels were computed in a pair-wise manner, and the scores were used to compute distance/phylogenetic trees. The strength of branching was determined by bootstrap analysis. The terminal branches are clearly discernable using the distance estimates described here. In general, different measures gave similar trees except the trees based on indels. Overall the tree topology reflected the known biology of the organisms. This was also true for different strains of Escherichia coli. A new whole genome-based approach has been described here for studying evolutionary relationships among bacterial strains and species.  相似文献   

9.
Ollier S  Couteron P  Chessel D 《Biometrics》2006,62(2):471-477
In recent years, there has been an increased interest in studying the variability of a quantitative life-history trait across a set of species sharing a common phylogeny. However, such studies have suffered from an insufficient development of statistical methods aimed at decomposing the trait variance with respect to the topological structure of the tree. Here we propose a new and generic approach that expresses the topological properties of the phylogenetic tree via an orthonormal basis, which is further used to decompose the trait variance. Such a decomposition provides a structure function, referred to as an "orthogram," which is relevant to characterize in both graphical and statistical aspects the dependence of trait values on the topology of the tree ("phylogenetic dependence"). We also propose four complementary test statistics to be computed from orthogram values that help to diagnose both the intensity and the nature of phylogenetic dependence. The relevance of the method is illustrated by the analysis of three phylogenetic data sets, drawn from the literature and typifying contrasted levels and aspects of phylogenetic dependence. Freely available routines which have been programmed in the R framework are also proposed.  相似文献   

10.
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the 4m possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.  相似文献   

11.
Over 3000 microbial (bacterial and archaeal) genomes have been made publically available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as metagenomics. The utility of these genome sequences is greatly enhanced when we have an understanding of how they are phylogenetically related to each other. Therefore, we here describe our efforts to reconstruct the phylogeny of all available bacterial and archaeal genomes. We identified 24, single-copy, ubiquitous genes suitable for this phylogenetic analysis. We used two approaches to combine the data for the 24 genes. First, we concatenated alignments of all genes into a single alignment from which a Maximum Likelihood (ML) tree was inferred using RAxML. Second, we used a relatively new approach to combining gene data, Bayesian Concordance Analysis (BCA), as implemented in the BUCKy software, in which the results of 24 single-gene phylogenetic analyses are used to generate a “primary concordance” tree. A comparison of the concatenated ML tree and the primary concordance (BUCKy) tree reveals that the two approaches give similar results, relative to a phylogenetic tree inferred from the 16S rRNA gene. After comparing the results and the methods used, we conclude that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.  相似文献   

12.
An important challenge for phylogenetic studies of closely related species is the existence of deep coalescence and gene tree heterogeneity. However, their effects can vary between species and they are often neglected in phylogenetic analyses. In addition, a practical problem in the reconstruction of shallow phylogenies is to determine the most efficient set of DNA markers for a reliable estimation. To address these questions, we conducted a multilocus simulation study using empirical values of nucleotide diversity and substitution rates obtained from a wide range of mammals and evaluated the performance of both gene tree and species tree approaches to recover the known speciation times and topological relationships. We first show that deep coalescence can be a serious problem, more than usually assumed, for the estimation of speciation times in mammals using traditional gene trees. Furthermore, we tested the performance of different sets of DNA markers in the determination of species trees using a coalescent approach. Although the best estimates of speciation times were obtained, as expected, with the use of an increasing number of nuclear loci, our results show that similar estimations can be obtained with a much lower number of genes and the incorporation of a mitochondrial marker, with its high information content. Thus, the use of the combined information of both nuclear and mitochondrial markers in a species tree framework is the most efficient option to estimate recent speciation times and, consequently, the underlying species tree.  相似文献   

13.
Estimating phylogenetic relationships among closely related species can be extremely difficult when there is incongruence among gene trees and between the gene trees and the species tree. Here we show that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism. This approach is applied to a group of montane Melanoplus grasshoppers for which genealogical discordance among loci and incomplete lineage sorting obscures any obvious phylogenetic relationships among species. Unlike traditional treatments where gene trees estimated using standard phylogenetic methods are implicitly equated with the species tree, with the coalescent-based approach the species tree is modeled probabilistically from the estimated gene trees. The estimated species phylogeny (the ESP) is calculated for the grasshoppers from multiple gene trees reconstructed for nuclear loci and a mitochondrial gene. This empirical application is coupled with a simulation study to explore the performance of the coalescent-based approach. Specifically, we test the accuracy of the ESP given the data based on analyses of simulated data matching the multilocus data collected in Melanoplus (i.e., data were simulated for each locus with the same number of base pairs and locus-specific mutational models). The results of the study show that ESPs can be computed using the coalescent-based approach long before reciprocal monophyly has been achieved, and that these statistical estimates are accurate. This contrasts with analyses of the empirical data collected in Melanoplus and simulated data based on concatenation of multiple loci, for which the incomplete lineage sorting of recently diverged species posed significant problems. The strengths and potential challenges associated with incorporating an explicit model of gene-lineage coalescence into the phylogenetic procedure to obtain an ESP, as illustrated by application to Melanoplus, versus concatenation and consensus approaches are discussed. This study represents a fundamental shift in how species relationships are estimated - the relationship between the gene trees and the species phylogeny is modeled probabilistically rather than equating gene trees with a species tree.  相似文献   

14.
One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. The application of Next‐generation sequencing techniques such as RAD‐seq generates thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci in determining the phylogenetic relationships among taxa is unclear. We simulated RAD‐seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of Next‐generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa.  相似文献   

15.
We developed a new approach for the reconstruction of phylogenetic trees using ant colony optimization metaheuristics. A tree is constructed using a fully connected graph and the problem is approached similarly to the well-known traveling salesman problem. This methodology was used to develop an algorithm for constructing a phylogenetic tree using a pheromone matrix. Two data sets were tested with the algorithm: complete mitochondrial genomes from mammals and DNA sequences of the p53 gene from several eutherians. This new methodology was found to be superior to other well-known softwares, at least for this data set. These results are very promising and suggest more efforts for further developments.  相似文献   

16.
We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds Hendy & Penny (1979) J. Mol. Evol. 13. 127--150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151--166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.  相似文献   

17.
Blair JE  Coffey MD  Martin FN 《PloS one》2012,7(5):e37003
To better understand the evolutionary history of a group of organisms, an accurate estimate of the species phylogeny must be known. Traditionally, gene trees have served as a proxy for the species tree, although it was acknowledged early on that these trees represented different evolutionary processes. Discordances among gene trees and between the gene trees and the species tree are also expected in closely related species that have rapidly diverged, due to processes such as the incomplete sorting of ancestral polymorphisms. Recently, methods have been developed for the explicit estimation of species trees, using information from multilocus gene trees while accommodating heterogeneity among them. Here we have used three distinct approaches to estimate the species tree for five Phytophthora pathogens, including P. infestans, the causal agent of late blight disease in potato and tomato. Our concatenation-based "supergene" approach was unable to resolve relationships even with data from both the nuclear and mitochondrial genomes, and from multiple isolates per species. Our multispecies coalescent approach using both Bayesian and maximum likelihood methods was able to estimate a moderately supported species tree showing a close relationship among P. infestans, P. andina, and P. ipomoeae. The topology of the species tree was also identical to the dominant phylogenetic history estimated in our third approach, Bayesian concordance analysis. Our results support previous suggestions that P. andina is a hybrid species, with P. infestans representing one parental lineage. The other parental lineage is not known, but represents an independent evolutionary lineage more closely related to P. ipomoeae. While all five species likely originated in the New World, further study is needed to determine when and under what conditions this hybridization event may have occurred.  相似文献   

18.
Mysticetes or baleen whales are comprised of four groups: Eschrichtiidae, Neobalaenidae, Balaenidae, and Balaenopteridae. Various phylogenetic hypotheses among these four groups have been proposed. Previous studies have not satisfactorily determined relationships among the four groups with a high degree of confidence. The objective of this study is to determine the relationships among the mysticete whales. Mitochondrial and nuclear DNA were sequenced for phylogenetic analysis. Most species relationships determined using these data were well resolved and congruent. Balaenidae is the most basal group and Neobalaenidae is the second most basal and sister group to the balaenopterid-eschrichtiid clade. In this phylogenetic study, the resolution of Eschrichtiidae with two main lineages of Balaenopteridae was problematic. Some data partitions placed this group within the balaenopterids, and other partitions placed it as a sister taxon to the balaenopterids. An additive likelihood approach was used to determine the most optimal trees. Although it was not found in the combined phylogenetic analyses, the "best" tree found under the additive likelihood approach was one with a monophyletic Balaenopteridae.  相似文献   

19.
We propose a new type of unsupervised, growing, self-organizing neural network that expands itself by following the taxonomic relationships that exist among the sequences being classified. The binary tree topology of this neutral network, contrary to other more classical neural network topologies, permits an efficient classification of sequences. The growing nature of this procedure allows to stop it at the desired taxonomic level without the necessity of waiting until a complete phylogenetic tree is produced. This novel approach presents a number of other interesting properties, such as a time for convergence which is, approximately, a lineal function of the number of sequences. Computer simulation and a real example show that the algorithm accurately finds the phylogenetic tree that relates the data. All this makes the neural network presented here an excellent tool for phylogenetic analysis of a large number of sequences. Received: 14 May 1996 / Accepted: 6 August 1996  相似文献   

20.
Cheng Q  Su Z  Zhong Y  Gu X 《Gene》2009,441(1-2):156-162
Recent studies have shown that heterogeneous evolution may mislead phylogenetic analysis, which has been neglected for a long time. We evaluate the effect of heterogeneous evolution on phylogenetic analysis, using 18 fish mitogenomic coding sequences as an example. Using the software DIVERGE, we identify 198 amino acid sites that have experienced heterogeneous evolution. After removing these sites, the rest of sites are shown to be virtually homogeneous in the evolutionary rate. There are some differences between phylogenetic trees built with heterogeneous sites ("before tree") and without heterogeneous sites ("after tree"). Our study demonstrates that for phylogenetic reconstruction, an effective approach is to identify and remove sites with heterogeneous evolution, and suggests that researchers can use the software DIVERGE to remove the influence of heterogeneous evolution before reconstructing phylogenetic trees.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号