Similar articles
 10 similar articles found (search time: 62 ms)
1.
Summary We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree, since no tree can exist with a length less than the lower bound. The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 131130 7045768798 9603440625 topologically distinct trees that can be constructed from this data set. The method described in this paper identifies 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It may not always be possible to prove a tree minimal, but the method will give an upper and a lower bound on the length of minimal trees.
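The idea of bounding tree length by partitioning the characters can be illustrated with a minimal sketch. It assumes one simple bound that is not spelled out in the abstract: a subset of characters showing m distinct combined states across the taxa needs at least m - 1 substitutions on any tree, so summing m - 1 over a partition of the characters gives a lower bound on total length. The function name `partition_lower_bound` and the toy sequences are hypothetical, and the sketch does not reproduce the matching algorithm of the paper.

```python
def partition_lower_bound(sequences, partition):
    """Lower bound on tree length from a partition of the character columns.

    sequences: one equal-length string per taxon.
    partition: list of lists of column indices; each column appears at most once.
    Assumption: each subset contributes (distinct combined states - 1).
    """
    bound = 0
    for subset in partition:
        # Combined state of each taxon, restricted to the columns in this subset.
        patterns = {tuple(seq[c] for c in subset) for seq in sequences}
        bound += len(patterns) - 1
    return bound


# Toy data (hypothetical): two characters that conflict with each other.
toy = ["AA", "AC", "CA", "CC"]
singles = partition_lower_bound(toy, [[0], [1]])  # each character on its own
paired = partition_lower_bound(toy, [[0, 1]])     # both characters as one subset
print(singles, paired)
```

In this toy case pairing the two characters raises the bound from 2 to 3, which is the sense in which a well-chosen pairing can account for substitutions that single characters miss.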

2.
Vos RA. Systematic Biology, 2003, 52(3): 368-373
The existence of multiple likelihood maxima necessitates algorithms that explore a large part of the tree space. However, because of computational constraints, stepwise addition-based tree-searching methods do not allow for this exploration in reasonable time. Here, I present an algorithm that increases the speed at which the likelihood landscape can be explored. The iterative algorithm combines the computational speed of distance-based tree construction methods, used to arrive at approximations of the global optimum, with the accuracy of optimality-criterion-based branch-swapping methods, used to improve on the result of the starting tree. The algorithm moves between local optima by iteratively perturbing the tree landscape through a process of reweighting randomly drawn samples of the underlying sequence data set. Tests on simulated and real data sets demonstrated that the optimal solution obtained using stepwise addition-based heuristic searches was found faster using the algorithm presented here. Tests on a previously published data set that established the presence of tree islands under maximum likelihood demonstrated that the algorithm identifies the same tree islands in less time than that needed using stepwise addition. The algorithm can be readily applied using standard software for phylogenetic inference.

3.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, in particular those that aim to recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

4.
Abstract Absolute criteria for evaluating cladistic analyses are useful, not only because cladistic algorithms impose structure, but also because applications of cladistic results demand some assessment of the degree of corroboration of the cladogram. Here, a means of quantitative evaluation is presented based on tree length. The length of the most-parsimonious tree reflects the degree to which the observed characters co-vary such that a single tree topology can explain shared character states among the taxa. This “cladistic covariation” can be quantified by comparing the length of the most parsimonious tree for the observed data set to that found for data sets with random covariation of characters. A random data set is defined as one in which the original number of characters and their character states are maintained, but for each character, the states are randomly reassigned to the taxa. The cladistic permutation tail probability, PTP, is defined as the estimate of the proportion of times that a tree can be found as short or shorter than the original tree. Significant cladistic covariation exists if the PTP is less than a prescribed value, for example, 0.05. In case studies based on molecular and morphological data sets, application of the PTP shows that:
  • 1 In the comparison of four different molecular data sets for orders of mammals, the sequence data set for alpha hemoglobin does not have significant cladistic covariation, while that for alpha crystallin is highly significant. However, when each data set was reduced to the 11 common taxa in order to standardize comparison, reduced levels of cladistic covariation, with no clear superiority of the alpha crystallin data, were found. Morphological data for these 11 taxa had a highly significant PTP, producing a tree roughly congruent with those for the three molecular sets with marginal or significant PTP values. Merging of all data sets, with the exclusion of the poorly structured alpha hemoglobin data, produced a data set with a significant PTP, and provides an estimate of the phylogenetic relationships among these 11 orders of mammals.
  • 2 In an analysis of lactalbumin and lysozyme DNA sequence data for four taxa, purine-pyrimidine coding yields a data set with significant cladistic covariation, while other codings fail. The data for codon position 3 taken alone exhibit the strongest cladistic covariation.
  • 3 A data set based on flavonoids in taxa of Polygonum initially yields a significant PTP; however, deletion of identically scored taxa leaves no significant cladistic covariation.
  • 4 For mitochondrial DNA data on population genome types for four species of the crested newt, there is significant cladistic covariation for the set of all genome types, and among the five mtDNA genome types within one of the species. However, a conditional PTP test that assumes species monophyly shows that no significant cladistic covariation exists among the four species for these data.
  • 5 In an application of the test to a group of freshwater insects, as preliminary to biological monitoring, individual subsets of the taxonomic data representing larval, pupal, and adult stages had non-significant PTPs, while the complete data set showed significant cladistic structure.
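The permutation test defined in this abstract can be sketched for a toy case. The sketch below assumes four taxa, so that the three unrooted topologies can be searched exhaustively, unordered characters scored with Fitch parsimony, and a PTP estimate that counts the observed data set as one replicate. These choices, and all function names, are illustrative assumptions rather than the procedure used in the paper.

```python
import random

# The three unrooted topologies for four taxa (indices 0..3), as nested pairs.
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch(tree, states):
    """Return (state set, substitution count) for one character on a tree."""
    if isinstance(tree, int):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def min_tree_length(matrix):
    """Length of the shortest tree; matrix[t][c] is character c in taxon t."""
    n_chars = len(matrix[0])
    return min(
        sum(fitch(tree, [row[c] for row in matrix])[1] for c in range(n_chars))
        for tree in TOPOLOGIES
    )


def ptp(matrix, n_permutations=99, seed=0):
    """Return (observed length, estimated permutation tail probability)."""
    rng = random.Random(seed)
    observed = min_tree_length(matrix)
    n_taxa, n_chars = len(matrix), len(matrix[0])
    as_short = 1  # assumption: the observed data set counts as one replicate
    for _ in range(n_permutations):
        columns = []
        for c in range(n_chars):
            # Reassign the states of this character to the taxa at random.
            column = [matrix[t][c] for t in range(n_taxa)]
            rng.shuffle(column)
            columns.append(column)
        permuted = [[col[t] for col in columns] for t in range(n_taxa)]
        if min_tree_length(permuted) <= observed:
            as_short += 1
    return observed, as_short / (n_permutations + 1)


# Toy matrix (hypothetical): six characters that all support the same split.
length, p = ptp(["AAAAAA", "AAAAAA", "CCCCCC", "CCCCCC"])
print(length, p)
```

For real data sets the exhaustive search over `TOPOLOGIES` would have to be replaced by a heuristic tree search, since the number of topologies grows very quickly with the number of taxa.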

5.
Large amounts of population-scale genetic variation data are being collected. One potentially important biological problem is to infer the genealogical history of a population from these genetic variation data. Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of the genome and possibly with a different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve on previous work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data sets. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy, that its inference error rate is often lower, and that it can handle large data sets.

6.
The molecular relationships of placental mammals have attracted great interest in recent years. However, two crucial and conflicting hypotheses remain, one with respect to the position of the root of the eutherian tree and the other the relationship between the orders Rodentia, Lagomorpha (rabbits, hares), and Primates. Although most mitochondrial (mt) analyses have suggested that rodents have a basal position in the eutherian tree, some nuclear data in combination with mt-rRNA genes have placed the root on the so-called African clade or on a branch that includes this clade and the Xenarthra (e.g., anteater and armadillo). In order to generate a new and independent set of molecular data for phylogenetic analysis, we have established cDNA sequences from different tissues of various mammalian species. With this in mind, we have identified and sequenced 8 housekeeping genes with moderately fast rates of evolution from 22 placental mammals, representing 11 orders. In order to determine the root of the eutherian tree, the same genes were also sequenced for 3 marsupial species, which were used as an outgroup. Inconsistent with the analyses of nuclear + mt-rRNA gene data, the current data set did not favor a basal position of the African clade or Xenarthra in the eutherian tree. Similarly, by joining rodents and lagomorphs on the same basal branch (Glires hypothesis), the data set is also inconsistent with the tree commonly favored in mtDNA analyses. The analyses of the currently established sequences have helped the examination of problematic parts of the eutherian tree, and at the same time they caution against claims that basal eutherian relationships have been conclusively settled.

7.
8.
9.
Abstract: The stability of each clade resolved by a data set can be assessed as the minimum number of characters that, when removed, cause resolution of the clade to be lost; a clade is regarded as having been lost when it does not occur in the strict consensus tree. The clade stability index (CSI) is the ratio of this minimum number of characters to the number of informative characters in the data set. The CSI of a clade can range from 0 (absence from the consensus tree of the complete data set) to 1 (all informative characters must be removed for the clade to fail to be resolved). Minimum character removal scores are discoverable by a procedure known as successive character removal, in which separate cladistic analyses are conducted of all possible data sets derived by the removal of individual characters and character combinations of successively increasing number.
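Successive character removal and the resulting CSI can be sketched as follows. The cladistic analysis itself is abstracted into a caller-supplied predicate, `clade_resolved`, which is a hypothetical stand-in for running a parsimony search on the kept characters and checking the strict consensus tree. The function names and the toy predicate are assumptions made for illustration only.

```python
from itertools import combinations


def min_characters_to_lose_clade(characters, clade_resolved):
    """Smallest number of removed characters after which the clade is lost.

    characters: unique names of the informative characters.
    clade_resolved(kept): hypothetical predicate; True if the clade still
    appears in the strict consensus tree for the kept characters.
    """
    characters = list(characters)
    if not clade_resolved(tuple(characters)):
        return 0  # clade absent from the consensus of the complete data set
    # Try removals of increasing size: singles, then pairs, then triples, ...
    for k in range(1, len(characters) + 1):
        for removed in combinations(characters, k):
            kept = tuple(c for c in characters if c not in removed)
            if not clade_resolved(kept):
                return k
    return len(characters)


def clade_stability_index(characters, clade_resolved):
    """CSI = minimum removal count / number of informative characters."""
    characters = list(characters)
    return min_characters_to_lose_clade(characters, clade_resolved) / len(characters)


# Toy predicate (hypothetical): the clade survives while any of c1..c3 is kept.
supporting = {"c1", "c2", "c3"}
csi = clade_stability_index(
    ["c1", "c2", "c3", "c4", "c5"],
    lambda kept: any(c in supporting for c in kept),
)
print(csi)
```

The number of subsets examined grows combinatorially with the number of characters, so an exhaustive search of this kind is only practical for small data sets or small removal counts.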

10.
Kozak et al. (2015, Syst. Biol., 64: 505) portrayed the inference of evolutionary history among Heliconius and allied butterfly genera as a particularly difficult problem for systematics due to prevalent gene conflict caused by interspecific reticulation. To control for this, Kozak et al. conducted a series of multispecies coalescent phylogenetic analyses that they claimed revealed pervasive conflict among markers, but ultimately chose as their preferred hypothesis a phylogenetic tree generated by the traditional supermatrix approach. Intrigued by this seemingly contradictory set of conclusions, we conducted further analyses focusing on two prevalent aspects of the data set: missing data and the uneven contribution of phylogenetic signal among markers. Here, we demonstrate that Kozak et al. overstated their findings of reticulation and that evidence of gene-tree conflict is largely lacking. The distribution of intrinsic homoplasy and incongruence homoplasy in their data set does not follow the pattern expected if phylogenetic history had been obscured by pervasive horizontal gene flow; in fact, noise within individual gene partitions is ten times higher than the incongruence among gene partitions. We show that the patterns explained by Kozak et al. as a result of reticulation can be accounted for by missing data and homoplasy. We also find that although the preferred topology is resilient to missing data, measures of support are sensitive to, and strongly eroded by, too many empty cells in the data matrix. Perhaps more importantly, we show that when some taxa are missing almost all characters, adding more genes to the data set provides little or no increase in support for the tree.
