Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
In this paper, we propose a new method (uninode coding) for coding duplicate (paralogous) genes to infer species trees. Uninode coding incorporates data from duplicated and unduplicated gene copies in phylogenetic analyses of taxa. Uninode coding utilizes global parsimony through the inclusion of both duplicated and unduplicated gene copies, allows one to code all data sources from a taxon into a single terminal, and overcomes problems of character dependence among duplicated and unduplicated gene copies. We present an example of uninode coding using the phytochrome A and phytochrome C data from a study by Donoghue and Mathews.

2.
Simmons and Freudenstein have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by the method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this discussion we introduce a number of recent developments of gene tree parsimony methods overlooked by Simmons and Freudenstein. Finally, we present a re-analysis of data from that produces a more reasonable phylogeny than that found by Simmons and Freudenstein, suggesting that gene tree parsimony outperforms uninode coding, at least on these data.

3.
Conserved genes have found their way into the mainstream of molecular systematics. Many of these genes are members of multigene families. A difficulty with using single genes of multigene families for phylogenetic inference is that genes from one species may be paralogous to those from another taxon. We focus attention on this problem using heat shock 70 (HSP70) genes. Using polymerase chain reaction techniques with genomic DNA, we isolated and sequenced 123 distinct sequences from 12 species of sharks. Phylogenetic analysis indicated that the sequences cluster with constitutively expressed cytoplasmic heat shock-like genes. Three highly divergent gene clades were sampled. A number of similar sequences were sampled from each species within each distinct gene clade. Comparison of published species trees with an HSP70 gene tree inferred using Bayesian phylogenetic analysis revealed several cases of gene duplication and differential sorting of gene lineages within this group of sharks. Gene tree parsimony based on the objective criteria of duplication and losses showed that previously published hypotheses of species relationships and two novel hypotheses based on Bayesian phylogenetics were concordant with the history of HSP70 gene duplication and loss. By contrast, two published hypotheses based on morphological data were not significantly different from the null hypothesis of a random association between species relatedness and the HSP70 gene tree. These results suggest that gene tree parsimony using data from multigene families can be used for inferring species relationships or testing published alternative hypotheses. More importantly, the results suggest that systematic studies relying on phylogenetic inferences from HSP70 genes may be plagued by unrecognized paralogy of sampled genes.
Our results underscore the distinction between gene and species trees and highlight an underappreciated source of discordance between gene trees and organismal phylogeny, i.e., unrecognized paralogy of sampled genes.

4.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

5.
The kinesin superfamily across eukaryotes was used to examine how incorporation of gap characters scored from conserved regions shared by all members of a gene family and incorporation of amino acid and gap characters scored from lineage‐specific regions affect gene‐tree inference of the gene family as a whole. We addressed these two questions in the context of two different densities of sequence sampling, four alignment programs, and two methods of tree construction. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene‐tree inference, even for divergent sequences. Second, gene regions that are not conserved among all or most sequences sampled should not be automatically discarded without evaluation of potential phylogenetic signal that may be contained in gap and/or sequence characters. Third, among the four alignment programs evaluated using their default alignment parameters, Clustal may be expected to output alignments that result in the greatest gene‐tree resolution and support. Yet, this high resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, this same conclusion regarding resolution and support holds for Bayesian gene‐tree analyses relative to parsimony‐jackknife gene‐tree analyses. We suggest that a more conservative approach, such as aligning the sequences using DIALIGN‐T or MAFFT, analyzing the appropriate characters using parsimony, and assessing branch support using the jackknife, is more appropriate for inferring gene trees of divergent gene families. © The Willi Hennig Society 2007.

6.
7.
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.

8.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes (θ), and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or θ is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing θ also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N_e), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for θ is misspecified.
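
As a rough illustration of why short species trees make gene tree discordance likely, the sketch below evaluates the standard three-species coalescent result, in which a gene tree topology differs from the species tree with probability (2/3)·exp(-t) for an internal branch of length t in coalescent units. This is not the simulation design used in the study; the function name and the example branch lengths are assumptions chosen for illustration.

```python
import math


def discordance_probability(t: float) -> float:
    """Probability that a three-taxon gene tree topology differs from the
    species tree, given an internal branch of length t in coalescent units
    (generations divided by 2*N_e for a diploid population)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-t) the two sister lineages fail to coalesce on the
    # internal branch; two of the three equally likely resolutions in the
    # ancestral population are then discordant.
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 4.0):
        print(f"t = {t:>4}: P(discordance) = {discordance_probability(t):.3f}")
```

Under this formula the probability falls monotonically as t grows, which matches the abstract's observation that all methods improve as the species tree becomes longer.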

9.
AFLPs (and to a lesser extent ISSRs and RAPDs) are increasingly being used for phylogenetic inference among closely related species. Presence/absence characters for each AFLP allele treat all absences as homologous to one another. With three or more alleles, terminals are grouped by their shared absence of alleles in character-based phylogenetic-inference methods in a manner that is not redundant with their shared presence of an alternative allele. We conducted simulations to quantify how severe the negative effect of using presence/absence characters of individual bands is for phylogenetic inference relative to standard multistate characters. We examined alternative tree topologies, relative branch lengths, numbers of characters, rates of evolution, and numbers of alternative alleles, using both parsimony and Nei-and-Li distance analyses. Multistate parsimony generally outperformed presence/absence parsimony, which in turn outperformed Nei-and-Li distance. Increasing the character-state space (i.e., the number of alternative character states available) was found to be advantageous for all three methods of analysis examined, but was most advantageous for multistate parsimony. However, the advantage of multistate parsimony relative to Nei-and-Li distance decreased when applied to more divergent characters. More parsimony-informative variation generally alleviated the problem associated with scoring multistate characters as presence/absence characters. The ensemble consistency index was lower for presence/absence characters relative to multistate characters.
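
The recoding problem described above can be made concrete with a small sketch. It converts one multistate character (a locus with three alleles) into one presence/absence column per allele, which shows how terminals carrying different alleles end up sharing a "0". The data layout, the function name `to_presence_absence`, and the toy taxa are all assumptions for illustration and are not taken from the study.

```python
def to_presence_absence(matrix, missing="?"):
    """Recode multistate characters as one binary column per observed allele.

    matrix maps each taxon to a list of allele states, one per locus.
    Returns (columns, recoded): columns lists (locus_index, allele) pairs and
    recoded maps each taxon to a string of '0', '1', or the missing symbol.
    """
    taxa = list(matrix)
    n_loci = len(next(iter(matrix.values())))
    columns = []
    for locus in range(n_loci):
        alleles = sorted({matrix[t][locus] for t in taxa} - {missing})
        columns.extend((locus, allele) for allele in alleles)
    recoded = {}
    for taxon in taxa:
        row = []
        for locus, allele in columns:
            state = matrix[taxon][locus]
            if state == missing:
                row.append(missing)
            else:
                row.append("1" if state == allele else "0")
        recoded[taxon] = "".join(row)
    return columns, recoded


if __name__ == "__main__":
    # One locus, three alleles (a, b, c) across five hypothetical terminals.
    multistate = {"A": ["a"], "B": ["a"], "C": ["b"], "D": ["b"], "E": ["c"]}
    columns, binary = to_presence_absence(multistate)
    print(columns)
    for taxon, row in binary.items():
        print(taxon, row)
```

In the toy data, the column for allele "a" scores C, D, and E identically as absent even though C and D carry "b" and E carries "c"; that shared absence is the extra, potentially misleading grouping signal the abstract refers to.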

10.
Conoesucidae (Trichoptera, Insecta) are restricted to SE Australia, Tasmania and New Zealand. The family includes 42 described species in 12 genera, and each genus is endemic to either New Zealand or Australia. Although monophyly has been previously assumed, no morphological characters have been proposed to represent synapomorphies for the group. We collected molecular data from two mitochondrial genes (16S and cytochrome oxidase I), one nuclear gene (elongation factor 1-α) (2237–2277 bp in total), and 12 morphological characters to produce the first phylogeny of the family. We combined the molecular and morphological characters and performed both a maximum parsimony analysis and a Bayesian analysis to test the monophyly of the family, and to hypothesize the phylogeny among its genera. The parsimony analysis revealed a single most parsimonious tree with Conoesucidae being a monophyletic taxon and sistergroup to the Calocidae. The Bayesian inference produced a distribution of trees, the consensus of which is supported with posterior probabilities of 100% for 15 out of 22 possible ingroup clades including the most basal branch of the family, indicating strong support for a monophyletic Conoesucidae. The most parsimonious tree and the tree from the Bayesian analysis were identical except that the ingroup genus Pycnocentria changed position by jumping to a neighbouring clade. Based on the assumption that the ancestral conoesucid species was present on both New Zealand and Australia, a biogeographical analysis using the dispersal-vicariance criteria demonstrated that one or two (depending on which of the two phylogenetic reconstructions was applied) sympatric speciation events took place on New Zealand prior to a single, late dispersal from New Zealand to Australia.

11.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

12.
The structural genes for nitrogenase, nifK, nifD, and nifH, are crucial for nitrogen fixation. Previous phylogenetic analysis of the amino acid sequence of nifH suggested that this gene had been horizontally transferred from a proteobacterium to the gram-positive/cyanobacterial clade, although the confounding effects of paralogous comparisons made interpretation of the data difficult. An additional test of nif gene horizontal transfer using nifD was made, but the NifD phylogeny lacked resolution. Here nif gene phylogeny is addressed with a phylogenetic analysis of a third and longer nif gene, nifK. As part of the study, the nifK gene of the key taxon Frankia was sequenced. Parsimony and some distance analyses of the nifK amino acid sequences provide support for vertical descent of nifK, but other distance trees provide support for the lateral transfer of the gene. Bootstrap support was found for both hypotheses in all trees; the nifK data do not definitively favor one or the other hypothesis. A parsimony analysis of NifH provides support for horizontal transfer in accord with previous reports, although bootstrap analysis also shows some support for vertical descent of the orthologous nifH genes. A wider sampling of taxa and more sophisticated methods of phylogenetic inference are needed to understand the evolution of nif genes. The nif genes may also be powerful phylogenetic tools. If nifK evolved by vertical descent, it provides strong evidence that the cyanobacteria and proteobacteria are sister groups to the exclusion of the firmicutes, whereas 16S rRNA sequences are unable to resolve the relationships of these three major eubacterial lineages.

13.
The phylogenetic placement of the monotypic crab plover Dromas ardeola (Aves, Charadriiformes) remains controversial. Phylogenetic analyses of anatomical and behavioral traits using phenetic and cladistic methods of tree inference have resulted in conflicting tree topologies, suggesting a close association of Dromas to members of different suborders and lineages within Charadriiformes. Here, we revisited the issue by applying Bayesian and parsimony methods of tree inference to 2,012 anatomical and 5,183 molecular characters from a set of 22 shorebird genera (including Turnix). Our results suggest that Bayesian analysis of anatomical characters does not resolve the phylogenetic relationship of shorebirds with strong statistical support. In contrast, Bayesian and parsimony tree inference from molecular data provided much stronger support for the phylogenetic relationships within shorebirds, and supported a sister relationship of Dromas to Glareolidae (pratincoles and coursers), in agreement with previously published DNA-DNA hybridization studies.

14.
Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures - e.g., maximum parsimony (MP) or maximum likelihood (ML) - are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.

15.
MOTIVATION: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree. RESULTS: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees. AVAILABILITY: http://www.genetics.wustl.edu/eddy/forester.
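
The general idea of labelling gene-tree nodes by comparison with a trusted species tree can be sketched with the classical LCA-mapping criterion: map every gene-tree node to the smallest species-tree clade containing all species below it, and call the node a duplication when it maps to the same clade as one of its children. The code below is a deliberately naive illustration of that criterion and is not the algorithm or the software described in the abstract; the nested-tuple tree encoding, the function names, and the toy data are assumptions.

```python
def clusters(tree):
    """Return the leaf set of every node of a nested-tuple tree, in post-order
    (so the last entry is the leaf set of the root of `tree`)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]
    out.append(leaves)
    return out


def lca_map(species_set, species_clusters):
    """Smallest species-tree clade that contains every species in species_set."""
    return min((c for c in species_clusters if species_set <= c), key=len)


def infer_events(gene_tree, gene_to_species, species_tree):
    """Label each internal gene-tree node as 'speciation' or 'duplication'.

    Returns a post-order list of (sorted gene leaves below the node, label).
    """
    sp_clusters = clusters(species_tree)
    events = []

    def visit(node):
        if isinstance(node, str):
            species = frozenset([gene_to_species[node]])
            return [node], lca_map(species, sp_clusters)
        genes, child_maps = [], []
        for child in node:
            child_genes, child_map = visit(child)
            genes.extend(child_genes)
            child_maps.append(child_map)
        here = lca_map(frozenset().union(*child_maps), sp_clusters)
        # A node mapped to the same clade as one of its children cannot be
        # explained by speciation alone, so it is scored as a duplication.
        label = "duplication" if here in child_maps else "speciation"
        events.append((tuple(sorted(genes)), label))
        return genes, here

    visit(gene_tree)
    return events


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
    gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
    for genes, label in infer_events(gene_tree, gene_to_species, species_tree):
        print(label, genes)
```

In the toy example, the node joining the two (A, B) gene pairs is the only duplication: both of its children already map to the species clade {A, B}. Scanning all clades for every node keeps the sketch short at the cost of efficiency.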

16.
One of the lasting controversies in phylogenetic inference is the degree to which specific evolutionary models should influence the choice of methods. Model‐based approaches to phylogenetic inference (likelihood, Bayesian) are defended on the premise that without explicit statistical models there is no science, and parsimony is defended on the grounds that it provides the best rationalization of the data, while refraining from assigning specific probabilities to trees or character‐state reconstructions. Authors who favour model‐based approaches often focus on the statistical properties of the methods and models themselves, but this is of only limited use in deciding the best method for phylogenetic inference; such a decision also requires considering the conditions of evolution that prevail in nature. Another approach is to compare the performance of parsimony and model‐based methods in simulations, which traditionally have been used to defend the use of models of evolution for DNA sequences. Some recent papers, however, have promoted the use of model‐based approaches to phylogenetic inference for discrete morphological data as well. These papers simulated data under models already known to be unfavourable to parsimony, and modelled morphological evolution as if it evolved just like DNA, with probabilities of change for all characters changing in concert along tree branches. The present paper discusses these issues, showing that under reasonable and less restrictive models of evolution for discrete characters, equally weighted parsimony performs as well as or better than model‐based methods, and that parsimony under implied weights clearly outperforms all other methods.

17.
As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative characters are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not been addressed. Here we prove that parameters for several such models, with finite state spaces of arbitrary size, are identifiable, provided the tree has at least eight leaves. If the tree topology is already known, then seven leaves suffice for identifiability of the numerical parameters. The method of proof involves first inferring a full distribution of both parsimony-informative and non-informative pattern joint probabilities from the parsimony-informative ones, using phylogenetic invariants. The failure of identifiability of the tree parameter for four-taxon trees is also investigated.
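
For readers unfamiliar with ascertainment-bias correction, the conditional distribution that such models entail can be written schematically as below. The notation is assumed for illustration and is not taken from the paper: x is an observed site pattern, \mathcal{I} is the set of patterns that can be observed (variable patterns, or only parsimony-informative ones), T is the tree, and \theta collects the numerical parameters.

```latex
\[
  \Pr\bigl(x \mid x \in \mathcal{I};\, T, \theta\bigr)
  \;=\;
  \frac{\Pr(x \mid T, \theta)}
       {\sum_{y \in \mathcal{I}} \Pr(y \mid T, \theta)},
  \qquad x \in \mathcal{I}.
\]
```

When \mathcal{I} is the set of variable patterns, the denominator equals one minus the total probability of the constant patterns. The identifiability question in the abstract is whether T and \theta can be recovered from this conditional distribution alone.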

18.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
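
The optimization criterion itself can be illustrated with a toy sketch: score each candidate species tree by the total number of duplications it implies across a set of gene trees, and keep the candidate with the lowest score. Real GTP software searches tree space heuristically and far more efficiently; here the candidates are simply listed by hand. The nested-tuple encoding, the function names, and the example trees are assumptions for illustration, and gene-tree leaves are labelled directly with species names to keep the example short.

```python
def node_clusters(tree):
    """Leaf sets of all nodes of a nested-tuple tree, in post-order."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = node_clusters(child)
        out.extend(sub)
        leaves |= sub[-1]
    out.append(leaves)
    return out


def duplication_cost(gene_tree, species_tree):
    """Number of gene-tree nodes scored as duplications under LCA mapping."""
    sp_clusters = node_clusters(species_tree)

    def lca(species_set):
        return min((c for c in sp_clusters if species_set <= c), key=len)

    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            return lca(frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:  # same clade as a child: duplication
            count += 1
        return here

    visit(gene_tree)
    return count


def total_cost(gene_trees, species_tree):
    """Sum of duplication costs over all gene trees."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


def best_species_tree(gene_trees, candidates):
    """Candidate species tree with the smallest total duplication cost."""
    return min(candidates, key=lambda s: total_cost(gene_trees, s))


if __name__ == "__main__":
    candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
    gene_trees = [
        (("A", "B"), "C"),
        ((("A", "B"), ("A", "B")), "C"),
        (("A", "C"), "B"),
    ]
    for s in candidates:
        print(s, total_cost(gene_trees, s))
    print("best:", best_species_tree(gene_trees, candidates))
```

With these three toy gene trees, the candidate (("A", "B"), "C") needs two duplications in total, against three and four for the alternatives, so it is selected.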

19.

Background  

The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.

20.
Toward the goal of recovering the phylogenetic relationships among elapid snakes, we separately found the shortest trees from the amino acid sequences for the venom proteins phospholipase A2 and the short neurotoxin, collectively representing 32 species in 16 genera. We then applied a method we term gene tree parsimony for inferring species trees from gene trees that works by finding the species tree which minimizes the number of deep coalescences or gene duplications plus unsampled sequences necessary to fit each gene tree to the species tree. This procedure, which is both logical and generally applicable, avoids many of the problems of previous approaches for inferring species trees from gene trees. The results support a division of the elapids examined into sister groups of the Australian and marine (laticaudines and hydrophiines) species, and the African and Asian species. Within the former clade, the sea snakes are shown to be diphyletic, with the laticaudines and hydrophiines having separate origins. This finding is corroborated by previous studies, which provide support for the usefulness of gene tree parsimony.

