Similar literature
 20 similar articles found (search time: 31 ms)
1.
Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.

2.
Investigations into the phylogenetics of closely related animal species are dominated by the use of mitochondrial DNA (mtDNA) sequence data. However, the near-ubiquitous use of mtDNA to infer phylogeny among closely related animal lineages is tempered by an increasing number of studies that document high rates of transfer of mtDNA genomes among closely related species through hybridization, leading to substantial discordance between phylogenies inferred from mtDNA and nuclear gene sequences. In addition, the recent development of methods that simultaneously infer a species phylogeny and estimate divergence times, while accounting for incongruence among individual gene trees, has ushered in a new era in the investigation of phylogeny among closely related species. In this study we assess whether DNA sequence data sampled from a modest number of nuclear genes can resolve relationships of a species-rich clade of North American freshwater teleost fishes, the darters. We articulate and expand on a recently introduced method to infer a time-calibrated multi-species coalescent phylogeny using the computer program *BEAST. Our analyses result in a well-resolved and strongly supported time-calibrated darter species tree. Contrary to the expectation that mtDNA will provide greater phylogenetic resolution than nuclear gene data, the darter species tree inferred exclusively from nuclear genes exhibits a higher frequency of strongly supported nodes than the mtDNA time-calibrated gene tree.

3.
Choosing among alternative trees of multigene families   (total citations: 4; self-citations: 0; citations by others: 4)
Estimation of gene trees is the first step in testing alternative hypotheses about the evolution of multigene families. The standard practice for inferring gene family history is to construct trees that meet some objective criteria based on the fit of the character state changes (nucleotide or amino acid changes) to the gene tree. Unfortunately, analysis of character state data can be misleading. In addition, this approach ignores information about the relationships of the species from which the genes have been sampled. In this paper I explore using statistics of fit between the character data and gene trees and the reconciliation of the gene and species trees for choosing among alternative evolutionary hypotheses of gene families. In particular, I advocate a two-pronged strategy for choosing among alternative gene trees. First, the character data are used to define a set of acceptable gene trees (i.e., trees that are not significantly different from the minimum length tree). Next, the set of acceptable gene trees is reconciled with a known species tree, and the gene tree requiring the fewest number of gene duplications and losses is adopted as the best estimate of evolutionary history. The approach is illustrated using three gene families: BMP, EGR, and LDH.
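The second step of the strategy above, scoring candidate gene trees against a known species tree, can be sketched as follows. This is a minimal illustration that uses the standard LCA-mapping rule and counts duplications only (losses are ignored), so it is not the paper's implementation. The nested-tuple tree encoding and the function names `clusters` and `count_duplications` are assumptions made for this example; each gene-tree leaf is labelled with the species it was sampled from.

```python
def clusters(tree):
    """Return (leaf set of this node, leaf sets of every node in the subtree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaf_set, all_sets = frozenset(), []
    for child in tree:
        child_set, child_all = clusters(child)
        leaf_set |= child_set
        all_sets.extend(child_all)
    all_sets.append(leaf_set)
    return leaf_set, all_sets


def count_duplications(gene_tree, species_tree):
    """Count duplication nodes implied by reconciling a rooted gene tree."""
    _, sp_clusters = clusters(species_tree)

    def lca(species):
        # Smallest species-tree cluster that contains every species in the set.
        return min((c for c in sp_clusters if species <= c), key=len)

    def visit(node):
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, lca(leaf), 0
        species, child_maps, dups = frozenset(), [], 0
        for child in node:
            child_species, child_map, child_dups = visit(child)
            species |= child_species
            child_maps.append(child_map)
            dups += child_dups
        here = lca(species)
        # A gene node is a duplication if it maps to the same species-tree
        # node as one of its children.
        if here in child_maps:
            dups += 1
        return species, here, dups

    return visit(gene_tree)[2]


# Hypothetical data: pick the acceptable gene tree with the fewest duplications.
species_tree = (("A", "B"), "C")
candidates = [(("A", "C"), "B"), (("A", "B"), "C")]
best = min(candidates, key=lambda g: count_duplications(g, species_tree))
print(best)
```

A fuller version would add the loss count to the score, since the abstract ranks trees by duplications and losses together.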

4.
Nye TM. Systematic Biology, 2008, 57(5): 785-794
Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree." A meta-tree clusters phylogenies with similar topologies together in the same way that a phylogeny clusters species with similar DNA sequences. Leaf nodes on a meta-tree correspond to the original set of phylogenies given by some analysis, whereas interior nodes correspond to certain consensus topologies. The construction of meta-trees is motivated by analogy with construction of a most parsimonious tree for DNA data, but instead of using DNA letters, in a meta-tree the characters are partitions or splits of the set of taxa. An efficient algorithm for meta-tree construction is described that makes use of a known relationship between the majority consensus and parsimony in terms of gain and loss of splits. To illustrate these ideas meta-trees are constructed for two datasets: a set of gene trees for species of yeast and trees from a bootstrap analysis of a set of gene trees in ray-finned fish. A software tool for constructing meta-trees and comparing alternative phylogenies is available online, and the source code can be obtained from the author.
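To make the idea of splits as characters more concrete, the sketch below extracts the non-trivial splits of each tree and keeps those present in more than half of the trees, which is the majority-consensus ingredient the abstract mentions. It is not the meta-tree algorithm itself. The nested-tuple encoding, the function names, and the convention of storing each split as the side that excludes the alphabetically first taxon are all assumptions made for this example.

```python
from collections import Counter


def leaf_sets(tree, out):
    """Append the leaf set below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset()
    for child in tree:
        below |= leaf_sets(child, out)
    out.append(below)
    return below


def splits(tree):
    """Return the non-trivial splits of a tree, viewed as unrooted."""
    out = []
    taxa = leaf_sets(tree, out)
    ref = min(taxa)  # reference taxon used to give each split one canonical side
    result = set()
    for side in out:
        if ref in side:
            side = taxa - side
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
    return result


def majority_splits(trees):
    """Splits that occur in more than half of the input trees."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, n in counts.items() if n > len(trees) / 2}


# Hypothetical input: three trees on the same four taxa.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
print(majority_splits(trees))
```

All input trees are assumed to share the same taxon set, as in the abstract's setting of a fixed set of taxa.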

5.
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERA, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion (the implied mutational pattern at two key active sites in the SERA proteins) were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.

6.
Liu L, Yu L. Systematic Biology, 2011, 60(5): 661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.
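The distance matrix described above can be sketched in a few lines. This is an illustration under stated assumptions, not the NJ(st) software: each gene tree is an unrooted tree given as an edge list, one gene copy is sampled per species, the internode distance is taken to be the number of internal nodes on the path between two leaves, and a pair is averaged only over the gene trees that contain both species. The function names are hypothetical, and the neighbor-joining step that would follow is omitted.

```python
from collections import deque


def tree_from_edges(edges):
    """Build an adjacency map for an unrooted tree from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def internode_distances(adj):
    """Internal nodes on the path between every pair of leaves of one tree."""
    leaves = sorted(n for n, nb in adj.items() if len(nb) == 1)
    dist = {}
    for src in leaves:
        # Breadth-first search gives the number of edges from src to each node.
        edges = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in edges:
                    edges[nxt] = edges[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = edges[dst] - 1  # edges minus one = internal nodes
    return dist


def average_internode_distance(gene_trees):
    """Average each pairwise distance over the gene trees containing the pair."""
    totals, counts = {}, {}
    for adj in gene_trees:
        for pair, d in internode_distances(adj).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Hypothetical gene trees: AB|CD and AC|BD, with internal nodes "u" and "v".
t1 = tree_from_edges([("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")])
t2 = tree_from_edges([("A", "u"), ("C", "u"), ("u", "v"), ("B", "v"), ("D", "v")])
print(average_internode_distance([t1, t2]))
```

The resulting dictionary would then be passed to any neighbor-joining implementation to obtain the unrooted species tree estimate.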

7.
Liu L, Pearl DK. Systematic Biology, 2007, 56(3): 504-514
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

8.
The estimation of a robust phylogeny is a necessary first step in understanding the biological diversification of the platyrrhines. Although the most recent phylogenies are generally robust, they differ from one another in the relationship between Aotus and other genera as well as in the relationship between Pitheciidae and other families. Here, we used coding and non-coding sequences to infer the species tree and embedded gene trees of the platyrrhine genera using the Bayesian Markov chain Monte Carlo method for the multispecies coalescent (*BEAST) for the first time and to compare the results with those of a Bayesian concatenated phylogenetic analysis. Our species tree, based on all available sequences, shows a closer phylogenetic relationship between Atelidae and Cebidae and a closer relationship between Aotus and the Cebidae clade. The posterior probabilities are lower for these conflicting tree nodes compared to those in the concatenated analysis; this finding could be explained by some gene trees showing no concordant topologies between Aotus and the other genera. Moreover, the topology of our species tree also differs from the findings of previous molecular and morphological studies regarding the position of Aotus. The existence of discrepancies between morphological data, gene trees and the species tree is widely reported and can be related to processes such as incomplete lineage sorting or selection. Although these processes are common in species trees with low divergence, they can also occur in species trees with deep and rapid divergence. The sources of the inconsistency of morphological and molecular traits with the species tree could be a main focus of further research on platyrrhines.

9.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

10.

Background  

Multilocus phylogenies can be used to infer the species tree of a group of closely related species. In species trees, the nodes represent the actual separation between species, thus providing essential information about their evolutionary history. In addition, multilocus phylogenies can help in analyses of species delimitation, gene flow and genetic differentiation within species. However, few adequate markers are available for such studies.

11.
URec is a software tool based on the concept of unrooted reconciliation. It can be used to reconcile a set of unrooted gene trees with a rooted species tree or a set of rooted species trees. Moreover, it computes the detailed distribution of gene duplications and gene losses in a species tree. It can be used to infer optimal species phylogenies for a given set of gene trees. URec is implemented in C++ and can be easily compiled under Unix and Windows systems. Availability: Software is freely available for download from our website at http://bioputer.mimuw.edu.pl/~gorecki/urec. This webpage also contains Windows executables and a number of advanced examples with explanations.

12.
The problem of inferring confidence sets of gene trees is discussed without assuming that the substitution model or the branching pattern of any of the investigated trees is correct. In this case, widely used methods to compare genealogies can give highly contradicting results. Here, three methods to infer confidence sets that are robust against model misspecification are compared, including a new approach based on estimating the confidence in a specific tree using expected-likelihood weights. The power of the investigated methods is studied by analysing HIV-1 and mtDNA sequence data as well as simulated sequences. Finally, guidelines for choosing an appropriate method to compare multiple gene trees are provided.

13.
A previous phylogenetic study of paralogous nuclear low-copy granule-bound starch synthase (GBSSI) gene sequences from polyploid and diploid species in Geinae indicated that the clade has experienced two major allopolyploid events in its history. These were estimated to have occurred several million years ago. In this extended study we test if the reticulate phylogenetic hypothesis for Geinae can be maintained when additional sequences are added. The results are compatible with the hypothesis and strengthen it in minor aspects. We also attempt to identify extant members of one of the inferred ancestral lineages of the allopolyploids. On the basis of previous molecular phylogenies, one specific group has been proposed to be the descendants of this taxon. However, none of the additional paralogues belong to this ancestral lineage. A general method is proposed for converting a bifurcating gene tree, with multiple paralogous low-copy gene sequences from allopolyploid taxa, into a reticulate species tree.

14.
The construction and interpretation of gene trees is fundamental in molecular systematics. If the gene is defined in a historical (coalescent) sense, there can be multiple gene trees within the single contiguous set of nucleotides, and attempts to construct a single tree for such a sequence must deal with homoplasy created by conflict among divergent histories. On a larger scale, incongruence is expected among gene tree topologies at different loci of individuals within sexually reproducing species, and it has been suggested that this discordance can be used to delimit species. A practical concern for such topological methods is that polymorphisms may be maintained through numerous cladogenic events; this polymorphism problem is less of a concern for nontopological approaches to species delimitation using molecular data. Although a central theoretical concern in molecular systematics is discordance between a given gene tree and the true "species tree," the primary empirical problem faced in reconstructing taxic phylogeny is incongruence among the trees inferred from different sequences. Linkage relationships limit character independence and thus have important implications for handling multiple data sets in phylogenetic analysis, particularly at the species level, where incongruence among different historically associated loci is expected. Gene trees can also be reconstructed for loci that influence phenotypic characters, but there is at best a tenuous relationship between phenotypic homoplasy and homoplasy in such gene trees. Nevertheless, expression patterns and orthology relationships of genes involved in the expression of phenotypes can in theory provide criteria for homology assessment of morphological characters.

15.
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.

16.
BEST implements a Bayesian hierarchical model to jointly estimate gene trees and the species tree from multilocus sequences. It provides a new option for estimating species phylogenies within the popular Bayesian phylogenetic program MrBayes. The technique of simulated annealing is adopted along with Metropolis coupling as performed in MrBayes to improve the convergence rate of the Markov Chain Monte Carlo algorithm. AVAILABILITY: http://www.stat.osu.edu/~dkp/BEST.

17.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
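The four-species statement can be made concrete with a short worked calculation. This is a sketch based on standard coalescent reasoning rather than on formulas given in the abstract; the taxon labels a, b, c, d and the symbol x (the length, in coalescent units, of the internal edge of the unrooted species tree ab|cd) are introduced here for illustration.

```latex
% Unrooted species tree ab|cd, internal edge of length x (coalescent units),
% one gene sampled per species.
\begin{align}
P(ab|cd) &= 1 - \tfrac{2}{3}\, e^{-x}, \\
P(ac|bd) &= P(ad|bc) = \tfrac{1}{3}\, e^{-x}.
\end{align}
% Inverting the first line recovers x from the frequency of the matching quartet:
\begin{equation}
x = -\ln\!\left( \tfrac{3}{2}\,\bigl(1 - P(ab|cd)\bigr) \right).
\end{equation}
```

Because P(ab|cd) exceeds 1/3 whenever x > 0, the most frequent unrooted gene tree matches the unrooted species tree, and x can be recovered from its frequency. The three probabilities depend only on x: for a rooted species tree ((a,b),(c,d)), x is the sum of the two internal branch lengths, while for (((a,b),c),d) it is the length of the branch above (a,b). Neither the root position nor the remaining branch lengths enter, which is consistent with the abstract's statement that only part of the branch-length information is identifiable.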

18.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

19.
MOTIVATION: Inferring species phylogenies with a history of gene losses and duplications is a challenging and important task in computational biology. This problem can be solved by duplication-loss models in which the primary step is to reconcile a rooted gene tree with a rooted species tree. Most modern methods of phylogenetic reconstruction (from sequences) produce unrooted gene trees. This limitation leads to the problem of transforming an unrooted gene tree into a rooted tree, and then reconciling rooted trees. The main questions are 'What is the biological interpretation of a chosen rooting?', 'Can we efficiently find the optimal rootings?', 'Is the optimal rooting unique?'. RESULTS: In this paper we present a model for reconciling an unrooted gene tree with a rooted species tree, which is based on choosing the rooting that has minimal reconciliation cost. Our analysis leads to the surprising property that all the minimal rootings have identical distributions of gene duplications and gene losses in the species tree. It implies, in our opinion, that the concept of an optimal rooting is very robust, and thus biologically meaningful. Also, it has nice computational properties. We present a linear time and space algorithm for computing optimal rooting(s). This algorithm was used in two different ways to reconstruct the optimal species phylogeny of five known yeast genomes from approximately 4700 gene trees. Moreover, we determined locations (history) of all gene duplications and gene losses in the final species tree. It is interesting to note that the top five species trees are the same for both methods. AVAILABILITY: Software and documentation are freely available from http://bioputer.mimuw.edu.pl/~gorecki/urec

20.
Recent computational advances provide novel opportunities to infer species trees based on multiple independent loci. Thus, single gene trees no longer need suffice as proxies for species phylogenies. Several methods have been developed to deal with the challenges posed by incomplete and stochastic lineage sorting. In this study, we employed four Bayesian methods to infer the phylogeny of a clade of 11 recently diverged oriole species within the genus Icterus. We obtained well-resolved and mostly congruent phylogenies using a set of seven unlinked nuclear intron loci and sampling multiple individuals per species. Most notably, Bayesian concordance analysis generally agreed well with concatenation; the two methods agreed fully on eight of nine nodes. The coalescent-based method BEAST further supported six of these eight nodes. The fourth method used, BEST, failed to converge despite exhaustive efforts to optimize the tree search. Overall, the results obtained by new species tree methods and concatenation generally corroborate our findings from previous analyses and data sets. However, we found striking disagreement between mitochondrial and nuclear DNA involving relationships within the northern oriole group. Our results highlight the danger of reliance on mtDNA alone for phylogenetic inference. We demonstrate that in spite of low variability and incomplete lineage sorting, multiple nuclear loci can produce largely congruent phylogenies based on multiple species tree methods, even for very closely-related species.
