Similar Documents
20 similar documents found (search time: 31 ms)
1.
One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. The application of next‐generation sequencing techniques such as RAD‐seq generates thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis, leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci in determining the phylogenetic relationships among taxa is unclear. We simulated RAD‐seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of next‐generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa.
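The sampling design can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each coding-region alignment is a dict of taxon name to aligned sequence, that every locus has the same taxa, and that one window per locus is drawn at a uniformly random start (the abstract does not say how windows were positioned). The function name `simulate_radseq` and the toy loci are hypothetical.

```python
import random

def simulate_radseq(alignments, read_len, seed=0):
    """Draw one read_len-bp window per locus and concatenate the windows per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence; every dict is
    assumed to contain the same taxa, with equal-length sequences within a locus.
    Loci shorter than read_len are skipped.
    """
    rng = random.Random(seed)
    taxa = sorted(alignments[0])
    pieces = {t: [] for t in taxa}
    for aln in alignments:
        length = len(aln[taxa[0]])
        if length < read_len:
            continue  # too short to yield a full-length "read"
        start = rng.randrange(length - read_len + 1)  # uniform random window start
        for t in taxa:
            pieces[t].append(aln[t][start:start + read_len])
    return {t: "".join(parts) for t, parts in pieces.items()}

# Hypothetical toy loci; the study used over 6000 Drosophila coding regions.
loci = [
    {"A": "ACGTACGTAC", "B": "ACGTTCGTAC"},
    {"A": "GGGGCCCCAA", "B": "GGGGCCCTAA"},
]
supermatrix = simulate_radseq(loci, read_len=4, seed=1)
```

Because the same window is cut from every taxon at a locus, the concatenated strings stay column-aligned and can be passed directly to a tree-building program as a single supermatrix.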

2.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.
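The core of gene tree parsimony is reconciliation: each gene-tree node is mapped to the lowest common ancestor (LCA) of its species in a candidate species tree, and a node that maps to the same species node as one of its children implies a duplication. The sketch below counts duplications only (no losses, no posterior sampling of gene trees), which is a simplification of the analyses described above. Trees are assumed to be rooted nested tuples whose leaves are species names, and all function names are hypothetical.

```python
def species_clades(species_tree):
    """Leaf sets of every node of a rooted species tree given as nested tuples."""
    found = []
    def walk(node):
        if isinstance(node, str):
            s = frozenset([node])
        else:
            s = frozenset().union(*(walk(child) for child in node))
        found.append(s)
        return s
    walk(species_tree)
    return found

def lca(clade_list, species):
    # Clades containing `species` are nested, so the smallest one is the LCA.
    return min((c for c in clade_list if species <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """Duplications implied by reconciling a rooted gene tree (leaves = species names)."""
    clade_list = species_clades(species_tree)
    dups = 0
    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(clade_list, frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = lca(clade_list, frozenset().union(*child_maps))
        if here in child_maps:  # same species node as a child: a duplication
            dups += 1
        return here
    walk(gene_tree)
    return dups

def best_species_tree(gene_trees, candidates):
    """Duplication-only gene tree parsimony: minimise total duplications."""
    return min(candidates, key=lambda st: sum(count_duplications(g, st) for g in gene_trees))
```

A real analysis searches tree space rather than scoring a short candidate list, and usually adds a loss cost; the example only makes the scoring criterion concrete.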

3.
Combining data sets with different phylogenetic histories   (Cited: 1; self-citations: 0; citations by others: 1)
The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I propose a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories) until a majority of unlinked data sets support one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis for recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters, high homoplasy, or both) and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but gives an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, the separate, consensus, and combined analyses may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. 
Excision of conflicting taxa is also problematic, in that doing so may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
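Step (3) of the proposed methodology can be sketched as a labelling rule over clades. The sketch assumes all data sets share the same taxa, represents each clade as a `frozenset` of taxon names, and assumes the caller has already decided which clades each separate analysis supports strongly (the abstract does not fix a support threshold). Two clades conflict when they overlap without being nested. The function names are hypothetical.

```python
def compatible(a, b):
    """Two clades are compatible if one contains the other or they are disjoint."""
    return a <= b or b <= a or not (a & b)

def flag_contested(combined_clades, separate_supported):
    """Label each clade of the combined tree 'accepted' or 'questionable'.

    separate_supported: one set of strongly supported clades per unlinked data set.
    A clade is questionable if some data set strongly contradicts it, unless a
    strict majority of the data sets strongly support it.
    """
    n = len(separate_supported)
    labels = {}
    for clade in combined_clades:
        support = sum(clade in s for s in separate_supported)
        contested = any(not compatible(clade, other)
                        for s in separate_supported for other in s)
        labels[clade] = "questionable" if contested and support <= n // 2 else "accepted"
    return labels
```

With two data sets, a clade backed by one and contradicted by the other stays questionable; adding a third data set that also backs it gives a majority and the clade is accepted.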

4.
The statistical estimation of phylogenies is always associated with uncertainty, and accommodating this uncertainty is an important component of modern phylogenetic comparative analysis. The birth–death polytomy resolver is a method of accounting for phylogenetic uncertainty that places missing (unsampled) taxa onto phylogenetic trees, using taxonomic information alone. Recent studies of birds and mammals have used this approach to generate pseudoposterior distributions of phylogenetic trees that are complete at the species level, even in the absence of genetic data for many species. Many researchers have used these distributions of phylogenies for downstream evolutionary analyses that involve inferences on phenotypic evolution, geography, and community assembly. I demonstrate that the use of phylogenies constructed in this fashion is inappropriate for many questions involving traits. Because species are placed on trees at random with respect to trait values, the birth–death polytomy resolver breaks down natural patterns of trait phylogenetic structure. Inferences based on these trees are predictably and often drastically biased in a direction that depends on the underlying (true) pattern of phylogenetic structure in traits. I illustrate the severity of the phenomenon for both continuous and discrete traits using examples from a global bird phylogeny.
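To make the "random with respect to trait values" point concrete, here is a simplified sketch of taxonomy-only placement: an unsampled species is attached to a uniformly chosen edge inside its genus. This is an assumption-laden stand-in, not the birth–death polytomy resolver itself, which also draws branch lengths under a birth–death model; only the topology step is shown. Trees are nested tuples with unique tip labels, and all names are hypothetical.

```python
import random

def leaves(node):
    """Tip labels of a rooted tree written as nested tuples of strings."""
    return [node] if isinstance(node, str) else [x for child in node for x in leaves(child)]

def nodes(node):
    """Every node of the subtree in pre-order; each stands for the edge above it."""
    if isinstance(node, str):
        return [node]
    return [node] + [n for child in node for n in nodes(child)]

def find_clade(node, members):
    """Return the subtree whose tips are exactly `members`, or None."""
    if set(leaves(node)) == members:
        return node
    if isinstance(node, str):
        return None
    for child in node:
        hit = find_clade(child, members)
        if hit is not None:
            return hit
    return None

def graft(node, target, new_taxon):
    """Copy of the tree with new_taxon attached as sister to the subtree `target`."""
    if node == target:
        return (node, new_taxon)
    if isinstance(node, str):
        return node
    return tuple(graft(child, target, new_taxon) for child in node)

def place_missing(tree, genus_members, new_taxon, rng):
    """Attach an unsampled species to a uniformly chosen edge inside its genus.

    Trait values play no role in where the species lands.
    """
    crown = find_clade(tree, set(genus_members))
    if crown is None:
        raise ValueError("genus is not monophyletic in the backbone tree")
    return graft(tree, rng.choice(nodes(crown)), new_taxon)
```

Repeating the placement yields a distribution of trees in which the new species is equally likely to sit next to any congener, whatever their trait values, which is the mechanism the abstract identifies as eroding trait phylogenetic structure.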

5.
Evolutionary relationships among cyst nematodes based on predicted β-tubulin amino acid and DNA sequence data were compared with phylogenies inferred from ribosomal DNA (ITS1, 5.8S gene, ITS2). The β-tubulin amino acid data were highly conserved and not useful for phylogenetic inference at the taxonomic level of genus and species. Phylogenetic trees based on β-tubulin DNA sequence data were better resolved, but the relationships at lower taxonomic levels could not be inferred with confidence. Sequences from single species often appeared in more than one monophyletic clade, indicating the presence of β-tubulin paralogs (confirmed by Southern blot analysis). For a subset of taxa, good congruence between the two data sets was revealed by the presence of the same putative β-tubulin gene paralogs in monophyletic groups on the rDNA tree, corroborating the taxon relationships inferred from ribosomal DNA data.
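The paralogy signal described above (one species' sequences scattered over several clades) can be screened for automatically. A minimal sketch, assuming a rooted gene tree written as nested tuples and a hypothetical mapping from sequence label to species; a flagged species is only a candidate for paralogy, which the study confirmed independently with Southern blots.

```python
def clade_sets(tree):
    """Tip-label sets of all clades in a rooted tree written as nested tuples."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            s = frozenset([node])
        else:
            s = frozenset().union(*(walk(child) for child in node))
        found.add(s)
        return s
    walk(tree)
    return found

def non_monophyletic_species(tree, species_of):
    """Species whose sequences do not form a single clade (a possible paralogy signal)."""
    clades = clade_sets(tree)
    by_species = {}
    for label, species in species_of.items():
        by_species.setdefault(species, set()).add(label)
    return sorted(sp for sp, labels in by_species.items() if frozenset(labels) not in clades)
```

Species represented by a single sequence are trivially monophyletic, so the check is informative only when several copies were sequenced per species.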

6.
ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
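The test's structure (trees mapped to vectors, a separation statistic, a permutation null) can be sketched with the standard library. Two swaps relative to GeneOut are labelled plainly: each tree is encoded as a 0/1 vector of the clades it contains (an assumed embedding), and the SVM separation is replaced by the squared distance between the two group centroids so the sketch needs no machine-learning library. Trees are rooted nested tuples; function names are hypothetical.

```python
import random

def clade_set(tree):
    """Leaf sets of the internal nodes of a rooted tree written as nested tuples."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        found.add(s)
        return s
    walk(tree)
    return found

def vectorize(trees):
    """Map each tree to a 0/1 vector: one coordinate per clade seen in any tree."""
    sets = [clade_set(t) for t in trees]
    features = sorted(set().union(*sets), key=sorted)
    return [[1.0 if f in s else 0.0 for f in features] for s in sets]

def centroid_gap(vectors, labels):
    """Squared Euclidean distance between the mean vectors of groups 0 and 1."""
    dims = len(vectors[0])
    means = []
    for group in (0, 1):
        rows = [v for v, lab in zip(vectors, labels) if lab == group]
        means.append([sum(r[i] for r in rows) / len(rows) for i in range(dims)])
    return sum((a - b) ** 2 for a, b in zip(means[0], means[1]))

def permutation_test(trees_a, trees_b, n_perm=999, seed=0):
    """Observed separation and permutation p-value for two sets of trees."""
    vectors = vectorize(list(trees_a) + list(trees_b))
    labels = [0] * len(trees_a) + [1] * len(trees_b)
    observed = centroid_gap(vectors, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break the link between trees and group labels
        if centroid_gap(vectors, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Any separation statistic can be dropped into the same permutation loop; that is what keeps the test nonparametric.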

7.
Systematists and comparative biologists commonly want to make statements about relationships among taxa that have never been collectively included in any single phylogenetic analysis. Construction of phylogenetic 'supertrees' provides one solution. Supertrees are estimates of phylogeny assembled from sets of smaller estimates (source trees) sharing some but not necessarily all their taxa in common. If certain conditions are met, supertrees can retain all or most of the information from the source trees and also make novel statements about relationships of taxa that do not co-occur on any one source tree. Supertrees have commonly been constructed using subjective and informal approaches, but several explicit approaches have recently been proposed.
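One widely used explicit approach is matrix representation with parsimony (MRP). Its encoding step is sketched below: every non-trivial clade of every source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa on that source tree, and ? for taxa the source tree lacks. The parsimony search on the resulting matrix would be run separately. Source trees are assumed to be rooted nested tuples; function names are hypothetical.

```python
def leafset(tree):
    """All tip labels of a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leafset(child) for child in tree))

def internal_clades(tree):
    """Tip sets of the non-trivial clades (neither a single tip nor the root)."""
    found = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        found.append(s)
        return s
    root = walk(tree)
    return [c for c in found if 1 < len(c) < len(root)]

def mrp_matrix(source_trees):
    """One binary character per source-tree clade: 1 inside, 0 outside, ? absent."""
    taxa = sorted(set().union(*(leafset(t) for t in source_trees)))
    matrix = {t: [] for t in taxa}
    for tree in source_trees:
        present = leafset(tree)
        for clade in internal_clades(tree):
            for t in taxa:
                matrix[t].append("1" if t in clade else ("0" if t in present else "?"))
    return {t: "".join(chars) for t, chars in matrix.items()}
```

For example, source trees ((A,B),C) and ((C,D),B) give A = `1?`, B = `10`, C = `01`, D = `?1`: taxa A and D never co-occur, yet a parsimony tree built from these two characters places them relative to each other.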

8.
Phylogenetic comparative methods play a critical role in our understanding of the adaptive origin of primate behaviors. To incorporate evolutionary history directly into comparative behavioral research, behavioral ecologists rely on strong, well-resolved phylogenetic trees. Phylogenies provide the framework on which behaviors can be compared and homologies can be distinguished from similarities due to convergent or parallel evolution. Phylogenetic reconstructions are also of critical importance when inferring the ancestral state of behavioral patterns and when suggesting the evolutionary changes that behavior has undergone. Improvements in genome sequencing technologies have increased the amount of data available to researchers. Recently, several primate phylogenetic studies have used multiple loci to produce robust phylogenetic trees that include hundreds of primate species. These trees are now commonly used in comparative analyses and there is a perception that we have a complete picture of the primate tree. But how confident can we be in those phylogenies? And how reliable are comparative analyses based on such trees? Herein, we argue that even recent molecular phylogenies should be treated cautiously because they rely on many assumptions and have many shortcomings. Most phylogenetic studies do not model gene tree diversity and can produce misleading results, such as strong support for an incorrect species tree, especially in the case of rapid and recent radiations. We discuss implications that incorrect phylogenies can have for reconstructing the evolution of primate behaviors and we urge primatologists to be aware of the current limitations of phylogenetic reconstructions when applying phylogenetic comparative methods.

9.
Molecular phylogenies of figs and their pollinator wasps   (Cited: 6; self-citations: 0; citations by others: 6)
Abstract. We collected and analysed nucleotide sequence and protein electrophoretic data in order to estimate phylogenies of figs and fig-pollinating wasps at several taxonomic scales. The relatively conserved chloroplast gene rbcL allowed the estimation of the taxonomic position of Ficus relative to other genera within the Moraceae. Further, in conjunction with chloroplast tRNA spacer genes, rbcL sequences allowed the partial resolution of the phylogenetic associations of fig species from different parts of the world with representatives from all the recognized subgenera of Ficus. The phylogeny of the corresponding wasp species that pollinate most of those taxa was estimated using mitochondrial COI-COII and 12S ribosomal genes. At a fine scale, the phylogenies of species within two subgenera of figs growing in Panama (Urostigma and Pharmacosycea) were estimated using protein electrophoretic data. The phylogeny of the corresponding pollinator wasp species was estimated using COII sequence data. Although we need to extend the taxa sampled and augment the molecular database, the host and pollinator phylogenies show a high degree of congruence and the results support the predominance of strict-sense co-evolution between figs and their pollinator wasps at both global and fine scales.

10.
Nye TM. Systematic Biology, 2008, 57(5): 785-794.
Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree." A meta-tree clusters phylogenies with similar topologies together in the same way that a phylogeny clusters species with similar DNA sequences. Leaf nodes on a meta-tree correspond to the original set of phylogenies given by some analysis, whereas interior nodes correspond to certain consensus topologies. The construction of meta-trees is motivated by analogy with construction of a most parsimonious tree for DNA data, but instead of using DNA letters, in a meta-tree the characters are partitions or splits of the set of taxa. An efficient algorithm for meta-tree construction is described that makes use of a known relationship between the majority consensus and parsimony in terms of gain and loss of splits. To illustrate these ideas meta-trees are constructed for two datasets: a set of gene trees for species of yeast and trees from a bootstrap analysis of a set of gene trees in ray-finned fish. A software tool for constructing meta-trees and comparing alternative phylogenies is available online, and the source code can be obtained from the author.
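The majority consensus that the meta-tree algorithm builds on can be computed by treating each tree as a set of split "characters" and keeping those present in more than half of the trees. A minimal sketch, simplified to rooted clades rather than the unrooted splits used in the article, with trees as nested tuples and hypothetical function names; it does not construct the meta-tree itself.

```python
from collections import Counter

def clades(tree):
    """Non-root internal clades of a rooted tree written as nested tuples."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        found.add(s)
        return s
    root = walk(tree)
    return found - {root}

def majority_clades(trees):
    """Clades occurring in more than half of the trees (majority-rule consensus)."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, n in counts.items() if n > len(trees) / 2}
```

Clades that each occur in more than half of the trees are always pairwise compatible, so the returned set is guaranteed to describe a single (possibly unresolved) tree.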

11.
The relative contribution of taxon number and gene number to accuracy in phylogenetic inference is a major issue in phylogenetics and of central importance to the choice of experimental strategies for the successful reconstruction of a broad sketch of the tree of life. Maximization of the number of taxa sampled is the strategy favored by most phylogeneticists, although its necessity remains the subject of debate. Vast increases in gene number are now possible due to advances in genomics, but large numbers of genes will be available for only modest numbers of taxa, raising the question of whether such genome-scale phylogenies will be robust to the addition of taxa. To examine the relative benefit of increasing taxon number or gene number to phylogenetic accuracy, we have developed an assay that utilizes the symmetric difference tree distance as a measure of phylogenetic accuracy. We have applied this assay to a genome-scale data matrix containing 106 genes from 14 yeast species. Our results show that increasing taxon number correlates with a slight decrease in phylogenetic accuracy. In contrast, increasing gene number has a significant positive effect on phylogenetic accuracy. Analyses of an additional taxon-rich data matrix from the same yeast clade show that taxon number does not have a significant effect on phylogenetic accuracy. The positive effect of gene number and the lack of effect of taxon number on phylogenetic accuracy are also corroborated by analyses of two data matrices from mammals and angiosperm plants, respectively. We conclude that, for typical data sets, the number of genes utilized may be a more important determinant of phylogenetic accuracy than taxon number.
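The symmetric difference (Robinson-Foulds) distance used as the accuracy measure counts the non-trivial splits found in one tree but not the other. A minimal sketch for unrooted trees written as nested tuples (the rooting of the tuple is ignored); the function names are hypothetical, and accuracy would be the distance from an estimated tree to the reference tree, with 0 meaning identical topologies.

```python
def splits(tree):
    """Non-trivial bipartitions of an unrooted tree written as nested tuples.

    Each split is stored as the side that excludes the alphabetically first taxon,
    so the same bipartition gets the same key however the tuple happens to be rooted.
    """
    clade_list = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        clade_list.append(s)
        return s
    taxa = walk(tree)
    ref = min(taxa)
    out = set()
    for c in clade_list:
        side = taxa - c if ref in c else c
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial and empty splits
            out.add(side)
    return out

def symmetric_difference(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))
```

For instance, (((A,B),C),(D,E)) and (((A,B),D),(C,E)) share the split AB|CDE but differ in ABC|DE versus ABD|CE, giving a distance of 2.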

12.
13.

Background

Species number, functional traits, and phylogenetic history all contribute to characterizing the biological diversity in plant communities. The phylogenetic component of diversity has been particularly difficult to quantify in species-rich tropical tree assemblages. The compilation of previously published (and often incomplete) data on evolutionary relationships of species into a composite phylogeny of the taxa in a forest, through such programs as Phylomatic, has proven useful in building community phylogenies although often of limited resolution. Recently, DNA barcodes have been used to construct a robust community phylogeny for nearly 300 tree species in a forest dynamics plot in Panama using a supermatrix method. In that study, sequence data from three barcode loci were used to generate a well-resolved species-level phylogeny.

Methodology/Principal Findings

Here we expand upon this earlier investigation and present results on the use of a phylogenetic constraint tree to generate a community phylogeny for a diverse, tropical forest dynamics plot in Puerto Rico. This enhanced method of phylogenetic reconstruction ensures the congruence of the barcode phylogeny with broadly accepted hypotheses on the phylogeny of flowering plants (i.e., APG III) regardless of the number and taxonomic breadth of the taxa sampled. We also compare maximum parsimony versus maximum likelihood estimates of community phylogenetic relationships as well as evaluate the effectiveness of one- versus two- versus three-gene barcodes in resolving community evolutionary history.
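In the study the constraint is enforced during tree search; a related check, sketched here as an assumption rather than the authors' procedure, verifies after the fact that a tree honours a constraint tree. Each constraint clade is first restricted to the sampled taxa (which is why sampling breadth does not matter) and must then appear as a clade in the tree. Both trees are assumed to be rooted nested tuples sharing tip names; polytomies in the constraint are allowed, and the function names are hypothetical.

```python
def clade_info(tree):
    """Return (all tips, set of internal-node tip sets) for a nested-tuple tree."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        found.add(s)
        return s
    return walk(tree), found

def violated_constraints(tree, constraint):
    """Constraint clades, restricted to the taxa in `tree`, that `tree` does not contain."""
    taxa, tree_clades = clade_info(tree)
    _, constraint_clades = clade_info(constraint)
    missing = set()
    for c in constraint_clades:
        restricted = c & taxa  # drop constraint taxa that were not sampled
        if 1 < len(restricted) < len(taxa) and restricted not in tree_clades:
            missing.add(restricted)
    return missing
```

An empty result means the tree is congruent with every constraint clade that is informative for the sampled taxa.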

Conclusions/Significance

As first demonstrated in the Panamanian forest dynamics plot, the results for the Puerto Rican plot illustrate that highly resolved phylogenies derived from DNA barcode sequence data combined with a constraint tree based on APG III are particularly useful in comparative analysis of phylogenetic diversity and will enhance research on the interface between community ecology and evolution.

14.
Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not necessarily correspond to the species phylogeny. One commonly accepted strategy to cope with this problem is to sequence many genes; the way in which to analyze the resulting collection of genes is somewhat more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences in the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method that can circumvent this problem by generating a supernetwork as opposed to a supertree. Here we present an alternative way for generating supernetworks called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa into gene trees. The resulting trees are subsequently used to generate consensus networks, networks that generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation to the matrix representation with parsimony (MRP) supertree method and Z-closure, and demonstrate that it provides a useful complementary tool.
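The abstract does not give Q-imputation's scoring rule, so the sketch below shows only the raw material such a method works with: the quartet topologies a tree displays. A quartet {a, b, c, d} is resolved as ab|cd when some split of the tree puts exactly two of the four taxa on each side. Trees are assumed to be unrooted and written as nested tuples; function names are hypothetical.

```python
from itertools import combinations

def splits(tree):
    """Tips and non-trivial bipartitions of an unrooted tree written as nested tuples.

    Each split is keyed by the side that excludes the alphabetically first taxon.
    """
    clade_list = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(walk(child) for child in node))
        clade_list.append(s)
        return s
    taxa = walk(tree)
    ref = min(taxa)
    out = set()
    for c in clade_list:
        side = taxa - c if ref in c else c
        if 2 <= len(side) <= len(taxa) - 2:
            out.add(side)
    return taxa, out

def quartets(tree):
    """Map each 4-taxon subset to the quartet split ab|cd that the tree displays."""
    taxa, split_set = splits(tree)
    result = {}
    for four in combinations(sorted(taxa), 4):
        for s in split_set:
            inside = frozenset(t for t in four if t in s)
            if len(inside) == 2:  # this split separates the four taxa 2|2
                result[four] = frozenset({inside, frozenset(four) - inside})
                break
    return result  # unresolved quartets (e.g. under a polytomy) are simply absent
```

A fully resolved tree on n taxa displays all n-choose-4 quartets, so a gene tree missing a taxon can be compared with others on the quartets it does contain.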

15.
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1 alpha (EF-1 alpha) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1 alpha data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.  
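The single-gene control can be sketched as column resampling. The abstract does not spell out the exact procedure, so this assumes the original columns are kept and extra columns, drawn with replacement from the same gene, are appended until the augmented alignment matches a target length (for example, that of the combined data set). Alignments are dicts of taxon to sequence; `bootstrap_augment` is a hypothetical name.

```python
import random

def bootstrap_augment(alignment, target_length, seed=0):
    """Append columns resampled with replacement until the alignment has target_length sites."""
    rng = random.Random(seed)
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    if target_length < n_sites:
        raise ValueError("target_length must be at least the current alignment length")
    # Whole columns are resampled so every taxon receives the same site.
    extra = [rng.randrange(n_sites) for _ in range(target_length - n_sites)]
    return {t: alignment[t] + "".join(alignment[t][i] for i in extra) for t in taxa}
```

Because resampled columns only repeat signal already in the gene, support gained this way is a baseline; the abstract's result is that an independent gene raised support at least as much as, and for EF-1 alpha much more than, this augmentation.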

16.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

17.
ABSTRACT: BACKGROUND: The evolutionary relationships of closely related species have long been of interest to biologists since these species experienced different evolutionary processes in a relatively short period of time. Comparison of phylogenies inferred from DNA sequences with differing inheritance patterns, such as mitochondrial, autosomal, and X and Y chromosomal loci, can provide more comprehensive inferences of the evolutionary histories of species. Gibbons, especially the genus Hylobates, are particularly intriguing as they consist of multiple closely related species which emerged rapidly and live in close geographic proximity. Our current understanding of relationships among Hylobates species is largely based on data from the maternally-inherited mitochondrial DNAs (mtDNAs). RESULTS: To infer the paternal histories of gibbon taxa, we sequenced multiple Y chromosomal loci from 26 gibbons representing 10 species. As expected, we find levels of sequence variation some five times lower than observed for the mitochondrial genome (mtgenome). Although our Y chromosome phylogenetic tree shows relatively low resolution compared to the mtgenome tree, our results are consistent with the monophyly of gibbon genera suggested by the mtgenome tree. In comparing the molecular dating of divergences and the branching patterns of the phylogenetic trees between mtgenome and Y chromosome data, we found that: 1) the inferred divergence estimates were more recent for the Y chromosome than for the mtgenome, and 2) the species H. lar and H. pileatus are reciprocally monophyletic in the mtgenome phylogeny, but an H. pileatus individual falls into the H. lar Y chromosome clade. CONCLUSIONS: Based on the ~6.4 kb of Y chromosomal DNA sequence data generated for each of the 26 individuals in this study, we provide molecular inferences on gibbon and particularly on Hylobates evolution complementary to those from mtDNA data.
Overall, our results illustrate the utility of comparative studies of loci with different inheritance patterns for investigating potential sex specific processes on the evolutionary histories of closely related taxa, and emphasize the need for further sampling of gibbons of known provenance.

18.
Elongation factor 1 alpha (EF-1 alpha) is a highly conserved ubiquitous protein involved in translation that has been suggested to have desirable properties for phylogenetic inference. To examine the utility of EF-1 alpha as a phylogenetic marker for eukaryotes, we studied three properties of EF-1 alpha trees: congruency with other phylogenetic markers, the impact of species sampling, and the degree of substitutional saturation occurring between taxa. Our analyses indicate that the EF-1 alpha tree is congruent with some other molecular phylogenies in identifying both the deepest branches and some recent relationships in the eukaryotic line of descent. However, the topology of the intermediate portion of the EF-1 alpha tree, occupied by most of the protist lineages, differs for different phylogenetic methods, and bootstrap values for branches are low. Most problematic in this region is the failure of all phylogenetic methods to resolve the monophyly of two higher-order protistan taxa, the Ciliophora and the Alveolata. JACKMONO analyses indicated that the impact of species sampling on bootstrap support for most internal nodes of the eukaryotic EF-1 alpha tree is extreme. Furthermore, a comparison of observed versus inferred numbers of substitutions indicates that multiple overlapping substitutions have occurred, especially on the branch separating the Eukaryota from the Archaebacteria, suggesting that the rooting of the eukaryotic tree on the diplomonad lineage should be treated with caution. Overall, these results suggest that the phylogenies obtained from EF-1 alpha are congruent with other molecular phylogenies in recovering the monophyly of groups such as the Metazoa, Fungi, Magnoliophyta, and Euglenozoa. However, the interrelationships between these and other protist lineages are not well resolved.
This lack of resolution may result from the combined effects of poor taxonomic sampling, relatively few informative positions, large numbers of overlapping substitutions that obscure phylogenetic signal, and lineage-specific rate increases in the EF-1 alpha data set. It is also consistent with the nearly simultaneous diversification of major eukaryotic lineages implied by the "big-bang" hypothesis of eukaryote evolution.
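A saturation check of this kind compares observed differences with model-corrected substitution counts: when the corrected distance keeps growing while the observed proportion of differences levels off, multiple hits are hiding signal. The abstract does not name the model used, so this sketch swaps in the simplest one, a Jukes-Cantor-style correction for n equally likely states (n = 4 for DNA, n = 20 for proteins). Function names are hypothetical.

```python
import math

def p_distance(seq1, seq2):
    """Observed proportion of differing sites, skipping positions with a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def corrected_distance(p, n_states=4):
    """Jukes-Cantor-style correction: d = -((n-1)/n) * ln(1 - p * n/(n-1)).

    Returns infinity once p reaches the saturation limit (n - 1) / n.
    """
    limit = (n_states - 1) / n_states
    if p >= limit:
        return math.inf
    return -limit * math.log(1 - p / limit)
```

For DNA, an observed p of 0.3 corresponds to about 0.38 substitutions per site, and the gap widens rapidly as p approaches 0.75; plotting p against the corrected distance for all taxon pairs gives the usual saturation plot.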

19.
How much horizontal gene transfer (HGT) between species influences bacterial phylogenomics is a controversial issue. This debate, however, lacks any quantitative assessment of the impact of HGT on phylogenies and of the ability of tree-building methods to cope with such events. I introduce a Markov model of genome evolution with HGT, accounting for the constraints on time: an HGT event can only occur between concomitantly living species. This model is used to simulate multigene sequence data sets with or without HGT. The consequences of HGT on phylogenomic inference are analyzed and compared to other well-known phylogenetic artefacts. It is found that supertree methods are quite robust to HGT, keeping high levels of performance even when gene trees are largely incongruent with each other. Gene tree incongruence per se is not indicative of HGT. HGT, however, removes the (otherwise observed) positive relationship between sequence length and gene tree congruence to the estimated species tree. Surprisingly, when applied to a bacterial and a eukaryotic multigene data set, this criterion rejects the HGT hypothesis for the former, but not the latter data set.
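The time constraint in the model can be illustrated on its own: given branches of a dated species tree, a transfer at time t may only join two distinct lineages that both exist at t. This sketch covers only that constraint, not the Markov model of genome evolution itself. Branches are assumed to be a dict of name to (start, end) times measured forward from the root; the branch names and function names are hypothetical.

```python
import random

def lineages_alive(branches, t):
    """Names of branches whose interval (start, end] contains time t (time runs forward from the root)."""
    return sorted(name for name, (start, end) in branches.items() if start < t <= end)

def sample_hgt(branches, t, rng):
    """Choose a donor and a different recipient among lineages that coexist at time t."""
    alive = lineages_alive(branches, t)
    if len(alive) < 2:
        return None  # a transfer needs two contemporaneous lineages
    donor, recipient = rng.sample(alive, 2)
    return donor, recipient

# Hypothetical tree: the stem of (A,B) splits at t = 1.0; C runs from the root to the present.
branches = {"stem_AB": (0.0, 1.0), "C": (0.0, 3.0), "A": (1.0, 3.0), "B": (1.0, 3.0)}
event = sample_hgt(branches, 0.5, random.Random(0))
```

At t = 0.5 only `stem_AB` and `C` coexist, so a transfer from or to A or B is impossible at that time; after the split at t = 1.0 all three tips are candidates.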

20.
External morphological characters are the basis of our understanding of diversity and species relationships in many darter clades. The past decade has seen the publication of many studies utilizing mtDNA sequence data to investigate darter phylogenetics, but only recently have nuclear genes been used to investigate darter relationships. Despite a long tradition of use in darter systematics, few studies have examined the phylogenetic utility of external morphological characters in estimating relationships among species in darter clades. We present DNA sequence data from the mitochondrial cytochrome b (cytb) gene, the nuclear encoded S7 intron 1, and discretely coded external morphological characters for all 20 species in the darter clade Nothonotus. Bayesian phylogenetic analyses result in phylogenies that are in broad agreement with previous studies. The cytb gene tree is well resolved, while the nuclear S7 gene tree lacks phylogenetic resolution and node support, and is characterized by a lack of reciprocal monophyly for many of the Nothonotus species. The phylogenies resulting from analysis of the morphological dataset lack resolution, but nodes present are found in the cytb and S7 gene trees. The highest resolution and node support is found in the Bayesian combined data phylogeny. Based on our results we propose continued exploration of the phylogenetic utility of external morphological characters in other darter clades. Given the extensive lack of reciprocal monophyly of species observed in the S7 gene tree we predict that nuclear gene sequences may have limited utility in intraspecific phylogeographic studies of Nothonotus darters.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP filing: 京ICP备09084417号