Similar articles
 20 similar articles found.
1.

Background

Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions: the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales). This question remains a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.

Methodology

We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.

Conclusions

We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and the order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.
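The supermatrix construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences and the use of `?` as the missing-data symbol are all assumptions made for illustration. Each gene partition is an alignment keyed by taxon, and a taxon that lacks a gene is padded with missing-data characters so that all rows stay the same length.

```python
from typing import Dict, List


def build_supermatrix(genes: List[Dict[str, str]], taxa: List[str],
                      missing: str = "?") -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per taxon.

    Taxa absent from a gene partition are padded with the missing symbol.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for alignment in genes:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> float:
    """Share of cells in the supermatrix that are coded as missing."""
    total = sum(len(seq) for seq in matrix.values())
    if total == 0:
        return 0.0
    return sum(seq.count(missing) for seq in matrix.values()) / total


# Hypothetical toy data: two gene partitions, taxon "B" lacks the second gene.
toy_genes = [
    {"A": "ACGT", "B": "ACGA", "C": "ACGG"},
    {"A": "TT", "C": "TA"},
]
toy_matrix = build_supermatrix(toy_genes, ["A", "B", "C"])
```

Tracking the missing fraction alongside the matrix makes it easy to examine how support changes as partitions with sparse taxon coverage are added or removed.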

2.
MOTIVATION: Through the most extensive phylogenomic analysis carried out to date, complete genomes of 11 eukaryotic species have been examined in order to find the homologs of more than 25,000 amino acid sequences. These sequences correspond to the exons of more than 3000 genes and were used as presence/absence characters to test one of the most controversial hypotheses concerning animal evolution, namely the Ecdysozoa hypothesis. Distance, maximum parsimony and Bayesian methods of phylogenetic reconstruction were used to test the hypothesis. RESULTS: The Ecdysozoa hypothesis, which groups arthropods and nematodes in a single clade, was unequivocally rejected in all the consensus trees. The Coelomata clade, grouping arthropods and chordates, was supported by the highest statistical confidence in all the reconstructions. The study of the dependence of the genome tree's accuracy on the number of exons used demonstrated that an unexpectedly large number of characters is necessary to obtain robust phylogenies. Previous studies supporting Ecdysozoa could not guarantee an accurate phylogeny because the number of characters used was clearly below the minimum required.
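Coding exons as presence/absence characters, as this abstract describes, can be sketched as follows. This is only an illustration under stated assumptions: the species names, exon identifiers and the choice of a plain Hamming distance are hypothetical and are not taken from the study, which does not specify its distance measure here.

```python
from typing import Dict, List, Set, Tuple


def presence_absence(genomes: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted character list and a 0/1 row per genome."""
    characters = sorted(set().union(*genomes.values()))
    matrix = {
        name: [1 if char in content else 0 for char in characters]
        for name, content in genomes.items()
    }
    return characters, matrix


def hamming(a: List[int], b: List[int]) -> int:
    """Number of characters whose state differs between two rows."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy input: which exon homologs were found in each genome.
toy_genomes = {
    "fly": {"e1", "e2"},
    "worm": {"e2", "e3"},
    "human": {"e1", "e2", "e3"},
}
characters, pa_matrix = presence_absence(toy_genomes)
```

The resulting pairwise distances could be passed to a distance-based tree-building method, while the 0/1 rows themselves could serve as input to parsimony or Bayesian analyses of binary characters.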

3.
The phylogenetic position of turtles within the vertebrate tree of life remains controversial. Conflicting conclusions from different studies are likely a consequence of systematic error in the tree construction process, rather than random error from small amounts of data. Using genomic data, we evaluate the phylogenetic position of turtles with both conventional concatenated data analysis and a “genes as characters” approach. Two datasets were constructed, one with seven species (human, opossum, zebra finch, chicken, green anole, Chinese pond turtle, and western clawed frog) and 4584 orthologous genes, and the second with four additional species (soft-shelled turtle, Nile crocodile, royal python, and tuatara) but only 1638 genes. Our concatenated data analysis strongly supported turtle as the sister-group to archosaurs (the archosaur hypothesis), similar to several recent genomic data based studies using similar methods. When using genes as characters and gene trees as character-state trees with equal weighting for each gene, however, our parsimony analysis suggested that turtles are possibly sister-group to diapsids, archosaurs, or lepidosaurs. None of these resolutions were strongly supported by bootstraps. Furthermore, our incongruence analysis clearly demonstrated that there is a large amount of inconsistency among genes and most of the conflict relates to the placement of turtles. We conclude that the uncertain placement of turtles is a reflection of the true state of nature. Concatenated data analysis of large and heterogeneous datasets likely suffers from systematic error and over-estimates of confidence as a consequence of a large number of characters. Using genes as characters offers an alternative for phylogenomic analysis. It has potential to reduce systematic error, such as data heterogeneity and long-branch attraction, and it can also avoid problems associated with computation time and model selection. 
Finally, treating genes as characters provides a convenient method for examining gene and genome evolution.

4.
Hess J, Goldman N. PLoS ONE, 2011, 6(8): e22783
Phylogenomic approaches to the resolution of inter-species relationships have become well established in recent years. Often these involve concatenation of many orthologous genes found in the respective genomes followed by analysis using standard phylogenetic models. Genome-scale data promise increased resolution by minimising sampling error, yet are associated with well-known but often inappropriately addressed caveats arising through data heterogeneity and model violation. These can lead to the reconstruction of highly supported but incorrect topologies. With the aim of obtaining a species tree for 18 species within the ascomycetous yeasts, we have investigated the use of appropriate evolutionary models to address inter-gene heterogeneities and the scalability and validity of supermatrix analysis as the phylogenetic problem becomes more difficult and the number of genes analysed approaches truly phylogenomic dimensions. We have extended a widely known early phylogenomic study of yeasts by adding species to increase diversity and augmenting the number of genes under analysis. We have investigated sophisticated maximum likelihood analyses, considering not only a concatenated version of the data but also partitioned models where each gene constitutes a partition and parameters are free to vary between the different partitions (thereby accounting for variation in the evolutionary processes at different loci). We find considerable increases in likelihood using these complex models, arguing for the need for appropriate models when analysing phylogenomic data. Using these methods, we were able to reconstruct a well-supported tree for 18 ascomycetous yeasts spanning about 250 million years of evolution.

5.
6.
Despite great progress over the past decade, some portions of the mammalian tree of life remain unresolved. In particular, relationships among the different orders included within the supraordinal group Laurasiatheria have proven difficult to determine, and have received poor support in the vast majority of phylogenomic studies of mammalian systematics. We estimated interordinal relationships within Laurasiatheria using sequence data from 3733 protein-coding genes. Our study included data from 11 placental mammals, corresponding to five of the six orders of Laurasiatheria, plus five outgroup species. Ingroup and outgroup species were chosen to maximize the number of single-copy orthologous genes for which sequence data were available for all species in our study. Phylogenetic analyses of the concatenated dataset using maximum likelihood and Bayesian methods resulted in an identical and well-supported topology in all alignment strategies compared. Our analyses provide high support for the sister relationship between Chiroptera and Cetartiodactyla and also provide support for placing Perissodactyla as sister to Carnivora. We obtained maximal estimates of bootstrap support (100%) and posterior probability (1.00) for all nodes within Laurasiatheria. Our study provides a further demonstration of the utility of very large and conserved genomic datasets to clarify our understanding of the evolutionary relationships among mammals.

7.
The first phylogenomic analysis of the antlions is presented, based on 325 genes captured using anchored hybrid enrichment. A concatenated matrix including 207 species of Myrmeleontoidea (170 Myrmeleontidae) was analysed under maximum likelihood and Bayesian inference. Both Myrmeleontidae (antlions) and Ascalaphidae (owlflies) were recovered as paraphyletic with respect to each other. The majority of the subfamilies traditionally assigned to both Myrmeleontidae and Ascalaphidae were also recovered as paraphyletic. By contrast, all traditional antlion tribes were recovered as monophyletic (except Brachynemurini), but most subtribes were found to be paraphyletic. When compared with the traditional classification of Myrmeleontidae, our results do not support the current taxonomy. Therefore, based on our phylogenomic results, we propose a new classification for the antlions, which synonymizes Ascalaphidae with Myrmeleontidae and divides the family into four subfamilies (Ascalaphinae, Myrmeleontinae, Dendroleontinae and Nemoleontinae) and 17 tribes. We also highlight the most pressing issues in antlion systematics and indicate taxa that need further taxonomic and phylogenetic attention. Finally, we present a comprehensive table placing all extant genera of antlions and owlflies in our new proposed classification, including details on the number of species, distribution and notes on the likely monophyly of each genus.

8.
Supermatrices are often characterized by a large amount of missing data. One possible approach to minimize such missing data is to create composite taxa. These taxa are formed by sampling sequences from different species in order to obtain a composite sequence that includes a maximum number of genes. Although this approach is increasingly used, its accuracy has rarely been tested and some authors prefer to analyze incomplete supermatrices by coding unavailable sequences as missing. To further validate the composite taxon approach, it was applied to complete mitochondrial matrices of 102 mammal species representing 93 families with varying amounts of missing data. On average, missing data and composite matrices showed similar congruence to model trees obtained from the complete sequence matrix. As expected, the level of congruence to model trees decreased as missing data increased, with both approaches. We conclude that the composite taxon approach is worth considering in a phylogenomic context since it performs well and reduces computing time when compared to missing data matrices.
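The composite taxon idea in this abstract can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the procedure used in the study: the function name, the species and gene labels, and the rule of taking the first member species that has a given gene are all hypothetical choices.

```python
from typing import Dict, List


def composite_taxon(members: Dict[str, Dict[str, str]],
                    genes: List[str]) -> Dict[str, str]:
    """Build one composite sequence set from several member species.

    For each gene, the sequence is taken from the first member species
    (in insertion order) that has it; genes found in no member are omitted.
    """
    composite: Dict[str, str] = {}
    for gene in genes:
        for sequences in members.values():
            if gene in sequences:
                composite[gene] = sequences[gene]
                break
    return composite


# Hypothetical toy data: two species of one family with partial gene coverage.
toy_members = {
    "species_1": {"cox1": "ACG"},
    "species_2": {"cox1": "AAA", "cytb": "GGT"},
}
toy_composite = composite_taxon(toy_members, ["cox1", "cytb", "nd1"])
```

Genes still absent after merging (here `nd1`) would have to be coded as missing, so the composite approach reduces but does not necessarily eliminate missing data.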

9.
Morphological data support monotremes as the sister group of Theria (extant marsupials + eutherians), but phylogenetic analyses of 12 mitochondrial protein-coding genes have strongly supported the grouping of monotremes with marsupials: the Marsupionta hypothesis. Various nuclear genes tend to support Theria, but a comprehensive study of long concatenated sequences and broad taxon sampling is lacking. We therefore determined sequences from six nuclear genes and obtained additional sequences from the databases to create two large and independent nuclear data sets. One (data set I) emphasized taxon sampling and comprised five genes, with a concatenated length of 2,793 bp, from 21 species (two monotremes, six marsupials, nine placentals, and four outgroups). The other (data set II) emphasized gene sampling and comprised eight genes and three proteins, with a concatenated length of 10,773 bp or 3,669 amino acids, from five taxa (a monotreme, a marsupial, a rodent, human, and chicken). Both data sets were analyzed by parsimony, minimum evolution, maximum likelihood, and Bayesian methods using various models and data partitions. Data set I gave bootstrap support values for Theria between 55% and 100%, while support for Marsupionta was at most 12.3%. Taking base compositional bias into account generally increased the support for Theria. Data set II exclusively supported Theria, with the highest possible values, and significantly rejected Marsupionta. Independent phylogenetic evidence in support of Theria was obtained from two single amino acid deletions and one insertion, while no supporting insertions and deletions were found for Marsupionta. On the basis of our data sets, the time of divergence between Monotremata and Theria was estimated at 231-217 MYA and between Marsupialia and Eutheria at 193-186 MYA. The morphological evidence for a basal position of Monotremata, well separated from Theria, is thus fully supported by the available molecular data from nuclear genes.

10.
Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study the relationships within the super-clade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia) because the placement of Scandentia within this clade is controversial. The difficulty in resolving this issue is due to the short time spans between the early divergences of Euarchontoglires, which may cause incongruent gene trees. The conflict in the data can be depicted by network analyses and the contentious relationships are best reconstructed by coalescent-based analyses. This method is expected to be superior to analyses of concatenated data in reconstructing a species tree from numerous gene trees. The total concatenated dataset used to study the relationships in this group comprises 5,875 protein-coding genes (9,799,170 nucleotides) from all orders except Dermoptera (flying lemurs). Reconstruction of the species tree from 1,006 gene trees using coalescent models placed Scandentia as sister group to the primates, which is in agreement with maximum likelihood analyses of concatenated nucleotide sequence data. Additionally, both analytical approaches favoured the tarsier as the sister taxon to Anthropoidea, thus belonging to the haplorrhine clade. When divergence times are short, such as in radiations over periods of a few million years, even genome-scale analyses struggle to resolve phylogenetic relationships. On these short branches, processes such as incomplete lineage sorting and possibly hybridization occur and make it preferable to base phylogenomic analyses on coalescent methods.

11.
Remipedes are a small and enigmatic group of crustaceans, first described only 30 years ago. Analyses of both morphological and molecular data have recently suggested a close relationship between Remipedia and Hexapoda. If true, the remipedes occupy an important position in pancrustacean evolution and may be pivotal for understanding the evolutionary history of crustaceans and hexapods. However, it is important to test this hypothesis using new data and new types of analytical approaches. Here, we assembled a phylogenomic data set of 131 taxa, incorporating newly generated 454 expressed sequence tag (EST) data from six species of crustaceans, representing five lineages (Remipedia, Laevicaudata, Spinicaudata, Ostracoda, and Malacostraca). This data set includes all crustacean species for which EST data are available (46 species), and our largest alignment encompasses 866,479 amino acid positions and 1,886 genes. A series of phylogenomic analyses was performed to evaluate pancrustacean relationships. We significantly improved the quality of our data for predicting putative orthologous genes and for generating data subsets by matrix reduction procedures, thereby improving the signal to noise ratio in the data. Eight different data sets were constructed, representing various combinations of orthologous genes, data subsets, and taxa. Our results demonstrate that the different ways to compile an initial data set of core orthologs and the selection of data subsets by matrix reduction can have marked effects on the reconstructed phylogenetic trees. Nonetheless, all eight data sets strongly support Pancrustacea with Remipedia as the sister group to Hexapoda. This is the first time that a sister group relationship of Remipedia and Hexapoda has been inferred using a comprehensive phylogenomic data set that is based on EST data. We also show that selecting data subsets with increased overall signal can help to identify and prevent artifacts in phylogenetic analyses.

12.

Background  

To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available.

13.
14.
The extant mammalian groups Monotremata, Marsupialia and Placentalia are, according to the 'Theria' hypothesis, traditionally classified into two subclasses. The subclass Prototheria includes the monotremes, and the subclass Theria includes marsupials and placental mammals. Based on some morphological and molecular data, an alternative proposition, the Marsupionta hypothesis, favours a sister group relationship between monotremes and marsupials to the exclusion of placental mammals. Phylogenetic analyses of single genes and even multiple gene alignments have not yet been able to conclusively resolve this basal mammalian divergence. We have examined this problem using one data set composed of expressed sequence tags (EST) and another containing 1 510 509 nucleotide (nt) sites from 1358 inferred cDNA genomic sequences. All analyses of the concatenated sequences unambiguously supported the Theria hypothesis. The Marsupionta hypothesis was rejected with high statistical confidence from both data sets. In spite of the strong support for Theria, a non-negligible number of single genes supported either of the two alternative hypotheses. The divergence between monotremes and therian mammals was estimated to have taken place 168–178 Mya, a dating compatible with the fossil record. Considering the long common evolutionary branch of therians, it is surprising that sequence data from many thousands of amino acid sites were needed to conclusively resolve their relationship to monotremes. This finding draws attention to other mammalian divergences that have been taken as unequivocally settled based on much smaller alignments. EST data provide a comprehensive random sample of protein-coding sequences and an economical way to produce large amounts of data for phylogenetic analysis of species for which genomic sequences are not yet available.

15.
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome‐based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined‐data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole‐genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data‐type mega‐matrix). Phylogenetic analysis of this mega‐matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination.
© The Willi Hennig Society 2010.

16.
A total of 22 genes from the genome of Salinibacter ruber strain M31 were selected in order to study the phylogenetic position of this species based on protein alignments. The selection of the genes was based on their essential function for the organism, dispersion within the genome, and sufficient informative length of the final alignment. For each gene, an individual phylogenetic analysis was performed and compared with the resulting tree based on the concatenation of the 22 genes, which rendered a single alignment of 10,757 homologous positions. In addition to the manually chosen genes, an automatically selected data set of 74 orthologous genes was used to reconstruct a tree based on 17,149 homologous positions. Although single genes supported different topologies, the tree topology of both concatenated data sets was shown to be identical to that previously observed based on small subunit (SSU) rRNA gene analysis, in which S. ruber was placed together with Bacteroidetes. In both concatenated data sets the bootstrap support was very high, but an analysis with a gradually lower number of genes indicated that the bootstrap support was greatly reduced with fewer than 12 genes. The results indicate that tree reconstructions based on concatenating large numbers of protein-coding genes seem to produce tree topologies with similar resolution to that of the single 16S rRNA gene trees. For classification purposes, 16S rRNA gene analysis may remain the most pragmatic approach to infer genealogical relationships.

17.
The increasing availability of complete genome sequences and the development of new, faster methods for phylogenetic reconstruction allow the exploration of the set of evolutionary trees for each gene in the genome of any species. This has led to the development of new phylogenomic methods. Here, we have compared different phylogenetic and phylogenomic methods in the analysis of the monophyletic origin of insect endosymbionts from the gamma-Proteobacteria, a hotly debated issue with several recent, conflicting reports. We have obtained the phylogenetic tree for each of the 579 identified protein-coding genes in the genome of the primary endosymbiont of carpenter ants, Blochmannia floridanus, after determining their presumed orthologs in 20 additional Proteobacteria genomes. A reference phylogeny reflecting the monophyletic origin of insect endosymbionts was further confirmed with different approaches, which led us to consider it as the presumed species tree. Remarkably, only 43 individual genes produced exactly the same topology as this presumed species tree. Most discrepancies between this tree and those obtained from individual genes or by concatenation of different genes were due to the grouping of Xanthomonadales with beta-Proteobacteria and not to uncertainties over the monophyly of insect endosymbionts. As previously noted, operational genes were more prone to reject the presumed species tree than those included in information-processing categories, but caution should be exercised when selecting genes for phylogenetic inference on the basis of their functional category assignment. We have obtained strong evidence in support of the monophyletic origin of gamma-Proteobacteria insect endosymbionts by a combination of phylogenetic and phylogenomic methods. In our analysis, the use of concatenated genes has been shown to be a valuable tool for analyzing primary phylogenetic signals coded in the genomes.
Nevertheless, other phylogenomic methods such as supertree approaches were useful in revealing alternative phylogenetic signals and should be included in comprehensive phylogenomic studies.

18.
19.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
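The recommendation to report how many individual genes support a clade, and the consensus approach it is contrasted with, can be sketched as follows. This is a simplified illustration, not the simulation code from the study: each gene tree is represented here only as a set of clades (each clade a frozenset of taxon names), and the function names and toy trees are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def genes_supporting(clade: Iterable[str], gene_trees: List[Set[Clade]]) -> int:
    """Count the gene trees that contain the given clade."""
    target = frozenset(clade)
    return sum(1 for clades in gene_trees if target in clades)


def majority_rule_clades(gene_trees: List[Set[Clade]]) -> Set[Clade]:
    """Clades present in more than half of the gene trees."""
    counts = Counter(clade for clades in gene_trees for clade in clades)
    return {clade for clade, n in counts.items() if n > len(gene_trees) / 2}


# Hypothetical toy data: three gene trees over taxa A, B, C (plus an outgroup).
toy_trees = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
]
support_ab = genes_supporting({"A", "B"}, toy_trees)
consensus = majority_rule_clades(toy_trees)
```

Reporting a count such as `support_ab` next to the bootstrap value of the corresponding node in the concatenated tree makes it visible when high bootstrap support rests on only a minority of genes.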

20.
Right whales (genus: Eubalaena) are among the most endangered mammals, yet their taxonomy and phylogeny have been questioned. A phylogenetic hypothesis based on mitochondrial DNA (mtDNA) variation recently prompted a taxonomic revision, increasing the number of right whale species to three. We critically evaluated this hypothesis using sequence data from 13 nuclear DNA (nuDNA) loci as well as the mtDNA control region. Fixed diagnostic characters among the nuclear markers strongly support the hypothesis of three genetically distinct species, despite lack of any diagnostic morphological characters. A phylogenetic analysis of all data produced a strict consensus cladogram with strong support at nodes that define each right whale species as well as relationships among species. Results showed very little conflict among the individual partitions as well as congruence between the mtDNA and nuDNA datasets. These data clearly demonstrate the strength of using numerous independent genetic markers during a phylogenetic analysis of closely related species. In evaluating phylogenetic support contributed by individual loci, 11 of the 14 loci provided support for at least one of the nodes of interest to this study. Only a single marker (mtDNA control region) provided support at all four nodes. A study using any single nuclear marker would have failed to support the proposed phylogeny, and a strong phylogenetic hypothesis was only revealed by the simultaneous analysis of many nuclear loci. In addition, nuDNA and mtDNA data provided complementary levels of support at nodes of different evolutionary depth, indicating that the combined use of mtDNA and nuDNA data is both practical and desirable.
