Similar articles
1.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.

2.
MOTIVATION: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree. RESULTS: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees. AVAILABILITY: http://www.genetics.wustl.edu/eddy/forester.
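The idea of labeling gene-tree nodes by comparison to a trusted species tree can be illustrated with a small sketch. It is not the authors' implementation: it assumes the widely used LCA-mapping criterion (a gene-tree node is a duplication when it maps to the same species-tree node as one of its children), represents trees as nested tuples whose leaves are species names, and uses a deliberately naive clade lookup. The names `species_clades`, `lca`, and `label_events` are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A binary tree is either a leaf (species name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set (clade) of every node in the species tree."""
    clades: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = walk(node[0]) | walk(node[1])
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def lca(clades: List[FrozenSet[str]], species: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in clades if species <= c), key=len)


def label_events(gene_tree: Tree, species_tree: Tree) -> List[Tuple[str, List[str]]]:
    """Label each internal gene-tree node (post-order) as speciation or duplication."""
    clades = species_clades(species_tree)
    events: List[Tuple[str, List[str]]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return lca(clades, frozenset([node]))
        left, right = walk(node[0]), walk(node[1])
        here = lca(clades, left | right)
        # Duplication: the node maps to the same species-tree node as a child.
        kind = "duplication" if here in (left, right) else "speciation"
        events.append((kind, sorted(here)))
        return here

    walk(gene_tree)
    return events


# Illustrative input: one duplication in the human/mouse ancestor.
species_tree = (("human", "mouse"), "fly")
gene_tree = ((("human", "mouse"), ("human", "mouse")), "fly")
for kind, clade in label_events(gene_tree, species_tree):
    print(kind, clade)
```

In this toy input the two (human, mouse) pairs are speciations, their parent is a duplication mapped to the human/mouse ancestor, and the root is a speciation. The clade scan makes the sketch quadratic or worse; it is meant to show the criterion, not to match the running times discussed in the abstract.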

3.

Background  

The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.
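The GTP criterion described above can be sketched as a scoring loop. This is a minimal illustration under stated assumptions, not the software the abstract refers to: it counts duplications only (losses, deep coalescence, and transfers are ignored), it picks the best tree from an explicit candidate list rather than searching tree space heuristically, and the names `clades`, `duplications`, and `gtp_best` are hypothetical.

```python
def clades(tree):
    """Leaf sets of all nodes of a nested-tuple binary tree; the root clade is last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clades(tree[0]), clades(tree[1])
    return left + right + [left[-1] | right[-1]]


def duplications(gene_tree, species_tree):
    """Count duplications implied by reconciling one gene tree with a species tree."""
    sp = clades(species_tree)
    count = 0

    def mapping(node):
        nonlocal count
        if isinstance(node, str):
            return frozenset([node])
        l, r = mapping(node[0]), mapping(node[1])
        # Map the node to the smallest species clade containing both children.
        here = min((c for c in sp if l | r <= c), key=len)
        if here in (l, r):  # same mapping as a child: duplication
            count += 1
        return here

    mapping(gene_tree)
    return count


def gtp_best(gene_trees, candidates):
    """Return (total duplications, species tree) for the cheapest candidate."""
    scored = [(sum(duplications(g, s) for g in gene_trees), s) for s in candidates]
    return min(scored, key=lambda pair: pair[0])


# Illustrative data: three taxa, three gene trees, all three rooted topologies.
gene_trees = [
    (("A", "B"), "C"),
    ((("A", "B"), ("A", "B")), "C"),
    (("A", "C"), "B"),
]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
print(gtp_best(gene_trees, candidates))  # prints (2, (('A', 'B'), 'C'))
```

Enumerating every topology is only feasible for a handful of taxa, which is why practical GTP software relies on heuristic search and fast scoring of neighboring trees.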

4.
A morphological data set and three sources of data from the chloroplast genome (two genes and a restriction site survey) were used to reconstruct the phylogenetic history of the pickerelweed family Pontederiaceae. The chloroplast data converged towards a single tree, presumably the true chloroplast phylogeny of the family. Unrooted trees estimated from each of the three chloroplast data sets were identical or extremely similar in shape to each other and mostly robustly supported. There was no evidence of significant heterogeneity among the data sets, and the few topological differences seen among unrooted trees from each chloroplast data set are probably artifacts of sampling error on short branches. Despite well-documented differences in rates of evolution for different characters in individual data sets, equally weighted parsimony permits accurate reconstructions of chloroplast relationships in Pontederiaceae. A separate morphology-based data set yielded trees that were very different from the chloroplast trees. Although there was substantial support from the morphological evidence for several major clades supported by chloroplast trees, most of the conflicting phylogenetic structure on the morphology trees was not robust. Nonetheless, several statistical tests of incongruence indicate significant heterogeneity between molecules and morphology. The source of this apparent incongruence appears to be a low ratio of phylogenetic signal to noise in the morphological data.

5.
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERAs, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion (the implied mutational pattern at two key active sites in the SERA proteins) were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.

6.
The use of diverse data sets in phylogenetic studies aiming to understand the evolutionary histories of species can yield conflicting inferences. Phylogenetic conflicts observed in animal and plant systems have often been explained by hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer. Here, we used target enrichment data, species tree, and species network approaches to infer the backbone phylogeny of the family Caprifoliaceae, while distinguishing among sources of incongruence. We used 713 nuclear loci and 46 complete plastome sequence data from 43 samples representing 38 species from all major clades to reconstruct the phylogeny of the family using concatenation and coalescence approaches. We found significant nuclear gene tree conflict as well as cytonuclear discordance. Additionally, coalescent simulations and phylogenetic species network analyses suggested putative ancient hybridization among subfamilies of Caprifoliaceae, which seems to be the main source of phylogenetic discordance. Ancestral state reconstruction of six morphological characters revealed some homoplasy for each character examined. By dating the branching events, we inferred the origin of Caprifoliaceae at approximately 66.65 Ma in the late Cretaceous. By integrating evidence from molecular phylogeny, divergence times, and morphology, we here recognize Zabelioideae as a new subfamily in Caprifoliaceae. This work shows the necessity of using a combination of multiple approaches to identify the sources of gene tree discordance. Our study also highlights the importance of using data from both nuclear and plastid genomes to reconstruct deep and shallow phylogenies of plants.

7.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

8.
Background

Discovering the location of gene duplications and multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem introduced by Guigó et al. in 1996 is to map gene duplication events from a collection of rooted, binary gene family trees onto their corresponding rooted binary species tree in such a way that the total number of multiple gene duplication episodes is minimized. There are several models in the literature that specify how gene duplications from gene families can be interpreted as one duplication episode. However, in all duplication episode problems gene trees are rooted. This restriction limits the applicability, since unrooted gene family trees are frequently inferred by phylogenetic methods.

Results

In this article we present the first solution to the open problem of episode clustering where the input gene family trees are unrooted. In particular, by using theoretical properties of unrooted reconciliation, we show an efficient algorithm that reduces this problem to the episode clustering problems defined for rooted trees. We show theoretical properties of the reduction algorithm and evaluate it on empirical datasets.

Conclusions

We provided algorithms and tools that were successfully applied to several empirical datasets. In particular, our comparative study shows that we can improve known results on genomic duplication inference from real datasets.


9.
Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.

10.
GeneTRACE-reconstruction of gene content of ancestral species   Cited by: 4 (self-citations: 0, citations by others: 4)
While current computational methods allow the reconstruction of individual ancestral protein sequences, reconstruction of complete gene content of ancestral species is not yet an established task. In this paper, we describe GENETRACE, an efficient linear-time algorithm that allows the reconstruction of evolutionary history of individual protein families as well as the complete gene content of ancestral species. The performance of the method was validated with a simulated evolution program called SimulEv. Our results indicate that given a set of correct phylogenetic profiles and a correct species tree, ancestral gene content can be reconstructed with sensitivity and selectivity of more than 90%. SimulEv simulations were also used to evaluate performance of the reconstruction of gene content-based phylogenetic trees, suggesting that these trees may be accurate at the terminal branches but suffer from long branch attraction near the root of the tree.

11.
Opsin gene sequences were first reported in the 1980s. The goal of that research was to test the hypothesis that human opsins were members of a single gene family and that variation in human color vision was mediated by mutations in these genes. While the new data supported both hypotheses, the greatest contribution of this work was, arguably, that it provided the data necessary for PCR-based surveys in a diversity of other species. Such studies, and recent whole genome sequencing projects, have uncovered exceptionally large opsin gene repertoires in ray-finned fishes (taxon, Actinopterygii). Guppies and zebrafish, for example, have 10 visual opsin genes each. Here we review the duplication and divergence events that have generated these gene collections. Phylogenetic analyses revealed that large opsin gene repertoires in fish have been generated by gene duplication and divergence events that span the age of the ray-finned fishes. Data from whole genome sequencing projects and from large-insert clones show that tandem duplication is the primary mode of opsin gene family expansion in fishes. In some instances gene conversion between tandem duplicates has obscured evolutionary relationships among genes and generated unique key-site haplotypes. We mapped amino acid substitutions at so-called key-sites onto phylogenies and this exposed many examples of convergence. We found that dN/dS values were higher on the branches of our trees that followed gene duplication than on branches that followed speciation events, suggesting that duplication relaxes constraints on opsin sequence evolution. Though the focus of the review is opsin sequence evolution, we also note that there are few clear connections between opsin gene repertoires and variation in spectral environment, morphological traits, or life history traits.

12.
Hahn MW. Genome Biology, 2007, 8(7): R141-9

Background

Comparative genomic studies are revealing frequent gains and losses of whole genes via duplication and pseudogenization. One commonly used method for inferring the number and timing of gene gains and losses reconciles the gene tree for each gene family with the species tree of the taxa considered. Recent studies using this approach have found a large number of ancient duplications and recent losses among vertebrate genomes.

Results

I show that tree reconciliation methods are biased when the inferred gene tree is not correct. This bias places duplicates towards the root of the tree and losses towards the tips of the tree. I demonstrate that this bias is present when tree reconciliation is conducted on both multiple mammal and Drosophila genomes, and that lower bootstrap cut-off values on gene trees lead to more extreme bias. I also suggest a method for dealing with reconciliation bias, although this method only corrects for the number of gene gains on some branches of the species tree.

Conclusion

Based on the results presented, it is likely that most tree reconciliation analyses show biases, unless the gene trees used are exceptionally well-resolved and well-supported. These results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.

13.
Background and Aims

Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species.

Methods

Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices.

Key Results

The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species.

Conclusions

In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree because the total number of variable sites was much lower than in the entire plastid genome. The geographical clustering of the individuals against a background of overall low sequence divergence could indicate transfer of plastid genomes due to hybridization and introgression following secondary contact.

14.
DNA sequences of the plastid gene psaB were completed for 182 species of Orchidaceae (representing 150 different genera) and outgroup families in Asparagales. These data were analyzed using parsimony, and resulting trees were compared to an rbcL phylogeny of Orchidaceae for the same set of taxa after an additional 30 new rbcL sequences were added to a previously published matrix. The psaB tree topology is similar to the rbcL tree, although the psaB data contain less homoplasy and provide greater bootstrap support than rbcL alone. In combination, the two-gene tree recovers the five monophyletic subfamilial clades currently recognized in Orchidaceae, but fails to resolve the positions of Cypripedioideae and Vanilloideae. These new topologies help to clarify some of the anomalous results recovered when rbcL is analyzed alone. Both genes appear to be absent from the plastid genome of several achlorophyllous orchids, but are present in the form of presumably non-functional pseudogenes in Cyrtosia. This study is the first to document the utility of psaB sequences for phylogenetic studies of plants below the family level.

15.
A decade of progress in plant molecular phylogenetics   Cited by: 8 (self-citations: 0, citations by others: 8)
Over the past decade, botanists have produced several thousand phylogenetic analyses based on molecular data, with particular emphasis on sequencing rbcL, the plastid gene encoding the large subunit of Rubisco (ribulose bisphosphate carboxylase). Because phylogenetic trees retrieved from the three plant genomes (plastid, nuclear and mitochondrial) have been highly congruent, the ‘Angiosperm Phylogeny Group’ has used these DNA-based phylogenetic trees to reclassify all families of flowering plants. However, in addition to taxonomy, these major phylogenetic efforts have also helped to define strategies to reconstruct the ‘tree of life’, and have revealed the size of the ancestral plant genome, uncovered potential candidates for the ancestral flower, identified molecular living fossils, and linked the rate of neutral substitutions with species diversity. With an increased interest in DNA sequencing programmes in non-model organisms, the next decade will hopefully see these phylogenetic findings integrated into new genetic syntheses, from genomes to taxa.

16.
Extant gars represent the remaining members of a formerly diverse assemblage of ancient ray-finned fishes and have been the subject of multiple phylogenetic analyses using morphological data. Here, we present the first hypothesis of phylogenetic relationships among living gar species based on molecular data, through the examination of gene tree heterogeneity and coalescent species tree analyses of a portion of one mitochondrial (COI) and seven nuclear (ENC1, myh6, plagl2, S7 ribosomal protein intron 1, sreb2, tbr1, and zic1) genes. Individual gene trees displayed varying degrees of resolution with regard to species-level relationships, and the gene trees inferred from COI and the S7 intron were the only two that were completely resolved. Coalescent species tree analyses of nuclear genes resulted in a well-resolved and strongly supported phylogenetic tree of living gar species, for which Bayesian posterior node support was further improved by the inclusion of the mitochondrial gene. Species-level relationships among gars inferred from our molecular data set were highly congruent with previously published morphological phylogenies, with the exception of the placement of two species, Lepisosteus osseus and L. platostomus. Re-examination of the character coding used by previous authors provided partial resolution of this topological discordance, resulting in broad concordance in the phylogenies inferred from individual genes, the coalescent species tree analysis, and morphology. The completely resolved phylogeny inferred from the molecular data set with strong Bayesian posterior support at all nodes provided insights into the potential for introgressive hybridization and patterns of allopatric speciation in the evolutionary history of living gars, as well as a solid foundation for future examinations of functional diversification and evolutionary stasis in a "living fossil" lineage.

17.
We considered the contribution of two mitochondrial and two nuclear data sets for the phylogenetic reconstruction of 22 species of seed beetles in the genus Curculio (Coleoptera: Curculionidae). A phylogenetic tree from representatives found on various hosts was inferred from a combined data set of mitochondrial DNA cytochrome oxidase subunit I, mitochondrial cytochrome b, nuclear elongation factor 1alpha, and nuclear phosphoglycerate mutase, used for the first time as a molecular marker. Separate parsimony analyses of each data set showed that individual gene trees were mainly congruent and often complementary in the support of clades, but the analysis was complicated by failure of PCR amplification of nuclear genes for many taxa and hence missing data entries. When the four gene partitions were combined in a simultaneous analysis despite the missing data, this increased the resolution and taxonomic coverage compared to the individual source trees. Alternative approaches of combining the information via supertree methodology produced a comparatively less resolved tree, and hence seem inferior to combining data matrices even in cases where numerous taxa are missing. The molecular data suggest a classification of the European species into two species groups that are in accordance with morphological characteristics, but the data do not support any of the previously recognised American species groups.

18.
Evolution of the nuclear receptor gene superfamily.   Cited by: 54 (self-citations: 6, citations by others: 48)
V Laudet, C Hänni, J Coll, F Catzeflis, D Stéhelin. The EMBO Journal, 1992, 11(3): 1003-1013

19.
Phylogenies based on mitochondrial DNA (mtDNA) may represent gene trees that may not be congruent with the equivalent species tree. One solution to this problem is to include additional, independent loci from the nuclear genome. Sequence data from the seventh intron of the beta-fibrinogen gene were generated for 25 specimens of vipers, including 8 nominal species of the Trimeresurus complex of Asian pit vipers. Phylogenetic trees were generated using maximum-parsimony and maximum-likelihood methods. The taxonomic level at which the intron provided significant phylogenetic information was examined and the trees were compared to those produced from previously obtained mtDNA cytochrome b sequences. A variety of different approaches (separate analyses, conditional data combination, and consensus) were used in an attempt to provide a sound organismal phylogeny based on both nuclear and mtDNA data sets. We discuss the implications for the gene tree-species tree debate and its particular relevance to medically important organisms.

20.