Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Large scale gene duplication is a major force driving the evolution of genetic functional innovation. Whole genome duplications are widely believed to have played an important role in the evolution of the maize, yeast, and vertebrate genomes. The use of evolutionary trees to analyze the history of gene duplication and estimate duplication times provides a powerful tool for studying this process. Many studies in the molecular evolution literature have used this approach on small data sets, using analyses performed by hand. The rapid growth of genetic sequence data will soon allow similar studies on a genomic scale, but such studies will be limited unless the analysis can be automated. Even existing data sets admit alternative hypotheses that would be too tedious to consider without automation. In this paper, we describe a program called NOTUNG that facilitates large scale analysis, using both rooted and unrooted trees. When tested on trees analyzed in the literature, NOTUNG consistently yielded results that agree with the assessments in the original publications. Thus, NOTUNG provides a basic building block for inferring duplication dates from gene trees automatically and can also be used as an exploratory analysis tool for evaluating alternative hypotheses.

2.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction, can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles, and we relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

3.
Reconstructing the duplication history of tandemly repeated genes   Total citations: 4, self-citations: 0, citations by others: 4
We present a novel approach to deal with the problem of reconstructing the duplication history of tandemly repeated genes that are supposed to have arisen from unequal recombination. We first describe the mathematical model of evolution by tandem duplication and introduce duplication histories and duplication trees. We then provide a simple recursive algorithm which determines whether or not a given rooted phylogeny can be a duplication history and another algorithm that simulates the unequal recombination process and searches for the best duplication trees according to the maximum parsimony criterion. We use real data sets of human immunoglobulins and T-cell receptors to validate our methods and algorithms. Identity between most parsimonious duplication trees and most parsimonious phylogenies for the same data, combined with the agreement with additional knowledge about the sequences, such as the presence of polymorphisms, shows strong evidence that our reconstruction procedure provides good insights into the duplication histories of these loci.

4.
Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.

5.

Background  

The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.
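The GTP objective described above can be illustrated with a small sketch. It is not taken from any of the software mentioned in these abstracts: it assumes trees are nested Python tuples whose leaves are species names, counts only duplications (not losses, deep coalescence, or transfers), and picks the best tree from an explicit candidate list instead of searching tree space. The function names `clusters`, `duplication_cost`, and `best_species_tree` are hypothetical.

```python
def clusters(tree):
    """Return the leaf set (as a frozenset) of every node of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species-tree node as a child."""
    species_clusters = clusters(species_tree)

    def lca_cluster(species_set):
        # The smallest species-tree cluster containing the set is its LCA.
        return min((c for c in species_clusters if species_set <= c), key=len)

    cost = 0

    def walk(node):
        nonlocal cost
        if not isinstance(node, tuple):
            return lca_cluster(frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = lca_cluster(frozenset().union(*child_maps))
        if here in child_maps:  # parent and child share a mapping: duplication
            cost += 1
        return here

    walk(gene_tree)
    return cost


def best_species_tree(gene_trees, candidates):
    """Pick the candidate species tree with the fewest implied duplications."""
    return min(candidates,
               key=lambda s: sum(duplication_cost(g, s) for g in gene_trees))


gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
print(best_species_tree(gene_trees, candidates))
```

With these three toy gene trees, the candidates imply 1, 2, and 3 duplications respectively, so the first candidate is selected. Real GTP implementations search the space of species trees heuristically, which is the scalability problem the abstract refers to.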

6.
Comparative genomics has revealed the ubiquity of gene and genome duplication and subsequent gene loss. In the case of gene duplication and subsequent loss, gene trees can differ from species trees, thus frequent gene duplication poses a challenge for reconstruction of species relationships. Here I address the case of multi-gene sets of putative orthologs that include some unrecognized paralogs due to ancestral gene duplication, and ask how outgroups should best be chosen to reduce the degree of non-species tree (NST) signal. Consideration of expected internal branch lengths supports several conclusions: (i) when a single outgroup is used, the degree of NST signal arising from gene duplication is either independent of outgroup choice, or is minimized by use of a maximally closely related post-duplication (MCRPD) outgroup; (ii) when two outgroups are used, NST signal is minimized by using one MCRPD outgroup, while the position of the second outgroup is of lesser importance; and (iii) when two outgroups are used, the ability to detect gene trees that are inconsistent with known aspects of the species tree is maximized by use of one MCRPD, and is either independent of the position of the second outgroup, or is maximized for a more distantly related second outgroup. Overall, these results generalize the utility of closely-related outgroups for phylogenetic analysis.

7.
Conserved genes have found their way into the mainstream of molecular systematics. Many of these genes are members of multigene families. A difficulty with using single genes of multigene families for phylogenetic inference is that genes from one species may be paralogous to those from another taxon. We focus attention on this problem using heat shock 70 (HSP70) genes. Using polymerase chain reaction techniques with genomic DNA, we isolated and sequenced 123 distinct sequences from 12 species of sharks. Phylogenetic analysis indicated that the sequences cluster with constitutively expressed cytoplasmic heat shock-like genes. Three highly divergent gene clades were sampled. A number of similar sequences were sampled from each species within each distinct gene clade. Comparison of published species trees with an HSP70 gene tree inferred using Bayesian phylogenetic analysis revealed several cases of gene duplication and differential sorting of gene lineages within this group of sharks. Gene tree parsimony based on the objective criteria of duplication and losses showed that previously published hypotheses of species relationships and two novel hypotheses based on Bayesian phylogenetics were concordant with the history of HSP70 gene duplication and loss. By contrast, two published hypotheses based on morphological data were not significantly different from the null hypothesis of a random association between species relatedness and the HSP70 gene tree. These results suggest that gene tree parsimony using data from multigene families can be used for inferring species relationships or testing published alternative hypotheses. More importantly, the results suggest that systematic studies relying on phylogenetic inferences from HSP70 genes may be plagued by unrecognized paralogy of sampled genes.
Our results underscore the distinction between gene and species trees and highlight an underappreciated source of discordance between gene trees and organismal phylogeny, i.e., unrecognized paralogy of sampled genes.

8.
Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not necessarily correspond to the species phylogeny. One commonly accepted strategy to cope with this problem is to sequence many genes; the way in which to analyze the resulting collection of genes is somewhat more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences in the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method that can circumvent this problem by generating a supernetwork as opposed to a supertree. Here we present an alternative way for generating supernetworks called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa into gene trees. The resulting trees are subsequently used to generate consensus networks, networks that generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation to the matrix representation with parsimony (MRP) supertree method and Z-closure, and demonstrate that it provides a useful complementary tool.

9.

Background

The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results

We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm that is efficient in practice and computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data.

Conclusions

In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.

10.
Gene trees are often assumed to be equivalent to species trees, but processes such as incomplete lineage sorting can generate incongruence among gene topologies and analyzing multilocus data in concatenated matrices can be prone to systematic errors. Accordingly, a variety of new methods have been developed to estimate species trees using multilocus data sets. Here, we apply some of these methods to reconstruct the phylogeny of Buarremon and near relatives, a group in which phylogenetic analyses of mitochondrial DNA sequences produced results that were inconsistent with relationships implied by a taxonomy based on variation in external phenotype. Gene genealogies obtained for seven loci (one mitochondrial, six nuclear) were varied, with some supporting and some rejecting the monophyly of Buarremon. Overall, our species-tree analyses tended to support a monophyletic Buarremon, but due to lack of congruence between methodologies, resolution of the phylogeny of this group remains uncertain. More generally, our study indicates that the number of individuals sampled can have an important effect on phylogenetic reconstruction, that the use of seven markers does not guarantee obtaining a strongly-supported species tree, and that methods for species-tree reconstruction can produce different results using the same data; these are important considerations for researchers using these new phylogenetic approaches in other systems.

11.
The molecular phylogeny of parabasalids has mainly been inferred from small subunit (SSU) rRNA sequences and has conflicted substantially with systematics based on morphological and ultrastructural characters. This raises the important question, how congruent are protein and SSU rRNA trees? New sequences from seven diverse parabasalids (six trichomonads and one hypermastigid) were added to data sets of glyceraldehyde-3-phosphate dehydrogenase (GAPDH), enolase, alpha-tubulin and beta-tubulin and used to construct phylogenetic trees. The GAPDH tree was well resolved and identical in topology to the SSU rRNA tree. This both validates the rRNA tree and suggests that GAPDH should be a valuable tool in further phylogenetic studies of parabasalids. In particular, the GAPDH tree confirmed the polyphyly of Monocercomonadidae and Trichomonadidae and the basal position of Trichonympha agilis among parabasalids. Moreover, GAPDH strengthened the hypothesis of secondary loss of cytoskeletal structures in Monocercomonadidae such as Monocercomonas and Hypotrichomonas. In contrast to GAPDH, the enolase and both tubulin trees are poorly resolved and rather uninformative about parabasalian phylogeny, although two of these trees also identify T. agilis as representing the basal-most lineage of parabasalids. Although all four protein genes show multiple gene duplications (for 3-6 of the seven taxa examined), most duplications appear to be relatively recent (i.e., species-specific) and not a problem for phylogeny reconstruction. Only for enolase are there more ancient duplications that may confound phylogenetic interpretation.

12.
MOTIVATION: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree. RESULTS: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees. AVAILABILITY: http://www.genetics.wustl.edu/eddy/forester.
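The idea of labeling gene-tree nodes as speciations or duplications by comparison to a trusted species tree can be sketched with the widely used LCA-mapping rule. This is an illustration only and does not reproduce the algorithm of the abstract above, whose details are not given here. It assumes binary trees written as nested tuples, gene leaves named like `A_1` (species `A`, copy 1), and hypothetical helper names `index_species_tree`, `lca`, and `label_events`.

```python
def index_species_tree(tree):
    """Return (parent, depth) dicts keyed by species-tree node."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes, by climbing parents."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def label_events(gene_tree, species_tree,
                 species_of=lambda gene: gene.split("_")[0]):
    """Label each internal gene-tree node, in post-order, as an event type."""
    parent, depth = index_species_tree(species_tree)
    events = []

    def visit(node):
        if not isinstance(node, tuple):
            return species_of(node)
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent, depth)
        # A node mapping to the same species node as a child is a duplication.
        events.append("duplication" if here in (left, right) else "speciation")
        return here

    visit(gene_tree)
    return events


species = (("A", "B"), "C")
genes = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
print(label_events(genes, species))
```

For this toy input the two `(A, B)` subtrees are speciations, their common parent is a duplication (both children already map to the `(A, B)` ancestor), and the root is a speciation. Climbing parent pointers for every node is what can make a simple approach like this quadratic in the worst case.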

13.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.

14.
Haeckel created much of our current vocabulary in evolutionary biology, such as the term phylogeny, which is currently used to designate trees. Assuming that Haeckel gave the same meaning to this term, one often reproduces Haeckel's trees as the first illustrations of phylogenetic trees. A detailed analysis of Haeckel's own evolutionary vocabulary and theory revealed that Haeckel's trees were genealogical trees and that Haeckel's phylogeny was a morphological concept. However, phylogeny was actually the core of Haeckel's tree reconstruction, and understanding the exact meaning Haeckel gave to phylogeny is crucial to understanding the information Haeckel wanted to convey in his famous trees. Haeckel's phylogeny was a linear series of main morphological stages along the line of descent of a given species. The phylogeny of a single species would provide a trunk around which lateral branches were added as mere ornament; the phylogeny selected for drawing a tree of a given group was considered the most complete line of progress from lower to higher forms of this group, such as the phylogeny of Man for the genealogical tree of Vertebrates. Haeckel's phylogeny was mainly inspired by the idea of the scala naturae, or scale of being. Therefore, Haeckel's genealogical trees, which were only branched on the surface, mainly represented the old idea of scale of being. Even though Haeckel decided to draw genealogical trees after reading On the Origin of Species and was called the German Darwin, he did not draw Darwinian branching diagrams. Although Haeckel always saw Lamarck, Goethe, and Darwin as the three fathers of the theory of evolution, he was mainly influenced by Lamarck and Goethe in his approach to tree reconstruction.

15.
Order Diplobathrida is a major clade of camerate crinoids spanning the Ordovician–Mississippian, yet phylogenetic relationships have only been inferred for Ordovician taxa. This has hampered efforts to construct a comprehensive tree of life for crinoids and develop a classification scheme that adequately reflects diplobathrid evolutionary history. Here, I apply maximum parsimony and Bayesian phylogenetic approaches to the fossil record of diplobathrids to infer the largest tree of fossil crinoids to date, with over 100 genera included. Recovered trees provide a framework for evaluating the current classification of diplobathrids. Notably, previous suborder divisions are not supported, and superfamily divisions will require significant modification. Although numerous revisions are required for families, most can be retained through reassignment of genera. In addition, recovered trees were used to produce phylogeny‐based estimates of diplobathrid lineage diversity. By accounting for ghost lineages, phylogeny‐based richness estimates offer greater insight into diversification and extinction dynamics than traditional taxonomy‐based approaches alone and provide a detailed summary of the ~150 million‐year evolutionary history of Diplobathrida. This study constitutes a major step toward producing a phylogeny of the Crinoidea and documenting crinoid diversity dynamics. In addition, it will serve as a framework for subsequent phylogeny‐based investigations of macroevolutionary questions.

16.

Motivation

Species tree estimation from gene trees can be complicated by gene duplication and loss, and “gene tree parsimony” (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species).

Results

We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the “standard” calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.

17.
A “gene tree” is the phylogeny of alleles or haplotypes for any specified stretch of DNA. Gene trees are components of population trees or species trees; their analysis entails a shift in perspective from many of the familiar models and concepts of population genetics, which typically deal with frequencies of phylogenetically unordered alleles. Molecular surveys of haplotype diversity in mitochondrial DNA (mtDNA) have provided the first extensive empirical data suitable for estimation of gene trees on a microevolutionary (intraspecific) scale. The relationship between phylogeny and geographic distribution constitutes the phylogeographic pattern for any species. Observed phylogeographic trees can be interpreted in terms of historical demography by comparison to predictions derived from models of gene lineage sorting, such as inbreeding theory and branching-process theory. Results of such analyses for more than 20 vertebrate species strongly suggest that the demographies of populations have been remarkably dynamic and unsettled over space and recent evolutionary time. This conclusion is consistent with ecological observations documenting dramatic population-size fluctuations and range shifts in many contemporary species. By adding an historical perspective to population biology, the gene-lineage approach can help forge links between the disciplines of phylogenetic systematics (and macroevolutionary study) and population genetics (microevolution). Preliminary extensions of the “gene tree” methodology to haplotypes of nuclear genes (such as Adh in Drosophila melanogaster) demonstrate that the phylogenetic perspective can also help to illuminate molecular-genetic processes (such as recombination or gene conversion), as well as contribute to knowledge of the origin, age, and molecular basis of particular adaptations.

18.

Background

Several methods have been developed for the accurate reconstruction of gene trees. Some of them use reconciliation with a species tree to correct, a posteriori, errors in gene trees inferred from multiple sequence alignments. Unfortunately the best fit to sequence information can be lost during this process.

Results

We describe GATC, a new algorithm for reconstructing a binary gene tree with branch lengths. GATC returns optimal solutions according to a measure combining both tree likelihood (according to sequence evolution) and a reconciliation score under the Duplication-Transfer-Loss (DTL) model. It can be used either to construct a gene tree from scratch or to correct trees inferred by existing reconstruction methods, making it highly flexible to various input data types. The method is based on a genetic algorithm acting on a population of trees at each step. It substantially increases the efficiency of the phylogeny space exploration, reducing the risk of falling into local minima, at a reasonable computational time. We have applied GATC to a dataset of simulated cyanobacterial phylogenies, as well as to an empirical dataset of three reference gene families, and showed that it is able to improve gene tree reconstructions compared with current state-of-the-art algorithms.

Conclusion

The proposed algorithm is able to accurately reconstruct gene trees and is highly suitable for the construction of reference trees. Our results also highlight the efficiency of multi-objective optimization algorithms for the gene tree reconstruction problem. GATC is available on Github at: https://github.com/UdeM-LBIT/GATC.

19.
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
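The phrase "characters that share four gametes with some other character" refers to the standard four-gamete test: a binary character matrix admits a perfect phylogeny exactly when no pair of characters (columns) shows all four combinations 00, 01, 10, and 11 across the taxa (rows). The sketch below only finds the conflicting pairs; it is not either of the BNPP algorithms from the abstract, and the function name `conflicting_pairs` and the toy matrices are assumptions made for illustration.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return column pairs (i, j) that exhibit all four gametes.

    `matrix` is a list of rows (taxa), each a list of 0/1 character states.
    An empty result means the matrix passes the four-gamete test.
    """
    n_chars = len(matrix[0])
    pairs = []
    for i, j in combinations(range(n_chars), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:  # 00, 01, 10 and 11 all observed
            pairs.append((i, j))
    return pairs


perfect = [[1, 0], [1, 1], [0, 0]]
imperfect = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]
print(conflicting_pairs(perfect))
print(conflicting_pairs(imperfect))
```

The first matrix yields no conflicts. In the second, character 1 conflicts with both character 0 and character 2, so any tree for it needs at least one additional mutation, which is the situation the near-perfect relaxation is meant to handle.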

20.
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.

