Similar literature
 20 similar articles found
1.
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.
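The reconciliation idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the common LCA-mapping formulation, in which each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node is called a duplication when it maps to the same clade as one of its children. The tuple-based tree encoding, the `species_gene` leaf naming, and every function name are hypothetical choices made for the example.

```python
# Sketch of duplication counting by LCA reconciliation (assumed formulation).
# Trees are nested tuples; leaves are strings. Gene leaves are named
# "<species>_<copy>", a hypothetical convention used only in this example.

def leaf_set(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def species_clades(species_tree):
    """List every clade (leaf set of a node) in the species tree."""
    found = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            found.extend(species_clades(child))
    return found

def count_duplications(gene_tree, species_tree, species_of):
    """Count gene-tree nodes that map to the same clade as one of their children."""
    clades = species_clades(species_tree)

    def lca(taxa):
        # Smallest species clade containing all the given species.
        return min((c for c in clades if taxa <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications

species_tree = (("human", "mouse"), "fish")
by_prefix = lambda name: name.split("_")[0]

# Two paralogous copies (a and b), each present in all three species.
paralogous = ((("human_a", "mouse_a"), "fish_a"),
              (("human_b", "mouse_b"), "fish_b"))
single_copy = (("human_a", "mouse_a"), "fish_a")

print(count_duplications(paralogous, species_tree, by_prefix))
print(count_duplications(single_copy, species_tree, by_prefix))
```

In the first gene tree only the root maps to the same clade as its children, so one duplication is inferred; the single-copy tree matches the species tree and needs none.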

2.
Fourfold paralogy regions in the human genome have been considered historical remnants of whole-genome duplication events predicted to have occurred early in vertebrate evolution. Taking advantage of the well-annotated and high-quality human genomic sequence map as well as the ever-increasing accessibility of large-scale genomic sequence data from a diverse range of animal species, we investigated the prediction that the ancestral vertebrate genome was shaped by two rapid rounds of whole-genome duplication within a period of 10 million years. Both the map self-comparison approach and a phylogenetic analysis revealed that gene families identified as tetralogous on human chromosomes 1/2/8/20 arose by small-scale duplication events that occurred at widely different time points in animal evolution. Furthermore, the data discount the likelihood that tree topologies of the form ((A,B)(C,D)) are best explained by the octoploidy hypothesis. We instead propose that such symmetrical tree patterns are also consistent with local duplications and rearrangement events.

3.
Tyrosine kinase (TK) proteins play a central role in cellular behavior and development of animals. The expansion of this superfamily is regarded as a key event in the evolution of the complex signaling pathways and gene networks of metazoans and is a prominent example of how shuffling of protein modules may generate molecular novelties. Using the intron/exon structure within the TK domain (TK intron code) as a complementary tool for the assignment of orthology and paralogy, we identified and studied the 118 TK proteins of the amphioxus Branchiostoma floridae genome to elucidate TK gene family evolution in metazoans and chordates in particular. Unlike all characterized metazoans to date, amphioxus has members of all known widespread TK families, with not a single loss. Putting amphioxus TKs in an evolutionary context, including new data from the cnidarian Nematostella vectensis, the echinoderm Strongylocentrotus purpuratus, and the ascidian Ciona intestinalis, we suggest new evolutionary histories for different TK families and draw a new global picture of gene loss/gain in the different phyla. Surprisingly, our survey also detected an unprecedented expansion of a group of closely related TK families, including TIE, FGFR, PDGFR, and RET, due most probably to massive gene duplication and exon shuffling. Based on their highly similar intron/exon structure at the TK domain, we suggest that this group of TK families constitutes a superfamily of TK proteins, which we termed EXpanding TK, after their seemingly unique propensity for gene duplication and exon shuffling, not only in amphioxus but also across all metazoan groups. Due to this extreme tendency toward both retention and expansion of TK genes, amphioxus harbors the richest and most diverse TK repertoire among all metazoans studied so far, retaining most of the gene complement of its ancestors, but having evolved its own repertoire of genetic novelties.

4.
Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets we show that SpeciesRax is at least as accurate as the best competing methods while also being one order of magnitude faster on large data sets. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.

5.
Conserved genes have found their way into the mainstream of molecular systematics. Many of these genes are members of multigene families. A difficulty with using single genes of multigene families for phylogenetic inference is that genes from one species may be paralogous to those from another taxon. We focus attention on this problem using heat shock 70 (HSP70) genes. Using polymerase chain reaction techniques with genomic DNA, we isolated and sequenced 123 distinct sequences from 12 species of sharks. Phylogenetic analysis indicated that the sequences cluster with constitutively expressed cytoplasmic heat shock-like genes. Three highly divergent gene clades were sampled. A number of similar sequences were sampled from each species within each distinct gene clade. Comparison of published species trees with an HSP70 gene tree inferred using Bayesian phylogenetic analysis revealed several cases of gene duplication and differential sorting of gene lineages within this group of sharks. Gene tree parsimony based on the objective criteria of duplication and losses showed that previously published hypotheses of species relationships and two novel hypotheses based on Bayesian phylogenetics were concordant with the history of HSP70 gene duplication and loss. By contrast, two published hypotheses based on morphological data were not significantly different from the null hypothesis of a random association between species relatedness and the HSP70 gene tree. These results suggest that gene tree parsimony using data from multigene families can be used for inferring species relationships or testing published alternative hypotheses. More importantly, the results suggest that systematic studies relying on phylogenetic inferences from HSP70 genes may be plagued by unrecognized paralogy of sampled genes.
Our results underscore the distinction between gene and species trees and highlight an underappreciated source of discordance between gene trees and organismal phylogeny, i.e., unrecognized paralogy of sampled genes.

6.
Some plant microRNAs have been shown to be de novo generated by inverted duplication from their target genes. Subsequent duplication events potentially generate multigene microRNA families. Within this article we provide supportive evidence for the inverted duplication model of plant microRNA evolution. First, we report that the precursors of four Arabidopsis thaliana microRNA families, miR157, miR158, miR405 and miR447, share nearly identical nucleotide sequences throughout the whole miRNA precursor between the family members. The extent and degree of sequence conservation are suggestive of recent evolutionary duplication events. Furthermore, we found that sequence similarities are not restricted to the transcribed part but extend into the promoter regions. Thus, the duplication event most probably included the promoter regions as well. Conserved elements in upstream regions of miR163 and its targets were also detected. This implies that the inverted duplication of target genes, at least in certain cases, had included the promoters of the target genes. Sequence conservation within promoters of miRNA families as well as between miRNA and its potential progenitor gene can be exploited for understanding the regulation of microRNA genes.

7.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction, can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

8.
It has been proposed that two events of duplication of the entire genome occurred early in vertebrate history (2R hypothesis). Several phylogenetic studies with a few gene families (mostly Hox genes and proteins from the MHC) have tried to confirm these polyploidization events. However, data from a single locus cannot explain the evolutionary history of a complete genome. To study this 2R hypothesis, we have taken advantage of the phylogenetic position of the lamprey to study the history of gene duplications in vertebrates. We selected most gene families that contain several paralogous genes in vertebrates and for which lamprey genes and an out-group are known in databases. In addition, we isolated members of the nuclear receptor superfamily in lamprey. Hagfish genes were also analyzed and found to confirm the lamprey gene analysis. Consistent with the 2R hypothesis, the phylogenetic analysis of 33 selected gene families, dispersed through the whole genome, revealed that one period of gene duplication arose before the lamprey-gnathostome split and this was followed by a second period of gene duplication after the lamprey-gnathostome split. Nevertheless, our analysis suggests that numerous gene losses and other gene-genome duplications occurred during the evolution of the vertebrate genomes. Thus, the complexity of all the paralogy groups present in vertebrates should be explained by the contribution of genome duplications (2R hypothesis), extra gene duplications, and gene losses.

9.
Gene fusion and fission events are key mechanisms in the evolution of gene architecture, whose effects are visible in protein architecture when they occur in coding sequences. Until now, the detection of fusion and fission events has been performed at the level of protein sequences with a post facto removal of supernumerary links due to paralogy, and often did not include looking for events defined only in single genomes. We propose a method for the detection of these events, defined on groups of paralogs to compensate for the gene redundancy of eukaryotic genomes, and apply it to the proteomes of 12 fungal species. We collected an inventory of 1,680 elementary fusion and fission events. In half the cases, both composite and element genes are found in the same species. Per-species counts of events correlate with the species genome size, suggesting a random mechanism of occurrence. Some biological functions of the genes involved in fusion and fission events are slightly over- or under-represented. As already noted in previous studies, the genes involved in an event tend to belong to the same functional category. We inferred the position of each event in the evolution tree of the 12 fungal species. The event localization counts for all the segments of the tree provide a metric that depicts the “recombinational” phylogeny among fungi. A possible interpretation of this metric as distance in adaptation space is proposed.

10.
Zinc finger genes in mammalian genomes are frequently found to occur in clusters with cluster members appearing in a tandem array on the chromosome. It has been suggested that in situ gene duplication events are primarily responsible for the evolution of such clusters. The problem of inferring the series of duplication events responsible for producing clustered families is different from the standard phylogeny problem. In this paper, we study this inference problem using a graph called a duplication model that captures the series of duplication events while taking into account the observed order of the genes on the chromosome. We provide algorithms to reconstruct a duplication model for a given data set. We use our method to hypothesize the series of duplication events that may have produced the ZNF45 family that appears on human chromosome 19.

11.
12.
In this paper we have analyzed 49 vertebrate gene families that were generated in the early stage of vertebrates and/or shortly before the origin of vertebrates, each of which consists of three or four member genes. We have dated the first (T1) and second (T2) gene duplications of 26 gene families with 3 member genes. The means of T1 (594 mya) and T2 (488 mya) are largely consistent with a well-cited version of the two-round (2R) genome duplication theory. Moreover, in most cases, the time interval between two successive gene duplications is large enough that the fate of duplicate genes generated by the first gene duplication was likely to be determined before the second one took place. However, the phylogenetic pattern of 23 gene families with 4 members is complicated; only 5 of them are predicted by the 2R model, but 11 families require an additional gene (or genome) duplication. For the rest (7 families), at least one gene duplication event had occurred before the divergence between vertebrate and Drosophila, indicating that the 4:1 rule (member gene ratio between vertebrates and invertebrates) may be misleading. Our results show that Ohno's 2R conjecture is valid as a working hypothesis for providing a most parsimonious explanation. Although for some gene families, additional gene duplication is needed, the credibility of the third genome duplication (3R) remains to be investigated. Received: 13 December 1999 / Accepted: 7 April 2000

13.
Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes.
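The paralogy quotient quoted in this abstract appears to be the ratio of orthologous gene sets in the descendant ancestor to those in the earlier ancestor. Under that assumption (the abstract gives the numbers but not an explicit formula), the eukaryotic figure can be reproduced as follows; the function name is hypothetical.

```python
# Assumed definition: PQ = descendant orthologous sets / ancestral orthologous sets.

def paralogy_quotient(descendant_sets, ancestral_sets):
    """Ratio of orthologous gene sets between two reconstructed ancestors."""
    return descendant_sets / ancestral_sets

# Counts taken from the abstract: 4137 sets in LECA, 2150 sets in FECA.
pq_eukaryotes = paralogy_quotient(4137, 2150)
print(round(pq_eukaryotes, 2))  # 1.92
```

The prokaryotic values (1.19 and 1.25) cannot be recomputed this way because the abstract does not report the underlying set counts.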

14.
A model and algorithm are proposed to infer the evolution of a gene family described by the corresponding gene tree, with respect to the species evolution described by the corresponding species tree. The model describes the evolution using the new concept of a nested tree. The algorithm performance is illustrated by the example of several orthologous protein groups. The considered evolutionary events are speciation, gene duplication and loss, and horizontal gene transfer retaining the original gene copy. The transfer event with the loss of the original gene copy is considered as a combination of gene transfer and loss. The model maps each evolutionary event onto the species phylogeny.

15.
MOTIVATION: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS: First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for the gene families whose tree topology matches a particular tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.

16.
Aromatic amino acid hydroxylase (AAAH) genes and insulin-like genes form part of an extensive paralogy region shared by human chromosomes 11 and 12, thought to have arisen by tetraploidy in early vertebrate evolution. Cloning of a complementary DNA (cDNA) for an amphioxus (Branchiostoma floridae) hydroxylase gene (AmphiPAH) allowed us to investigate the ancestry of the human chromosome 11/12 paralogy region. Molecular phylogenetic evidence reveals that AmphiPAH is orthologous to vertebrate phenylalanine hydroxylase (PAH) genes; the implication is that all three vertebrate AAAH genes arose early in metazoan evolution, predating vertebrates. In contrast, our phylogenetic analysis of amphioxus and vertebrate insulin-related gene sequences is consistent with duplication of these genes during early chordate ancestry. The conclusion is that two tightly linked gene families on human chromosomes 11 and 12 were not duplicated coincidentally. We rationalize this paradox by invoking gene loss in the AAAH gene family and conclude that paralogous genes shared by paralogous chromosomes need not have identical evolutionary histories.

17.
When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include incomplete lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. With incomplete lineage sorting, species tree inference seeks the tree that minimizes the extra gene lineages that had to coexist along species lineages; with gene duplication, it seeks the tree that minimizes gene duplications and/or losses. In this paper, we present the following results: 1) The deep coalescence cost is equal to the number of gene losses minus two times the gene duplication cost in the reconciliation of a uniquely leaf labeled gene tree and a species tree. The deep coalescence cost can be computed in linear time for any arbitrary gene tree and species tree. 2) The deep coalescence cost is never less than the gene duplication cost in the reconciliation of an arbitrary gene tree and a species tree. 3) Species tree inference by minimizing deep coalescence events is NP-hard.
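Result 1 of this abstract can be checked numerically on a small case. The sketch below is not the paper's linear-time algorithm: it is a simple quadratic illustration that assumes the standard LCA reconciliation, binary trees, and a gene tree carrying exactly one leaf per species. Losses are counted from the number of species-tree edges skipped below each gene node, and the deep coalescence cost is counted separately as the number of extra gene lineages on each species-tree edge. The tuple encoding and all names are hypothetical.

```python
# Sketch: duplication, loss and deep coalescence costs under LCA reconciliation.
# Assumes binary trees as nested tuples and a gene tree with one leaf per species.

def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)

def clades(tree):
    if isinstance(tree, str):
        return [leaves(tree)]
    left, right = tree
    return [leaves(tree)] + clades(left) + clades(right)

def reconciliation_costs(gene_tree, species_tree):
    sp = clades(species_tree)
    root = leaves(species_tree)

    def lca(taxa):
        # Smallest species clade containing the given taxa.
        return min((c for c in sp if taxa <= c), key=len)

    def edges_between(lower, upper):
        # Species-tree edges on the path from clade `lower` up to clade `upper`.
        return sum(1 for c in sp if lower <= c and c < upper)

    dups, losses, gene_edges = 0, 0, []

    def walk(node):
        nonlocal dups, losses
        if isinstance(node, str):
            return lca(frozenset([node]))
        maps = [walk(child) for child in node]
        here = lca(maps[0] | maps[1])
        skipped = sum(edges_between(m, here) for m in maps)
        gene_edges.extend((m, here) for m in maps)
        if here in maps:
            dups += 1            # node maps where a child maps: duplication
            losses += skipped
        else:
            losses += skipped - 2  # speciation: the two edges just below are expected
        return here

    walk(gene_tree)
    # Extra lineages: gene edges crossing each species edge, minus the one expected.
    deep = sum(
        sum(1 for lo, hi in gene_edges if lo <= c and c < hi) - 1
        for c in sp if c != root
    )
    return dups, losses, deep

species_tree = (("A", "B"), ("C", "D"))
gene_tree = ((("A", "C"), "B"), "D")
dups, losses, deep = reconciliation_costs(gene_tree, species_tree)
print(dups, losses, deep)          # 2 6 2
print(deep == losses - 2 * dups)   # True
```

For this pair of trees the reconciliation needs two duplications and six losses, and two extra lineages appear on the species tree, which agrees with the stated identity (6 - 2 × 2 = 2).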

18.
Natural history and functional divergence of protein tyrosine kinases   Total citations: 3 (self-citations: 0, citations by others: 3)
Gu J, Gu X. Gene, 2003, 317(1-2): 49-57
Cellular signaling is important for many biological processes including growth, differentiation, adhesion, motility and apoptosis. The protein tyrosine kinase (PTK) supergene family is the key mediator in cellular signaling in metazoans, directly associated with a variety of human diseases. All PTKs contain a highly conserved catalytic kinase domain, in spite of variable multi-domain structures. Within each PTK gene family, members exhibit functional divergence in substrate-specificity or temporal/tissue-specific expression, although their primary function is conserved. After conducting phylogenetic analysis on major PTK gene families, we found that the expansion of each PTK family was likely caused by gene or genome duplication event(s) that occurred before the emergence of teleosts but after the vertebrate-amphioxus split. We further investigated the evolutionary pattern of functional divergence after gene duplication in those gene families. Our results show that site-specific shifted evolutionary rate (altered functional constraint) is a common pattern in PTK gene family evolution.

19.
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.

20.