Similar Articles
1.
Comparing chromosomal gene order in two or more related species is an important approach to studying the forces that guide genome organization and evolution. Linked clusters of similar genes found in related genomes are often used to support arguments of evolutionary relatedness or functional selection. However, as the gene order and the gene complement of sister genomes diverge progressively due to large-scale rearrangements, horizontal gene transfer, gene duplication and gene loss, it becomes increasingly difficult to determine whether observed similarities in local genomic structure are indeed remnants of common ancestral gene order, or are merely coincidences. Rigorous comparative genomics requires principled methods for distinguishing chance commonalities, within or between genomes, from genuine historical or functional relationships. In this paper, we construct tests for significant groupings against null hypotheses of random gene order, taking incomplete clusters, multiple genomes, and gene families into account. We consider both the significance of individual clusters of prespecified genes and the overall degree of clustering in whole genomes.
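As a minimal sketch of the simplest case of such a test (one linear chromosome of n genes, no gene families, a complete cluster of m prespecified genes), one can ask how likely it is that m randomly placed genes fall inside a window of at most r consecutive genes. The function names and the choice of "span" as the test statistic are illustrative assumptions, not the paper's exact formulation; a Monte Carlo version is included as a cross-check.

```python
from math import comb
import random

def span_pvalue(n: int, m: int, r: int) -> float:
    """Exact P(span <= r) when m genes are placed uniformly at random among
    n positions on a linear chromosome. span = max_pos - min_pos + 1."""
    if m > n:
        raise ValueError("cannot place more genes than positions")
    if m < 2:
        return 1.0
    favourable = 0
    for s in range(m, min(r, n) + 1):
        # (n - s + 1) window starts; both window ends are occupied,
        # and the other m - 2 genes lie strictly inside the window.
        favourable += (n - s + 1) * comb(s - 2, m - 2)
    return favourable / comb(n, m)

def span_pvalue_mc(n: int, m: int, r: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under random gene order."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = rng.sample(range(n), m)
        if max(pos) - min(pos) + 1 <= r:
            hits += 1
    return hits / trials

# Example: 4 marked genes observed within 6 consecutive genes in a 4000-gene genome.
print(span_pvalue(4000, 4, 6))
```

Handling incomplete clusters, several genomes or gene families, as the abstract describes, would require a different null model than this single-window calculation.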

2.
There is a limited repertoire of domain families that are duplicated and combined in different ways to form the set of proteins in a genome. Proteins are gene products, and at the level of genes, duplication, recombination, fusion and fission are the processes that produce new genes. We attempt to gain an overview of these processes by studying the evolutionary units in proteins, domains, in the protein sequences of 40 genomes. The domain and superfamily definitions in the Structural Classification of Proteins Database are used, so that we can view all pairs of adjacent domains in genome sequences in terms of their superfamily combinations. We find 783 out of the 859 superfamilies in SCOP in these genomes, and the 783 families occur in 1307 pairwise combinations. Most families are observed in combination with one or two other families, while a few families are very versatile in their combinatorial behaviour; 209 families do not make combinations with other families. This type of pattern can be described as a scale-free network. We also study the N to C-terminal orientation of domain pairs and domain repeats. The phylogenetic distribution of domain combinations is surveyed, to establish the extent of common and kingdom-specific combinations. Of the kingdom-specific combinations, significantly more combinations consist of families present in all three kingdoms than of families present in one or two kingdoms. Hence, we are led to conclude that recombination between common families, as compared to the invention of new families and recombination among these, has also contributed substantially to the evolution of kingdom-specific and species-specific functions in organisms in all three kingdoms. Finally, we compare the set of the domain combinations in the genomes to those in the RCSB Protein Data Bank, and discuss the implications for structural genomics.
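The counting step behind this kind of analysis can be sketched as follows, assuming each protein is given as a list of superfamily IDs in N- to C-terminal order (the input format and function names are hypothetical). Ordered pairs keep the N-to-C orientation, identical neighbours are treated as repeats rather than combinations, and each family's number of distinct partners is its degree in the combination network. A heavy-tailed degree distribution is consistent with, but does not by itself establish, a scale-free network; that would need a proper fit.

```python
from collections import Counter, defaultdict

def domain_combinations(architectures):
    """architectures: one list of superfamily IDs per protein (N- to C-terminal).
    Returns (counts of ordered adjacent pairs, set of partner families per family)."""
    pair_counts = Counter()
    partners = defaultdict(set)
    for arch in architectures:
        for fam in arch:
            partners.setdefault(fam, set())  # register single-domain families too
        for a, b in zip(arch, arch[1:]):
            pair_counts[(a, b)] += 1  # ordered pair keeps N->C orientation
            if a != b:  # identical neighbours form a repeat, not a combination
                partners[a].add(b)
                partners[b].add(a)
    return pair_counts, partners

def degree_distribution(partners):
    """Degree = number of distinct partner families; returns (degrees, histogram)."""
    degrees = {fam: len(p) for fam, p in partners.items()}
    return degrees, Counter(degrees.values())

archs = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "B"], ["E"]]
pairs, partners = domain_combinations(archs)
degrees, hist = degree_distribution(partners)
print(degrees)  # {'A': 3, 'B': 1, 'C': 1, 'D': 1, 'E': 0}
```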

3.
The origin of novel gene functions through gene duplication, mutation, and natural selection represents one of the mechanisms by which organisms diversify and one of the possible paths leading to adaptation. Nonetheless, the extent, role, and consequences of duplications in the origins of ecological adaptations, especially in the context of species interactions, remain unclear. To explore the evolution of a gene family that is likely linked to species associations, we investigated the evolutionary history of the A-superfamily of conotoxin genes of predatory marine cone snails (Conus species). Members of this gene family are expressed in the venoms of Conus species and are presumably involved in predator-prey associations because of their utility in prey capture. We recovered sequences of this gene family from genomic DNA of four closely related species of Conus and reconstructed the evolutionary history of these genes. Our study is the first to directly recover conotoxin genes from Conus genomes to investigate the evolution of conotoxin gene families. Our results revealed a phenomenon of rapid and continuous gene turnover that is coupled with heightened rates of evolution. This continuous duplication pattern has not been observed previously, and the rate of gene turnover is at least two times higher than estimates from other multigene families. Conotoxin genes are among the most rapidly evolving protein-coding genes in metazoans, a phenomenon that may be facilitated by extensive gene duplication, which may also have driven changes in conotoxin function through neofunctionalization. Together, these mechanisms have led to dramatically divergent arrangements of A-superfamily conotoxin genes among closely related species of Conus. Our findings suggest that extensive and continuous gene duplication facilitates rapid evolution and drastic divergence in venom compositions among species, processes that may be associated with evolutionary responses to predator-prey interactions.

4.
5.
Linkage analyses in metazoan genomes suggest two ancestral arrays for the majority of homeobox genes. The related homeobox genes and chromosomal regions now dispersed in extant species possibly derived from only two common ancestral regions. One proposed ancestral array, designated as ANTP mega-array, contains most of the ANTP class homeobox genes; the second, named the contraHox super-paralogon, would consist of the classes PRD, POU, LIM, CUT, prospero, TALE and SIX. Here, we report the tight linkage of a POU class 6 gene to an anterior Hox-like gene in the hydrozoan Eleutheria dichotoma and discuss its possible significance for the evolution of homeobox genes. POU class 6 genes also seem to be ancestrally linked to the HoxC and A clusters in vertebrates, despite POU homeobox genes belonging to the contraHox paralogon. Hence, the much tighter linkage of a POU class 6 gene to an anterior Hox-like gene in a cnidarian is possibly the evolutionary echo of an ancestral genomic region from which most metazoan homeobox classes emerged.

6.

Background  

We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. However, that research was limited because it could not analyse small samples. This is a challenge for bioinformatics, as more specific functional classes have fewer examples, and conventional statistical analyses of such small data sets often produce unsatisfactory results.

7.
Shi G, Peng MC, Jiang T. PLoS ONE, 2011, 6(6): e20892
The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. In this paper, building on MSOAR 2.0, a recently developed and accurate tool for assigning orthologs between a pair of closely related genomes based on genome rearrangement, we present a new system, MultiMSOAR 2.0, to identify ortholog groups among multiple genomes. In the system, we construct gene families for all the genomes using sequence similarity search and clustering, run MSOAR 2.0 for all pairs of genomes to obtain the pairwise orthology relationship, and partition each gene family into a set of disjoint sets of orthologous genes (called super ortholog groups, or SOGs) such that each SOG contains at most one gene from each genome. For each such SOG, we label the leaves of the species tree with 1 or 0 to indicate whether or not the SOG contains a gene from the corresponding species. The resulting tree is called a tree of ortholog groups (TOG). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. Compared with the popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments using gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 is able to infer these evolutionary events much more accurately than the well-known tool Notung. MultiMSOAR 2.0 is freely available to the public.
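The internal-node labelling step can be illustrated with Fitch's small-parsimony algorithm on 0/1 leaf labels. This is a sketch under stated assumptions: the tree is a nested tuple of species names, ties are broken by taking the smaller state, and the "biological constraints" that MultiMSOAR 2.0 adds are not modelled here, so this is not the tool's actual procedure.

```python
def fitch(tree, labels):
    """Fitch small parsimony for a binary presence/absence character.
    tree: nested 2-tuples of leaf names; labels: leaf name -> 0 or 1.
    Returns (minimum number of state changes, {node: assigned state})."""
    sets = {}
    cost = 0

    def up(node):
        # Bottom-up pass: intersect child state sets, or take the union at a cost of 1.
        nonlocal cost
        if isinstance(node, str):
            sets[node] = {labels[node]}
            return sets[node]
        left, right = up(node[0]), up(node[1])
        common = left & right
        if common:
            sets[node] = common
        else:
            sets[node] = left | right
            cost += 1
        return sets[node]

    states = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when allowed, else pick one (min).
        allowed = sets[node]
        states[node] = parent_state if parent_state in allowed else min(allowed)
        if not isinstance(node, str):
            for child in node:
                down(child, states[node])

    up(tree)
    down(tree, None)
    return cost, states

# Hypothetical SOG present in human and chimp only.
tree = ((("human", "chimp"), "mouse"), "chicken")
labels = {"human": 1, "chimp": 1, "mouse": 0, "chicken": 0}
cost, states = fitch(tree, labels)
print(cost)  # 1
```

Here the single change falls on the branch leading to the human-chimp ancestor, which would be read as a gene birth on that branch.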

8.
The high number of duplicated genes in plant genomes provides a potential template for gene conversion and unequal crossing-over. Within a gene family these two processes can render all members homogeneous or generate diversity by reassorting variants among paralogs. The latter is especially feasible in families where gene diversity confers a selective advantage and thus conversion events are likely to be retained. Consequently, the record of gene conversion is expected to be most complete in gene families commonly subjected to positive selection. Here, we describe the extent and characteristics of gene conversion and unequal crossing-over in the coding and noncoding regions of the nucleotide-binding site leucine-rich repeat (NBS-LRR), receptor-like kinase (RLK), and receptor-like protein (RLP) gene families in the plant Arabidopsis thaliana. Members of these three gene families are associated with disease resistance and their pathogen-recognition domain is a documented target of positive selection. Our bioinformatic analysis of the major family features that may influence gene conversion revealed that, in these families, there is a significant association between the occurrence of gene conversion and high levels of sequence similarity, close physical clustering, gene orientation, and recombination rate. We discuss these results in the context of the overlap between gene conversion and positive selection during the evolutionary expansion of the NBS-LRR, RLK, and RLP gene families.

9.
10.
Conservation of proximity of a pair of genes across multiple genomes generally indicates that their functions could be linked. Here, we present a systematic evaluation using 42 complete microbial genomes from 25 phylogenetic groups to test the reliability of this observation in predicting function for genes. We find a relationship between the number of phylogenetic groups in which a gene pair is proximate and the probability that the pair belongs to a common pathway. Our method produces 1586 links between ortholog families substantiated by observed proximity in genomes representing at least three phylogenetic groups. Of the pairs annotated in the KEGG database, 80% are in the same biological pathway in KEGG.
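The core counting rule can be sketched as follows: for two ortholog families, collect the phylogenetic groups in which at least one genome has them within a small gene distance, and predict a functional link when that happens in three or more groups (the threshold stated above). Counting groups rather than genomes keeps several closely related genomes from inflating the evidence. The input layout, the one-gene-per-family simplification and the `max_gap` distance in genes are assumptions for illustration; the data below are made up.

```python
def proximity_groups(genomes, fam_a, fam_b, max_gap=1):
    """genomes: name -> (phylogenetic_group, {ortholog_family: gene_index}).
    Returns the set of groups in which fam_a and fam_b lie within max_gap genes
    of each other in at least one genome (one gene per family assumed)."""
    groups = set()
    for group, positions in genomes.values():
        if fam_a in positions and fam_b in positions:
            if abs(positions[fam_a] - positions[fam_b]) <= max_gap:
                groups.add(group)
    return groups

def predicted_link(genomes, fam_a, fam_b, min_groups=3, max_gap=1):
    """Predict a functional link when proximity is conserved in >= min_groups groups."""
    return len(proximity_groups(genomes, fam_a, fam_b, max_gap)) >= min_groups

# Hypothetical genomes and gene indices, for illustration only.
genomes = {
    "genome1": ("G1", {"famA": 10, "famB": 11}),
    "genome2": ("G1", {"famA": 52, "famB": 53}),  # same group: counted once
    "genome3": ("G2", {"famA": 7, "famB": 8}),
    "genome4": ("G3", {"famA": 5, "famB": 90}),   # present but not proximate
}
print(predicted_link(genomes, "famA", "famB"))  # False (only 2 groups)
```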

11.
Genome-level evolution of resistance genes in Arabidopsis thaliana
Baumgarten A, Cannon S, Spangler R, May G. Genetics, 2003, 165(1): 309-319
Pathogen resistance genes represent some of the most abundant and diverse gene families found within plant genomes. However, evolutionary mechanisms generating resistance gene diversity at the genome level are not well understood. We used the complete Arabidopsis thaliana genome sequence to show that most duplication of individual NBS-LRR sequences occurs at close physical proximity to the parent sequence and generates clusters of closely related NBS-LRR sequences. Deploying the statistical strength of phylogeographic approaches and using chromosomal location as a proxy for spatial location, we show that apparent duplication of NBS-LRR genes to ectopic chromosomal locations is largely the consequence of segmental chromosome duplication and rearrangement, rather than the independent duplication of individual sequences. Although accounting for a smaller fraction of NBS-LRR gene duplications, segmental chromosome duplication and rearrangement events have a large impact on the evolution of this multigene family. Intergenic exchange is dramatically lower between NBS-LRR sequences located in different chromosome regions as compared to exchange between sequences within the same chromosome region. Consequently, once translocated to new chromosome locations, NBS-LRR gene copies have a greater likelihood of escaping intergenic exchange and adopting new functions than do gene copies located within the same chromosomal region. We propose an evolutionary model that relates processes of genome evolution to mechanisms of evolution for the large, diverse, NBS-LRR gene family.

12.
ZooDDD: a cross-species database for digital differential display analysis
In this article, we combined EST information from the UniGene database and orthologous relationships from the Ensembl database to construct the ZooDDD database. The primary function of ZooDDD is to mine evolutionarily conserved, highly expressed, tissue-specific orthologues in model animals. Candidate genes identified with ZooDDD will give biologists a useful starting point for comparing the expression, functions and evolution of animal genomes. AVAILABILITY: http://bio301.iis.sinica.edu.tw/~ZooDDDNew/main.php.
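The abstract does not state which statistic ZooDDD uses, so the following is only a generic sketch of digital differential display, assuming a one-sided hypergeometric (Fisher-type) test: is a gene's EST count in one tissue pool higher than expected given its share of all ESTs? An orthologue pair is then flagged as conserved tissue-specific when both members pass the cutoff. Function names, the `(k, n_tissue, K, N)` tuple layout and `alpha` are hypothetical, and no multiple-testing correction is applied. For large EST pools, a library routine such as SciPy's `scipy.stats.hypergeom.sf` would be the practical choice.

```python
from math import comb

def ddd_pvalue(k, n_tissue, K, N):
    """One-sided hypergeometric P(X >= k): chance of at least k ESTs of a gene in a
    tissue pool of n_tissue ESTs, when the gene has K ESTs among N ESTs overall."""
    upper = min(K, n_tissue)
    hits = sum(comb(K, x) * comb(N - K, n_tissue - x) for x in range(k, upper + 1))
    return hits / comb(N, n_tissue)  # exact integer sum, single division

def conserved_tissue_specific(counts_a, counts_b, alpha=0.01):
    """counts_a, counts_b: (k, n_tissue, K, N) for the two orthologues.
    True when both are significantly enriched in the tissue (uncorrected)."""
    return ddd_pvalue(*counts_a) < alpha and ddd_pvalue(*counts_b) < alpha

# Toy numbers: gene seen in 10 of 10 ESTs from a tissue, 10 of 1000 ESTs overall.
print(conserved_tissue_specific((10, 10, 10, 1000), (10, 10, 10, 1000)))  # True
```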

13.
14.
MOTIVATION: Gene duplications and losses (GDLs) are important events in genome evolution. They result in expansion or contraction of gene families, with a likely role in phenotypic evolution. As more genomes become available and their annotations are improved, software programs capable of rapidly and accurately identifying the content of ancestral genomes and the timings of GDLs become necessary to understand the unique evolution of each lineage. RESULTS: We report EvolMAP, a new algorithm and software that utilizes a species tree-based gene clustering method to join all-to-all symmetrical similarity comparisons of multiple gene sets in order to infer the gene composition of multiple ancestral genomes. The algorithm further uses Dollo parsimony-based comparison of the inferred ancestral genes to pinpoint the timings of GDLs onto evolutionary intervals marked by speciation events. Using EvolMAP, we first analyzed the expansion of four families of G-protein coupled receptors (GPCRs) within animal lineages. In addition to demonstrating the unique expansion tree for each family, the results also show that the ancestral eumetazoan genome contained many fewer GPCRs than modern animals, and that these families expanded through concurrent lineage-specific duplications. Second, we analyzed the history of GDLs in mammalian genomes by comparing seven proteomes. In agreement with previous studies, we report that mammalian gene family sizes have changed drastically through their evolution. Interestingly, although we identified a potential source of duplication for 75% of the gained genes, the remaining 25% did not have clear-cut sources, revealing thousands of genes that have likely gained their distinct sequence identities within the descent of mammals. AVAILABILITY: Query server, source code and executable are available at http://kosik-web.mcdb.ucsb.edu/evolmap/index.htm .
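Dollo parsimony assumes a gene can be gained only once but lost any number of times. A minimal sketch for one gene's presence/absence pattern: the single gain is placed at the last common ancestor of the species carrying the gene, and losses are placed at the largest subtrees below it that contain no carrier. This illustrates the Dollo principle only; it is not EvolMAP's implementation, and it ignores duplications and the gene clustering step. The nested-tuple tree and the function names are assumptions.

```python
def leaves(node):
    """Leaf names below a node of a nested-tuple species tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def dollo_gain_losses(tree, present):
    """Return (gain_node, loss_nodes) for a gene present in the given species."""
    present = set(present)
    if not present or not present <= leaves(tree):
        raise ValueError("present must be a non-empty subset of the tree's leaves")
    # Single gain: walk down to the lowest node still containing every carrier (the LCA).
    gain = tree
    while not isinstance(gain, str):
        below = [child for child in gain if present <= leaves(child)]
        if not below:
            break
        gain = below[0]
    # Losses: maximal subtrees under the gain node with no carrier species.
    losses = []

    def collect(node):
        if not leaves(node) & present:
            losses.append(node)
        elif not isinstance(node, str):
            for child in node:
                collect(child)

    collect(gain)
    return gain, losses

tree = ((("human", "mouse"), "zebrafish"), "fly")
gain, losses = dollo_gain_losses(tree, {"human", "zebrafish"})
print(gain, losses)  # (('human', 'mouse'), 'zebrafish') ['mouse']
```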

15.
We have identified conserved orthologs in completely sequenced genomes of double-strand DNA phages and arranged them into evolutionary families (phage orthologous groups [POGs]). Using this resource to analyze the collection of known phage genomes, we find that most orthologs are unique in their genomes (having no diverged duplicates [paralogs]), and while many proteins contain multiple domains, the evolutionary recombination of these domains does not appear to be a major factor in evolution of these orthologous families. The number of POGs has been rapidly increasing over the past decade, the percentage of genes in phage genomes that have orthologs in other phages has also been increasing, and the percentage of unknown "ORFans" is decreasing as more proteins find homologs and establish a family. Other properties of phage genomes have remained relatively stable over time, most notably the high fraction of genes that are never or only rarely observed in their cellular hosts. This suggests that despite the renowned ability of phages to transduce cellular genes, these cellular "hitchhiker" genes do not dominate the phage genomic landscape, and a large fraction of the genes in phage genomes maintain an evolutionary trajectory that is distinct from that of the host genes.

16.
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that out-of-frame gene fission/fusion events involving alternative reading frames of ORFs, and out-of-frame lateral gene transfers, could also contribute to the origin of new gene families. To demonstrate this, we developed an original pattern search in sequence similarity networks, extending the use of these graphs, which are commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
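The raw material for out-of-frame comparisons is the set of protein sequences encoded by a gene's alternative reading frames. A minimal sketch of that first step translates all six frames with the standard genetic code (unknown codons become `X`, a trailing partial codon is dropped); the sequence-similarity-network pattern search itself is not shown, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(seq, offset=0):
    """Translate seq starting at offset (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))

def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(orf):
    """+1 is the annotated (reference) frame; the other five are alternative frames."""
    rc = reverse_complement(orf)
    frames = {f"+{k + 1}": translate(orf, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames

print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '+2': 'WP', '+3': 'GL', '-1': 'LGH', '-2': '*A', '-3': 'RP'}
```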

17.
The gene families encoding the immunoglobulin variable regions of heavy (VH) and light (VL) chains in vertebrates are composed of many genes. However, the gene number and the extent of diversity among VH and VL gene copies vary with species. To examine the causes of this variation and the evolutionary forces for these multigene families, we conducted a phylogenetic analysis of VH and VL genes from the species of amniotes. The results of our analysis showed that for each species, VH and VL genes have the same pattern of clustering in the trees, and, according to this clustering pattern, the species can be divided into two groups. In the first group of species (humans and mice), VH and VL genes were extensively intermingled with genes from other organisms; in the second group of species (chickens, rabbits, cattle, sheep, swine, and horses), the genes tended to form clusters within the same group of organisms. These results suggest that the VH and VL multigene families have evolved in the same fashion: they have undergone coordinated contraction and expansion of gene repertoires such that each group of organisms is characterized by a certain level of diversity of VH and VL genes. The extent of diversity among copies of VH and VL genes in each species is related to the mechanism of generation of antibody variety. In humans and mice, DNA rearrangement of immunoglobulin variable, diversity, and joining-segment genes is a main source of antibody diversity, whereas in chickens, rabbits, cattle, sheep, swine, and horses, somatic hypermutation and somatic gene conversion play important roles. The evolutionary pattern of VH and VL multigene families is consistent with the birth-and-death model of evolution, yet different levels of diversifying selection seem to operate in the VH and VL genes of these two groups of species.

18.
Short interspersed elements (SINEs) and long interspersed elements (LINEs) are transposable elements in eukaryotic genomes that mobilize through an RNA intermediate. Understanding their evolution is important because of their impact on the host genome. Most eukaryotic SINEs are ancestrally related to tRNA genes, although the typical tRNA cloverleaf structure is not apparent for most SINE consensus RNAs. Using a cladistic method where RNA structural components were coded as polarized and ordered multistate characters, we showed that related structural motifs are present in most SINE RNAs from mammals, fishes and plants, suggesting common selective constraints imposed at the SINE RNA structural level. Based on these results, we propose a general multistep model for the evolution of tRNA-related SINEs in eukaryotes.

19.
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present i-ADHoRe 3.0, a software package that unveils genomic homology based on the identification of conservation of gene content and gene order (collinearity), and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein-protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.
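The basic idea of collinearity detection can be sketched with a simplified chaining step: given homologous gene pairs as (position in genome A, position in genome B), find the longest chain in which both positions increase and consecutive anchors are at most `max_gap` genes apart in each genome. This O(n²) dynamic program is a deliberately simple stand-in, not i-ADHoRe's gap-constrained detection or profile search; it only handles same-orientation blocks (inverted blocks would need decreasing B positions), and the names and parameters are assumptions.

```python
def longest_collinear_chain(anchors, max_gap=3):
    """anchors: (i, j) gene-index pairs of homologues between genomes A and B.
    Returns the longest chain with both indices strictly increasing and each
    step at most max_gap genes long in both genomes."""
    pts = sorted(set(anchors))
    if not pts:
        return []
    best_len = [1] * len(pts)  # longest chain ending at each anchor
    prev = [-1] * len(pts)     # back-pointer to the previous anchor in that chain
    for k, (x, y) in enumerate(pts):
        for m in range(k):
            px, py = pts[m]
            if 0 < x - px <= max_gap and 0 < y - py <= max_gap and best_len[m] + 1 > best_len[k]:
                best_len[k] = best_len[m] + 1
                prev[k] = m
    end = max(range(len(pts)), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]

anchors = [(1, 10), (2, 11), (4, 13), (20, 5), (5, 14), (30, 2)]
print(longest_collinear_chain(anchors))  # [(1, 10), (2, 11), (4, 13), (5, 14)]
```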

20.
ABSTRACT: BACKGROUND: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. RESULTS: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. CONCLUSION: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
