Similar Articles
20 similar articles found.
1.
The gene neighborhood in prokaryotic genomes has been effectively used to infer co-functional networks in various organisms. Previously, such genomic context information has been sought among completely assembled prokaryotic genomes. Here, we present a method to infer functional gene networks from the gene neighborhood in metagenome contigs, which are incompletely assembled genomic fragments. Given that the amount of metagenome sequence data in the public domain now exceeds that of completely assembled prokaryotic genomes, we expect that inferring networks from the metagenome-based gene neighborhood will be beneficial. We generated co-functional networks for taxonomically diverse species using metagenome contigs derived from the human microbiome and the ocean microbiome. We found that the networks based on the metagenome gene neighborhood outperformed those based on 1748 completely assembled prokaryotic genomes. We also demonstrated that the metagenome-based gene neighborhood could predict genes related to virulence-associated phenotypes in a bacterial pathogen, indicating that metagenome-based functional links can be sufficiently predictive for some phenotypes of medical importance. Owing to the exponential growth of metagenome sequence data in public repositories, metagenome-based inference of co-functional networks will facilitate understanding of gene functions and pathways in diverse species.
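The core signal described above is gene co-occurrence in neighborhoods. Two gene families that repeatedly sit next to each other across many contigs are candidates for a functional link. The sketch below counts such adjacencies and keeps the frequent ones. It is not the authors' scoring scheme. The input format (contigs as ordered lists of gene-family IDs), the `window` size, and the `min_support` cutoff are all illustrative assumptions.

```python
from collections import Counter


def neighborhood_counts(contigs, window=1):
    """Count how often two gene families lie within `window` genes of each other.

    contigs: list of contigs, each an ordered list of gene-family IDs
    (hypothetical input format). Strand and intergenic spacing are ignored.
    """
    counts = Counter()
    for genes in contigs:
        for i, a in enumerate(genes):
            # Look only forward so each nearby pair is counted once per occurrence.
            for b in genes[i + 1 : i + 1 + window]:
                if a != b:
                    counts[tuple(sorted((a, b)))] += 1
    return counts


def co_functional_links(counts, min_support=2):
    """Keep family pairs observed as neighbours at least `min_support` times."""
    return {pair for pair, n in counts.items() if n >= min_support}


contigs = [["A", "B", "C"], ["B", "A", "D"], ["C", "E"]]
counts = neighborhood_counts(contigs)
print(co_functional_links(counts))  # {('A', 'B')}
```

A real pipeline would replace the raw count with a statistical score that accounts for how common each family is. The counting step itself stays the same.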

2.
It is known that, while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail to determine gene starts exactly. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and the identification of frame-shifts are important problems in the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correcting gene starts and identifying frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary change in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.
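The alignment step above works on "formal translation of genomic nucleotide sequences". The sketch below shows only that preliminary step: translating a DNA segment in all three forward frames. In homologous regions, a frame-shift would appear as conserved protein sequence that switches from one frame to another. The sketch assumes the standard genetic code. Bacterial and archaeal table 11 uses the same amino-acid assignments and differs only in start codons. The dynamic-programming alignment with frame-shift penalties is not shown, and the function names are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, frame=0):
    """Formally translate `seq` starting at offset `frame`; '*' = stop, 'X' = unknown codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i : i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Translations of the forward strand in frames 0, 1 and 2."""
    return [translate(seq, f) for f in range(3)]


print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```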

3.
Connected gene neighborhoods in prokaryotic genomes   Times cited: 12 (self-citations: 1; by others: 11)
A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes. It identified 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods that include additional genes adjacent to, and transcribed in the same direction as, the genes from the clusters. This brought the total number of COGs included in the neighborhoods to 2387. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon ‘genomic hitchhiking’. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons. Apparent genomic hitchhiking is particularly typical of this neighborhood and of other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
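The idea of neighborhoods "held together by overlapping, partially conserved gene arrays" can be illustrated with a union-find merge. Any two arrays sharing enough COGs are joined, so the resulting neighborhood may never occur whole in a single genome. This is a hypothetical simplification, not the published procedure, which compares gene orders rather than plain sets. The `min_shared` threshold and the function name are assumptions.

```python
def connected_neighborhoods(arrays, min_shared=2):
    """Merge gene arrays (lists of COG IDs) that share >= `min_shared` COGs.

    Returns the merged neighborhoods as sets of COG IDs, largest first.
    """
    parent = list(range(len(arrays)))

    def find(x):
        # Path-halving find for the union-find structure.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = [set(a) for a in arrays]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if len(sets[i] & sets[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(sets):
        groups.setdefault(find(i), set()).update(s)
    return sorted(groups.values(), key=len, reverse=True)


arrays = [
    ["COG1", "COG2", "COG3"],
    ["COG2", "COG3", "COG4"],
    ["COG4", "COG5", "COG6"],
    ["COG7", "COG8"],
]
print([len(n) for n in connected_neighborhoods(arrays)])  # [4, 3, 2]
```

The first two arrays never co-occur as a unit, yet they are merged into one four-COG neighborhood through their shared genes. COG4 still appears in two neighborhoods because it overlaps the third array by only one gene.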

4.

Background

The influence of lateral gene transfer on gene origins and biology in eukaryotes is poorly understood compared with those of prokaryotes. A number of independent investigations focusing on specific genes, individual genomes, or specific functional categories from various eukaryotes have indicated that lateral gene transfer does indeed affect eukaryotic genomes. However, the lack of common methodology and criteria in these studies makes it difficult to assess the general importance and influence of lateral gene transfer on eukaryotic genome evolution.

Results

We used a phylogenomic approach to systematically investigate lateral gene transfer affecting the proteomes of thirteen, mainly parasitic, microbial eukaryotes, representing four of the six eukaryotic super-groups. All of the genomes investigated have been significantly affected by prokaryote-to-eukaryote lateral gene transfers, dramatically affecting the enzymes of core pathways, particularly amino acid and sugar metabolism, but also providing new genes of potential adaptive significance in the life of parasites. A broad range of prokaryotic donors is involved in such transfers, but there is clear and significant enrichment for bacterial groups that share the same habitats, including the human microbiota, as the parasites investigated.

Conclusions

Our data show that ecology and lifestyle strongly influence gene origins and opportunities for gene transfer and reveal that, although the outlines of the core eukaryotic metabolism are conserved among lineages, the genes making up those pathways can have very different origins in different eukaryotes. Thus, from the perspective of the effects of lateral gene transfer on individual gene ancestries in different lineages, eukaryotic metabolism appears to be chimeric.

5.
Yan Y, Moult J. Proteins, 2006, 64(3): 615-628
Operons are clusters of genes that are transcribed as a single message, and regulated by the same gene expression machinery. They are found primarily in prokaryotic genomes. Because genes in the same operon are likely to have related functions, identification of the operon structure is potentially useful for assigning gene function. We report the development and benchmarking of two different methods for detecting operons, based on an analysis of 42 fully sequenced prokaryotic organisms. The Gene Neighbor method (GNM) utilizes the relatively high conservation of gene order in operons, compared with genes in general. The Gene Gap Method (GGM) makes use of the relatively short gap between genes in operons compared with that otherwise found between adjacent genes. The methods have been benchmarked using KEGG pathway data and RegulonDB Escherichia coli operon data. With optimum parameters, the specificity of the GNM is 93% and the sensitivity is 70%. For the GGM, the specificity is 95% and the sensitivity is 68%. Together, the two methods have a sensitivity of 87.2%, while joint predictions have a sensitivity of 50% and a specificity of 98%. The methods are used to infer possible functions for some hypothetical genes in prokaryotic genomes. The methods have proven a useful addition to structure information in deriving protein function in a structural genomics project.
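The Gene Gap Method rests on one simple rule. Adjacent genes on the same strand separated by a short intergenic gap are predicted to belong to the same operon. The sketch below applies that rule and scores the predictions with sensitivity and specificity. The `max_gap` value of 50 bp, the gene tuple format, and the function names are illustrative assumptions, not the optimum parameters reported above. The sketch uses the textbook definition specificity = TN / (TN + FP), which the paper may define differently.

```python
def adjacent_pairs(genes):
    """All adjacent gene pairs on one chromosome; genes are (name, start, end, strand)."""
    ordered = sorted(genes, key=lambda g: g[1])
    return {(g1[0], g2[0]) for g1, g2 in zip(ordered, ordered[1:])}


def predict_operon_pairs(genes, max_gap=50):
    """Predict same-operon pairs: same strand and intergenic gap <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g[1])
    pairs = set()
    for (n1, _, e1, st1), (n2, s2, _, st2) in zip(ordered, ordered[1:]):
        gap = s2 - e1 - 1  # bases strictly between the genes; negative means overlap
        if st1 == st2 and gap <= max_gap:
            pairs.add((n1, n2))
    return pairs


def sensitivity_specificity(predicted, true_pairs, all_pairs):
    tp = len(predicted & true_pairs)
    fn = len(true_pairs - predicted)
    negatives = all_pairs - true_pairs
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


genes = [("a", 1, 900, "+"), ("b", 920, 1500, "+"), ("c", 1800, 2400, "+"), ("d", 2410, 3000, "-")]
predicted = predict_operon_pairs(genes)
print(predicted)  # {('a', 'b')}
print(sensitivity_specificity(predicted, {("a", "b"), ("b", "c")}, adjacent_pairs(genes)))  # (0.5, 1.0)
```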

6.
Comparative whole-genome analyses have demonstrated that horizontal gene transfer (HGT) provides a significant contribution to prokaryotic genome innovation. The evolution of specific prokaryotes is therefore tightly linked to the environment in which they live and the communal pool of genes available within that environment. Here we use the term supergenome to describe the set of all genes that a prokaryotic ‘individual’ can draw on within a particular environmental setting. Conjugative plasmids can be considered particularly successful entities within the communal pool, which have enabled HGT over large taxonomic distances. These plasmids are collections of discrete regions of genes that function as ‘backbone modules’ to undertake different aspects of overall plasmid maintenance and propagation. Conjugative plasmids often carry suites of ‘accessory elements’ that contribute adaptive traits to the hosts and, potentially, other resident prokaryotes within specific environmental niches. Insight into the evolution of plasmid modules therefore contributes to our knowledge of gene dissemination and evolution within prokaryotic communities. This communal pool provides the prokaryotes with an important mechanistic framework for obtaining adaptability and functional diversity that alleviates the need for large genomes of specialized ‘private genes’.

7.
8.
Horizontal gene transfer (HGT) spreads genetic diversity by moving genes across species boundaries. By rapidly introducing newly evolved genes into existing genomes, HGT circumvents the slow step of ab initio gene creation and accelerates genome innovation. However, HGT can only affect organisms that readily exchange genes (exchange communities). To define exchange communities and understand the internal and external environmental factors that regulate HGT, we analyzed approximately 20,000 genes contained in eight free-living prokaryotic genomes. These analyses indicate that HGT occurs among organisms that share similar characteristics, the most significant of which are genome size, genome G/C composition, carbon utilization, and oxygen tolerance.

9.
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations about the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often distant, organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network. This novel notion undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.

10.
Many genes are involved in the mammalian apoptosis pathway. These apoptosis genes often contain characteristic functional domains and, according to previous reports, can be classified into at least 15 functional groups. Using an integrated bioinformatics platform for motif and domain search in three public mammalian proteomes (the International Protein Index database for human, mouse, and rat), we systematically cataloged all of the proteins involved in the mammalian apoptosis pathway. By localizing those proteins on the genomes, we obtained a gene-locus-centric apoptosis gene catalog for human, mouse and rat. Further phylogenetic analysis showed that most of the apoptosis-related gene loci are conserved among these three mammals. Interestingly, about one-third of apoptosis gene loci form gene clusters on mammalian chromosomes, and these clusters exist in all three species, indicating that mammalian apoptosis gene order is also conserved. In addition, some tandemly duplicated gene loci were revealed by comparing gene locus clusters in the three species. All data produced in this work were stored in a relational database and may be viewed at http://pcas.cbi.pku.edu.cn/database/apd.php.

11.
Wu H, Mao F, Olman V, Xu Y. Nucleic Acids Research, 2007, 35(7): 2125-2140
Functional classification of genes is a fundamental problem in many biological studies. Most existing classification schemes are based on the concepts of homology and orthology. These concepts were originally introduced to study gene evolution and might not be the most appropriate for gene function prediction, particularly at high resolution. We have recently developed a scheme for hierarchical classification of genes (HCGs) in prokaryotes. In the HCG scheme, the functional equivalence relationships among genes are first assessed through a careful application of both sequence similarity and genomic neighborhood information. Genes are then classified into a hierarchical structure of clusters, where genes in each cluster are functionally equivalent at some resolution level, and the resolution increases as the clusters become smaller further down the hierarchy. The HCG scheme is validated through comparisons with the taxonomy of the prokaryotic genomes, Clusters of Orthologous Groups (COGs) of genes and the Pfam system. We have applied the HCG scheme to 224 complete prokaryotic genomes and constructed an HCG database consisting of a forest of 5339 multi-level and 15 770 single-level trees of gene clusters, covering approximately 93% of the genes of these 224 genomes. The validation results indicate that the HCG scheme not only captures the key features of the existing classification schemes but also provides a much richer organization of genes. This organization can be used for functional prediction of genes at higher resolution and to help reveal the evolutionary traces of the genes.

12.
Although it is well known that there is no long-range colinearity in gene order in bacterial genomes, several regions are thought to be under strong structural constraints during evolution, and gene order in these regions is extremely conserved. One such region is the str locus, containing the S10-spc-alpha operons. These operons contain genes coding for ribosomal proteins and a number of housekeeping genes. We compared the organisation of these gene clusters in 111 sequenced prokaryotic genomes (99 bacterial and 12 archaeal genomes). We also compared the organisation to the phylogeny based on 16S ribosomal RNA gene sequences and the sequences of the ribosomal proteins L22, L16 and S14. Our data indicate that there is much variation in gene order and content in these gene clusters, in both bacterial and archaeal genomes. They also indicate that differential gene loss has occurred on multiple occasions during evolution. We also noted several discrepancies between phylogenetic trees based on 16S rRNA gene sequences and those based on sequences of ribosomal proteins L16, L22 and S14, suggesting that horizontal gene transfer played a significant role in the evolution of the S10-spc-alpha gene clusters.

13.
During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea (with the exception of Thermoplasma acidophilum and Halobacterium NRC-1) and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. Two more genes that belong to this neighborhood and are present in most of the genomes in which the neighborhood was detected encode, respectively, a predicted HD-superfamily hydrolase (possibly a nuclease) of a distinct family and a predicted, novel DNA polymerase. Another characteristic feature of this neighborhood is the expansion of a superfamily of paralogous, uncharacterized proteins, which are encoded by at least 20–30% of the genes in the neighborhood. The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified. This hypothetical repair system might be functionally analogous to the bacterial–eukaryotic system of translesion, mutagenic repair whose central components are DNA polymerases of the UmuC-DinB-Rad30-Rev1 superfamily, which typically are missing in thermophiles.

14.
Identification of functional open reading frames in chloroplast genomes   Times cited: 7 (self-citations: 0; by others: 7)
Wolfe KH, Sharp PM. Gene, 1988, 66(2): 215-222
We have used a rapid computer dot-matrix comparison method to identify all DNA regions which have been evolutionarily conserved between the completely sequenced chloroplast genomes of tobacco and a liverwort. Analysis of these regions reveals 74 homologous open reading frames (ORFs) which have been conserved as to length and amino acid sequence; these ORFs also have an excess of nucleotide substitutions at silent sites of codons. Since the nonfunctional parts of these genomes have become saturated with mutations and show no sequence similarity whatsoever, the homologous ORFs are almost certainly functional. A further four pairs of ORFs show homology limited to only a short part of their putative gene products. Amino acid sequence identities range between 50 and 99%; some chloroplast proteins are seen to be among the most slowly evolving of all known proteins. A search of the nucleotide and amino acid sequence databanks has revealed several previously unidentified genes in chloroplast sequences from other species, but no new homologies to prokaryotic genes.
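A dot-matrix comparison marks every pair of positions whose surrounding windows in the two sequences are sufficiently similar. Conserved regions then show up as runs of hits along a diagonal. The sketch below is a generic windowed dot plot. The `window` length, the `min_matches` stringency, and the brute-force loop are illustrative assumptions and are not the parameters or implementation used in the study.

```python
def dot_matrix(seq1, seq2, window=10, min_matches=8):
    """Return (i, j) for every window pair with at least `min_matches` identical positions.

    Brute force, O(len(seq1) * len(seq2) * window): fine for a sketch, too slow for
    whole genomes, where rapid programs use word lookups instead.
    """
    hits = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(a == b for a, b in zip(seq1[i : i + window], seq2[j : j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


hits = dot_matrix("ACGTACGTAA", "TTACGTACGT", window=4, min_matches=4)
print(hits[:3])  # [(0, 2), (0, 6), (1, 3)]
```

The hits (0, 2), (1, 3), (2, 4), ... form a diagonal. That diagonal corresponds to the shared segment ACGTACGT offset by two positions.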

15.

Background  

Microbial genomes contain an abundance of genes with conserved proximity forming clusters on the chromosome. However, the conservation can be a result of many factors such as vertical inheritance, or functional selection. Thus, identification of conserved gene clusters that are under functional selection provides an effective channel for gene annotation, microarray screening, and pathway reconstruction. The problem of devising a robust method to identify these conserved gene clusters and to evaluate the significance of the conservation in multiple genomes has a number of implications for comparative, evolutionary and functional genomics as well as synthetic biology.

16.
The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analysis of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way, a cross-tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons; hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost all eubacteria and archaebacteria is remarkable; (ii) some operons belong only to eubacteria or only to archaebacteria; (iii) although the ribosomal protein operons are highly conserved within each domain, there are fine variat

17.
Systematic determination of gene function is an essential step in fully understanding the precise contribution of each gene to the proper execution of molecular functions in the cell. Gene functional linkage describes the relationship among a group of genes with similar functions. With thousands of genomes sequenced, there is a great opportunity to use gene evolutionary information to identify gene functional linkages. To this end, we established a computational method, called TRACE, that traces gene footprints through a gene functional network constructed from 341 prokaryotic genomes. TRACE was validated and successfully tested in predicting enzyme functions as well as pathway components. A previously undescribed chromosome partitioning-like protein, ro03654, of the oleaginous bacterium Rhodococcus sp. RHA1 (RHA1) was predicted and verified experimentally. Its deletion mutant showed growth inhibition compared with the RHA1 wild type. In addition, four proteins were predicted to act as prokaryotic SNARE-like proteins, and two of them were shown to be localized at the plasma membrane. Thus, we believe that TRACE is an effective new method for inferring prokaryotic gene functional linkages by tracing evolutionary events.

18.

Background  

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances.
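The OG distance described above combines a content term (shared orthologous OG pairs) with an order term (a normalized breakpoint distance). The sketch below is a hypothetical instance of such a combination, not OGtree's actual formula. It treats each genome as one linear, unsigned order of OG-pair labels, which also illustrates why breakpoints do not extend directly to multi-chromosomal genomes. The normalizations and the weight `alpha = 0.5` are assumptions.

```python
def content_distance(order1, order2):
    """1 - (shared items / size of the smaller genome); items are OG-pair labels."""
    s1, s2 = set(order1), set(order2)
    smaller = min(len(s1), len(s2))
    return 1.0 - len(s1 & s2) / smaller if smaller else 1.0


def breakpoint_count(order1, order2):
    """Unsigned breakpoints of order1 relative to order2, restricted to shared items."""
    shared = set(order1) & set(order2)
    g1 = [g for g in order1 if g in shared]
    g2 = [g for g in order2 if g in shared]
    adj2 = {frozenset(p) for p in zip(g2, g2[1:])}  # adjacencies, direction ignored
    return sum(frozenset(p) not in adj2 for p in zip(g1, g1[1:]))


def order_distance(order1, order2):
    """Breakpoints normalized by the number of adjacencies among shared items."""
    n_shared = len(set(order1) & set(order2))
    return breakpoint_count(order1, order2) / max(n_shared - 1, 1)


def og_distance(order1, order2, alpha=0.5):
    """Hypothetical weighted mix of content and order distances."""
    return alpha * content_distance(order1, order2) + (1 - alpha) * order_distance(order1, order2)


o1 = ["a", "b", "c", "d", "e"]
o2 = ["a", "b", "d", "c", "f"]
print(round(og_distance(o1, o2), 4))  # 0.2667
```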

19.
The study of conserved gene clusters is important for understanding the forces behind genome organization and evolution, as well as the function of individual genes or gene groups. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. This generalizes a recent model called "gene teams." A gene team is a set of genes that appear homologously in two or more species, possibly in a different order, such that on each chromosome the distance between adjacent genes of the team never exceeds a certain threshold. We remove the constraint in the original model that each gene must have a unique occurrence in each chromosome, and thus allow the analysis of complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm analyzes a pair of chromosomes in O(mn) time and uses O(m+n) space, where m and n are the numbers of genes in the respective chromosomes. We demonstrate the utility of our methods by studying two bacterial genomes, E. coli K-12 and B. subtilis. Many of the teams identified by our algorithm correlate with documented E. coli operons, while several others match predicted operons previously suggested by computational techniques. Our implementation and data are publicly available at euler.slu.edu/~goldwasser/homologyteams/.
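The gene-team definition can be made concrete with the simple divide-and-refine idea for the original model, in which each gene occurs once per chromosome. Split the gene set into maximal δ-chains on chromosome 1, then split each piece on chromosome 2, and repeat until a set is a single chain on both. This is not the paralog-aware O(mn) algorithm of the abstract, and its worst-case running time is worse. Gene positions as integer indices, the dictionary input format, and the function names are assumptions.

```python
def delta_chains(genes, pos, delta):
    """Split `genes` into maximal runs whose consecutive positions differ by <= delta."""
    ordered = sorted(genes, key=lambda g: pos[g])
    chains, current = [], [ordered[0]]
    for prev, g in zip(ordered, ordered[1:]):
        if pos[g] - pos[prev] <= delta:
            current.append(g)
        else:
            chains.append(current)
            current = [g]
    chains.append(current)
    return chains


def gene_teams(genes, pos1, pos2, delta):
    """Maximal gene teams for two chromosomes, assuming one occurrence per gene."""
    if not genes:
        return []
    teams, stack = [], [list(genes)]
    while stack:
        group = stack.pop()
        # A group is a team only if it forms a single delta-chain on both chromosomes.
        for pos in (pos1, pos2):
            chains = delta_chains(group, pos, delta)
            if len(chains) > 1:
                stack.extend(chains)
                break
        else:
            teams.append(set(group))
    return teams


pos1 = {"a": 1, "b": 2, "c": 3, "d": 10, "e": 11}
pos2 = {"a": 5, "b": 20, "c": 6, "d": 7, "e": 8}
print(sorted(sorted(t) for t in gene_teams(pos1, pos1, pos2, delta=2)))
# [['a', 'c'], ['b'], ['d', 'e']]
```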

20.
While it is well accepted that horizontal gene transfer plays an important role in the evolution and diversification of prokaryotic genomes, many questions remain open regarding its functional mechanisms of action and its interplay with the extant genome. This study addresses the relationship between proteome innovation by horizontal gene transfer and genome content in Proteobacteria. We characterize the transferred genes, focusing on their protein domain compositions and their relationships with the existing protein domain superfamilies in the genome. In agreement with previous observations, we find that the protein domain architectures of horizontally transferred genes are significantly shorter than the genomic average. Furthermore, protein domains that are more common in the total pool of genomes appear to have a proportionally higher chance of being transferred. This suggests that transfer events behave as if they were drawn randomly from a cross-genomic community gene pool, much as gene duplicates are drawn from a genomic gene pool. Finally, horizontally transferred genes carry domains of exogenous families less frequently in larger genomes, although possibly still more often than expected by chance.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417