Similar literature
1.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
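The abstract above describes building COGs from consistent genome-specific best hits. A minimal Python sketch of one way such a criterion could be applied is given here. It rests on simplifying assumptions that are not stated in the abstract: only reciprocal best hits are kept, three proteins from three genomes that are all mutual best hits form a triangle, and triangles sharing a side are merged. The function name `build_cogs` and the input layout (`best_hits`, `genome_of`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_cogs(best_hits, genome_of):
    """Cluster proteins from triangles of reciprocal best hits.

    best_hits: {(protein, target_genome): best-hit protein in that genome}
    genome_of: {protein: genome it is encoded in}
    Both layouts are assumptions made for this sketch.
    """
    # Keep a pair only when each protein is the other's best hit.
    edges = set()
    for (protein, _target), hit in best_hits.items():
        if hit != protein and best_hits.get((hit, genome_of[protein])) == protein:
            edges.add(frozenset((protein, hit)))

    # Adjacency lists over the reciprocal-best-hit graph.
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)

    # A triangle is three proteins that are pairwise reciprocal best hits.
    triangles = set()
    for edge in edges:
        a, b = tuple(edge)
        for c in neighbors[a] & neighbors[b]:
            triangles.add(frozenset((a, b, c)))

    # Union-find over edges: triangles that share a side end up together.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for triangle in triangles:
        a, b, c = sorted(triangle)
        side_ab, side_bc, side_ac = (frozenset((a, b)), frozenset((b, c)),
                                     frozenset((a, c)))
        union(side_ab, side_bc)
        union(side_bc, side_ac)

    clusters = defaultdict(set)
    for side in list(parent):
        clusters[find(side)].update(side)
    return list(clusters.values())
```

In this sketch a protein whose best hit is not reciprocated (for example a diverged paralog) is left out of every cluster, which is stricter than a full treatment would likely be.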

2.

Background  

An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes.

3.
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, which includes proteins that are encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and are shared with bacteria and/or archaea. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.

4.
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often distant, organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.

5.
Connected gene neighborhoods in prokaryotic genomes   Total citations: 12 (self-citations: 1; citations by others: 11)
A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes, and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods that include additional genes adjacent to the genes from the clusters and transcribed in the same direction; as a result, a total of 2387 COGs were included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon ‘genomic hitchhiking’. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.

6.
Comparative genomics has revealed that variations in bacterial and archaeal genome DNA sequences cannot be explained by neutral mutations alone. Virus resistance and plasmid distribution systems have resulted in changes in bacterial and archaeal genome sequences during evolution. The restriction-modification system, a virus resistance system, leads to avoidance of palindromic DNA sequences in genomes. Clustered, regularly interspaced, short palindromic repeats (CRISPRs) found in genomes represent yet another virus resistance system. Comparative genomics has shown that bacteria and archaea have failed to gain any DNA with GC content higher than the GC content of their chromosomes. Thus, horizontally transferred DNA regions have lower GC content than the host chromosomal DNA does. Some nucleoid-associated proteins bind DNA regions with low GC content and inhibit the expression of genes contained in those regions. This form of gene repression is another type of virus resistance system. On the other hand, bacteria and archaea have used plasmids to gain additional genes. Virus resistance systems influence plasmid distribution. Interestingly, the restriction-modification system and nucleoid-associated protein genes have been distributed via plasmids. Thus, GC content and genomic signatures do not reflect bacterial and archaeal evolutionary relationships.

7.
Since the definition of archaea as a separate domain of life along with bacteria and eukaryotes, they have become one of the most interesting objects of modern microbiology, molecular biology, and biochemistry. Sequencing and analysis of archaeal genomes were especially important for studies on archaea because of the limited availability of genetic tools for the majority of these microorganisms and the problems associated with their cultivation. In the fifteen years since the publication of the first genome of an archaeon, more than one hundred complete genome sequences of representatives of different phylogenetic groups have been determined. Analysis of these genomes has expanded our knowledge of the biology of archaea, their diversity and evolution, and allowed identification and characterization of new deep phylogenetic lineages of archaea. The development of genome technologies has allowed sequencing the genomes of uncultivated archaea directly from enrichment cultures, metagenomic samples, and even from single cells. Insights have been gained into the evolution of key biochemical processes in archaea, such as cell division and DNA replication, and into the role of horizontal gene transfer in the evolution of archaea, and new relationships between archaea and eukaryotes have been revealed.

8.
Natale DA  Shankavaram UT  Galperin MY  Wolf YI  Aravind L  Koonin EV 《Genome biology》2000,1(5):research0009.1-research0009.19

Background  

Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi.

9.
Comparing chromosomal gene order in two or more related species is an important approach to studying the forces that guide genome organization and evolution. Linked clusters of similar genes found in related genomes are often used to support arguments of evolutionary relatedness or functional selection. However, as the gene order and the gene complement of sister genomes diverge progressively due to large-scale rearrangements, horizontal gene transfer, gene duplication and gene loss, it becomes increasingly difficult to determine whether observed similarities in local genomic structure are indeed remnants of common ancestral gene order, or are merely coincidences. Rigorous comparative genomics requires principled methods for distinguishing chance commonalities, within or between genomes, from genuine historical or functional relationships. In this paper, we construct tests for significant groupings against null hypotheses of random gene order, taking incomplete clusters, multiple genomes, and gene families into account. We consider both the significance of individual clusters of prespecified genes and the overall degree of clustering in whole genomes.
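To make the idea of a null hypothesis of random gene order concrete, the following Python sketch works through one deliberately simple case that is assumed here for illustration and is not taken from the paper: the probability that k prespecified genes occupy k consecutive positions in a uniformly random linear order of n genes. Counting arrangements gives (n - k + 1) / C(n, k). The function names are hypothetical, and the sketch ignores incomplete clusters, multiple genomes and gene families, which the abstract says the actual tests account for.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def p_contiguous(n, k):
    """Exact chance that k chosen genes are contiguous among n randomly ordered genes."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # (n - k + 1) start positions, k! internal orders, (n - k)! orders of the rest,
    # divided by n! total orders, simplifies to (n - k + 1) / C(n, k).
    return Fraction(n - k + 1, comb(n, k))


def p_contiguous_brute_force(n, k):
    """Same probability by enumerating every gene order (small n only)."""
    hits = total = 0
    for order in permutations(range(n)):
        # Genes 0..k-1 play the role of the prespecified cluster.
        positions = [order.index(gene) for gene in range(k)]
        hits += max(positions) - min(positions) == k - 1
        total += 1
    return Fraction(hits, total)
```

A small value of `p_contiguous(n, k)` for an observed cluster would argue against coincidence under this toy null model; for example, two specific genes are adjacent by chance in one fifth of the random orders of ten genes.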

10.
The COG database: an updated version includes eukaryotes   Total citations: 4 (self-citations: 0; citations by others: 4)

Background

The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.

Results

We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.

Conclusion

The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
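The coverage percentages quoted in the Results of this abstract follow directly from the protein counts it reports. The short Python check below recomputes them; the helper name `coverage_percent` is introduced here only for illustration.

```python
def coverage_percent(included, total):
    """Share of all predicted proteins that fall into orthologous clusters."""
    return 100.0 * included / total


# Counts as reported in the abstract.
cog_coverage = coverage_percent(138_458, 185_505)  # proteins in COGs / all proteins
kog_coverage = coverage_percent(59_838, 110_655)   # proteins in KOGs / all proteins

print(f"COG coverage: {cog_coverage:.1f}%")  # 74.6%, reported as 75%
print(f"KOG coverage: {kog_coverage:.1f}%")  # 54.1%, reported as ~54%
```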

11.
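Fish-specific genome duplication   Total citations: 2 (self-citations: 0; citations by others: 2)
Zhou L  Wang Y  Gui JF 《Zoological Research》(动物学研究) 2006,27(5):525-532
Ray-finned fishes (Actinopterygii) are the most species-rich and widely distributed group of vertebrates, and their genome sizes vary considerably. The earlier view held that two rounds of genome duplication occurred during vertebrate evolution. Recent phylogenomic data further suggest that, about 350 million years ago, ray-finned fishes underwent a third round of genome duplication, the fish-specific genome duplication (FSGD). This event coincided with the divergence of the "extremely species-rich" teleost lineage (Teleostei) from the "species-poor" lineages (the basal groups of Actinopterygii), suggesting that the FSGD is associated with the increase in teleost species number and biological diversity. Further comparative and functional genomics studies of fishes will provide additional tests of the FSGD hypothesis.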

12.
The sequencing of several genomes from each of the three domains of life (Archaea, Bacteria and Eukarya) has provided a huge amount of data that can be used to gain insight into early cellular evolution. Some features of the universal tree of life based on rRNA phylogenies have been confirmed, such as the division of the cellular living world into three domains. The monophyly of each domain is supported by comparative genomics. However, the hyperthermophilic nature of the 'last universal common ancestor' (LUCA) is not confirmed. Comparative genomics has revealed that gene transfers have been (and still are) very frequent in genome evolution. Nevertheless, a core of informational genes appears more resistant to transfer, testifying to a close relationship between archaeal and eukaryal informational processes. This observation can be explained either by a common unique history between Archaea and Eukarya or by an atypical evolution of these systems in Bacteria. At the moment, comparative genomics does not yet allow us to choose between a simple LUCA, possibly with an RNA genome, and a complex LUCA, with a DNA genome and informational mechanisms similar to those of Archaea and Eukarya. Further comparative studies on informational mechanisms in the three domains should help to resolve this critical question. The role of viruses in the origin and evolution of DNA genomes also appears to be an area worthy of active investigation. I suggest here that DNA and DNA replication mechanisms appeared first in the virus world before being transferred into cellular organisms.

13.
Xu X  Wu X  Yu Z 《Génome》2010,53(12):1041-1052
Extraordinary variation has been found in mitochondrial (mt) genome inheritance, gene content and arrangement among bivalves. However, only a few bivalve mt genomes have been comparatively analyzed to infer their evolutionary scenarios. In this study, the complete mt genome of the venerid Paphia euglypta (Bivalvia: Veneridae) was first studied and then compared with those of other venerids (e.g., Venerupis philippinarum and Meretrix petechialis) to better understand mt genome evolution within a family. Though several common features such as the AT content, codon usage of protein-coding genes, and AT/GC skew are shared by the three venerids, a high level of variability is observed in genome size, gene content, gene order, arrangements and primary sequence of nucleotides or amino acids. Most of the gene rearrangements can be explained by the "tandem duplication and random loss" model. From the observed rearrangement patterns, we speculate that block interchange between adjacent genes may be common in the evolution of mt genomes in venerids. Furthermore, this study presents several new findings in the mt genome annotation of V. philippinarum and M. petechialis, and hence we have reannotated the genomes of these two species as follows: (1) the ORF of the formerly annotated cox2 gene in V. philippinarum is deduced by using a truncated "T" codon and a second cox2 gene is identified; (2) the trnS-AGN gene is identified and marked in the mt genomes of both venerids. Thus, this study demonstrates a high variability of mt genomes in the Veneridae, and shows the importance of comparative mt genome analysis for interpreting the evolution of the bivalve mt genome.

14.
Yanai I  DeLisi C 《Genome biology》2002,3(11):research0064.1-research0064.12

Background  

Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations.
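One simple way to combine links from the three methods named in this abstract is to take the union of all gene pairs and record which methods support each pair. The Python sketch below does that. It is an assumption made for illustration, not the combination scheme used by the authors, and the function names, method labels and gene identifiers are all hypothetical.

```python
from collections import defaultdict


def combine_links(links_by_method):
    """Merge per-method gene pairs into one network.

    links_by_method: {method name: iterable of (gene, gene) pairs}
    Returns {frozenset({gene, gene}): set of supporting method names}.
    """
    network = defaultdict(set)
    for method, pairs in links_by_method.items():
        for gene_a, gene_b in pairs:
            if gene_a != gene_b:  # ignore self-links
                # frozenset makes the pair order-independent.
                network[frozenset((gene_a, gene_b))].add(method)
    return dict(network)


def well_supported(network, min_methods=2):
    """Keep only links backed by at least min_methods independent methods."""
    return {pair: methods for pair, methods in network.items()
            if len(methods) >= min_methods}


# Toy input with made-up gene identifiers.
example_links = {
    "phylogenetic_profile": [("g1", "g2")],
    "gene_neighborhood": [("g2", "g1"), ("g2", "g3")],
    "gene_fusion": [("g3", "g4")],
}
example_network = combine_links(example_links)
```

Filtering with `well_supported` trades coverage for confidence: links seen by a single method are dropped, while pairs found independently by several methods are retained.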

15.
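With the completion of whole-genome sequencing for many organisms, research on the minimal genome has emerged, that is, on the smallest number of genes that an independently living organism requires. The genome of mycoplasma, the smallest known cell, is an important subject of minimal-genome research, and minimal genomes have also been analyzed by comparing COGs across multiple sequenced genomes. At present, minimal-genome research is carried out by analyzing gene mutations caused by transposon insertion and gene deletions made by homologous recombination.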

16.
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense in which "law" is understood in modern physics.

17.
Ion current fluctuation of the voltage‐dependent potassium channel in LβT2 cells has been investigated by autocorrelation function and DFA (detrended fluctuation analysis) methods. The calculation of the autocorrelation function exponent and DFA exponent of the sample was based on the digital signals or the 0–1 series corresponding to closing and opening of channels after routine evolution, rather than the sequence of sojourn times. The persistent character of the correlation of the time series was evident from the slow decay of the autocorrelation function. The DFA exponent α was significantly greater than 0.5. The main outcome has been the demonstration of the existence of memory in this ion channel. Thus, the ion channel current fluctuation provided information about the kinetics of the channel protein. The result suggests that the correlation character of the non‐linear kinetics of the ion channel protein indicates whether the channel is open or not.
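The two measures named in this abstract can be sketched in plain Python as follows. This is a generic first-order DFA with non-overlapping windows plus a sample autocorrelation, written under assumptions that are not taken from the paper: the window sizes, the linear detrending order and the function names are all illustrative. The input can be any numeric series, including a 0–1 series of closed and open states.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at a non-negative lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    variance_sum = sum(d * d for d in dev)
    lagged_sum = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return lagged_sum / variance_sum


def _linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent alpha from at least two usable window sizes."""
    # Step 1: integrate the mean-subtracted series into a profile.
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for x in series:
        running += x - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n < 4 or n_windows < 1:
            continue  # window too short to detrend, or longer than the data
        xs = list(range(n))
        squared_residuals = 0.0
        # Step 2: remove a local linear trend in each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = _linear_fit(xs, segment)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        # Step 3: root-mean-square fluctuation at this window size.
        fluctuation = math.sqrt(squared_residuals / (n_windows * n))
        if fluctuation > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))

    # Step 4: alpha is the slope of log F(n) against log n.
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha
```

For an uncorrelated series the estimate should sit near 0.5, so a value clearly above 0.5, as the abstract reports, is read as persistent long-range correlation.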

18.
19.
Sridhar J  Rafi ZA 《Bioinformation》2008,2(7):284-295
One of the key challenges in computational genomics is the annotation of coding genes and the identification of regulatory RNAs in complete genomes. This study addresses it by using the regulatory RNA locations and their conserved flanking genes identified within the genomic backbone of a template genome to search for similar RNA locations in query genomes. The search is based on the recently reported coexistence of small RNAs and their conserved flanking genes in related genomes. Based on our study, 54 additional sRNA locations and the functions of 96 uncharacterized genes are predicted in two draft genomes, viz., Serratia marcescens Db1 and Yersinia enterocolitica 8081. Although most of the identified additional small RNA regions and their corresponding flanking genes are homologous in nature, the proposed anchoring technique also successfully identified four non-homologous small RNA regions in the Y. enterocolitica genome. The KEGG Orthology (KO) based automated functional predictions confirm the predicted functions of 65 flanking genes having defined KO numbers, out of the total 96 predictions made by this method. This coexistence-based method shows more sensitivity than controlled vocabularies in locating orthologous gene pairs even in the absence of defined Orthology numbers. All functional predictions made by this study in Y. enterocolitica 8081 were confirmed by the recently published complete genome sequence and annotations. This study also reports the possible regions of gene rearrangements in these two genomes; further characterization of such RNA regions could shed more light on their possible role in genome evolution.

20.
The aldehyde dehydrogenase (ALDH) superfamily represents a group of NAD(P)+-dependent enzymes that catalyze the oxidation of a wide spectrum of endogenous and exogenous aldehydes. With the advent of megabase genome sequencing, the ALDH superfamily is expanding rapidly on many fronts. As expected, ALDH genes are found in virtually all genomes analyzed to date, indicating the importance of these enzymes in biological functions. Complete genome sequences of various species have revealed additional ALDH genes. As of July 2000, the ALDH superfamily consists of 331 distinct genes, of which eight are found in archaea, 165 in eubacteria, and 158 in eukaryota. The number of ALDH genes in some species with completely sequenced and annotated genomes, Escherichia coli and Caenorhabditis elegans, ranges from 10 to 17. In the human genome, 17 functional genes and three pseudogenes have been identified to date. Divergent evolution, based on multiple alignment analysis of 86 eukaryotic ALDH amino-acid sequences, was the basis of the standardized ALDH gene nomenclature system (Pharmacogenetics 9: 421-434, 1999). Thus far, the eukaryotic ALDHs comprise 20 gene families. A complete list of all ALDH sequences known to date is presented here along with the evolution analysis of the eukaryotic ALDHs.
