Similar literature
1.
2.
The COG database: an updated version includes eukaryotes   Total citations: 4, self-citations: 0, citations by others: 4

Background

The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.

Results

We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.

Conclusion

The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

3.

Background and Aims

The OVATE gene encodes a nuclear-localized regulatory protein belonging to a distinct family of plant-specific proteins known as the OVATE family proteins (OFPs). OVATE was first identified as a key regulator of fruit shape in tomato, with nonsense mutants displaying pear-shaped fruits. However, the role of OFPs in plant development has been poorly characterized.

Methods

Public databases were searched and a total of 265 putative OVATE protein sequences were identified from 13 sequenced plant genomes that represent the major evolutionary lineages of land plants. A phylogenetic analysis was conducted based on the alignment of the conserved OVATE domain from these 13 selected plant genomes. The expression patterns of tomato SlOFP genes were analysed via quantitative real-time PCR. The pattern of OVATE gene duplication resulting in the expansion of the gene family was determined in arabidopsis, rice and tomato.

Key Results

Genes for OFPs were found to be present in all the sampled land plant genomes, including the early-diverged lineages, mosses and lycophytes. Phylogenetic analysis based on the amino acid sequences of the conserved OVATE domain defined 11 sub-groups of OFPs in angiosperms. Different evolutionary mechanisms are proposed for OVATE family evolution, namely conserved evolution and divergent expansion. Characterization of the AtOFP family in arabidopsis, the OsOFP family in rice and the SlOFP family in tomato provided further details regarding the evolutionary framework and revealed a major contribution of tandem and segmental duplications towards expansion of the OVATE gene family.

Conclusions

This first genome-wide survey on OFPs provides new insights into the evolution of the OVATE protein family and establishes a solid base for future functional genomics studies on this important but poorly characterized regulatory protein family in plants.

4.

Background

Chlamydia pneumoniae causes human respiratory diseases and has recently been associated with atherosclerosis. Analysis of the three recently published C. pneumoniae genomes has led to the identification of a new gene family (the Cpn 1054 family) that consists of 11 predicted genes and gene fragments. Each member encodes a polypeptide with a hydrophobic domain characteristic of proteins localized to the inclusion membrane.

Results

Comparative analysis of this gene family within the published genome sequences provided evidence that multiple levels of genetic variation are evident within this single collection of paralogous genes. Frameshift mutations are found that result in both truncated gene products and pseudogenes that vary among isolates. Several genes in this family contain polycytosine (polyC) tracts either upstream or within the terminal 5' end of the predicted coding sequence. The length of the polyC stretch varies between paralogous genes and within single genes in the three genomes. Sequence analysis of genomic DNA from a collection of 12 C. pneumoniae clinical isolates was used to determine the extent of the variation in the Cpn 1054 gene family.

Conclusions

These studies demonstrate that sequence variability is present both among strains and within strains at several of the loci. In particular, changes in the length of the polyC tract associated with the different Cpn 1054 gene family members are common within each tested C. pneumoniae isolate. The variability identified within this newly described gene family may modulate either phase or antigenic variation and subsequent physiologic diversity within a C. pneumoniae population.

5.

Background

Extensive genome-wide analyses of many human populations, using microarrays containing hundreds of thousands of single-nucleotide polymorphisms, have provided us with abundant information about global genomic diversity. However, these data can also be used to analyze local variability in individual genomic regions. In this study, we analyzed the variability in two genomic regions carrying the genes of the GSTA and GSTM subfamilies, located on different chromosomes.

Results

Analysis of the polymorphisms in GSTA and GSTM gene clusters showed similarities in their allelic and haplotype diversities. These patterns were similar in three Russian populations and the CEU population of European origin. There were statistically significant differences in all the haploblocks of both the GSTM and GSTA regions when the Russian populations were compared with populations from China and Japan. Most haploblocks also differed between the Russians and Nigerians from Yoruba, but some of them had similar allelic frequencies. Special attention was paid to SNP rs4986947 from the intron of the GSTA4 gene, which is represented in apes by an A nucleotide. In the Asian and African samples, it was represented only by a G allele, and both allelic variants (G/A) occurred in the Russian and European populations.

Conclusions

The results obtained suggest the presence of common features in the evolutionary histories of the GSTA and GSTM gene regions, and that African subpopulations were involved differently in the formation of the European and Asian human lineages.

6.

Background

Entamoeba histolytica is a significant cause of disease worldwide. However, little is known about the genetic diversity of the parasite. We re-sequenced the genomes of ten laboratory cultured lines of the eukaryotic pathogen Entamoeba histolytica in order to develop a picture of genetic diversity across the genome.

Results

The extreme nucleotide composition bias and repetitiveness of the E. histolytica genome provide a challenge for short-read mapping, yet we were able to define putative single nucleotide polymorphisms in a large portion of the genome. The results suggest a rather low level of single nucleotide diversity, although genes and gene families with putative roles in virulence are among the more polymorphic genes. We did observe large differences in coverage depth among genes, indicating differences in gene copy number between genomes. We found evidence indicating that recombination has occurred in the history of the sequenced genomes, suggesting that E. histolytica may reproduce sexually.

Conclusions

E. histolytica displays a relatively low level of nucleotide diversity across its genome. However, large differences in gene family content and gene copy number are seen among the sequenced genomes. The pattern of polymorphism indicates that E. histolytica reproduces sexually, or has done so in the past, which has previously been suggested but not proven.

7.
8.
The dog and rat olfactory receptor repertoires   Total citations: 1, self-citations: 0, citations by others: 1

Background

Dogs and rats have a highly developed capability to detect and identify odorant molecules, even at minute concentrations. Previous analyses have shown that the olfactory receptors (ORs) that specifically bind odorant molecules are encoded by the largest gene family sequenced in mammals so far.

Results

We identified five amino acid patterns characteristic of ORs in the recently sequenced boxer dog and brown Norway rat genomes. Using these patterns, we retrieved 1,094 dog genes and 1,493 rat genes from these shotgun sequences. The retrieved sequences constitute the olfactory receptor repertoires of these two animals. Subsets of 20.3% (for the dog) and 19.5% (for the rat) of these genes were annotated as pseudogenes as they had one or several mutations interrupting their open reading frames. We performed phylogenetic studies and organized these two repertoires into classes, families and subfamilies.

Conclusion

We have established a complete or almost complete list of OR genes in the dog and the rat and have compared the sequences of these genes within and between the two species. Our results provide insight into the evolutionary development of these genes and the local amplifications that have led to the specific amplification of many subfamilies. We have also compared the dog and rat ORs with the human and mouse OR repertoires.

9.
Li M, Liu J, Zhang C. PLoS ONE, 2011, 6(10): e26999

Background

The mitogen-activated protein kinase (MAPK) family is implicated in diverse cellular processes and pathways essential to most organisms. It is evolutionarily conserved throughout the eukaryotic kingdoms. However, the detailed evolutionary history of the vertebrate MAPK family is largely unclear.

Methodology/Principal Findings

The MAPK family members were collected from the literature or by searching the genomes of several vertebrates and invertebrates with the known MAPK sequences as queries. We found that vertebrates had significantly more MAPK family members than invertebrates, and the vertebrate MAPK family originated from 3 progenitors, suggesting that a burst of gene duplication events had occurred after the divergence of vertebrates from invertebrates. Conservation of evolutionary synteny was observed in the vertebrate MAPK subfamilies 4, 6, 7, and 11 to 14. Based on synteny and phylogenetic relationships, MAPK12 appeared to have arisen from a tandem duplication of MAPK11, and the MAPK13-MAPK14 gene unit arose from a segmental duplication of the MAPK11-MAPK12 gene unit. Adaptive evolution analyses reveal that purifying selection drove the evolution of the MAPK family, implying strong functional constraints on MAPK genes. Intriguingly, however, intron losses were specifically observed in the MAPK4 and MAPK7 genes, but not in their flanking genes, during the evolution from teleosts to amphibians and mammals. The specific occurrence of intron losses in the MAPK4 and MAPK7 subfamilies might be associated with adaptive evolution of the vertebrates by enhancing the gene expression level of both MAPK genes.

Conclusions/Significance

These results provide valuable insight into the evolutionary history of the vertebrate MAPK family.

10.
11.
12.
13.

Background

Francisella tularensis subspecies tularensis and holarctica are pathogenic to humans, whereas the two other subspecies, novicida and mediasiatica, rarely cause disease. To uncover the factors that allow subspecies tularensis and holarctica to be pathogenic to humans, we compared their genome sequences with the genome sequence of Francisella tularensis subspecies novicida U112, which is nonpathogenic to humans.

Results

Comparison of the genomes of human pathogenic Francisella strains with the genome of U112 identifies genes specific to the human pathogenic strains and reveals pseudogenes that previously were unidentified. In addition, this analysis provides a coarse chronology of the evolutionary events that took place during the emergence of the human pathogenic strains. Genomic rearrangements at the level of insertion sequences (IS elements), point mutations, and small indels took place in the human pathogenic strains during and after differentiation from the nonpathogenic strain, resulting in gene inactivation.

Conclusion

The chronology of events suggests a substantial role for genetic drift in the formation of pseudogenes in Francisella genomes. Mutations that occurred early in the evolution, however, might have been fixed in the population either because of evolutionary bottlenecks or because they were pathoadaptive (beneficial in the context of infection). Because the structure of Francisella genomes is similar to that of the genomes of other emerging or highly pathogenic bacteria, this evolutionary scenario may be shared by pathogens from other species.

14.
Avian genomes are small and lack some genes that are conserved in the genomes of most other vertebrates including nonavian sauropsids. One hypothesis stated that paralogs may provide biochemical or physiological compensation for certain gene losses; however, no functional evidence has been reported to date. By integrating evolutionary analysis, physiological genomics, and experimental gene interference, we clearly demonstrate functional compensation for gene loss. A large-scale phylogenetic analysis of over 1,400 SLC2 gene sequences identifies six new SLC2 genes from nonmammalian vertebrates and divides the SLC2 gene family into four classes. Vertebrates retain class III SLC2 genes but partially lack the more recent duplicates of classes I and II. Birds appear to have completely lost the SLC2A4 gene that encodes an important insulin-sensitive GLUT in mammals. We found strong evidence for positive selection, indicating that the N-termini of SLC2A4 and SLC2A12 have undergone diversifying selection in birds and mammals, and there is a significant correlation between SLC2A12 functionality and basal metabolic rates in endotherms. Physiological genomics have uncovered that SLC2A12 expression and allelic variants are associated with insulin sensitivity and blood glucose levels in wild birds. Functional tests have indicated that SLC2A12 abrogation causes hyperglycemia, insulin resistance, and high relative activity, thus increasing energy expenditures that resemble a diabetic phenotype. These analyses suggest that the SLC2A12 gene not only functionally compensates insulin response for SLC2A4 loss but also affects daily physical behavior and basal metabolic rate during bird evolution, highlighting that older genes retain a higher level of functional diversification.

15.
Yan J, Cai Z. PLoS ONE, 2010, 5(12): e14276

Background

The cytochrome P450 (CYP) superfamily is a multifunctional heme-thiolate enzyme family that is widely distributed from Bacteria to Eukarya. The CYP3 family contains mainly the four subfamilies CYP3A, CYP3B, CYP3C and CYP3D in vertebrates; however, only the Actinopterygii (ray-finned fish) have all four subfamilies, and a detailed understanding of the evolutionary relationship of Actinopterygii CYP3 family members would be valuable.

Methods and Findings

Phylogenetic relationships were constructed to trace the evolutionary history of the Actinopterygii CYP3 family genes. Selection analysis, relative rate tests and functional divergence analysis were combined to interpret the relationship of the site-specific evolution and functional divergence in the Actinopterygii CYP3 family. The results showed that the four CYP3 subfamilies in Actinopterygii might be formed by gene duplication. The first gene duplication event was responsible for divergence of the CYP3B/C clusters from ancient CYP3 before the origin of the Actinopterygii, which corresponded to the fish-specific whole genome duplication (WGD). Tandem repeat duplication in each of the homologue clusters produced stable CYP3B, CYP3C, CYP3A and CYP3D subfamilies. Acceleration of asymmetric evolutionary rates and purifying selection together were the main force for the production of new subfamilies and functional divergence in the new subset after gene duplication, whereas positive selection was detected only in the retained CYP3A subfamily. Furthermore, nearly half of the functional divergence sites appear to be related to substrate recognition, which suggests that site-specific evolution is closely related with functional divergence in the Actinopterygii CYP3 family.

Conclusions

The split of fish-specific CYP3 subfamilies was related to the fish-specific WGD, and site-specific acceleration of asymmetric evolutionary rates and purifying selection was the main force for the origin of the new subfamilies and functional divergence in the new subset after gene duplication. Site-specific evolution in substrate recognition was related to functional divergence in the Actinopterygii CYP3 family.

16.

Background

The recent determination of the complete nucleotide sequence of several Mycobacterium tuberculosis (MTB) genomes allows the use of comparative genomics as a tool for dissecting the nature and consequence of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C), along with the genomes of laboratory strains (H37Rv and H37Ra), provides new insights on the mechanisms of adaptation of this bacterium to the human host.

Findings

The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family but not with genes implicated in virulence. Using a Perl-based software islandsanalyser, which creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicates that the different strains contain a limited number of unique polymorphisms.

Conclusion

The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives into the evolution and the understanding of the pathogenesis of this bacterium.

17.

Background

Amino acid transporters (AATs) that transport amino acids across cellular membranes are essential for plant growth and development. To date, a genome-wide overview of the AAT gene family in rice is not yet available.

Methodology/Principal Findings

In this study, a total of 85 AAT genes were identified in the rice genome and were classified into eleven distinct subfamilies based upon their sequence composition and phylogenetic relationship. A large number of OsAAT genes were expanded via gene duplication; 23 and 24 OsAAT genes were tandemly and segmentally duplicated, respectively. Comprehensive analyses were performed to investigate the expression profiles of OsAAT genes in various stages of vegetative and reproductive development by using data from EST, microarrays, MPSS and real-time PCR. Many OsAAT genes exhibited abundant and tissue-specific expression patterns. Moreover, 21 OsAAT genes were found to be differentially expressed under abiotic stress treatments. Comparative analysis indicates that 26 AAT genes with close evolutionary relationships between rice and Arabidopsis exhibited similar expression patterns.

Conclusions/Significance

This study will facilitate further studies on the OsAAT family and provide useful clues for functional validation of OsAATs.

18.

Background

Genetic plasticity may be understood as the ability of a functional gene network to tolerate alterations in its components or structure. Usually, studies of gene modifications in the course of evolution are concerned with nucleotide sequence alterations in closely related species. However, the analysis of large-scale data about the distribution of gene families in species that are not exclusively closely related can provide insights into how plastic or how conserved a given gene family is. Here, we analyze the abundance and diversity of all Eukaryotic Clusters of Orthologous Groups (KOG) present in the STRING database, resulting in a total of 4,850 KOGs. This dataset comprises 481,421 proteins distributed among 55 eukaryotes.

Results

We propose an index to evaluate the evolutionary plasticity and conservation of an orthologous group based on its abundance and diversity across eukaryotes. To further the KOG plasticity analysis, we estimate the average evolutionary distance among all proteins that belong to the same orthologous group. As a result, we found a strong correlation between the average evolutionary distance and the proposed evolutionary plasticity index. Additionally, we found low evolutionary plasticity in Saccharomyces cerevisiae genes associated with inviability and Mus musculus genes associated with early lethality. Finally, we plot the evolutionary plasticity value in different gene networks from yeast and humans. As a result, it was possible to discriminate between more and less plastic areas of the gene networks analyzed.

Conclusions

The distribution of gene families provides valuable information on evolutionary plasticity, which might be related to genetic plasticity. Accordingly, it is possible to discriminate between conserved and plastic orthologous groups by evaluating their abundance and diversity across eukaryotes.

Reviewers

This article was reviewed by Prof Manyuan Long, Hiroyuki Toh, and Sebastien Halary.

19.

Background

Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene- and genome-duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 single copy genes that are shared among the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses examining GO annotations, coding sequence length, number of exons, number of domains, presence in distant lineages, such as Selaginella and Physcomitrella, and phylogenetic analysis to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using extant coverage in EST databases for seed plants and de novo amplification via RT-PCR in the family Brassicaceae.

Results

There are 959 single copy nuclear genes shared in Arabidopsis, Populus, Vitis and Oryza ["APVO SSC genes"]. The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants, and appear to exist as single or very low copy genes, though exceptions are seen in recently polyploid taxa and in lineages where there is significant evidence for a shared large-scale duplication event. Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown. Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC single copy genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. An analysis of sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases, yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Whereas single gene phylogenies that rely on EST sequences have limited bootstrap support as the result of limited sequence information, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics, and contain a greater proportion of phylogenetically-informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.

Conclusions

Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.

20.