Similar Articles
20 similar articles found.
1.
The genome projects have unearthed an enormous diversity of genes of unknown function that still await biological and biochemical characterization. These genes, like most others, can be grouped into families based on sequence similarity. The Pfam database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results imply that, despite the enormous increase in the number and diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be shaped primarily by extreme diversification of known protein families, which in turn enables organisms to evolve new functions and adapt to particular niches and habitats. Nevertheless, these DUF families remain the richest source for discovering the remaining protein folds and topologies.

2.
Classical studies on protist diversity in freshwater environments worldwide have led to the idea that most species of microbial eukaryotes are already known. One exemplary case is that of the ciliates, which have been claimed to comprise a few thousand ubiquitous species, most of them already described. Recently, molecular methods have revealed an unsuspected protist diversity, especially in oceanic and some extreme environments, suggesting the existence of a hidden diversity of eukaryotic lineages. To test whether this also holds for freshwater environments, we carried out a molecular survey of small subunit ribosomal RNA genes in water and sediment samples from two ponds, one oxic and the other suboxic, in the same geographic area. Our results show that protist diversity is very high. The majority of phylotypes affiliated with a few well-established eukaryotic kingdoms or phyla, including alveolates, cryptophytes, heterokonts, Cercozoa, Centroheliozoa and haptophytes, although a few sequences did not display a clear taxonomic affiliation. Sequence diversity within groups was very large, particularly among ciliates, and a number of sequences were very divergent from known species, which could define new intra-phylum groups. This suggests that, contrary to current ideas, the diversity of freshwater protists is far from completely described.

3.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting only of GOS sequences are identified, of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, reflecting the poor sampling of viral diversity to date. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains previously categorized as kingdom-specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that previously lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
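A linear discovery rate like the one reported here is the kind of trend usually read off a family accumulation (collector's) curve: sequences are added in random order and the number of distinct families seen so far is recorded. The sketch below is a generic illustration, assuming every sequence has already been assigned a cluster label; the labels, the step size and the function name `accumulation_curve` are hypothetical and not taken from the GOS analysis.

```python
import random


def accumulation_curve(cluster_labels, step, seed=0):
    """Count distinct families as sequences are added in a random order.

    Returns (sequences_added, distinct_families) points, sampled every
    `step` sequences and always at the final sequence.
    """
    labels = list(cluster_labels)
    random.Random(seed).shuffle(labels)  # random order, reproducible via seed
    seen = set()
    points = []
    for i, label in enumerate(labels, start=1):
        seen.add(label)
        if i % step == 0 or i == len(labels):
            points.append((i, len(seen)))
    return points


# Hypothetical cluster assignments for eight sequences (not GOS data).
labels = ["famA", "famA", "famB", "famC", "famC", "famC", "famD", "famE"]
for n_seqs, n_fams in accumulation_curve(labels, step=2):
    print(n_seqs, n_fams)
```

A curve that keeps climbing roughly linearly, rather than flattening out, indicates that sampling is still far from saturating the set of families; averaging over several random orders (different `seed` values) smooths out order effects.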

6.
7.
Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale-free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
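TribeMCL is named after the Markov Cluster (MCL) algorithm, which alternates "expansion" (squaring a column-stochastic matrix, i.e. spreading random-walk flow) with "inflation" (raising entries to a power and re-normalising) on a similarity graph until the flow settles into clusters. The abstract does not describe the procedure, so what follows is only a minimal, generic MCL sketch on a toy adjacency matrix; the inflation value, iteration cap, self-loop convention and cluster read-out are assumptions, not TRIBES settings.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic Markov Cluster (MCL) sketch on a small dense similarity matrix."""
    n = len(adjacency)
    # Self-loops are a common MCL convention; they damp odd/even oscillations.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two steps of random-walk flow
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation
        new_m = normalize_columns(inflated)
        change = max(abs(new_m[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new_m
        if change < tol:
            break
    # Read-out: rows with a non-zero diagonal are attractors; the non-zero
    # entries of each attractor row form one cluster (duplicates merged).
    clusters = []
    for i in range(n):
        if m[i][i] > 1e-6:
            members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
            if members not in clusters:
                clusters.append(members)
    return clusters


# Toy graph (hypothetical): two separate triangles of mutually similar proteins.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))
```

Higher `inflation` values generally yield finer, more numerous clusters. Dense pure-Python matrices are only practical for toy graphs; large-scale tools work on sparse matrices and prune small entries.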

8.
In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach that exploits metagenomic and amplicon data sets from public databases to elucidate the phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae, whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge of chlamydial diversity is still limited. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22 000 high-quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when the most conservative approach was applied, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species-, genus- and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represent the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir.
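Clustering 16S rRNA sequences into OTUs at progressively lower identity cut-offs is what turns a sequence collection into species-, genus- and family-level counts. The abstract does not say which algorithm or thresholds were used (cut-offs around 97% identity are a common choice for species-level OTUs), so the following is only a generic greedy, centroid-based sketch. It assumes the sequences are already aligned to equal length, and the toy sequences and thresholds are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(records, threshold):
    """Greedy centroid clustering: each sequence joins the first existing
    centroid with identity >= threshold, otherwise it founds a new OTU."""
    centroids = []  # (name, sequence) of each OTU founder, in input order
    otus = {}       # founder name -> member names
    for name, seq in records:
        for c_name, c_seq in centroids:
            if identity(seq, c_seq) >= threshold:
                otus[c_name].append(name)
                break
        else:  # no centroid was close enough
            centroids.append((name, seq))
            otus[name] = [name]
    return otus


# Toy, pre-aligned sequences (hypothetical, not chlamydial data).
records = [("s1", "ACGTACGTAC"), ("s2", "ACGTACGTAA"), ("s3", "TTTTACGTAC")]
for cutoff in (0.95, 0.85, 0.65):
    print(cutoff, len(greedy_otus(records, cutoff)))
```

Lower cut-offs merge more sequences, so OTU counts fall as one moves from species-like to family-like thresholds. Greedy clustering also depends on input order, which is why tools often sort sequences (for example by abundance or length) first.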

9.
Biological systems evolved to be functionally robust in uncertain environments, but also highly adaptable. Such robustness is partly achieved by genetic redundancy, where the failure of a specific component through mutation or environmental challenge can be compensated by duplicate components capable of performing, to a limited extent, the same function. Highly variable environments require very robust systems. Conversely, predictable environments should not place a high selective value on robustness. Here we test this hypothesis by investigating the evolutionary dynamics of genetic redundancy in extremely reduced genomes, found mostly in intracellular parasites and endosymbionts. By combining data analysis with simulations of genome evolution we show that in the extensive gene loss suffered by reduced genomes there is a selective drive to keep the diversity of protein families while sacrificing paralogy. We show that this is not a by-product of the known drivers of genome reduction and that there is very limited convergence to a common core of families, indicating that the repertoire of protein families in reduced genomes is the result of historical contingency and niche-specific adaptations. We propose that our observations reflect a loss of genetic redundancy due to a decreased selection for robustness in a predictable environment.
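One way to ask whether reduced genomes preferentially keep distinct families is to compare them with a null model in which genes are lost uniformly at random. The sketch below is a toy version of such a null model, not the authors' simulation: the genome composition is hypothetical, and it computes the expected number of surviving families both exactly (a hypergeometric "at least one gene survives" term per family) and by Monte Carlo sampling.

```python
import math
import random
from collections import Counter


def expected_families_exact(genes, n_keep):
    """Expected number of families with >= 1 surviving gene when n_keep genes
    are retained uniformly at random (sum of per-family survival probabilities)."""
    total = len(genes)
    sizes = Counter(genes).values()
    # P(family of size s loses every gene) = C(total - s, n_keep) / C(total, n_keep)
    return sum(1 - math.comb(total - s, n_keep) / math.comb(total, n_keep) for s in sizes)


def simulated_families(genes, n_keep, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity under random gene loss."""
    rng = random.Random(seed)
    return sum(len(set(rng.sample(genes, n_keep))) for _ in range(trials)) / trials


# Hypothetical genome: 10 single-copy families plus one family of 10 paralogs.
genes = [f"single{i}" for i in range(10)] + ["paralog_family"] * 10
print(round(expected_families_exact(genes, 10), 3))
print(round(simulated_families(genes, 10), 2))
```

For this toy genome, random loss down to 10 genes leaves about 6 of the 11 families, whereas discarding paralogs first could keep 10; an observed family count well above the null expectation is the kind of signal that points to selection for family diversity.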

10.
Yeast chromosome III: new gene functions. (Cited 19 times; self-citations: 1; citations by others: 18)
E V Koonin, P Bork, C Sander. The EMBO Journal, 1994, 13(3): 493-503

11.
There are many more phyla of microbes than of macro-organisms, but microbial biodiversity is poorly understood because most microbes are uncultured. Phylogenetic analysis of rDNA sequences cloned after PCR amplification of DNA extracted directly from environmental samples is a powerful way of exploring our degree of ignorance of major groups. As there are only five eukaryotic kingdoms, two claims made using such methods for numerous novel 'kingdom-level' lineages among anaerobic eukaryotes would be remarkable, if true. By reanalysing those data with 167 known species (not merely 8-37), I identified relatives for all 8-10 'mysterious' lineages. All probably belong to one of five already recognized phyla (Amoebozoa, Cercozoa, Apusozoa, Myzozoa, Loukozoa) within the basal kingdom Protozoa, mostly in known classes, sometimes even in known orders, families or genera. This strengthens the idea that the ancestral eukaryote was a mitochondrial aerobe. Analogous claims of novel bacterial divisions or kingdoms may reflect the weak resolution and grossly non-clock-like evolution of rRNA, not genuine phylum-level biological disparity. Critical interpretation of environmental DNA sequences suggests that our overall picture of microbial biodiversity at the phylum or division level is already rather good and comprehensive, and that there are no uncharacterized kingdoms of life. However, immense lower-level diversity remains to be mapped, as does the root of the tree of life.

12.
The microbes that inhabit particular environments must be able to perform molecular functions that give them a competitive advantage in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect organisms that are successful in a given environmental niche to contain protein families specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These include not only many previously described families but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide new information about an environment critical to our health and well-being.
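The abstract only says that an automated procedure was used. A common generic way to flag a family as overrepresented in one environment is a one-sided hypergeometric (Fisher-type) test: given how often the family occurs in a whole reference collection, how surprising is its count in the environment-specific subset? The sketch below implements that test with exact integer arithmetic; the counts are hypothetical and this is not claimed to be the authors' procedure.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the whole reference collection
    K: genes in that collection belonging to the family
    n: genes in the environment-specific (e.g. gut) subset
    k: family genes observed in that subset
    """
    upper = min(n, K)
    # Sum exact integer counts first, then divide once, to avoid rounding drift.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 20 of a family's 60 members fall in a 1,000-gene gut
# subset drawn from a 20,000-gene reference collection (about 3 expected).
p = overrepresentation_p(k=20, n=1_000, K=60, N=20_000)
print(f"{p:.3g}")
```

When thousands of families are tested at once, the p-values need a multiple-testing correction (for example Benjamini-Hochberg) before a family is called overrepresented.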

13.
A unifold, mesofold, and superfold model of protein fold use. (Cited 4 times; self-citations: 0; citations by others: 4)
As more and more protein structures are determined, there is increasing interest in the question of how many different folds have been used in biology. The history of the rate of discovery of new folds and the distribution of sequence families among known folds provide a means of estimating the underlying distribution of fold use. Previous models exploiting these data have led to rather different conclusions on the total number of folds. We present a new model, based on the notion that the folds used in biology fall naturally into three classes: unifolds, that is, folds found only in a single narrow sequence family; mesofolds, found in an intermediate number of families; and the previously noted superfolds, found in many protein families. We show that this model fits the available data well and has predicted the development of SCOP over the past 2 years. The principal implications of the model are as follows: (1) the vast majority of folds will be found in only a single sequence family; (2) the total number of folds is at least 10,000; and (3) 80% of sequence families have one of about 400 folds, most of which are already known.
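The three-way classification and the "80% of families in about 400 folds" statement both reduce to simple operations on a fold-usage table (number of sequence families per fold). The sketch below shows those operations on a toy table; the `superfold_min` cut-off and all counts are hypothetical, since the abstract gives no numerical boundary between mesofolds and superfolds.

```python
def classify_folds(families_per_fold, superfold_min=10):
    """Label folds by how many sequence families use them.

    superfold_min is a hypothetical cut-off chosen for illustration only.
    """
    labels = {}
    for fold, n_families in families_per_fold.items():
        if n_families == 1:
            labels[fold] = "unifold"
        elif n_families >= superfold_min:
            labels[fold] = "superfold"
        else:
            labels[fold] = "mesofold"
    return labels


def folds_needed_for_coverage(families_per_fold, fraction):
    """Smallest number of folds (most-used first) covering `fraction` of all families."""
    counts = sorted(families_per_fold.values(), reverse=True)
    target = fraction * sum(counts)
    covered = 0
    for used, count in enumerate(counts, start=1):
        covered += count
        if covered >= target:
            return used
    return len(counts)


# Toy fold-usage table (hypothetical, not SCOP data).
usage = {"foldA": 50, "foldB": 30, "foldC": 5, "foldD": 1, "foldE": 1, "foldF": 1}
print(classify_folds(usage))
print(folds_needed_for_coverage(usage, 0.8))
```

In this toy table half of the folds are unifolds, yet two superfolds already cover 80% of the families, which is the same qualitative pattern the model describes.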

14.
Snake venoms contain a great diversity of pharmacologically active compounds that may be applied as research and biotechnological tools, as well as in drug development and in diagnostic tests for certain diseases. The most abundant toxins have been extensively studied in recent decades, and some of them have already been used for different purposes. Nevertheless, most of the minor snake venom protein classes remain poorly explored, even though they have potential applications in diverse areas. The main difficulty in studying these proteins lies in the impossibility of obtaining them in amounts sufficient for a comprehensive investigation. The advent of more sensitive techniques in the last few years has allowed the discovery of new venom components and the in-depth study of some already known minor proteins. This review summarizes information on structural and functional aspects of low-abundance snake venom protein classes, such as growth factors, hyaluronidases, cysteine-rich secretory proteins, nucleases and nucleotidases, cobra venom factors, vespryns, protease inhibitors and antimicrobial peptides, among others. Some potential applications of these molecules are discussed in order to encourage researchers to explore the full venom repertoire and to discover new molecules or new applications for already known venom components.

15.
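Research progress on thermoacidophilic methanotrophic bacteria in extreme environments (Cited 5 times; self-citations: 0; citations by others: 5)
Zheng Yong, Zheng Yuanming, Zhang Limei, He Jizheng. Acta Ecologica Sinica, 2009, 29(7): 3864-3871
Methanotrophic bacteria can convert the greenhouse gas methane (CH4) into CO2 or biomass, and they play an important role in the biogeochemical carbon cycle and in mitigating global climate change caused by greenhouse gases. Methanotrophs survive under a fairly wide range of conditions but grow best at near-neutral pH (5-8) and moderate temperatures (20-35 ℃). Phylogenetic analyses have placed all of them within the γ- or α-Proteobacteria. Recently, three independent studies isolated methane-oxidizing (methanotrophic) microorganisms from extremely hot and acidic environments (pH close to 1, temperature above 50 ℃), all of which were identified as members of the phylum Verrucomicrobia. These novel findings, which differ from all previous results, not only extend current understanding of methanotroph ecology but also suggest that novel microbially mediated CH4 oxidation pathways and mechanisms may exist. This article therefore reviews recent advances in research on thermoacidophilic methanotrophs in extreme environments.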

16.
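Phylogenetic diversity of culturable halophilic actinomycetes from the Tarim Basin, Xinjiang (Cited 3 times; self-citations: 0; citations by others: 3)
Using pure-culture methods and phylogenetic analysis based on 16S rRNA gene sequences, we studied the diversity of 18 culturable halophilic actinomycete strains isolated from soil samples collected in hypersaline environments of the Tarim Basin. The results show that the 18 halophilic actinomycetes fall into three groups (Glycomycetaceae, Pseudonocardineae and Nocardiopsaceae), and that members of 4 of the 5 validly published genera of halophilic actinomycetes were isolated. Most strains belonged to the genera Actinopolyspora (38.9%), Nocardiopsis (27.8%) and Streptomonospora (22.2%), which are the dominant groups of halophilic actinomycetes in the hypersaline environments of the Tarim Basin. Among these isolates, strain YIM 92370 showed 92% similarity to its closest species and formed a distinct branch within the family Glycomycetaceae, so it very likely represents a new genus of that family. These results indicate that the hypersaline environments of the Tarim Basin harbour a relatively rich phylogenetic diversity of halophilic actinomycetes as well as untapped resources of novel actinomycetes.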

17.
18.
The eukaryotic protein kinase (ePK) domain mediates the majority of signaling and coordination of complex events in eukaryotes. By contrast, most bacterial signaling is thought to occur through structurally unrelated histidine kinases, though some ePK-like kinases (ELKs) and small molecule kinases are known in bacteria. Our analysis of the Global Ocean Sampling (GOS) dataset reveals that ELKs are as prevalent as histidine kinases and may play an equally important role in prokaryotic behavior. By combining GOS and public databases, we show that the ePK is just one subset of a diverse superfamily of enzymes built on a common protein kinase-like (PKL) fold. We explored this huge phylogenetic and functional space to cast light on the ancient evolution of this superfamily, its mechanistic core, and the structural basis for its observed diversity. We cataloged 27,677 ePKs and 18,699 ELKs, and classified them into 20 highly distinct families whose known members suggest regulatory functions. GOS data more than tripled the count of ELK sequences and enabled the discovery of novel families and classification and analysis of all ELKs. Comparison between and within families revealed ten key residues that are highly conserved across families. However, all but one of the ten residues has been eliminated in one family or another, indicating great functional plasticity. We show that loss of a catalytic lysine in two families is compensated by distinct mechanisms both involving other key motifs. This diverse superfamily serves as a model for further structural and functional analysis of enzyme evolution.
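Finding residues that are highly conserved across families amounts to scoring columns of a multiple sequence alignment. The generic sketch below reports columns whose most common non-gap residue reaches a chosen fraction of all sequences; the toy alignment and the `min_fraction` threshold are hypothetical, and the fourth sequence, which carries R where the others carry K, merely mimics how one divergent family can break an otherwise invariant position such as a catalytic lysine.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.9):
    """Return (column, residue, fraction) for alignment columns whose most
    common non-gap residue occurs in at least min_fraction of all sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    result = []
    for col in range(length):
        residues = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not residues:
            continue  # all-gap column
        residue, count = residues.most_common(1)[0]
        fraction = count / len(alignment)  # gaps count against conservation
        if fraction >= min_fraction:
            result.append((col, residue, fraction))
    return result


# Toy aligned fragments (hypothetical, not real kinase sequences).
alignment = ["VAIKHRD", "VAVKHRD", "LAIKHKD", "VAIR-RD"]
print(conserved_columns(alignment, min_fraction=1.0))
```

With `min_fraction=1.0` only fully invariant columns survive, so the K column drops out because of the single R; relaxing the threshold to 0.75 keeps it, which is why "highly conserved" always depends on the cut-off chosen.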

19.
This article presents a comprehensive review of the large and highly diverse superfamily of nucleotidyltransferase fold proteins, providing a global picture of their evolutionary history, sequence-structure diversity and functional roles. Using state-of-the-art homology detection methods combined with transitive searches and fold recognition, we revised the extent of this superfamily across numerous databases of catalogued protein families and structures, and identified 10 new families with the nucleotidyltransferase fold. These families include hundreds of previously uncharacterized and poorly annotated proteins, such as Fukutin/LICD, NFAT, FAM46, Mab-21 and NRAP. Some of these proteins seem to play important roles not observed before for this superfamily, such as regulation of gene expression or choline incorporation into the cell membrane. Importantly, within the newly detected families we identified 25 novel superfamily members in the human genome. Among these newly assigned members are proteins known to be involved in congenital muscular dystrophy, neurological diseases and retinitis pigmentosa, which sheds new light on the molecular background of these genetic disorders. Twelve of the new human nucleotidyltransferase fold proteins belong to the Mab-21 family, known to be involved in organogenesis and development. Determining the specific biological functions of these newly detected proteins remains a challenging task.
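Transitive searching extends a seed's set of homologues by using each new hit as the next query. The sketch below captures only that reachability logic over a precomputed hit table; the `hits` dictionary is hypothetical (in practice each step would be a fresh profile or HMM search), and the `max_depth` limit is an assumed, crude guard against drifting into unrelated families.

```python
from collections import deque


def transitive_hits(seed, hits, max_depth=3):
    """Breadth-first expansion: proteins reachable from `seed` through chains
    of significant hits, mapped to the number of search rounds needed."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not use hits found at the depth limit as new queries
        for neighbour in hits.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    return depth


# Hypothetical hit table: each query lists the proteins it detects significantly.
hits = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
print(transitive_hits("A", hits, max_depth=3))
```

Each extra round can reach more distant homologues but also raises the risk of false positives, so real pipelines usually confirm transitive hits independently, for example by fold recognition as described in this abstract.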

20.
Yin Y, Huang J, Gu X, Bar-Peled M, Xu Y. PLoS ONE, 2011, 6(11): e27995
Nucleotide-diphospho-sugars (NDP-sugars) are the building blocks of diverse polysaccharides and glycoconjugates in all organisms. In plants, 11 families of NDP-sugar interconversion enzymes (NSEs) have been identified, each of which interconverts one NDP-sugar to another. While the functions of these enzyme families have been characterized in various plants, very little is known about their evolution and origin. Our phylogenetic analyses indicate that all 11 plant NSE families are distantly related and that most of them originated from different progenitor genes, which had already diverged in ancient prokaryotes. For instance, all NSE families are found in mosses, which are lower land plants, and most of them are also found in aquatic algae, implying that these organisms had already evolved the capacity to synthesize all 11 different NDP-sugars. Particularly interesting is the evolution of RHM (UDP-L-rhamnose synthase), which reflects the fusion of genes encoding three enzymatic activities in early eukaryotes. The plant NRS/ER (nucleotide-rhamnose synthase/epimerase-reductase), by contrast, evolved much later from ancient plant RHMs through loss of the N-terminal domain. Based on these findings, an evolutionary model is proposed to explain the origin and evolution of the different NSE families. For instance, the UGlcAE (UDP-D-glucuronic acid 4-epimerase) family is suggested to have evolved from chlamydial bacteria. Our data also show considerably higher sequence diversity among NSE-like genes in modern prokaryotes, consistent with the higher sugar diversity found in prokaryotes. All NSE families are widely found in plants and algae with carbohydrate-rich cell walls, but only sporadically in animals, fungi and other eukaryotes, which either lack cell walls or have cell walls of distinct composition.
The results of this study were shown to be highly useful for identifying unknown genes for further experimental characterization of their functions in the synthesis of diverse glycosylated molecules.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417