Similar Literature
2.
Mark Gerstein. Proteins, 1998, 33(4): 518–534
Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori, and E. coli are compared in terms of patterns of fold usage, that is, whether a given fold occurs in a particular organism. Of the ∼340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared among all eight. The shared folds are depleted in all-helical structure and enriched in mixed helix-sheet structure compared to the folds in the PDB. The 10 most common of the 30 shared folds are enriched in superfolds, which unite many non-homologous sequence families, and are especially similar in overall architecture: eight have helices packed onto a central sheet. They are also very different from the common folds in the PDB, highlighting databank biases. Folds can be ranked by expression level as well as by degree of duplication in the genome. In yeast, the 10 most highly expressed folds are considerably different from the most highly duplicated folds. A tree can be constructed grouping genomes by their shared folds. Its topology is remarkably similar to that of more conventional classifications based on very different measures of relatedness. Finally, folds of membrane proteins can be analyzed through transmembrane-helix (TM) prediction. All the genomes appear to have similar usage patterns for these folds, with the occurrence of a particular fold falling off rapidly with increasing numbers of TM elements, according to a “Zipf-like” law. This implies that there are no marked preferences for proteins with particular numbers of TM helices (e.g. 7-TM) in microbial genomes. Further information pertinent to this analysis is available at http://bioinfo.mbb.yale.edu/genome. Proteins 33:518–534, 1998. © 1998 Wiley-Liss, Inc.
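The fold-usage comparison above reduces each genome to the set of folds it contains. A minimal Python sketch of that bookkeeping follows, assuming fold assignments are already available as sets: it finds the folds shared by all genomes and computes a pairwise Jaccard distance that could feed a tree-building method such as UPGMA or neighbor joining. The genome keys and fold labels are hypothetical toy data, and Jaccard distance is an illustrative choice, not necessarily the relatedness measure used in the paper.

```python
from itertools import combinations

def shared_by_all(fold_sets):
    """Folds that occur in every genome (intersection of all fold sets)."""
    sets = list(fold_sets.values())
    return set.intersection(*sets) if sets else set()

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical fold repertoires."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def pairwise_distances(fold_sets):
    """Jaccard distance for every unordered pair of genomes."""
    return {
        (g1, g2): jaccard_distance(fold_sets[g1], fold_sets[g2])
        for g1, g2 in combinations(sorted(fold_sets), 2)
    }

# Hypothetical presence/absence data; fold labels are placeholders.
fold_sets = {
    "E_coli": {"TIM barrel", "Rossmann", "P-loop", "ferredoxin"},
    "H_influenzae": {"TIM barrel", "Rossmann", "P-loop"},
    "M_genitalium": {"TIM barrel", "P-loop"},
}
print(sorted(shared_by_all(fold_sets)))  # ['P-loop', 'TIM barrel']
for pair, d in pairwise_distances(fold_sets).items():
    print(pair, round(d, 2))
```

Smaller distances mean more similar fold repertoires; any standard clustering routine can turn the resulting matrix into a tree.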

3.
After the surprisingly low number of genes identified in the human genome, alternative splicing emerged as a major mechanism for generating protein diversity in higher eukaryotes. However, it is still not known whether its prevalence over the course of genome evolution has contributed to overall functional protein diversity or simply reflects splicing noise. The (βα)8 barrel, or TIM barrel, is one of the most frequent, versatile, and ancient folds found among enzymes. Here, we analyze the structural modifications that alternative splicing events introduce into TIM barrel proteins encoded by the human genome. We found that 87% of all splicing events involved deletions; most of these events resulted in protein fragments corresponding to the (βα)2, (βα)4, (βα)5, (βα)6, and (βα)7 subdomains of TIM barrels. Because approximately 7% of all splicing events involved internal β-strand substitutions, we used the genomic data to design β-strand and α-helix substitutions in a well-studied TIM barrel enzyme. The biochemical characterization of one of the chimeric variants suggests that some human splice variants with β-strand substitutions may be evolving novel functions through changes in either oligomeric state or substrate specificity. We present results showing how the splice variants represent subdomains that correlate with the independently folding and evolving structural units previously reported. This work is the first to observe a link between the structural features of the barrel and a recurrent genetic mechanism. Our results suggest that a sizeable fraction of the splice variants found in the human genome can reasonably be expected to represent structurally viable, functional proteins. Our data provide additional support for the hypothesis that the TIM barrel fold originated through the assembly of smaller subdomains.
We suggest a model of how nature explores new proteins, with alternative splicing acting as a mechanism to diversify the proteins encoded in the human genome.

6.
We provide statistically reliable sequence evidence indicating that at least 12 of 23 SCOP (βα)8 (TIM) barrel superfamilies share a common origin. This includes all but one of the known and predicted TIM barrels found in central metabolism. The statistical evidence is complemented by an examination of protein structure details, with certain structural locations favouring catalytic residues even though the nature of their molecular function may change. The combined analysis of sequence, structure and function also enables us to propose a phylogeny of TIM barrels. Based on these data, we examine competing theories of pathway and enzyme evolution by mapping known TIM barrel folds to the pathways of central metabolism. The results favour widespread recruitment of enzymes between pathways, rather than a "backwards evolution" model, and support the idea that modern proteins may have arisen from common ancestors that bound key metabolites.

7.
The availability of the complete genome sequence of H. pylori 26695 has provided a wealth of information, enabling us to carry out in silico studies to identify new molecular targets for pharmaceutical treatment. To interpret the structural and functional information of the complete proteome, computational methods are especially useful, since they are reliable and offer an alternative to time-consuming and expensive experimental methods. Of the 1590 predicted protein-coding genes in H. pylori, experimentally determined structures are available in the PDB for only 145 proteins. In the absence of experimental structures, computational studies of three-dimensional (3D) structural organization can help to decipher protein fold, structure and active site. Each protein was functionally annotated based on its structural fold and on binding-site-based ligand association. Most proteins in this proteome are uncharacterized, and our annotation pipeline was able to annotate most of them. We could assign structural folds to 464 uncharacterized proteins from an initial list of 557 sequences. Of the 1195 known structural folds in the SCOP database, 411 (34% of all known folds) are observed in the whole H. pylori 26695 proteome, with a preponderance of domains belonging to the α/β class (36.63%). Top folds include P-loop containing nucleoside triphosphate hydrolases (22.6%), TIM barrel (16.7%), transmembrane helix hairpin (16.05%), alpha-alpha superhelix (11.1%) and S-adenosyl-L-methionine-dependent methyltransferases (10.7%).

9.
Advances in technology have made it possible to solve the structures of many proteins, including M. tuberculosis (MTB) proteins. Identifying similarity between protein structures can not only yield valuable clues to their function but can also be used for motif finding, protein docking and off-target identification. The current study analyzed the structures of all MTB gene products for which structures are available. The majority of the MTB proteins belonged to the α/β class. Twenty-three different protein folds are used in the MTB protein structures. Of these, the TIM barrel fold was found to be highly conserved even at very low sequence identity. We identified 21 paralogs and 27 analogs in MTB based on domains and EC classification. Our analysis revealed that many current drug targets share structural similarity with other proteins in the MTB genome, which could therefore be off-targets. The results of this analysis have been made available in the Mycobacterium tuberculosis Structural Database (http://bmi.icmr.org.in/mtbsd/MtbSD.php/search.php), a useful resource for current and novel MTB drug targets.

10.
A new method to analyze the similarity between multiply aligned protein motifs (blocks) was developed. It identifies sets of consistently aligned blocks, which are found to be protein regions of similar function and structure that appear in different contexts. For example, the Rossmann fold ligand-binding region is found to be similar to TIM barrel and methylase regions, various protein families are predicted to have a TIM barrel fold, and the structural relation between the ClpP protease and crotonase folds is identified from their sequences. Besides identifying local structural features, sequence similarity across short regions (fewer than 20 amino acids) also predicts structural similarity of whole domains (folds) a few hundred residues long. Most of these relations could not be identified by other advanced sequence-to-sequence or sequence-to-multiple-alignment comparisons. We describe the method (termed CYRCA), present examples of our findings, and discuss their implications.
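The abstract does not spell out how CYRCA decides that blocks are "consistently aligned". One plausible reading, used here purely as an assumption, is cycle consistency: if block A aligns to B, B to C, and C back to A, composing the three column mappings should return each column of A to itself. The Python sketch below scores that property; the function names and the dict-based column maps are hypothetical, and this is not the published CYRCA algorithm.

```python
def compose(first, second):
    """Compose two column maps: i -> second[first[i]], wherever both are defined."""
    return {i: second[j] for i, j in first.items() if j in second}

def cycle_consistency(maps):
    """Fraction of columns in the first block that map back to themselves.

    `maps` is a list of dicts [A->B, B->C, ..., Z->A]; each dict maps a column
    index in one block to the aligned column index in the next block.
    """
    if not maps or not maps[0]:
        return 0.0
    path = maps[0]
    for m in maps[1:]:
        path = compose(path, m)
    returned = sum(1 for start, end in path.items() if start == end)
    return returned / len(maps[0])

# Hypothetical three-block cycle A -> B -> C -> A.
a_to_b = {0: 2, 1: 3, 2: 4}
b_to_c = {2: 0, 3: 1, 4: 2}
c_to_a_consistent = {0: 0, 1: 1, 2: 2}
c_to_a_shifted = {0: 1, 1: 2, 2: 0}

print(cycle_consistency([a_to_b, b_to_c, c_to_a_consistent]))  # 1.0
print(cycle_consistency([a_to_b, b_to_c, c_to_a_shifted]))     # 0.0
```

Columns that fall out of an alignment simply drop from the composed path, so gaps lower the score rather than raising an error.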

11.
Comparative analysis of numerous protein structures that have become available in the past few years, combined with genome comparison, has yielded new insights into the evolution of enzymes and their functions. In addition to the well-known diversification of substrate specificities, enzymes with several widespread catalytic folds, particularly the TIM barrel, the RRM-like domain and the double-stranded beta-helix (cupin) domain, have been extensively explored in 'reaction space', resulting in the evolution of numerous, diverse catalytic activities supported by the same structural scaffold. Common protein folds differ widely in the diversity of catalyzed reactions. The biochemical plasticity of a fold seems to hinge on the presence of a generic, symmetrical substrate-binding pocket as opposed to highly specialized binding sites.

12.
The function of the mammalian TIMELESS protein (TIM) has been enigmatic. TIM is essential for early embryonic development, but little is known regarding its biochemical and cellular function. Although identified based on similarity to a Drosophila circadian clock factor, it also shares similarity with a second family of proteins that is more widely conserved throughout eukaryotes. Members of this second protein family in yeast (S.c. Tof1p, S.p. Swi1p) have been implicated in DNA synthesis, S-phase-dependent checkpoint activation and chromosome cohesion, three processes coordinated at the level of the replication fork complex. The present work demonstrates that mammalian TIM and its constitutive binding partner, Tipin (ortholog of S.c. Csm3p, S.p. Swi3p), are replisome-associated proteins. Both proteins associate with components of the endogenous replication fork complex, and are present at BrdU-positive DNA replication sites. Knock-down of TIM also compromises DNA replication efficiency. Further, the direct binding of the TIM-Tipin complex to the 34 kDa subunit of replication protein A provides a biochemical explanation for the potential coupling role of these proteins. Like TIM, Tipin is also involved in the molecular mechanism of UV-dependent checkpoint activation and cell growth arrest. Tipin additionally associates with peroxiredoxin2 and appears to be involved in checkpoint responses to H(2)O(2), a role recently described for yeast versions of TIM and Tipin. Together, this work establishes TIM and Tipin as functional orthologs of their replisome-associated yeast counterparts capable of coordinating replication with genotoxic stress responses, and distinguishes mammalian TIM from the circadian-specific paralogs from which it was originally identified.

13.
We characterized and evaluated the functional attributes of three high-confidence yeast protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the experimental technologies used to recover the interactions. This affected the total coverage of interactions and was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method were the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented in distinct and different functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weak and transient interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in the connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodological biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to explain why biological insights obtained by analyzing data sets from different sources sometimes disagree or even contradict each other.
An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge.
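The over- and under-representation of functional categories described above can be quantified, in its simplest form, as the log2 ratio between a category's frequency in an interaction data set and its frequency in the background proteome. The sketch below assumes a one-category-per-protein annotation dict and toy protein identifiers; it is not the statistic used in the study, and a real analysis would add a significance test (e.g. hypergeometric) and multiple-testing correction.

```python
import math
from collections import Counter

def category_enrichment(dataset, background, annotation):
    """log2(fraction in dataset / fraction in background) for each functional category.

    Positive values mean over-representation, negative values under-representation.
    Proteins without an annotation are ignored; categories absent from the
    background are skipped because the ratio would be undefined.
    """
    ds = Counter(annotation[p] for p in dataset if p in annotation)
    bg = Counter(annotation[p] for p in background if p in annotation)
    n, total = sum(ds.values()), sum(bg.values())
    return {
        cat: math.log2((k / n) / (bg[cat] / total))
        for cat, k in ds.items()
        if bg[cat] > 0
    }

# Hypothetical annotation: four transport and four translation proteins.
annotation = {
    "p1": "transport", "p2": "transport", "p6": "transport", "p8": "transport",
    "p3": "translation", "p4": "translation", "p5": "translation", "p7": "translation",
}
background = list(annotation)
dataset = ["p1", "p3", "p4", "p5"]  # e.g. proteins recovered by one detection method
print(category_enrichment(dataset, background, annotation))
```

Here translation is over-represented (log2 of 1.5) and transport under-represented (log2 of 0.5, i.e. -1) relative to the toy background.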

15.
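Female and male fig wasps differ so greatly in phenotype that it is difficult to match males and females to each other. Using the fig wasp that pollinates Ficus hispida as material, we analyzed the fig wasp genome and transcriptome with the Weighted Gene Co-expression Network Analysis (WGCNA) software. We identified five gene co-expression modules, labeled blue, turquoise, brown, green, and yellow; two modules were preferentially expressed in females and three in pupae. Gene Ontology (GO) analysis identified three functionally enriched gene sets in the turquoise and yellow modules. In the turquoise module, two gene sets were associated with the cell cycle and with nucleotide-binding activity, respectively; in the yellow module, one gene set was associated with cell differentiation, particularly neural development.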
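WGCNA builds its network from a soft-thresholded correlation matrix: in the unsigned form, the adjacency between genes i and j is |cor(x_i, x_j)|^β, which suppresses weak correlations while keeping the network weighted. Modules such as the colour-labelled ones above are then typically found by hierarchical clustering on a topological-overlap measure, a step this sketch omits. The pure-Python code below covers only the adjacency step; the gene names, expression values, and β = 6 are illustrative assumptions, since β is normally chosen per data set.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)

def unsigned_adjacency(expr, beta=6):
    """Soft-thresholded adjacency |cor|**beta for every unordered gene pair."""
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in combinations(sorted(expr), 2)
    }

# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],  # perfectly correlated with geneA
    "geneC": [4, 3, 2, 1],  # perfectly anti-correlated with geneA
    "geneD": [1, 3, 2, 4],  # moderately correlated with geneA (r = 0.8)
}
adj = unsigned_adjacency(expr)
```

Because the unsigned form takes the absolute value, geneA is as strongly connected to its mirror image geneC as to geneB, while the r = 0.8 link to geneD shrinks to about 0.26 after raising to the sixth power.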

16.
For most proteins in the genome databases, function is predicted via sequence comparison. Despite the popularity of this approach, the extent to which it can be reliably applied is unknown. We address this issue by systematically investigating the relationship between protein function and structure. We focus initially on enzymes functionally classified by the Enzyme Commission (EC) and relate these to structurally classified domains in the SCOP database. We find that the major SCOP fold classes have different propensities to carry out certain broad categories of functions. For instance, alpha/beta folds are disproportionately associated with enzymes, especially transferases and hydrolases, and all-alpha and small folds with non-enzymes, while alpha+beta folds have an equal tendency either way. These observations for the database overall largely hold for specific genomes. We focus, in particular, on yeast, analyzing it with many classifications in addition to SCOP and EC (i.e. COGs, CATH, MIPS), and find clear tendencies for fold-function association across a broad spectrum of functions. Analysis with the COGs scheme also suggests that the functions of the most ancient proteins are more evenly distributed among different structural classes than those of more modern ones. For the database overall, we identify the most versatile functions, i.e. those associated with the most folds, and the most versatile folds, i.e. those associated with the most functions. The two most versatile enzymatic functions (hydro-lyases and O-glycosyl glucosidases) are associated with seven folds each. The five most versatile folds (TIM-barrel, Rossmann, ferredoxin, alpha-beta hydrolase, and P-loop NTP hydrolase) are all mixed alpha-beta structures. They stand out as generic scaffolds, accommodating from six to as many as 16 functions (for the exceptional TIM-barrel).
At the conclusion of our analysis, we are able to construct a graph giving the chance that a functional annotation can be reliably transferred at different degrees of sequence and structural similarity. Supplemental information is available from http://bioinfo.mbb.yale.edu/genome/foldfunc++ +.
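The closing graph relates similarity to the chance that a functional annotation transfers correctly. A minimal way to tabulate such a curve is to bin protein pairs by percent sequence identity and record, per bin, the fraction of pairs that share a function. The Python sketch below assumes pairs are given as (percent identity, same-function flag) tuples; how "same function" is defined (for instance, identical EC numbers) and the 10% bin width are assumptions, not details taken from the paper.

```python
from collections import defaultdict

def transfer_reliability(pairs, bin_width=10):
    """Fraction of protein pairs that share a function, per sequence-identity bin.

    `pairs` is an iterable of (percent_identity, same_function) tuples.
    Returns {bin_start: fraction}; each bin covers [bin_start, bin_start + bin_width),
    except that 100% identity is folded into the top bin.
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for identity, same_function in pairs:
        start = int(identity // bin_width) * bin_width
        if start >= 100:  # put exactly-100% pairs in the top bin
            start -= bin_width
        totals[start] += 1
        shared[start] += int(same_function)
    return {start: shared[start] / totals[start] for start in sorted(totals)}

# Hypothetical pairs: (percent identity, do the two proteins share a function?)
pairs = [(15, False), (18, True), (35, False), (42, True), (45, True), (95, True), (100, True)]
curve = transfer_reliability(pairs)
```

Plotting the returned fractions against the bin starts gives the kind of reliability-versus-similarity curve described above; the same binning could be applied to a structural similarity score instead of sequence identity.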

17.
Automated image analysis of protein localization in budding yeast
MOTIVATION: The yeast Saccharomyces cerevisiae was the first eukaryotic organism to have its genome completely sequenced. Since then, several large-scale analyses of the yeast genome have provided extensive functional annotations of individual genes and proteins. One fundamental property of a protein is its subcellular localization, which provides critical information about how the protein works in a cell. An important project was therefore the creation of the yeast GFP fusion localization database by the University of California, San Francisco, USA (UCSF). This database provides localization data for 75% of the proteins believed to be encoded by the yeast genome. These proteins were classified into 22 distinct subcellular location categories by visual examination. Based on our past success at building automated systems to classify subcellular location patterns in mammalian cells, we sought to create a similar system for yeast. RESULTS: We developed computational methods to automatically analyze the images created by the UCSF yeast GFP fusion localization project. The system was trained to recognize the same location categories that were used in that study. We applied the system to 2640 images, and it assigned the same label as the previous assignments to 2139 images (81%). When only the highest-confidence assignments were considered, 94.7% agreement was observed. Visual examination of the proteins for which the two approaches disagree suggests that at least some of the automated assignments may be more accurate. The automated method provides an objective, quantitative and repeatable assignment of protein locations that can be applied to new collections of yeast images (e.g. for different strains or the same strain under different conditions). Notably, this performance was achieved without requiring colocalization with any marker proteins.
AVAILABILITY: The original images analyzed in this article are available at http://yeastgfp.ucsf.edu, and source code and results are available at http://murphylab.web.cmu.edu/software.
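The headline figures are simple agreement rates: 2139 of 2640 images is 81%, and restricting to high-confidence calls raises agreement to 94.7%. The sketch below shows that computation in Python, assuming each image is represented as (automated label, manual label, confidence score); the confidence score and the threshold are placeholders, because the abstract does not say how the system's confidence was measured.

```python
import math

def agreement(calls, min_confidence=None):
    """Fraction of images whose automated label matches the manual label.

    `calls` is an iterable of (auto_label, manual_label, confidence) tuples.
    If `min_confidence` is given, only calls at or above it are counted.
    Returns NaN when no calls are selected.
    """
    selected = [
        (auto, manual)
        for auto, manual, conf in calls
        if min_confidence is None or conf >= min_confidence
    ]
    if not selected:
        return math.nan
    return sum(auto == manual for auto, manual in selected) / len(selected)

# Overall agreement reported in the abstract.
print(f"{2139 / 2640:.1%}")  # 81.0%

# Hypothetical per-image calls with made-up confidence scores.
calls = [
    ("nucleus", "nucleus", 0.9),
    ("mitochondrion", "cytoplasm", 0.4),
    ("ER", "ER", 0.8),
    ("vacuole", "vacuole", 0.3),
]
print(agreement(calls))                      # 0.75
print(agreement(calls, min_confidence=0.5))  # 1.0
```

Raising the threshold trades coverage for agreement, which is the pattern the 81% versus 94.7% figures describe.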

18.
Seventeen loci encode proteins of the preprotein and amino acid transporter family in Arabidopsis (Arabidopsis thaliana). Some of these genes have arisen from recent duplications and are not in annotated duplicated regions of the Arabidopsis genome. Compared with a number of other eukaryotic organisms, this protein family has greatly expanded in plants, with 24 loci in rice (Oryza sativa). Most of the Arabidopsis and rice genes are orthologous, indicating expansion of this family before monocot and dicot divergence. In vitro protein uptake assays, in vivo green fluorescent protein tagging, and immunological analyses of selected proteins determined mitochondrial or plastidic localization for 10 and six proteins, respectively. The protein encoded by At5g24650 is targeted to both mitochondria and chloroplasts and, to our knowledge, is the first membrane protein reported to be targeted to both organelles. In Arabidopsis, three genes encode translocase of the inner mitochondrial membrane (TIM)17-like proteins, three encode TIM23-like proteins, and three encode outer envelope protein16-like proteins. Based on phylogenetic analysis, subcellular localization, complementation of a yeast (Saccharomyces cerevisiae) mutant, and coexpression analysis, the Arabidopsis TIM22-like protein is most likely encoded by At3g10110/At1g18320. The lack of a preprotein and amino acid transporter domain in some proteins, localization in mitochondria, plastids, or both, variation in gene structure, and differences in expression profiles indicate that the function of this family has diverged in plants beyond roles in protein translocation.
