Similar articles
Found 20 similar articles; search took 31 ms.
1.
Despite decades of research and the availability of the full genomic sequence of the baker’s yeast Saccharomyces cerevisiae, a large fraction of its genome is still not functionally annotated. This hinders our ability to fully understand cellular activity and suggests that many additional processes await discovery. Recent years have seen an explosion of high-quality genomic and structural data from multiple organisms, ranging from bacteria to mammals. New computational methods now allow us to integrate these data and extract meaningful insights into the functional identity of uncharacterized proteins in yeast. Here, we created a database of sensitive sequence similarity predictions for all yeast proteins. We use this information to identify candidate enzymes for known biochemical reactions whose enzymes are unidentified, and show how this provides a powerful basis for experimental validation. Using one pathway as a test case, we assign a new function to the previously uncharacterized enzyme Yhr202w: an extracellular AMP hydrolase in the NAD degradation pathway. Yhr202w, which we now term Smn1 for Scavenger MonoNucleotidase 1, is a highly conserved protein that is similar to the human protein E5NT/CD73, which is associated with multiple cancers. Hence, our new methodology provides a paradigm, which can be adapted to other organisms, for uncovering new enzymatic functions of uncharacterized proteins.

2.
The tricarboxylic acid (TCA) cycle is an energy-producing pathway for aerobic organisms. However, it is widely accepted that the phylogenetic origin of the TCA cycle is the reductive TCA cycle, which is a non-Calvin-type carbon-dioxide-fixing pathway. Most of the enzymes responsible for the oxidative and reductive TCA cycles are common to the two pathways, the difference being the direction in which the reactions operate. Because the reductive TCA cycle operates in an energetically unfavorable direction, some specific mechanisms are required for the reductive TCA-cycle-utilizing organisms. Recently, the molecular mechanisms for the “citrate cleavage reaction” and the “reductive carboxylating reaction from 2-oxoglutarate to isocitrate” in Hydrogenobacter thermophilus have been demonstrated. Both of these reactions comprise two distinct consecutive reactions, each catalyzed by two novel enzymes. Sequence analyses of the newly discovered enzymes revealed phylogenetic and functional relationships with other TCA-cycle-related enzymes. The occurrence of novel enzymes involved in the citrate-cleaving reaction seems to be limited to the family Aquificaceae. In contrast, the key enzyme in the reductive carboxylation of 2-oxoglutarate appears to be more widely distributed in extant organisms. The four newly discovered enzymes have a number of potential biotechnological applications.

3.
Han LY, Cai CZ, Ji ZL, Cao ZW, Cui J, Chen YZ. Nucleic Acids Research 2004, 32(21): 6437-6444
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is desirable to explore methods that are not based on sequence similarity. One approach is to assign a protein to a functional family, which provides a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), to predict protein functional family directly from sequence, irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment. However, its capability for assigning distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here, SVM is tested for functional family assignment of two groups of enzymes. One consists of 50 enzymes that have no homolog of known function from a PSI-BLAST search of protein databases. The other contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

4.
Discovering structural correlations in alpha-helices. Total citations: 5 (self-citations: 2; citations by others: 3)
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
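As an illustration of what testing two sequence positions for correlation can look like, here is a minimal sketch. The abstract only says "standard statistical tests"; the choice of the Pearson chi-square statistic, the function name `chi_square_statistic`, and the toy columns are assumptions made for this example, not the authors' program.

```python
from collections import Counter


def chi_square_statistic(column_i, column_j):
    """Pearson chi-square statistic for independence of two alignment columns.

    Each column holds the residue observed at one position across the same
    set of aligned sequences. Larger values indicate stronger departure from
    the independence assumption (assumed test; see lead-in).
    """
    if len(column_i) != len(column_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(column_i)
    pair_counts = Counter(zip(column_i, column_j))
    counts_i = Counter(column_i)
    counts_j = Counter(column_j)
    stat = 0.0
    for x in counts_i:
        for y in counts_j:
            # Expected count of the pair (x, y) if the positions were independent.
            expected = counts_i[x] * counts_j[y] / n
            observed = pair_counts.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy columns: position i and position j in four aligned helices.
correlated = chi_square_statistic("EEKK", "KKEE")   # E always pairs with K
independent = chi_square_statistic("EEKK", "EKEK")  # every pair equally common
```

In practice the statistic would be compared against a chi-square distribution with the appropriate degrees of freedom, and the abstract notes that validation techniques are also applied; both steps are omitted here.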

5.
6.
Parallel functional modules are separate sets of proteins in an organism that catalyze the same or similar biochemical reactions but act on different substrates or use different cofactors. They originate by gene duplication during evolution. Parallel functional modules provide versatility and complexity to organisms, and increase cellular flexibility and robustness. We have developed a four-step approach for genome-wide discovery of parallel modules from protein functional linkages. From ten genomes, we identified 37 cellular systems that consist of parallel functional modules. This approach recovers known parallel complexes and pathways, and discovers new ones that conventional homology-based methods did not previously reveal, as illustrated by examples of peptide transporters in Escherichia coli and nitrogenases in Rhodopseudomonas palustris. The approach untangles intertwined functional linkages between parallel functional modules and expands our ability to decode protein functions from genome sequences.

7.
8.
Advancements in sequencing technologies have led to an exponential rise in the number of newly found enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. Commonly, the function of such enzymes is determined by experiments that can be time consuming and costly. Hence, there is a need for a computational method that can distinguish protein enzyme sequences from those of non-enzymes and reliably predict the function of the former. To address this problem, approaches that cluster enzymes based on their sequence and structural similarity have been presented. However, these approaches are known to fail for proteins that perform the same function but are dissimilar in their sequence and structure. In this article, we present a supervised machine learning model to predict the function class and sub-class of enzymes based on a set of 73 sequence-derived features. The functional classes are as defined by the International Union of Biochemistry and Molecular Biology. Using an efficient data mining algorithm called random forest, we construct a top-down three-layer model in which the top layer classifies a query protein sequence as an enzyme or non-enzyme, the second layer predicts the main function class, and the bottom layer further predicts the sub-function class. The model reported overall classification accuracy of 94.87% for the first level, 87.7% for the second, and 84.25% for the bottom level. Our results compare very well with existing methods, and in many cases report better performance. Using feature selection methods, we have shown the biological relevance of a few of the top-ranked attributes.
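The control flow of a top-down three-layer model can be sketched as below. This is only an outline under stated assumptions: the function `predict_top_down` and its three classifier arguments are hypothetical names, and each argument stands in for a trained model (random forests in the abstract) that is not reproduced here.

```python
def predict_top_down(features, is_enzyme, predict_class, predict_subclass):
    """Run the three layers in order, stopping early for non-enzymes.

    `is_enzyme`, `predict_class`, and `predict_subclass` are placeholders for
    trained classifiers (hypothetical callables, supplied by the caller).
    """
    # Layer 1: enzyme versus non-enzyme.
    if not is_enzyme(features):
        return ("non-enzyme", None, None)
    # Layer 2: main function class.
    main_class = predict_class(features)
    # Layer 3: sub-class, conditioned on the main class chosen above.
    sub_class = predict_subclass(features, main_class)
    return ("enzyme", main_class, sub_class)
```

One consequence of this design is that errors propagate downward: a sequence misclassified at the first layer never reaches the later layers, which is consistent with the reported accuracy decreasing from the top level to the bottom level.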

9.
Kinetic aspects of reactions in homogeneous multienzyme systems under non-steady-state conditions were investigated. The formal kinetic relationships describing the time course of the system were analyzed for a bienzyme system. The pre-steady-state kinetics of processes in linear multienzyme systems was investigated. Relaxation kinetics methods were applied to the analysis of processes in linear sequences. Methods were developed for determining the number of stages in the transformation of the initial substrate and the number of enzymes, as well as for analyzing the sequence of intermediates in the reaction pathway. Methods for determining Vmax and Kmax for each individual enzyme are considered.

10.
We provide a comprehensive analysis of the current enzymes with alpha-amylase activity (AAMYs) that belong to family 13 glycoside hydrolase (GH-13; 144 Archaea, Bacteria, and Eukaryota sequences from 87 different species). This study aims to further knowledge of the evolutionary molecular relationships among the sequences of their A and B domains with special emphasis on the correlation between what is observed in the structures and protein evolution. Multialignments for the A domain distinguish two clusters for sequences from Archaea organisms, eight for sequences from Bacteria organisms, and three for sequences from Eukaryota organisms. The clusters for Bacteria do not follow any strict taxonomic pathway; in fact, they are rather scattered. When we compared the A domains of sequences belonging to different kingdoms, we found that various pairs of clusters were significantly similar. Using either sequence similarity with crystallized structures or secondary-structure prediction methods, we identified in all AAMYs the eight putative beta-strands that constitute the beta-sheet in the TIM barrel of the A domain and studied the packing in its interior. We also discovered a "hidden homology" in the TIM barrel, an invariant Gly located upstream in the sequence before the conserved Asp in beta-strand 3. This Gly precedes an alpha-helix and is actively involved in capping its N-terminal end with a capping box. In all cases, a Schellman motif caps the C-terminal end of this helix.

11.
The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately, this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology: “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation.

12.
13.
14.
Members of the transketolase group of thiamine-diphosphate-dependent enzymes from 17 different organisms including mammals, yeast, bacteria, and plants have been used for phylogenetic reconstruction. Alignment of the amino acid and DNA sequences for 21 transketolase enzymes and one putative transketolase reveals a number of highly conserved regions and invariant residues that are of predicted importance for enzyme activity, based on the crystal structure of yeast transketolase. One particular sequence of 36 residues has some similarities to the nucleotide-binding motif and we designate it as the transketolase motif. We report further evidence that the recP protein from Streptococcus pneumoniae might be a transketolase and we list a number of invariant residues which might be involved in substrate binding. Phylogenies derived from the nucleotide and the amino acid sequences by various methods show a conventional clustering for mammalian, plant, and gram-negative bacterial transketolases. The branching order of the gram-positive bacteria could not be inferred reliably. The formaldehyde transketolase (sometimes known as dihydroxyacetone synthase) of the yeast Hansenula polymorpha appears to be orthologous to the mammalian enzymes but paralogous to the other yeast transketolases. The occurrence of more than one transketolase gene in some organisms is consistent with several gene duplications. The high degree of similarity in functionally important residues and the fact that the same kinetic mechanism is applicable to all characterized transketolase enzymes are consistent with the proposition that they are all derived from one common ancestral gene. Transketolase appears to be an ancient enzyme that has evolved slowly and might serve as a model for a molecular clock, at least within the mammalian clade. Received: 13 September 1995 / Accepted: 14 November 1996

15.
16.
Enzymes catalyze multistep chemical reactions and achieve phenomenal rate accelerations by matching protein and substrate chemical groups in the transition state. Inhibitors that take advantage of these chemical interactions are among the most potent and effective drugs known. Recently, three new enzyme targets have been validated by FDA approval of new enzyme inhibitor drugs. These include mitogen-activated protein kinase, renin, and dipeptidyl peptidase IV. The drugs against these enzymes engage important enzyme functional groups, such as the active site serine in dipeptidyl peptidase IV. Clinical and pre-clinical discovery programs also demonstrate the same theme, as evidenced by pM and fM transition state inhibitors of purine nucleoside phosphorylase, methylthioadenosine phosphorylase, and 5-methylthioadenosine/S-adenosylhomocysteine nucleosidase, and covalent substrate trapping in leu-tRNA synthetase. The catalytic chemistry of enzymes is the key to designing potent inhibitors and makes them a special class of drug target.

17.
18.
Kim Y, Subramaniam S. Proteins 2006, 62(4): 1115-1124
Phylogenetic profiles encode patterns of presence or absence of genes across genomes, and these profiles can be used to assign functional relationships to nonhomologous pairs of proteins (Pellegrini et al., Proc Natl Acad Sci USA 1999;96:4284-4288). Although it is well known that many proteins were created from combinations of domains, most of the existing implementations of phylogenetic profiles do not consider this fact. Here, we introduce an extension that considers the multidomain nature of proteins and test the method against the known interaction data sets. Whereas earlier implementations associated one entire sequence with one protein phylogenetic profile (Single-Profile), our method instead breaks the sequence into a set of segments of predetermined size and constructs a separate profile for each segment (Multiple-Profile). The results show that the Multiple-Profile method performs as well as the Single-Profile method. Surprisingly, however, the two methods share only a small fraction of their predictions, indicating that the Multiple-Profile method can detect known interactions missed by the Single-Profile method. Thus, the Multiple-Profile method can be used with other methods to determine functional relationships on a genome scale with wider coverage.
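The idea of comparing presence/absence profiles, and of splitting a sequence into fixed-size segments before profiling, can be sketched as follows. This is a toy illustration, not the authors' implementation: the profile-agreement measure (fraction of matching genomes), the helper names, and the data are all assumptions, and the homology search that would actually fill in each profile is not shown.

```python
from itertools import combinations


def hamming_similarity(p, q):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def split_segments(sequence, size):
    """Break a sequence into consecutive segments of a predetermined size.

    In a Multiple-Profile scheme each segment would get its own profile;
    the last segment may be shorter than `size`.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def linked_pairs(profiles, threshold=1.0):
    """Return name pairs whose profiles agree in at least `threshold` of genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming_similarity(profiles[a], profiles[b]) >= threshold
    ]


# Hypothetical toy data: 1 = homolog present in that genome, 0 = absent.
toy_profiles = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 0, 0, 1),
}
candidate_links = linked_pairs(toy_profiles)
```

Here `protA` and `protB` share an identical profile and would be flagged as a candidate functional link, while `protC` would not. Applying `linked_pairs` to per-segment profiles instead of per-protein profiles is what distinguishes the Multiple-Profile idea in this sketch.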

19.
Light-dark modulation of chloroplast enzymes is achieved by covalent redox-modification of protein thiols/disulfides mediated by ferredoxin/thioredoxin reductase and thioredoxins. Light-dependent electron flow leads to reduction of particular chloroplast proteins, while photosynthetically evolved oxygen effects their continuous reoxidation. The oxidized and the reduced forms, respectively, differ greatly in their catalytic properties. The rate of reduction of each target enzyme is specifically fine-controlled by metabolites. By this combined mode of producing a defined ratio of active to inactive enzyme during steady state, each of the enzymes is adjusted to the immediate requirements of the chloroplast. Upon changes of the metabolic situation, the system can respond in a flexible manner, as is known from comparable regulatory mechanisms such as protein phosphorylation/dephosphorylation in animals and bacteria. From sequence comparisons between various light-dark modulated chloroplast enzymes and their non-regulated counterparts from other organelles or non-photosynthetic organisms, the presence of extra-peptides in the otherwise highly homologous sequences has been established for the chloroplast enzymes. However, no general pattern in the primary structure of those extra-sequences can be recognized. By the acquisition of “regulatory peptides” during evolution, a new type of metabolic control was created in a compartment uniquely occurring in organisms performing oxygenic photosynthesis.

20.
It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems of the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary changes in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.
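For readers unfamiliar with dynamic programming alignment, the sketch below computes a Needleman-Wunsch-style global alignment score for two protein sequences. It is a generic textbook recurrence, not the Orthologator-GeneCorrector algorithm: it does not model frame-shifts, and the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions rather than a real substitution matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score for two sequences.

    Only the previous row of the dynamic programming table is kept, so memory
    use is proportional to len(b). Scoring parameters are illustrative.
    """
    # Row 0: aligning an empty prefix of `a` against each prefix of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` against an empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # residue of `a` aligned to a gap
            left = curr[j - 1] + gap  # residue of `b` aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

A frame-shift-aware variant would additionally have to consider transitions between the three reading frames of the translated nucleotide sequence; that extension is described in the abstract but is beyond this sketch.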
