Similar literature
 Found 20 similar documents; search took 15 ms
1.
Physicochemical properties are potentially useful in predicting functional differences between aligned protein subfamilies. We present a method that considers physicochemical properties from ancestral sequences predicted to have given rise to the subfamilies of interest by gene duplication. Comparison between two MAP kinase subfamilies, p38 and ERK, revealed a region that had an excess of change in properties after gene duplication followed by conservation within the two subfamilies. This region corresponded to that experimentally defined as important for substrate and pathway specificity. The derived scores for the region of interest were found to differ significantly in their distribution compared to the rest of the protein when the Kolmogorov-Smirnov test was applied (p = 0.005). Thus, the incorporation of ancestral physicochemical properties is useful in predicting functional differences between protein subfamilies. In addition, the method was applied to the MKK and MAPK components of the p38 and JNK pathways. These proteins showed a similar pattern in their evolution and regions predicted to confer functional differences are discussed.
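
As a rough illustration of the statistical comparison mentioned in this abstract, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions) in plain Python. The function name `ks_statistic` and the score lists are hypothetical; the abstract does not describe the authors' implementation, and the p-value calculation is omitted here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical per-site scores, invented for illustration only.
region_scores = [0.90, 0.80, 0.85, 0.95]
background_scores = [0.10, 0.20, 0.15, 0.30, 0.25]
print(ks_statistic(region_scores, background_scores))
```

A statistic near 1 means the two score distributions barely overlap; a value near 0 means they are almost indistinguishable.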

2.
The rapid increase in the amount of protein sequence data has created a need for automated identification of sites that determine functional specificity among related subfamilies of proteins. A significant fraction of subfamily specific sites are only marginally conserved, which makes it extremely challenging to detect those amino acid changes that lead to functional diversification. To address this critical problem we developed a method named SPEER (specificity prediction using amino acids' properties, entropy and evolution rate) to distinguish specificity determining sites from others. SPEER encodes the conservation patterns of amino acid types using their physico-chemical properties and the heterogeneity of evolutionary changes between and within the subfamilies. To test the method, we compiled a test set containing 13 protein families with known specificity determining sites. Extensive benchmarking by comparing the performance of SPEER with other specificity site prediction algorithms has shown that it performs better in predicting several categories of subfamily specific sites.

3.
Automatic methods for predicting functionally important residues   Total citations: 9, self-citations: 0, citations by others: 9
Sequence analysis is often the first guide for the prediction of residues in a protein family that may have functional significance. A few methods have been proposed which use the division of protein families into subfamilies in the search for those positions that could have some functional significance for the whole family, but at the same time which exhibit the specificity of each subfamily ("Tree-determinant residues"). However, there are still many unsolved questions, such as the best division of a protein family into subfamilies, or the accurate detection of sequence variation patterns characteristic of different subfamilies. Here we present a systematic study in a significant number of protein families, testing the statistical meaning of the Tree-determinant residues predicted by three different methods that represent the range of available approaches. The first method takes as a starting point a phylogenetic representation of a protein family and, following the principle of Relative Entropy from Information Theory, automatically searches for the optimal division of the family into subfamilies. The second method looks for positions whose mutational behavior is reminiscent of the mutational behavior of the full-length proteins, by directly comparing the corresponding distance matrices. The third method is an automation of the analysis of distribution of sequences and amino acid positions in the corresponding multidimensional spaces using a vector-based principal component analysis. These three methods have been tested on two non-redundant lists of protein families: one composed of proteins that bind a variety of ligand groups, and the other composed of proteins with annotated functionally relevant sites. In most cases, the residues predicted by the three methods show a clear tendency to be close to bound ligands of biological relevance and to those amino acids described as participants in key aspects of protein function. 
These three automatic methods provide a wide range of possibilities for biologists to analyze their families of interest, in a similar way to the one presented here for the family of proteins related to ras-p21.
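
To make the relative-entropy idea in this abstract concrete, here is a minimal sketch that scores one alignment column by the Kullback-Leibler divergence between the amino acid distributions of two subfamilies. The function names, the pseudocount of 1 and the toy columns are assumptions made for illustration; the abstract does not specify how the published method computes or optimizes this quantity.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(residues, pseudocount=1.0):
    """Smoothed amino acid frequencies for one alignment column."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(residues_p, residues_q, pseudocount=1.0):
    """KL divergence D(P || Q) in bits between two subfamilies at one column."""
    p = column_distribution(residues_p, pseudocount)
    q = column_distribution(residues_q, pseudocount)
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


# Toy columns (hypothetical): one residue per sequence in each subfamily.
print(relative_entropy("AAAA", "KKKK"))  # subfamilies differ at this column
print(relative_entropy("AAAA", "AAAA"))  # identical columns give 0.0
```

The pseudocount keeps the logarithm finite when a residue is absent from one subfamily; larger values flatten the distributions and lower every score.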

4.
MOTIVATION: Identification of residues that account for protein function specificity is crucial, not only for understanding the nature of functional specificity, but also for protein engineering experiments aimed at switching the specificity of an enzyme, regulator or transporter. Available algorithms generally use multiple sequence alignments to identify residue positions conserved within subfamilies but divergent in between. However, many biological examples show a much subtler picture than simple intra-group conservation versus inter-group divergence. RESULTS: We present multi-RELIEF, a novel approach for identifying specificity residues that is based on RELIEF, a state-of-the-art Machine-Learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided in multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we show that (a) multi-RELIEF identifies specificity residues for the seven test sets used, (b) incorporating structural information improves prediction for specificity of interaction with small molecules and (c) comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. AVAILABILITY: A web-server implementation of multi-RELIEF is available at www.ibi.vu.nl/programs/multirelief. Matlab source code of the algorithm and data sets are available on request for academic use.
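
The feature-weighting idea behind RELIEF can be sketched on a toy alignment as follows. This is the basic two-class Relief update (nearest hit versus nearest miss, using Hamming distance), not multi-RELIEF itself, whose multi-class sampling and structure-based reweighting are not detailed in the abstract. The function names and the four toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_weights(sequences, labels):
    """Basic Relief weights for the columns of a gap-free toy alignment.

    A column gains weight when it differs from the nearest sequence of
    another class (miss) and loses weight when it differs from the nearest
    sequence of the same class (hit).
    """
    n_pos = len(sequences[0])
    weights = [0.0] * n_pos
    for i, seq in enumerate(sequences):
        hits = [s for j, s in enumerate(sequences)
                if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(sequences) if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        nearest_hit = min(hits, key=lambda s: hamming(seq, s))
        nearest_miss = min(misses, key=lambda s: hamming(seq, s))
        for k in range(n_pos):
            weights[k] += (seq[k] != nearest_miss[k]) - (seq[k] != nearest_hit[k])
    return [w / len(sequences) for w in weights]


# Toy alignment (hypothetical): column 0 separates the classes,
# column 3 varies within each class.
toy_sequences = ["KAGL", "KAGV", "RAGL", "RAGV"]
toy_labels = [0, 0, 1, 1]
print(relief_weights(toy_sequences, toy_labels))  # [1.0, 0.0, 0.0, -1.0]
```

Columns with high positive weight behave like specificity positions in this toy setting, while columns that vary within a class are penalized.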

5.
6.
As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem’s complexity. Hence, this work’s purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, and equivalent results for two others, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, and were more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence. 
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.

7.
Zhang L, Ma H. The New Phytologist, 2012, 195(1): 248-263
• Plants and animals possess very different developmental processes, yet share conserved epigenetic regulatory mechanisms, such as histone modifications. One of the most important forms of histone modification is methylation on lysine residues of the tails, carried out by members of the SET protein family, which are widespread in eukaryotes. • We analyzed molecular evolution by comparative genomics and phylogenetics of the SET genes from plant and animal genomes, grouping SET genes into several subfamilies and uncovering numerous gene duplications, particularly in the Suv, Ash, Trx and E(z) subfamilies. • Domain organizations differ between different subfamilies and between plant and animal SET proteins in some subfamilies, and support the grouping of SET genes into seven main subfamilies, suggesting that SET proteins have acquired distinctive regulatory interactions during evolution. We detected evidence for independent evolution of domain organization in different lineages, including recruitment of new domains following some duplications. • More recent duplications in both vertebrates and land plants are probably the result of whole-genome or segmental duplications. The evolution of the SET gene family shows that gene duplications caused by segmental duplications and other mechanisms have probably contributed to the complexity of epigenetic regulation, providing insights into the evolution of the regulation of chromatin structure.

8.
9.

Background  

The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level.

10.
Gene duplication is a common evolutionary process that leads to the expansion and functional diversification of protein subfamilies. The evolutionary events that cause paralogous proteins to bind different protein ligands (functionally diverged interfaces) are investigated and compared to paralogous proteins that bind the same protein ligand (functionally preserved interfaces). We find that functionally diverged interfaces possess more subfamily-specific residues than functionally preserved interfaces. These subfamily-specific residues are usually partially buried at the interface rim and achieve specific binding through optimized hydrogen bond geometries. In addition to optimized hydrogen bond geometries, side-chain modeling experiments suggest that steric effects are also important for binding specificity. Residues that are completely buried at the interface hub are also less conserved in functionally diverged interfaces than in functionally preserved interfaces. Consistent with this finding, hub residues contribute less to free energy of binding in functionally diverged interfaces than in functionally preserved interfaces. Therefore, we propose that protein binding is a delicate balance between binding affinity that primarily occurs at the interface hub and binding specificity that primarily occurs at the interface rim.

11.
Wang Y, Gu X. Genetics, 2001, 158(3): 1311-1320
In this article, we explore the pattern of type I functional divergence (i.e., altered functional constraints or site-specific rate difference) in the caspase gene family that is important for apoptosis (programmed cell death) and cytokine maturation. By taking advantage of substantial experimental data from caspases, the functional/structural basis of our posterior predictions from sequence analysis was extensively studied. Our results are as follows: (1) Phylogenetic analysis shows that the evolution of major caspase-mediated pathways has been facilitated by gene duplications, (2) type I functional divergence (altered functional constraints) is statistically significant between two major subfamilies, CED-3 and ICE, (3) 4 of 21 predicted amino acid residues (for site-specific rate difference between CED-3 and ICE) have been verified by experimental evidence, and (4) we found that some CED-3 caspases may inherit more ancestral functions, whereas other members may employ some recently derived functions. Our approach can be cost effective in functional genomics to make statistically sound predictions from amino acid sequences.

12.
Amino acid residues associated with functional specificity of cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAPKs), glycogen synthase kinases (GSKs), and CDK-like kinases (CLKs), which are collectively termed the CMGC group, were identified by categorizing and quantifying the selective constraints acting upon these proteins during evolution. Many constraints specific to CMGC kinases correspond to residues between the N-terminal end of the activation segment and a CMGC-conserved insert segment associated with coprotein binding. The strongest such constraint is imposed on a "CMGC-arginine" near the substrate phosphorylation site with a side chain that plays a role both in substrate recognition and in kinase activation. Two nearby buried waters, which are also present in non-CMGC kinases, typically position the main chain of this arginine relative to the catalytic loop. These and other CMGC-specific features suggest a structural linkage between coprotein binding, substrate recognition, and kinase activation. Constraints specific to individual subfamilies point to mechanisms for CMGC kinase specialization. Within casein kinase 2alpha (CK2alpha), for example, the binding of one of the buried waters appears prohibited by the side chain of a leucine that is highly conserved within CK2alpha and that, along with substitution of lysine for the CMGC-arginine, may contribute to the broad substrate specificity of CK2alpha by relaxing characteristically conserved, precise interactions near the active site. This leucine is replaced by a conserved isoleucine or valine in other CMGC kinases, thereby illustrating the potential functional significance of subtle amino acid substitutions. Analysis of other CMGC kinases similarly suggests candidate family-specific residues for experimental follow-up.

13.
Haloalkane dehalogenases (HLDs) are enzymes that catalyze the cleavage of carbon-halogen bonds by a hydrolytic mechanism. Although comparative biochemical analyses have been published, no classification system has been proposed for HLDs, to date, that reconciles their phylogenetic and functional relationships. In the study presented here, we have analyzed all sequences and structures of genuine HLDs and their homologs detectable by database searches. Phylogenetic analyses revealed that the HLD family can be divided into three subfamilies denoted HLD-I, HLD-II, and HLD-III, of which HLD-I and HLD-III are predicted to be sister-groups. A mismatch between the HLD protein tree and the tree of species, as well as the presence of more than one HLD gene in a few genomes, suggest that horizontal gene transfers, and perhaps also multiple gene duplications and losses have been involved in the evolution of this family. Most of the biochemically characterized HLDs are found in the HLD-II subfamily. The dehalogenating activity of two members of the newly identified HLD-III subfamily has only recently been confirmed, in a study motivated by this phylogenetic analysis. A novel type of the catalytic pentad (Asp-His-Asp+Asn-Trp) was predicted for members of the HLD-III subfamily. Calculation of the evolutionary rates and lineage-specific innovations revealed a common conserved core as well as a set of residues that characterizes each HLD subfamily. The N-terminal part of the cap domain is one of the most variable regions within the whole family as well as within individual subfamilies, and serves as a preferential site for the location of relatively long insertions. The highest variability of discrete sites was observed among residues that are structural components of the access channels. 
Mutations at these sites modify the anatomy of the channels, which are important for the exchange of ligands between the buried active site and the bulk solvent, thus creating a structural basis for the molecular evolution of new substrate specificities. Our analysis sheds light on the evolutionary history of HLDs and provides a structural framework for designing enzymes with new specificities.

14.
The organelle paralogy hypothesis is one model for the acquisition of nonendosymbiotic organelles, generated from molecular evolutionary analyses of proteins encoding specificity in the membrane traffic system. GTPase activating proteins (GAPs) for the ADP‐ribosylation factor (Arfs) GTPases are additional regulators of the kinetics and fidelity of membrane traffic. Here we describe molecular evolutionary analyses of the Arf GAP protein family. Of the 10 subfamilies previously defined in humans, we find that 5 were likely present in the last eukaryotic common ancestor. Of the 3 most recently derived subfamilies, 1 was likely present in the ancestor of opisthokonts (animals and fungi) and apusomonads (flagellates classified as the sister lineage to opisthokonts), while 2 arose in the holozoan lineage. We also propose to have identified a novel ancient subfamily (ArfGAPC2), present in diverse eukaryotes but which is lost frequently, including in the opisthokonts. Surprisingly few ancient domains accompanying the ArfGAP domain were identified, in marked contrast to the extensively decorated human Arf GAPs. Phylogenetic analyses of the subfamilies reveal patterns of single and multiple gene duplications specific to the Holozoa, to some degree mirroring evolution of Arf GAP targets, the Arfs. Conservation, and lack thereof, of various residues in the ArfGAP structure provide contextualization of previously identified functional amino acids and their application to Arf GAP biology in general. Overall, our results yield insights into current Arf GAP biology, reveal complexity in the ancient eukaryotic ancestor and integrate the Arf GAP family into a proposed mechanism for the evolution of nonendosymbiotic organelles.

15.
The integrases are a diverse family of tyrosine recombinases which rearrange DNA duplexes by means of conservative site-specific recombination reactions. Members of this family, of which the well-studied lambda Int protein is the prototype, were previously found to share four strongly conserved residues, including an active site tyrosine directly involved in transesterification. However, few additional sequence similarities were found in the original group of 27 proteins. We have now identified a total of 81 members of the integrase family deposited in the databases. Alignment and comparisons of these sequences combined with an evolutionary analysis aided in identifying broader sequence similarities and clarifying the possible functions of these conserved residues. This analysis showed that members of the family aggregate into subfamilies which are consistent with their biological roles; these subfamilies have significant levels of sequence similarity beyond the four residues previously identified. It was also possible to map the location of conserved residues onto the available crystal structures; most of the conserved residues cluster in the predicted active site cleft. In addition, these results offer clues into an apparent discrepancy between the mechanisms of different subfamilies of integrases.

16.
Kondo R, Kaneko S, Sun H, Sakaizumi M, Chigusa SI. Gene, 2002, 282(1-2): 113-120
Vertebrate olfactory receptors (OR) exist as the largest multigene family, scattered throughout the genome in clusters. Studies have shown that different animals possess a remarkably diverse set of OR genes to recognize diverse odor molecules. In order to examine the evolutionary process of OR diversification, we examined three OR gene subfamilies from Japanese medaka fish (seven lines sampled from four populations). For each subfamily, the sequences of ancestral genes were inferred based on a distance method. Examination of d(N)/d(S) ratios for each branch of phylogenetic trees suggested that purifying selection is the major force of evolution in medaka OR genes. However, for the mfOR1 and mfOR2 paralogous gene pairs, a nonrandom distribution of fixed amino acid changes and the d(N)>d(S) in a branch suggested that diversifying selection occurred after gene duplication. The fixed amino acid changes were observed in the third, fifth and sixth transmembrane domains, which have been predicted to serve as a ligand-binding pocket in a structural model. A compatibility test suggested that interlocus recombinations involving the fourth transmembrane domain occurred between the mfOR1 and mfOR2 gene pairs. The pattern of nucleotide substitutions in other OR genes agrees with the hypothesis that a limited number of amino acid residues are involved in odorant binding. Such comparative analyses of paralogous OR genes should provide bases for understanding the evolution, the structure, and the functional specificity of OR genes.

17.
NCS1 proteins are H+ or Na+ symporters responsible for the uptake of purines, pyrimidines or related metabolites in bacteria, fungi and some plants. Fungal NCS1 are classified into two evolutionary and structurally distinct subfamilies, known as Fur‐ and Fcy‐like transporters. These subfamilies have expanded and functionally diversified by gene duplications. The Fur subfamily of the model fungus Aspergillus nidulans includes both major and cryptic transporters specific for uracil, 5‐fluorouracil, allantoin and/or uric acid. Here we functionally analyse all four A. nidulans Fcy transporters (FcyA, FcyC, FcyD and FcyE) with previously unknown function. Our analysis shows that FcyD is a moderate‐affinity, low‐capacity, highly specific adenine transporter, whereas FcyE contributes to 8‐azaguanine uptake. Mutational analysis of FcyD, supported by homology modelling and substrate docking, shows that two variably conserved residues (Leu356 and Ser359) in transmembrane segment 8 (TMS8) are critical for transport kinetics and specificity differences among Fcy transporters, while two conserved residues (Phe167 and Ser171) in TMS3 are also important for function. Importantly, mutation S359N converts FcyD to a promiscuous nucleobase transporter capable of recognizing adenine, xanthine and several nucleobase analogues. Our results reveal the importance of specific residues in the functional evolution of NCS1 transporters.

18.
Structures of homologous proteins are usually conserved during evolution, as are critical active site residues. This is the case for actin and tubulin, the two most important cytoskeleton proteins in eukaryotes. Actins and their related proteins (Arps) constitute a large superfamily whereas the tubulin family has fewer members. Unaligned sequences of these two protein families were analysed by searching for short groups of family-specific amino acid residues, that we call motifs, and by counting the number of residues from one motif to the next. For each sequence, the set of motif-to-motif residue counts forms a subfamily-specific pattern (landmark pattern) allowing actin and tubulin superfamily members to be identified and sorted into subfamilies. The differences between patterns of individual subfamilies are due to inserts and deletions (indels). Inserts appear to have arisen at an early stage in eukaryote evolution as suggested by the small but consistent kingdom-dependent differences found within many Arp subfamilies and in γ-tubulins. Inserts tend to be in surface loops where they can influence subfamily-specific function without disturbing the core structure of the protein. The relatively few indels found for tubulins have similar positions to established results, whereas we find many previously unreported indel positions and lengths for the metazoan Arps.
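
The motif-to-motif counting described in this abstract can be sketched as below. The function name `landmark_pattern`, the toy sequences and the three motifs are invented for illustration, and measuring from the start of one motif to the start of the next is an assumption; the abstract does not say exactly how the counts are taken.

```python
def landmark_pattern(sequence, motifs):
    """Residue counts from the start of each motif to the start of the next.

    Motifs are searched for in the given order on an unaligned sequence.
    Returns None if any motif cannot be found after the previous one.
    """
    positions = []
    search_from = 0
    for motif in motifs:
        idx = sequence.find(motif, search_from)
        if idx == -1:
            return None
        positions.append(idx)
        search_from = idx + len(motif)
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical motifs and sequences, not taken from real actins or tubulins.
motifs = ["GDG", "DLY", "KEL"]
print(landmark_pattern("MGDGAAAADLYAAKEL", motifs))     # [7, 5]
print(landmark_pattern("MGDGAAAAPPPDLYAAKEL", motifs))  # [10, 5]
```

In the second toy sequence a three-residue insert between the first two motifs changes the first count from 7 to 10 while the second count stays at 5, which is the kind of indel signature the abstract uses to separate subfamilies.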

19.
The Ral effector protein RLIP76 (also called RIP/RalBP1) binds to Ral.GTP via a region that shares no sequence homology with the Ras-binding domains of the Ser/Thr kinase c-Raf-1 and the Ral-specific guanine nucleotide exchange factors. Whereas the Ras-binding domains have a similar ubiquitin-like structure, the Ral-binding domain of RLIP was predicted to comprise a coiled-coil region. In order to obtain more information about the specificity and the structural mode of the interaction between Ral and RLIP, we have performed a sequence space and a mutational analysis. The sequence space analysis of a comprehensive nonredundant assembly of Ras-like proteins strongly indicated that positions 36 and 37 in the core of the effector region are tree-determinant positions for all subfamilies of Ras-like proteins and dictate the specificity of the interaction of these GTPases with their effector proteins. Indeed, we could convert the specific interaction with Ras effectors and RLIP by mutating these residues in Ras and Ral. We therefore conclude that positions 36 and 37 are critical for the discrimination between Ras and Ral effectors and that, despite the absence of sequence homology between the Ral-binding and the Ras-binding domains, their mode of interaction is most probably similar.

20.
The rapid rise in DNA sequencing has led to an expansion in the number of glycoside hydrolase (GH) families. The GH43 family currently contains α-L-arabinofuranosidase, β-D-xylosidase, α-L-arabinanase, and β-D-galactosidase enzymes for the debranching and degradation of hemicellulose and pectin polymers. Many studies have revealed finer details about members of GH43 that necessitate the division of GH43 into subfamilies, as was done previously for the GH5 and GH13 families. The work presented here is a robust subfamily classification that assigns over 91% of all complete GH43 domains into 37 subfamilies that correlate with conserved sequence residues and results of biochemical assays and structural studies. Furthermore, cooccurrence analysis of these subfamilies and other functional modules revealed strong associations between some GH43 subfamilies and CBM6 and CBM13 domains. Cooccurrence analysis also revealed the presence of proteins containing up to three GH43 domains and belonging to different subfamilies, suggesting significant functional differences for each subfamily. Overall, the subfamily analysis suggests that the GH43 enzymes probably display a hitherto underestimated variety of subtle specificity features that are not apparent when the enzymes are assayed with simple synthetic substrates, such as pNP-glycosides.
