Similar articles
 Found 20 similar articles (search time: 14 ms)
1.
Amino acid residues that are involved in functional interactions in proteins have strong evolutionary pressure to remain unchanged and consequently their substitution patterns are different from those that are noninteracting. To characterize and quantify the differences between amino acid substitution patterns due to structural restraints and those under functional restraints, we have made a comparative analysis of families of homologous proteins. Residues classified as having the same amino acid type, secondary structure, accessibility, and side-chain hydrogen bonds are shown to be better conserved if they are close to the active site. We have focused on enzyme families for this analysis since they have functional sites that are easily defined by their catalytic residues. We have derived new sets of environment-specific substitution tables, which we term function-dependent environment-specific substitution tables, where amino acid residues are classified according to their distance from the functional sites. The residues that are within a distance of 9 Å from the active site have distinct amino acid substitution patterns when compared to the other sites. The function-dependent environment-specific substitution tables have been tested using the sequence-structure homology recognition program FUGUE and the results compared with the recognition performance obtained using the standard environment-specific substitution tables. Significant improvements are obtained in both recognition performance and alignment accuracy using the function-dependent environment-specific substitution tables (P-value = 0.02, according to the Wilcoxon signed rank test for alignment accuracy). The alignments near the active site are greatly improved with pronounced improvements at lower percentage identities (less than 30%).
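
The distance-based classification described in this abstract can be illustrated with a minimal sketch. It labels each residue as near or far from a functional site using the 9 Å cutoff quoted above. The function name, the use of one representative coordinate per residue, and the input layout are assumptions made for illustration; this is not the authors' implementation.

```python
import math

CUTOFF_ANGSTROM = 9.0  # cutoff quoted in the abstract


def near_functional_site(residue_coords, site_coords, cutoff=CUTOFF_ANGSTROM):
    """Flag residues lying within `cutoff` of any functional-site coordinate.

    residue_coords: list of (x, y, z) tuples, one per residue (hypothetical
                    choice: a single representative atom per residue).
    site_coords:    list of (x, y, z) tuples for the functional-site residues.
    """
    flags = []
    for residue in residue_coords:
        # Distance to the closest functional-site coordinate.
        d_min = min(math.dist(residue, site) for site in site_coords)
        flags.append(d_min <= cutoff)
    return flags
```

A call such as `near_functional_site([(0, 0, 0), (12, 0, 0)], [(1, 0, 0)])` marks the first residue as near the site and the second as far from it.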

2.
Functional sites determine the activity and interactions of proteins and as such constitute the targets of most drugs. However, the exponential growth of sequence and structure data far exceeds the ability of experimental techniques to identify their locations and key amino acids. To fill this gap we developed a computational Evolutionary Trace method that ranks the evolutionary importance of amino acids in protein sequences. Studies show that the best-ranked residues form fewer and larger structural clusters than expected by chance and overlap with functional sites, but until now the significance of this overlap has remained qualitative. Here, we use 86 diverse protein structures, including 20 determined by the structural genomics initiative, to show that this overlap is a recurrent and statistically significant feature. An automated ET correctly identifies seven of ten functional sites by the least favorable statistical measure, and nine of ten by the most favorable one. These results quantitatively demonstrate that a large fraction of functional sites in the proteome may be accurately identified from sequence and structure. This should help focus structure-function studies, rational drug design, protein engineering, and functional annotation to the relevant regions of a protein.

3.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

4.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

5.
Cellular functions are regulated by molecules that interact with proteins and alter their activities. To enable such control, protein activity, and therefore protein conformational distributions, must be susceptible to alteration by molecular interactions at functional sites. Here we investigate whether interactions at functional sites cause a large change in the protein conformational distribution. We apply a computational method, called dynamics perturbation analysis (DPA), to identify sites at which interactions have a large allosteric potential D(x), which is the Kullback-Leibler divergence between protein conformational distributions with and without an interaction. In DPA, a protein is decorated with surface points that interact with neighboring protein atoms, and D(x) is calculated for each of the points in a coarse-grained model of protein vibrations. We use DPA to examine hundreds of protein structures from a standard small-molecule docking test set, and find that ligand-binding sites have elevated values of D(x): for 95% of proteins, the probability of randomly obtaining values as high as those in the binding site is 10^-3 or smaller. We then use DPA to develop a computational method to predict functional sites in proteins, and find that the method accurately predicts ligand-binding-site residues for proteins in the test set. The performance of this method compares favorably with that of a cleft analysis method. The results confirm that interactions at small-molecule binding sites cause a large change in the protein conformational distribution, and motivate using DPA for large-scale prediction of functional sites in proteins. They also suggest that natural selection favors proteins whose activities are capable of being regulated by molecular interactions.
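
The allosteric potential D(x) in this abstract is defined as a Kullback-Leibler divergence. As a rough illustration of that quantity only, the sketch below computes the divergence for two discrete distributions over the same set of states. The discrete setting, the direction of the divergence (perturbed relative to unperturbed), and the function name are assumptions; DPA itself evaluates the divergence within a coarse-grained model of protein vibrations, which is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    p: probabilities with the interaction (assumed direction).
    q: probabilities without the interaction.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same states")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

Identical distributions give a divergence of zero, and the value grows as the interaction shifts probability mass between states.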

6.
Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites) that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR-Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well-defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR-Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source .
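
For readers unfamiliar with the Matthews correlation coefficient reported in this abstract, the following minimal sketch computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the convention of returning 0.0 when the denominator vanishes are choices made here for illustration and are not taken from AR-Pred.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: undefined cases are reported as 0.0.
        return 0.0
    return numerator / denominator
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which is why it is often reported alongside AUC for site prediction, where non-site residues usually outnumber site residues.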

7.
Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.

8.
Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods mainly focused on the prediction of phosphorylation sites, but rare methods considered whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is quite necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved than non-functional phosphorylation sites in evolution. Thus, in our method, we developed a web server by integrating existing phosphorylation site prediction methods, as well as both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database for the predicted sites. By the analysis of overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By the analysis of protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method would help to characterize the most likely functional phosphorylation sites for further studies in this research area.

9.
Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods mainly focused on the prediction of phosphorylation sites, but rare methods considered whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is quite necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved than non-functional phosphorylation sites in evolution. Thus, in our method, we developed a web server by integrating existing phosphorylation site prediction methods, as well as both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database for the predicted sites. By the analysis of overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By the analysis of protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method would help to characterize the most likely functional phosphorylation sites for further studies in this research area.

10.
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active‐site structural similarities has not yet been undertaken. Pyridoxal‐5′‐phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP‐dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three‐dimensional‐fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. Proteins 2014; 82:2597–2608. © 2014 Wiley Periodicals, Inc.

11.
The concept of protein function is widely used and manipulated by biologists. However, the meaning of the concept and its understanding may vary depending on the level of functionality one considers (molecular, cellular, physiological, etc.). Genomic studies and new high-throughput methods of the post-genomic era provide the opportunity to shed a new light on the concept of protein function: protein-protein interactions can now be considered as pieces of incomplete but still gigantic networks and the analysis of these networks will permit the emergence of a more integrated view of protein function. In this context, we propose a new functional classification method, which, unlike usual methods based on sequence homology, allows the definition of functional classes of protein based on the identity of their interacting partners. An example of such classification will be shown and discussed for a subset of Saccharomyces cerevisiae proteins, accounting for 7% of the yeast proteome. The genome of the budding yeast contains 50% of protein-coding genes that are paralogs, including 457 pairs of duplicated genes coming probably from an ancient whole genome duplication. We will comment on the functional classification of the duplicated genes when using our method and discuss the contribution of these results to the understanding of function evolution for the duplicated genes.

12.
Seven species of the family Cercopithecidae have been studied using high-resolution banding techniques. Comparative studies allowed us to identify the main chromosomal reorganizations in this group, as well as to establish the phylogenetic relationships between species. Some of the regions involved in evolutionary rearrangements correspond to human fragile sites and/or chromosomal rearrangements related to neoplasia.

13.
To correlate structural features with glucoamylase properties, a structure-based multisequence alignment was constructed using information from catalytic and starch-binding domain models. The catalytic domain is composed of three hydrophobic folding units, the most labile and least hydrophobic of them being missing in the most stable glucoamylase. The role of O-glycosylation in stabilizing the most hydrophobic folding unit, the only one where thermostabilizing mutations with unchanged activity have been made, is described. Differences in both length and composition of interhelical loops are correlated with stability and selectivity characteristics. Two new glucoamylase subfamilies are defined by using homology criteria. Protein parsimony analysis suggests an ancient bacterial origin for the glucoamylase gene. Increases in length of the belt surrounding the active site, degree of O-glycosylation, and length of the linker probably correspond to evolutionary steps that increase stability and secretion levels of Aspergillus-related glucoamylases. Proteins 29:334–347, 1997. © 1997 Wiley-Liss, Inc.

14.
Our aim is to explore the similarities in structural fluctuations of homologous kinases. Gaussian Network Model based Normal Mode Analysis was performed on 73 active conformation structures in Ser/Thr/Tyr kinase superfamily. Categories of kinases with progressive evolutionary divergence, viz. (i) Same kinase with many crystal structures, (ii) Within‐Subfamily, (iii) Within‐Family, (iv) Within‐Group, and (v) Across‐Group, were analyzed. We identified a flexibility signature conserved in all kinases involving residues in and around the catalytic loop with consistent low‐magnitude fluctuations. However, the overall structural fluctuation profiles are conserved better in closely related kinases (Within‐Subfamily and Within‐Family) than in distant ones (Within‐Group and Across‐Group). A substantial 65.4% of variation in flexibility was not accounted for by variation in sequences or structures. Interestingly, we identified substructural residue‐wise fluctuation patterns characteristic of kinases of different categories. Specifically, we recognized statistically significant fluctuations unique to families of protein kinase A, cyclin‐dependent kinases, and nonreceptor tyrosine kinases. These fluctuation signatures localized to sites known to participate in protein‐protein interactions typical of these kinase families. We report for the first time that residues characterized by fluctuations unique to the group/family are involved in interactions specific to the group/family. As highlighted for Src family, local regions with differential fluctuations are proposed as attractive targets for drug design. Overall, our study underscores the importance of consideration of fluctuations, over and above sequence and structural features, in understanding the roles of sites characteristic of kinases. Proteins 2016; 84:957–978. © 2016 Wiley Periodicals, Inc.

15.
Rates of genome evolution and branching order from whole genome analysis (total citations: 2; self-citations: 0; citations by others: 2)
Accurate estimation of any phylogeny is important as a framework for evolutionary analysis of form and function at all levels of organization from sequence to whole organism. Using alignments of nonrepetitive components of opossum, human, mouse, rat, and dog genomes we evaluated two alternative tree topologies for eutherian evolution. We show with very high confidence that there is a basal split between rodents (as represented by the mouse and rat) and a branch joining primates (as represented by humans) and carnivores (as represented by dogs), consistent with some but not the most widely accepted mammalian phylogenies. The result was robust to substitution model choice with equivalent inference returned from a spectrum of models ranging from a general time reversible model, a model that treated nucleotides as either purines or pyrimidines, and variants of these that incorporated rate heterogeneity among sites. By determining this particular branching order we are able to show that the rate of molecular evolution is almost identical in rodent and carnivore lineages and that sequences evolve approximately 11%-14% faster in these lineages than in the primate lineage. In addition by applying the chicken as outgroup the analyses suggested that the rate of evolution in all eutherian lineages is approximately 30% slower than in the opossum lineage. This pattern of relative rates is inconsistent with the hypothesis that generation time is an important determinant of substitution rates and, by implication, mutation rates. Possible factors causing rate differences between the lineages include differences in DNA repair and replication enzymology, and shifts in nucleotide pools. Our analysis demonstrates the importance of using multiple sequences from across the genome to estimate phylogeny and relative evolutionary rate in order to reduce the influence of distorting local effects evident even in relatively long sequences.

16.
The rapidly increasing volume of sequence and structure information available for proteins poses the daunting task of determining their functional importance. Computational methods can prove to be very useful in understanding and characterizing the biochemical and evolutionary information contained in this wealth of data, particularly at functionally important sites. Therefore, we perform a detailed survey of compositional and evolutionary constraints at the molecular and biological function level for a large set of known functionally important sites extracted from a wide range of protein families. We compare the degree of conservation across different functional categories and provide detailed statistical insight to decipher the varying evolutionary constraints at functionally important sites. The compositional and evolutionary information at functionally important sites has been compiled into a library of functional templates. We developed a module that predicts functionally important columns (FIC) of an alignment based on the detection of a significant "template match score" to a library template. Our template match score measures an alignment column's similarity to a library template and combines a term explicitly representing a column's residue composition with various evolutionary conservation scores (information content and position-specific scoring matrix-derived statistics). Our benchmarking studies show good sensitivity/specificity for the prediction of functional sites and high accuracy in attributing correct molecular function type to the predicted sites. This prediction method is based on information derived from homologous sequences and no structural information is required. Therefore, this method could be extremely useful for large-scale functional annotation.
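
One of the conservation terms named in this abstract is the information content of an alignment column. A minimal sketch of one common definition is given below: the maximum entropy for a 20-letter alphabet minus the Shannon entropy of the observed residues. The function name, the gap handling, and the choice of this particular definition are assumptions; the abstract does not say which formulation the template match score uses.

```python
import math
from collections import Counter


def column_information_content(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    column: string of one-letter residue codes; "-" marks a gap and is
            ignored (an assumption made for this sketch).
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(alphabet_size) - entropy
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues scores exactly one bit lower.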

17.
Zn2+, an element that is essential to all life forms, can play a catalytic or a solely structural role. Previous works have shown that Zn2+ binds preferentially to water molecules and His in catalytic sites, but to Cys in structural sites, but the molecular basis for the observed ligand preference is unclear. Here, we show that the different Zn2+ roles are also reflected in the different bond distances to Zn2+ in structural and catalytic sites. We reveal the physical basis for the observed differences between structural and catalytic Zn sites: In most catalytic sites, water is found bound to Zn2+ as it transfers the least charge to Zn2+ and is less bulky compared to the protein ligands, enabling Zn2+ to serve as a Lewis acid in catalysis. In most structural sites, however, ≥ 2 Cys are found bound to Zn2+, as Cys transfers the most charge to Zn2+ and reduces the Zn charge to such an extent that Zn2+ can no longer act as a Lewis acid; furthermore, steric repulsion among the bulky Cys(S) prevents Zn2+ from accommodating another ligand. Based on the observed ligand preference and Zn-ligand distance differences between structural and catalytic Zn sites, we present a simple method for distinguishing the two types of sites and for verifying the catalytic role of Zn2+. Finally, we discuss how the physical bases revealed aid in designing potential drug molecules that target Zn proteins.
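
The ligand preferences summarized in this abstract suggest a simple rule of thumb, sketched below. This is a deliberately crude illustration and not the authors' method: it uses only the ligand identities (bound water versus two or more Cys) and ignores the Zn-ligand distance criterion that the abstract also mentions. The function name, the residue-name strings, and the order in which the two rules are checked are all assumptions.

```python
def classify_zn_site(ligands):
    """Heuristic label for a Zn2+ site from its first-shell ligand names.

    ligands: list of residue or molecule names, for example
             ["HIS", "HIS", "GLU", "HOH"] (hypothetical naming, with
             "HOH" standing for a water molecule).
    """
    names = [name.upper() for name in ligands]
    if "HOH" in names:
        # Bound water is reported for most catalytic sites.
        return "catalytic"
    if names.count("CYS") >= 2:
        # Two or more Cys ligands are reported for most structural sites.
        return "structural"
    return "undetermined"
```

Because the abstract says these patterns hold for most sites rather than all of them, the labels should be read as a first guess to be checked against distances and other evidence.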

18.
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution‐matrix‐based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface‐exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site‐directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. Proteins 2015; 83:135–152. © 2014 Wiley Periodicals, Inc.

19.
Hydrophobic cores are fundamental structural properties of proteins typically associated with protein folding and stability; however, how the hydrophobic core shapes protein evolution and function is poorly understood. Here, we investigated the role of conserved hydrophobic cores in fold-A glycosyltransferases (GT-As), a large superfamily of enzymes that catalyze formation of glycosidic linkages between diverse donor and acceptor substrates through distinct catalytic mechanisms (inverting versus retaining). Using hidden Markov models and protein structural alignments, we identify similarities in the phosphate-binding cassette (PBC) of GT-As and unrelated nucleotide-binding proteins, such as UDP-sugar pyrophosphorylases. We demonstrate that GT-As have diverged from other nucleotide-binding proteins through structural elaboration of the PBC and its unique hydrophobic tethering to the F-helix, which harbors the catalytic base (xED-Asp). While the hydrophobic tethering is conserved across diverse GT-A fold enzymes, some families, such as B3GNT2, display variations in tethering interactions and core packing. We evaluated the structural and functional impact of these core variations through experimental mutational analysis and molecular dynamics simulations and find that some of the core mutations (T336I in B3GNT2) increase catalytic efficiency by modulating the conformational occupancy of the catalytic base between “D-in” and acceptor-accessible “D-out” conformation. Taken together, our studies support a model of evolution in which the GT-A core evolved progressively through elaboration upon an ancient PBC found in diverse nucleotide-binding proteins, and malleability of this core provided the structural framework for evolving new catalytic and substrate-binding functions in extant GT-A fold enzymes.

20.