Similar articles
20 similar articles found (search time: 15 ms)
1.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing, for each amino acid position, the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.
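
The "observed conservation" half of the first method can be illustrated with a generic per-column score. The sketch below is not the authors' scoring scheme: it uses plain Shannon entropy over an alignment column, and the function names and the toy alignment are assumptions made for illustration. The expected conservation, which the abstract derives from amino acid type and local environment, is not reproduced here.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column; gaps are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_scores(alignment: list[str]) -> list[float]:
    """Score each column as 1 - H / log2(20): 1.0 means fully conserved."""
    max_entropy = math.log2(20)  # entropy of a uniform distribution over 20 amino acids
    columns = zip(*alignment)    # transpose rows (sequences) into columns (positions)
    return [1.0 - column_entropy("".join(col)) / max_entropy for col in columns]


# Hypothetical toy alignment: columns 1, 2 and 4 are invariant, column 3 varies.
toy_alignment = ["ACDG", "ACEG", "ACKG", "ACRG"]
scores = conservation_scores(toy_alignment)
```

A position would then be flagged when its observed score exceeds whatever value a structure-aware model predicts for it; that comparison is the part specific to the method described above.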

2.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces have substitution rates very different from those of residues in the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.
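
In a continuous time Markov model, a rate matrix Q yields substitution probabilities over time t through P(t) = exp(Qt). The sketch below illustrates only that relationship, assuming a made-up 3-state rate matrix rather than a 20-state amino acid matrix, and using a truncated Taylor series instead of a library routine. It does not reproduce the Bayesian MCMC estimation described in the abstract.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30):
    """Approximate exp(Q t) with a truncated Taylor series (adequate for small toy matrices)."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (Qt)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical rate matrix: off-diagonal entries are rates, each row sums to zero.
Q = [[-0.3, 0.2, 0.1],
     [0.1, -0.2, 0.1],
     [0.2, 0.2, -0.4]]

# P[i][j] is the probability that state i has become state j after time t.
P = mat_exp(Q, t=1.0)
```

Each row of P sums to one, which is a quick sanity check on any estimated rate matrix. Log-odds scoring matrices for alignment are typically built from such probabilities, although the exact construction used in the work above is not stated in the abstract.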

3.
Molecular principles of the interactions of disordered proteins   Cited by: 6 (self-citations: 0, citations by others: 6)
Thorough knowledge of the molecular principles of protein-protein recognition is essential to our understanding of protein function at the cellular level. Whereas interactions of ordered proteins have been analyzed in great detail, complexes of intrinsically unstructured/disordered proteins (IUPs) have hardly been addressed so far. Here, we have collected a database of 39 complexes of experimentally verified IUPs, and compared their interfaces with those of 72 complexes of ordered, globular proteins. The characteristic differences found between the two types of complexes suggest that IUPs represent a distinct molecular implementation of the principles of protein-protein recognition. The interfaces do not differ in size, but those of IUPs cover a much larger part of the surface of the protein than for their ordered counterparts. Moreover, IUP interfaces are significantly more hydrophobic, not only relative to their overall amino acid composition but also in absolute terms. They rely more on hydrophobic-hydrophobic than on polar-polar interactions. Their amino acids in the interface realize more intermolecular contacts, which suggests a better fit with the partner due to induced folding upon binding that results in a better adaptation to the partner. The two modes of interaction also differ in that IUPs usually use only a single continuous segment for partner binding, whereas the binding sites of ordered proteins are more segmented. Probably, all these features contribute to the increased evolutionary conservation of IUP interface residues. These noted molecular differences are also manifested in the interaction energies of IUPs. Our approximation of these by low-resolution force-fields shows that IUPs gain much more stabilization energy from intermolecular contacts than from folding, i.e. they use their binding energy for folding.
Overall, our findings provide a structural rationale for the prior suggestions that many IUPs are specialized for functions realized by protein-protein interactions.

4.
The globin family has long been known from studies of approximately 150-residue proteins such as vertebrate myoglobins and haemoglobins. Recently, this family has been enriched by the investigation of the sequences and structures of truncated globins, which have the same basic topology but are approximately 30 residues shorter and exhibit functions other than the familiar one of binding diatomic ligands. The divergence of protein sequences, structures and functions reveals Nature's exploration of the potential inherent in a folding pattern, that is, the topology of the native structure. The observation of what remains constant and what varies during the evolution of a protein family reveals essential features of structure and function. Study of proteins with a wide range of divergence can therefore sharpen our understanding of how different amino acid sequences can determine similar three-dimensional structures. Globins have provided, and continue to provide, interesting material for such studies.

5.
A structural database of 11 families of chains differing by a single amino acid substitution has been built. Another structural dataset of 5 families with identical sequences has been used for comparison. The RMSD computed after a global superimposition of the mutated protein on each native one is smaller than the RMSD calculated among proteins of identical sequences. The effect of the perturbation is very local, and not necessarily the highest at the position of the mutation. An RMSD between mutated and native proteins is computed over a 3‐residue or a 7‐residue window at each position. To separate the effects of structural fluctuations due to point mutations from other sources, pairwise RMSD values have been translated into P values, which are in turn combined into a score called P‐RANK. This score highlights small backbone distortions by comparing the RMSD between mutated and native positions to the RMSD at the same positions in the absence of a mutation. P‐RANK indicates that 38% of all mutations produce a significant effect on the displacement. When compared with a random distribution of RMSD at un‐mutated positions, we show that, even if the RMSD is greater when the mutation is in loops than in regular secondary structure, the relative effect is more important for regular secondary structures and for buried positions. We confirm the absence of correlation between RMSD and the predicted variation of free energy of folding, but we find a small correlation between high RMSD and the error in the prediction of ΔΔG.
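
The windowed RMSD described above can be sketched as follows. This is a minimal illustration under stated assumptions: both structures are taken to be already superimposed, one representative atom (for example C alpha) is used per residue, and the function name and toy coordinates are hypothetical. Whether the original work re-superimposes each window locally is not stated in the abstract, and the P value and P‐RANK steps are not reproduced.

```python
import math

Coord = tuple[float, float, float]


def window_rmsd(native: list[Coord], mutant: list[Coord], window: int = 3) -> list[float]:
    """RMSD over a sliding window of residues, one value per window centre.

    Assumes the two coordinate lists are equally long and already superimposed.
    """
    if len(native) != len(mutant):
        raise ValueError("structures must have the same number of residues")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Squared displacement of each residue between the two structures.
    sq_dist = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(native, mutant)]
    return [
        math.sqrt(sum(sq_dist[i - half:i + half + 1]) / window)
        for i in range(half, len(sq_dist) - half)
    ]


# Toy example: five residues on a line, with residue index 2 displaced by 1 Å in y.
native = [(3.8 * i, 0.0, 0.0) for i in range(5)]
mutant = [(x, 1.0 if i == 2 else 0.0, z) for i, (x, y, z) in enumerate(native)]
profile = window_rmsd(native, mutant, window=3)
```

With a 3-residue window, every window in this toy case contains the displaced residue, so the profile is flat; on real structures the profile shows where along the chain the backbone actually moves, which need not be the mutated position.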

6.
Prothrombin, plasminogen, urokinase- and tissue-type plasminogen activators contain homologous structures known as kringles. The kringles correspond to autonomous structural and folding domains which mediate the binding of these multidomain proteins to other proteins. During evolution the different kringles retained the same gross architecture, the kringle-fold, yet diverged to bind different proteins. We show that the amino acid sequences of the type II structures of the gelatin-binding region of fibronectin are homologous with those of the protease-kringles. Prediction of secondary structures revealed a remarkable agreement in the positions of predicted beta-sheets, suggesting that the folding of kringles and type II structures may also be similar. As a corollary of this finding, the disulphide-bridge pattern of type II structures is shown to be homologous to that in kringles. It is noteworthy that protease-kringles and fibronectin type II structures have similar functions inasmuch as they mediate the binding of multidomain proteins to other proteins. It is proposed that the kringles of proteases and type II structures of fibronectin evolved from a common ancestral protein binding module.

7.

Background

Protein surfaces comprise only a fraction of the total residues but are the most conserved functional features of proteins. Surfaces performing identical functions are found in proteins lacking any sequence or fold similarity. While biochemical activity can be attributed to a few key residues, the broader surrounding environment plays an equally important role.

Results

We describe a methodology that attempts to optimize two components, global shape and local physicochemical texture, for evaluating the similarity between a pair of surfaces. Surface shape similarity is assessed using a three-dimensional object recognition algorithm and physicochemical texture similarity is assessed through a spatial alignment of conserved residues between the surfaces. The comparisons are used in tandem to efficiently search the Global Protein Surface Survey (GPSS), a library of annotated surfaces derived from structures in the PDB, for studying evolutionary relationships and uncovering novel similarities between proteins.

Conclusion

We provide an assessment of our method using library retrieval experiments for identifying functionally homologous surfaces binding different ligands, functionally diverse surfaces binding the same ligand, and binding surfaces of ubiquitous and conformationally flexible ligands. Results using surface similarity to predict function for proteins of unknown function are reported. Additionally, an automated analysis of the ATP binding surface landscape is presented to provide insight into the correlation between surface similarity and function for structures in the PDB and for the subset of protein kinases.

8.
A number of biophysical and population-genetic processes influence amino acid substitution rates. It is commonly recognized that proteins must fold into a native structure with preference over an unfolded state, and must bind to functional interacting partners favourably to function properly. What is less clear is how important folding and binding specificity are to amino acid substitution rates. A hypothesis of the importance of binding specificity in constraining sequence and functional evolution is presented. Examples include an evolutionary simulation of a population of SH2 sequences evolved by threading through the structure and binding to a native ligand, as well as SH3 domain signalling in yeast and selection for specificity in enzymatic reactions. An example in vampire bats where negative pleiotropy appears to have been adaptive is presented. Finally, considerations of compartmentalization and macromolecular crowding on negative pleiotropy are discussed.

9.
The folding pathway, three-dimensional structure and intrinsic dynamics of proteins are governed by their amino acid sequences. Internal protein surfaces with physicochemical properties appropriate to modulate conformational fluctuations could play important roles in folding and dynamics. We show here that proteins contain buried interfaces of high polarity and low packing density, coined as LIPs: Light Interfaces of high Polarity, whose physicochemical properties make them unstable. The structures of well-characterized equilibrium and kinetic folding intermediates indicate that the LIPs of the corresponding native proteins fold late and are involved in local unfolding events. Importantly, LIPs can be identified using very fast and uncomplicated computational analysis of protein three-dimensional structures, which provides an easy way to delineate the protein segments involved in dynamics. Since LIPs can be retained while the sequences of the interacting segments diverge significantly, proteins could in principle evolve new functional features reusing pre-existing encoded dynamics. Large-scale identification of LIPs may contribute to understanding evolutionary constraints of proteins and the way protein intrinsic dynamics are encoded.

10.
The identification of protein biochemical functions based on their three-dimensional structures is in strong demand in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrostatic surface of functional-site in proteins). Using this system, functional sites similar to those of phosphoenolpyruvate carboxykinase were detected in several mononucleotide-binding proteins, which have different folds. We also applied our method to a hypothetical protein, MJ0226 from Methanococcus jannaschii, and detected the mononucleotide binding site from the similarity to other proteins having different folds.

11.
Huang JT, Tian J. Proteins, 2006, 63(3): 551-554
The significant correlation between protein folding rates and the sequence-predicted secondary structure suggests that folding rates are largely determined by the amino acid sequence. Here, we present a method for predicting the folding rates of proteins from sequences using the intrinsic properties of amino acids, which does not require any information on secondary structure prediction and structural topology. The contribution of each residue to the folding rate is expressed by the residue's Omega value. For a given residue, its Omega depends on the amino acid properties (amino acid rigidity and dislike of amino acid for secondary structures). Our investigation achieves 82% correlation with folding rates determined experimentally for the simple, two-state proteins studied to date, suggesting that the amino acid sequence of a protein is an important determinant of the protein-folding rate and mechanism.

12.
The amino acid composition of human alcohol dehydrogenase (ADH) was compared with alcohol dehydrogenases from different organisms and with other proteins. Similar amino acid sequences in human ADH (the template protein) and in other proteins were determined by means of an original computer program. Analysis of amino acid motifs reveals that ADHs from evolutionarily closer organisms share more amino acid sequences. The quantitative measure of amino acid similarity was the number of similar motifs in the analyzed protein per protein length. This value was measured for ADHs and for different proteins. For ADHs, this quotient was higher than for proteins with different functions; for vertebrates it correlated with evolutionary closeness. A similar motif comparison was made with the MEME program suite. The analysis of ADHs revealed 4 motifs common to 6 of 10 tested organisms and no such motifs for proteins of different function. The conclusion is that general amino acid composition is more important for protein function than amino acid order, and that for enzymes of similar function it correlates better with evolutionary distance between organisms.
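
The quotient "number of similar motifs per protein length" can be illustrated with a simple stand-in. The abstract does not say how the original program defines a similar motif, so the sketch below assumes exact matching of fixed-length windows (k = 4), and the function name and sequences are hypothetical.

```python
def shared_motif_density(template: str, protein: str, k: int = 4) -> float:
    """Count k-residue windows of `protein` that also occur in `template`, per residue of `protein`.

    Exact matching of fixed-length windows is an assumption made for this sketch.
    """
    if not protein:
        raise ValueError("protein sequence must not be empty")
    template_motifs = {template[i:i + k] for i in range(len(template) - k + 1)}
    hits = sum(
        1
        for i in range(len(protein) - k + 1)
        if protein[i:i + k] in template_motifs
    )
    return hits / len(protein)


# Hypothetical toy sequences: only the window "TAYI" is shared.
density = shared_motif_density("MKTAYIAK", "TAYIQQ", k=4)
```

Dividing by length keeps long proteins from scoring higher simply because they contain more windows, which is presumably why the measure is expressed as a quotient.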

13.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information provides and complements sequence information to deduce the functional inferences of proteins with unknown molecular functions.
Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.

14.
15.
Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids as well as the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA-binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to another, published method that uses all-atom structures, time-consuming calculations and information on conserved residues.
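
The decision rule described above, a linear formula with logistic-regression coefficients compared against a threshold, can be sketched as follows. The feature names, weights, bias and threshold are all placeholders invented for illustration; the abstract does not list the actual amino acids, coefficients or cutoff.

```python
import math


def dna_binding_probability(features: dict[str, float],
                            weights: dict[str, float],
                            bias: float) -> float:
    """Logistic model: sigmoid of a weighted sum of the features plus a bias."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def predict_dna_binding(features: dict[str, float],
                        weights: dict[str, float],
                        bias: float,
                        threshold: float = 0.5) -> bool:
    """Predict binding when the modelled probability exceeds the threshold."""
    return dna_binding_probability(features, weights, bias) > threshold


# Placeholder coefficients, NOT the published ones.
weights = {"aa_proportion": 4.0, "spatial_asymmetry": 1.5, "dipole_moment": 0.002}
bias = -2.0

# Placeholder feature values for one hypothetical C alpha-only model.
features = {"aa_proportion": 0.25, "spatial_asymmetry": 0.6, "dipole_moment": 400.0}
is_binder = predict_dna_binding(features, weights, bias)
```

Because the score is a weighted sum of coarse quantities, small coordinate errors shift it only slightly, which is consistent with the robustness to inaccurate models reported in the abstract.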

16.
All eukaryotic cellular mRNAs contain a 5' m(7)GpppN cap. In addition to conferring stability to the mRNA, the cap is required for pre-mRNA splicing, nuclear export and translation by providing an anchor point for protein binding. In translation, the interaction between the cap and the eukaryotic initiation factor 4E (eIF4E) is important in the recruitment of the mRNAs to the ribosome. Human 4EHP (h4EHP) is a homologue of eIF4E. Like eIF4E it is able to bind the cap but it appears to play a different cellular role, possibly being involved in the fine-tuning of protein expression levels. Here we use X-ray crystallography and isothermal titration calorimetry (ITC) to investigate further the binding of cap analogues and peptides to h4EHP. m(7)GTP binds to 4EHP 200-fold more weakly than it does to eIF4E with the guanine base sandwiched by a tyrosine and a tryptophan instead of two tryptophan residues as seen in eIF4E. The tyrosine resides on a loop that is longer in h4EHP than in eIF4E. The consequent conformational difference between the proteins allows the tyrosine to mimic the six-membered ring of the tryptophan in eIF4E and adopt an orientation that is similar to that seen for equivalent residues in other non-homologous cap-binding proteins. In the absence of ligand the binding site is incompletely formed with one of the aromatic residues being disordered and the side-chain of the other adopting a novel conformation. A peptide derived from the eIF4E inhibitory protein, 4E-BP1, binds h4EHP 100-fold less strongly than eIF4E but in a similar manner. Overall the data, combined with sequence analyses of 4EHP from evolutionarily diverse species, strongly support the hypothesis that 4EHP plays a physiological role utilizing both cap-binding and protein-binding functions but which is distinct from eIF4E.

17.
Many proteins function by interacting with other small molecules (ligands). Identification of ligand‐binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand‐binding protein sequences and functions. Consequently, we classified the patches into ~2000 well‐characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross‐fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.

18.
The arylamine N-acetyltransferases (NATs) are important xenobiotic-metabolizing enzymes that catalyze an acetyl group transfer from acetyl-CoA to arylamine substrates. NAT enzymes possess an active-site loop (the active-site P-loop) involved in substrate binding and selectivity. The Gly/Ala residue present at the start of the active-site P-loop, although conserved in all NAT enzymes, is not involved in the catalytic mechanism or substrate binding. Here we show that a small amino acid (such as Gly or Ala) at this position is important not only for maintaining the functions of the active-site P-loop but, more surprisingly, also for maintaining the overall structural integrity of NAT enzymes. Our data thus suggest that in addition to its role in substrate binding and selectivity, the active-site P-loop could play a wider structural role in NAT enzymes.

19.
Detecting similarities between local binding surfaces can facilitate identification of enzyme binding sites and prediction of enzyme functions, and aid in our understanding of enzyme mechanisms. Constructing a template of local surface characteristics for a specific enzyme function or binding activity is a challenging task, as the size and shape of the binding surfaces of a biochemical function often vary. Here we introduce the concept of signature binding pockets, which captures information on preserved and varied atomic positions at multiresolution levels. For proteins with complex enzyme binding and activity, multiple signatures arise naturally in our model, forming a signature basis set that characterizes this class of proteins. Both signatures and signature basis sets can be automatically constructed by a method called SOLAR (Signature Of Local Active Regions). This method is based on a sequence-order-independent alignment of computed binding surface pockets. SOLAR also provides a structure-based multiple sequence fragment alignment to facilitate the interpretation of computed signatures. By studying a family of evolutionarily related proteins, we show that for metzincin metalloendopeptidase, which has a broad spectrum of substrate binding, signature and basis set pockets can be used to discriminate metzincins from other enzymes, to predict the subclass of metzincin functions, and to identify specific binding surfaces. Studying unrelated proteins that have evolved to bind to the same NAD cofactor, we constructed signatures of NAD binding pockets and used them to predict NAD binding proteins and to locate NAD binding pockets. By measuring preservation ratio and location variation, our method can identify residues and atoms that are important for binding affinity and specificity. In both cases, we show that signatures and signature basis sets reveal significant biological insight.

20.
Phylogenetic profiling of amino acid substitution patterns in proteins has led many to conclude that most structural information is carried by interior core residues that are solvent inaccessible. This conclusion is based on the observation that buried residues generally tolerate only conserved sequence changes, while surface residues allow more diverse chemical substitutions. This notion is now changing as it has become apparent that both core and surface residues play important roles in protein folding and stability. Unfortunately, the ability to identify specific mutations that will lead to enhanced stability remains a challenging problem. Here we discuss two mutations that emerged from an in vitro selection experiment designed to improve the folding stability of a non-biological ATP binding protein. These mutations alter two solvent accessible residues, and dramatically enhance the expression, solubility, thermal stability, and ligand binding affinity of the protein. The significance of both mutations was investigated individually and together, and the X-ray crystal structures of the parent sequence and double mutant protein were solved to a resolution limit of 2.8 and 1.65 Å, respectively. Comparative structural analysis of the evolved protein to proteins found in nature reveals that our non-biological protein evolved certain structural features shared by many thermophilic proteins. This experimental result suggests that protein fold optimization by in vitro selection offers a viable approach to generating stable variants of many naturally occurring proteins whose structures and functions are otherwise difficult to study.
