Similar articles
 20 similar articles found; search took 562 ms
1.
Protein secondary structure: entropy, correlations and prediction   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Is protein secondary structure primarily determined by local interactions between residues closely spaced along the amino acid backbone or by non-local tertiary interactions? To answer this question, we measure the entropy densities of primary and secondary structure sequences, and the local inter-sequence mutual information density. RESULTS: We find that the important inter-sequence interactions are short ranged, that correlations between neighboring amino acids are essentially uninformative and that only one-fourth of the total information needed to determine the secondary structure is available from local inter-sequence correlations. These observations support the view that the majority of most proteins fold via a cooperative process where secondary and tertiary structure form concurrently. Moreover, existing single-sequence secondary structure prediction algorithms are almost optimal, and we should not expect a dramatic improvement in prediction accuracy. AVAILABILITY: Both the data sets and analysis code are freely available from our Web site at http://compbio.berkeley.edu/
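
The quantities named in this abstract can be illustrated with a minimal sketch: the Shannon entropy of a symbol sequence and the mutual information between aligned residue and secondary-structure symbols. The toy sequences and the function names are assumptions for illustration only, not the authors' data or code, and the sketch works on single sites rather than the entropy densities over longer blocks that the paper measures.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned and of equal length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical toy data: residues and their secondary-structure states (H or C).
residues = "AAGGAAGG"
states = "HHCCHHCC"
mi = mutual_information(residues, states)  # residue fully determines the state here
```

In this toy case the residue identity fixes the state, so the mutual information equals the full one bit of state entropy; the abstract's finding is that real proteins recover only about a quarter of the needed information this way.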

2.
Inter-residue interactions in protein folding and stability   Total citations: 6, self-citations: 0, citations by others: 6
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. The knowledge about inter-residue interactions in protein structures is very helpful to understand the mechanism of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein-folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein-folding kinetics and for understanding the stability of proteins has been discussed. In essence, the information gained from the studies on inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.

3.
A statistical analysis of known structures is made for an assessment of the utility of short-range energy considerations. For each type of amino acid, the potentials governing (1) the torsions and bond angle changes of virtual Cα-Cα bonds and (2) the coupling between torsion and bond angle changes are derived. These contribute approximately −2 RT per residue to the stability of native proteins, approximately half of which is due to coupling effects. The torsional potentials for the α-helical states of different residues are verified to be strongly correlated with the free-energy change measurements made upon single-site mutations at solvent-exposed regions. Likewise, a satisfactory correlation is shown between the β-sheet potentials of different amino acids and the scales from free-energy measurements, despite the role of tertiary context in stabilizing β-sheets. Furthermore, there is excellent agreement between our residue-specific potentials for the α-helical state and other thermodynamics-based scales. Threading experiments performed by using an inverse folding protocol show that 50 of 62 test structures correctly recognize their native sequence on the basis of short-range potentials. The performance is improved to 55 upon simultaneous consideration of short-range potentials and the nonbonded interaction potentials between sequentially distant residues. Interactions between near residues along the primary structure, i.e., the local or short-range interactions, are known to be insufficient, alone, for understanding the tertiary structural preferences of proteins. Yet, knowledge of short-range conformational potentials permits rationalizing the secondary structure propensities and aids in the discrimination between correct and incorrect tertiary folds. Proteins 29:292–308, 1997. © 1997 Wiley-Liss, Inc.

4.
The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography. For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information. © 1994 John Wiley & Sons, Inc.
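
The idea of scoring correlated mutations between alignment columns can be sketched roughly as follows. This is a simplified illustration, not the published method: it assumes an ungapped toy alignment and scores each pair of sequences at a column by plain identity (1 or 0), where a real analysis would more likely use a residue similarity matrix. All names and data are hypothetical.

```python
import math
from itertools import combinations

def column_similarity(alignment, col):
    """Identity score (1.0 or 0.0) for every pair of sequences at one column."""
    return [1.0 if a[col] == b[col] else 0.0
            for a, b in combinations(alignment, 2)]

def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def correlated_pairs(alignment):
    """Rank column pairs by how similarly they vary across the family."""
    length = len(alignment[0])
    sims = [column_similarity(alignment, i) for i in range(length)]
    scores = [(pearson(sims[i], sims[j]), i, j)
              for i, j in combinations(range(length), 2)]
    return sorted(scores, reverse=True)

# Hypothetical toy alignment: columns 0 and 2 mutate together, column 1 is fixed.
toy = ["AKD", "AKD", "GKE", "GKE"]
ranked = correlated_pairs(toy)  # the top-ranked pair is the contact candidate
```

Fully conserved columns carry no covariation signal, which is why the sketch assigns them a correlation of zero rather than dividing by a zero variance.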

5.
S Miyazawa  R L Jernigan 《Proteins》1999,36(3):347-356
Short-range interactions for secondary structures of proteins are evaluated as potentials of mean force from the observed frequencies of secondary structures in known protein structures which are assumed to have an equilibrium distribution with the Boltzmann factor of secondary structure energies. A secondary conformation at each residue position in a protein is described by a tripeptide, including one nearest neighbor on each side. The secondary structure potentials are approximated as additive contributions from neighboring residues along the sequence. These are part of an empirical potential to provide a crude estimate of protein conformational energy at a residue level. Unlike previous works, interactions are decoupled into intrinsic potentials of residues, potentials of backbone-backbone interactions, and of side chain-backbone interactions. Also interactions are decoupled into one-body, two-body, and higher order interactions between peptide backbone and side chain and between backbones. These decouplings are essential to correctly evaluate the total secondary structure energy of a protein structure without overcounting interactions. Each interaction potential is evaluated separately by taking account of the correlation in the amino acid order of protein sequences. Interactions among side chains are neglected, because of the relatively limited number of protein structures. Proteins 1999;36:347-356. Published 1999 Wiley-Liss, Inc.
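
The general principle behind a potential of mean force, inverting the Boltzmann relation on observed frequencies, can be sketched as below. This is a deliberately reduced illustration: it uses a single residue instead of the tripeptide description in the abstract, ignores the decoupling into one-body and two-body terms, and runs on made-up counts. The function name and the data are assumptions.

```python
import math
from collections import Counter

def potentials_of_mean_force(observations):
    """Return E(residue, state) = -ln(P(state | residue) / P(state)), in units of RT."""
    pair_counts = Counter(observations)
    residue_counts = Counter(res for res, _ in observations)
    state_counts = Counter(state for _, state in observations)
    total = len(observations)
    energies = {}
    for (res, state), n in pair_counts.items():
        p_state_given_res = n / residue_counts[res]
        p_state = state_counts[state] / total
        energies[(res, state)] = -math.log(p_state_given_res / p_state)
    return energies

# Hypothetical counts: Ala mostly helical (H), Gly mostly coil (C).
obs = [("A", "H")] * 3 + [("A", "C")] + [("G", "H")] + [("G", "C")] * 3
energies = potentials_of_mean_force(obs)
```

A negative value marks a residue-state combination seen more often than the background frequency would suggest, and a positive value marks one seen less often.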

6.
Structural uniqueness is characteristic of native proteins and is essential to express their biological functions. The major factors that bring about the uniqueness are specific interactions between hydrophobic residues and their unique packing in the protein core. To find the origin of the uniqueness in their amino acid sequences, we analyzed the distribution of the side chain rotational isomers (rotamers) of hydrophobic amino acids in protein tertiary structures and derived deltaS(contact), the conformational-entropy changes of side chains by residue-residue contacts in each secondary structure. The deltaS(contact) values indicate distinct tendencies of the residue pairs to restrict side chain conformation by inter-residue contacts. Of the hydrophobic residues in alpha-helices, aliphatic residues (Leu, Val, Ile) strongly restrict the side chain conformations of each other. In beta-sheets, Met is most strongly restricted by contact with Ile, whereas Leu, Val and Ile are less affected by other residues in contact than those in alpha-helices. In designed and native protein variants, deltaS(contact) was found to correlate with the folding-unfolding cooperativity. Thus, it can be used as a specificity parameter for designing artificial proteins with a unique structure.

7.
Protein contacts, inter-residue interactions and side-chain modelling   Total citations: 1, self-citations: 0, citations by others: 1
Faure G  Bornot A  de Brevern AG 《Biochimie》2008,90(4):626-639
Three-dimensional structures of proteins are the support of their biological functions. Their folds are stabilized by contacts between residues. Inner protein contacts are generally described through direct atomic contacts, i.e. interactions between side-chain atoms, while contact prediction methods mainly used inter-Calpha distances. In this paper, we have analyzed the protein contacts on a recent high quality non-redundant databank using different criteria. First, we have studied the average number of contacts depending on the distance threshold to define a contact. Preferential contacts between types of amino acids have been highlighted. Detailed analyses have been done concerning the proximity of contacts in the sequence, the size of the proteins and fold classes. The strongest differences have been extracted, highlighting important residues. Then, we studied the influence of five different side-chain conformation prediction methods (SCWRL, IRECS, SCAP, SCATD and SCCOMP) on the distribution of contacts. The prediction rates of these different methods are quite similar. However, using a distance criterion between side chains, the results are quite different, e.g. SCAP predicts 50% more contacts than observed, unlike other methods that predict fewer contacts than observed. Contacts deduced are quite distinct from one method to another with at most 75% contacts in common. Moreover, distributions of amino acid preferential contacts present unexpected behaviours distinct from those previously observed in the X-ray structures, especially at the surface of proteins. For instance, the interactions involving Tryptophan greatly decrease.

8.
Protein secondary structure predictions and amino acid long range contact map predictions from primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D-residue contact prediction methods to model protein structures we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information as restraints for MODELLER. We present here results of our modelling studies on 30 best resolved single domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues are of RMSDs 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for the proteins which have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template where it derives numerous restraints, almost identical to the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, up to 30% of their sequence length with fully known secondary structural information, we end up predicting model structures very distant from their corresponding native structures.

9.
Protein secondary structure predictions and amino acid long range contact map predictions from primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D-residue contact prediction methods to model protein structures we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information as restraints for MODELLER. We present here results of our modelling studies on 30 best resolved single domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues are of RMSDs 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for the proteins which have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template where it derives numerous restraints, almost identical to the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, up to 30% of their sequence length with fully known secondary structural information, we end up predicting model structures very distant from their corresponding native structures.

10.
Hyun Joo  Jerry Tsai 《Proteins》2014,82(9):2128-2140
To understand the relationship between protein sequence and structure, this work extends the knob‐socket model in an investigation of β‐sheet packing. Over a comprehensive set of β‐sheet folds, the contacts between residues were used to identify packing cliques: sets of residues that all contact each other. These packing cliques were then classified based on size and contact order. From this analysis, the two types of four‐residue packing cliques necessary to describe β‐sheet packing were characterized. Both occur between two adjacent hydrogen bonded β‐strands. First, defining the secondary structure packing within β‐sheets, the combined socket or XY:HG pocket consists of four residues i, i+2 on one strand and j, j+2 on the other. Second, characterizing the tertiary packing between β‐sheets, the knob‐socket XY:H+B consists of a three‐residue XY:H socket (i, i+2 on one strand and j on the other) packed against a knob B residue (residue k distant in sequence). Depending on the packing depth of the knob B residue, two types of knob‐sockets are found: side‐chain and main‐chain sockets. The amino acid composition of the pockets and knob‐sockets reveal the sequence specificity of β‐sheet packing. For β‐sheet formation, the XY:HG pocket clearly shows sequence specificity of amino acids. For tertiary packing, the XY:H+B side‐chain and main‐chain sockets exhibit distinct amino acid preferences at each position. These relationships define an amino acid code for β‐sheet structure and provide an intuitive topological mapping of β‐sheet packing. Proteins 2014; 82:2128–2140. © 2014 Wiley Periodicals, Inc.
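
A packing clique as defined in this abstract (a set of residues that all contact each other) corresponds to a clique in the residue contact graph. The sketch below enumerates maximal cliques with the basic Bron-Kerbosch algorithm, a standard graph technique chosen here for illustration; the abstract does not say how the authors enumerated cliques. The residue numbers in the toy contact list are hypothetical.

```python
def maximal_cliques(contacts):
    """Enumerate maximal cliques of a contact graph with basic Bron-Kerbosch."""
    neighbors = {}
    for a, b in contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    cliques = []

    def expand(clique, candidates, excluded):
        # A clique is maximal when no vertex is left to extend or to exclude.
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(neighbors), set())
    return sorted(cliques)

# Hypothetical contacts: residues 1 and 3 on one strand, 10 and 12 on the
# neighbouring strand, plus a sequence-distant residue 20 packing on 1, 3, 10.
contacts = [(1, 3), (1, 10), (1, 12), (3, 10), (3, 12), (10, 12),
            (20, 1), (20, 3), (20, 10)]
cliques = maximal_cliques(contacts)
```

In this toy graph the first clique has the i, i+2, j, j+2 shape of the pocket described above, and the second has the shape of a three-residue socket packed against a distant knob.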

11.
A survey was compiled of several characteristics of the intersubunit contacts in 58 oligomeric proteins, and of the intermolecular contacts in the lattice for 223 protein crystal structures. The total number of atoms in contact and the secondary structure elements involved are similar in the two types of interfaces. Crystal contact patches are frequently smaller than patches involved in oligomer interfaces. Crystal contacts result from more numerous interactions by polar residues, compared with a tendency toward nonpolar amino acids at oligomer interfaces. Arginine is the only amino acid prominent in both types of interfaces. Potentials of mean force for residue–residue contacts at both crystal and oligomer interfaces were derived from comparison of the number of observed residue–residue interactions with the number expected by mass action. They show that hydrophobic interactions at oligomer interfaces favor aromatic amino acids and methionine over aliphatic amino acids; and that crystal contacts form in such a way as to avoid inclusion of hydrophobic interactions. They also suggest that complex salt bridges with certain amino acid compositions might be important in oligomer formation. For a protein that is recalcitrant to crystallization, substitution of lysine residues with arginine or glutamine is a recommended strategy. Proteins 28:494–514, 1997. © 1997 Wiley-Liss, Inc.

12.
Structural and functional relations among thioredoxins of different species   Total citations: 24, self-citations: 0, citations by others: 24
Three-dimensional models have been constructed of homologous thioredoxins and protein disulfide isomerases based on the high resolution x-ray crystallographic structure of the oxidized form of Escherichia coli thioredoxin. The thioredoxins, from archebacteria to humans, have 27-69% sequence identity to E. coli thioredoxin. The models indicate that all the proteins have similar three-dimensional structures despite the large variation in amino acid sequences. As expected, residues in the active site region of thioredoxins are highly conserved. These include Asp-26, Ala-29, Trp-31, Cys-32, Gly-33, Pro-34, Cys-35, Asp-61, Pro-76, and Gly-92. Similar residues occur in most protein disulfide isomerase sequences. Most of these residues form the surface around the active site that appears to facilitate interactions with other enzymes. Other structurally important residues are also conserved. A proline at position 40 causes a kink in the alpha-2 helix and thus provides the proper position of the active site residues at the amino end of this helix. Pro-76 is important in maintaining the native structure of the molecule. In addition, residues forming the internal contact surfaces between the secondary structural elements are generally unchanged such as Phe-12, Val-25, and Phe-27.

13.

Background  

Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence.
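
The definition of contact number given above translates directly into a short computation on Cβ coordinates. The sketch below is a minimal illustration: the abstract does not state the sphere radius, so the 10.0 default is an assumption, as are the function name and the toy coordinates.

```python
import math

def contact_numbers(cb_coords, radius=10.0):
    """Count, for each residue, the other residues whose C-beta lies within `radius`.

    `radius` is an assumed default; the source text does not give a value.
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = sum(1 for j, b in enumerate(cb_coords)
                if j != i and math.dist(a, b) <= radius)
        counts.append(n)
    return counts

# Hypothetical C-beta positions: three residues in a row and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
numbers = contact_numbers(coords, radius=5.0)
```

The all-pairs loop is quadratic in chain length, which is fine for a single protein; a spatial grid or k-d tree would be the usual choice for larger systems.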

14.
Shestopalov BV 《Tsitologiia》2003,45(7):702-706
The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence: the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The bases of the theory are: 1) the name secondary structure is assigned to the conformation, stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i + 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlaps of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required, the codon selection is hierarchic; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data.
Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used for simulating the next step of protein folding; b) one can propose that the same secondary structures can be folded into different tertiary structures and, vice versa, different secondary structures can be folded into the same tertiary structures, provided codon distributions are considered also; c) codons can be considered as first elements of protein three-dimensional structure language.
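
Point 3 of the theory, that helices, strands and coil are encoded by residue pairs at sequence offsets 4, 2 and 1, can be illustrated by listing the candidate pairs in a sequence. The sketch below only enumerates the pairs; the 21 codon strength classes and the hierarchic selection rules are not specified in the abstract and are not modelled. The function name, the dictionary keys and the example sequence are hypothetical.

```python
# Sequence offsets taken from the abstract: (i, i+4), (i, i+2), (i, i+1).
OFFSETS = {"alpha-helix": 4, "beta-strand": 2, "coil": 1}

def candidate_codons(sequence):
    """List, per structure type, every (position, residue pair) at its offset."""
    codons = {}
    for structure, k in OFFSETS.items():
        codons[structure] = [(i, sequence[i] + sequence[i + k])
                             for i in range(len(sequence) - k)]
    return codons

# Hypothetical six-residue peptide.
pairs = candidate_codons("ACDEFG")
```

Every position takes part in pairs of all three types, which is why the theory needs a selection step to resolve overlaps between structurons of different structures.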

15.
Prediction of protein residue contacts with a PDB-derived likelihood matrix   Total citations: 8, self-citations: 0, citations by others: 8
Proteins with similar folds often display common patterns of residue variability. A widely discussed question is how these patterns can be identified and deconvoluted to predict protein structure. In this respect, correlated mutation analysis (CMA) has shown considerable promise. CMA compares multiple members of a protein family and detects residues that remain constant or mutate in tandem. Often this behavior points to structural or functional interdependence between residues. CMA has been used to predict pairs of amino acids that are distant in the primary sequence but likely to form close contacts in the native three-dimensional structure. Until now these methods have used evolutionary or biophysical models to score the fit between residues. We wished to test whether empirical methods, derived from known protein structures, would provide useful predictive power for CMA. We analyzed 672 known protein structures, derived contact likelihood scores for all possible amino acid pairs, and used these scores to predict contacts. We then tested the method on 118 different protein families for which structures have been solved to atomic resolution. The mean performance was almost seven times better than random prediction. Used in concert with secondary structure prediction, the new CMA method could supply restraints for predicting still undetermined structures.

16.
S Miyazawa  R L Jernigan 《Proteins》1999,36(3):357-369
We consider modifications of an empirical energy potential for fold and sequence recognition to represent approximately the stabilities of proteins in various environments. A potential used here includes a secondary structure potential representing short-range interactions for secondary structures of proteins, and a tertiary structure potential consisting of a long-range, pairwise contact potential and a repulsive packing potential. This potential is devised to evaluate together the total conformational energy of a protein at the coarse grained residue level. It was previously estimated from the observed frequencies of secondary structures, from contact frequencies between residues, and from the distributions of the number of residues in contact in known protein structures by regarding those distributions as the equilibrium distributions with the Boltzmann factor of these interaction energies. The stability of native structures is assumed as a primary requirement for proteins to fold into their native structures. A collapse energy is subtracted from the contact energies to remove the protein size dependence and to represent protein stabilities for monomeric and multimeric states. The free energy of the whole ensemble of protein conformations that is subtracted from the conformational energy to represent protein stability is approximated as the average energy expected for a typical native structure with the same amino acid composition. This term may be constant in fold recognition but essentially varies in sequence recognition. A simple test of threading sequences into structures without gaps is employed to demonstrate the importance of the present modifications that permit the same potential to be utilized for both fold and sequence recognition. Proteins 1999;36:357-369. Published 1999 Wiley-Liss, Inc.

17.
Using the data from the Protein Data Bank, the correlations of primary and secondary structures of proteins were analyzed. The correlation values of the amino acids and the eight secondary structure types were calculated, where the position of the amino acid and the position in sequence with the particular secondary structure differ by at most 25. The diagrams describing these results indicate that correlations are significant at distances between −9 and 10. The results show that the substituents on the Cβ or Cγ atoms of an amino acid play a major role in its preference for a particular secondary structure at the same position in the sequence, while the polarity of the amino acid has a significant influence on α-helices and strands at some distance in the sequence. The diagrams corresponding to polar amino acids are noticeably asymmetric. The diagrams point out the exchangeability of residues in the proteins; the amino acids with similar diagrams have similar local folding requirements.

18.
Assuming that the protein primary sequence contains all information required to fold a protein into its native tertiary structure, we propose a new computational approach to protein folding by distributing the total energy of the macromolecular system along the torsional axes. We further derive a new semiempirical equation to calculate the total energy of a macromolecular system including its free energy of solvation. The energy of solvation makes an important contribution to the stability of biological structures. The segregation of hydrophilic and hydrophobic domains is essential for the formation of micelles, lipid bilayers, and biological membranes, and it is also important for protein folding. The free energy of solvation consists of two components: one derived from interactions between the atoms of the protein, and the second resulting from interactions between the protein and the solvent. The latter component is expressed as a function of the fractional area of protein atoms accessible to the solvent. The protein-folding procedure described in this article consists of two successive steps: a theoretical transition from an ideal α helix to an ideal β sheet is first imposed on the protein conformation, in order to calculate an initial secondary structure. The most stable secondary structure is built from a combination of the lowest energy structures calculated for each amino acid during this transition. An angular molecular dynamics step is then applied to this secondary structure. In this computational step, the total energy of the system consisting of the sum of the torsional energy, the van der Waals energy, the electrostatic energy, and the solvation energy is minimized.
This process yields 3-D structures of minimal total energy that are considered to be the most probable native-like structures for the protein. This method therefore requires no prior hypothesis about either the secondary or the tertiary structure of the protein and restricts the input of data to its sequence. The validity of the results is tested by comparing the crystalline and computed structures of four proteins, i.e., the avian and bovine pancreatic polypeptide (36 residues each), uteroglobin (70 residues), and the calcium-binding protein (75 residues); the Cα-Cα maps show significant homologies and the position of secondary structure domains; that of the α helices is particularly close.

19.
Statistical analyses of genome sequence‐derived protein sequence data can identify amino acid residues that interact between proteins or between domains of a protein. These statistical methods are based on evolution‐directed amino acid variation responding to structural and functional constraints in proteins. The identified residues form a basis for determining structure and folding of proteins as well as inferring mechanisms of protein function. When applied to two‐component systems, several research groups have shown they can be used to identify the amino acid interactions between response regulators and histidine kinases and the specificity therein. Recently, statistical studies between the HisKA and HATPase‐ATP‐binding domains of histidine kinases identified amino acid interactions for both the inactive and the active catalytic states of such kinases. The identified interactions generated a model structure for the domain conformation of the active state. This conformation requires an unwinding of a portion of the C‐terminal helix of the HisKA domain that destroys the inactive state residue contacts and suggests how signal‐binding determines the equilibrium between the inactive and active states of histidine kinases. The rapidly accumulating protein sequence databases from genome, metagenome and microbiome studies are an important resource for functional and structural understanding of proteins and protein complexes in microbes.

20.
Dupuis F  Sadoc JF  Mornon JP 《Proteins》2004,55(3):519-528
We present a new automatic algorithm, named VoTAP (Voronoï Tessellation Assignment Procedure), which assigns secondary structures of a polypeptide chain using the list of α‐carbon coordinates. This program uses three‐dimensional Voronoï tessellation. This geometrical tool associates with each amino acid a Voronoï polyhedron, the faces of which unambiguously define contacts between residues. Thanks to the face area, for the contacts close together along the primary structure (low‐order contacts) a distinction is made between strong and normal ones. This new definition yields new contact matrices, which are analyzed and used to assign secondary structures. This assignment is performed in two stages. The first one uses contacts between residues close together along the primary structure and is based on data collected on a bank of 282 well‐refined nonredundant structures. In this bank, associations were made between the prints defined by these low‐order contacts and the assignments performed by different automatic methods. The second step focuses on the strand assignment and uses contacts between distant residues. Comparisons with several other automatic assignment methods are presented, and the influence of resolution on the assignment is investigated. Proteins 2004. © 2004 Wiley‐Liss, Inc.
