Similar literature
 20 similar articles found (search time: 31 ms)
1.
Solis AD  Rackovsky S 《Proteins》2008,71(3):1071-1087
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
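
The link between log-odds contact scores and relative entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes a score of the form s = -ln(p/q), where p is the contact-type frequency in native structures and q the frequency in a reference state; the function names and the toy frequencies are hypothetical, and the paper's exact definition of "total divergence" may differ from the plain Kullback-Leibler divergence used here.

```python
import math


def log_odds_scores(p_native, q_reference):
    """Return s[k] = -ln(p[k] / q[k]) for every contact type k."""
    return {k: -math.log(p_native[k] / q_reference[k]) for k in p_native}


def kl_divergence(p_native, q_reference):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(
        p * math.log(p / q_reference[k])
        for k, p in p_native.items()
        if p > 0
    )


# Toy contact-type frequencies (illustrative values, not real data).
p = {"HH": 0.5, "HP": 0.3, "PP": 0.2}    # observed in native structures
q = {"HH": 0.25, "HP": 0.5, "PP": 0.25}  # assumed reference state

scores = log_odds_scores(p, q)
# Average score over native contacts equals minus the divergence.
mean_native_score = sum(p[k] * scores[k] for k in p)
divergence = kl_divergence(p, q)
```

Under these assumptions the mean score of native contacts is exactly the negative of D(p || q), so a potential whose native and reference distributions diverge more assigns, on average, lower scores to native contacts.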

2.
Chao Zhang 《Proteins》1998,31(3):299-308
In this study, we exploited an elementary 2-dimensional square lattice model of HP polymers to test the premise of extracting contact energies from protein structures. Given a set of prespecified energies for H–H, H–P, and P–P contacts, all possible sequences of various lengths were exhaustively enumerated to find sequences that have unique lowest-energy conformations. The lowest-energy structures (or native structures) of such (native) sequences were used to extract contact energies using the Miyazawa-Jernigan procedure and a reference state defined here. The relative magnitudes of the original energies were restored reasonably well, but the extracted contact energies were independent of the absolute magnitudes of the initial energies. We turned to a more detailed characterization of the energy landscapes of the native sequences in light of a new theoretical framework on protein folding. Foldability of such sequences imposes two limits on the absolute value of the prespecified energies: a lower bound entailed by the minimum requirement for thermodynamic stability and an upper bound associated with the entrapment of the chain in local minima. We found that these two limits confine the prespecified energy values to a rather narrow range which, surprisingly, also contains the extracted energies in all the cases examined. These results indicate that the quasi-chemical approximation can be used to connect quantitatively the occurrence of various residue–residue contacts in an ensemble of native structures with the energies of the contacts. More importantly, they suggest that the extracted contact energies do contain information on structural stability and can be used to estimate actual structural energetics. This study also encourages the use of structure-derived contact energies in threading. The finding that there is a rather narrow range of energies that are optimal for folding a sequence also cautions against the use of arbitrary energy Hamiltonians in minimal folding models.
Proteins 31:299–308, 1998. © 1998 Wiley-Liss, Inc.
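
The idea of extracting contact energies from observed contact counts can be sketched as follows. This is a simplified random-mixing (quasi-chemical style) estimate, e = -ln(N_observed / N_expected); it is not the full Miyazawa-Jernigan procedure nor the reference state defined in the paper, and the function name and toy counts are hypothetical.

```python
import math
from collections import Counter


def quasi_chemical_energies(contacts):
    """Estimate contact energies (in units of kT) from observed contacts.

    contacts: iterable of (type_a, type_b) pairs seen in native structures.
    Expected counts assume random pairing of contact "ends".
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total_pairs = sum(pair_counts.values())

    # How often each monomer type takes part in a contact.
    end_counts = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n
    total_ends = 2 * total_pairs

    energies = {}
    for (a, b), n in pair_counts.items():
        xa = end_counts[a] / total_ends
        xb = end_counts[b] / total_ends
        # Unlike pairs can form in two orders, hence the factor of 2.
        expected = (xa * xb if a == b else 2 * xa * xb) * total_pairs
        energies[(a, b)] = -math.log(n / expected)
    return energies


# Toy data: 60 H-H, 30 H-P and 10 P-P contacts (illustrative only).
contacts = [("H", "H")] * 60 + [("H", "P")] * 30 + [("P", "P")] * 10
energies = quasi_chemical_energies(contacts)
```

Contact types that occur more often than random mixing predicts receive negative (favourable) energies, and under-represented types receive positive ones. Only relative frequencies enter, which is consistent with the abstract's remark that extracted energies do not depend on the absolute magnitude of the input energies.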

3.
Many protein architectures exhibit evidence of internal rotational symmetry postulated to be the result of gene duplication/fusion events involving a primordial polypeptide motif. A common feature of such structures is a domain‐swapped arrangement at the interface of the N‐ and C‐termini motifs and postulated to provide cooperative interactions that promote folding and stability. De novo designed symmetric protein architectures have demonstrated an ability to accommodate circular permutation of the N‐ and C‐termini in the overall architecture; however, the folding requirement of the primordial motif is poorly understood, and tolerance to circular permutation is essentially unknown. The β‐trefoil protein fold is a threefold‐symmetric architecture where the repeating ~42‐mer “trefoil‐fold” motif assembles via a domain‐swapped arrangement. The trefoil‐fold structure in isolation exposes considerable hydrophobic area that is otherwise buried in the intact β‐trefoil trimeric assembly. The trefoil‐fold sequence is not predicted to adopt the trefoil‐fold architecture in ab initio folding studies; rather, the predicted fold is closely related to a compact “blade” motif from the β‐propeller architecture. Expression of a trefoil‐fold sequence and circular permutants shows that only the wild‐type N‐terminal motif definition yields an intact β‐trefoil trimeric assembly, while permutants yield monomers. The results elucidate the folding requirements of the primordial trefoil‐fold motif, and also suggest that this motif may sample a compact conformation that limits hydrophobic residue exposure, contains key trefoil‐fold structural features, but is more structurally homologous to a β‐propeller blade motif.

4.
Transmembrane beta-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria, and chloroplasts. The cellular location and functional diversity of beta-barrel outer membrane proteins (omps) make them an important protein class. At the present time, very few nonhomologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty encountered in crystallizing transmembrane proteins. A novel method using pairwise interstrand residue statistical potentials derived from globular (non-outer-membrane) proteins is introduced to predict the supersecondary structure of transmembrane beta-barrel proteins. The algorithm transFold employs a generalized hidden Markov model (i.e., multitape S-attribute grammar) to describe potential beta-barrel supersecondary structures and then computes by dynamic programming the minimum free energy beta-barrel structure. Hence, the approach can be viewed as a "wrapping" component that may capture folding processes with an initiation stage followed by progressive interaction of the sequence with the already-formed motifs. This approach differs significantly from others, which use traditional machine learning to solve this problem, because it does not require a training phase on known TMB structures and is the first to explicitly capture and predict long-range interactions. TransFold outperforms previous programs for predicting TMBs on smaller (

5.
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials by obtaining convergent threading results. We also have deliberately chosen two sets of protein native structures differing in resolution, one with all chains' resolution better than 1.5 Å and the other with 94.2% of the structures having a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not generate statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived by using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.

6.
One focus of our research is to further our understanding of the physico-chemical properties of non-canonical nucleic acid structures. In this work, DNA hairpins are used to mimic a common motif present in RNA, i.e. a stem-loop motif with a bulge or internal loop in the stem. Specifically, we used a combination of temperature-dependent UV spectroscopy, differential scanning (DSC), and pressure perturbation (PPC) calorimetric techniques to determine complete thermodynamic profiles for the helix–coil transitions of two sets of hairpins with 5′–3′ sequences: d(GCGCTnGTAACT5GTTACGCGC) and d(GCGCTnGTAACT5GTTACTnGCGC). “Tn” is a variable loop of thymines, n = 1, 3 or 5; and “T5” is an end-loop of five thymines. Unfolding curves show monophasic transitions with T_M values independent of strand concentration, confirming their intramolecular formation. DSC thermodynamic profiles indicate that the favorable folding of each hairpin results from the typical compensation of favorable enthalpy and unfavorable entropy contributions, while the DSC curves as a function of salt concentration yielded an uptake of cations and negative heat capacity effects. PPC melting curves yielded positive folding volumes ranging from 12 to 31 cm³/mol, corresponding to releases of water molecules; in contrast, an uptake of water (ranging from 32 to 63 mol of H2O/mol) is observed from osmotic stress experiments using ethylene glycol as the osmolyte. Overall, the increase in the size of the variable bulge or internal loop yielded lower T_M values and slightly more favorable enthalpies, corresponding to less favorable free energy contributions of ~0.7 kcal/mol per thymine residue. The volume measurements will be correlated with the unfolding entropies and discussed in terms of the type of water that is hydrating these stem-loop motif structures.

7.
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of inclusion of residue-residue contact information in improving the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help to improve the accuracy of protein structure prediction.
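
How a contact term might enter a threading score can be sketched as below. This is only an illustration under stated assumptions, not the scoring function of the method described above: `contact_overlap`, `combined_score`, the `weight` parameter and the toy alignment are all hypothetical.

```python
def contact_overlap(predicted_contacts, template_contacts, alignment):
    """Fraction of predicted target contacts reproduced by the template.

    predicted_contacts: set of (i, j) target residue index pairs
    template_contacts:  set of (a, b) template residue index pairs
    alignment:          dict mapping target index -> template index
                        (aligned positions only; gaps are simply absent)
    """
    if not predicted_contacts:
        return 0.0
    template = {tuple(sorted(c)) for c in template_contacts}
    hits = 0
    for i, j in predicted_contacts:
        a, b = alignment.get(i), alignment.get(j)
        # A contact counts only if both residues are aligned and the
        # corresponding template residues are themselves in contact.
        if a is not None and b is not None and tuple(sorted((a, b))) in template:
            hits += 1
    return hits / len(predicted_contacts)


def combined_score(baseline_score, overlap, weight=1.0):
    """Add the contact term to a baseline threading score."""
    return baseline_score + weight * overlap


# Toy example: three predicted contacts, two of which the template supports.
predicted = {(0, 5), (1, 4), (2, 9)}
template = {(10, 15), (14, 11)}
alignment = {0: 10, 1: 11, 2: 12, 4: 14, 5: 15}  # residue 9 is unaligned

overlap = contact_overlap(predicted, template, alignment)
score = combined_score(1.5, overlap, weight=3.0)
```

Ranking templates by such a combined score rewards alignments that place predicted contact partners onto residue pairs that are actually close in the template structure.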

8.

Background  

An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation-dependent contact area, a new type of knowledge-based mean force potential has been developed.

9.
Protein structure prediction is limited by the inaccuracy of the simplified energy functions necessary for efficient sorting over many conformations. It was recently suggested (Finkelstein, Phys Rev Lett 1998;80:4823-4825) that these errors can be reduced by energy averaging over a set of homologous sequences. This conclusion is confirmed in this study by testing protein structure recognition in gapless threading. The accuracy of recognition was estimated by the Z-score values obtained in gapless threading tests. For threading, we used 20 target proteins, each having from 20 to 70 homologs taken from the HSSP sequence base. The energy of the native structures was compared with the energies of 34 to 75 thousand alternative structures generated by threading. The energy calculations were done with our recently developed Cα atom-based phenomenological potentials. We show that averaging of protein energies over homologs reduces the Z-score from approximately -6.1 (average Z-score for individual chains) to approximately -8.1. This means that a correct fold can be found among 3 × 10^9 random folds in the first case and among 3 × 10^15 in the second. Such an increase in selectivity is important for recognition of protein folds.
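
The step from a Z-score to a number of distinguishable folds can be made concrete with a small sketch. It assumes that decoy energies follow a Gaussian distribution, so the number of random folds among which the native one is expected to rank first is roughly the reciprocal of the lower-tail probability at that Z-score. The function name is hypothetical, and the authors' own estimate may rest on slightly different assumptions.

```python
import math


def folds_screened(z_score):
    """Approximate 1 / P(Z <= z_score) for a standard normal variable.

    For a negative native Z-score this is the rough number of random
    folds among which the native fold would still have the lowest energy.
    """
    tail = 0.5 * math.erfc(-z_score / math.sqrt(2.0))
    return 1.0 / tail


single_chain = folds_screened(-6.1)       # individual sequences
homolog_averaged = folds_screened(-8.1)   # energies averaged over homologs
gain = homolog_averaged / single_chain    # selectivity gained by averaging
```

Under the Gaussian assumption both values land in the same orders of magnitude as the figures quoted in the abstract (10^9 and 10^15), which shows how a shift of two Z-score units translates into roughly a million-fold gain in selectivity.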

10.
We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.”1 The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model, the mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.

11.
Hue Sun Chan  Ken A. Dill 《Proteins》1996,24(3):335-344
Proteins fold to unique compact native structures. Perhaps other polymers could be designed to fold in similar ways. The chemical nature of the monomer “alphabet” determines the “energy matrix” of monomer interactions, which defines the folding code, the relationship between sequence and structure. We study two properties of energy matrices using two-dimensional lattice models: uniqueness, the number of sequences that fold to only one structure, and encodability, the number of folds that are unique lowest-energy structures of certain monomer sequences. For the simplest model folding code, involving binary sequences of H (hydrophobic) and P (polar) monomers, only a small fraction of sequences fold uniquely, and not all structures can be encoded. Adding strong repulsive interactions results in a folding code with more sequences folding uniquely and more designable folds. Some theories suggest that the quality of a folding code depends only on the number of letters in the monomer alphabet, but we find that the energy matrix itself can be at least as important as the size of the alphabet. Certain multi-letter codes, including some with 20 letters, may be less physical or protein-like than codes with smaller numbers of letters because they neglect correlations among inter-residue interactions, treat only maximally compact conformations, or add arbitrary energies to the energy matrix.
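
The notion of a sequence that "folds uniquely" on a two-dimensional lattice can be illustrated by brute-force enumeration. The sketch below assumes the simplest HP energy matrix (each non-bonded H-H lattice contact contributes -1, all other contacts 0) and counts conformations only up to rotation and reflection; the function names and the `e_hh` parameter are hypothetical and not taken from the paper.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Yield self-avoiding walks of n beads on the square lattice.

    The first step is fixed along +x and the first turn must go towards
    +y, so each shape appears once (rotations and mirror images removed).
    """
    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy < 0:
                continue  # skip the mirror image of an upward first turn
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            path.pop()
            occupied.remove(nxt)

    start = [(0, 0), (1, 0)][:n]
    yield from extend(start, set(start), False)


def energy(sequence, path, e_hh=-1.0):
    """Sum e_hh over H-H pairs that are lattice neighbours but not bonded."""
    index_at = {pos: i for i, pos in enumerate(path)}
    total = 0.0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up: each pair once
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                total += e_hh
    return total


def ground_state(sequence):
    """Return (lowest energy, number of conformations that reach it)."""
    best, count = None, 0
    for path in conformations(len(sequence)):
        e = energy(sequence, path)
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count
```

A sequence folds uniquely in this sketch when `ground_state` reports a degeneracy of 1. For example, the four-bead chain "HPPH" has a single lowest-energy shape (the U-turn that brings its two ends together), whereas "PPPP" has no favourable contacts and all five four-bead shapes are degenerate. Exhaustive enumeration grows exponentially with chain length, so it is practical only for short chains.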

12.
13.
Using information‐theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information‐based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi‐chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information‐theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. The results also suggest using only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence. Proteins 2010. © 2009 Wiley‐Liss, Inc.

14.
HIV-1 protease is a major drug target against AIDS as it permits viral maturation by processing the gag and pol polyproteins of the virus. The cleavage sites in these polyproteins do not have obvious sequence homology or a binding motif and the specificity of the protease is not easily determined. We used various threading approaches, together with the crystal structures of substrate complexes which served as template structures, to study the substrate specificity of HIV-1 protease with the aim of obtaining a better differentiation between binding and nonbinding sequences. The predictions from threading improved when distance-dependent interaction energy functions were used instead of contact matrices. To rank the peptides and properly account for the peptide's conformation in the total energy, the results from using short-range potentials on multiple template structures were averaged. Finally, a dynamic threading approach is introduced which is potentially useful for cases when there is only one template structure available. The conformational energy of the peptide, especially the term accounting for the side chains, was found to be important in differentiating between binding and nonbinding sequences. Hence, the substrate specificity, and thus the ability of the virus to mature, is affected by the compatibility of the substrate peptide to fit within the limited conformational space of the active site groove.

15.
Protein folding into tertiary structures is controlled by an interplay of attractive contact interactions and steric effects. We investigate the balance between these contributions using structure‐based models that combine an all‐atom representation of the structure with a coarse‐grained contact potential. Tertiary contact interactions between atoms are collected into a single broad attractive well between the Cβ atoms of each residue pair in a native contact. Through the width of these contact potentials we control their tolerance for deviations from the ideal structure and the spatial range of attractive interactions. In the compact native state dominant packing constraints limit the effects of a coarse‐grained contact potential. During folding, however, the broad attractive potentials allow an early collapse that starts before the native local structure is completely adopted. As a consequence the folding transition is broadened and the free energy barrier is decreased. Eventually two‐state folding behavior is lost completely for systems with very broad attractive potentials. The stabilization of native‐like residue interactions in non‐perfect geometries early in the folding process frequently leads to structural traps. Global mirror images are a notable example. These traps are penalized by the details of the repulsive interactions only after further collapse. Successful folding to the native state requires simultaneous guidance from both attractive and repulsive interactions. Proteins 2012. © 2011 Wiley Periodicals, Inc.

16.
Meller J  Elber R 《Proteins》2001,45(3):241-261
The design of scoring functions (or potentials) for threading, differentiating native-like from non-native structures with a limited computational cost, is an active field of research. We revisit two widely used families of threading potentials: the pairwise and profile models. To design optimal scoring functions we use linear programming (LP). The LP protocol makes it possible to measure the difficulty of a particular training set in conjunction with a specific form of the scoring function. Gapless threading demonstrates that pair potentials have larger prediction capacity compared with profile energies. However, alignments with gaps are easier to compute with profile potentials. We therefore search for and propose a new profile model with comparable prediction capacity to contact potentials. A protocol to determine optimal energy parameters for gaps, using LP, is also presented. A statistical test, based on a combination of local and global Z-scores, is employed to filter out false-positives. Extensive tests of the new protocol are presented. The new model provides an efficient alternative for threading with pair energies, maintaining comparable accuracy. The code, databases, and a prediction server are available at http://www.tc.cornell.edu/CBIO/loopp.
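
One common way to cast the design of a threading potential as a linear program is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $w_k$ are the adjustable energy parameters, $n_k(s, X)$ counts how often feature $k$ (for example a contact type) occurs when sequence $s$ is threaded onto structure $X$, and $\varepsilon$ is a hypothetical positive margin.

```latex
\begin{align}
  E(s, X) &= \sum_{k} w_k \, n_k(s, X)
  && \text{(score linear in the parameters)} \\
  E(s, X_{\mathrm{decoy}}) - E(s, X_{\mathrm{native}})
  &= \sum_{k} w_k \left[ n_k(s, X_{\mathrm{decoy}}) - n_k(s, X_{\mathrm{native}}) \right]
  \;\ge\; \varepsilon > 0
  && \text{for every decoy of every training sequence}
\end{align}
```

Because each constraint is linear in the $w_k$, finding parameters that rank every native structure below all of its decoys is an LP feasibility problem. If no feasible solution exists, the chosen functional form cannot separate that training set, which is one way to read the abstract's remark about measuring the difficulty of a training set for a given form of the scoring function.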

17.
Beta‐turns in beta‐hairpins have been implicated as important sites in protein folding. In particular, two‐residue β‐turns, the most abundant connecting elements in beta‐hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two‐residue turns in beta‐hairpins with a large data set. For this, 3977 beta‐turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two‐residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta‐hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two‐residue turns, and their distributions were examined. From this study, we could identify code‐like sequence motifs for the two‐residue beta‐turn types. Finally, structural and sequence properties of beta‐strands in the beta‐hairpins were analyzed, which revealed that the beta‐strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta‐hairpin turn structures and sequences. Proteins 2014; 82:1721–1733. © 2014 Wiley Periodicals, Inc.

18.
In order to probe the relative contribution of local and non-local interactions to the thermodynamic stability of proteins, we have devised an experimental approach based on a combination of motif engineering and sequence shuffling. Candidate chain segments in an immunoglobulin V(L) domain were identified whose conformation is proposed to be dominated by non-local interactions. Locally interacting structural motifs of a different conformation were then constructed as replacements, by introducing motif consensus sequences. We find that all nine replacements we constructed systematically reduce the folding cooperativity. By comparing this destabilising effect with the folding transitions of shuffled sequences for three of these motifs, we estimate the contribution of local, native interactions to the free energy of folding. Our results suggest that local and non-local interactions contribute to stability by an approximately equal amount, but that local interactions stabilise by increasing the resistance to denaturation while non-local interactions increase folding cooperativity. The systematic loss of stability by sequence shuffling in these host-guest experiments suggests that the designed interactions indeed are present in the native state, thus consensus sequence engineering may be a useful tool in structure design, but non-local interactions must be taken into account for global stability engineering. Statistical approaches are powerful tools for engineering protein structure and stability, but an analysis based on local sequence propensities alone does not adequately represent the balance of sequence and context in protein structures.

19.
Multibody potentials have been of much interest recently because they take into account three-dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Our goal was to combine long-range multibody potentials and short-range potentials to improve recognition of native structure among misfolded decoys. We optimized the weights for four-body nonsequential, four-body sequential, and short-range potentials to obtain optimal model ranking results for threading and have compared these data against results obtained with other potentials (26 different coarse-grained potentials from the Potentials 'R'Us web server have been used). Our optimized multibody potentials outperform all other contact potentials in the recognition of the native structure among decoys, both for models from homology template-based modeling and from template-free modeling in CASP8 decoy sets. We have compared the results obtained for these optimized coarse-grained potentials, where each residue is represented by a single point, with results obtained by using the DFIRE potential, which takes into account atomic-level information of proteins. We found that for all proteins larger than 80 amino acids our optimized coarse-grained potentials yield results comparable to those obtained with the atomic DFIRE potential.

20.

Background

Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design.

Results

In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys.

Conclusions

Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry.
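
The three-residue clique described in this abstract corresponds to a triangle in the side-chain contact network. A minimal sketch of how such triplets could be listed from a contact list follows. It assumes contacts are given as undirected pairs of residue identifiers; the function name and the toy network are hypothetical, and the surface-complementarity criteria the authors use to define a contact are not modelled.

```python
def three_residue_cliques(contacts):
    """Return all triplets of residues that are mutually in contact.

    contacts: iterable of (residue_a, residue_b) pairs, treated as
    undirected edges of the contact network.
    """
    edges = [(a, b) for a, b in contacts if a != b]

    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    cliques = set()
    for a, b in edges:
        # Any residue touching both ends of an edge closes a triangle.
        for c in neighbours[a] & neighbours[b]:
            cliques.add(tuple(sorted((a, b, c))))
    return sorted(cliques)


# Toy network: residues 1, 2, 3 pack together; 4 and 5 form a chain.
toy_contacts = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
triplets = three_residue_cliques(toy_contacts)
```

Larger packing motifs could then be examined by checking how such triangles share edges or vertices, in the spirit of the smaller graphs coalescing into larger ones that the abstract describes.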


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号