首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Zhang C  Liu S  Zhou H  Zhou Y 《Biophysical journal》2004,86(6):3349-3358
An accurate statistical energy function that is suitable for the prediction of protein structures of all classes should be independent of the structural database used for energy extraction. Here, two high-resolution, low-sequence-identity structural databases of 333 alpha-proteins and 271 beta-proteins were built for examining the database dependence of three all-atom statistical energy functions. They are RAPDF (residue-specific all-atom conditional probability discriminatory function), atomic KBP (atomic knowledge-based potential), and DFIRE (statistical potential based on distance-scaled finite ideal-gas reference state). These energy functions differ in the reference states used for energy derivation. The energy functions extracted from the different structural databases are used to select native structures from multiple decoys of 64 alpha-proteins and 28 beta-proteins. The performance in native structure selections indicates that the DFIRE-based energy function is mostly independent of the structural database whereas RAPDF and KBP have a significant dependence. The construction of two additional structural databases of alpha/beta and alpha + beta-proteins further confirmed the weak dependence of DFIRE on the structural databases of various structural classes. The possible source for the difference between the three all-atom statistical energy functions is that the physical reference state of ideal gas used in the DFIRE-based energy function is least dependent on the structural database.  相似文献   

2.
Huang SY  Zou X 《Proteins》2011,79(9):2648-2661
In this study, we have developed a statistical mechanics-based iterative method to extract statistical atomic interaction potentials from known, nonredundant protein structures. Our method circumvents the long-standing reference state problem in deriving traditional knowledge-based scoring functions, by using rapid iterations through a physical, global convergence function. The rapid convergence of this physics-based method, unlike other parameter optimization methods, warrants the feasibility of deriving distance-dependent, all-atom statistical potentials to keep the scoring accuracy. The derived potentials, referred to as ITScore/Pro, have been validated using three diverse benchmarks: the high-resolution decoy set, the AMBER benchmark decoy set, and the CASP8 decoy set. Significant improvement in performance has been achieved. Finally, comparisons between the potentials of our model and potentials of a knowledge-based scoring function with a randomized reference state have revealed the reason for the better performance of our scoring function, which could provide useful insight into the development of other physical scoring functions. The potentials developed in this study are generally applicable for structural selection in protein structure prediction.  相似文献   

3.
Structure prediction on a genomic scale requires a simplified energy function that can efficiently sample the conformational space of polypeptide chains. A good energy function at minimum should discriminate native structures against decoys. Here, we show that a recently developed, residue-specific, all-atom knowledge-based potential (167 atomic types) based on distance-scaled, finite ideal-gas reference state (DFIRE-all-atom) can be substantially simplified to 20 residue types located at side-chain center of mass (DFIRE-SCM) without a significant change in its capability of structure discrimination. Using 96 standard multiple decoy sets, we show that there is only a small reduction (from 80% to 78%) in success rate of ranking native structures as the top 1. The success rate is higher than two previously developed, all-atom distance-dependent statistical pair potentials. Applied to structure selections of 21 docking decoys without modification, the DFIRE-SCM potential is 29% more successful in recognizing native complex structures than an all-atom statistical potential trained by a database of dimeric interfaces. The potential also achieves 92% accuracy in distinguishing true dimeric interfaces from artificial crystal interfaces. In addition, the DFIRE potential with the C(alpha) positions as the interaction centers recognizes 123 native structures out of a comprehensive 125-protein TOUCHSTONE decoy set in which each protein has 24,000 decoys with only C(alpha) positions. Furthermore, the performance by DFIRE-SCM on newly established 25 monomeric and 31 docking Rosetta-decoy sets is comparable to (or better than in the case of monomeric decoy sets) that of a recently developed, all-atom Rosetta energy function enhanced with an orientation-dependent hydrogen bonding potential.  相似文献   

4.
Protein decoy data sets provide a benchmark for testing scoring functions designed for fold recognition and protein homology modeling problems. It is commonly believed that statistical potentials based on reduced atomic models are better able to discriminate native-like from misfolded decoys than scoring functions based on more detailed molecular mechanics models. Recent benchmark tests on small data sets, however, suggest otherwise. In this work, we report the results of extensive decoy detection tests using an effective free energy function based on the OPLS all-atom (OPLS-AA) force field and the Surface Generalized Born (SGB) model for the solvent electrostatic effects. The OPLS-AA/SGB effective free energy is used as a scoring function to detect native protein folds among a total of 48,832 decoys for 32 different proteins from Park and Levitt's 4-state-reduced, Levitt's local-minima, Baker's ROSETTA all-atom, and Skolnick's decoy sets. Solvent electrostatic effects are included through the Surface Generalized Born (SGB) model. All structures are locally minimized without restraints. From an analysis of the individual energy components of the OPLS-AA/SGB energy function for the native and the best-ranked decoy, it is determined that a balance of the terms of the potential is responsible for the minimized energies that most successfully distinguish the native from the misfolded conformations. Different combinations of individual energy terms provide less discrimination than the total energy. The results are consistent with observations that all-atom molecular potentials coupled with intermediate level solvent dielectric models are competitive with knowledge-based potentials for decoy detection and protein modeling problems such as fold recognition and homology modeling.  相似文献   

5.
Hsieh MJ  Luo R 《Proteins》2004,56(3):475-486
A well-behaved physics-based all-atom scoring function for protein structure prediction is analyzed with several widely used all-atom decoy sets. The scoring function, termed AMBER/Poisson-Boltzmann (PB), is based on a refined AMBER force field for intramolecular interactions and an efficient PB model for solvation interactions. Testing on the chosen decoy sets shows that the scoring function, which is designed to consider detailed chemical environments, is able to consistently discriminate all 62 native crystal structures after considering the heteroatom groups, disulfide bonds, and crystal packing effects that are not included in the decoy structures. When NMR structures are considered in the testing, the scoring function is able to discriminate 8 out of 10 targets. In the more challenging test of selecting near-native structures, the scoring function also performs very well: for the majority of the targets studied, the scoring function is able to select decoys that are close to the corresponding native structures as evaluated by ranking numbers and backbone Calpha root mean square deviations. Various important components of the scoring function are also studied to understand their discriminative contributions toward the rankings of native and near-native structures. It is found that neither the nonpolar solvation energy as modeled by the surface area model nor a higher protein dielectric constant improves its discriminative power. The terms remaining to be improved are related to 1-4 interactions. The most troublesome term is found to be the large and highly fluctuating 1-4 electrostatics term, not the dihedral-angle term. These data support ongoing efforts in the community to develop protein structure prediction methods with physics-based potentials that are competitive with knowledge-based potentials.  相似文献   

6.
The conformations of loops are determined by the water-mediated interactions between amino acid residues. Energy functions that describe the interactions can be derived either from physical principles (physical-based energy function) or statistical analysis of known protein structures (knowledge-based statistical potentials). It is commonly believed that statistical potentials are appropriate for coarse-grained representation of proteins but are not as accurate as physical-based potentials when atomic resolution is required. Several recent applications of physical-based energy functions to loop selections appear to support this view. In this article, we apply a recently developed DFIRE-based statistical potential to three different loop decoy sets (RAPPER, Jacobson, and Forrest-Woolf sets). Together with a rotamer library for side-chain optimization, the performance of DFIRE-based potential in the RAPPER decoy set (385 loop targets) is comparable to that of AMBER/GBSA for short loops (two to eight residues). The DFIRE is more accurate for longer loops (9 to 12 residues). Similar trend is observed when comparing DFIRE with another physical-based OPLS/SGB-NP energy function in the large Jacobson decoy set (788 loop targets). In the Forrest-Woolf decoy set for the loops of membrane proteins, the DFIRE potential performs substantially better than the combination of the CHARMM force field with several solvation models. The results suggest that a single-term DFIRE-statistical energy function can provide an accurate loop prediction at a fraction of computing cost required for more complicate physical-based energy functions. A Web server for academic users is established for loop selection at the softwares/services section of the Web site http://theory.med.buffalo.edu/.  相似文献   

7.
Statistical potential for assessment and prediction of protein structures   总被引:2,自引:0,他引:2  
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using the probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.  相似文献   

8.
We developed a series of statistical potentials to recognize the native protein from decoys, particularly when using only a reduced representation in which each side chain is treated as a single C(beta) atom. Beginning with a highly successful all-atom statistical potential, the Discrete Optimized Protein Energy function (DOPE), we considered the implications of including additional information in the all-atom statistical potential and subsequently reducing to the C(beta) representation. One of the potentials includes interaction energies conditional on backbone geometries. A second potential separates sequence local from sequence nonlocal interactions and introduces a novel reference state for the sequence local interactions. The resultant potentials perform better than the original DOPE statistical potential in decoy identification. Moreover, even upon passing to a reduced C(beta) representation, these statistical potentials outscore the original (all-atom) DOPE potential in identifying native states for sets of decoys. Interestingly, the backbone-dependent statistical potential is shown to retain nearly all of the information content of the all-atom representation in the C(beta) representation. In addition, these new statistical potentials are combined with existing potentials to model hydrogen bonding, torsion energies, and solvation energies to produce even better performing potentials. The ability of the C(beta) statistical potentials to accurately represent protein interactions bodes well for computational efficiency in protein folding calculations using reduced backbone representations, while the extensions to DOPE illustrate general principles for improving knowledge-based potentials.  相似文献   

9.
An accurate scoring function is a key component for successful protein structure prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.  相似文献   

10.
We describe the construction of a scoring function designed to model the free energy of protein folding. An optimization technique is used to determine the best functional forms of the hydrophobic, residue-residue and hydrogen-bonding components of the potential. The scoring function is expanded by use of Chebyshev polynomials, the coefficients of which are determined by minimizing the score, in units of standard deviation, of native structures in the ensembles of alternate decoy conformations. The derived effective potential is then tested on decoy sets used conventionally in such studies. Using our scoring function, we achieve a high level of discrimination between correct and incorrect folds. In addition, our method is able to represent functions of arbitrary shape with fewer parameters than the usual histogram potentials of similar resolution. Finally, our representation can be combined easily with many optimization methods, because the total energy is a linear function of the parameters. Our results show that the techniques of Z-score optimization and Chebyshev expansion work well.  相似文献   

11.
One of the common methods for assessing energy functions of proteins is selection of native or near-native structures from decoys. This is an efficient but indirect test of the energy functions because decoy structures are typically generated either by sampling procedures or by a separate energy function. As a result, these decoys may not contain the global minimum structure that reflects the true folding accuracy of the energy functions. This paper proposes to assess energy functions by ab initio refolding of fully unfolded terminal segments with secondary structures while keeping the rest of the proteins fixed in their native conformations. Global energy minimization of these short unfolded segments, a challenging yet tractable problem, is a direct test of the energy functions. As an illustrative example, refolding terminal segments is employed to assess two closely related all-atom statistical energy functions, DFIRE (distance-scaled, finite, ideal-gas reference state) and DOPE (discrete optimized protein energy). We found that a simple sequence-position dependence contained in the DOPE energy function leads to an intrinsic bias toward the formation of helical structures. Meanwhile, a finer statistical treatment of short-range interactions yields a significant improvement in the accuracy of segment refolding by DFIRE. The updated DFIRE energy function yields success rates of 100% and 67%, respectively, for its ability to sample and fold fully unfolded terminal segments of 15 proteins to within 3.5 A global root-mean-squared distance from the corresponding native structures. The updated DFIRE energy function is available as DFIRE 2.0 upon request.  相似文献   

12.
13.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.  相似文献   

14.
Here we report an orientation-dependent statistical all-atom potential derived from side-chain packing, named OPUS-PSP. It features a basis set of 19 rigid-body blocks extracted from the chemical structures of all 20 amino acid residues. The potential is generated from the orientation-specific packing statistics of pairs of those blocks in a non-redundant structural database. The purpose of such an approach is to capture the essential elements of orientation dependence in molecular packing interactions. Tests of OPUS-PSP on commonly used decoy sets demonstrate that it significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and consistency in achieving high Z-scores across decoy sets. As OPUS-PSP excludes interactions among main-chain atoms, its success highlights the crucial importance of side-chain packing in forming native protein structures. Moreover, OPUS-PSP does not explicitly include solvation terms, and thus the potential should perform well when the solvation effect is difficult to determine, such as in membrane proteins. Overall, OPUS-PSP is a generally applicable potential for protein structure modeling, especially for handling side-chain conformations, one of the most difficult steps in high-accuracy protein structure prediction and refinement.  相似文献   

15.
Abstract

Arriving at the native conformation of a polypeptide chain characterized by minimum most free energy is a problem of long standing interest in protein structure prediction endeavors. Owing to the computational requirements in developing free energy estimates, scoring functions—energy based or statistical—have received considerable renewed attention in recent years for distinguishing native structures of proteins from non-native like structures. Several cleverly designed decoy sets, CASP (Critical Assessment of Techniques for Protein Structure Prediction) structures and homology based internet accessible three dimensional model builders are now available for validating the scoring functions. We describe here an all-atom energy based empirical scoring function and examine its performance on a wide series of publicly available decoys. Barring two protein sequences where native structure is ranked second and seventh, native is identified as the lowest energy structure in 67 protein sequences from among 61,659 decoys belonging to 12 different decoy sets. We further illustrate a potential application of the scoring function in bracketing native-like structures of two small mixed alpha/beta globular proteins starting from sequence and secondary structural information. The scoring function has been web enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp  相似文献   

16.
The DOcking decoy‐based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance‐dependent atom‐pair interactions. To optimize the atom‐pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motives involving complete secondary structure elements. They constitute near native ligand–receptor systems (or just pairs). Thus, a total of 8609 ligand–receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand–receptor systems, 1000 evenly sampled docking decoys with 0–10 Å interface root‐mean‐square‐deviation (iRMSD) were generated with a method used before for protein–protein docking. A neural network‐based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel‐like energy landscape for the interaction between these hypothetical ligand–receptor systems. Thus, our method hierarchically models the overall funnel‐like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term which describes the local conformational preference, the atom‐pair‐based potential outperforms other reported statistical energy functions in correct ranking of native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take into account side chain orientation‐dependence explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand–receptor systems and their decoys are freely available at http://agknapp.chemie.fu‐berlin.de/doop/ . Proteins 2015; 83:881–890. © 2015 Wiley Periodicals, Inc.  相似文献   

17.
Song Y  Tyka M  Leaver-Fay A  Thompson J  Baker D 《Proteins》2011,79(6):1898-1909
Accurate modeling of biomolecular systems requires accurate forcefields. Widely used molecular mechanics (MM) forcefields obtain parameters from experimental data and quantum chemistry calculations on small molecules but do not have a clear way to take advantage of the information in high-resolution macromolecular structures. In contrast, knowledge-based methods largely ignore the physical chemistry of interatomic interactions, and instead derive parameters almost exclusively from macromolecular structures. This can involve considerable double counting of the same physical interactions. Here, we describe a method for forcefield improvement that combines the strengths of the two approaches. We use this method to improve the Rosetta all-atom forcefield, in which the total energy is expressed as the sum of terms representing different physical interactions as in MM forcefields and the parameters are tuned to reproduce the properties of macromolecular structures. To resolve inaccuracies resulting from possible double counting of interactions, we compare distribution functions from low-energy modeled structures to those from crystal structures. The structural and physical bases of the deviations between the modeled and reference structures are identified and used to guide forcefield improvements. We describe improvements resolving double counting between backbone hydrogen bond interactions and Lennard-Jones interactions in helices; between sidechain-backbone hydrogen bonds and the backbone torsion potential; and between the sidechain torsion potential and Lennard-Jones interactions. Discrepancies between computed and observed distributions are also used to guide the incorporation of an explicit Cα-hydrogen bond in β sheets. The method can be used generally to integrate different sources of information for forcefield improvement.  相似文献   

18.

Background

Previous studies on protein-DNA interaction mostly focused on the bound structure of DNA-binding proteins but few paid enough attention to the unbound structures. As more new proteins are discovered, it is useful and imperative to develop algorithms for the functional prediction of unbound proteins. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and extract useful statistical and geometric features, and use structural alignment and support vector machines for the prediction of unbound DNA-binding proteins.

Results

The performance of our method is evaluated by discriminating a set of 104 DNA-binding proteins from 401 non-DNA-binding proteins. In the same test, the proposed method outperforms the other method using conditional probability. The results achieved by our proposed method for; precision, 83.33%; accuracy, 86.53%; and MCC, 0.5368 demonstrate its good performance.

Conclusions

In this study we develop an effective method for the prediction of protein-DNA interactions based on statistical and geometric features and support vector machines. Our results show that interface surface features play an important role in protein-DNA interaction. Our technique is able to predict unbound DNA-binding protein and discriminatory DNA-binding proteins from proteins that bind with other molecules.
  相似文献   

19.
Arriving at the native conformation of a polypeptide chain characterized by minimum most free energy is a problem of long standing interest in protein structure prediction endeavors. Owing to the computational requirements in developing free energy estimates, scoring functions--energy based or statistical--have received considerable renewed attention in recent years for distinguishing native structures of proteins from non-native like structures. Several cleverly designed decoy sets, CASP (Critical Assessment of Techniques for Protein Structure Prediction) structures and homology based internet accessible three dimensional model builders are now available for validating the scoring functions. We describe here an all-atom energy based empirical scoring function and examine its performance on a wide series of publicly available decoys. Barring two protein sequences where native structure is ranked second and seventh, native is identified as the lowest energy structure in 67 protein sequences from among 61,659 decoys belonging to 12 different decoy sets. We further illustrate a potential application of the scoring function in bracketing native-like structures of two small mixed alpha/beta globular proteins starting from sequence and secondary structural information. The scoring function has been web enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp.  相似文献   

20.
Tobi D  Elber R 《Proteins》2000,41(1):40-46
The results of an optimization of a folding potential are reported. The complete energy function is modeled as a sum of pairwise interactions with a flexible functional form. The relevant distance between two amino acids (2 - 9 A) is divided into 13 intervals, and the energy of each interval is optimized independently. We show, in accord with a previous publication (Tobi et al., Proteins 2000;40:71-85) that it is impossible to find a pair potential with the above flexible form that recognizes all native folds. Nevertheless, a potential that rates correctly a subset of the decoy structures was constructed and optimized. The resulting potential is compared with a distance-dependent statistical potential of Bahar and Jernigan. It is further tested against decoy structures that were created in the Levitt's group. On average, the new potential places native shapes lower in energy and provides higher Z scores than other potentials.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号