Similar documents
1.
In order to investigate the level of representation required to simulate folding and predict structure, we test the ability of a variety of reduced representations to identify native states in decoy libraries and to recover the native structure given advance knowledge of the very broad native Ramachandran basin assignments. Simplifications include the removal of the entire side chain or the retention of only the Cβ atoms. Scoring functions are derived from an all-atom statistical potential that distinguishes between atoms and different residue types. Structures are obtained by minimizing the scoring function with a computationally rapid simulated annealing algorithm. Results are compared for simulations in which backbone conformations are sampled from a Protein Data Bank-based backbone rotamer library generated either by ignoring or by including a dependence on the identity and conformation of the neighboring residues. Only when the Cβ atoms and nearest-neighbor effects are included do the lowest-energy structures generally fall within 4 Å backbone root-mean-square deviation (RMSD) of the native, despite the initial configuration being highly expanded, with an average RMSD ≥ 10 Å. The side chains are reinserted into the Cβ models with minimal steric clash. Therefore, the detailed all-atom information lost in descending to a Cβ-level representation is recaptured to a large measure by backbone dihedral angle sampling that includes nearest-neighbor effects, combined with an appropriate scoring function.
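The annealing step can be illustrated with a plain Metropolis simulated-annealing loop. This is a minimal sketch on a one-dimensional toy energy, assuming a geometric cooling schedule and uniform trial moves; the function name and all parameters are hypothetical, and the paper's moves instead draw backbone dihedrals from a rotamer library.

```python
import math
import random

def simulated_annealing(energy, x0, step=0.5, n_steps=5000,
                        t_start=10.0, t_end=0.01, seed=0):
    """Minimize `energy` over one real variable with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill, accept uphill with probability exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Toy energy with its minimum at x = 3; the search starts far away at x = -10.
best_x, best_e = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-10.0)
```

Keeping track of the best state seen, rather than only the final state, mirrors the practice of reporting the lowest-energy structure from a run.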

2.
Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding, which has been the subject of much debate over the last several decades. One of the major challenges in understanding protein folding, and in in silico protein structure prediction, is discriminating non-native structures (decoys) from the native structure. Applications of knowledge-based potentials to this goal have been extensively reported in the literature, and scoring functions based on accessible surface area and amino acid neighbourhood considerations have also been used to discriminate decoys from native structures. In this article, we explore the potential of protein structure network (PSN) parameters to validate native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) PSNs capture local properties from a global perspective, and (b) including non-covalent interactions at the all-atom level, side-chain atoms included, in the network construction accommodates sequence-dependent features. Several network parameters, such as the size of the largest cluster, community size, and clustering coefficient, are evaluated and scored on the basis of the rank of the native structures and their Z-scores. The network analysis of decoy structures highlights the importance of global properties in making native structures unique. The analysis also shows that network parameters can be used as metrics to identify native structures and filter out non-native structures (decoys) across a large number of data sets, and thus could also be useful for the protein 'structure prediction' problem.
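Two of the network parameters named above (the size of the largest cluster and the clustering coefficient), together with the native Z-score used to rank them, can be sketched in plain Python. This assumes the PSN has already been built as an undirected edge list over residue indices; the all-atom interaction-strength criterion used to build the network is not reproduced, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def adjacency(n, edges):
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def largest_cluster_size(n, edges):
    """Number of nodes in the largest connected component (iterative DFS)."""
    adj = adjacency(n, edges)
    seen, best = set(), 0
    for start in range(n):
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def mean_clustering(n, edges):
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    adj = adjacency(n, edges)
    total = 0.0
    for v in range(n):
        k = len(adj[v])
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n

def native_z_score(native_value, decoy_values):
    """(native - mean of decoys) / population std of decoys."""
    return (native_value - mean(decoy_values)) / pstdev(decoy_values)
```

For a triangle (nodes 0, 1, 2) plus a separate edge (3, 4), the largest cluster has 3 nodes and the mean clustering coefficient is 3/5 = 0.6.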

3.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that non-bonded inter-atomic contacts are not statistically independent events, and that multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preference of protein atoms for a given microenvironment, defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures, and a detailed performance comparison is made with a number of common functions used in protein structure prediction.

5.
We develop coarse-grained, distance- and orientation-dependent statistical potentials from the growing protein structural databases. For protein structural classes (alpha, beta, and alpha/beta), a substantial number of backbone-backbone and backbone-side-chain contacts stabilize the native folds. By taking into account the importance of backbone interactions with a virtual backbone interaction center as the 21st anisotropic site, we construct a 21 × 21 interaction scheme. The new potentials are studied using spherical harmonics analysis (SHA) and a smooth, continuous version is constructed using spherical harmonic synthesis (SHS). Our approach has the following advantages: (1) the smooth, continuous form of the resulting potentials is more realistic and presents significant advantages for computational simulations, and (2) with SHS, the potential values can be computed efficiently for arbitrary coordinates, requiring only the knowledge of a few spherical harmonic coefficients. The performance of the new orientation-dependent potentials was tested using a standard database of decoy structures. The results show that the ability of the new orientation-dependent potentials to recognize native protein folds from a set of decoy structures is strongly enhanced by the inclusion of anisotropic backbone interaction centers. The anisotropic potentials can be used to develop realistic coarse-grained simulations of proteins, with direct applications to protein design, folding, and aggregation.
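Point (2) says that, once the expansion coefficients are known, the potential in any direction is a short weighted sum of spherical harmonics. A minimal illustration, truncated at l ≤ 1 with the real spherical harmonics written out explicitly; the truncation, the coefficient values, and the function names are assumptions for illustration, and a real potential would use more terms and a distance dependence.

```python
import math

def real_sph_harm_l1(theta, phi):
    """Real spherical harmonics Y_lm for l <= 1 at polar angle theta, azimuth phi."""
    c0 = 0.5 / math.sqrt(math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    return {(0, 0): c0, (1, -1): c1 * y, (1, 0): c1 * z, (1, 1): c1 * x}

def synthesize(coeffs, theta, phi):
    """Spherical harmonic synthesis: f(theta, phi) = sum over (l, m) of c_lm * Y_lm."""
    basis = real_sph_harm_l1(theta, phi)
    return sum(c * basis[lm] for lm, c in coeffs.items())

# An isotropic term plus a dipole-like term along z, evaluated at an arbitrary direction.
value = synthesize({(0, 0): 2.0 * math.sqrt(math.pi), (1, 0): 0.5}, 0.3, 1.2)
```

Only the handful of coefficients has to be stored; any direction can then be evaluated without interpolating a tabulated grid.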

7.
Guang Hu, Bairong Shen. Proteins 2014, 82(4):556-564
An accurate score function for detecting the most native‐like models among a huge number of decoy sets is essential to protein structure prediction. In this work, we developed a novel integrated score function (SVR_CAF) to discriminate native structures from decoys, as well as to rank near‐native structures and select the best decoys when native structures are absent. SVR_CAF is a machine learning score that incorporates a contact energy based score (CE_score), an amino acid network based score (AAN_score), and a fast Fourier transform based score (FFT_score). The score function was evaluated on four decoy sets for its discriminative ability, and it shows higher overall performance than state‐of‐the‐art score functions. Proteins 2014; 82:556–564. © 2013 Wiley Periodicals, Inc.

9.
An accurate scoring function is a key component for successful protein structure prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.
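The DFIRE reference state mentioned above scales the expected pair count in each distance bin by (r/r_cut)^α. A minimal sketch of the distance-dependent term for one atom-type pair, assuming the commonly cited DFIRE form u(r) = -RT ln[N_obs(r) / ((r/r_cut)^α (Δr/Δr_cut) N_obs(r_cut))] with α = 1.61; energies are in units of RT, the last bin is taken as the cutoff bin, and the bin layout and function name are hypothetical. GOAP's angle-dependent part is not shown.

```python
import math

def dfire_energies(counts, centers, widths, alpha=1.61):
    """Distance-dependent DFIRE-style energies (units of RT) for one atom-type pair.

    counts[k]   observed pair count in distance bin k
    centers[k]  bin center r_k; the last bin is treated as the cutoff bin
    widths[k]   bin width
    """
    n_cut, r_cut, dr_cut = counts[-1], centers[-1], widths[-1]
    energies = []
    for n_obs, r, dr in zip(counts, centers, widths):
        # Expected count under the distance-scaled finite ideal-gas reference state.
        expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_cut
        # An empty bin gives an infinitely repulsive term; real potentials smooth this.
        energies.append(math.inf if n_obs == 0 else -math.log(n_obs / expected))
    return energies
```

By construction the cutoff bin always scores 0, and a bin observed twice as often as the reference predicts scores -ln 2.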

10.
Statistical potential for assessment and prediction of protein structures
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere whose radius depends on a sample native structure; it thus accounts for the finite and spherical shape of native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions on six multiple-target decoy sets by three criteria: detection of the native state, the correlation between score and model error, and identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best-performing function on all criteria, except for a tie on one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
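The three evaluation criteria above (native detection, score-error correlation, and picking the most accurate non-native model) can be computed from a table of scores and errors. A minimal sketch, assuming lower scores are better, model error is RMSD to the native (0 for the native itself), and each decoy set is a list of (name, score, rmsd) tuples; the data layout and function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def evaluate_decoy_set(models, native_name):
    """models: list of (name, score, rmsd); a lower score means predicted better."""
    # Criterion 1: is the native the lowest-scoring model?
    native_detected = min(models, key=lambda m: m[1])[0] == native_name
    decoys = [m for m in models if m[0] != native_name]
    # Criterion 2: correlation between score and model error over non-native models.
    r = pearson([m[1] for m in decoys], [m[2] for m in decoys])
    # Criterion 3: does the lowest-scoring decoy also have the lowest error?
    best_by_score = min(decoys, key=lambda m: m[1])[0]
    best_by_rmsd = min(decoys, key=lambda m: m[2])[0]
    return native_detected, r, best_by_score == best_by_rmsd

models = [("native", -100.0, 0.0), ("d1", -90.0, 1.0),
          ("d2", -80.0, 2.0), ("d3", -70.0, 3.0)]
detected, r, picks_best = evaluate_decoy_set(models, "native")
```

In this toy set the score tracks RMSD perfectly, so all three criteria are satisfied and r is 1.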

11.
π-π, cation-π, and hydrophobic packing interactions contribute specificity to protein folding and stability to the native state. As a step towards developing improved models of these interactions in proteins, we compare the side-chain packing arrangements in native proteins to those found in compact decoys produced by the Rosetta de novo structure prediction method. We find enrichments in the native distributions for T-shaped and parallel offset arrangements of aromatic residue pairs, for parallel stacked arrangements of cation-aromatic pairs, for parallel stacked pairs involving proline residues, and for parallel offset arrangements of aliphatic residue pairs. We then investigate the extent to which the distinctive features of native packing can be explained using Lennard-Jones and electrostatics models. Finally, we derive orientation-dependent π-π, cation-π, and hydrophobic interaction potentials based on the differences between the native and compact decoy distributions and investigate their efficacy for high-resolution protein structure prediction. Surprisingly, the orientation-dependent potential derived from the packing arrangements of aliphatic side-chain pairs distinguishes the native structure from compact decoys better than the orientation-dependent potentials describing π-π and cation-π interactions.
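Deriving a potential from the difference between native and compact-decoy distributions amounts to a log-ratio of binned frequencies. A minimal sketch for one binned geometric variable (for example, an inter-ring angle); the pseudocount, the binning, and the function name are assumptions, and the actual Rosetta terms may be defined differently.

```python
import math
from collections import Counter

def log_ratio_potential(native_bins, decoy_bins, n_bins, pseudocount=1.0):
    """E(b) = -ln(P_native(b) / P_decoy(b)) for bins 0..n_bins-1.

    native_bins, decoy_bins: bin indices observed for residue pairs.
    Negative E marks arrangements enriched in native structures.
    """
    nat, dec = Counter(native_bins), Counter(decoy_bins)
    nat_total = sum(nat.values()) + pseudocount * n_bins
    dec_total = sum(dec.values()) + pseudocount * n_bins
    energies = []
    for b in range(n_bins):
        # Pseudocounts keep empty bins from producing log(0).
        p_nat = (nat[b] + pseudocount) / nat_total
        p_dec = (dec[b] + pseudocount) / dec_total
        energies.append(-math.log(p_nat / p_dec))
    return energies

# Bin 0 is more common in natives than in decoys, so it gets a favorable (negative) energy.
energies = log_ratio_potential([0, 0, 0, 1], [0, 1, 1, 1], n_bins=2)
```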

12.
Protein structure refinement by optimization
Martin Carlsen, Peter Røgen. Proteins 2015, 83(9):1616-1624
Knowledge‐based protein potentials are simplified potentials designed to improve the quality of protein models, which is important because more accurate models are more useful for biological and pharmaceutical studies. Consequently, knowledge‐based potentials are often designed to be efficient at ordering a given set of deformed structures, denoted decoys, according to how close they are to the relevant native protein structure. This, however, does not necessarily imply that energy minimization of such a potential will bring the decoys closer to the native structure. In this study, we introduce an iterative strategy to improve the convergence of decoy structures. It works by adding energy‐optimized decoys to the pool of decoys used to construct the next, improved knowledge‐based potential. We demonstrate that this strategy results in significantly improved decoy convergence on Titan high‐resolution decoys and on refinement targets from the Critical Assessment of protein Structure Prediction competitions. Our potential is formulated in Cartesian coordinates and has a fixed backbone potential that restricts motions to be close to those of a dihedral model, a fixed hydrogen‐bonding potential, and a variable coarse‐grained Cα potential consisting of a pair potential and a novel solvent potential; these variable terms are B‐spline based, as we use explicit gradients and Hessians for efficient energy optimization. Proteins 2015; 83:1616–1624. © 2015 Wiley Periodicals, Inc.

13.
Ishida T, Nakamura S, Shimizu K. Proteins 2006, 64(4):940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined as the number of residues around that residue. First, the contact number of each residue is predicted by SVR from the amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures, both for identifying the native structure among other structures and for identifying near-native structures among nonnative ones. This potential improves not only the ability to identify native structures among other structures but also the ability to discriminate near-native structures from nonnative structures.
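The contact number used above counts the residues around a given residue. A minimal sketch using Cα coordinates and a distance cutoff; the 8 Å cutoff, the use of Cα atoms, and counting chain neighbors as contacts are assumptions, since the abstract does not give the exact definition. The function name is hypothetical, and the SVR prediction step is not shown.

```python
import math

def contact_numbers(ca_coords, cutoff=8.0):
    """For each residue, the number of other residues whose C-alpha lies within `cutoff` angstroms."""
    n = len(ca_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                # Each contact is counted once for both partners.
                counts[i] += 1
                counts[j] += 1
    return counts

# Three residues 3.8 angstroms apart along x, plus one far away.
cn = contact_numbers([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)])
# cn == [2, 2, 2, 0]
```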

14.
Mirzaie M, Sadeghi M. Proteins 2012, 80(3):683-690
We have recently introduced a novel model for discriminating correctly folded proteins from well-designed decoy structures using mechanical interatomic forces. In the model, we considered a protein as a collection of springs, and the force imposed on each atom was calculated from the relation between potential energy and force. A mean force potential function is obtained from statistical contact preferences within known protein structures. In this article, the interatomic forces are calculated by numerical differentiation of the potential function. To assess the knowledge-based force function, we consider an optimal structure and define a score function on the 3D structure of a protein. We compare the force imposed on each atom of a protein with that on the corresponding atom in the optimal structure. We then assign larger scores to atoms with lower forces. The total score is the sum of the partial scores of the atoms. The optimal structure is assumed to be the one with the highest score in the dataset. Finally, several decoy sets are used to evaluate the performance of our model.
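Computing interatomic forces by numerical differentiation of a pair potential, and then rewarding atoms that feel small forces, can be sketched as below. The central-difference step, the toy Lennard-Jones-like pair potential, and the 1/(1 + |F|) per-atom score are hypothetical stand-ins; the paper's knowledge-based potential and its exact comparison against the optimal structure are not reproduced.

```python
import math

def pair_potential(r):
    """Toy stand-in for a knowledge-based distance potential (minimum at r = 2**(1/6))."""
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

def atom_forces(coords, potential=pair_potential, h=1e-5):
    """F_i = -dE/dx_i for every atom and axis, using central differences."""
    def total_energy(xyz):
        return sum(potential(math.dist(xyz[i], xyz[j]))
                   for i in range(len(xyz)) for j in range(i + 1, len(xyz)))
    forces = []
    for i in range(len(coords)):
        f = []
        for k in range(3):
            plus = [list(p) for p in coords]
            minus = [list(p) for p in coords]
            plus[i][k] += h
            minus[i][k] -= h
            f.append(-(total_energy(plus) - total_energy(minus)) / (2 * h))
        forces.append(f)
    return forces

def force_score(coords):
    """Sum of per-atom scores; atoms under a smaller net force score higher."""
    return sum(1.0 / (1.0 + math.hypot(*f)) for f in atom_forces(coords))

r0 = 2.0 ** (1.0 / 6.0)
relaxed = [(0.0, 0.0, 0.0), (r0, 0.0, 0.0)]   # at the potential minimum: near-zero force
clashing = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # too close: strongly repulsive
```

Here `force_score(relaxed)` is close to its maximum of 2, while the clashing pair scores far lower.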

15.
We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized by maximizing the correlation between energy and root-mean-square deviation (RMSD) to native for 30 × 60,000 decoys, as well as the energy gap between the native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41 of 100 small proteins (36 to 120 residues), with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger proteins, as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR, with homologous proteins excluded from the database. With these threading-based restraints, the program can fold 83 of 125 test proteins (36 to 174 residues) with structures having an RMSD to native below 6.5 Å in the top five cluster centroids. This shows that predicted tertiary restraints significantly improve folding, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and on the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used for native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
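The success criterion above is RMSD to the native after optimal superposition. A minimal sketch using the standard Kabsch SVD superposition, assuming matched N×3 coordinate arrays (for example, Cα atoms of model and native); the function name is hypothetical, and this is not the authors' code.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

rng = np.random.default_rng(0)
native = rng.normal(size=(10, 3))
t = np.radians(40.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
model = native @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(model, native))  # close to 0: the model is a rigid-body copy of the native
```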

16.
A new computational approach for real protein folding prediction
An effective and fast minimization approach is proposed for the prediction of protein folding, in which the 'relative entropy' is used as the minimization function and an off-lattice model is used. In this approach, we use only the distances between consecutive Cα atoms along the peptide chain and a generalized form of the contact potential for 20 types of amino acids. The algorithm is tested on real proteins. The root-mean-square deviations of the structures of eight folded target proteins from the native structures are in a reasonable range. In principle, this method is an improvement on the energy minimization approach.

17.
Li X, Hu C, Liang J. Proteins 2003, 53(4):792-805
Protein representation and potential function are two important ingredients for studying protein folding, equilibrium thermodynamics, and sequence design. We introduce a novel geometric representation of protein contact interactions using the edge simplices from the alpha shape of the protein structure. This representation can eliminate implausible neighbors that are not in physical contact, and can avoid spurious contact between two residues when a third residue is between them. We developed statistical alpha contact potential using an odds-ratio model. A studentized bootstrap method was then introduced to assess the 95% confidence intervals for each of the 210 propensity parameters. We found, with confidence, that there is significant long-range propensity (>30 residues apart) for hydrophobic interactions. We tested alpha contact potential for native structure discrimination using several sets of decoy structures, and found that it often performs comparably with atom-based potentials requiring many more parameters. We also show that accurate geometric representation is important, and that alpha contact potential has better performance than potential defined by cutoff distance between geometric centers of side chains. Hierarchical clustering of alpha contact potentials reveals natural grouping of residues. To explore the relationship between shape and physicochemical representations, we tested the minimum alphabet size necessary for native structure discrimination. We found that there is no significant difference in performance of discrimination when alphabet size varies from 7 to 20, if geometry is represented accurately by alpha simplicial edges. This result suggests that the geometry of packing plays an important role, but the specific residue types are often interchangeable.
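A minimal sketch of an odds-ratio contact propensity, assuming contacts (here, alpha-shape edges computed elsewhere) are given as residue-type pairs and that expected counts come from residue mole fractions under random pairing. The exact normalization in the paper may differ, pairs never observed are simply omitted, the studentized bootstrap is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def odds_ratio_propensities(contacts, composition):
    """Observed / expected contact counts for each unordered residue-type pair.

    contacts:    list of (type_a, type_b) contact edges
    composition: list of residue types in the structures (sets the mole fractions)
    """
    n_contacts = len(contacts)
    total = len(composition)
    frac = {t: c / total for t, c in Counter(composition).items()}
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    props = {}
    for (a, b), count in observed.items():
        # Random pairing expects p_a^2 * n for a == b and 2 * p_a * p_b * n otherwise.
        expected = frac[a] * frac[b] * n_contacts * (1 if a == b else 2)
        props[(a, b)] = count / expected
    return props

def contact_energy(props):
    """Convert propensities to pseudo-energies: E = -ln(odds ratio)."""
    return {pair: -math.log(p) for pair, p in props.items()}

# Toy data: leucine-leucine contacts are over-represented relative to composition.
props = odds_ratio_propensities([("L", "L"), ("L", "L"), ("A", "L"), ("L", "A")],
                                ["A", "L", "A", "L"])
# props[("L", "L")] == 2.0, props[("A", "L")] == 1.0
```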

18.
Statistical potentials based on pairwise interactions between Cα atoms are commonly used in protein threading/fold-recognition attempts. Inclusion of higher-order interactions is a possible means of improving the specificity of these potentials. Delaunay tessellation of the Cα-atom representation of protein structure has been suggested as a means of defining multi-body interactions. A large number of parameters are required to define all four-body interactions of 20 amino acid types (20^4 = 160,000). Assuming that residue order within a four-body contact is irrelevant reduces this to a manageable 8,855 parameters, derived using a nonredundant dataset of 608 protein structures. Three lines of evidence support the significance and utility of the four-body potential for sequence-structure matching. First, compared to the four-body model, all lower-order interaction models (three-body, two-body, one-body) are found statistically inadequate to explain the frequency distribution of residue contacts. Second, coherent patterns of interaction are seen in a graphic presentation of the four-body potential. Many patterns have plausible biophysical explanations and are consistent across sets of residues sharing certain properties (e.g., size, hydrophobicity, or charge). Third, the utility of the multi-body potential is tested on a set of 12 same-length pairs of proteins of known structure using two protocols: sequence-recognizes-structure, where a query sequence is threaded (without gaps) through the native and a non-native structure; and structure-recognizes-sequence, where a query structure is threaded by its native and another non-native sequence. Using cross-validated training, protein sequences correctly recognized their native structure in all 24 cases. Conversely, structures recognized the native sequence in 23 of 24 cases. Further, the score differences between correct and decoy structures increased significantly using the three- or four-body potential compared to potentials of lower order.
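The reduction from 160,000 to 8,855 parameters follows from counting multisets: if residue order within a four-body contact is ignored, the number of distinct contacts over 20 residue types is the number of combinations with repetition, C(20 + 4 - 1, 4) = C(23, 4) = 8,855. A quick check, with explicit enumeration as a cross-check:

```python
import math
from itertools import combinations_with_replacement

ordered = 20 ** 4                        # residue order within the contact matters
unordered = math.comb(20 + 4 - 1, 4)     # multisets of size 4 drawn from 20 types
# Count the unordered four-body contacts explicitly.
enumerated = sum(1 for _ in combinations_with_replacement(range(20), 4))
print(ordered, unordered, enumerated)    # 160000 8855 8855
```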

19.
Measurements of protein sequence-structure correlations
Crooks GE, Wolfe J, Brenner SE. Proteins 2004, 57(4):804-810
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and the corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among primary structure, secondary structure, and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
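The mutual information used above measures how much knowing one variable (say, the amino acid at a position) reduces uncertainty about another (say, the secondary structure state there or at a neighboring position). A minimal plug-in estimator from paired observations, in bits; the estimator choice and the function name are assumptions, and plug-in estimates are biased upward for small samples, so this is illustrative only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))) over observed pairs.
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Residue type fully determines the state here, so I = 1 bit.
mi_dependent = mutual_information("AAVV", "HHEE")
# Residue type tells nothing about the state here, so I = 0 bits.
mi_independent = mutual_information("AAVV", "HEHE")
```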

20.
We have improved the original Rosetta centroid/backbone decoy set by increasing the number of proteins and frequency of near native models and by building on sidechains and minimizing clashes. The new set consists of 1,400 model structures for 78 different and diverse protein targets and provides a challenging set for the testing and evaluation of scoring functions. We evaluated the extent to which a variety of all-atom energy functions could identify the native and close-to-native structures in the new decoy sets. Of various implicit solvent models, we found that a solvent-accessible surface area-based solvation provided the best enrichment and discrimination of close-to-native decoys. The combination of this solvation treatment with Lennard Jones terms and the original Rosetta energy provided better enrichment and discrimination than any of the individual terms. The results also highlight the differences in accuracy of NMR and X-ray crystal structures: a large energy gap was observed between native and non-native conformations for X-ray structures but not for NMR structures.
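Enrichment of close-to-native decoys can be quantified in several ways. One common definition, assumed here rather than taken from the paper, compares how many models fall in both the best-scoring fraction and the lowest-RMSD fraction with the number expected by chance (fraction² × N). A minimal sketch; the function name and the 10% default are hypothetical.

```python
def enrichment(scores, rmsds, fraction=0.1):
    """Overlap of the best-scoring and lowest-RMSD fractions, relative to chance.

    About 1.0 means no better than random; about 1/fraction is the maximum.
    Lower scores and lower RMSDs are both treated as better.
    """
    n = len(scores)
    k = max(1, round(fraction * n))
    by_score = sorted(range(n), key=lambda i: scores[i])[:k]
    by_rmsd = sorted(range(n), key=lambda i: rmsds[i])[:k]
    overlap = len(set(by_score) & set(by_rmsd))
    return overlap / (fraction * fraction * n)

# A scoring function that ranks 100 decoys exactly by RMSD reaches the maximum (about 10).
perfect = enrichment(list(range(100)), list(range(100)))
# One that ranks them in reverse order finds none of the near-native decoys.
reversed_order = enrichment(list(range(100)), list(range(99, -1, -1)))
```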


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417