Similar literature
1.
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing a fitness landscape for inverse protein folding, or protein design, with the aim of generating amino acid sequences that fold into an a priori determined structural fold, which would enable the engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), which outperforms several other statistical linear scoring functions and optimized linear functions. Our results further suggested that, for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.

2.
Side-chain modeling with an optimized scoring function
Modeling side-chain conformations on a fixed protein backbone has wide application in structure prediction and molecular design. Each effort in this field requires decisions about a rotamer set, scoring function, and search strategy. We have developed a new and simple scoring function, which operates on side-chain rotamers and consists of the following energy terms: contact surface, volume overlap, backbone dependency, electrostatic interactions, and desolvation energy. The weights of these energy terms were optimized to achieve the minimal average root mean square (rms) deviation between the lowest energy rotamer and the real side-chain conformation on a training set of high-resolution protein structures. In the course of optimization, for every residue, its side chain was replaced by varying rotamers, whereas conformations for all other residues were kept as they appeared in the crystal structure. We obtained a prediction accuracy of 90.4% for chi(1), 78.3% for chi(1 + 2), and 1.18 Å overall rms deviation. Furthermore, the derived scoring function combined with a Monte Carlo search algorithm was used to place all side chains onto a protein backbone simultaneously. The average prediction accuracy was 87.9% for chi(1), 73.2% for chi(1 + 2), and 1.34 Å rms deviation for 30 protein structures. Our approach was compared with available side-chain construction methods and showed improvement over the best among them: 4.4% for chi(1), 4.7% for chi(1 + 2), and 0.21 Å for rms deviation. We hypothesize that the scoring function, rather than the search strategy, is the main obstacle in side-chain modeling. Additionally, we show that a more detailed rotamer library is expected to increase chi(1 + 2) prediction accuracy but may have little effect on chi(1) prediction accuracy.

3.
The performance of the self-consistent mean field theory (SCMFT) method for side-chain modeling, employing rotamer energies calculated with the flexible rotamer model (FRM), is evaluated in the context of comparative modeling of protein structure. Predictions were carried out on a test set of 56 model backbones of varying accuracy, to allow side-chain prediction accuracy to be analyzed as a function of backbone accuracy. A progressive decrease in the accuracy of prediction was observed as backbone accuracy decreased. However, even for very low backbone accuracy, prediction accuracy was substantially higher than random, indicating that the FRM can, in part, compensate for the errors in the modeled tertiary environment. It was also investigated whether the introduction in the FRM-SCMFT method of knowledge-based biases, derived from a backbone-dependent rotamer library, could enhance its performance. A bias derived from the backbone-dependent rotamer conformations alone did not improve prediction accuracy. However, a bias derived from the backbone-dependent rotamer probabilities improved prediction accuracy considerably. This bias was incorporated through two different strategies. In one (the indirect strategy), rotamer probabilities were used to reject unlikely rotamers a priori, thus restricting prediction by FRM-SCMFT to a subset containing only the most probable rotamers in the library. In the other (the direct strategy), rotamer probabilities were transformed into pseudo-energies that were added to the average potential energies of the respective rotamers, thereby creating hybrid energy-based/knowledge-based average rotamer energies, which were used by the FRM-SCMFT method for prediction. For all degrees of backbone accuracy, an optimal strength of the knowledge-based bias existed for both strategies for which predictions were more accurate than pure energy-based predictions, and also than pure knowledge-based predictions.
Hybrid knowledge-based/energy-based methods were obtained from both strategies and compared with the SCWRL method, a hybrid method based on the same backbone-dependent rotamer library. The accuracy of the indirect method was approximately the same as that of the SCWRL method, but that of the direct method was significantly higher.
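
The two ways of using rotamer probabilities described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not give the transformation, so the form E_hybrid = E_avg - bias_strength * kT * ln(p), the default kT value, and both function names are assumptions rather than the FRM-SCMFT implementation.

```python
import math

def hybrid_rotamer_energies(avg_energies, probabilities, bias_strength, kT=0.593):
    """Direct strategy (assumed form): add a pseudo-energy to each rotamer energy.

    E_hybrid = E_avg - bias_strength * kT * ln(p)
    kT defaults to about 0.593 kcal/mol (roughly 298 K); this is an assumption.
    """
    if len(avg_energies) != len(probabilities):
        raise ValueError("one probability is needed per rotamer")
    hybrid = []
    for energy, p in zip(avg_energies, probabilities):
        if p <= 0.0:
            # A rotamer with zero library probability is treated as forbidden.
            hybrid.append(math.inf)
        else:
            hybrid.append(energy - bias_strength * kT * math.log(p))
    return hybrid

def filter_unlikely_rotamers(probabilities, min_probability):
    """Indirect strategy: keep indices of rotamers at or above a probability cutoff."""
    return [i for i, p in enumerate(probabilities) if p >= min_probability]
```

Setting bias_strength to 0 recovers the pure energy-based energies, which is one way to picture the "strength of the knowledge-based bias" that the abstract reports tuning.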

4.
We present a novel, knowledge-based method for the side-chain addition step in protein structure modeling. The foundation of the method is a conditional probability equation, which specifies the probability that a side-chain will occupy a specific rotamer state, given a set of evidence about the rotamer states adopted by the side-chains at aligned positions in structurally homologous crystal structures. We demonstrate that our method increases the accuracy of homology model side-chain addition when compared with the widely employed practice of preserving the side-chain conformation from the homology template to the target at conserved residue positions. Furthermore, we demonstrate that our method accurately estimates the probability that the correct rotamer state has been selected. This interesting result implies that our method can be used to understand the reliability of each and every side-chain in a protein homology model.

5.
Extending the accuracy limits of prediction for side-chain conformations
Current techniques for the prediction of side-chain conformations on a fixed backbone have an accuracy limit of about 1.0-1.5 Å rmsd for core residues. We have carried out a detailed and systematic analysis of the factors that influence the prediction of side-chain conformation and, on this basis, have succeeded in extending the limits of side-chain prediction for core residues to about 0.7 Å rmsd from native, and 94% and 89% of chi(1) and chi(1+2) dihedral angles correctly predicted to within 20 degrees of native, respectively. These results are obtained using a force-field that accounts for only van der Waals interactions and torsional potentials. Prediction accuracy is strongly dependent on the rotamer library used. That is, a complete and detailed rotamer library is essential. The greatest accuracy was obtained with an extensive rotamer library, containing over 7560 members, in which bond lengths and bond angles were taken from the database rather than simply assuming idealized values. Perhaps the most surprising finding is that the combinatorial problem normally associated with the prediction of the side-chain conformation does not appear to be important. This conclusion is based on the fact that the prediction of the conformation of a single side-chain with all others fixed in their native conformations is only slightly more accurate than the simultaneous prediction of all side-chain dihedral angles.

6.
Combinatorial sequence optimization for protein design requires libraries of discrete side-chain conformations. The discreteness of these libraries is problematic, particularly for long, polar side chains, since favorable interactions can be missed. Previously, an approach to loop remodeling where protein backbone movement is directed by side-chain rotamers predicted to form interactions previously observed in native complexes (termed "motifs") was described. Here, we show how such motif libraries can be incorporated into combinatorial sequence optimization protocols and improve native complex recapitulation. Guided by the motif rotamer searches, we made improvements to the underlying energy function, increasing recapitulation of native interactions. To further test the methods, we carried out a comprehensive experimental scan of amino acid preferences in the I-AniI protein-DNA interface and found that many positions tolerated multiple amino acids. This sequence plasticity is not observed in the computational results because of the fixed-backbone approximation of the model. We improved modeling of this diversity by introducing DNA flexibility and reducing the convergence of the simulated annealing algorithm that drives the design process. In addition to serving as a benchmark, this extensive experimental data set provides insight into the types of interactions essential to maintain the function of this potential gene therapy reagent.

7.
Amir ED, Kalisman N, Keasar C. Proteins 2008, 72(1): 62-73
Rotatable torsion angles are the major degrees of freedom in proteins. Adjacent angles are highly correlated and energy terms that rely on these correlations are intensively used in molecular modeling. However, the utility of torsion based terms is not yet fully exploited. Many of these terms do not capture the full scale of the correlations. Other terms, which rely on lookup tables, cannot be used in the context of force-driven algorithms because they are not fully differentiable. This study aims to extend the usability of torsion terms by presenting a set of high-dimensional and fully-differentiable energy terms that are derived from high-resolution structures. The set includes terms that describe backbone conformational probabilities and propensities, side-chain rotamer probabilities, and an elaborate term that couples all the torsion angles within the same residue. The terms are constructed by cubic spline interpolation with periodic boundary conditions that enable full differentiability and high computational efficiency. We show that the spline implementation does not compromise the accuracy of the original database statistics. We further show that the side-chain relevant terms are compatible with established rotamer probabilities. Despite their very local characteristics, the new terms are often able to identify native and native-like structures within decoy sets. Finally, force-based minimization of NMR structures with the new terms improves their torsion angle statistics with minor structural distortion (0.5 Å RMSD on average). The new terms are freely available in the MESHI molecular modeling package. The spline coefficients are also available as a documented MATLAB file.
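
As a rough illustration of why a periodic cubic spline makes a tabulated torsion term differentiable, the sketch below interpolates a one-dimensional energy table on a uniform angular grid and returns both the energy and its gradient. It is a minimal pure-Python sketch under stated assumptions (one dimension, uniform grid, Gauss-Seidel solver, hypothetical function names); it is not the MESHI implementation, whose terms are high-dimensional.

```python
import math

def periodic_spline_second_derivs(values, spacing, sweeps=200):
    """Second derivatives of a periodic cubic spline on a uniform grid.

    Solves M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1]) / h**2
    with wrap-around indices, using Gauss-Seidel sweeps (the system is
    diagonally dominant, so the iteration converges).
    """
    n = len(values)
    rhs = [6.0 * (values[i - 1] - 2.0 * values[i] + values[(i + 1) % n]) / spacing ** 2
           for i in range(n)]
    m = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            m[i] = (rhs[i] - m[i - 1] - m[(i + 1) % n]) / 4.0
    return m

def torsion_energy_and_gradient(angle, values, second_derivs, spacing):
    """Evaluate the spline and its first derivative at any angle.

    The angle uses the same units as spacing; the period is len(values) * spacing.
    """
    n = len(values)
    period = n * spacing
    x = angle % period                      # wrap into [0, period]
    i = min(int(x // spacing), n - 1)       # left knot, guarded against rounding
    j = (i + 1) % n                         # right knot wraps around the period
    t = x - i * spacing
    u = spacing - t
    a = values[i] / spacing - second_derivs[i] * spacing / 6.0
    b = values[j] / spacing - second_derivs[j] * spacing / 6.0
    energy = ((second_derivs[i] * u ** 3 + second_derivs[j] * t ** 3) / (6.0 * spacing)
              + a * u + b * t)
    gradient = ((second_derivs[j] * t ** 2 - second_derivs[i] * u ** 2) / (2.0 * spacing)
                - a + b)
    return energy, gradient
```

Because the gradient is continuous across knots and across the wrap-around point, a force-driven minimizer can use such a term directly, which a plain lookup table does not allow.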

8.
MOTIVATION: Protein design aims to identify sequences compatible with a given protein fold but incompatible with any alternative folds. To select the correct sequences and to guide the search process, a design scoring function is critically important. Such a scoring function should be able to characterize the global fitness landscape of many proteins simultaneously. RESULTS: To find optimal design scoring functions, we introduce two geometric views and propose a formulation using a mixture of non-linear Gaussian kernel functions. We aim to solve a simplified protein sequence design problem. Our goal is to distinguish each native sequence for a major portion of representative protein structures from a large number of alternative decoy sequences, each a fragment from proteins of different folds. Our scoring function discriminates perfectly a set of 440 native proteins from 14 million sequence decoys. We show that no linear scoring function can succeed in this task. In a blind test of unrelated proteins, our scoring function misclassifies only 13 native proteins out of 194. This compares favorably with about three to four times more misclassifications when optimal linear functions reported in the literature are used. We also discuss how to develop a protein folding scoring function.
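
A scoring function built from a mixture of Gaussian kernels over a basis set of natives and decoys might look roughly like the sketch below. The feature vectors, weights, kernel width, sign convention, and function names are all assumptions for illustration; the abstract does not specify them, and in practice the weights would come from training.

```python
import math

def gaussian_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) for equal-length vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))

def kernel_score(candidate, native_basis, decoy_basis, native_weights, decoy_weights,
                 sigma=1.0, offset=0.0):
    """Nonlinear score; negative values mean 'native-like' in this sketch.

    Natives in the basis set pull the score down, decoys push it up.
    """
    score = offset
    for vec, w in zip(native_basis, native_weights):
        score -= w * gaussian_kernel(candidate, vec, sigma)
    for vec, w in zip(decoy_basis, decoy_weights):
        score += w * gaussian_kernel(candidate, vec, sigma)
    return score
```

Unlike a linear function of the feature vector, the decision surface here can bend around clusters of decoys, which is the property the abstract relies on when it reports that no linear function separates the training set.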

9.
Here we report an orientation-dependent statistical all-atom potential derived from side-chain packing, named OPUS-PSP. It features a basis set of 19 rigid-body blocks extracted from the chemical structures of all 20 amino acid residues. The potential is generated from the orientation-specific packing statistics of pairs of those blocks in a non-redundant structural database. The purpose of such an approach is to capture the essential elements of orientation dependence in molecular packing interactions. Tests of OPUS-PSP on commonly used decoy sets demonstrate that it significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and consistency in achieving high Z-scores across decoy sets. As OPUS-PSP excludes interactions among main-chain atoms, its success highlights the crucial importance of side-chain packing in forming native protein structures. Moreover, OPUS-PSP does not explicitly include solvation terms, and thus the potential should perform well when the solvation effect is difficult to determine, such as in membrane proteins. Overall, OPUS-PSP is a generally applicable potential for protein structure modeling, especially for handling side-chain conformations, one of the most difficult steps in high-accuracy protein structure prediction and refinement.

10.
We introduce a new algorithm, IRECS (Iterative REduction of Conformational Space), for identifying ensembles of most probable side-chain conformations for homology modeling. On the basis of a given rotamer library, IRECS ranks all side-chain rotamers of a protein according to the probability with which each side chain adopts the respective rotamer conformation. This ranking enables the user to select small rotamer sets that are most likely to contain a near-native rotamer for each side chain. IRECS can therefore act as a fast heuristic alternative to the Dead-End-Elimination algorithm (DEE). In contrast to DEE, IRECS allows for the selection of rotamer subsets of arbitrary size, thus being able to define structure ensembles for a protein. We show that the selection of more than one rotamer per side chain is generally meaningful, since the selected rotamers represent the conformational space of flexible side chains. A knowledge-based statistical potential ROTA was constructed for the IRECS algorithm. The potential was optimized to discriminate between side-chain conformations of native and rotameric decoys of protein structures. By restricting the number of rotamers per side chain to one, IRECS can optimize side chains for a single conformation model. The average accuracy of IRECS for the chi1 and chi1+2 dihedral angles amounts to 84.7% and 71.6%, respectively, using a 40 degrees cutoff. When we compared IRECS with SCWRL and SCAP, the performance of IRECS was comparable to that of both methods. IRECS and the ROTA potential are available for download from the URL http://irecs.bioinf.mpi-inf.mpg.de.
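
The chi1 and chi1+2 accuracies quoted in abstracts like this one are usually computed by comparing predicted and native dihedral angles within a cutoff. A minimal sketch follows, assuming angles in degrees, hypothetical function names, and no special handling of symmetric side chains (such as the 180-degree flip of a Phe or Tyr ring), which real benchmarks often treat separately.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi_accuracy(predicted, native, cutoff=40.0):
    """Return (chi1 accuracy, chi1+2 accuracy) as fractions.

    predicted and native are lists of per-residue chi angle lists in degrees,
    e.g. [[-65.0, 175.0], [60.0]]. chi1+2 is scored only for residues that
    have a chi2 angle, and counts as correct only if chi1 and chi2 both match.
    """
    chi1_ok = chi1_total = chi12_ok = chi12_total = 0
    for pred, nat in zip(predicted, native):
        if not nat:
            continue  # residues without side-chain dihedrals (Gly, Ala)
        first = angle_difference(pred[0], nat[0]) <= cutoff
        chi1_total += 1
        chi1_ok += first
        if len(nat) >= 2:
            chi12_total += 1
            chi12_ok += first and angle_difference(pred[1], nat[1]) <= cutoff
    return (chi1_ok / chi1_total if chi1_total else 0.0,
            chi12_ok / chi12_total if chi12_total else 0.0)
```

The cutoff matters when comparing methods: a 20-degree criterion is stricter than the 40-degree one used here, so percentages from different abstracts on this page are not directly comparable.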

11.
Zhang J, Zhang Y. PLoS ONE 2010, 5(10): e15386

Background

An accurate potential function is essential to attack protein folding and structure prediction problems. The key to developing efficient knowledge-based potential functions is to design reference states that can appropriately counteract generic interactions. However, the reference states of many knowledge-based distance-dependent atomic potential functions were derived from non-interacting particles such as an ideal gas, which ignores the inherent sequence connectivity and entropic elasticity of proteins.

Methodology

We first developed a new pair-wise distance-dependent, atomic statistical potential function (RW), using an ideal random-walk chain as the reference state, which was optimized on CASP models and then benchmarked on nine structural decoy sets. Second, we incorporated a new side-chain orientation-dependent energy term into RW (RWplus) and found that side-chain packing orientation specificity can further improve the decoy recognition ability of the statistical potential.

Significance

RW and RWplus demonstrate a significantly better ability than the best performing pair-wise distance-dependent atomic potential functions in both native and near-native model selections. They have higher energy-RMSD and energy-TM-score correlations compared with other potentials of the same type in real-life structure assembly decoys. When benchmarked with a comprehensive list of publicly available potentials, RW and RWplus show comparable performance to the state-of-the-art scoring functions, including those combining terms from multiple resources. These data demonstrate the usefulness of the random-walk chain as a reference state that correctly accounts for the sequence connectivity and entropic elasticity of proteins. The potentials show potential usefulness in structure recognition and protein folding simulations. The RW and RWplus potentials, as well as the newly generated I-TASSER decoys, are freely available at http://zhanglab.ccmb.med.umich.edu/RW.
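
The role of the reference state in a distance-dependent statistical potential can be seen in the generic inverse-Boltzmann form E(r) = -kT ln(P_obs(r) / P_ref(r)). The sketch below takes the reference histogram as an input, so it does not implement the random-walk reference of RW itself; the pseudocount, the kT value, and the function names are assumptions.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=0.593, pseudocount=1.0):
    """Per-bin energies E(r) = -kT * ln(P_obs(r) / P_ref(r)).

    Both inputs are histograms over the same distance bins for one atom-pair
    type. A pseudocount avoids log(0) for empty bins (an assumed choice).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must share the same bins")
    bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * bins
    ref_total = sum(reference_counts) + pseudocount * bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

def score_model(pair_bins, energies):
    """Sum bin energies over the distance-bin index of every atom pair in a model."""
    return sum(energies[b] for b in pair_bins)
```

Swapping the reference histogram (ideal gas, random-walk chain, or another model) changes every bin energy, which is why the choice of reference state dominates the behavior of potentials of this type.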

12.
The excluded volume occupied by protein side-chains and the requirement of high packing density in the protein interior should severely limit the number of side-chain conformations compatible with a given native backbone. To examine the relationship between side-chain geometry and side-chain packing, we use an all-atom Monte Carlo simulation to sample the large space of side-chain conformations. We study three models of excluded volume and use umbrella sampling to effectively explore the entire space. We find that while excluded volume constraints reduce the size of conformational space by many orders of magnitude, the number of allowed conformations is still large. An average repacked conformation has 20% of its chi angles in a non-native state, a marked reduction from the expected 67% in the absence of excluded volume. Interestingly, well-packed conformations with up to 50% non-native chi angles exist. The repacked conformations have native packing density as measured by a standard Voronoi procedure. Entropy is distributed non-uniformly over positions, and we partially explain the observed distribution using rotamer probabilities derived from the Protein Data Bank database. In several cases, native rotamers that occur infrequently in the database are seen with high probability in our simulation, indicating that sequence-specific excluded volume interactions can stabilize rotamers that are rare for a given backbone. In spite of our finding that 65% of the native rotamers and 85% of chi(1) angles can be predicted correctly on the basis of excluded volume only, 95% of positions can accommodate more than one rotamer in simulation. We estimate that, in order to quench the side-chain entropy observed in the presence of excluded volume interactions, other interactions (hydrophobic, polar, electrostatic) must provide an additional stabilization of at least 0.6 kT per residue in order to single out the native state.

13.
Hu X, Kuhlman B. Proteins 2006, 62(3): 739-748
Loss of side-chain conformational entropy is an important force opposing protein folding, and the relative preferences of the amino acids for being buried or solvent exposed may be partially determined by which amino acids lose more side-chain entropy when placed in the core of a protein. To investigate these preferences, we have incorporated explicit modeling of side-chain entropy into the protein design algorithm, RosettaDesign. In the standard version of the program, the energy of a particular sequence for a fixed backbone depends only on the lowest energy side-chain conformations that can be identified for that sequence. In the new model, the free energy of a single amino acid sequence is calculated by evaluating the average energy and entropy of an ensemble of structures generated by Monte Carlo sampling of amino acid side-chain conformations. To evaluate the impact of including explicit side-chain entropy, sequences were designed for 110 native protein backbones with and without the entropy model. In general, the differences between the two sets of sequences are modest, with the largest changes being observed for the longer amino acids: methionine and arginine. Overall, the identity between the designed sequences and the native sequences does not increase with the addition of entropy, unlike what is observed when other key terms are added to the model (hydrogen bonding, Lennard-Jones energies, and solvation energies). These results suggest that side-chain conformational entropy has a relatively small role in determining the preferred amino acid at each residue position in a protein.

14.
15.
Graph representations are traditionally used to represent protein structures in sequence design protocols in which the protein backbone conformation is known. This infrequently extends to machine learning projects: existing graph convolution algorithms have shortcomings when representing protein environments. One reason for this is the lack of emphasis on edge attributes during message-passing operations. Another reason is the traditionally shallow nature of graph neural network architectures. Here we introduce an improved message-passing operation that is better equipped to model local kinematics problems such as protein design. Our approach, XENet, pays special attention to both incoming and outgoing edge attributes. We compare XENet against existing graph convolutions in an attempt to decrease rotamer sample counts in Rosetta’s rotamer substitution protocol, used for protein side-chain optimization and sequence design. This use case is motivating because it both reduces the size of the search space for classical side-chain optimization algorithms, and allows larger protein design problems to be solved with quantum algorithms on near-term quantum computers with limited qubit counts. XENet outperformed competing models while also displaying a greater tolerance for deeper architectures. We found that XENet was able to decrease rotamer counts by 40% without loss in quality. This decreased the memory consumption for classical pre-computation of rotamer energies in our use case by more than a factor of 3, the qubit consumption for an existing sequence design quantum algorithm by 40%, and the size of the solution space by a factor of 165. Additionally, XENet displayed an ability to handle deeper architectures than competing convolutions.

16.
The structure-based design of protein–ligand interfaces with respect to different small molecules is of great significance in the discovery of functional proteins. By statistical analysis of a set of protein–ligand complex structures, it was determined that water-mediated hydrogen bonding at the protein–ligand interface plays a crucial role in governing the binding between the protein and the ligand. Based on these novel statistical results, a solvated ligand rotamer approach was developed to explicitly describe the key water molecules at the protein–ligand interface, and a water-mediated hydrogen bonding model was applied in the computational protein design context to complement the continuum solvent model. The solvated ligand rotamer approach produces only one additional solvated rotamer for each rotamer in the ligand rotamer library and does not change the number of side-chain rotamers at each protein design site. This greatly reduces the total combinatorial number in sequence selection for protein design, and the accuracy of the model was confirmed by two tests. For the water placement test, 61% of the crystal water molecules were predicted correctly in five protein–ligand complex structures. For the sequence recapitulation test, 44.7% of the amino acid identities were recovered using the solvated ligand rotamer approach and the water-mediated hydrogen bonding model, while only 30.4% were recovered when the explicitly bound waters were removed. These results indicated that the developed solvated ligand rotamer approach is promising for functional protein design targeting novel protein–ligand interactions.

17.
It is widely believed that the dominant force opposing protein folding is the entropic cost of restricting internal rotations. The energetic changes from restricting side-chain torsional motion are more complex than simply a loss of conformational entropy, however. A second force opposing protein folding arises when a side-chain in the folded state is not in its lowest-energy rotamer, giving rotameric strain. Chi strain energy results from a dihedral angle being shifted from the most stable conformation of a rotamer when a protein folds. We calculated the energy of a side-chain as a function of its dihedral angles in a poly(Ala) helix. Using these energy profiles, we quantify conformational entropy, rotameric strain energy and chi strain energy for all 17 amino acid residues with side-chains in alpha-helices. We can calculate these terms for any amino acid in a helix interior in a protein, as a function of its side-chain dihedral angles, and have implemented this algorithm on a web page. The mean change in rotameric strain energy on folding is 0.42 kcal mol⁻¹ per residue and the mean chi strain energy is 0.64 kcal mol⁻¹ per residue. Loss of conformational entropy opposes folding by a mean of 1.1 kcal mol⁻¹ per residue, and the mean total force opposing restricting a side-chain into a helix is 2.2 kcal mol⁻¹. Conformational entropy estimates alone therefore greatly underestimate the forces opposing protein folding. The introduction of strain when a protein folds should not be neglected when attempting to quantify the balance of forces affecting protein stability. Consideration of rotameric strain energy may help the use of rotamer libraries in protein design and rationalise the effects of mutations where side-chain conformations change.
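
One common way to turn a side-chain dihedral energy profile into a conformational entropy is to Boltzmann-weight the profile on a grid and apply S = -R Σ p ln p. The sketch below shows that calculation only; the grid discretisation, the temperature, and the function names are assumptions, and it is not the algorithm behind the authors' web page.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(energies, temperature=298.0):
    """Boltzmann weights for a discretised dihedral energy profile (kcal/mol)."""
    rt = R_KCAL * temperature
    lowest = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_term(energies, temperature=298.0):
    """Return -T*S in kcal/mol, with S = -R * sum(p * ln p)."""
    populations = boltzmann_populations(energies, temperature)
    s = -R_KCAL * sum(p * math.log(p) for p in populations if p > 0.0)
    return -temperature * s
```

Under these assumptions, the entropic cost of restricting a side chain would be estimated as the difference between entropy_term for the restricted (folded) profile and for the freer (unfolded) profile; the strain terms in the abstract are separate contributions added on top of that.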

18.
Negron C, Fufezan C, Koder RL. Proteins 2009, 74(2): 400-416
Helical bundles which bind heme and porphyrin cofactors have been popular targets for cofactor-containing de novo protein design. By analyzing a highly nonredundant subset of the Protein Data Bank we have determined a rotamer distribution for helical histidines bound to heme cofactors. Analysis of the entire nonredundant database for helical sequence preferences near the ligand histidine demonstrated little preference for amino acid side chain identity, size, or charge. Analysis of the database subdivided by ligand histidine rotamer, however, reveals strong preferences in each case, and computational modeling illuminates the structural basis for some of these findings. The majority of the rotamer distribution matches that predicted by molecular simulation of a single porphyrin-bound histidine residue placed in the center of an all-alanine helix, and the deviations explain two prominent features of natural heme protein binding sites: heme distortion in the case of the cytochromes c in the m166 histidine rotamer, and a highly prevalent glycine residue in the t73 histidine rotamer. These preferences permit derivation of helical consensus sequence templates which predict optimal side chain-cofactor packing interactions for each rotamer. These findings thus promise to guide future design endeavors not only in the creation of higher affinity heme and porphyrin binding sites, but also in the direction of bound cofactor geometry.

19.
Vicinal coupling constants between various nuclei provide backbone and side-chain conformational information for a series of asparagine- and tyrosine-containing peptides in DMSO and in H2O. By enriching Tyr of Ac-Asn-Pro-Tyr-NHMe with 15N, it has been possible to distinguish between the resonances of the two side-chain beta protons of Tyr. Analysis of the coupling constants in terms of the distributions of side-chain conformations in these peptides indicates that the addition of Asn to the Pro-Tyr sequence leads to a less random conformational distribution. When compared to the side-chain rotamer distribution of Ac-Asn-NHMe and Ac-Tyr-NHMe, particular Asn and Tyr side-chain conformations of Ac-Asn-Pro-Tyr-NHMe are stabilized in dimethylsulfoxide solution. The interaction(s) which stabilize a unique Tyr side-chain conformation of Ac-Asn-Pro-Tyr-NHMe in dimethylsulfoxide are not present in Ac-Ala-Pro-Tyr-NHMe and are unaffected by the addition of Val-Pro to the C-terminus of Asn-Pro-Tyr. In water, a preferential stabilization of one Asn side-chain conformation of Ac-Asn-Pro-Tyr-NHMe is also observed, while the Tyr side-chain rotamer distribution is similar to that of Ac-Tyr-NHMe. An interaction between the Asn side chain and the Pro-Tyr-NHMe backbone was previously shown to stabilize a beta-bend conformation at Pro-Tyr in water. Data are also presented for Ac-Tyr-Pro-Asn-NHMe, for which local interactions do not stabilize particular backbone conformations in dimethylsulfoxide or in water. The conformations of the peptides studied here are relatively insensitive to temperatures between 27 degrees and 62 degrees, both in dimethylsulfoxide and in water. The sequences Asn-Pro-Tyr and Tyr-Pro-Asn occur in ribonuclease A, and these tripeptides serve as models for the interactions involved in the folding of this protein.

20.