Similar literature
20 similar records found
1.
Catalytic activity and protein-protein recognition have proven to be significant challenges for computational protein design. Electrostatic interactions are crucial for these and other protein functions, and therefore accurate modeling of electrostatics is necessary for successfully advancing protein design into the realm of protein function. This review focuses on recent progress in modeling electrostatic interactions in computational protein design, with particular emphasis on continuum models.

2.
Deep learning approaches have produced substantial breakthroughs in fields such as image classification and natural language processing and are making rapid inroads in the area of protein design. Many generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins. These generative models can learn protein representations that are often more informative of protein structure and function than hand-engineered features. Furthermore, they can be used to quickly propose millions of novel proteins that resemble their native counterparts in terms of expression level, stability, or other attributes. The protein design process can further be guided by discriminative oracles to select the candidates with the highest probability of having the desired properties. In this review, we discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model-guided protein design.

3.
Increasingly complex schemes for representing solvent effects implicitly are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Four implicit solvation models (a surface area-based empirical model, two models that approximate the generalized Born treatment, and a finite difference Poisson-Boltzmann method) are challenged in situations differing from those in which these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select, from a large number of alternatives, sequences that stabilize a given protein 3D structure. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures, belonging respectively to decoys built with random sequences and to native protein crystal structures. In addition, we perform actual sequence design calculations. Except for the crudest, surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. On the other hand, we show that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. We conclude that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
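
The surface area-based empirical model is the simplest of the schemes compared: each atom's solvation free energy is taken as proportional to its solvent-accessible surface area (SASA). A minimal sketch, assuming per-atom SASA values are already computed and using hypothetical atomic solvation parameters (the SIGMA values are illustrative, not those of any published model), shows how the cost of burying a residue follows as the difference between its buried-state and exposed-state terms:

```python
# Hypothetical atomic solvation parameters in kcal/(mol*A^2); illustrative only.
# Positive sigma: exposure is unfavorable (hydrophobic); negative: exposure is favorable (polar).
SIGMA = {"C": 0.012, "N": -0.060, "O": -0.060, "S": 0.012}

def solvation_energy(atoms):
    """Sum sigma(type) * SASA over (atom_type, sasa_in_A2) pairs."""
    return sum(SIGMA[atom_type] * sasa for atom_type, sasa in atoms)

def burial_cost(exposed_atoms, buried_atoms):
    """Free-energy change when a residue moves from an exposed to a buried environment."""
    return solvation_energy(buried_atoms) - solvation_energy(exposed_atoms)

# A toy residue with one carbon and one oxygen, exposed vs. mostly buried.
exposed = [("C", 40.0), ("O", 30.0)]
buried = [("C", 5.0), ("O", 2.0)]
print(round(burial_cost(exposed, buried), 3))
```

With these made-up parameters, burying the polar oxygen dominates and the burial cost comes out positive, while burying a purely carbon residue would be favorable; a model that gets these signs wrong will systematically favor polar residues in the core.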

4.
A long-standing goal of computational protein design is to create proteins similar to those found in Nature. One motivation is to harness the exquisite functional capabilities of proteins for our own purposes. The extent of similarity between designed and natural proteins also reports on how faithfully our models represent the selective pressures that determine protein sequences. As the field of protein design shifts emphasis from reproducing native-like protein structure to function, it has become important that these models treat the notion of specificity in molecular interactions. Although specificity may, in some cases, be achieved by optimization of a desired protein in isolation, methods have been developed to address directly the desire for proteins that exhibit specific functions and interactions.

5.
Functional genomics and proteomics are identifying many potential drug targets for novel therapeutic proteins, and both rational and combinatorial protein engineering methods are available for creating drug candidates. A central challenge is defining the most appropriate design criteria, a task that will benefit critically from computational kinetic models integrating processes from the molecular level to the whole-system level. Interpreting these processes will require mathematical models, refined in combination with relevant data from quantitative assays, to set the biophysical objectives for protein design correctly.

6.
Computational protein design can generate proteins not found in nature that adopt desired structures and perform novel functions. Although proteins could, in theory, be designed with ab initio methods, practical success has come from using large amounts of data that describe the sequences, structures, and functions of existing proteins and their variants. We present recent creative uses of multiple-sequence alignments, protein structures, and high-throughput functional assays in computational protein design. Approaches range from enhancing structure-based design with experimental data to building regression models to training deep neural nets that generate novel sequences. Looking ahead, deep learning will be increasingly important for maximizing the value of data for protein design.

7.
Protein design aims to create new protein molecules with desired structure and functionality. One of the major obstacles to large-scale protein design is the extensive time and manpower required for experimental validation of designed sequences. Recent advances in protein structure prediction have made it possible to assess designed sequences automatically via folding simulations. We present a new protocol for protein design and validation. The sequence space is first searched by Monte Carlo sampling guided by a public atomic potential, and candidate sequences are selected by clustering the sequence decoys. The designed sequences are then assessed by I-TASSER folding simulations, which generate full-length atomic structural models by iterative assembly of threading fragments. The protocol is tested on 52 nonhomologous single-domain proteins, with an average sequence identity of 24% between the designed and native sequences. Despite this low sequence identity, the three-dimensional models predicted for the first designed sequence have an RMSD of < 2 Å to the target structure in 62% of cases. This percentage increases to 77% if the three-dimensional models from the top 10 designed sequences are considered. Such striking consistency between the target structure and the structures predicted from nonhomologous sequences, even though the design and folding algorithms use completely different force fields, indicates that the design algorithm captures the features essential to the global fold of the target. On average, the designed sequences have a free energy 0.39 kcal/(mol·residue) lower than that of the native sequences, potentially conferring greater stability on the synthesized target folds.
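
The first stage of this protocol, Monte Carlo search of sequence space guided by an energy function, can be sketched with the Metropolis acceptance criterion. The energy below is a hypothetical toy (mismatches against a preferred sequence) standing in for the atomic potential used by the authors; the decoy clustering and I-TASSER validation steps are not shown, and the temperature and step count are illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq, preferred):
    """Hypothetical energy: +1 for every position that differs from a preferred residue."""
    return sum(1.0 for a, b in zip(seq, preferred) if a != b)

def metropolis_design(length, energy_fn, steps=5000, temperature=0.1, seed=0):
    """Metropolis Monte Carlo over sequences using single-point mutation moves.

    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    energy = energy_fn(seq)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = energy_fn(seq)
        delta = new_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # rejected: restore the previous residue
    return "".join(best_seq), best_energy

target = "MKTAYIAKQR"
best, e = metropolis_design(len(target), lambda s: toy_energy(s, target))
print(best, e)
```

In a real design run the energy function couples positions through side-chain interactions, so the landscape is rugged and the temperature controls the trade-off between exploration and convergence; the toy energy here is separable and only demonstrates the mechanics.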

8.
Different potential energy functions have predominated in protein dynamics simulations, protein design calculations, and protein structure prediction. Clearly, the same physics applies in all three cases. The differences in potential energy functions reflect differences in how the calculations are performed. With improvements in computer power and algorithms, the same potential energy function should be applicable to all three problems. In this review, we examine energy functions currently used for protein design, and look to the molecular mechanics field for advances that could be used in the next generation of design algorithms. In particular, we focus on improved models of the hydrophobic effect, polarization and hydrogen bonding.

9.
The design of novel metal-ion binding sites along symmetry axes in protein oligomers could provide new avenues for metalloenzyme design, construction of protein-based nanomaterials and novel ion transport systems. Here, we describe a computational design method, symmetric protein recursive ion-cofactor sampling (SyPRIS), for locating constellations of backbone positions within oligomeric protein structures that are capable of supporting desired symmetrically coordinated metal ion(s) chelated by sidechains (chelant model). Using SyPRIS on a curated benchmark set of protein structures with symmetric metal binding sites, we found high recovery of native metal-coordinating rotamers: in 65 of the 67 (97.0%) cases, native rotamers featured in the best-scoring model, while in the remaining cases native rotamers were found within the top three scoring models. In a second test, chelant models were cross-matched against protein structures with identical cyclic symmetry. In addition to recovering all native placements, 10.4% (8939/86013) of the non-native placements had acceptable geometric compatibility scores. Discrimination between native and non-native metal site placements was further enhanced by constrained energy minimization using the Rosetta energy function. Upon sequence design of the surrounding first-shell residues, we found further stabilization of native placements and a small but significant number (1.7%) of non-native placement-based sites with favorable Rosetta energies, indicating their designability in existing protein interfaces. The generality of the SyPRIS approach allows the design of novel symmetric metal sites, including ones with non-natural amino acid sidechains, and should enable the predictive incorporation of a variety of metal-containing cofactors at symmetric protein interfaces.

10.
The vast expansion of protein sequence databases provides an opportunity for new protein design approaches that seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70,000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether, 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
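
The novelty figure quoted above (35 differences relative to the nearest training-set sequence) is a nearest-neighbour Hamming distance. A minimal sketch, assuming generated and training sequences are already aligned to equal length with gaps written as "-" (how the authors handled alignment and gaps is not described here, and the sequences below are made up):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_to_nearest(candidate: str, training_set: list[str]) -> int:
    """Smallest Hamming distance from a generated variant to any training sequence."""
    return min(hamming(candidate, t) for t in training_set)

# Hypothetical aligned sequences.
training = ["MKV-LLA", "MRIELLG", "AKIEILG"]
print(distance_to_nearest("MKVELLA", training))
```

Measuring distance to the nearest training sequence, rather than to the wild type alone, guards against counting a variant as novel when it merely recombines or copies a close homolog from the training data.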

11.
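A simple and effective protein design method based entirely on physical principles is proposed. Compared with similar work, the method greatly reduces the search required over sequence space, simplifying and extending earlier approaches. Tests on three two-dimensional lattice models show that the method is successful. The method can be further extended to three-dimensional off-lattice models of real proteins.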
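
The abstract does not state its energy function, but lattice-model design studies commonly use the two-letter HP (hydrophobic-polar) model. As an illustrative sketch under that assumption, the energy of a sequence placed on a fixed 2D square-lattice conformation is minus the number of non-bonded H-H nearest-neighbour contacts, and design means finding the sequence that minimizes it. The brute-force search below (hypothetical helper names, fixed H count to avoid the trivial all-H solution) is only feasible for tiny chains, which is precisely the sequence-space cost the proposed method aims to reduce.

```python
from itertools import product

def hp_energy(seq, coords):
    """HP-model energy on a square lattice: -1 per non-bonded H-H nearest-neighbour contact."""
    energy = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbours (j = i + 1)
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if seq[i] == "H" and seq[j] == "H" and abs(xi - xj) + abs(yi - yj) == 1:
                energy -= 1
    return energy

def design_exhaustive(coords, n_hydrophobic):
    """Brute-force search for the lowest-energy sequence with a fixed number of H residues."""
    best = None
    for letters in product("HP", repeat=len(coords)):
        seq = "".join(letters)
        if seq.count("H") != n_hydrophobic:
            continue
        e = hp_energy(seq, coords)
        if best is None or e < best[1]:
            best = (seq, e)
    return best

# A 6-residue conformation folded into a 3 x 2 rectangle on the square lattice.
fold = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print(design_exhaustive(fold, n_hydrophobic=4))
```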

12.
Many different types of generative models for protein sequences have been proposed in the literature. Their uses include the prediction of mutational effects, protein design and the prediction of structural properties. Neural network (NN) architectures have shown strong performance, commonly attributed to their capacity to extract non-trivial higher-order interactions from the data. In this work, we analyze two different NN models and assess how close they are to simple pairwise distributions, which have been used in the past for similar problems. We present an approach for extracting pairwise models from more complex ones using an energy-based modeling framework. We show that, for the tested models, the extracted pairwise models can replicate the energies of the original models and also come close in performance on tasks like mutational effect prediction. In addition, we show that even simpler, factorized models often come close in performance to the original models.
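
In an energy-based framework, a pairwise (Potts-like) model assigns a sequence s the energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), while a factorized (independent-site) model keeps only the fields h_i; a mutational effect is then scored as an energy difference. A minimal sketch with tiny, hypothetical parameter tables (three positions, a two-letter alphabet); the authors' extraction procedure is not reproduced:

```python
# Hypothetical parameters for a 3-position model over the alphabet {"A", "B"}.
FIELDS = [
    {"A": 0.5, "B": 0.0},
    {"A": 0.0, "B": 0.2},
    {"A": 0.1, "B": 0.1},
]
# COUPLINGS[(i, j)][(a, b)] for i < j; pairs not listed contribute zero.
COUPLINGS = {
    (0, 2): {("A", "A"): 1.0, ("A", "B"): -1.0, ("B", "A"): -1.0, ("B", "B"): 1.0},
}

def independent_energy(seq):
    """Factorized model: sum of site-specific fields only."""
    return -sum(FIELDS[i][a] for i, a in enumerate(seq))

def pairwise_energy(seq):
    """Pairwise model: fields plus couplings between position pairs."""
    energy = independent_energy(seq)
    for (i, j), table in COUPLINGS.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy

def mutation_effect(energy_fn, seq, pos, new_residue):
    """Predicted effect of a point mutation as E(mutant) - E(wild type)."""
    mutant = seq[:pos] + new_residue + seq[pos + 1:]
    return energy_fn(mutant) - energy_fn(seq)

wt = "ABA"
print(mutation_effect(pairwise_energy, wt, 2, "B"))
print(mutation_effect(independent_energy, wt, 2, "B"))
```

Here the factorized model predicts no effect for the mutation at position 2, while the pairwise model penalizes it through the coupling to position 0, which is the kind of difference the comparison between model classes probes.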

13.
Protein folding research during the past decade has emphasized the dominant role of native state topology in determining the speed and mechanism of folding for small proteins; this has been illustrated by simulations using minimalist protein models. The advantages of minimalist protein models lie in their ability to rapidly collect meaningful statistics about folding pathways and kinetics, their ease of characterization with coarse-grained order parameters and their concentration on the essential physics of the problem to connect with experimental observables for a target protein. The maturation of experimental protein folding has driven the need for more quantitative protein simulations to better understand the balance between sequence details and fold topology. In the past year, we have seen the emergence of more complex minimalist models, ranging from all-atom Gō potentials to coarse-grained bead models in which Gō interactions are replaced or supplemented by more physically motivated potentials. The reduced computational cost at the coarse-grained level of abstraction will potentially enable both folding studies on a genomic scale and systematic application in protein design.
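
A standard coarse-grained order parameter for Gō-type models is Q, the fraction of native contacts formed in a given conformation. A minimal sketch, assuming one bead per residue, a hypothetical 8 Å contact cutoff, a minimum sequence separation of 3, and a 1.2x distance tolerance (conventions vary between studies):

```python
import math

def native_contacts(coords, cutoff=8.0, min_sep=3):
    """Pairs (i, j, d0) with j - i >= min_sep lying within cutoff in the native structure."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < cutoff:
                pairs.append((i, j, d))
    return pairs

def fraction_native(coords, contacts, tolerance=1.2):
    """Q: fraction of native contacts whose current distance is below tolerance * native distance."""
    if not contacts:
        return 0.0
    formed = sum(1 for i, j, d0 in contacts if math.dist(coords[i], coords[j]) < tolerance * d0)
    return formed / len(contacts)

# Hypothetical 6-bead hairpin as the "native" structure, and a fully extended chain.
native = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
contacts = native_contacts(native)
print(len(contacts), fraction_native(native, contacts), fraction_native(extended, contacts))
```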

14.
Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for predicting the structures of protein/ligand complexes, particularly docking, are still restricted by their limited treatment of receptor flexibility, which makes them unsuitable when the receptor undergoes large conformational changes upon ligand binding. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if the only available structure is an unbound (apo) one that differs from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD of the target were predicted, and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available.
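
The method takes the radius of gyration of the holo structure as one of its inputs. For reference, a minimal sketch of the usual unweighted radius of gyration over a set of coordinates (for example, Cα atoms; mass weighting is omitted here as a simplifying assumption):

```python
import math

def radius_of_gyration(coords):
    """Unweighted Rg = sqrt(mean squared distance of the points from their centroid)."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum(sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq / n)

print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # 1.0
```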

15.
Repeat proteins are ubiquitous and are involved in a myriad of essential processes. They typically form non-globular structures that act as diverse scaffolds for mediating protein-protein interactions. These distinctive structures, which arise from tandem arrays of a repeated structural motif, have generated significant interest for protein engineering and design. Recent advances have been made in the design and characterisation of repeat proteins. The highlights include re-engineering of binding specificity, quantitative models of repeat protein stability and kinetic studies of repeat protein folding.

16.
The development of the EGAD program and energy function for protein design is described. In contrast to most protein design methods, which require several empirical parameters or heuristics such as patterning of residues or rotamers, EGAD has a minimalist philosophy: it uses very few empirical factors to account for inaccuracies resulting from the use of fixed backbones and discrete rotamers in protein design calculations, and it describes the unfolded state, aggregates, and alternative conformers explicitly with physical models instead of fitted parameters. This approach exposes important issues in protein design that are often obscured by heuristic-heavy methods. Inter-atom energies are modeled with the OPLS-AA all-atom force field, electrostatics with the generalized Born continuum model, and the hydrophobic effect with a solvent-accessible surface area-dependent term. Experimental characterization of proteins designed with an unmodified version of the energy function revealed problems with under-packing, stability, aggregation, and structural specificity. Under-packing was addressed by modifying the van der Waals function. By optimizing only three parameters, the effects of >400 mutations on protein-protein complex formation were predicted to within 1.0 kcal mol⁻¹. As an independent test, this modified energy function was used to predict the stabilities of >1500 mutants to within 1.0 kcal mol⁻¹; this required a physical model of the unfolded state that includes more interactions than traditional tripeptide-based models. Solubility and structural specificity were addressed with simple physical approximations of aggregation and conformational equilibria. The complete energy function can design protein sequences that have high levels of identity with their natural counterparts and predicted structural properties more consistent with soluble, uniquely folded proteins than those of the initial designs.
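
EGAD models electrostatics with the generalized Born (GB) continuum model. A minimal sketch of the widely used Still et al. form of the GB polarization energy, ΔG_pol = -(1/2)(1/ε_in - 1/ε_out) Σ_i Σ_j q_i q_j / f_GB(r_ij), assuming effective Born radii are already known (computing them is the hard part and is not shown), a Coulomb constant of 332.06 kcal·Å/(mol·e²), and illustrative dielectric constants; this is not EGAD's exact implementation:

```python
import math

COULOMB = 332.06  # kcal*A/(mol*e^2)

def f_gb(r, ri, rj):
    """Still et al. smoothing function: ~r at large separation, Born radius at r = 0."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))

def gb_polarization_energy(charges, positions, born_radii, eps_in=1.0, eps_out=80.0):
    """GB polarization energy (kcal/mol), summing over all i, j including self terms i == j."""
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(positions[i], positions[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total

# A single unit charge with Born radius 2 A recovers the classical Born solvation energy.
print(round(gb_polarization_energy([1.0], [(0.0, 0.0, 0.0)], [2.0]), 2))  # -81.98
```

The self terms (i == j) are what penalize burying charged or polar groups: a larger effective Born radius, as for an atom deep in the protein, gives a less favorable solvation energy.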

17.
Mathematical modeling was used to evaluate experimental data for bacterial binding protein-dependent transport systems. Two simple models were considered in which ligand-free periplasmic binding protein interacts with the membrane-bound components of transport. In one, this interaction was viewed as a competition with the ligand-bound binding protein, whereas in the other, it was considered to be a consequence of the complexes formed during the transport process itself. Two sets of kinetic parameters were derived for each model that fit the available experimental results for the maltose system. By contrast, a model that omitted the interaction of ligand-free binding protein did not fit the experimental data. Some applications of the successful models for the interpretation of existing mutant data are illustrated, as well as the possibilities of using mutant data to test the original models and sets of kinetic parameters. Practical suggestions are given for further experimental design.
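
The first model treats ligand-free binding protein as a competitor of the ligand-bound form for the membrane components. As a generic illustration of that idea only (a textbook competitive-inhibition rate law, not the paper's fitted model or parameter sets), the transport rate can be written v = Vmax·[BL] / (K_BL·(1 + [B]/K_B) + [BL]), where [BL] and [B] are the ligand-bound and ligand-free binding protein concentrations. The function names and all parameter values below are hypothetical, and ligand is assumed to be in large excess over binding protein.

```python
def bound_fraction(ligand, kd):
    """Fraction of binding protein carrying ligand, assuming ligand is in large excess."""
    return ligand / (kd + ligand)

def transport_rate(ligand, bp_total, kd, vmax, k_bound, k_free=None):
    """Transport rate with optional competition from ligand-free binding protein.

    k_free=None means the ligand-free form does not interact with the transporter,
    so the expression reduces to a plain Michaelis-Menten law in the bound form.
    """
    bound = bp_total * bound_fraction(ligand, kd)
    free = bp_total - bound
    competition = 1.0 if k_free is None else 1.0 + free / k_free
    return vmax * bound / (k_bound * competition + bound)

# Hypothetical parameters in arbitrary units: rate without vs. with competition.
for ligand in (0.1, 1.0, 10.0):
    print(ligand,
          round(transport_rate(ligand, 1.0, 1.0, 1.0, 0.5), 3),
          round(transport_rate(ligand, 1.0, 1.0, 1.0, 0.5, k_free=0.2), 3))
```

The two columns diverge most at low ligand concentration, where most binding protein is ligand-free; that concentration dependence is what lets data discriminate between models with and without the ligand-free interaction.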

18.
Structure prediction and computational protein design should benefit from accurate solvent models. We have applied implicit solvent models to two problems that are central to this area. First, we performed sidechain placement for 29 proteins, using a solvent model that combines a screened Coulomb term with an Accessible Surface Area term (CASA model). With optimized parameters, the prediction quality is comparable with earlier work that omitted electrostatics and solvation altogether. Second, we computed the stability changes associated with point mutations involving ionized sidechains. For over 1000 mutations, including many fully or partly buried positions, we compared CASA and two generalized Born models (GB) with a more accurate model, which solves the Poisson equation of continuum electrostatics numerically. CASA predicts the correct sign and order of magnitude of the stability change for 81% of the mutations, compared to 97% with the best GB. We also considered 140 mutations for which experimental data are available. Comparing to experiment requires additional assumptions about the unfolded protein structure, protein relaxation in response to the mutations, and contributions from the hydrophobic effect. With a simple, commonly-used unfolded state model, the mean unsigned error is 2.1 kcal/mol with both CASA and the best GB. Overall, the electrostatic model is not important for sidechain placement; CASA and GB are equivalent for surface mutations, while GB is far superior for fully or partly buried positions. Thus, for problems like protein design that involve all these aspects, the most recent GB models represent an important step forward. Along with the recent discovery of efficient, pairwise implementations of GB, this will open new possibilities for the computational engineering of proteins.
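
The CASA model combines a screened Coulomb term with an accessible-surface-area term. The abstract does not specify the screening function; as one common choice, assumed here for illustration, a distance-dependent dielectric ε(r) = 4r gives the pairwise energy E_ij = 332.06·q_i·q_j / (4·r_ij²) in kcal/mol:

```python
import math

COULOMB = 332.06  # kcal*A/(mol*e^2)

def screened_coulomb(charges, positions, eps_slope=4.0):
    """Pairwise Coulomb energy (kcal/mol) with a distance-dependent dielectric eps(r) = eps_slope * r."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(positions[i], positions[j])
            energy += COULOMB * charges[i] * charges[j] / (eps_slope * r * r)
    return energy

# A hypothetical salt bridge: opposite unit charges 4 A apart.
print(round(screened_coulomb([1.0, -1.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]), 2))
```

Unlike GB, such a pairwise term has no self-energy (desolvation) component, so burying an isolated charge costs nothing electrostatically; this is one plausible reason why CASA and GB agree at the surface but diverge at buried positions.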

19.
Automated minimization of steric clashes in protein structures
Molecular modeling of proteins, including homology modeling, structure determination, and knowledge-based protein design, requires tools to evaluate and refine three-dimensional protein structures. Steric clashes are among the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise from the unnatural overlap of any two nonbonded atoms in a protein structure. Removing severe steric clashes is often challenging, since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach to identifying steric clashes in proteins by defining clashes based on the van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitatively estimating the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated, and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation of the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared with other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.
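
Defining clashes through van der Waals repulsion energy, as described above, can be sketched with a 12-6 Lennard-Jones potential truncated and shifted so that only its repulsive branch (r < r0) contributes. The threshold, radii, well depths, and the crude index-based bond exclusion below are hypothetical assumptions for illustration; Chiron's actual parameters and its severity metric are not reproduced.

```python
import math

def repulsive_energy(r, r0, eps):
    """Repulsive branch of a 12-6 Lennard-Jones potential, shifted to zero at its minimum r0."""
    if r >= r0:
        return 0.0
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x + 1.0)  # equals eps * (x - 1)^2 >= 0

def find_clashes(atoms, threshold=0.3, min_sep=3):
    """Return (i, j, energy) for atom pairs whose repulsion exceeds threshold (kcal/mol).

    atoms: list of (xyz, vdw_radius, well_depth). Pairs closer than min_sep in the list
    are skipped as a crude stand-in for excluding bonded neighbours via a real topology.
    """
    clashes = []
    for i in range(len(atoms)):
        for j in range(i + min_sep, len(atoms)):
            (pi, ri, ei), (pj, rj, ej) = atoms[i], atoms[j]
            e = repulsive_energy(math.dist(pi, pj), ri + rj, math.sqrt(ei * ej))
            if e > threshold:
                clashes.append((i, j, e))
    return clashes

# Hypothetical carbon-like atoms: atom 3 sits 2.5 A from atom 0 (sum of radii 3.4 A).
atoms = [((0.0, 0.0, 0.0), 1.7, 0.2), ((10.0, 0.0, 0.0), 1.7, 0.2),
         ((20.0, 0.0, 0.0), 1.7, 0.2), ((2.5, 0.0, 0.0), 1.7, 0.2)]
print([(i, j, round(e, 2)) for i, j, e in find_clashes(atoms)])
```

Scoring overlaps energetically rather than with a fixed distance cutoff lets clash severity grow smoothly and steeply with overlap, which is what makes a quantitative severity metric possible.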

20.
Knowledge-based potentials are statistical parameters derived from databases of known protein properties that empirically capture aspects of the physical chemistry of protein structure and function. These potentials play a key role in protein design by improving the accuracy of physics-based models of interatomic interactions and enhancing the computational efficiency of the design process by limiting the complexity of searching sequence space. Recently, knowledge-based potentials (in isolation or in combination with physics-based potentials) have been applied to the modification of existing protein function, the redesign of natural protein folds and the complete design of a non-natural protein fold. In addition, knowledge-based potentials appear to be providing important information about the global topology of amino acid interactions in natural proteins. A detailed study of the methods and products of these protein design efforts promises to greatly expand our understanding of proteins and the evolutionary process that created them.
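
A common way to derive such statistical potentials is Boltzmann inversion: an interaction observed more often in known structures than expected under a reference state receives a favorable energy, E_k = -kT·ln(P_obs(k) / P_ref(k)). A minimal sketch with hypothetical contact counts and a pseudocount for sparse data; the choice of reference state, the key modelling decision in practice, is simply passed in as input here:

```python
import math

KT = 0.592  # kcal/mol, approximately RT at 298 K

def boltzmann_inversion(observed_counts, reference_counts, pseudocount=1.0):
    """Statistical potential E_k = -kT ln(P_obs(k) / P_ref(k)) for each contact type k."""
    obs_total = sum(observed_counts.values()) + pseudocount * len(observed_counts)
    ref_total = sum(reference_counts.values()) + pseudocount * len(reference_counts)
    energies = {}
    for key in observed_counts:
        p_obs = (observed_counts[key] + pseudocount) / obs_total
        p_ref = (reference_counts[key] + pseudocount) / ref_total
        energies[key] = -KT * math.log(p_obs / p_ref)
    return energies

# Hypothetical residue-pair contact counts in a structure database vs. a uniform reference state.
observed = {"LEU-ILE": 300, "LEU-LYS": 60, "LYS-GLU": 150}
reference = {"LEU-ILE": 170, "LEU-LYS": 170, "LYS-GLU": 170}
print({k: round(v, 2) for k, v in boltzmann_inversion(observed, reference).items()})
```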


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417