Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Deep generative models have recently gained popularity for chemical design. Many of these models have historically operated in 2D space; more recently, however, explicit 3D molecular generative models have attracted interest, and they are the topic of this article. Dozens of models published in the last few years generate molecules directly in 3D, outputting both atom types and coordinates, either in one shot or by adding atoms or fragments step by step. These 3D generative models can also be guided by structural information, such as a binding pocket representation, to generate molecules with docking score ranges similar to those of known actives. However, they still show lower computational efficiency and generation throughput than 1D/2D generative models and sometimes produce unrealistic conformations. We advocate a unified benchmark of metrics for evaluating generation and propose perspectives to be addressed in future implementations.

2.
Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.

3.
Since the first revelation that proteins function as macromolecular machines through their three-dimensional structures, researchers have been intrigued by the marvelous ways in which proteins carry out biochemical processes. The aspiration to understand protein structures has fueled extensive efforts across different scientific disciplines. In recent years, it has been demonstrated that proteins with new functionality or shapes can be designed via structure-based modeling methods, and the design strategies have combined all available information (though largely piece by piece), from sequence-derived statistics to detailed atomic-level modeling of chemical interactions. Despite this significant progress, incorporating data-derived approaches through deep learning methods can be a game changer. In this review, we summarize current progress, compare the arc of development of deep learning approaches with that of conventional methods, and describe the motivation and concepts behind current strategies that may lead to future opportunities.

4.
While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation, an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used with Rosetta to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model.

5.
Detecting protein complexes from protein interaction networks is a major task in the postgenome era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense-region finding. Most of these traditional algorithms cannot discover the overlapping complexes that really exist in protein-protein interaction (PPI) networks. Although some density-based methods have been developed to identify overlapping complexes, they cannot discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification. RSGNM extends an existing generative network model by adding a process that generates propensities from an exponential distribution and by incorporating a Laplacian regularizer. Because the propensities are assumed to be generated from an exponential distribution, their estimators will be sparse, which not only has a good biological interpretation but also helps to control the overlapping rate among detected complexes. The Laplacian regularizer makes the estimators of the propensities smoother on the interaction networks. Experimental results on three yeast PPI networks show that RSGNM outperforms six previous competing algorithms in terms of the quality of detected complexes. In addition, RSGNM is able to detect overlapping complexes and complexes including peripheral proteins simultaneously. These results give new insights into the importance of generative network models in protein complex identification.

6.
Protein design aims at designing new protein molecules of desired structure and functionality. One of the major obstacles to large-scale protein design is the extensive time and manpower required for experimental validation of designed sequences. Recent advances in protein structure prediction have opened the possibility of automated assessment of designed sequences via folding simulations. We present a new protocol for protein design and validation. The sequence space is initially searched by Monte Carlo sampling guided by a public atomic potential, with candidate sequences selected by the clustering of sequence decoys. The designed sequences are then assessed by I-TASSER folding simulations, which generate full-length atomic structural models by the iterative assembly of threading fragments. The protocol is tested on 52 nonhomologous single-domain proteins, with an average sequence identity of 24% between the designed sequences and the native sequences. Despite this low sequence identity, three-dimensional models predicted for the first designed sequence have an RMSD of < 2 Å to the target structure in 62% of cases. This percentage increases to 77% if we consider the three-dimensional models from the top 10 designed sequences. Such a striking consistency between the target structure and the structural prediction from nonhomologous sequences, despite the fact that the design and folding algorithms adopt completely different force fields, indicates that the design algorithm captures the features essential to the global fold of the target. On average, the designed sequences have a free energy that is 0.39 kcal/(mol residue) lower than that of the native sequences, potentially affording greater stability to synthesized target folds.
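The Monte Carlo sequence search described in item 6 can be sketched roughly as follows. This is a minimal illustration of Metropolis sampling over sequence space, not the published protocol: the energy function `toy_energy` is a hypothetical stand-in (it simply counts mismatches against a reference string) for the atomic potential used in the abstract, and the function names, step count, temperature, and seed are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(sequence, reference):
    """Hypothetical stand-in energy: number of positions differing from a reference.

    A real design protocol would score the sequence against the target
    structure with a physics- or knowledge-based potential instead.
    """
    return sum(1 for a, b in zip(sequence, reference) if a != b)


def metropolis_design(reference, steps=2000, temperature=0.5, seed=0):
    """Search sequence space with single-residue moves and the Metropolis criterion."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in reference]
    current_e = toy_energy(current, reference)
    best, best_e = list(current), current_e
    for _ in range(steps):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_e = toy_energy(current, reference)
        delta = new_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[pos] = old  # reject: restore the previous residue
    return "".join(best), best_e


if __name__ == "__main__":
    designed, energy = metropolis_design("ACDEFGHIKL")
    print(designed, energy)
```

In the protocol summarized above, many such trajectories would produce sequence decoys that are then clustered to pick candidates; that clustering step is not shown here.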

7.
We review state-of-the-art computational methods for constructing, from image data, generative statistical models of cellular and nuclear shapes and the arrangement of subcellular structures and proteins within them. These automated approaches allow consistent analysis of images of cells for the purposes of learning the range of possible phenotypes, discriminating between them, and informing further investigation. Such models can also provide realistic geometry and initial protein locations to simulations in order to better understand cellular and subcellular processes. To determine the structures of cellular components and how proteins and other molecules are distributed among them, the generative modeling approach described here can be coupled with high throughput imaging technology to infer and represent subcellular organization from data with few a priori assumptions. We also discuss potential improvements to these methods and future directions for research.

8.
The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.

9.
The catalytic subunit of cAMP-dependent protein kinase (PKA) can easily be expressed in Escherichia coli and is catalytically active. Four phosphorylation sites are known in PKA (S10, S139, T197 and S338), and the isolated recombinant protein is a mixture of different phosphorylated forms. Obtaining uniformly phosphorylated protein requires separation of the protein preparation, leading to a significant loss in protein yield. The mutant S10A/S139D/S338D is found to have properties similar to those of the wild-type protein, whereas additional replacement of T197 with either E or D reduces both the expression yield and the folding propensity of the protein. Due to its high sequence homology to Akt/PKB, which cannot easily be expressed in E. coli, PKA has been used as a surrogate kinase for drug design. Several mutations within the ATP binding site have been described that make PKA even more similar to Akt/PKB. Two proteins with Akt/PKB-like mutations in the ATP binding site were made (PKAB6 and PKAB8), in which the S10, S139 and S338 phosphorylation sites were also removed. These proteins can be expressed in high yields but have reduced activity compared to the wild-type. Proper folding of all proteins was analyzed by 2D 1H, 15N-TROSY NMR experiments.

10.
Many different types of generative models for protein sequences have been proposed in the literature. Their uses include the prediction of mutational effects, protein design and the prediction of structural properties. Neural network (NN) architectures have shown strong performance, commonly attributed to their capacity to extract non-trivial higher-order interactions from the data. In this work, we analyze two different NN models and assess how close they are to simple pairwise distributions, which have been used in the past for similar problems. We present an approach for extracting pairwise models from more complex ones using an energy-based modeling framework. We show that, for the tested models, the extracted pairwise models can replicate the energies of the original models and are also close in performance in tasks like mutational effect prediction. In addition, we show that even simpler, factorized models often come close in performance to the original models.
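For readers unfamiliar with the pairwise models mentioned in item 10, the sketch below shows one common way such an energy is written, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), and how a mutational effect can be scored as an energy difference. The sign convention, the dictionary-based parameter layout, and the function names are assumptions made for illustration; they are not taken from the paper. Setting all couplings to zero gives the factorized (site-independent) model.

```python
def pairwise_energy(seq, fields, couplings):
    """Energy of a sequence under a pairwise model.

    fields:    list of dicts, fields[i][residue] -> h_i(residue)
    couplings: dict mapping (i, j) with i < j to a dict
               {(residue_i, residue_j): J_ij value}
    Missing entries are treated as zero (an assumption of this sketch).
    """
    energy = 0.0
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def mutation_effect(seq, pos, new_residue, fields, couplings):
    """Energy change of a point mutation: E(mutant) - E(wild type)."""
    mutant = seq[:pos] + new_residue + seq[pos + 1:]
    return pairwise_energy(mutant, fields, couplings) - pairwise_energy(seq, fields, couplings)


if __name__ == "__main__":
    # Toy parameters for a two-residue sequence (purely illustrative values).
    fields = [{"A": 1.0}, {"C": 0.5}]
    couplings = {(0, 1): {("A", "C"): 2.0}}
    print(pairwise_energy("AC", fields, couplings))          # -3.5
    print(mutation_effect("AC", 1, "G", fields, couplings))  # 2.5
```

Under this convention a positive energy difference marks a mutation the model considers less favorable than the wild type.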

11.
Kristina Westerlund 《BBA》2005,1707(1):103-116
Amino-acid radical enzymes are often highly complex structures containing multiple protein subunits and cofactors. These properties have in many cases hampered the detailed characterization of their amino-acid redox cofactors. To address this problem, a range of approaches has recently been developed in which a common strategy is to reduce the complexity of the radical-containing system. This work will be reviewed and it includes the light-induced generation of aromatic radicals in small-molecule and peptide systems. Natural redox proteins, including the blue copper protein azurin and a bacterial photosynthetic reaction center, have been engineered to introduce amino-acid radical chemistry. The redesign strategies to achieve this remarkable change in the properties of these proteins will be described. An additional approach to gain insights into the properties of amino-acid radicals is to synthesize de novo designed model proteins in which the redox chemistry of these species can be studied. Here we describe the design, synthesis and characteristics of monomeric three-helix bundle and four-helix bundle proteins designed to study the redox chemistry of tryptophan and tyrosine. This work demonstrates that de novo protein design combined with structural, electrochemical and quantum chemical analyses can provide detailed information on how the protein matrix tunes the thermodynamic properties of tryptophan.

12.
Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at and from the PSI Structural Genomics Knowledgebase.

13.
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify mesophilic proteins would be helpful for protein engineering and design. In this work, we developed a support vector machine-based method to predict thermophilic proteins using information on amino acid distribution and selected amino acid pairs. A reliable benchmark dataset including 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% of thermophilic proteins and 92.7% of non-thermophilic proteins could be correctly predicted using jackknife cross-validation. The high prediction success rate indicates that this model can be applied to the design of stable proteins.

14.
《Journal of molecular biology》2019,431(24):4784-4795
Multidomain proteins often interact through several independent binding sites connected by disordered linkers. The architecture of such linkers affects avidity by modulating the effective concentration of intramolecular binding. The linker dependence of avidity has been estimated theoretically using simple physical models, but such models have not been tested experimentally because the effective concentrations could not be measured directly. We have developed a model system for bivalent protein interactions connected by disordered linkers, where the effective concentration can be measured using a competition experiment. We characterized the bivalent protein interactions kinetically and thermodynamically for a variety of linker lengths and interaction strengths. In total, this allowed us to critically assess the existing theoretical models of avidity in disordered, multivalent interactions. As expected, the onset of avidity occurs when the effective concentration reaches the dissociation constant of the weakest interaction. Avidity decreased monotonically with linker length, but only by a third of what is predicted by theoretical models. We suggest that the length dependence of avidity is attenuated by compensating mechanisms such as linker interactions or entanglement. The direct role of linkers in avidity suggests they provide a generic mechanism for allosteric regulation of disordered, multivalent proteins.

15.
16.
Several binding scaffolds that are not based on immunoglobulins have been designed as alternatives to traditional monoclonal antibodies. Many of them have been developed to bind to folded proteins, yet cellular networks for signaling and protein trafficking often depend on binding to unfolded regions of proteins. This type of binding can thus be well described as a peptide–protein interaction. In this review, we compare different peptide-binding scaffolds, highlighting that armadillo repeat proteins (ArmRP) offer an attractive modular system, as they bind a stretch of extended peptide in a repeat-wise manner. Instead of generating each new binding molecule by an independent selection, preselected repeats, each complementary to a piece of the target peptide, could be designed and assembled on demand into a new protein, which then binds the prescribed complete peptide. Stacked armadillo repeats (ArmR), each typically consisting of 42 amino acids arranged in three α-helices, build an elongated superhelical structure which enables binding of peptides in extended conformation. A consensus-based design approach, complemented with molecular dynamics simulations and rational engineering, resulted in well-expressed monomeric proteins with high stability. Peptide binders were selected and several structures were determined, forming the basis for the future development of modular peptide-binding scaffolds.

17.
Lattice models have been previously used to model ligand diffusion on protein surfaces. Using such models, it has been shown that the presence of pathways (or 'chreodes') of consecutive residues with certain properties can decrease the number of steps required for the arrival of a ligand at the active site. In this work, we show that, based on a genetic algorithm, ligand-diffusion pathways can evolve on a protein surface, when this surface is selected for shortening the travel length toward the active site. Biological implications of these results are discussed.

18.
Membrane proteins are hard to handle and consequently the purification of functional protein in milligram quantities is a major problem. One reason for this is that once integral membrane proteins are outside their native membrane, they are prone to aggregation, are unstable and are frequently only partially functional. Knowledge of membrane protein folding mechanisms in vitro can help to understand the causes of these problems and work toward strategies to disaggregate and fold proteins correctly. Kinetic and stability studies are emerging on membrane protein folding, mainly on bacterial proteins. Mutagenesis methods have also been used to probe specific structural features or bonds in proteins. In addition, manipulation of lipid properties can be used to improve the efficiency of folding as well as the stability and function of the protein.

19.
Lee SY  Zhang Y  Skolnick J 《Proteins》2006,63(3):451-456
The TASSER structure prediction algorithm is employed to investigate whether NMR structures can be moved closer to their corresponding X-ray counterparts by automatic refinement procedures. The benchmark protein dataset includes 61 nonhomologous proteins whose structures have been determined by both NMR and X-ray experiments. Interestingly, when starting from NMR structures, the majority (79%) of TASSER refined models show a structural shift toward their X-ray structures. On average, the TASSER refined models have a root-mean-square deviation (RMSD) from the X-ray structure of 1.785 Å (1.556 Å) over the entire chain (aligned region), while the average RMSD between NMR and X-ray structures (RMSD(NMR_X-ray)) is 2.080 Å (1.731 Å). For all proteins having an RMSD(NMR_X-ray) >2 Å, the TASSER refined structures show consistent improvement. However, for the 34 proteins with an RMSD(NMR_X-ray) <2 Å, there are only 21 cases (60%) where the TASSER model is closer to the X-ray structure than the NMR structure is, which may be due to the inherent resolution of TASSER. We also compare the TASSER models with 12 NMR models in the RECOORD database that were recently recalculated by Nederveen et al. from the original NMR restraints using the newest molecular dynamics tools. In 8 of 12 cases, TASSER models show a smaller RMSD to X-ray structures; in 3 of 12 cases, where RMSD(NMR_X-ray) <1 Å, RECOORD does better than TASSER. These results suggest that TASSER can be a useful tool to improve the quality of NMR structures.
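Since several of these abstracts report results as RMSD values, a minimal sketch of the quantity may help. It assumes that the two structures have already been superposed and that atoms are paired by index; the optimal-superposition step (for example, a Kabsch alignment) that structure-comparison tools normally perform first is deliberately left out, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both structures are already superposed and that point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(model, reference))  # sqrt(4 / 2) = sqrt(2)
```

With coordinates in Å the result is in Å, so a refined model is "closer" to the X-ray structure when its RMSD to that structure is smaller than the NMR structure's.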

20.
Jianxing Song 《FEBS letters》2009,583(6):953-3132
Many proteins are not refoldable and also insoluble. Previously no general method was available to solubilize them and consequently their structural properties remained unknown. Surprisingly, we recently discovered that all insoluble proteins in our laboratory, which are highly diverse, can be solubilized in pure water. Structural characterization by CD and NMR led to their classification into three groups, all of which appear trapped in the highly disordered or partially-folded states with a substantial exposure of hydrophobic side chains. In this review, I discuss our results in a wide context and subsequently propose a model to rationalize the discovery. The potential applications are also explored in studying protein folding, design and membrane proteins.
