Similar Documents
20 similar documents found
1.
Modeling the impact of amino acid mutations on protein-protein interactions plays a crucial role in protein engineering and drug design. In this study, we develop GeoPPI, a novel structure-based deep-learning framework that predicts the change in binding affinity upon mutation. Based on the three-dimensional structure of a protein, GeoPPI first learns a geometric representation that encodes topological features of the protein structure via a self-supervised learning scheme. These representations are then used as features to train gradient-boosting trees that predict changes in protein-protein binding affinity upon mutation. We find that GeoPPI learns meaningful features that characterize interactions between atoms in protein structures. In addition, through extensive experiments, we show that GeoPPI achieves new state-of-the-art performance in predicting binding affinity changes upon both single- and multi-point mutations on six benchmark datasets. Moreover, we show that GeoPPI can accurately estimate the differences in binding affinity between several recently identified SARS-CoV-2 antibodies and the receptor-binding domain (RBD) of the S protein. These results demonstrate the potential of GeoPPI as a powerful and useful computational tool for protein design and engineering. Our code and datasets are available at: https://github.com/Liuxg16/GeoPPI.

2.
We have assembled a nonredundant set of 144 protein-protein complexes for which high-resolution structures are available for both the complex and its unbound components, and for which dissociation constants have been measured by biophysical methods. The set is diverse in terms of the biological functions it represents, including complexes that involve G-proteins and receptor extracellular domains, as well as antigen/antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners' affinity for each other, with Kd ranging from 10⁻⁵ to 10⁻¹⁴ M. Nine pairs of entries represent closely related complexes that have similar structures but very different affinities, each pair comprising a cognate and a noncognate assembly. Because the unbound structures of the component proteins are available, conformational changes can be assessed. These changes are significant in most of the complexes, and large movements or disorder-to-order transitions are frequently observed. The set may be used to benchmark biophysical models that aim to relate affinity to structure in protein-protein interactions, taking into account the reactants and the conformational changes that accompany the association reaction, rather than just the final product.
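To put the quoted Kd range on an energy scale, here is a minimal sketch using the standard relation ΔG° = RT ln(Kd/c°). It assumes a temperature of 298.15 K and a 1 M standard state (c°); neither is stated in the abstract, and the function name `kd_to_dg` is illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_to_dg(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Assumes a 1 M standard state, so Kd / (1 M) is simply the numeric value.
    More negative values mean tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)

# Endpoints of the affinity range covered by the benchmark set
for kd in (1e-5, 1e-14):
    print(f"Kd = {kd:.0e} M -> dG = {kd_to_dg(kd):.1f} kcal/mol")
```

Under these assumptions, the set spans roughly -6.8 to -19.1 kcal/mol in binding free energy at 25 °C.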

3.
Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y. Bioinformatics (Oxford, England). 2005;21(16):3409-3415.
MOTIVATION: The increasing availability of complete genome sequences provides an excellent opportunity for the further development of tools for functional studies in proteomics. Several experimental approaches and in silico algorithms have been developed to cluster proteins into networks of biological significance that may provide new biological insights, especially into the functions of many uncharacterized proteins. Among these methods, the phylogenetic profiles method has been widely used to predict protein-protein interactions. It involves the selection of reference organisms and the identification of homologous proteins. To date, no published report has systematically studied how the selection of reference genomes and the identification of homologous proteins affect the accuracy of this method. RESULTS: In this study, we optimized the phylogenetic profiles method by integrating phylogenetic relationships among reference organisms and sequence homology information to improve prediction accuracy. Our results revealed that the selection of the reference organism set and the criteria for homology identification are two critical factors that significantly affect the prediction accuracy of this method. Our refined phylogenetic profiles method performs better and potentially provides more reliable functional linkages than previous methods.
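The basic phylogenetic-profile idea can be sketched as follows. Each protein becomes a binary vector recording whether a homolog is present (1) or absent (0) in each reference genome, and proteins with near-identical profiles are predicted to be functionally linked. The profiles, the distance threshold, and the optional per-genome weights are all hypothetical. The weights are only one conceivable way to account for relatedness among reference organisms and are not the authors' refinement.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of a homolog in five reference genomes.
PROFILES = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 0, 0, 1),
    "protD": (1, 0, 1, 0, 0),
}

def weighted_hamming(p, q, weights=None):
    """Sum of per-genome weights at positions where two profiles disagree.

    With weights=None every genome counts 1 (plain Hamming distance).
    Non-uniform weights are a hypothetical way to down-weight closely
    related reference organisms.
    """
    if weights is None:
        weights = [1.0] * len(p)
    return sum(w for a, b, w in zip(p, q, weights) if a != b)

def predict_links(profiles, max_distance=1.0, weights=None):
    """Return protein pairs whose profiles differ by at most max_distance."""
    links = []
    for (n1, p1), (n2, p2) in combinations(profiles.items(), 2):
        if weighted_hamming(p1, p2, weights) <= max_distance:
            links.append((n1, n2))
    return links

print(predict_links(PROFILES))
# [('protA', 'protB'), ('protA', 'protD'), ('protB', 'protD')]
```

Both factors the abstract highlights enter here directly. The choice of reference genomes fixes the vector positions, and the homology criterion decides each 0/1 entry.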

5.
Li L, Zhao B, Du J, Zhang K, Ling CX, Li SS. PloS one. 2011;6(10):e25528.
Protein-protein interactions (PPIs) are frequently mediated by the binding of a modular domain in one protein to a short, linear peptide motif in its partner. The advent of proteomic methods such as peptide and protein arrays has led to the accumulation of a wealth of interaction data for modular interaction domains. Although several computational programs have been developed to predict modular domain-mediated PPI events, they are often restricted to a given domain type. We describe DomPep, a method that can potentially be used to predict PPIs mediated by any modular domain. DomPep combines proteomic data with sequence information to achieve high accuracy and high coverage in PPI prediction. Proteomic binding data were employed to determine a simple yet novel parameter, Ligand-Binding Similarity, which in turn is used to calibrate Domain Sequence Identity and Position-Weighted-Matrix distance, two parameters used in constructing prediction models. Moreover, DomPep can be used to predict PPIs both for domains with experimental binding data and for those without. Using the PDZ and SH2 domain families as test cases, we show that DomPep predicts PPIs with accuracies superior to those of existing methods. To evaluate DomPep as a discovery tool, we deployed it to identify interactions mediated by three human PDZ domains. Subsequent in-solution binding assays validated the high accuracy of DomPep in predicting authentic PPIs at the proteome scale. Because DomPep uses only interaction data and the primary sequence of a domain, it can be readily expanded to include other types of modular domains.
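The abstract does not define DomPep's Position-Weighted-Matrix distance. The sketch below therefore shows only a generic version. It builds a pseudocount-smoothed position weight matrix (PWM) from equal-length binding peptides and compares two domains' PWMs with a hypothetical metric, the mean Euclidean distance between matching columns. The peptides are made-up PDZ-like C-terminal motifs, and the function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(peptides, pseudocount=1.0):
    """Column-wise amino-acid frequencies for aligned, equal-length peptides,
    smoothed with a pseudocount so that no frequency is zero."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        pwm.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return pwm

def pwm_distance(pwm_a, pwm_b):
    """Hypothetical metric: mean Euclidean distance between matching PWM columns."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("PWMs must have the same number of columns")
    dists = [
        math.sqrt(sum((col_a[aa] - col_b[aa]) ** 2 for aa in AMINO_ACIDS))
        for col_a, col_b in zip(pwm_a, pwm_b)
    ]
    return sum(dists) / len(dists)

# Made-up binding peptides for two hypothetical domains
domain_a = build_pwm(["ETSV", "ETTV", "ESSV"])
domain_b = build_pwm(["EYLV", "DYLV", "EWLV"])
print(pwm_distance(domain_a, domain_a) == 0.0, pwm_distance(domain_a, domain_b) > 0)
```

A small distance means two domains prefer similar peptides. This is the sort of signal a calibration parameter such as Ligand-Binding Similarity could be related to.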

6.
Computational methods for predicting protein-protein interaction sites based on structural data are characterized by an accuracy between 70 and 80%. Some experimental studies indicate that only a fraction of the residues, forming clusters in the center of the interaction site, are energetically important for binding. In addition, the analysis of amino acid composition has shown that residues located in the center of the interaction site can be better discriminated from the residues in other parts of the protein surface. In the present study, we implement a simple method to predict interaction site residues exploiting this fact and show that it achieves a very competitive performance compared to other methods using the same dataset and criteria for performance evaluation (success rate of 82.1%).

7.
Predicting the interactions between all possible pairs of proteins in a given organism (i.e., building a protein-protein interaction map) is a crucial problem in bioinformatics. Most previous methods based on supervised machine learning train a classifier on datasets containing approximately equal numbers of interacting protein pairs (positives) and non-interacting protein pairs (negatives), and are estimated to yield a large number of false positives. Reasoning that the negatives used in previous studies cannot adequately represent all the negatives that need to be taken into account, we developed a method based on multiple Support Vector Machines (SVMs) that uses more negatives than positives to predict interactions between pairs of yeast proteins and pairs of human proteins. We show that the performance of a single SVM improved as we increased the number of negatives used for training and that, if more than one CPU is available, an approach using multiple SVMs is useful not only for improving classifier performance but also for reducing training time. Our approach can also be applied to assessing the reliability of high-throughput interactions.
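One common way to use many more negatives than positives with several classifiers is sketched below. The large negative set is split into disjoint chunks, one model is trained per chunk on all positives plus that chunk, and the members' decision scores are averaged. This is a generic balanced-subset ensemble and not necessarily the authors' exact scheme. To stay dependency-free, a nearest-centroid scorer (`train_centroid`, hypothetical) stands in for an SVM. With scikit-learn one could instead pass a function that fits `sklearn.svm.SVC` and returns its `decision_function`.

```python
import random
from statistics import mean

def train_centroid(pos, neg):
    """Stand-in for an SVM: scores a feature vector by its squared distance
    to the positive and negative class centroids (hypothetical)."""
    dim = len(pos[0])
    c_pos = [mean(x[i] for x in pos) for i in range(dim)]
    c_neg = [mean(x[i] for x in neg) for i in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos  # > 0: closer to the interacting (positive) centroid
    return score

def train_ensemble(pos, neg, n_models, train_fn=train_centroid, seed=0):
    """Split the negatives into n_models disjoint chunks and train one model
    per chunk on all positives plus that chunk."""
    shuffled = list(neg)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i::n_models] for i in range(n_models)]
    return [train_fn(pos, chunk) for chunk in chunks if chunk]

def ensemble_score(models, x):
    """Average decision score over all member models."""
    return mean(m(x) for m in models)

# Toy 2-D feature vectors for protein pairs (hypothetical): 3 positives, 6 negatives.
pos = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
neg = [(-1.0, -1.0), (-0.8, -1.2), (-1.1, -0.9), (-0.9, -1.0), (-1.2, -0.8), (-1.0, -1.1)]
models = train_ensemble(pos, neg, n_models=3)
print(len(models), ensemble_score(models, (1.0, 1.0)) > 0)
```

The per-chunk `train_fn` calls are independent of one another. That independence is what allows such an ensemble to be trained in parallel across several CPUs.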

8.
Dong F, Zhou HX. Proteins. 2006;65(1):87-102.
To investigate the roles of electrostatic interactions in protein binding stability, electrostatic calculations were carried out on a set of 64 mutations across six protein-protein complexes. These mutations alter polar interactions across the interface and were selected because electrostatic contributions were expected to dominate their effects on binding stability. Three protocols for implementing the Poisson-Boltzmann model were tested. In vdW4, the dielectric boundary between the low-dielectric protein and the high-dielectric solvent is defined as the protein van der Waals surface, and the protein dielectric constant is set to 4. In SE4 and SE20, the dielectric boundary is defined as the surface of the protein interior inaccessible to a 1.4-Å solvent probe, and the protein dielectric constant is set to 4 and 20, respectively. In line with earlier studies on the barnase-barstar complex, the vdW4 results on the large set of mutations showed the closest agreement with experimental data. The agreement between vdW4 and experiment supports the contention that electrostatic contributions dominate for these mutations, but the remaining differences also suggest van der Waals and hydrophobic contributions. The results presented here will serve as a guide for future refinement of electrostatic calculations and for the inclusion of nonelectrostatic effects.

9.
Apoptosis is a matter of life and death for cells, and both inhibited and enhanced apoptosis may be involved in the pathogenesis of human diseases. The structures of protein-protein complexes in the apoptosis signaling pathway are important because the structural pathway helps in understanding the mechanisms of regulation and information transfer, and in identifying targets for drug design. Here, we aim to predict these structures to obtain a more informative pathway than is currently available. Based on the 3D structures of complexes in the target pathway and a protein-protein interaction modeling tool that allows accurate, proteome-scale applications, we modeled the structures of 29 interactions, 21 of which were previously unknown. Next, 27 interactions that were not listed in the KEGG apoptosis pathway were predicted and subsequently validated by experimental data in the literature. Additional interactions are also predicted. The multi-partner hub proteins are analyzed, and interactions that can and cannot co-exist are identified. Overall, our results enrich the pathway with additional interactions and provide structural details for the human apoptosis pathway. They also illustrate that large-scale computational modeling of protein-protein interactions can help validate experimental data and provide accurate, atom-level structural detail of signaling pathways in the human cell.

10.
Protein-protein interactions are governed by the change in free energy upon binding, ΔG = ΔH − TΔS. These interactions are often marginally stable, so one must examine the balance between the change in enthalpy, ΔH, and the change in entropy, ΔS, when investigating known complexes, characterizing the effects of mutations, or designing optimized variants. To perform a large-scale study of the contribution of conformational entropy to binding free energy, we developed a technique called GOBLIN (Graphical mOdel for BiomoLecular INteractions) that performs physics-based free energy calculations for protein-protein complexes under both side-chain and backbone flexibility. GOBLIN uses a probabilistic graphical model that exploits conditional independencies in the Boltzmann distribution and employs variational inference techniques that approximate the free energy of binding in only a few minutes. We examined the role of conformational entropy on a benchmark set of more than 700 mutants in eight large, well-studied complexes. Our findings suggest that conformational entropy is important in protein-protein interactions: the root mean square error (RMSE) between calculated and experimentally measured ΔΔGs decreases by 12% when explicit entropic contributions are incorporated. GOBLIN models all atoms of the protein complex and detects changes to the binding entropy along the interface as well as at positions distal to the binding interface. Our results also suggest that a variational approach to entropy calculations may be quantitatively more accurate than the knowledge-based approaches used by the well-known programs FOLDX and Rosetta: GOBLIN's RMSEs are 10% and 36% lower than those of these programs, respectively.
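The thermodynamic identity and the accuracy comparison quoted above reduce to simple arithmetic, sketched below. The enthalpy, entropy, and ΔΔG values are hypothetical. The code illustrates only the bookkeeping (ΔG = ΔH − TΔS, RMSE, and relative RMSE reduction), not GOBLIN's graphical-model inference.

```python
import math

def binding_free_energy(delta_h, delta_s, temp_k=298.15):
    """dG = dH - T*dS, with dH in kcal/mol and dS in kcal/(mol*K)."""
    return delta_h - temp_k * delta_s

def rmse(predicted, measured):
    """Root mean square error between paired ddG values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))

def relative_reduction(old, new):
    """Fractional decrease, e.g. 0.12 for a 12% lower RMSE."""
    return (old - new) / old

# Hypothetical favorable enthalpy partly offset by an entropy loss on binding
print(f"{binding_free_energy(-10.0, -0.01):.2f}")  # -7.02
print(f"{rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]):.3f}")  # 1.155
print(f"{relative_reduction(1.0, 0.88):.0%}")  # 12%
```

The first line shows why "marginally stable" complexes need both terms. An entropic penalty of about 3 kcal/mol can cancel a large share of the binding enthalpy.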

12.
Identification of the interfaces of large (Mr > 50,000) protein-protein complexes in solution by high resolution NMR has typically been achieved using experiments involving chemical shift perturbation and/or hydrogen-deuterium exchange of the main chain amide groups of the proteins. Interfaces identified using these techniques, however, are not always identical to those revealed using X-ray crystallography. In order to identify the contact residues in a large protein-protein complex more accurately, we developed a novel NMR method that uses cross-saturation phenomena in combination with TROSY detection in an optimally deuterium labeled system.

13.
Plant diseases have recently increased and worsened owing to several factors such as climate change, the misuse of chemicals, and pollution. They represent a severe threat to both the economy and global food security. Recently, several studies have proposed plant disease identification through modern image-based recognition systems based on deep learning. However, several challenges still require further investigation. One is the high variety of leaf diseases and species, along with constraints on the collection and annotation of real-world datasets. Other challenges concern the study of leaf diseases in uncontrolled environments. Compared with most existing research, we propose in this article a new perspective on the problem with two main differences. First, while most approaches aim to identify a species-disease pair simultaneously, we propose to identify diseases independently of leaf species. This helps to recognize new species carrying diseases that were previously learned. Second, instead of using the global leaf image, we directly predict the disease on the basis of local disease-symptom features. We believe that this may decrease the bias related to common context and/or background and enables a more generalized model for disease classification to be built. In particular, we propose a hybrid system that combines the strengths of deep learning-based semantic segmentation and classification to, respectively, extract infected regions and determine their identity. To that end, extensive experiments, including a comparison of different semantic segmentation and classification CNNs, were conducted on the PlantVillage dataset (leaves on a homogeneous background) to study the extent to which local disease-symptom features can be used to identify diseases. Notably, a marked improvement in disease identification accuracy was demonstrated on the IPM and BING datasets (leaves on uncontrolled backgrounds).

15.
Liu X, Liu B, Huang Z, Shi T, Chen Y, Zhang J. PloS one. 2012;7(1):e30938.

Background

The molecular network sustained by different types of interactions among proteins is widely recognized as the fundamental driving force of cellular operations. Many biological functions are determined by the crosstalk between proteins rather than by the characteristics of their individual components. Thus, searching for protein partners in global networks is imperative when attempting to understand the principles of biology.

Results

We have developed a web-based tool, “Sequence-based Protein Partners Search” (SPPS), to explore the interacting partners of proteins by searching over a large repertoire of proteins across many species. SPPS provides a database containing more than 60,000 annotated protein sequences and a protein-partner search engine with two modes (Single Query and Multiple Query). In this study, two proteins that interact with human FBXO6 were found using the service. In addition, users can refine potential protein-partner hits using annotations and the possible interaction network in the SPPS web server.

Conclusions

SPPS provides a new type of tool to facilitate the identification of direct or indirect protein partners, which may guide scientists in investigating new signaling pathways. The SPPS server is available to the public at http://mdl.shsmu.edu.cn/SPPS/.

17.
ABSTRACT: BACKGROUND: Identification of essential proteins plays a significant role in understanding the minimal requirements for cellular survival and development. Many computational methods have been proposed for predicting essential proteins using the topological features of protein-protein interaction (PPI) networks. However, most of these methods ignore the intrinsic biological meaning of proteins. Moreover, PPI data contain many false positives and false negatives. To overcome these limitations, many research groups have recently started to focus on identifying essential proteins by integrating PPI networks with other biological information. However, none of these methods has been widely accepted. RESULTS: Considering that essential proteins are more evolutionarily conserved than nonessential proteins and that essential proteins frequently bind each other, we propose an iterative method for predicting essential proteins by integrating orthology with PPI networks, named ION. Unlike other methods, ION identifies essential proteins based not only on the connections between proteins but also on their orthologous properties and the features of their neighbors. ION is implemented to predict essential proteins in S. cerevisiae. Experimental results show that ION achieves higher identification accuracy than eight other existing centrality methods in terms of the area under the curve (AUC). Moreover, ION identifies a large number of essential proteins that these eight centrality methods miss because of their low connectivity. Many proteins ranked in the top 100 by ION are both essential and members of complexes with specific biological functions. Furthermore, regardless of how many reference organisms are selected, ION outperforms all eight existing centrality methods, and using as many reference organisms as possible can further improve its performance.
Additionally, ION shows good prediction performance in E. coli K-12. CONCLUSIONS: The accuracy of predicting essential proteins can be improved by integrating orthology with PPI networks.
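The abstract describes ION only as an iteration that combines orthology with PPI-network neighbors, without giving its update rule. The sketch below therefore swaps in a clearly different, well-known scheme, personalized-PageRank-style propagation, to illustrate the general idea. Each protein's score mixes its normalized orthology prior with score flowing in from its interaction partners. The graph, the orthology counts, and `alpha` are hypothetical.

```python
def propagate_scores(adjacency, orthology, alpha=0.5, tol=1e-10, max_iter=1000):
    """Personalized-PageRank-style iteration (not ION's published formula).

    adjacency: dict protein -> set of neighbours (must be symmetric/undirected)
    orthology: dict protein -> non-negative orthology/conservation score
    Each protein passes its current score equally to its neighbours, and the
    result is blended with the orthology prior using weight alpha.
    """
    total = sum(orthology.get(p, 0.0) for p in adjacency)
    if total <= 0:
        raise ValueError("orthology scores must have a positive sum")
    prior = {p: orthology.get(p, 0.0) / total for p in adjacency}
    scores = dict(prior)
    for _ in range(max_iter):
        new = {
            p: alpha * sum(scores[q] / len(adjacency[q]) for q in adjacency[p])
            + (1 - alpha) * prior[p]
            for p in adjacency
        }
        if max(abs(new[p] - scores[p]) for p in adjacency) < tol:
            return new
        scores = new
    return scores

# Hypothetical undirected PPI graph and per-protein orthology counts.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
ortho = {"A": 4, "B": 2, "C": 1, "D": 0}
ranking = propagate_scores(ppi, ortho)
print(sorted(ranking, key=ranking.get, reverse=True))
```

Under this rule, a weakly connected protein can still rank high if its orthology prior is large. This loosely mirrors the abstract's point that integrating orthology recovers low-connectivity essential proteins that pure centrality measures miss.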

18.

Background

Understanding the information-processing capabilities of signal transduction networks, how those networks are disrupted in disease, and rationally designing therapies to manipulate diseased states require systematic and accurate reconstruction of network topology. Data on networks central to human physiology, such as the inflammatory signalling networks analyzed here, are found in a multiplicity of on-line resources of pathway and interactome databases (Cancer CellMap, GeneGo, KEGG, NCI-Pathway Interactome Database (NCI-PID), PANTHER, Reactome, I2D, and STRING). We sought to determine whether these databases contain overlapping information and whether they can be used to construct high reliability prior knowledge networks for subsequent modeling of experimental data.

Results

We have assembled an ensemble network from multiple on-line sources representing a significant portion of all machine-readable and reconcilable human knowledge on proteins and protein interactions involved in inflammation. This ensemble network has many features expected of complex signalling networks assembled from high-throughput data: a power-law distribution of both node degree and edge annotations, and the topological features of a “bow tie” architecture in which diverse pathways converge on a highly conserved set of enzymatic cascades focused around PI3K/AKT, MAPK/ERK, JAK/STAT, NF-κB, and apoptotic signaling. Individual pathways exhibit “fuzzy” modularity that is statistically significant but still involves a majority of “cross-talk” interactions. However, we find that the most widely used pathway databases are highly inconsistent with respect to the actual constituents and interactions in this network. Using a set of growth factor signalling networks as examples (epidermal growth factor, transforming growth factor-beta, tumor necrosis factor, and wingless), we find a multiplicity of network topologies in which receptors couple to downstream components through myriad alternative paths. Many of these paths are inconsistent with well-established mechanistic features of signalling networks, such as a requirement for a transmembrane receptor in sensing extracellular ligands.

Conclusions

Wide inconsistencies among interaction databases, pathway annotations, and the numbers and identities of nodes associated with a given pathway pose a major challenge for deriving causal and mechanistic insight from network graphs. We speculate that these inconsistencies are at least partially attributable to the cell- and context-specificity of cellular signal transduction, which is largely unaccounted for in available databases, but the absence of standardized vocabularies is an additional confounding factor. As a result of discrepant annotations, it is very difficult to identify biologically meaningful pathways from interactome networks a priori. However, by incorporating prior knowledge, it is possible to successively build out network complexity with high confidence from a simple linear signal transduction scaffold. Such reduced-complexity networks appear suitable for use in mechanistic models while being richer and better justified than the simple linear pathways usually depicted in diagrams of signal transduction.

19.
Absolute binding free energy calculations and free energy decompositions are presented for the protein-protein complexes H-Ras/C-Raf1 and H-Ras/RalGDS. Ras is a central switch in the regulation of cell proliferation and differentiation. In our study, we investigate the capability of the molecular mechanics (MM)-generalized Born surface area (GBSA) approach to estimate absolute binding free energies for these protein-protein complexes. When gas-phase energies, solvation free energies, and entropic contributions are averaged over snapshots extracted from trajectories of the unbound proteins and the complexes, the calculated binding free energies (Ras-Raf: −15.0 ± 6.3 kcal mol⁻¹; Ras-RalGDS: −19.5 ± 5.9 kcal mol⁻¹) are in fair agreement with experimentally determined values (−9.6 kcal mol⁻¹ and −8.4 kcal mol⁻¹, respectively) if appropriate ionic strength is taken into account. Structural determinants of the binding affinity of Ras-Raf and Ras-RalGDS are identified by means of free energy decomposition. For the first time, computationally inexpensive generalized Born (GB) calculations are applied in this context to partition solvation free energies, along with gas-phase energies, between residues of both binding partners. For selected residues, entropic contributions are additionally estimated by classical statistical mechanics. Comparison of the decomposition results with experimentally determined binding free energy differences for alanine mutants of interface residues yielded correlations of r² = 0.55 and 0.46 for Ras-Raf and Ras-RalGDS, respectively. Extending the decomposition reveals that residues as far as 25 Å from the binding epitope can contribute significantly to the binding free energy. These "hotspots" are found to show large atomic fluctuations in the unbound proteins, indicating that they reside in structurally less stable regions. Furthermore, hotspot residues experience a significantly larger-than-average decrease in local fluctuations upon complex formation.
Finally, by calculating a pairwise decomposition of interactions, interaction pathways originating in the binding epitope of Raf are found that protrude through the protein structure towards loop L1. This explains the finding of a conformational change in this region upon complex formation with Ras, and it may trigger a larger structural change in Raf, which is considered necessary for activation of the effector by Ras.
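The end-point estimate described above can be summarized by the conventional MM-GBSA expressions. Angle brackets denote averages over trajectory snapshots, taken from separate simulations of the complex and of each unbound protein as the abstract describes. The symbol names are standard textbook notation, not taken from the paper.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &\approx \langle G_{\text{complex}} \rangle
  - \langle G_{\text{Ras}} \rangle - \langle G_{\text{effector}} \rangle, \\
G &= E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS,
\end{aligned}
```

Here E_MM is the gas-phase molecular-mechanics energy, G_GB and G_SA are the polar (generalized Born) and nonpolar (surface-area) solvation terms, and TS is the entropic contribution. With the values quoted above, the Ras-Raf estimate lies 5.4 kcal mol⁻¹ below experiment (−15.0 vs −9.6), within its ±6.3 spread. The Ras-RalGDS estimate lies 11.1 kcal mol⁻¹ below experiment (−19.5 vs −8.4), about 1.9 times its ±5.9 spread.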

20.
When two proteins associate, they form a molecular interface that is a structural and energetic mosaic. Within such interfaces, individual amino acid residues contribute distinct binding energies to the complex. In combination, these energies are not necessarily additive, and significant positive or negative cooperative effects often exist. Reliable algorithms for predicting the specificities and energies of protein-protein interactions depend critically on a quantitative understanding of this cooperativity. We have used a model protein-protein system defined by an affinity maturation pathway, comprising variants of a T cell receptor Vβ domain that exhibit an overall affinity range of approximately 1500-fold for binding to the superantigen staphylococcal enterotoxin C3, to dissect the cooperative and additive energetic contributions of residues within an interface. This molecular interaction has been well characterized previously, both structurally, by X-ray crystallographic analysis, and energetically, by alanine-scanning mutagenesis. Through analysis of group and individual maturation and reversion mutations using surface plasmon resonance spectroscopy, we have identified energetically important interfacial residues, determined their cooperative and additive energetic properties, and elucidated the kinetic and thermodynamic bases for molecular evolution in this system. The sum of the binding free energy changes associated with the individual mutations that define this affinity maturation pathway is greater than that of the fully matured variant, even though the affinity gap between the endpoint variants is relatively large. Two mutations in particular, both located in the complementarity-determining region 2 loop of the Vβ domain, exhibit negative cooperativity.
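The additivity test described above can be sketched as follows. Each mutation's affinity gain is converted to an energy via RT ln(Kd,parent / Kd,variant). The combined gain is then compared with the sum of the single-mutation gains; a negative difference means the mutations are sub-additive (negative cooperativity). The temperature (298.15 K) and all Kd ratios except the quoted ~1500-fold range are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def affinity_gain(kd_parent, kd_variant, temp_k=298.15):
    """Binding free energy gained (kcal/mol) by a variant; positive = tighter binding."""
    return R_KCAL * temp_k * math.log(kd_parent / kd_variant)

def coupling_energy(gain_combined, individual_gains):
    """Combined gain minus the sum of single-mutation gains.

    0 -> additive; < 0 -> negative (sub-additive) cooperativity;
    > 0 -> positive (super-additive) cooperativity.
    """
    return gain_combined - sum(individual_gains)

# The ~1500-fold affinity range quoted above, expressed as an energy
print(f"{affinity_gain(1500.0, 1.0):.2f}")  # 4.33

# Hypothetical: two mutations each improve Kd 10-fold, but together only 30-fold
coupling = coupling_energy(affinity_gain(30.0, 1.0),
                           [affinity_gain(10.0, 1.0), affinity_gain(10.0, 1.0)])
print(f"{coupling:.2f}")  # -0.71
```

Because the energies are logarithms of Kd ratios, perfect additivity corresponds to multiplicative fold-changes (10 × 10 = 100-fold). Any shortfall from that product appears as a negative coupling energy.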

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417