Similar Articles
1.
Phage display enables the presentation of a large number of peptides on the surface of phage particles. Such libraries can be tested for binding to target molecules of interest by means of affinity selection. Here we present SiteLight, a novel computational tool for binding site prediction using phage display libraries. SiteLight is an algorithm that maps the 1D peptide library onto a three-dimensional (3D) protein surface. It is applicable to complexes made up of a protein Template and any type of molecule termed Target. Given the 3D structure of a Template and a collection of sequences derived from biopanning against the Target, SiteLight predicts the Template's interaction site with the Target. We have created a large, diverse data set for assessing the ability of SiteLight to correctly predict binding sites. SiteLight's predictive mapping discriminates between the binding and nonbinding parts of the surface. This prediction can be used to reduce the surface by 75% without excluding the binding site. In 63% of the cases we tested, at least one binding site prediction overlaps the interface by at least 50%. These results suggest that phage display libraries can be used for automated binding site prediction on three-dimensional structures. For the most effective binding site prediction, we propose using a random phage display library twice, to scan both binding partners of a given complex. The derived peptides are mapped to the other binding partner (now used as a Template). The surface of each partner is then reduced by 75%, significantly constraining their positions relative to each other. Such information can be used to improve docking algorithms and scoring functions.
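The two evaluation measures above (surface reduction and overlap with the true interface) can be made concrete with a minimal sketch. It assumes residues carry hypothetical labels, that each prediction is a set of residues, and that "overlap" means the fraction of interface residues covered by one predicted patch; SiteLight's own definitions and code may differ.

```python
from typing import Iterable, List, Set


def surface_reduction(surface: Set[str], patches: Iterable[Set[str]]) -> float:
    """Fraction of the surface excluded once only the predicted patches are kept."""
    kept = set().union(*patches) & surface
    return 1.0 - len(kept) / len(surface)


def interface_overlap(patch: Set[str], interface: Set[str]) -> float:
    """Assumed overlap definition: share of true interface residues inside the patch."""
    return len(patch & interface) / len(interface)


def any_hit(patches: List[Set[str]], interface: Set[str], threshold: float = 0.5) -> bool:
    """True if at least one predicted patch reaches the overlap threshold."""
    return any(interface_overlap(p, interface) >= threshold for p in patches)


# Hypothetical toy case: 20 surface residues, two predicted patches, one interface.
surface = {f"R{i}" for i in range(1, 21)}
patches = [{"R1", "R2", "R3"}, {"R10", "R11"}]
interface = {"R1", "R2", "R4", "R5"}

print(surface_reduction(surface, patches))  # 0.75
print(any_hit(patches, interface))          # True
```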

2.
La D, Kihara D. Proteins, 2012, 80(1): 126-141
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the weak sequence conservation arising from this mutational behavior poses significant challenges to protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through a comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution patterns compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which uses these substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces or of nonbinding surfaces. BindML is shown to perform well compared with alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in that it can be applied generally to predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.

3.
An important objective of computational protein design is the generation of high-affinity peptide inhibitors of protein-peptide interactions, both as a precursor to the development of therapeutics aimed at disrupting disease-causing complexes, and as a tool to help investigators understand the role of specific complexes in the cell. We have developed a computational approach to increase the affinity of a protein-peptide complex by designing N- or C-terminal extensions that interact with the protein outside the canonical peptide binding pocket. In a first in silico test, we show that by simultaneously optimizing the sequence and structure of three- to nine-residue peptide extensions, starting from short (1- to 6-residue) peptide stubs in the binding pocket of a peptide binding protein, the approach can recover both the conformations and the sequences of known binding peptides. Comparison with phage display and other experimental data suggests that the peptide extension approach recapitulates naturally occurring peptide binding specificity better than fixed-backbone design, and that it should be useful for predicting peptide binding specificities from crystal structures. We then experimentally tested the approach by designing extensions for p53- and dystroglycan-based peptides predicted to bind with increased affinity to the Mdm2 oncoprotein and to dystrophin, respectively. The measured increases in affinity are modest, revealing some limitations of the method. Based on these in silico and experimental results, we discuss future applications of the approach to the prediction and design of protein-peptide interactions.

4.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.
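Reverse searches start from a sequence profile built from the designed sequences. The sketch below builds a simple position-specific log-odds profile in bits. The uniform background, the single pseudocount, and the name `build_profile` are assumptions for illustration; PSI-BLAST's own profile construction (sequence weighting, substitution-matrix pseudocounts) is more elaborate.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(designed: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds profile (bits) from equal-length designed sequences.

    Assumes a uniform background and one pseudocount spread over that background.
    """
    length = len(designed[0])
    if any(len(s) != length for s in designed):
        raise ValueError("designed sequences must share the template length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        column = [s[pos] for s in designed]
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount * background) / (len(designed) + pseudocount)
            row[aa] = math.log2(freq / background)  # > 0 means enriched over background
        profile.append(row)
    return profile


# Hypothetical mini-alignment of three designed sequences for one template.
profile = build_profile(["ACD", "ACE", "ACD"])
print(round(profile[2]["D"], 2), round(profile[2]["E"], 2), round(profile[0]["W"], 2))
```

Positions where the designs agree get strongly positive scores for the shared residue, which is what lets a profile search pick up distant homologues of the parent structure.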

5.
Discovery of association rules in protein sequences and their application
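As the machine learning algorithms used in protein sequence-structure analysis grow more complex, interpreting their results and the discovery process becomes correspondingly more complicated, so simple and theoretically sound methods are needed. By introducing an association rule discovery algorithm, which is simple in principle, theoretically reliable, and yields results of strong practical significance, we found tens of thousands of patterns in protein sequences. Using examples, we demonstrate how these patterns can be applied to protein sequence analysis, such as conserved region discovery and secondary structure prediction. Based on these results, we also built a secondary structure rule base and a simple secondary structure prediction algorithm. Experimental results show that about 81% of secondary structures can be predicted by at least one association rule.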
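Support and confidence are the two quantities behind association rule mining. A minimal sketch, assuming each residue becomes one transaction whose items are its neighbours tagged by relative offset plus its secondary structure label; this encoding is hypothetical, since the abstract does not give the paper's item definition.

```python
from typing import List, Set


def residue_transactions(seq: str, ss: str, half: int = 1) -> List[Set[str]]:
    """One transaction per residue: neighbours tagged by offset (e.g. 'L@+0') plus its SS label."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings differ in length")
    transactions = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        items = {f"{seq[j]}@{j - i:+d}" for j in range(lo, hi)}
        items.add(f"SS={ss[i]}")
        transactions.append(items)
    return transactions


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    return support(transactions, antecedent | consequent) / base if base else 0.0


# Hypothetical toy sequence with its secondary structure string (H = helix, C = coil).
tx = residue_transactions("AELLKA", "CHHHHC")
print(support(tx, {"L@+0", "SS=H"}), confidence(tx, {"L@+0"}, {"SS=H"}))
```

A rule such as `{L@+0} -> {SS=H}` with high confidence would then be one entry in a secondary structure rule base of the kind described above.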

6.
Computational protein design can be used to select sequences that are compatible with a fixed-backbone template. This strategy has been used in numerous instances to engineer novel proteins. However, the fixed-backbone assumption severely restricts the sequence space that is accessible via design. For challenging problems, such as the design of functional proteins, this may not be acceptable. Here, we present a method for introducing backbone flexibility into protein design calculations and apply it to the design of diverse helical BH3 ligands that bind to the anti-apoptotic protein Bcl-xL, a member of the Bcl-2 protein family. We demonstrate how normal mode analysis can be used to sample different BH3 backbones, and show that this leads to a larger and more diverse set of low-energy solutions than can be achieved using a native high-resolution Bcl-xL complex crystal structure as a template. We tested several of the designed solutions experimentally and found that this approach worked well when normal mode calculations were used to deform a native BH3 helix structure, but less well when they were used to deform an idealized helix. A subsequent round of design and testing identified a likely source of the problem as inadequate sampling of the helix pitch. In all, we tested 17 designed BH3 peptide sequences, including several point mutants. Of these, eight bound well to Bcl-xL and four others showed weak but detectable binding. The successful designs showed a diversity of sequences that would have been difficult or impossible to achieve using only a fixed backbone. Thus, introducing backbone flexibility via normal mode analysis effectively broadened the set of sequences identified by computational design, and provided insight into positions important for binding Bcl-xL.

7.
King CA, Bradley P. Proteins, 2010, 78(16): 3437-3449
Protein-peptide interactions mediate many of the connections in intracellular signaling networks. A generalized computational framework for atomically precise modeling of protein-peptide specificity may allow for predicting molecular interactions, anticipating the effects of drugs and genetic mutations, and redesigning molecules for new interactions. We have developed an extensible, general algorithm for structure-based prediction of protein-peptide specificity as part of the Rosetta molecular modeling package. The algorithm is not restricted to any one peptide-binding domain family and, at minimum, requires neither an experimentally characterized structure of the target protein nor any information about sequence specificity, although known structural data can be incorporated when available to improve performance. We demonstrate substantial success in specificity prediction across a diverse set of peptide-binding proteins, and show how performance is affected when incorporating varying degrees of input structural data. We also illustrate how structure-based approaches can provide atomic-level insight into mechanisms of peptide recognition and can predict the effects of point mutations on peptide specificity. Shortcomings and artifacts of our benchmark predictions are explained, and limits on the generality of the method are explored. This work provides a promising foundation for further development toward completely generalized, de novo prediction of peptide specificity.

8.
Prediction of amino acid sequence from structure
We have developed a method for the prediction of an amino acid sequence that is compatible with a three-dimensional backbone structure. Using only a backbone structure of a protein as input, the algorithm is capable of designing sequences that closely resemble natural members of the protein family to which the template structure belongs. In general, the predicted sequences are shown to have multiple sequence profile scores that are dramatically higher than those of random sequences, and sometimes better than some of the natural sequences that make up the superfamily. As anticipated, highly conserved but poorly predicted residues are often those that contribute to the functional rather than structural properties of the protein. Overall, our analysis suggests that statistical profile scores of designed sequences are a novel and valuable figure of merit for assessing and improving protein design algorithms.
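The figure of merit here is how much higher a designed sequence scores against the family profile than random sequences do. A minimal sketch, assuming a log-odds profile given as one dictionary per position, a fixed penalty for residues absent from a row, and uniformly random sequences as the baseline; all of these are illustrative choices, not taken from the paper.

```python
import random
import statistics
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_score(seq: str, profile: List[Dict[str, float]], default: float = -1.0) -> float:
    """Sum of per-position log-odds; residues missing from a row get `default`."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    return sum(row.get(aa, default) for aa, row in zip(seq, profile))


def z_vs_random(seq: str, profile: List[Dict[str, float]], n_random: int = 200, seed: int = 0) -> float:
    """How many standard deviations `seq` scores above uniformly random sequences."""
    rng = random.Random(seed)
    background = [
        profile_score("".join(rng.choice(AMINO_ACIDS) for _ in profile), profile)
        for _ in range(n_random)
    ]
    mu, sd = statistics.mean(background), statistics.pstdev(background)
    return (profile_score(seq, profile) - mu) / sd if sd > 0 else 0.0


# Hypothetical 3-position family profile that favours A, C, D.
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
print(profile_score("ACD", toy_profile), round(z_vs_random("ACD", toy_profile), 1))
```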

10.
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared with MSD algorithms that enforce (constrain) an identical sequence on all states, this simplifies the energy landscape, which drastically accelerates the search. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we designed “promiscuous”, polyspecific proteins against all binding partners and measured recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes.
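The core idea of RECON, letting each state keep its own sequence while a restraint on disagreeing positions is ramped up, can be illustrated with a toy greedy optimizer. This is a sketch under stated assumptions: the energy tables are hypothetical, the restraint simply counts mismatches, and a greedy per-position update stands in for the Monte Carlo search and Rosetta energies a real implementation would use.

```python
from typing import Dict, List

# energies[state][position][amino_acid]: hypothetical per-state design energies.
Energies = List[List[Dict[str, float]]]


def restrained_cost(energies: Energies, seqs: List[List[str]], s: int, p: int, aa: str, w: float) -> float:
    """State energy plus w times the number of other states that disagree at position p."""
    mismatches = sum(1 for t in range(len(seqs)) if t != s and seqs[t][p] != aa)
    return energies[s][p][aa] + w * mismatches


def restrained_design(energies: Energies, weights: List[float]) -> List[List[str]]:
    """Each state keeps its own sequence; the restraint weight grows round by round."""
    # Start every state at its own single-state optimum.
    seqs = [[min(pos, key=pos.get) for pos in state] for state in energies]
    for w in weights:  # incrementally increasing convergence restraint
        for s in range(len(energies)):
            for p in range(len(energies[s])):
                seqs[s][p] = min(
                    energies[s][p],
                    key=lambda aa: restrained_cost(energies, seqs, s, p, aa, w),
                )
    return seqs


toy = [
    [{"A": 0.0, "V": 0.5}],  # state 0 prefers A
    [{"A": 0.8, "V": 0.0}],  # state 1 prefers V
]
print(restrained_design(toy, [0.0]))            # states still differ
print(restrained_design(toy, [0.0, 0.5, 1.0]))  # converged
```

With no restraint the two states keep their preferred residues; as the weight grows, both adopt V, the residue with the lower combined energy (0.5 versus 0.8).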

12.
Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold into it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.
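Because H-H contacts are rewarded and exposed H residues are penalized, an energy of this kind, written as E(x) = Σ_i c_i x_i − Σ_{i<j} w_ij x_i x_j with x_i = 1 for H and w_ij ≥ 0, is a submodular function of the set of H positions, so it can be minimized exactly with a single s-t minimum cut. The sketch below uses that standard construction; it is an assumption about how such an optimum can be computed, not a reproduction of the paper's algorithm, and the exposure penalties c_i and contact weights w_ij are made-up numbers rather than the distance-dependent terms of the Sun et al. model.

```python
import itertools
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]
Contacts = Dict[Tuple[int, int], float]


def min_cut_source_side(cap: Graph, s: Hashable, t: Hashable) -> Tuple[float, Set[Hashable]]:
    """Edmonds-Karp max flow; returns (cut value, nodes left on the source side)."""
    res: Graph = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edges
    res.setdefault(s, {})
    res.setdefault(t, {})
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)  # nodes still reachable from s
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def energy(seq: str, exposure: List[float], contacts: Contacts) -> float:
    """E = exposure penalties of H residues minus weights of H-H contacts."""
    e = sum(c for c, aa in zip(exposure, seq) if aa == "H")
    return e - sum(w for (i, j), w in contacts.items() if seq[i] == seq[j] == "H")


def design_hp(exposure: List[float], contacts: Contacts) -> Tuple[str, float]:
    """Optimal H/P sequence via one s-t min cut (requires every contact weight >= 0)."""
    S, T = "source", "sink"
    cap: Graph = {S: {}, T: {}}
    unary = list(exposure)
    for i in range(len(exposure)):
        cap[i] = {}
    for (i, j), w in contacts.items():
        # -w x_i x_j = -(w/2)(x_i + x_j) + (w/2)[x_i(1 - x_j) + x_j(1 - x_i)]
        unary[i] -= w / 2
        unary[j] -= w / 2
        cap[i][j] = cap[i].get(j, 0.0) + w / 2
        cap[j][i] = cap[j].get(i, 0.0) + w / 2
    offset = 0.0
    for i, u in enumerate(unary):
        if u > 0:
            cap[i][T] = u   # paid when residue i is H (source side)
        elif u < 0:
            cap[S][i] = -u  # u*x_i = u + |u|*(1 - x_i): paid when residue i is P
            offset += u
    cut, source_side = min_cut_source_side(cap, S, T)
    seq = "".join("H" if i in source_side else "P" for i in range(len(exposure)))
    return seq, cut + offset


def brute_force(exposure: List[float], contacts: Contacts) -> float:
    """Exhaustive minimum energy, only for checking small examples."""
    return min(energy("".join(b), exposure, contacts)
               for b in itertools.product("HP", repeat=len(exposure)))


exposure = [0.2, 1.5, 0.1, 0.3, 2.0]  # hypothetical solvent-exposure penalties
contacts = {(0, 2): 1.0, (0, 3): 0.8, (2, 3): 0.6, (1, 4): 0.5}  # hypothetical contact weights
seq, e = design_hp(exposure, contacts)
print(seq, round(e, 3), round(brute_force(exposure, contacts), 3))  # HPHHP -1.8 -1.8
```

On small inputs the cut result can be checked against `brute_force`; the min cut runs in polynomial time, whereas enumeration grows as 2^n.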

13.
The phosphotyrosine-binding (PTB) domain of the cell fate determinant Numb is involved in the formation of multiple protein complexes in vivo and can bind a diverse array of peptide sequences in vitro. To investigate the structural basis for the promiscuous nature of this protein module, we have determined its solution structure by NMR in a complex with a peptide containing an NMSF sequence derived from the Numb-associated kinase (Nak). The Nak peptide was found to adopt a significantly different structure from that of a GPpY sequence-containing peptide previously determined. In contrast to the helical turn adopted by the GPpY peptide, the Nak peptide forms a beta-turn at the NMSF site followed by another turn near the C-terminus. The Numb PTB domain appears to recognize peptides that differ in both primary and secondary structures by engaging various amounts of the binding surface of the protein. Our results suggest a mechanism through which a single PTB domain might interact with multiple distinct target proteins to control a complex biological process such as asymmetric cell division.

14.
A structure-based approach for prediction of MHC-binding peptides
Identification of immunodominant peptides is the first step in the rational design of peptide vaccines aimed at T-cell immunity. The advances in sequencing techniques and the accumulation of many protein sequences without the purified protein challenge the development of computer algorithms to identify dominant T-cell epitopes based on sequence data alone. Here, we focus on antigenic peptides recognized by cytotoxic T cells. The selection of T-cell epitopes along a protein sequence is influenced by the specificity of each of the processing stages that precede antigen presentation. The most selective of these processing stages is the binding of the peptides to the major histocompatibility complex molecules, and therefore many of the predictive algorithms focus on this stage. Most of these algorithms are based on known binding peptides whose sequences have been used for the characterization of binding motifs or profiles. Here, we describe a structure-based algorithm that does not rely on previous binding data. It is based on observations from crystal structures that many of the bound peptides adopt similar conformations and placements within the MHC groove. The algorithm uses a structural template of the peptide in the MHC groove upon which peptide candidates are threaded and their fit to the MHC groove is evaluated by statistical pairwise potentials. It can rank all possible peptides along a protein sequence or within a suspected group of peptides, directing the experimental efforts towards the most promising peptides. This approach is especially useful when no previous peptide binding data are available.
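The threading step can be pictured as follows: place each candidate peptide on a fixed template, look up the groove residues each peptide position touches, and sum a pairwise potential over those contacts. In this sketch the contact list and the potential values are hypothetical placeholders; the method described above derives contacts from crystal structures and uses statistical pairwise potentials.

```python
from typing import Dict, List, Tuple

# Hypothetical template: which peptide positions contact which groove residues.
TEMPLATE_CONTACTS: List[Tuple[int, str]] = [(1, "Y"), (1, "E"), (4, "R"), (8, "L"), (8, "W")]

# Hypothetical pair potential (negative = favourable); unlisted pairs score 0.
PAIR_POTENTIAL: Dict[Tuple[str, str], float] = {
    ("K", "E"): -1.0, ("E", "R"): -0.8, ("L", "L"): -1.0, ("L", "W"): -0.5, ("V", "L"): -0.6,
}


def thread_score(peptide: str) -> float:
    """Thread a peptide onto the template and sum pairwise contact energies."""
    return sum(PAIR_POTENTIAL.get((peptide[p], groove_aa), 0.0) for p, groove_aa in TEMPLATE_CONTACTS)


def rank_windows(protein: str, length: int = 9) -> List[Tuple[int, str, float]]:
    """Score every peptide window along the protein; best (lowest energy) first."""
    scored = [
        (i, protein[i:i + length], thread_score(protein[i:i + length]))
        for i in range(len(protein) - length + 1)
    ]
    return sorted(scored, key=lambda hit: hit[2])


for start, peptide, score in rank_windows("GAKAAEAAALG")[:2]:
    print(start, peptide, round(score, 2))
```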

15.
Successful predictions of peptide MHC binding typically require a large set of binding data for the specific MHC molecule that is examined. Structure-based prediction methods promise to circumvent this requirement by evaluating the physical contacts a peptide can make with an MHC molecule based on the highly conserved 3D structure of peptide:MHC complexes. While several such methods have been described before, most are not publicly available and have not been independently tested for their performance. We here implemented and evaluated three prediction methods for MHC class II molecules: statistical potentials derived from the analysis of known protein structures; energetic evaluation of different peptide snapshots in a molecular dynamics simulation; and direct analysis of contacts made in known 3D structures of peptide:MHC complexes. These methods are ab initio in that they require structural data of the MHC molecule examined, but no specific peptide:MHC binding data. Moreover, these methods retain the ability to make predictions in a sufficiently short time scale to be useful in a real-world application, such as screening a whole proteome for candidate binding peptides. A rigorous evaluation of each method's prediction performance showed that these are significantly better than random, but still substantially lower than the best-performing sequence-based class II prediction methods available. While the approaches presented here were developed independently, we have chosen to present our results together in order to support the notion that generating structure-based predictions of peptide:MHC binding without using binding data is unlikely to give satisfactory results.

16.
Knowing the ligand or peptide binding site in proteins is highly important to guide drug discovery, but experimental elucidation of the binding site is difficult. Therefore, various computational approaches have been developed to identify potential binding sites in protein structures. However, protein and ligand flexibility are often neglected in these methods for efficiency reasons, despite the recognition that protein–ligand interactions can be strongly affected by mutual structural adaptations. This is particularly true if the binding site is unknown, as the screening will typically be performed on an unbound protein structure. Herein we present DynaBiS, a hierarchical sampling algorithm to identify flexible binding sites for a target ligand with explicit consideration of protein and ligand flexibility, inspired by our previously presented flexible docking algorithm DynaDock. DynaBiS applies soft-core potentials between the ligand and the protein, thereby allowing a certain protein–ligand overlap, which results in efficient sampling of conformational adaptation effects. We evaluated DynaBiS and other commonly used binding site identification algorithms against a diverse evaluation set consisting of 26 proteins featuring peptide as well as small ligand binding sites. We show that DynaBiS outperforms the other evaluated methods for the identification of protein binding sites for large and highly flexible ligands such as peptides, with either a holo or an apo structure as input.
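Soft-core potentials stay finite when atoms overlap, which is what lets a sampler pass through clashes. A minimal sketch of one common soft-core form of the Lennard-Jones potential, V(r) = 4ε[1/(α + (r/σ)^6)^2 − 1/(α + (r/σ)^6)], which reduces to the standard potential at α = 0. This functional form and the parameter values are assumptions, not necessarily the ones DynaBiS uses.

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Standard 12-6 potential; diverges as r -> 0."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


def soft_core_lj(r: float, epsilon: float = 1.0, sigma: float = 1.0, alpha: float = 0.5) -> float:
    """Soft-core variant: finite at r = 0 for alpha > 0, plain LJ when alpha = 0."""
    s = alpha + (r / sigma) ** 6
    return 4.0 * epsilon * (1.0 / s ** 2 - 1.0 / s)


for r in (0.0, 0.5, 1.0, 1.5):
    hard = lennard_jones(r) if r > 0 else float("inf")
    print(f"r={r:.1f}  LJ={hard:10.2f}  soft-core={soft_core_lj(r):7.2f}")
```

Larger `alpha` flattens the repulsive wall further, trading physical realism for easier passage through overlapping conformations.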

17.
The display of peptide sequences on the surface of bacteria is a technology that offers exciting applications in biotechnology and medical research. Type 1 fimbriae are surface organelles of Escherichia coli which mediate D-mannose-sensitive binding to different host surfaces by virtue of the FimH adhesin. FimH is a component of the fimbrial organelle that can accommodate and display a diverse range of peptide sequences on the E. coli cell surface. In this study we have constructed a random peptide library in FimH. The library, consisting of approximately 40 million individual clones, was screened for peptide sequences that conferred on recombinant cells the ability to bind Zn(2+). By serial selection, sequences that exhibited various degrees of binding affinity and specificity toward Zn(2+) were enriched. None of the isolated sequences showed similarity to known Zn(2+)-binding proteins, indicating that completely novel Zn(2+)-binding peptide sequences had been isolated. By changing the protein scaffold system, we demonstrated that the Zn(2+)-binding seems to be uniquely mediated by the peptide insert and to be independent of the sequence of the carrier protein. These findings might be applied in the design of biomatrices for bioremediation purposes or in the development of sensors for detection of heavy metals.

18.
Short motifs are known to play diverse roles in proteins, such as mediating interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently employed to detect short motifs in proteins search for enrichment of amino acid motifs, considering mostly sequence information. Here, we present a new approach to search for common motifs (protein signatures) that share both physicochemical and structural properties, looking simultaneously at different features. Our method takes an amino acid sequence as input and translates it into a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified the protein signatures within subsets of proteins that share common sequence and structural information. We demonstrate that we can detect enriched structural motifs, such as the amphipathic helix, in large datasets of linear sequences, as well as predict common structural properties (such as disorder, surface accessibility, or secondary structure) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied for de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.
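Translating sequences into a property-based alphabet makes sequence-dissimilar segments with the same physicochemical pattern look identical, so a motif finder can pick them up. The sketch below uses a hypothetical six-letter grouping and simple k-mer counting as a crude stand-in for MEME; neither the alphabet nor the counting step is the paper's.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical physicochemical grouping; the paper's actual alphabet is not given here.
GROUPS: Dict[str, str] = {
    **dict.fromkeys("AVLIM", "h"),   # aliphatic / hydrophobic
    **dict.fromkeys("FWY", "r"),     # aromatic
    **dict.fromkeys("KRH", "+"),     # positively charged
    **dict.fromkeys("DE", "-"),      # negatively charged
    **dict.fromkeys("STNQC", "p"),   # polar, uncharged
    **dict.fromkeys("GP", "g"),      # backbone-special
}


def translate(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet (KeyError on non-standard residues)."""
    return "".join(GROUPS[aa] for aa in seq.upper())


def top_kmers(texts: List[str], k: int, n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent k-mers across translated sequences."""
    counts = Counter(t[i:i + k] for t in texts for i in range(len(t) - k + 1))
    return counts.most_common(n)


# Two sequence-dissimilar peptides that share the same property pattern.
reduced = [translate("KLLEKLAE"), translate("RVIDRIMD")]
print(reduced)                   # ['+hh-+hh-', '+hh-+hh-']
print(top_kmers(reduced, 4, 1))  # [('+hh-', 4)]
```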

19.
Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited, since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Features describing physicochemical/biochemical properties, sequence conservation, residue disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were used to represent the peptides concerned. We compared our method with existing tools for predicting possible cleavage sites in candidate substrates, and show that it makes much more reliable predictions in terms of overall prediction accuracy. In addition, this predictor can be applied to a wide range of proteinases.
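The mRMR-then-IFS pipeline can be sketched for discrete features: mRMR orders features by relevance to the label minus average redundancy with features already chosen, and IFS then evaluates growing prefixes of that order. The `evaluate` callback is a hypothetical hook where, for example, a cross-validated Random Forest score would go; no classifier is included here, and the difference form of mRMR used below is one of several variants.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (nats) between two discrete variables."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_order(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Greedy mRMR: relevance minus mean redundancy with already selected features."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            if not selected:
                return relevance[i]
            redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(
    order: List[int], evaluate: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """IFS: evaluate the top-1, top-2, ... prefixes of the mRMR order and keep the best."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(order) + 1):
        s = evaluate(order[:k])
        if s > best_score:
            best_k, best_score = k, s
    return order[:best_k], best_score


# Toy data: the label depends on two independent binary features a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 1, 1, 1, 1, 2, 2]
print(sorted(mrmr_order([a, a, b], y, 2)))  # [0, 2]
```

mRMR selects one copy of `a` together with `b` rather than both copies of `a`, because the duplicate adds relevance but is fully redundant.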

20.
Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state-of-the-art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm, which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction, as well as characteristic amino acid sub-sequences and their relative positions for identifying CaM binding sites. Python code for training and evaluating CaMELS, together with a webserver implementation, is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels .
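Multiple instance learning treats each protein as a bag of candidate windows: only the bag label (binds CaM or not) is trusted, and the bag is scored by its best instance, which also serves as the predicted binding site. The sketch below shows only that max-over-instances inference rule; the per-residue weights, window width, and threshold are hypothetical stand-ins for CaMELS's learned model and custom optimization, which are not reproduced.

```python
from typing import Dict, Tuple

# Hypothetical per-residue weights standing in for a learned instance model.
WEIGHTS: Dict[str, float] = {"K": 1.0, "R": 1.0, "W": 1.5, "L": 0.5, "D": -1.0, "E": -1.0}


def best_instance(seq: str, weights: Dict[str, float], width: int) -> Tuple[int, float]:
    """Score every window (instance) in the bag and return the top one."""
    best_start, best_score = 0, float("-inf")
    for i in range(len(seq) - width + 1):
        score = sum(weights.get(aa, 0.0) for aa in seq[i:i + width])
        if score > best_score:
            best_start, best_score = i, score
    return best_start, best_score


def predict_bag(seq: str, weights: Dict[str, float], width: int, threshold: float = 0.0) -> Tuple[bool, int]:
    """MIL rule: the protein is positive if its best window clears the threshold."""
    start, score = best_instance(seq, weights, width)
    return score > threshold, start


print(predict_bag("DEDEKRWLKRDE", WEIGHTS, width=4))  # (True, 4)
print(predict_bag("DEDE", WEIGHTS, width=4))          # (False, 0)
```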

