首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Protein binding site prediction using an empirical scoring function   总被引:4,自引:1,他引:3  
Liang S  Zhang C  Liu S  Zhou Y 《Nucleic acids research》2006,34(13):3698-3707
Most biological processes are mediated by interactions between proteins and their interacting partners including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than that of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5 and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) effective residue-energy score and accessible-surface-area-dependent interface-propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.  相似文献   

2.
Given current computational environments, it is worthwhile to establish amino acid residue-level functions which approximate protein folds quite well. Such functions must be the interim steps toward protein three-dimensional structure prediction, I have shown that an empirical hydrophobic penalty function of protein, derived from the number of residues in a sphere around each residue, could be utilized to distinguish the correctly folded structure from the incorrect ones. In order to assess the predictive power of the penalty function, I had generated conformations by randomly changing main chain dihedral angles, and applied the penalty function to them. If only a local region was allowed to change its conformation, native like structures could be generated within a reasonable computational time. In global simulations, however, a considerable number of nonnative conformations, which gave as small a penalty value as that of the native protein, were found. Although some of the conformations were compact and globular, they were quite different from the native structure in that they lacked most of the secondary structures. This result shows that the penalty function alone cannot define the native structure, and that substructure information may help the penalty function to reach the correctly folded structure.  相似文献   

3.
4.

Background  

The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de-novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01.  相似文献   

5.
A two-stage neural network has been used to predict protein secondary structure based on the position specific scoring matrices generated by PSI-BLAST. Despite the simplicity and convenience of the approach used, the results are found to be superior to those produced by other methods, including the popular PHD method according to our own benchmarking results and the results from the recent Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP3), where the method was evaluated by stringent blind testing. Using a new testing set based on a set of 187 unique folds, and three-way cross-validation based on structural similarity criteria rather than sequence similarity criteria used previously (no similar folds were present in both the testing and training sets) the method presented here (PSIPRED) achieved an average Q3 score of between 76.5% to 78.3% depending on the precise definition of observed secondary structure used, which is the highest published score for any method to date. Given the success of the method in CASP3, it is reasonable to be confident that the evaluation presented here gives a fair indication of the performance of the method in general.  相似文献   

6.
We extended a mean-field model to proteins with all atomic detail. The all-atom mean-field model was used to calculate the dynamic and thermodynamic properties of a three-helix bundle fragment of Staphylococcal protein A (Protein Data Bank [PDB] ID 1BDD) and alpha-spectrin SH3 domain protein (PDB ID 1SHG). We show that a model with all-atomic detail provides a significantly more accurate prediction of flexibility of residues in proteins than does a coarse-grained residue-level model. The accuracy of flexibility prediction is further confirmed by application of the method to 18 additional proteins with the largest size of 224 residues.  相似文献   

7.
Brunette TJ  Brock O 《Proteins》2008,73(4):958-972
The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration toward regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy.  相似文献   

8.
We present a knowledge‐based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials and a four‐body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge‐based potentials. These features were obtained for a large set of native and decoy structures and a back‐propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source codes and executable from this work are freely available at http://mathmed.org /#Software and http://mamiris.com/ . Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc.  相似文献   

9.
RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA--in particular the nonhelical regions--is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potential. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.  相似文献   

10.
Assessing protein-ligand interaction is of great importance for virtual screening initiatives in order to discover new drugs. The present work describes a set of empirical scoring functions to assess the binding affinity, involving terms for intermolecular hydrogen bonds and contact surface. The results show that our methodology works better to predict protein-ligand affinity when compared with XSCORE, a popular empirical scoring function.  相似文献   

11.
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an “MMGBSA” energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803–819. © 2016 Wiley Periodicals, Inc.  相似文献   

12.
Side-chain modeling with an optimized scoring function   总被引:1,自引:0,他引:1       下载免费PDF全文
Modeling side-chain conformations on a fixed protein backbone has a wide application in structure prediction and molecular design. Each effort in this field requires decisions about a rotamer set, scoring function, and search strategy. We have developed a new and simple scoring function, which operates on side-chain rotamers and consists of the following energy terms: contact surface, volume overlap, backbone dependency, electrostatic interactions, and desolvation energy. The weights of these energy terms were optimized to achieve the minimal average root mean square (rms) deviation between the lowest energy rotamer and real side-chain conformation on a training set of high-resolution protein structures. In the course of optimization, for every residue, its side chain was replaced by varying rotamers, whereas conformations for all other residues were kept as they appeared in the crystal structure. We obtained prediction accuracy of 90.4% for chi(1), 78.3% for chi(1 + 2), and 1.18 A overall rms deviation. Furthermore, the derived scoring function combined with a Monte Carlo search algorithm was used to place all side chains onto a protein backbone simultaneously. The average prediction accuracy was 87.9% for chi(1), 73.2% for chi(1 + 2), and 1.34 A rms deviation for 30 protein structures. Our approach was compared with available side-chain construction methods and showed improvement over the best among them: 4.4% for chi(1), 4.7% for chi(1 + 2), and 0.21 A for rms deviation. We hypothesize that the scoring function instead of the search strategy is the main obstacle in side-chain modeling. Additionally, we show that a more detailed rotamer library is expected to increase chi(1 + 2) prediction accuracy but may have little effect on chi(1) prediction accuracy.  相似文献   

13.
Hsieh MJ  Luo R 《Proteins》2004,56(3):475-486
A well-behaved physics-based all-atom scoring function for protein structure prediction is analyzed with several widely used all-atom decoy sets. The scoring function, termed AMBER/Poisson-Boltzmann (PB), is based on a refined AMBER force field for intramolecular interactions and an efficient PB model for solvation interactions. Testing on the chosen decoy sets shows that the scoring function, which is designed to consider detailed chemical environments, is able to consistently discriminate all 62 native crystal structures after considering the heteroatom groups, disulfide bonds, and crystal packing effects that are not included in the decoy structures. When NMR structures are considered in the testing, the scoring function is able to discriminate 8 out of 10 targets. In the more challenging test of selecting near-native structures, the scoring function also performs very well: for the majority of the targets studied, the scoring function is able to select decoys that are close to the corresponding native structures as evaluated by ranking numbers and backbone Calpha root mean square deviations. Various important components of the scoring function are also studied to understand their discriminative contributions toward the rankings of native and near-native structures. It is found that neither the nonpolar solvation energy as modeled by the surface area model nor a higher protein dielectric constant improves its discriminative power. The terms remaining to be improved are related to 1-4 interactions. The most troublesome term is found to be the large and highly fluctuating 1-4 electrostatics term, not the dihedral-angle term. These data support ongoing efforts in the community to develop protein structure prediction methods with physics-based potentials that are competitive with knowledge-based potentials.  相似文献   

14.
MOTIVATION: We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p, value to each segmental score. Our techniques also permit the user to specify a p threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects. RESULTS: We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold. We use the score threshold to limit the number of segments that are retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they will possibly exceed the score threshold. In permuted lookahead scoring, we score each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques reduce substantially the number of residues that must be examined. The fraction of residues examined ranges from 62 to 6%, depending on the p threshold chosen by the user. These techniques permit sequence analysis with scoring matrices at speeds that are several times faster than existing programs. On a database of 12 177 alignment blocks, our techniques permit sequence analysis at a speed of 225 residues/s for a p threshold of 10-6, and 541 residues/s for a p threshold of 10-20. In order to compute the quantile function, we may use either an independence assumption or a Markov assumption. We measure the effect of first- and second-order Markov assumptions and find that they tend to raise the p value of segments, when compared with the independence assumption, by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99. 5th percentile scores compiled in the BLOCKSPLUS database, and find that they correspond on average to a p value of 1.5 x 10-5. AVAILABILITY: The techniques described above are implemented in a software package called EMATRIX. This package is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.  相似文献   

15.
16.
Protein folding, binding, catalytic activity and molecular recognition all involve molecular movements, with varying extents. The molecular movements are brought upon via flexible regions. Stemming from sequence, a fine tuning of electrostatic and hydrophobic properties of the protein fold determine flexible and rigid regions. Studies show flexible regions usually lack electrostatic interactions, such as salt-bridges and hydrogen-bonds, while the rigid regions often have larger number of such electrostatic interactions. Protein flexible regions are not simply an outcome of looser packing or instability, rather they are evolutionally selected. In this review article we highlight the significance of protein flexibilities in folding, binding and function, and their structural and thermodynamic determinants. Our electrostatic calculations and molecular dynamic simulations on an antibody-antigen complex further illustrate the importance of protein flexibilities in binding and function.  相似文献   

17.
We present a novel Monte Carlo simulation of protein folding, in which all heavy atoms are represented as interacting hard spheres. This model includes all degrees of freedom relevant to folding, all side-chain and backbone torsions, and uses a Go potential. In this study, we focus on the 46 residue alpha/beta protein crambin and two of its structural components, the helix and helix hairpin. For a wide range of temperatures, we recorded multiple folding events of these three structures from random coils to native conformations that differ by less than 1 A C(alpha) dRMS from their crystal structure coordinates. The thermodynamics and kinetic mechanism of the helix-coil transition obtained from our simulation shows excellent agreement with currently available experimental and molecular dynamics data. Based on insights obtained from folding its smaller structural components, a possible folding mechanism for crambin is proposed. We observed that the folding occurs via a cooperative, first order-like process, and that many folding pathways to the native state exist. One particular sequence of events constitutes a "fast-folding" pathway where kinetic traps are avoided. At very low temperatures, a kinetic trap arising from the incorrect packing of side-chains was observed. These results demonstrate that folding to the native state can be observed in a reasonable amount of time on desktop computers even when an all-atom representation is used, provided the energetics sufficiently stabilize the native state.  相似文献   

18.
简述了蛋白质结构预测中存在的问题,主要介绍了蛋白质分子力场的一种模型即联合残基力场,然后利用这种模型对一种多肽和葡萄球菌蛋白的部分段进行试验,使它们的能量不同程度的得到了最小化,其中前者的初始能量为-71.50461kcal/mol,最小化后为-73.32767kcal/mol.  相似文献   

19.
20.
Lin HN  Notredame C  Chang JM  Sung TY  Hsu WL 《PloS one》2011,6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently.In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experiment results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non- similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号