Similar articles
 20 similar articles found (search time: 15 ms)
1.
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Cα positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Cα trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. Decoy recognition tests on a number of commonly used decoy sets show that the new potential function outperforms all known Cα-based potentials and most other coarse-grained ones that require more information than Cα positions. We hope that this potential function adds a new tool for protein structural modeling.

2.
The DOcking decoy‐based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance‐dependent atom‐pair interactions. To optimize the atom‐pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motifs involving complete secondary structure elements. They constitute near native ligand–receptor systems (or just pairs). Thus, a total of 8609 ligand–receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand–receptor systems, 1000 evenly sampled docking decoys with 0–10 Å interface root‐mean‐square‐deviation (iRMSD) were generated with a method used before for protein–protein docking. A neural network‐based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel‐like energy landscape for the interaction between these hypothetical ligand–receptor systems. Thus, our method hierarchically models the overall funnel‐like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term which describes the local conformational preference, the atom‐pair‐based potential outperforms other reported statistical energy functions in correct ranking of native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take into account side chain orientation‐dependence explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand–receptor systems and their decoys are freely available at http://agknapp.chemie.fu-berlin.de/doop/ . Proteins 2015; 83:881–890. © 2015 Wiley Periodicals, Inc.

3.
The Medium-Chain Dehydrogenase/Reductase Engineering Database (MDRED, http://www.mdred.uni-stuttgart.de) has been established to serve as an analysis tool for a systematic investigation of sequence-structure-function relationships. It includes sequence information for 2684 medium-chain dehydrogenases/reductases (MDRs) and structure information for 42. Although MDRs are very diverse in sequence, they have a conserved tertiary structure. MDRs are assigned to 199 homologous families and 29 superfamilies. For each family, annotated multiple sequence alignments are provided, and functionally relevant residues are annotated. Twenty-five superfamilies were classified as zinc-containing MDRs, four as non-zinc-containing MDRs. For the zinc-containing MDRs, three subclasses were identified by systematic analysis of a variable loop region, the quaternary structure determining loop (QSDL): the class of short, medium, and long QSDL, which include 11, 3, and 5 superfamilies, respectively. The length of the QSDL is predictive of tetramer (short QSDL) and dimer (long QSDL) formation. The class of medium QSDL includes both tetrameric and dimeric MDRs. The shape of the substrate-binding site is highly conserved in all zinc-containing MDRs with the exception of two variable regions, the substrate recognition sites (SRS): two residues located on the QSDL (SRS1) and, for the class of long QSDL, one residue located in the catalytic domain (SRS2). The MDRED is the first online-accessible resource of MDRs that integrates information on sequence, structure, and function. Annotation of functionally relevant residues assists the understanding of sequence-structure-function relationships. Thus, the MDRED serves as a valuable tool to identify potential hotspots for engineering properties such as substrate specificity.

4.
We present a new four‐body knowledge‐based potential for recognizing the native state of proteins from their misfolded states. This potential was extracted from a large set of protein structures determined by X‐ray crystallography using BetaMol, software based on the recent theory of the beta‐complex (β‐complex) and quasi‐triangulation of the Voronoi diagram of spheres. This geometric construct reflects the size difference among atoms in their full Euclidean metric, a property not accounted for in a typical 3D Delaunay triangulation. The ability of this potential to identify the native conformation over a large set of decoys was evaluated. Experiments show that this potential outperforms a potential constructed with a classical Delaunay triangulation in decoy discrimination tests. The addition of a statistical hydrogen bond potential to our four‐body potential allows a significant improvement in the decoy discrimination, in such a way that we are able to predict successfully the native structure in 90% of cases. Proteins 2013; 81:1420–1433. © 2013 Wiley Periodicals, Inc.

5.
Hyungrae Kim, Daisuke Kihara. Proteins 2014, 82(12): 3255-3272
We developed a new representation of local amino acid environments in protein structures called the Side‐chain Depth Environment (SDE). An SDE defines a local structural environment of a residue considering the coordinates and the depth of amino acids located in the vicinity of the side‐chain centroid of the residue. SDEs are general enough that similar SDEs are found in protein structures with globally different folds. Using SDEs, we developed a procedure called PRESCO (Protein Residue Environment SCOre) for selecting native or near‐native models from a pool of computational models. The procedure searches similar residue environments observed in a query model against a set of representative native protein structures to quantify how native‐like SDEs in the model are. When benchmarked on commonly used computational model datasets, our PRESCO compared favorably with the other existing scoring functions in selecting native and near‐native models. Proteins 2014; 82:3255–3272. © 2014 Wiley Periodicals, Inc.

6.
We introduce a side‐chain‐inclusive scoring function, named OPUS‐SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS‐CSF [Xu et al., Protein Sci. 2018; 27: 286–292], which exclusively uses main chain information, OPUS‐SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS‐SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS‐CSF, OPUS‐SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS‐SSF has achieved a significant improvement in decoy recognition and it should be a very useful tool for protein structural prediction and modeling.

7.
8.
Statistical potential for assessment and prediction of protein structures   Cited by: 2 (self-citations: 0; citations by others: 2)
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
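The general idea of a distance-dependent statistical potential can be sketched as follows. This is a minimal illustration, assuming the textbook log-odds form E(r) = -kT ln(p_obs(r) / p_ref(r)) and a simple uniform-density reference state in which the expected pair fraction scales with shell volume. It is not DOPE's finite-sphere reference state, and the function name, `kT` default, and pseudocount are hypothetical choices made for the example.

```python
import math


def distance_potential(observed_counts, bin_width, kT=1.0, pseudocount=1e-6):
    """Return one energy per distance bin from observed atom-pair counts.

    Assumption (not DOPE's reference state): pairs are uniformly
    distributed in space, so the expected fraction in bin i is
    proportional to the shell volume between i*w and (i+1)*w.
    """
    total_obs = sum(observed_counts)
    if total_obs <= 0:
        raise ValueError("need at least one observed pair")

    # Shell volume up to a constant factor (4/3 * pi cancels in the ratio).
    shells = [((i + 1) * bin_width) ** 3 - (i * bin_width) ** 3
              for i in range(len(observed_counts))]
    total_shell = sum(shells)

    energies = []
    for count, shell in zip(observed_counts, shells):
        p_obs = count / total_obs + pseudocount  # avoid log(0) in empty bins
        p_ref = shell / total_shell
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins where pairs are seen more often than the reference state expects get negative (favorable) energies, and depleted bins get positive ones. Scoring a model would then amount to summing these per-bin energies over all of its atom pairs.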

9.
Coarse‐grained models for protein structure are increasingly used in simulations and structural bioinformatics. In this study, we evaluated the effectiveness of three granularities of protein representation based on their ability to discriminate between correctly folded native structures and incorrectly folded decoy structures. The three levels of representation used one bead per amino acid (coarse), two beads per amino acid (medium), and all atoms (fine). Multiple structure features were compared at each representation level including two‐body interactions, three‐body interactions, solvent exposure, contact numbers, and angle bending. In most cases, the all‐atom level was most successful at discriminating decoys, but the two‐bead level provided a good compromise between the number of model parameters which must be estimated and the accuracy achieved. The most effective feature type appeared to be two‐body interactions. Considering three‐body interactions increased accuracy only marginally when all atoms were used and not at all in medium and coarse representations. Though two‐body interactions were most effective for the coarse representations, the accuracy loss for using only solvent exposure or contact number was proportionally less at these levels than in the all‐atom representation. We propose an optimization method capable of selecting bead types of different granularities to create a mixed representation of the protein. We illustrate its behavior on decoy discrimination and discuss implications for data‐driven protein model selection. Proteins 2013. © 2012 Wiley Periodicals, Inc.

10.
Ishida T, Nakamura S, Shimizu K. Proteins 2006, 64(4): 940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined as the number of residues around a given residue. First, the contact number of each residue is predicted using SVR from the amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures, both in distinguishing the native structure from other structures and in distinguishing near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.
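The contact number defined above can be computed directly from coordinates. The sketch below assumes one representative point per residue (for example its Cα atom) and an 8 Å cutoff; the abstract specifies neither, so both are illustrative assumptions, as is the function name.

```python
import math


def contact_numbers(coords, cutoff=8.0):
    """Count, for each residue, how many other residues lie within `cutoff`.

    `coords` is a list of (x, y, z) tuples, one representative point per
    residue. The 8.0 angstrom default is an assumed, commonly used value.
    """
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # A contact is symmetric, so it counts for both residues.
                counts[i] += 1
                counts[j] += 1
    return counts
```

For example, `contact_numbers([(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (30, 0, 0)])` gives three mutually contacting residues and one isolated residue. The double loop is quadratic in chain length, which is adequate for single proteins; a spatial grid would be the usual choice for larger systems.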

11.
Misura KM, Baker D. Proteins 2005, 59(1): 15-29
Achieving atomic level accuracy in de novo structure prediction presents a formidable challenge even in the context of protein models with correct topologies. High-resolution refinement is a fundamental test of force field accuracy and sampling methodology, and its limited success in both comparative modeling and de novo prediction contexts highlights the limitations of current approaches. We constructed four tests to identify bottlenecks in our current approach and to guide progress in this challenging area. The first three tests showed that idealized native structures are stable under our refinement simulation conditions and that the refinement protocol can significantly decrease the root mean square deviation (RMSD) of perturbed native structures. In the fourth test we applied the refinement protocol to de novo models and showed that accurate models could be identified based on their energies, and in several cases many of the buried side chains adopted native-like conformations. We also showed that the differences in backbone and side-chain conformations between the refined de novo models and the native structures are largely localized to loop regions and regions where the native structure has unusual features such as rare rotamers or atypical hydrogen bonding between beta-strands. The refined de novo models typically have higher energies than refined idealized native structures, indicating that sampling of local backbone conformations and side-chain packing arrangements in a condensed state is a primary obstacle.

12.
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three‐dimensional context in proteins. We have constructed a knowledge‐based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon‐alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α‐carbon virtual bond opening and dihedral angles, pair‐wise contacts and hydrogen bond donor‐acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α‐carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full‐length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. Proteins 2009. © 2008 Wiley‐Liss, Inc.

13.
Fujitsuka Y, Chikenji G, Takada S. Proteins 2006, 62(2): 381-398
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, the proteins predicted well by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins on which one of the two methods succeeded and the other failed in prediction, the former had a less scattered ensemble located around the native. For proteins on which both methods succeeded in prediction, the two ensembles were often intermixed.

14.
Evaluation of protein models against the native structure is essential for the development and benchmarking of protein structure prediction methods. Although a number of evaluation scores have been proposed to date, many aspects of model assessment still lack desired robustness. In this study we present CAD‐score, a new evaluation function quantifying differences between physical contacts in a model and the reference structure. The new score uses the concept of residue–residue contact area difference (CAD) introduced by Abagyan and Totrov (J Mol Biol 1997; 268:678–685). Contact areas, the underlying basis of the score, are derived using the Voronoi tessellation of protein structure. The newly introduced CAD‐score is a continuous function, confined within fixed limits, free of any arbitrary thresholds or parameters. The built‐in logic for treatment of missing residues allows consistent ranking of models of any degree of completeness. We tested CAD‐score on a large set of diverse models and compared it to GDT‐TS, a widely accepted measure of model accuracy. Similarly to GDT‐TS, CAD‐score showed a robust performance on single‐domain proteins, but displayed a stronger preference for physically more realistic models. Unlike GDT‐TS, the new score revealed a balanced assessment of domain rearrangement, removing the necessity for different treatment of single‐domain, multi‐domain, and multi‐subunit structures. Moreover, CAD‐score makes it possible to assess the accuracy of inter‐domain or inter‐subunit interfaces directly. In addition, the approach offers an alternative to the superposition‐based model clustering. The CAD‐score implementation is available both as a web server and a standalone software package at http://www.ibt.lt/bioinformatics/cad-score/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.
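A contact-area-difference score of this kind can be sketched as below. The abstract does not give the formula, so this example assumes a bounded-difference form: each residue pair's area difference is capped at its reference area, and the score is one minus the summed differences over the summed reference areas. The contact areas themselves are taken as given inputs (the Voronoi tessellation step is not reproduced), and the function name and dictionary layout are hypothetical.

```python
def cad_like_score(target_areas, model_areas):
    """Score a model against a reference from residue-pair contact areas.

    Both arguments map a residue pair, e.g. ("A12", "A47"), to a contact
    area. Only pairs in contact in the reference are considered, and each
    pair's difference is capped at the reference area so the result stays
    within [0, 1]. Assumed form, not taken from the abstract.
    """
    total_area = sum(target_areas.values())
    if total_area <= 0:
        raise ValueError("reference structure has no contacts")

    bounded_diff = 0.0
    for pair, t_area in target_areas.items():
        m_area = model_areas.get(pair, 0.0)  # contact missing from the model
        bounded_diff += min(abs(t_area - m_area), t_area)
    return 1.0 - bounded_diff / total_area
```

A model that reproduces every reference contact area scores 1.0 and a model with none of them scores 0.0, with no distance threshold or superposition involved. Residues missing from the model simply contribute their full reference area to the difference.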

15.
In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge‐based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue‐residue or atom‐atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation‐based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single‐model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa . Proteins 2017; 85:1131–1145. © 2017 Wiley Periodicals, Inc.

16.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score which estimates how accurate the predictions are without knowledge of the experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any proteins as templates, or to exclude any template proteins during the structure assembly simulations. The structural information could be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.

17.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8 percentage points (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
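The two quantities named in this abstract can be illustrated with a short sketch. It assumes that Q16 is the fraction of positions whose predicted Protein Block matches the true one, and that Neq is the exponential of the Shannon entropy of the 16 predicted PB probabilities at a position; the abstract calls Neq an entropy index without giving the formula, so that form is an assumption, and the function names are hypothetical.

```python
import math


def q16(predicted_pbs, true_pbs):
    """Fraction of positions where the predicted Protein Block is correct."""
    if len(predicted_pbs) != len(true_pbs) or not true_pbs:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted_pbs, true_pbs))
    return matches / len(true_pbs)


def neq(pb_probabilities):
    """Equivalent number of states: exp of the Shannon entropy.

    Assumed form: Neq = exp(-sum(p * ln p)). Zero probabilities contribute
    nothing, following the usual convention 0 * ln 0 = 0.
    """
    entropy = -sum(p * math.log(p) for p in pb_probabilities if p > 0)
    return math.exp(entropy)
```

Under this form Neq ranges from 1, when a single block is predicted with certainty, to 16, when all blocks are equally likely, so low values flag positions where the prediction is confident.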

18.
Using information‐theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information‐based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi‐chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information‐theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. The results also suggest using only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence. Proteins 2010. © 2009 Wiley‐Liss, Inc.
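How a divergence can quantify discrimination between a native contact distribution and a reference state can be shown with a small example. The abstract does not define its "total divergence", so the sketch below uses the Kullback-Leibler divergence as a stand-in; that choice, the function name, and the toy distributions are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    `p` could hold native contact-type probabilities and `q` the
    reference-state probabilities over the same contact types. Returns
    infinity if q assigns zero probability to an event that p allows.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total
```

The divergence is zero when the reference state already matches the native statistics, meaning a potential built from their ratio carries no discriminating signal, and it grows as the two distributions separate.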

19.
QMEAN: A comprehensive scoring function for model quality assessment   Cited by: 3 (self-citations: 0; citations by others: 3)

20.