1.
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively. The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
2.
Fold assessment for comparative protein structure modeling (cited by: 1; self-citations: 0; citations by others: 1)
3.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.
4.
The protein structures of six comparative modeling targets were predicted in a procedure that relied on improved energy minimization, without empirical rules, to position all new atoms. The structures of human nucleoside diphosphate kinase NM23-H2, HPr from Mycoplasma capricolum, 2Fe-2S ferredoxin from Haloarcula marismortui, eosinophil-derived neurotoxin (EDN), mouse cellular retinoic acid protein I (CRABP1), and P450eryf were predicted with root mean square deviations on Cα atoms of 0.69, 0.73, 1.11, 1.48, 1.69, and 1.73 Å, respectively, compared to the target crystal structures. These differences increased as the sequence similarity between the target and parent proteins decreased from about 60 to 20% identity. More residues were predicted than form the common region shared by the two crystal structures. In most cases insertions or deletions between the target and the related protein of known structure were not correctly positioned. One two-residue insertion in CRABP1 was predicted in the correct conformation, while a nine-residue insertion in EDN was predicted in the correct spatial region, although not in the correct conformation. The positions of common cofactors and their binding sites were predicted correctly, even when overall sequence similarity was low. © 1995 Wiley-Liss, Inc.
5.
We developed a method for structure characterization of assembly components by iterative comparative protein structure modeling and fitting into cryo-electron microscopy (cryoEM) density maps. Specifically, we calculate a comparative model of a given component by considering many alternative alignments between the target sequence and a related template structure while optimizing the fit of a model into the corresponding density map. The method relies on the previously developed Moulder protocol that iterates over alignment, model building, and model assessment. The protocol was benchmarked using 20 varied target-template pairs of known structures with less than 30% sequence identity and corresponding simulated density maps at resolutions from 5 Å to 25 Å. Relative to the models based on the best existing sequence profile alignment methods, the percentage of Cα atoms that are within 5 Å of the corresponding Cα atoms in the superposed native structure increases on average from 52% to 66%, which is half-way between the starting models and the models from the best possible alignments (82%). The test also reveals that despite the improvements in the accuracy of the fitness function, this function is still the bottleneck in reducing the remaining errors. To demonstrate the usefulness of the protocol, we applied it to the upper domain of the P8 capsid protein of rice dwarf virus that has been studied by cryoEM at 6.8 Å. The Cα root-mean-square deviation of the model based on the remotely related template, bluetongue virus VP7, improved from 8.7 Å to 6.0 Å, while the best possible model has a Cα RMSD value of 5.3 Å. Moreover, the resulting model fits better into the cryoEM density map than the initial template structure. The method is being implemented in our program MODELLER for protein structure modeling by satisfaction of spatial restraints and will be applicable to the rapidly increasing number of cryoEM density maps of macromolecular assemblies.
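The accuracy measure quoted in this abstract (the percentage of Cα atoms within 5 Å of their native counterparts) can be illustrated with a minimal Python sketch. It assumes the model has already been superposed on the native structure and that both are given as equal-length lists of (x, y, z) coordinates in Å. The function name `percent_within_cutoff`, the inclusive `<=` comparison, and the toy coordinates are illustrative assumptions, not details taken from the paper.

```python
import math

def percent_within_cutoff(model_ca, native_ca, cutoff=5.0):
    """Percentage of model C-alpha atoms within `cutoff` angstroms of the
    corresponding native C-alpha atoms (structures already superposed)."""
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(
        1 for m, n in zip(model_ca, native_ca)
        if math.dist(m, n) <= cutoff  # inclusive cutoff is an assumption
    )
    return 100.0 * close / len(model_ca)

# Toy coordinates (hypothetical): per-atom distances are 0, 5, 4 and 6 angstroms.
model = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 6.0)]
native = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (10.0, 0.0, 4.0), (0.0, 0.0, 0.0)]
pct = percent_within_cutoff(model, native)
print(pct)
```

Because the measure counts atoms under a threshold rather than averaging squared deviations, a few badly placed atoms lower it less sharply than they would raise an RMSD.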
6.
Francisco Melo Roberto Sánchez Andrej Sali 《Protein science : a publication of the Protein Society》2002,11(2):430-448
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of a residue-level statistical potential were optimized, including distance-dependent, contact, Phi/Psi dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of a Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of any one of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both Cα and Cβ atoms as interaction centers, distinguishes between all 20 standard residue types, has the distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses Cβ atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation. The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for the very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for an optimal use of statistical potentials in fold assessment.
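The Z-score criterion mentioned in this abstract can be sketched in a few lines of Python. The abstract does not say how the reference energy distribution is built, so the reference energies below are purely hypothetical, as are the function name `energy_z_score` and the choice of the sample standard deviation.

```python
from statistics import mean, stdev

def energy_z_score(model_energy, reference_energies):
    """Z-score of a model energy relative to a reference energy distribution.
    A strongly negative value means the model scores much better than the reference."""
    mu = mean(reference_energies)
    sigma = stdev(reference_energies)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("reference energies have zero variance")
    return (model_energy - mu) / sigma

# Hypothetical energies for illustration only.
reference = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = energy_z_score(-16.0, reference)
print(round(z, 3))
```

Normalizing by the spread of the reference distribution is what makes scores comparable between models of different sizes, which matters for the size dependence the abstract reports.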
7.
Local quality assessment in homology models using statistical potentials and support vector machines
Fasnacht M Zhu J Honig B 《Protein science : a publication of the Protein Society》2007,16(8):1557-1568
In this study, we address the problem of local quality assessment in homology models. As a prerequisite for the evaluation of methods for predicting local model quality, we first examine the problem of measuring local structural similarities between a model and the corresponding native structure. Several local geometric similarity measures are evaluated. Two methods based on structural superposition are found to best reproduce local model quality assessments by human experts. We then examine the performance of state-of-the-art statistical potentials in predicting local model quality on three qualitatively distinct data sets. The best statistical potential, DFIRE, is shown to perform on par with the best current structure-based method in the literature, ProQres. A combination of different statistical potentials and structural features using support vector machines is shown to provide somewhat improved performance over published methods.
8.
In spite of the tremendous increase in the rate at which protein structures are being determined, there is still an enormous gap between the numbers of known DNA-derived sequences and the numbers of three-dimensional structures. In order to shed light on the biological functions of the molecules, researchers often resort to comparative molecular modeling. Earlier work has shown that when the sequence alignment is in error, then the comparative model is guaranteed to be wrong. In addition, loops, the sites of insertions and deletions in families of homologous proteins, are exceedingly difficult to model. Thus, many of the current problems in comparative molecular modeling are minor versions of the global protein folding problem. In order to assess objectively the current state of comparative molecular modeling, 13 groups submitted blind predictions of seven different proteins of undisclosed tertiary structure. This assessment shows that where sequence identity between the target and the template structure is high (> 70%), comparative molecular modeling is highly successful. On the other hand, automated modeling techniques and sophisticated energy minimization methods fail to improve upon the starting structures when the sequence identity is low (~30%). Based on these results it appears that insertions and deletions are still major problems. Successfully deducing the correct sequence alignment when the local similarity is low is still difficult. We suggest some minimal testing of submitted coordinates that should be required of authors before papers on comparative molecular modeling are accepted for publication in journals. © 1995 Wiley-Liss, Inc.
9.
Eramian D Shen MY Devos D Melo F Sali A Marti-Renom MA 《Protein science : a publication of the Protein Society》2006,15(7):1653-1666
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
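The ΔRMSD measure defined in this abstract, the RMSD of the model a score picks as best minus the lowest RMSD in the set, can be written as a short Python sketch. It assumes that lower scores are better, as for an energy, which need not hold for every score in the study. The function name `delta_rmsd` and the example numbers are hypothetical.

```python
def delta_rmsd(scores, rmsds):
    """RMSD of the best-scoring model minus the lowest RMSD in the model set."""
    if len(scores) != len(rmsds) or not scores:
        raise ValueError("need two non-empty lists of equal length")
    # Lower score is taken to mean a better model (assumption).
    picked = min(range(len(scores)), key=scores.__getitem__)
    return rmsds[picked] - min(rmsds)

# Hypothetical scores and RMSDs (angstroms) for three models of one target.
scores = [-5.0, -7.5, -6.0]
rmsds = [2.1, 3.0, 2.4]
d = delta_rmsd(scores, rmsds)
print(round(d, 2))
```

A value of zero means the score selected the most accurate model available; averaging this quantity over targets gives the kind of figure the abstract reports.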
10.
The use of classical molecular dynamics simulations, performed in explicit water, for the refinement of structural models of proteins generated ab initio or based on homology has been investigated. The study involved a test set of 15 proteins that were previously used by Baker and coworkers to assess the efficiency of the ROSETTA method for ab initio protein structure prediction. For each protein, four models generated using the ROSETTA procedure were simulated for periods of between 5 and 400 nsec in explicit solvent, under identical conditions. In addition, the experimentally determined structure and the experimentally derived structure in which the side chains of all residues had been deleted and then regenerated using the WHATIF program were simulated and used as controls. A significant improvement in the deviation of the model structures from the experimentally determined structures was observed in several cases. In addition, it was found that in certain cases in which the experimental structure deviated rapidly from the initial structure in the simulations, indicating internal strain, the structures were more stable after regenerating the side-chain positions. Overall, the results indicate that molecular dynamics simulations on a tens to hundreds of nanoseconds time scale are useful for the refinement of homology or ab initio models of small to medium-size proteins.
13.
Protein structure refinement from comparative models with the goal of predicting structures at near-experimental accuracy remains an unsolved problem. Structure refinement might be achieved with an iterative protocol where the most native-like structure from a set of decoys generated from an initial model in one cycle is used as the starting structure for the next cycle. Conformational sampling based on the coarse-grained SICHO model, atomic level of detail molecular dynamics simulations, and normal-mode analysis is compared in the context of such a protocol. All of the sampling methods can achieve significant refinement close to experimental structures, although the distribution of structures and the ability to reach native-like structures differ greatly. Implications for the practical application of such sampling methods and the requirements for scoring functions in an iterative refinement protocol are analyzed in the context of theoretical predictions for the distribution of protein-like conformations with a random sampling protocol.
14.
A set of grid-type knowledge‐based energy functions is introduced for φ–χ1, ψ–χ1, φ–ψ, and χ1–χ2 torsion angle combinations. A Boltzmann distribution is assumed for the torsion angle populations from protein X‐ray structures, and the functions are named statistical torsion angle potential energy functions. The grid points around periodic boundaries are duplicated to force periodicity, and the remedy relieves the derivative discontinuity problem. The devised functions rapidly improve the quality of model structures. The potential bias in the functions and the usefulness of additional secondary structure information are also investigated. The proposed guiding functions are expected to facilitate protein structure modeling, such as protein structure prediction, protein design, and structure refinement. Proteins 2013; 81:1156–1165. © 2013 Wiley Periodicals, Inc.
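The two ideas in this abstract, Boltzmann inversion of torsion-angle populations onto a grid and duplication of grid points at the periodic boundary, can be illustrated with a rough Python sketch. This is not the authors' implementation: the 30° bin width, the pseudocount, the uniform reference state, and the function names `torsion_potential` and `pad_periodic` are all assumptions chosen for illustration.

```python
import math

def torsion_potential(angle_pairs, bin_width=30.0, kT=1.0, pseudocount=1.0):
    """Boltzmann-invert a 2D histogram of torsion-angle pairs given in degrees.
    Returns a grid of energies E = -kT * ln(P / P_uniform)."""
    n = int(round(360.0 / bin_width))
    counts = [[pseudocount] * n for _ in range(n)]
    for a, b in angle_pairs:
        # Map angles onto [0, 360) so that -180 and +180 share a bin.
        i = int(((a + 180.0) % 360.0) // bin_width) % n
        j = int(((b + 180.0) % 360.0) // bin_width) % n
        counts[i][j] += 1.0
    total = sum(sum(row) for row in counts)
    uniform = 1.0 / (n * n)  # uniform reference state (assumption)
    return [[-kT * math.log((c / total) / uniform) for c in row] for row in counts]

def pad_periodic(grid):
    """Duplicate the first row and column at the far edge so that an
    interpolant over the padded grid matches across the periodic boundary."""
    padded = [row + [row[0]] for row in grid]
    padded.append(list(padded[0]))
    return padded

# Hypothetical observations: eight pairs in one region, two in another.
pairs = [(-60.0, -45.0)] * 8 + [(-120.0, 130.0)] * 2
grid = torsion_potential(pairs)
padded = pad_periodic(grid)
print(len(grid), len(padded))
```

Padding the grid lets a standard interpolation routine see the same values on both sides of the ±180° seam, which is one simple way to avoid a jump in the interpolated energy and its derivative there.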
15.
Modeling mutations in protein structures (cited by: 2; self-citations: 0; citations by others: 2)
Feyfant E Sali A Fiser A 《Protein science : a publication of the Protein Society》2007,16(9):2030-2041
We describe an automated method for the modeling of point mutations in protein structures. The protein is represented by all non-hydrogen atoms. The scoring function consists of several types of physical potential energy terms and homology-derived restraints. The optimization method implements a combination of conjugate gradient minimization and molecular dynamics with simulated annealing. The testing set consists of 717 pairs of known protein structures differing by a single mutation. Twelve variations of the scoring function were tested in three different environments of the mutated residue. The best-performing protocol optimizes all the atoms of the mutated residue, with respect to a scoring function that includes molecular mechanics energy terms for bond distances, angles, dihedral angles, peptide bond planarity, and non-bonded atomic contacts represented by a Lennard-Jones potential, dihedral angle restraints derived from the aligned homologous structure, and a statistical potential for non-bonded atomic interactions extracted from a large set of known protein structures. The current method compares favorably with other tested approaches, especially when predicting long and flexible side-chains. In addition to the thoroughness of the conformational search, sampled degrees of freedom, and the scoring function type, the accuracy of the method was also evaluated as a function of the flexibility of the mutated side-chain, the relative volume change of the mutated residue, and its residue type. The results suggest that further improvement is likely to be achieved by concentrating on the improvement of the scoring function, in addition to or instead of increasing the variety of sampled conformations.
16.
We present loop structure prediction results of the intracellular and extracellular loops of four G‐protein‐coupled receptors (GPCRs): bovine rhodopsin (bRh), the turkey β1‐adrenergic (β1Ar), the human β2‐adrenergic (β2Ar) and the human A2a adenosine receptor (A2Ar) in perturbed environments. We used the protein local optimization program, which builds thousands of loop candidates by sampling rotamer states of the loops' constituent amino acids. The candidate loops are discriminated using our physics‐based, all‐atom energy function, which is based on the OPLS force field with implicit solvent and several correction terms. For relevant cases, explicit membrane molecules are included to simulate the effect of the membrane on loop structure. We also discuss a new sampling algorithm that divides phase space into different regions, allowing more thorough sampling of long loops that greatly improves results. In the first half of the paper, loop prediction is done with the GPCRs' transmembrane domains fixed in their crystallographic positions, while the loops are built one‐by‐one. Side chains near the loops are also in non‐native conformations. The second half describes a full homology model of β2Ar using β1Ar as a template. No information about the crystal structure of β2Ar was used to build this homology model. We are able to capture the architecture of short loops and the very long second extracellular loop, which is key for ligand binding. We believe this is the first successful example of an RMSD‐validated, physics‐based loop prediction in the context of a GPCR homology model. Proteins 2013. © 2012 Wiley Periodicals, Inc.
17.
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter-domain structure prediction. The results demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template-free modeling performs better than the template-based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
19.
A novel method for the refinement of misfolded protein structures is proposed in which the properties of the solvent environment are oscillated in order to mimic some aspects of the role molecular chaperones play in protein folding in vivo. Specifically, the hydrophobicity of the solvent is cycled by repetitively altering the partial charges on solvent molecules (water) during a molecular dynamics simulation. During periods when the hydrophobicity of the solvent is increased, intramolecular hydrogen bonding and secondary structure formation are promoted. During periods of increased solvent polarity, poorly packed regions of secondary structures are destabilized, promoting structural rearrangement. By cycling between these two extremes, the aim is to minimize the formation of long-lived intermediates. The approach has been applied to the refinement of structural models of three proteins generated by using the ROSETTA procedure for ab initio structure prediction. A significant improvement in the deviation of the model structures from the corresponding experimental structures was observed. Although preliminary, the results indicate that computationally mimicking some functions of molecular chaperones in molecular dynamics simulations can promote the correct formation of secondary structure and thus be of general use in protein folding simulations and in the refinement of structural models of small- to medium-size proteins.