Modeling of loops in protein structures   (total citations: 27; self-citations: 0; citations by others: 27)
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively. The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
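The abstract above selects the lowest-energy conformation from many independent optimizations and uses the spread among low-energy predictions as an accuracy estimate. The following is only a minimal sketch of that idea, not the authors' implementation: the function names `rmsd` and `pick_and_spread` are hypothetical, the coordinates are assumed to be already expressed in a common frame (the local superposition step is omitted), and the mean pairwise RMSD among the `k` lowest-energy models is an assumed stand-in for "structural variability".

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    No superposition is performed; both sets must share a frame.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def pick_and_spread(models, k=3):
    """models: list of (energy, coords) pairs (hypothetical input format).

    Returns the coordinates of the lowest-energy model and the mean
    pairwise RMSD among the k lowest-energy models.
    """
    ranked = sorted(models, key=lambda m: m[0])
    best = ranked[0][1]
    low = [coords for _, coords in ranked[:k]]
    pairs = list(combinations(low, 2))
    spread = sum(rmsd(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return best, spread
```

A small spread would suggest that independent runs converge on similar conformations; how that maps to an expected error is something the paper calibrates and this sketch does not.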

This paper provides an unbiased comparison of four commercially available programs for loop sampling, Prime, Modeler, ICM, and Sybyl, each of which uses a different modeling protocol. The study assesses the quality of results and examines the relative strengths and weaknesses of each method. The set of loops to be modeled varied in length from 4 to 12 amino acids. The approaches used for loop modeling can be classified into two methodologies: ab initio loop generation (Modeler and Prime) and database searches (Sybyl and ICM). Comparison of the modeled loops to the native structures was used to determine the accuracy of each method. All of the protocols returned similar results for short loop lengths (four to six residues), but as loop length increased, the quality of the results varied among the programs. Prime generated loops with RMSDs <2.5 Å for loops up to 10 residues, while the other three methods met the 2.5 Å criterion at seven-residue loops. Additionally, the ability of the software to utilize disulfide bonds and X-ray crystal packing influenced the quality of the results. In the final analysis, the top-ranking loop from each program was rarely the loop with the lowest RMSD with respect to the native template, revealing a weakness of all programs in correctly ranking the modeled loops.

In protein structure prediction, a central problem is defining the structure of a loop connecting two secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made by removing homologous loops using the SCOP superfamily definition and then predicting against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio prediction. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
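The "top 20 classes out of 451" figures above are top-k accuracies: a prediction counts as correct when the true class appears among the k highest-ranked candidates. A minimal sketch of that metric follows; the function name and the input layout (one ranked list of class labels per loop) are assumptions made for illustration, not the format used by ArchDB.

```python
def top_k_accuracy(ranked_predictions, true_classes, k=20):
    """Fraction of loops whose true class is among the first k ranked labels.

    ranked_predictions: list of lists, best-scoring class first (assumed layout).
    true_classes: list of the correct class label for each loop.
    """
    if len(ranked_predictions) != len(true_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_classes)
        if truth in ranked[:k]
    )
    return hits / len(true_classes)
```

With k = 20 and 451 classes, a random ranking would succeed about 4% of the time, which gives a sense of scale for the reported 78.5%.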

Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment‐based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ~25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root‐mean‐square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments .

Stumpff-Kane AW, Maksimiak K, Lee MS, Feig M. Proteins 2008, 70(4):1345-1356
Protein structure refinement from comparative models with the goal of predicting structures at near-experimental accuracy remains an unsolved problem. Structure refinement might be achieved with an iterative protocol where the most native-like structure from a set of decoys generated from an initial model in one cycle is used as the starting structure for the next cycle. Conformational sampling based on the coarse-grained SICHO model, atomic level of detail molecular dynamics simulations, and normal-mode analysis is compared in the context of such a protocol. All of the sampling methods can achieve significant refinement close to experimental structures, although the distribution of structures and the ability to reach native-like structures differ greatly. Implications for the practical application of such sampling methods and the requirements for scoring functions in an iterative refinement protocol are analyzed in the context of theoretical predictions for the distribution of protein-like conformations with a random sampling protocol.

Current methods for antibody structure prediction rely on sequence homology to known structures. Although this strategy often yields accurate predictions, models can be stereo‐chemically strained. Here, we present a fully automated algorithm, called AbPredict, that disregards sequence homology, and instead uses a Monte Carlo search for low‐energy conformations built from backbone segments and rigid‐body orientations that appear in antibody molecular structures. We find cases where AbPredict selects accurate loop templates with sequence identity as low as 10%, whereas the template of highest sequence identity diverges substantially from the query's conformation. Accordingly, in several cases reported in the recent Antibody Modeling Assessment benchmark, AbPredict models were more accurate than those from any participant, and the models' stereo‐chemical quality was consistently high. Furthermore, in two blind cases provided to us by crystallographers prior to structure determination, the method achieved <1.5 Ångstrom overall backbone accuracy. Accurate modeling of unstrained antibody structures will enable design and engineering of improved binders for biomedical research directly from sequence. Proteins 2016; 85:30–38. © 2016 Wiley Periodicals, Inc.

Five models have been built by the ICM method for the Comparative Modeling section of the Meeting on the Critical Assessment of Techniques for Protein Structure Prediction. The targets have homologous proteins with known three-dimensional structure with sequence identity ranging from 25 to 77%. After alignment of the target sequence with the related three-dimensional structure, the modeling procedure consists of two subproblems: side-chain prediction and loop prediction. The ICM method approaches these problems with the following steps: (1) a starting model is created based on the homologous structure with the conserved portion fixed and the nonconserved portion having standard covalent geometry and free torsion angles; (2) the Biased Probability Monte Carlo (BPMC) procedure is applied to search the subspaces of either all the nonconservative side-chain torsion angles or torsion angles in a loop backbone and surrounding side chains. A special algorithm was designed to generate low-energy loop deformations. The BPMC procedure globally optimizes the energy function consisting of ECEPP/3 and solvation energy terms. Comparison of the predictions with the NMR or crystallographic solutions reveals a high proportion of correctly predicted side chains. The loops were not correctly predicted because imprinted distortions of the backbone increased the energy of the near-native conformation and thus made the solution unrecognizable. Interestingly, the energy terms were found to be reliable and the sampling of conformational space sufficient. The implications of this finding for the strategies of future comparative modeling are discussed. © 1995 Wiley-Liss, Inc.

The protein structures of six comparative modeling targets were predicted in a procedure that relied on improved energy minimization, without empirical rules, to position all new atoms. The structures of human nucleoside diphosphate kinase NM23-H2, HPr from Mycoplasma capricolum, 2Fe-2S ferredoxin from Haloarcula marismortui, eosinophil-derived neurotoxin (EDN), mouse cellular retinoic acid protein I (CRABP1), and P450eryf were predicted with root mean square deviations on Cα atoms of 0.69, 0.73, 1.11, 1.48, 1.69, and 1.73 Å, respectively, compared to the target crystal structures. These differences increased as the sequence similarity between the target and parent proteins decreased from about 60 to 20% identity. More residues were predicted than form the common region shared by the two crystal structures. In most cases insertions or deletions between the target and the related protein of known structure were not correctly positioned. One two residue insertion in CRABP1 was predicted in the correct conformation, while a nine residue insertion in EDN was predicted in the correct spatial region, although not in the correct conformation. The positions of common cofactors and their binding sites were predicted correctly, even when overall sequence similarity was low. © 1995 Wiley-Liss, Inc.

Modeling of protein loops by simulated annealing.   (total citations: 1; self-citations: 5; citations by others: 1)
A method is presented to model protein loops for use in homology modeling of proteins. This method employs the ESAP program of Higo et al. (Higo, J., Collura, V., & Garnier, J., 1992, Biopolymers 32, 33-43) and is based on a fast Monte Carlo simulation and a simulated annealing algorithm. The method is tested on different loops or peptide segments from immunoglobulin, bovine pancreatic trypsin inhibitor, and bovine trypsin. The predicted structure is obtained from the ensemble average of the coordinates of the Monte Carlo simulation at 300 K, which exhibits the lowest internal energy. The starting conformation of the loop prior to modeling is chosen to be completely extended, and a closing harmonic potential is applied to N, CA, C, and O atoms of the terminal residues. A rigid geometry potential of Robson and Platt (1986, J. Mol. Biol. 188, 259-281) with a united atom representation is used. We demonstrate that this yields a loop structure with good hydrogen bonding and torsion angles in the allowed regions of the Ramachandran map. The average accuracy of the modeling evaluated on the eight modeled loops is 1 Å root mean square deviation (rmsd) for the backbone atoms and 2.3 Å rmsd for all heavy atoms.
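The combination of Monte Carlo moves and simulated annealing described above can be illustrated with a generic Metropolis loop. This is a toy sketch on a one-dimensional energy function, not the ESAP program: the function names, the geometric cooling schedule, the step size, and the reduced temperature units (Boltzmann constant taken as 1) are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, x0, t_start=5.0, t_end=0.05, steps=2000, step_size=0.5, seed=0):
    """Minimize a 1-D energy function by simulated annealing.

    The temperature decays geometrically from t_start to t_end
    (an assumed schedule). Returns the best point seen and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e, t, rng):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

In a loop-modeling setting the scalar `x` would be replaced by the loop's torsion angles or coordinates and `energy` by the force field plus the closing harmonic potential; the acceptance rule itself is unchanged.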

We developed a method for structure characterization of assembly components by iterative comparative protein structure modeling and fitting into cryo-electron microscopy (cryoEM) density maps. Specifically, we calculate a comparative model of a given component by considering many alternative alignments between the target sequence and a related template structure while optimizing the fit of a model into the corresponding density map. The method relies on the previously developed Moulder protocol that iterates over alignment, model building, and model assessment. The protocol was benchmarked using 20 varied target-template pairs of known structures with less than 30% sequence identity and corresponding simulated density maps at resolutions from 5 Å to 25 Å. Relative to the models based on the best existing sequence profile alignment methods, the percentage of Cα atoms that are within 5 Å of the corresponding Cα atoms in the superposed native structure increases on average from 52% to 66%, which is half-way between the starting models and the models from the best possible alignments (82%). The test also reveals that despite the improvements in the accuracy of the fitness function, this function is still the bottleneck in reducing the remaining errors. To demonstrate the usefulness of the protocol, we applied it to the upper domain of the P8 capsid protein of rice dwarf virus that has been studied by cryoEM at 6.8 Å. The Cα root-mean-square deviation of the model based on the remotely related template, bluetongue virus VP7, improved from 8.7 Å to 6.0 Å, while the best possible model has a Cα RMSD value of 5.3 Å. Moreover, the resulting model fits better into the cryoEM density map than the initial template structure. The method is being implemented in our program MODELLER for protein structure modeling by satisfaction of spatial restraints and will be applicable to the rapidly increasing number of cryoEM density maps of macromolecular assemblies.
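The accuracy measure quoted in the abstract above, the percentage of Cα atoms within 5 Å of their counterparts in the superposed native structure, is simple to compute once the two structures share a frame. The sketch below assumes the superposition has already been done and that the two coordinate lists are in matching residue order; the function name and the choice to count a distance of exactly 5 Å as "within" are assumptions.

```python
import math


def fraction_within(model_ca, native_ca, cutoff=5.0):
    """Fraction of C-alpha atoms whose model position lies within `cutoff`
    (in the same units as the coordinates) of the native position.

    Both inputs are equal-length lists of (x, y, z) tuples, already superposed.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    close = sum(
        1 for m, n in zip(model_ca, native_ca) if math.dist(m, n) <= cutoff
    )
    return close / len(model_ca)
```

Multiplying the result by 100 gives a percentage comparable in form to the 52%, 66%, and 82% values reported.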

Comparative or homology modeling of a target protein based on sequence similarity to a protein with known structure is widely used to provide structural models of proteins. Depending on the target‐template similarity these model structures may contain regions of limited structural accuracy. In principle, molecular dynamics (MD) simulations can be used to refine protein model structures and also to model loop regions that connect structurally conserved regions, but this approach is limited by the currently accessible simulation time scales. A recently developed biasing potential replica exchange (BP‐REMD) method was used to refine loops and complete decoy protein structures at atomic resolution including explicit solvent. In standard temperature REMD (T‐REMD) simulations several replicas of a system are run in parallel at different temperatures allowing exchanges at preset time intervals. In a BP‐REMD simulation replicas are controlled by various levels of a biasing potential to reduce the energy barriers associated with peptide backbone dihedral transitions. The method requires far fewer replicas for efficient sampling compared with T‐REMD. Application of the approach to several protein loops indicated improved conformational sampling of backbone dihedral angles of loop residues compared to conventional MD simulations. BP‐REMD refinement simulations on several test cases starting from decoy structures deviating significantly from the native structure resulted in final structures in much closer agreement with experiment compared to conventional MD simulations. Proteins 2010. © 2010 Wiley‐Liss, Inc.
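For the standard temperature replica exchange mentioned above, an exchange between two replicas is usually accepted with the Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The abstract does not state this formula, so the sketch below should be read as the textbook criterion for temperature REMD rather than the paper's BP‐REMD scheme, in which replicas differ in biasing potential instead of temperature. The function name and the kcal/mol units are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Acceptance probability for exchanging configurations between
    replica i (energy e_i, temperature t_i) and replica j.

    Energies in kcal/mol, temperatures in kelvin.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

The swap is always accepted when the colder replica currently holds the higher-energy configuration, which is how low-energy structures found at high temperature migrate down the temperature ladder.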

We present an automated method for modeling backbones of protein loops. The method samples a database of phi(i+1) and psi(i) angles constructed from a nonredundant version of the Protein Data Bank (PDB). The dihedral angles phi(i+1) and psi(i) completely define the backbone conformation of a dimer when standard bond lengths, bond angles, and a trans planar peptide configuration are used. For the 400 possible dimers resulting from 20 natural amino acids, a list of allowed phi(i+1), psi(i) pairs for each dimer is created by pooling all such pairs from the loop segments of each protein in the nonredundant version of the PDB. Starting from the N-terminus of the loop sequence, conformations are generated by assigning randomly selected pairs of phi(i+1), psi(i) for each dimer from the respective pool using standard bond lengths, bond angles, and a trans peptide configuration. We use this database to simulate protein loops of lengths varying from 5 to 11 amino acids in five proteins of known three-dimensional structures. Typically, 10,000-50,000 models are simulated for each protein loop and are evaluated for stereochemical consistency. Depending on the length and sequence of a given loop, 50-80% of the models generated have no stereochemical strain in the backbone atoms. We demonstrate that, when simulated loops are extended to include flanking residues from homologous segments, only very few loops from an ensemble of sterically allowed conformations orient the flanking segments consistent with the protein topology. The presence of near-native backbone conformations for loops from five different proteins suggests the completeness of the dimeric database for use in modeling loops of homologous proteins. Here, we take advantage of this observation to design a method that filters near-native loop conformations from an ensemble of sterically allowed conformations. We demonstrate that our method eliminates the need for a loop-closure algorithm and hence allows for the use of topological constraints of the homologous proteins or disulfide constraints to filter near-native loop conformations.
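The sampling step described above, drawing one (phi(i+1), psi(i)) pair per consecutive residue pair from a dimer-specific pool, can be sketched as follows. The pools shown are invented toy values for two dimers only; real pools would be compiled from loop segments of a nonredundant PDB set, and the conversion of the sampled angles into Cartesian backbone coordinates is not shown. The names `sample_loop_dihedrals` and `TOY_POOLS` are hypothetical.

```python
import random


def sample_loop_dihedrals(sequence, pools, rng):
    """Walk the loop sequence from the N-terminus and pick, for each
    consecutive residue pair (dimer), one random (phi_{i+1}, psi_i) pair
    from that dimer's pool.

    sequence: one-letter amino acid string.
    pools: dict mapping two-letter dimer strings to lists of angle pairs.
    """
    pairs = []
    for i in range(len(sequence) - 1):
        dimer = sequence[i:i + 2]
        pairs.append(rng.choice(pools[dimer]))
    return pairs


# Hypothetical toy pools (angles in degrees), for illustration only.
TOY_POOLS = {
    "GA": [(-60.0, 140.0), (-75.0, 150.0)],
    "AS": [(-120.0, 130.0)],
}
```

Calling `sample_loop_dihedrals("GAS", TOY_POOLS, random.Random(0))` returns two angle pairs, one per dimer; repeating the call many times would build the ensemble that the method then filters for stereochemical consistency.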

Loops are regions of nonrepetitive conformation connecting regular secondary structures. We identified 2,024 loops of one to eight residues in length, with acceptable main-chain bond lengths and peptide bond angles, from a database of 223 protein and protein-domain structures. Each loop is characterized by its sequence, main-chain conformation, and relative disposition of its bounding secondary structures as described by the separation between the tips of their axes and the angle between them. Loops, grouped according to their length and type of their bounding secondary structures, were superposed and clustered into 161 conformational classes, corresponding to 63% of all loops. Of these, 109 (51% of the loops) were populated by at least four nonhomologous loops or four loops sharing a low sequence identity. Another 52 classes, including 12% of the loops, were populated by at least three loops of low sequence similarity from three or fewer nonhomologous groups. Loop class suprafamilies resulting from variations in the termini of secondary structures are discussed in this article. Most previously described loop conformations were found among the classes. New classes included a 2:4 type IV hairpin, a helix-capping loop, and a loop that mediates dinucleotide-binding. The relative disposition of bounding secondary structures varies among loop classes, with some classes such as beta-hairpins being very restrictive. For each class, sequence preferences as key residues were identified; the residues found more frequently at these conserved positions than in proteins overall were Gly, Asp, Pro, Phe, and Cys. Most of these residues are involved in stabilizing loop conformation, often through a positive phi conformation or secondary structure capping. Identification of helix-capping residues and beta-breakers among the highly conserved positions supported our decision to group loops according to their bounding secondary structures. Several of the identified loop classes were associated with specific functions, and all of the member loops had the same function; key residues were conserved for this purpose, as is the case for the parvalbumin-like calcium-binding loops. A significant number, but not all, of the member loops of other loop classes had the same function, as is the case for the helix-turn-helix DNA-binding loops. This article provides a systematic and coherent conformational classification of loops, covering a broad range of lengths and all four combinations of bounding secondary structure types, and supplies a useful basis for modelling of loop conformations where the bounding secondary structures are known or reliably predicted.

Gao C, Stern HA. Proteins 2007, 68(1):67-75
We perform a systematic examination of the ability of several different high-resolution, atomic-detail scoring functions to discriminate native conformations of loops in membrane proteins from non-native but physically reasonable, or "decoy," conformations. Decoys constructed from changing a loop conformation while keeping the remainder of the protein fixed are a challenging test of energy function accuracy. Nevertheless, the best of the energy functions we examined recognized the native structure as lowest in energy around half the time, and consistently chose it as a low-energy structure. This suggests that the best of present energy functions, even without a representation of the lipid bilayer, are of sufficient accuracy to give reasonable confidence in predictions of membrane protein structure. We also constructed homology models for each structure, using other known structures in the same protein family as templates. Homology models were constructed using several scoring functions and modeling programs, but with a comparable sampling effort for each procedure. Our results indicate that the quality of sequence alignment is probably the most important factor in model accuracy for sequence identity from 20 to 40%; one can expect a reasonably accurate model for membrane proteins when sequence identity is greater than 30%, in agreement with previous studies. Most errors are localized in loop regions, which tend to be found outside the lipid bilayer. For the most discriminative energy functions, it appears that errors are most likely due to lack of sufficient sampling, although it should be stressed that present energy functions are still far from perfectly reliable.

Achieving atomic-level accuracy in comparative protein models is limited by our ability to refine the initial, homolog-derived model closer to the native state. Despite considerable effort, progress in developing a generalized refinement method has been limited. In contrast, methods have been described that can accurately reconstruct loop conformations in native protein structures. We hypothesize that loop refinement in homology models is much more difficult than loop reconstruction in crystal structures, in part, because side-chain, backbone, and other structural inaccuracies surrounding the loop create a challenging sampling problem; the loop cannot be refined without simultaneously refining adjacent portions. In this work, we single out one sampling issue in an artificial but useful test set and examine how loop refinement accuracy is affected by errors in surrounding side-chains. In 80 high-resolution crystal structures, we first perturbed 6-12 residue loops away from the crystal conformation, and placed all protein side chains in non-native but low energy conformations. Even these relatively small perturbations in the surroundings made the loop prediction problem much more challenging. Using a previously published loop prediction method, median backbone (N-Cα-C-O) RMSDs for groups of 6, 8, 10, and 12 residue loops are 0.3/0.6/0.4/0.6 Å, respectively, on native structures and increase to 1.1/2.2/1.5/2.3 Å on the perturbed cases. We then augmented our previous loop prediction method to simultaneously optimize the rotamer states of side chains surrounding the loop. Our results show that this augmented loop prediction method can recover the native state in many perturbed structures where the previous method failed; the median RMSDs for the 6, 8, 10, and 12 residue perturbed loops improve to 0.4/0.8/1.1/1.2 Å. Finally, we highlight three comparative models from blind tests, in which our new method predicted loops closer to the native conformation than those first modeled using the homolog template, a task generally understood to be difficult. Although many challenges remain in refining full comparative models to high accuracy, this work offers a methodical step toward that goal.

We present a novel, knowledge-based method for the side-chain addition step in protein structure modeling. The foundation of the method is a conditional probability equation, which specifies the probability that a side-chain will occupy a specific rotamer state, given a set of evidence about the rotamer states adopted by the side-chains at aligned positions in structurally homologous crystal structures. We demonstrate that our method increases the accuracy of homology model side-chain addition when compared with the widely employed practice of preserving the side-chain conformation from the homology template to the target at conserved residue positions. Furthermore, we demonstrate that our method accurately estimates the probability that the correct rotamer state has been selected. This interesting result implies that our method can be used to understand the reliability of each and every side-chain in a protein homology model.

Metal ions play an essential role in stabilizing protein structures and contributing to protein function. Ions such as zinc have well‐defined coordination geometries, but it has not been easy to take advantage of this knowledge in protein structure prediction efforts. Here, we present a computational method to predict structures of zinc‐binding proteins given knowledge of the positions of zinc‐coordinating residues in the amino acid sequence. The method takes advantage of the “atom‐tree” representation of molecular systems and modular architecture of the Rosetta3 software suite to incorporate explicit metal ion coordination geometry into previously developed de novo prediction and loop modeling protocols. Zinc cofactors are tethered to their interacting residues based on coordination geometries observed in natural zinc‐binding proteins. The incorporation of explicit zinc atoms and their coordination geometry in both de novo structure prediction and loop modeling significantly improves sampling near the native conformation. The method can be readily extended to predict protein structures bound to other metal and/or small chemical cofactors with well‐defined coordination or ligation geometry.

Template-based methods for predicting protein structure provide models for a significant portion of the protein but often contain insertions or chain ends (InsEnds) of indeterminate conformation. The local structure prediction "problem" entails modeling the InsEnds onto the rest of the protein. A well-known limit involves predicting loops of ≤12 residues in crystal structures. However, InsEnds may contain as many as ~50 amino acids, and the template-based model of the protein itself may be imperfect. To address these challenges, we present a free modeling method for predicting the local structure of loops and large InsEnds in both crystal structures and template-based models. The approach uses single amino acid torsional angle "pivot" moves of the protein backbone with a C(β) level representation. Nevertheless, our accuracy for loops is comparable to existing methods. We also apply a more stringent test, the blind structure prediction and refinement categories of the CASP9 tournament, where we improve the quality of several homology based models by modeling InsEnds as long as 45 amino acids, sizes generally inaccessible to existing loop prediction methods. Our approach ranks as one of the best in the CASP9 refinement category that involves improving template-based models so that they can function as molecular replacement models to solve the phase problem for crystallographic structure determination.

We present results of structural modeling of the variable fragment of Mα2,3, an antibody capable of neutralizing all short snake toxins. Three different methods were used to model the hypervariable loops: the conformational search algorithm CONGEN (Bruccoleri and Karplus, Biopolymers 26:137–168, 1987), high-temperature molecular dynamics (Bruccoleri and Karplus, Biopolymers 29:1847–1862, 1990), and a combined knowledge-based and energy-based algorithm (Martin et al., Proc. Natl. Acad. Sci. USA 86:9268–9272, 1989). Ninety plausible conformations were generated and were clustered into 13 classes. The clustering results indicate that there was little overlap of the conformational space explored by the different methods. Canonical loop structures were found by all methods for two of the loops, in agreement with previously established empirical modeling criteria. Nine of the 13 classes of structure were rejected on the ground of their lacking common features of antibody combining-site structure. The remaining four models were refined using restrained molecular dynamics. It was found that interconversion between the four resulting structures is possible with no significant energy barriers, suggesting that they are in thermodynamic equilibrium at 300 K. Features of the combining-site structure likely to be particularly important for antigen binding are discussed. © 1996 Wiley-Liss, Inc.

A computational strategy for homology modeling, based on the comparison of several protein structures, is described. This strategy involves a formalized definition of structural blocks common to several protein structures, a new program to compare these structures simultaneously, and the use of consensus matrices to improve sequence alignment between the structurally known and target proteins. Applying this method to cytochromes P450 led to the definition of 15 substructures common to P450cam, P450BM3, and P450terp, and to the proposal of a 3D model of P450eryF. Proteins 28:388–404, 1997 © 1997 Wiley-Liss, Inc.
