Similar articles
 20 similar articles found (search time: 15 ms)
1.
Modeling of loops in protein structures   Cited by: 27 (self-citations: 0, citations by others: 27)
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, C(alpha), C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively. The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
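The selection step described in this abstract (keep the lowest energy conformation among many independent optimizations, then use the structural variability among the low energy predictions as an accuracy estimate) can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the `n_low` parameter, and the use of mean pairwise RMSD as the variability measure are assumptions. Because the loops are optimized in a fixed environment, the sketch compares coordinates directly without superposition.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) points.

    No superposition is done: both loops are assumed to share one frame.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equally long")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def pick_and_spread(runs, n_low=3):
    """Pick the lowest-energy loop and measure the spread of the low-energy set.

    runs  : list of (energy, coords) from independent optimizations
    n_low : hypothetical number of low-energy predictions to compare
    Returns (best_coords, mean pairwise RMSD among the n_low lowest).
    """
    ranked = sorted(runs, key=lambda run: run[0])
    low = [coords for _, coords in ranked[:n_low]]
    pairs = list(combinations(low, 2))
    spread = sum(rmsd(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return ranked[0][1], spread
```

A small spread means the low energy runs converged on one conformation. The abstract states that such variability can be used to estimate accuracy but does not give the calibration, so none is assumed here.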

2.
Detection of homologous proteins by an intermediate sequence search   Cited by: 2 (self-citations: 0, citations by others: 2)
We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).
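The idea of relating two queries through a shared intermediate can be illustrated with a small sketch. The abstract does not define the ISS(new) overlap score, so the `overlap_fraction` measure below (shared residues on the intermediate divided by the shorter matched region), the `min_overlap` threshold, and all names are assumptions for illustration only.

```python
def overlap_fraction(a, b):
    """Overlap of two inclusive residue ranges, relative to the shorter range."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return 0.0
    return shared / min(a_end - a_start + 1, b_end - b_start + 1)


def link_via_intermediates(hits_a, hits_b, min_overlap=0.5):
    """Find intermediates that plausibly connect query A and query B.

    hits_a, hits_b : dict mapping intermediate id -> (start, end), the region
                     of the intermediate matched by each query
    min_overlap    : hypothetical threshold on the overlap fraction
    Returns [(intermediate_id, overlap)] sorted by decreasing overlap.
    """
    links = []
    for seq_id in hits_a.keys() & hits_b.keys():
        score = overlap_fraction(hits_a[seq_id], hits_b[seq_id])
        if score >= min_overlap:
            links.append((seq_id, score))
    return sorted(links, key=lambda link: (-link[1], link[0]))
```

Requiring the two matches to overlap on the intermediate guards against linking queries that hit different domains of a multidomain intermediate.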

3.
CODA, an algorithm for predicting the variable regions in proteins, combines FREAD, a knowledge-based approach, and PETRA, which constructs the region ab initio. FREAD selects from a database of protein structure fragments with environmentally constrained substitution tables and other rule-based filters. FREAD was parameterized and tested on over 3000 loops. The average root mean square deviation ranged from 0.78 Å for three-residue loops to 3.5 Å for eight-residue loops on a nonhomologous test set. CODA clusters the predictions from the two independent programs and makes a consensus prediction that must pass a set of rule-based filters. CODA was parameterized and tested on two separate sets of structures that were nonhomologous to one another and to those found in the FREAD database. The average root mean square deviation in the test set ranged from 0.76 Å for three-residue loops to 3.09 Å for eight-residue loops. CODA shows a general improvement in loop prediction over PETRA and FREAD individually. The improvement is far more marked for lengths six and upward, probably as the predictive power of PETRA becomes more important. CODA was further tested on several model structures to determine its applicability to the modeling situation. A web server of CODA is available at http://www-cryst.bioc.cam.ac.uk/~charlotte/Coda/search_coda.html.

4.
The prediction of protein 3D structures close to insertions and deletions or, more generally, loop prediction, is still one of the major challenges in homology modeling projects. In this article, we developed ranking criteria and selection filters to improve knowledge-based loop predictions. These criteria were developed and optimized for a test data set containing 678 insertions and deletions. The examples are, in principle, predictable from the loop database used with an RMSD < 1 Å and represent realistic modeling situations. Four noncorrelated criteria for the selection of fragments are evaluated. A fast prefilter compares the distance between the anchor groups in the template protein with the stems of the fragments. The RMSD of the anchor groups is used for fitting and ranking of the selected loop candidates. After fitting, repulsive close contacts of loop candidates with the template protein are used for filtering, and fragments with backbone torsion angles that are unfavorable according to a knowledge-based potential are eliminated. By the combined application of these filter criteria to the test set, it was possible to increase the percentage of predictions with a global RMSD < 1 Å to over 50% among the first five ranks, with average global RMSD values for the first rank candidate that are between 1.3 and 2.2 Å for different loop lengths. Compared to other examples described in the literature, our large numbers of test cases are not self-predictions, where loops are placed in a protein after a peptide loop has been cut out, but are attempts to predict structural changes that occur in evolution when a protein is affected by insertions and deletions.
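The fast prefilter mentioned in this abstract, which compares the anchor-to-anchor distance in the template with the stem-to-stem distance of each database fragment, might look roughly like the sketch below. The data layout, the function name, and the `tolerance` value are assumptions; the abstract gives no threshold. Sorting by distance mismatch is only a convenience here, since the paper ranks candidates by anchor RMSD after fitting.

```python
import math


def anchor_prefilter(template_anchors, fragments, tolerance=1.0):
    """Keep fragments whose stem separation matches the template anchors.

    template_anchors : ((x, y, z), (x, y, z)) for the N- and C-terminal anchors
    fragments        : list of (name, stem_n_xyz, stem_c_xyz)
    tolerance        : hypothetical allowed distance mismatch in angstroms
    Returns [(name, mismatch)] sorted by increasing mismatch.
    """
    target = math.dist(*template_anchors)
    kept = []
    for name, stem_n, stem_c in fragments:
        mismatch = abs(math.dist(stem_n, stem_c) - target)
        if mismatch <= tolerance:
            kept.append((name, mismatch))
    return sorted(kept, key=lambda item: item[1])
```

A distance check of this kind needs no superposition, which is what makes it cheap enough to run over a whole fragment database before the more expensive fitting and contact filters.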

5.
Park H  Ko J  Joo K  Lee J  Seok C  Lee J 《Proteins》2011,79(9):2725-2734
The rapid increase in the number of experimentally determined protein structures in recent years enables us to obtain more reliable protein tertiary structure models than ever by template-based modeling. However, refinement of template-based models beyond the limit available from the best templates is still needed for understanding protein function in atomic detail. In this work, we develop a new method for protein terminus modeling that can be applied to refinement of models with unreliable terminus structures. The energy function for terminus modeling consists of both physics-based and knowledge-based potential terms with carefully optimized relative weights. Effective sampling of both the framework and terminus is performed using the conformational space annealing technique. This method has been tested on a set of termini derived from a nonredundant structure database and two sets of termini from the CASP8 targets. The performance of the terminus modeling method is significantly improved over our previous method that does not employ terminus refinement. It is also comparable or superior to the best server methods tested in CASP8. The success of the current approach suggests that a similar strategy may be applied to other types of refinement problems such as loop modeling or secondary structure rearrangement.

6.
The similarity between the average distance maps (ADMs; Kikuchi et al., 1988a), that is, the predicted contact maps, of two proteins with homologous tertiary structures is examined. The shapes of the ADMs are compared by superposing the ADMs of the two homologous proteins. We also compare the shapes of the actual contact maps for each pair of proteins. For each pair of maps, we search for the optimal superposition mode, the one in which the two proteins appear most similar. It is concluded that two ADMs are also similar when the actual tertiary structures of the two proteins are similar. A criterion for similarity of maps is also proposed. The possibility of applying this method to detect weak homology between protein structures is discussed.
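Comparing two contact maps by superposition can be illustrated with a toy example. This is not the ADM method or the similarity criterion proposed in the paper: the Cα distance cutoff, the minimum sequence separation, the Jaccard index as similarity measure, and the search over integer residue shifts are all assumptions chosen to keep the sketch short.

```python
import math


def contact_set(ca_coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j), j - i >= min_sep, whose C-alpha atoms lie within cutoff."""
    contacts = set()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def jaccard(a, b):
    """Shared contacts divided by all contacts; 1.0 for two empty maps."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_superposition(a, b, max_shift=5):
    """Slide map b along the diagonal and return (best_shift, best_score)."""
    best = (0, jaccard(a, b))
    for shift in range(-max_shift, max_shift + 1):
        shifted = {(i + shift, j + shift) for i, j in b}
        score = jaccard(a, shifted)
        if score > best[1]:
            best = (shift, score)
    return best
```

Shifting along the diagonal corresponds to changing the residue numbering offset between the two proteins, which is the simplest form a "superposition mode" of two maps could take.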

7.
DeWeese-Scott C  Moult J 《Proteins》2004,55(4):942-961
Experimental protein structures often provide extensive insight into the mode and specificity of small molecule binding, and this information is useful for understanding protein function and for the design of drugs. We have performed an analysis of the reliability with which ligand-binding information can be deduced from computer model structures, as opposed to experimentally derived ones. Models produced as part of the CASP experiments are used. The accuracy of contacts between protein model atoms and experimentally determined ligand atom positions is the main criterion. Only comparative models are included (i.e., models based on a sequence relationship between the protein of interest and a known structure). We find that, as expected, contact errors increase with decreasing sequence identity used as a basis for modeling. Analysis of the causes of errors shows that sequence alignment errors between model and experimental template have the most deleterious effect. In general, good, but not perfect, insight into ligand binding can be obtained from models based on a sequence relationship, provided there are no alignment errors in the model. The results support a structural genomics strategy based on experimental sampling of structure space so that all protein domains can be modeled on the basis of 30% or higher sequence identity.

8.
Template-based methods for predicting protein structure provide models for a significant portion of the protein but often contain insertions or chain ends (InsEnds) of indeterminate conformation. The local structure prediction "problem" entails modeling the InsEnds onto the rest of the protein. A well-known limit involves predicting loops of ≤12 residues in crystal structures. However, InsEnds may contain as many as ~50 amino acids, and the template-based model of the protein itself may be imperfect. To address these challenges, we present a free modeling method for predicting the local structure of loops and large InsEnds in both crystal structures and template-based models. The approach uses single amino acid torsional angle "pivot" moves of the protein backbone with a C(β) level representation. Nevertheless, our accuracy for loops is comparable to existing methods. We also apply a more stringent test, the blind structure prediction and refinement categories of the CASP9 tournament, where we improve the quality of several homology based models by modeling InsEnds as long as 45 amino acids, sizes generally inaccessible to existing loop prediction methods. Our approach ranks as one of the best in the CASP9 refinement category that involves improving template-based models so that they can function as molecular replacement models to solve the phase problem for crystallographic structure determination.

9.
Protein structure prediction is based mainly on the modeling of proteins by homology to known structures; this knowledge-based approach is the most promising method to date. Although it is used in the whole area of protein research, no general rules concerning the quality and applicability of concepts and procedures used in homology modeling have been put forward yet. Therefore, the main goal of the present work is to provide tools for the assessment of accuracy of modeling at a given level of sequence homology. A large set of known structures from different conformational and functional classes, but with various degrees of homology, was selected. Pairwise structure superpositions were performed. Starting with the definition of the structurally conserved regions and determination of topologically correct sequence alignments, we correlated geometrical properties with sequence homology (defined by the 250 PAM Dayhoff Matrix) and identity. It is shown that both the topological differences of the protein backbones and the relative positions of corresponding side chains diverge with decreasing sequence identity. Below 50% identity, the deviation in regions that are structurally not conserved continually increases, thus implying that with decreasing sequence identity modeling has to take into account more and more structurally diverging loop regions that are difficult to predict. © 1993 Wiley-Liss, Inc.

10.
To promote application of a single chain variable region fragment (sFv) in immunoglobulins, an sFv gene was connected to an IgG1 Fc gene, designated as an sFvc gene, and used for transfection of Sp2/0. As a result, the sFvc protein was found to be secreted in a dimeric form. It is thus felt that the sFvc protein, which mimics the shape of a naturally occurring antibody, can be simple and useful to reproduce divalency and Fc-associated effector functions as seen in a natural antibody. Abbreviations: sFv, single chain variable region fragment; Fc, constant region of immunoglobulin; sFvc, single chain variable region fragment with an Fc region.

11.
This paper provides an unbiased comparison of four commercially available programs for loop sampling, Prime, Modeler, ICM, and Sybyl, each of which uses a different modeling protocol. The study assesses the quality of results and examines the relative strengths and weaknesses of each method. The set of loops to be modeled varied in length from 4 to 12 amino acids. The approaches used for loop modeling can be classified into two methodologies: ab initio loop generation (Modeler and Prime) and database searches (Sybyl and ICM). Comparison of the modeled loops to the native structures was used to determine the accuracy of each method. All of the protocols returned similar results for short loop lengths (four to six residues), but as loop length increased, the quality of the results varied among the programs. Prime generated loops with RMSDs <2.5 Å for loops up to 10 residues, while the other three methods met the 2.5 Å criterion for loops of up to seven residues. Additionally, the ability of the software to utilize disulfide bonds and X-ray crystal packing influenced the quality of the results. In the final analysis, the top-ranking loop from each program was rarely the loop with the lowest RMSD with respect to the native template, revealing a weakness in all programs in correctly ranking the modeled loops.

12.
Left-handed polyproline II (PPII) helices commonly occur in globular proteins in segments of 4-8 residues. This paper analyzes the structural conservation of PPII-helices in 3 protein families: serine proteinases, aspartic proteinases, and immunoglobulin constant domains. Calculations of the number of conserved segments based on structural alignment of homologous molecules yielded similar results for the PPII-helices, the alpha-helices, and the beta-strands. The PPII-helices are consistently conserved at the level of 80-100% in the proteins with sequence identity above 20% and RMS deviation of structure alignments below 3.0 Å. The most structurally important PPII segments are conserved below this level of sequence identity. These results suggest that the PPII-helices, in addition to the other 2 secondary structure classes, should be identified as part of structurally conserved regions in proteins. This is supported by similar values for the local RMS deviations of the aligned segments for the structural classes of PPII-helices, alpha-helices, and beta-strands. The PPII-helices are shown to participate in supersecondary elements such as PPII-helix/alpha-helix. The conservation of PPII-helices depends on the conservation of a supersecondary element as a whole. PPII-helices also form links, possibly flexible, in the interdomain regions. The role of the PPII-helices in model building by homology is 2-fold; they serve as additional conserved elements in the structure allowing improvement of the accuracy of a model and provide correct chain geometry for modeling of the segments equivalenced to them in a target sequence. The improvement in model building is demonstrated in 2 test studies.

13.
Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue‐based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12. Proteins 2010. © 2010 Wiley‐Liss, Inc.

14.
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One kind of difference between 3-D structures of homologous proteins is rigid body displacement. A glaring example is equivalent regions of homologous proteins that adopt α-helical conformation but superimpose poorly because of their different spatial orientations. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, and the most common example is topologically equivalent loops of two homologues that have different conformations. In the current study, we present a refined view of the "structurally variable" regions, which may contain local similarity obscured in global alignment of homologous protein structures. As a structural alphabet is able to describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of 'variable' regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through protein blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varied extent but do not preserve local structure.

15.
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects – to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.

16.
A novel method has been developed for acquiring the correct alignment of a query sequence against remotely homologous proteins by extracting structural information from profiles of multiple structure alignment. A systematic search algorithm combined with a group of score functions based on sequence information and structural information has been introduced in this procedure. A limited number of top solutions (15,000) with high scores were selected as candidates for further examination. On a test-set comprising 301 proteins from 75 protein families with sequence identity less than 30%, the proportion of proteins with completely correct alignment as first candidate was improved to 39.8% by our method, whereas the typical performance of existing sequence-based alignment methods was only between 16.1% and 22.7%. Furthermore, multiple candidates for possible alignment were provided in our approach, which dramatically increased the possibility of finding correct alignment, such that completely correct alignments were found amongst the top-ranked 1000 candidates in 88.3% of the proteins. With the assistance of a sequence database, completely correct alignment solutions were achieved amongst the top 1000 candidates in 94.3% of the proteins. From such a limited number of candidates, it would become possible to identify more correct alignment using a more time-consuming but more powerful method with more detailed structural information, such as side-chain packing and energy minimization, etc. The results indicate that the novel alignment strategy could be helpful for extending the application of highly reliable methods for fold identification and homology modeling to a huge number of homologous proteins of low sequence similarity. Details of the methods, together with the results and implications for future development are presented.

17.
Modeling mutations in protein structures   Cited by: 2 (self-citations: 0, citations by others: 2)
We describe an automated method for the modeling of point mutations in protein structures. The protein is represented by all non-hydrogen atoms. The scoring function consists of several types of physical potential energy terms and homology-derived restraints. The optimization method implements a combination of conjugate gradient minimization and molecular dynamics with simulated annealing. The testing set consists of 717 pairs of known protein structures differing by a single mutation. Twelve variations of the scoring function were tested in three different environments of the mutated residue. The best-performing protocol optimizes all the atoms of the mutated residue, with respect to a scoring function that includes molecular mechanics energy terms for bond distances, angles, dihedral angles, peptide bond planarity, and non-bonded atomic contacts represented by Lennard-Jones potential, dihedral angle restraints derived from the aligned homologous structure, and a statistical potential for non-bonded atomic interactions extracted from a large set of known protein structures. The current method compares favorably with other tested approaches, especially when predicting long and flexible side-chains. In addition to the thoroughness of the conformational search, sampled degrees of freedom, and the scoring function type, the accuracy of the method was also evaluated as a function of the flexibility of the mutated side-chain, the relative volume change of the mutated residue, and its residue type. The results suggest that further improvement is likely to be achieved by concentrating on the improvement of the scoring function, in addition to or instead of increasing the variety of sampled conformations.

18.
The protein structures of six comparative modeling targets were predicted in a procedure that relied on improved energy minimization, without empirical rules, to position all new atoms. The structures of human nucleoside diphosphate kinase NM23-H2, HPr from Mycoplasma capricolum, 2Fe-2S ferredoxin from Haloarcula marismortui, eosinophil-derived neurotoxin (EDN), mouse cellular retinoic acid protein I (CRABP1), and P450eryf were predicted with root mean square deviations on Cα atoms of 0.69, 0.73, 1.11, 1.48, 1.69, and 1.73 Å, respectively, compared to the target crystal structures. These differences increased as the sequence similarity between the target and parent proteins decreased from about 60 to 20% identity. More residues were predicted than form the common region shared by the two crystal structures. In most cases insertions or deletions between the target and the related protein of known structure were not correctly positioned. One two residue insertion in CRABP1 was predicted in the correct conformation, while a nine residue insertion in EDN was predicted in the correct spatial region, although not in the correct conformation. The positions of common cofactors and their binding sites were predicted correctly, even when overall sequence similarity was low. © 1995 Wiley-Liss, Inc.

19.
[Image-only entry; no abstract text available.]

20.
Cozzetto D  Tramontano A 《Proteins》2005,58(1):151-157
Comparative modeling is the method of choice, whenever applicable, for protein structure prediction, not only because of its higher accuracy compared to alternative methods, but also because it is possible to estimate a priori the quality of the models that it can produce, thereby allowing the usefulness of a model for a given application to be assessed beforehand. By and large, the quality of a comparative model depends on two factors: the extent of structural divergence between the target and the template and the quality of the sequence alignment between the two protein sequences. The latter is usually derived from a multiple sequence alignment (MSA) of as many proteins of the family as possible, and its accuracy depends on the number and similarity distribution of the sequences of the protein family. Here we describe a method to evaluate the expected difficulty, and by extension accuracy, of a comparative model on the basis of the MSA used to build it. The parameter that we derive is used to compare the results obtained in the last two editions of the Critical Assessment of Methods for Structure Prediction (CASP) experiment as a function of the difficulty of the modeling exercise. Our analysis demonstrates that the improvement in the scope and quality of comparative models between the two experiments is largely due to the increased number of available protein sequences and to the consequent increased chance that a large and appropriately spaced set of protein sequences homologous to the proteins of interest is available.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license no. 京ICP备09084417号