Similar literature
20 similar articles found.
1.
MOTIVATION: As the protein structure database expands, protein loop modeling remains an important yet challenging problem. Knowledge-based protein loop prediction methods face two methodological challenges: (1) loop boundaries in protein structures are frequently problematic when constructing length-dependent loop databases for loop prediction; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the loop sequence-template matches. RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted fragments to a set of non-redundant loop structural templates regardless of loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the novel procedure provides an alternative approach that overcomes the challenges of knowledge-based loop prediction. AVAILABILITY: http://cmb.genomics.sinica.edu.tw

2.
In protein structure prediction, a central problem is defining the structure of a loop connecting two secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, an HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was performed by removing homologous loops using the SCOP superfamily definition and then predicting against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio prediction. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
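
The "top 20 classes out of 451" accuracy quoted above is a top-k measure: a prediction counts as correct if the true class appears anywhere among the k best-ranked classes. A minimal sketch of that bookkeeping follows; the function name and the toy rankings are illustrative assumptions, not part of ArchDB or of the published evaluation.

```python
def top_k_accuracy(ranked_predictions, true_classes, k=20):
    """Fraction of loops whose true class is among the k best-ranked classes.

    ranked_predictions: one list of class labels per loop, best first.
    true_classes: the known class label of each loop, in the same order.
    """
    if len(ranked_predictions) != len(true_classes):
        raise ValueError("one ranked list is needed per loop")
    if not true_classes:
        raise ValueError("at least one loop is required")
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, true_classes)
        if truth in ranking[:k]
    )
    return hits / len(true_classes)


# Hypothetical toy data: three loops, three candidate classes each.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truth = ["A", "A", "A"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, truth, k=k))
```

Raising k can only keep or increase the score, which is why top-20 figures should not be compared directly with top-1 figures from other studies.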

3.
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ~25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments .

4.
We describe a web server that provides easy access to the SLoop database of loop conformations connecting elements of protein secondary structure. The loops are classified according to their length, the type of bounding secondary structures and the conformation of the main chain. The current release of the database consists of over 8000 loops of up to 20 residues in length. A loop prediction method, which selects conformers on the basis of the sequence and the positions of the elements of secondary structure, is also implemented. These web pages are freely accessible over the internet at http://www-cryst.bioc.cam.ac.uk/~sloop.

5.
Methods for the prediction of protein function from structure are of growing importance in the age of structural genomics. Here, we focus on the problem of identifying sites of potential serine protease inhibitor interactions on the surface of proteins of known structure. Given that there is no sequence conservation within canonical loops from different inhibitor families, we first compare representative loops to all fragments of equal length among proteins of known structure by calculating main-chain RMS deviation. Fragments with RMS deviation below a certain threshold (hits) are removed if residues have solvent accessibilities appreciably lower than those observed in the search structure. These remaining hits are further filtered to remove those occurring largely within secondary structure elements. Likely functional significance is restricted further by considering only extracellular protein domains. By comparing different canonical loop structures to the protein structure database, we show that the method is able to detect previously known inhibitors. In addition, we discuss potentially new canonical loop structures found in secreted hydrolases, toxins, viral proteins, cytokines and other proteins. We discuss the possible functional significance of several of the examples found, and comment on implications for the prediction of function from protein 3D structure.

6.
Assembling short fragments from known structures has been a widely used approach to construct novel protein structures. To what extent there exist structurally similar fragments in the database of known structures for short fragments of a novel protein is a question that is fundamental to this approach. This work addresses that question for seven-, nine- and 15-residue fragments. For each fragment size, two databases, a query database and a template database of fragments from high-quality protein structures in SCOP20 and SCOP90, respectively, were constructed. For each fragment in the query database, the template database was scanned to find the lowest r.m.s.d. fragment among non-homologous structures. For seven-residue fragments, there is a 99% probability that there exists such a fragment within 0.7 Å r.m.s.d. for each loop fragment. For nine-residue fragments there is a 96% probability of a fragment within 1 Å r.m.s.d., while for 15-residue fragments there is a 91% probability of a fragment within 2 Å r.m.s.d. These results, which update previous studies, show that there exists sufficient coverage to model even a novel fold using fragments from the Protein Data Bank, as the current database of known structures has increased enormously in the last few years. We have also explored the use of a grid search method for loop homology modeling and make some observations about the use of a grid search compared with a database search for the loop modeling problem.
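
The coverage figures above (for example, 99% of seven-residue fragments having a template within 0.7 Å) come from scanning a template set for the lowest-r.m.s.d. match to each query and counting how many queries fall under a cutoff. The sketch below illustrates only that counting step, under a strong simplifying assumption: the fragments are taken to be already superposed, so no optimal superposition is computed and homology filtering is omitted. All names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """R.m.s.d. between two equal-length lists of (x, y, z) points.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def coverage(queries, templates, cutoff):
    """Fraction of query fragments with at least one template within cutoff."""
    hits = 0
    for query in queries:
        best = min(rmsd(query, template) for template in templates)
        if best <= cutoff:
            hits += 1
    return hits / len(queries)


# Hypothetical two-point "fragments" purely for illustration.
q1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
q2 = [(0.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
t1 = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
t2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

print(coverage([q1, q2], [t1, t2], cutoff=0.7))
```

In a real scan each query-template pair would first be optimally superposed (for instance with the Kabsch algorithm) before the deviation is measured.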

7.
The SLoop database of supersecondary fragments, first described by Donate et al. (Protein Sci., 1996, 5, 2600-2616), contains protein loops, classified according to structural similarity. The database has recently been updated and currently contains over 10 000 loops up to 20 residues in length, which cluster into over 560 well populated classes. The database can be found at http://www-cryst.bioc.cam.ac.uk/~sloop. In this paper, we identify conserved structural features such as main chain conformation and hydrogen bonding. Using the original approach of Rufino and co-workers (1997), the correct structural class is predicted with the highest SLoop score for 35% of loops. This rises to 65% by considering the three highest scoring class predictions and to 75% in the top five scoring class predictions. Inclusion of residues from the neighbouring secondary structures and use of substitution tables derived using a reduced definition of secondary structure increase these prediction accuracies to 58, 78 and 85%, respectively. This suggests that capping residues can stabilize the loop conformation as well as that of the secondary structure. Further increases are achieved if only well-populated classes are considered in the prediction. These results correspond to an average loop root mean square deviation of between 0.4 and 2.6 Å for loops up to five residues in length.

8.
Loops are regions of nonrepetitive conformation connecting regular secondary structures. We identified 2,024 loops of one to eight residues in length, with acceptable main-chain bond lengths and peptide bond angles, from a database of 223 protein and protein-domain structures. Each loop is characterized by its sequence, main-chain conformation, and relative disposition of its bounding secondary structures as described by the separation between the tips of their axes and the angle between them. Loops, grouped according to their length and type of their bounding secondary structures, were superposed and clustered into 161 conformational classes, corresponding to 63% of all loops. Of these, 109 (51% of the loops) were populated by at least four nonhomologous loops or four loops sharing a low sequence identity. Another 52 classes, including 12% of the loops, were populated by at least three loops of low sequence similarity from three or fewer nonhomologous groups. Loop class suprafamilies resulting from variations in the termini of secondary structures are discussed in this article. Most previously described loop conformations were found among the classes. New classes included a 2:4 type IV hairpin, a helix-capping loop, and a loop that mediates dinucleotide-binding. The relative disposition of bounding secondary structures varies among loop classes, with some classes such as beta-hairpins being very restrictive. For each class, sequence preferences were identified as key residues; those found more frequently at these conserved positions than in proteins overall were Gly, Asp, Pro, Phe, and Cys. Most of these residues are involved in stabilizing loop conformation, often through a positive phi conformation or secondary structure capping. Identification of helix-capping residues and beta-breakers among the highly conserved positions supported our decision to group loops according to their bounding secondary structures.
Several of the identified loop classes were associated with specific functions, and all of the member loops had the same function; key residues were conserved for this purpose, as is the case for the parvalbumin-like calcium-binding loops. A significant number, but not all, of the member loops of other loop classes had the same function, as is the case for the helix-turn-helix DNA-binding loops. This article provides a systematic and coherent conformational classification of loops, covering a broad range of lengths and all four combinations of bounding secondary structure types, and supplies a useful basis for modelling of loop conformations where the bounding secondary structures are known or reliably predicted.
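
The abstract above describes the disposition of the bounding secondary structures by two numbers: the separation between the tips of their axes and the angle between the axes. A minimal sketch of how those two quantities could be computed from coordinates is given below. How the axes and tips are actually derived from the atomic coordinates is not stated in the abstract, so the inputs here (one tip point and one direction vector per secondary structure) are assumptions made for illustration.

```python
import math


def tip_separation(tip_a, tip_b):
    """Euclidean distance between the two axis tips, in the input units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(tip_a, tip_b)))


def axis_angle(axis_a, axis_b):
    """Angle in degrees between two axis direction vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding slightly outside the domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


# Hypothetical geometry: two perpendicular axes whose tips are 5 units apart.
print(tip_separation((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(axis_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

An antiparallel pair, as in a beta-hairpin, would give an angle close to 180 degrees with this convention.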

9.
We describe a fast ab initio method for modeling local segments in protein structures. The algorithm is based on a divide and conquer approach and uses a database of precalculated look-up tables, which represent a large set of possible conformations for loop segments of variable length. The target loop is recursively decomposed until the resulting conformations are small enough to be compiled analytically. The algorithm, which is not restricted to any specific loop length, generates a ranked set of loop conformations in 20-180 s on a desktop PC. The prediction quality is evaluated in terms of global RMSD. Depending on loop length the top prediction varies between 1.06 Å RMSD for three-residue loops and 3.72 Å RMSD for eight-residue loops. Due to its speed the method may also be useful to generate alternative starting conformations for complex simulations.

10.
Deane CM, Blundell TL. Proteins 2000, 40(1):135-144
We present a fast ab initio method for the prediction of local conformations in proteins. The program, PETRA, selects polypeptide fragments from a computer-generated database (APD) encoding all possible peptide fragments up to twelve amino acids long. Each fragment is defined by a representative set of eight straight phi/psi pairs, obtained iteratively from a trial set by calculating how fragments generated from them represent the protein databank (PDB). Ninety-six percent (96%) of length five fragments in crystal structures, with a resolution better than 1.5 Å and less than 25% identity, have a conformer in the database with less than 1 Å root-mean-square deviation (rmsd). In order to select segments from APD, PETRA uses a set of simple rule-based filters, thus reducing the number of potential conformations to a manageable total. This reduced set is scored and sorted using rmsd fit to the anchor regions and a knowledge-based energy function dependent on the sequence to be modelled. The best scoring fragments can then be optimized by minimization of contact potentials and rmsd fit to the core model. The quality of the prediction made by PETRA is evaluated by calculating both the differences in rmsd and backbone torsion angles between the final model and the native fragment. The average rmsd ranges from 1.4 Å for three residue loops to 3.9 Å for eight residue loops.
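
To see why filtering is needed before scoring, it helps to count how quickly a database built from eight phi/psi states grows with fragment length. The sketch below assumes that every residue independently takes one of the eight pairs, which is a reading of the abstract rather than a confirmed detail of APD, and the angle values listed are placeholders because the abstract does not give the actual eight pairs.

```python
from itertools import product

# Placeholder phi/psi pairs in degrees. These are NOT the values used by
# PETRA; the abstract does not list them.
PHI_PSI_STATES = [
    (-60, -45), (-120, 130), (-75, 150), (60, 45),
    (-90, 0), (-150, 160), (80, -170), (-65, -30),
]


def count_conformers(length, n_states=len(PHI_PSI_STATES)):
    """Number of discrete conformers for a fragment of the given length."""
    return n_states ** length


def enumerate_conformers(length):
    """Lazily yield every combination of phi/psi states for the fragment."""
    return product(PHI_PSI_STATES, repeat=length)


for n in (3, 5, 8, 12):
    print(n, count_conformers(n))
```

Under this assumption a five-residue fragment already has 32,768 discrete conformers and a twelve-residue fragment nearly 7 x 10^10, so lazy enumeration combined with early rule-based rejection is the only practical way to traverse the space.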

11.
An object-oriented database system has been developed which is being used to store protein structure data. The database can be queried using the logic programming language Prolog or the query language Daplex. Queries retrieve information by navigating through a network of objects which represent the primary, secondary and tertiary structures of proteins. Routines written in both Prolog and Daplex can integrate complex calculations with the retrieval of data from the database, and can also be stored in the database for sharing among users. Thus object-oriented databases are better suited to prototyping applications and answering complex queries about protein structure than relational databases. This system has been used to find loops of varying length and anchor positions when modelling homologous protein structures.

12.
Mönnigmann M, Floudas CA. Proteins 2005, 61(4):748-762
The structure prediction of loops with flexible stem residues is addressed in this article. While the secondary structure of the stem residues is assumed to be known, the geometry of the protein into which the loop must fit is considered to be unknown in our methodology. As a consequence, the compatibility of the loop with the remainder of the protein is not used as a criterion to reject loop decoys. Loop structure prediction with flexible stems is more difficult than fitting loops into a known protein structure in that a larger conformational space has to be covered. The main focus of the study is to assess the precision of loop structure prediction if no information on the protein geometry is available. The proposed approach is based on (1) dihedral angle sampling, (2) structure optimization by energy minimization with a physically based energy function, (3) clustering, and (4) a comparison of strategies for the selection of loops identified in (3). Steps (1) and (2) have similarities to previous approaches to loop structure prediction with fixed stems. Step (3) is based on a new iterative approach to clustering that is tailored for the loop structure prediction problem with flexible stems. In this new approach, clustering is used not only to identify conformers that are likely to be close to the native structure, but also to identify far-from-native decoys. By discarding these decoys iteratively, the overall quality of the ensemble and the loop structure prediction is improved. Step (4) provides a comparative study of criteria for loop selection based on energy, colony energy, cluster density, and a hybrid criterion introduced here. The proposed method is tested on a large set of 3215 loops from proteins in the Pdb-Select25 set and on 179 loops from proteins from the CASP6 experiment.

13.
Li W, Liang S, Wang R, Lai L, Han Y. Protein Engineering 1999, 12(12):1075-1086
Loops are structurally variable regions, but the secondary structural elements bracing loops are often conserved. Motifs with similar secondary structures exist in the same and different protein families. In this study, we made an all-PDB-based analysis and produced 495 motif families accessible from the Internet. Every motif family contains some variable loops spanning a common framework (a pair of secondary structures). The diversity of loops and the convergence of frameworks were examined. In addition, we also identified 119 loops with conformational changes in different PDB files. These materials can give some directions for functional loop design and flexible docking.

14.
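Earlier related studies found that mRNA secondary structure contains factors that strongly influence protein folding rates. mRNA secondary structures commonly contain a variety of complex loop structures. Do these loops also have an important influence on protein folding rates, and do different loop types affect folding rates equally? To address these questions, we built a database containing information on mRNA loop structures (internal loops, hairpin loops, bulge loops, and multibranch loops) together with the corresponding protein folding rates. For each protein in the database, we calculated parameters such as the base content of each loop type, the paired-base content, and the single-stranded base content of the mRNA secondary structure, and analyzed the correlation of each parameter with the corresponding protein folding rate. The results show that the base content of every loop type is significantly or highly significantly positively correlated with the protein folding rate, indicating that mRNA loop structures have an important influence on protein folding rates. Furthermore, after grouping the proteins by folding type or by secondary structure class, we repeated the analysis for each group. The results show that, for different classes of proteins, the influence of the various mRNA loop structures on the corresponding protein folding rate differs significantly. This work lays a theoretical foundation for further studies of mRNA and protein folding rates.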

15.
A bank of 13,563 loops from three to eight amino acid residues long, representing motifs between two consecutive regular secondary structures, has been derived from protein structures presenting less than 95% sequence identity. Statistical analyses of occurrences of conformations and residues revealed length-dependent over-representations of particular amino acids (glycine, proline, asparagine, serine, and aspartate) and conformations (alphaL, epsilon, betaP regions of the Ramachandran plot). A position-dependent distribution of these occurrences was observed for N- and C-terminal residues, which are correlated to the nature of the flanking regions. Loops of the same length were clustered into statistically meaningful families on the basis of their backbone structures when placed in a common reference frame, independent of the flanks. These clusters present significantly different distributions of sequence, conformations, and endpoint residue Calpha distances. On the basis of the sequence-structure correlation of this clustering, an automatic loop modeling algorithm was developed. Based on the knowledge of its sequence and of its flank backbone structures, each query loop is assigned to a family and target loop supports are selected in this family. The support backbones of these target loops are then adjusted on flanking structures by partial exploration of the conformational space. Loop closure is performed by energy minimization for each support and the final model is chosen among connected supports based upon energy criteria. The quality of the prediction is evaluated by the root-mean-square deviation (rmsd) between the final model and the native loops when the whole bank is re-attributed on itself with a Jackknife test. This average rmsd ranges from 1.1 Å for three-residue loops to 3.8 Å for eight-residue loops. A few poorly predicted loops are inescapable, considering the high level of diversity in loops and the lack of environment data.
To overcome such modeling problems, a statistical reliability score was assigned for each prediction. This score is correlated to the quality of the prediction, in terms of rmsd, and thus improves the selection accuracy of the model. The algorithm efficiency was compared to CASP3 target loop predictions. Moreover, when tested on a test loop bank, this algorithm was shown to be robust when the loops are not precisely delimited, therefore proving to be a useful tool in practice for protein modeling.
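
The Jackknife test mentioned above re-predicts each loop in the bank using only the remaining loops, then averages the error. The sketch below shows that leave-one-out pattern in its simplest form. The `fit` and `error` callables and the toy numbers are hypothetical stand-ins; the published procedure works on loop families and rmsd, not on scalar means.

```python
def jackknife_error(items, fit, error):
    """Leave each item out in turn, fit on the rest, and average the error.

    fit: callable taking the training items and returning a model.
    error: callable taking (model, held_out_item) and returning a number.
    """
    if len(items) < 2:
        raise ValueError("at least two items are needed")
    errors = []
    for i, held_out in enumerate(items):
        training = items[:i] + items[i + 1:]
        model = fit(training)
        errors.append(error(model, held_out))
    return sum(errors) / len(errors)


# Toy stand-ins: the "model" is the mean of the training values and the
# error is the absolute difference from the held-out value.
def mean(values):
    return sum(values) / len(values)


def absolute_error(model, value):
    return abs(model - value)


values = [1.0, 2.0, 3.0]
print(jackknife_error(values, mean, absolute_error))
```

Because the held-out item never contributes to its own prediction, the averaged error is a less optimistic estimate than one obtained by testing on the training data.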

16.
Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model based on fragments of length 7 to 0.76 Å for a 15-state model based on fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
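
The relation between library size, fragment length, and states per residue can be made concrete with a small calculation. The sketch below assumes that consecutive fragments overlap by three residues, so that a fragment of length f adds f - 3 new residues and a library of s fragments gives s^(1/(f-3)) states per residue. The overlap value is an assumption and is not stated in the abstract; the 100-fold size ratio quoted there holds for any fixed overlap.

```python
def states_per_residue(library_size, fragment_length, overlap=3):
    """Complexity: effective number of states per newly added residue.

    overlap=3 is an assumed value, not taken from the abstract.
    """
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return library_size ** (1.0 / new_residues)


def library_size_for(states, fragment_length, overlap=3):
    """Library size needed to reach a given number of states per residue."""
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return states ** new_residues


# Ten states per residue: seven-residue versus five-residue libraries.
size_7 = library_size_for(10, 7)
size_5 = library_size_for(10, 5)
print(size_7, size_5, size_7 // size_5)
```

The exponential dependence on fragment length is what makes long-fragment libraries expensive: each extra residue multiplies the required library size by the number of states per residue.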

17.
Modeling of loops in protein structures
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, C(alpha), C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively.
The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.

18.
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment-specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions. Proteins 2010. © 2009 Wiley-Liss, Inc.

19.
One of the most important and challenging tasks in protein modelling is the prediction of loops, as can be seen in the large variety of existing approaches. Loops In Proteins (LIP) is a database that includes all protein segments of a length up to 15 residues contained in the Protein Data Bank (PDB). In this study, the applicability of LIP to loop prediction in the framework of homology modelling is investigated. Searching the database for loop candidates takes less than 1 s on a desktop PC, and ranking them takes a few minutes. This is an order of magnitude faster than most existing procedures. The measure of accuracy is the root mean square deviation (RMSD) with respect to the main-chain atoms after local superposition of target loop and predicted loop. Loops of up to nine residues in length were modelled with a local RMSD <1 Å and those of length up to 14 residues with an accuracy better than 2 Å. The results were compared in detail with a thoroughly evaluated and tested ab initio method published recently and additionally with two further methods for a small loop test set. The LIP method produced very good predictions. In particular for longer loops it outperformed other methods.

20.
Lee J, Kim SY, Joo K, Kim I, Lee J. Proteins 2004, 56(4):704-714
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us the best model-one structure for the target T0161.
