Similar literature
 Found 20 similar articles (search time: 29 ms)
1.
Li W, Liu Z, Lai L. Biopolymers, 1999, 49(6): 481-495
A general problem in comparative modeling and protein design is the conformational evaluation of loops with a certain sequence in specific environmental protein frameworks. Loops of different sequences and structures on similar scaffolds are common in the Protein Data Bank (PDB). In order to explore both their structural and sequential diversity, a database of loops connecting similar secondary structure fragments is constructed by searching the database of families of structurally similar proteins and the PDB. A total of 84 loop families having 2-13 residues are found among the well-determined structures of resolution better than 2.5 Å. Eight alpha-alpha, 20 alpha-beta, 19 beta-alpha, and 37 beta-beta families are identified. Every family contains more than 5 loop motifs. In each family, no two loops share the same sequence and all the frameworks are well superimposed. Forty-three new loop classes are distinguished in the database. The structural variability of loops in homologous proteins is examined and shown in 44 families. Motif families are characterized with geometric parameters and sequence patterns. The conformations of loops in each family are clustered into subfamilies using the average linkage cluster analysis method. Information such as geometric properties, sequence profile, sequential and structural variability in loop, structural alignment parameters, sequence similarities, and clustering results is provided. Correlations between the conformation of loops and loop sequence, motif sequence, and global sequence of PDB chain are examined in order to find how loop structures depend on their sequences and how they are affected by the local and global environment. Strong correlations (R > 0.75) are only found in 24 families. The best R value is 0.98. The database is available through the Internet.
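The abstract above mentions clustering loop conformations with average linkage cluster analysis. A minimal sketch of that general technique follows, assuming a precomputed symmetric matrix of pairwise loop distances (for example RMSD values) and a hypothetical distance cutoff `threshold`; neither the distance measure nor the cutoff used by the authors is stated in the abstract.

```python
from itertools import combinations


def average_linkage(dist, threshold):
    """Agglomerative clustering with average linkage.

    dist      -- symmetric matrix (list of lists) of pairwise distances
    threshold -- hypothetical cutoff: merging stops once the two closest
                 clusters are farther apart than this value
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (x, y), d = min(
            (((p, q), avg(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)
```

With four loops where items 0 and 1 are close, items 2 and 3 are close, and the two pairs are far apart, a cutoff between the two scales yields two subfamilies.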

2.
Linkers that connect repeating secondary structures fall into conformational classes based on distance and main-chain torsion clustering. A data set of 300 unique protein chains with low pairwise sequence identity was clustered into only a few groups representing the preferred motifs. The linkers of two to eight residues for the nonredundant data set are designated H-Ln-H, H-Ln-E, E-Ln-H, E-Ln-E, where n is the length, H stands for alpha-helices, and E for beta-strands. Most of the clusters identified here corroborate earlier findings. However, 19 new clusters are identified in this paper, with many of them having seven and eight residue linkers. In our first analysis, the secondary structures flanking the linkers are both interacting and noninteracting and there is no precise angle of orientation between them. A second analysis was performed on a set of proteins with restricted orientations for the flanking elements, namely, mainly alpha class of proteins with orthogonal architecture. Two definite clusters are identified, one corresponding to linkers of orthogonal helices and the other to linkers of antiparallel helices. Loops forming binding sites or involved in catalytic activity are important determinants of the function of proteins. Although the structural conservation of the residues around the catalytic triad of serine proteases has been studied widely, there has not been a systematic analysis of the conformation of the loops that contain them. Residues of the catalytic triad reside in the linkers of beta-strands, with varying lengths of more than eight residues. Here, we analyze the structural conservation of such linkers by superposition, and observe a conserved structural feature of the linkers incorporating each of the three residues of the catalytic triad.
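The H-Ln-H, H-Ln-E, E-Ln-H, E-Ln-E designation described above can be illustrated with a short sketch. It assumes a per-residue secondary structure string in which `H` marks helix, `E` marks strand, and any other character marks a linker residue; the function name and the input encoding are illustrative assumptions, not the authors' implementation.

```python
import re


def linker_labels(ss, min_len=2, max_len=8):
    """Return labels such as 'H-L3-E' for linkers in a secondary structure string.

    ss -- per-residue string: 'H' = alpha-helix, 'E' = beta-strand,
          anything else = non-regular (linker) residue (assumed encoding)
    Only linkers of min_len..max_len residues that are flanked by H or E
    on both sides are reported, mirroring the two-to-eight residue range
    quoted in the abstract.
    """
    labels = []
    for match in re.finditer(r"[^HE]+", ss):
        start, end = match.span()
        # Skip runs at the chain termini: they have no flanking element.
        if start == 0 or end == len(ss):
            continue
        n = end - start
        if min_len <= n <= max_len:
            labels.append(f"{ss[start - 1]}-L{n}-{ss[end]}")
    return labels
```

For example, `linker_labels("HHHHCCCEEEECCHHHH")` reports one helix-to-strand linker of three residues and one strand-to-helix linker of two residues.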

3.
Li W, Liang S, Wang R, Lai L, Han Y. Protein engineering, 1999, 12(12): 1075-1086
Loops are structurally variable regions, but the secondary structural elements bracing loops are often conserved. Motifs with similar secondary structures exist in the same and different protein families. In this study, we made an all-PDB-based analysis and produced 495 motif families accessible from the Internet. Every motif family contains some variable loops spanning a common framework (a pair of secondary structures). The diversity of loops and the convergence of frameworks were examined. In addition, we also identified 119 loops with conformational changes in different PDB files. These materials can give some directions for functional loop design and flexible docking.

4.
The SLoop database of supersecondary fragments, first described by Donate et al. (Protein Sci., 1996, 5, 2600-2616), contains protein loops, classified according to structural similarity. The database has recently been updated and currently contains over 10 000 loops up to 20 residues in length, which cluster into over 560 well populated classes. The database can be found at http://www-cryst.bioc.cam.ac.uk/~sloop. In this paper, we identify conserved structural features such as main chain conformation and hydrogen bonding. Using the original approach of Rufino and co-workers (1997), the correct structural class is predicted with the highest SLoop score for 35% of loops. This rises to 65% by considering the three highest scoring class predictions and to 75% in the top five scoring class predictions. Inclusion of residues from the neighbouring secondary structures and use of substitution tables derived using a reduced definition of secondary structure increase these prediction accuracies to 58, 78 and 85%, respectively. This suggests that capping residues can stabilize the loop conformation as well as that of the secondary structure. Further increases are achieved if only well-populated classes are considered in the prediction. These results correspond to an average loop root mean square deviation of between 0.4 and 2.6 Å for loops up to five residues in length.
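The accuracies quoted above for the top one, three and five scoring classes are instances of top-k accuracy. The sketch below shows how such a figure can be computed, assuming each loop comes with a list of predicted class labels already sorted from best to worst score; the function name and data layout are hypothetical and do not reflect the SLoop software.

```python
def top_k_accuracy(ranked_predictions, true_classes, k):
    """Fraction of loops whose true class is among the k best-ranked predictions.

    ranked_predictions -- one list of class labels per loop, best score first
    true_classes       -- the correct class label for each loop
    k                  -- number of top-ranked predictions to consider
    """
    if len(ranked_predictions) != len(true_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_classes)
        if truth in ranked[:k]
    )
    return hits / len(true_classes)
```

By construction the value cannot decrease as k grows, which matches the rising percentages reported in the abstract.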

5.
We describe a web server, which provides easy access to the SLoop database of loop conformations connecting elements of protein secondary structure. The loops are classified according to their length, the type of bounding secondary structures and the conformation of the mainchain. The current release of the database consists of over 8000 loops of up to 20 residues in length. A loop prediction method, which selects conformers on the basis of the sequence and the positions of the elements of secondary structure, is also implemented. These web pages are freely accessible over the internet at http://www-cryst.bioc.cam.ac.uk/~sloop.

6.
Loops are integral components of protein structures, providing links between elements of secondary structure, and in many cases contributing to catalytic and binding sites. The conformations of short loops are now understood to depend primarily on their amino acid sequences. In contrast, the structural determinants of longer loops involve hydrogen-bonding and packing interactions within the loop and with other parts of the protein. By searching solved protein structures for regions similar in main chain conformation to the antigen-binding loops in immunoglobulins, we identified medium-sized loops of similar structure in unrelated proteins, and compared the determinants of their conformations. For loops that form compact substructures the major determinant of the conformation is the formation of hydrogen bonds to inward-pointing main chain atoms. For loops that have more extended conformations, the major determinant of their structure is the packing of a particular residue or residues against the rest of the protein. The following picture emerges: Medium-sized loops of similar conformation are stabilized by similar interactions. The groups that interact with the loop have very similar spatial dispositions with respect to the loop. However, the residues that provide these interactions may arise from dissimilar parts of the protein: The conformation of the loop requires certain interactions that the protein may provide in a variety of ways.

7.
The conformations of protein loops from a non-redundant set of 347 proteins with less than 25% sequence homology have been studied in order to clarify the topological variation of protein loops. Loops have been classified into five types (α-α, α-β, β-α, β-links and β-hairpins) depending on the secondary structures that they embrace. Four variables have been used to describe the loop geometry (3 angles and the end-to-end distance between the secondary structures embracing the loop). Loops with well defined geometry are identified by means of the internal dependency between the geometrical variables by application of information-entropy theory. From this it has been deduced that loops formed by fewer than 10 residues show an intrinsic dependency on the geometric variables that define the motif shape. In this interval the most stable loops are found for short connections owing to the entropic energy analysed.
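One common way to quantify dependency between discretized variables with information-entropy theory is mutual information. The sketch below is only an illustration of that general idea: the abstract does not give the exact formulation used, and the bin width and helper names here are assumptions.

```python
from collections import Counter
from math import log2


def bin_values(values, width):
    """Discretize continuous values (e.g. angles in degrees) into integer bins."""
    return [int(v // width) for v in values]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long label sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

A value near zero indicates that the two binned geometric variables vary independently across the loops considered, while larger values indicate that knowing one variable constrains the other.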

8.
A data collection which merges protein structural and sequence information is described. Structural superpositions amongst proteins with similar main-chain fold were performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments providing they were at least 50% homologous in residue identity to one of the structural sequences and at least 50% of the structural sequence residues were alignable. Such restrictions allow reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 familial groups containing a total of 209 tertiary structures; 45 structures had no superposable mates and were used individually. Other information is also provided as main-chain and side-chain conformational angles, secondary structural assignments and the like. Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

9.
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence–structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.

10.
One of the most important and challenging tasks in protein modelling is the prediction of loops, as can be seen in the large variety of existing approaches. Loops In Proteins (LIP) is a database that includes all protein segments of a length up to 15 residues contained in the Protein Data Bank (PDB). In this study, the applicability of LIP to loop prediction in the framework of homology modelling is investigated. Searching the database for loop candidates takes less than 1 s on a desktop PC, and ranking them takes a few minutes. This is an order of magnitude faster than most existing procedures. The measure of accuracy is the root mean square deviation (RMSD) with respect to the main-chain atoms after local superposition of target loop and predicted loop. Loops of up to nine residues in length were modelled with a local RMSD <1 Å and those of length up to 14 residues with an accuracy better than 2 Å. The results were compared in detail with a thoroughly evaluated and tested ab initio method published recently and additionally with two further methods for a small loop test set. The LIP method produced very good predictions. In particular for longer loops it outperformed other methods.
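The accuracy measure named above, RMSD over main-chain atoms, can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets have already been superposed (for instance with the Kabsch algorithm) and list equivalent atoms in the same order; the superposition step itself is deliberately not implemented here.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return sqrt(total / len(coords_a))
```

Passing only the main-chain atoms of the loop residues gives a local RMSD of the kind quoted in the abstract, provided the superposition was also computed on those atoms.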

11.
Enzyme function often involves a conformational change. There is a general agreement that loops play a vital role in correctly positioning the catalytically important residues. Nevertheless, predicting the functional loops and most importantly their role in enzyme function remains a difficult task. A major reason for this difficulty is that loops that undergo conformational change are frequently not well conserved in their primary sequence. beta1,4-Galactosyltransferase is one such enzyme. There, the amino acid sequence of a long loop that undergoes a large conformational change upon substrate binding is not well conserved. Our molecular dynamics simulations show that the large conformational change in the long loop is brought about by a second, interacting loop. Interestingly, while the structural change of the second loop is much smaller than that of the long loop, its sequence (particularly glycine residues) is highly conserved. We further examine the generality of the proposition that there are loops that trigger movements but nevertheless show little or no structural changes in crystals. We focus on two other enzymes, enolase and lipase. We chose these enzymes because they too undergo conformational change upon ligand binding but have different folds and different functions. Through multiple sets of simulations we show that the conformational change of the functional loop(s) is brought about through communication of flexibility by triggering loops that have several glycine residues. We further propose that similar to the conservation of common favorable fold types and structural motifs, evolution has also conserved common "skillful" mechanisms. Mechanisms may be conserved across different folds, sequences and functions, with adaptation to specific enzymatic roles.

12.
An accurate three-dimensional structure is known for papain (1.65 Å resolution) and actinidin (1.7 Å). A detailed comparison of these two structures was performed to determine the effect of amino acid changes on the conformation. It appeared that, despite only 48% identity in their amino acid sequence, different crystallization conditions and different X-ray data collection techniques, their structures are surprisingly similar with a root-mean-square difference of 0.40 Å between 76% of the main-chain atoms (differences less than 3 sigma). Insertions and deletions cause larger differences but they alter the conformation over a very limited range of two to three residues only. Conformations of identical side-chains are generally retained to the same extent as the main-chain conformation. If they do change, this is due to a modified local environment. Several examples are described. Spatial positions of hydrogen bonds are conserved to a greater extent than are the specific groups involved. The greatest structural similarity is found for the active site residues of papain and actinidin, for the internal water molecules and for the main-chain conformation of residues in alpha-helices and anti-parallel beta-sheet structure. This was reflected also in the similarity of the temperature factors. It suggests that the secondary structural elements form the skeleton of the molecule and that their interaction is the main factor in directing the fold of the polypeptide chain. Therefore, substitution of residues in the skeleton will, in general, have the most drastic effect on the conformation of the protein molecule. In papain and actinidin, some main-chain-side-chain hydrogen bonds are also strongly conserved and these may determine the folding of non-repetitive parts of the structure. Furthermore, we included primary structure information for three homologous thiol proteases: stem bromelain, and the cathepsins B and H. By combining the three-dimensional structural information for papain and actinidin with sequence homologies and identities, we conclude that the overall folding pattern of the polypeptide chain is grossly the same in all five proteases, and that they utilize the same catalytic mechanism.

13.
Modeling of loops in protein structures (total citations: 27; self-citations: 0; citations by others: 27)
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the mainchain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively. The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.

14.
Simultaneous modeling of multiple loops in proteins (total citations: 1; self-citations: 1; citations by others: 0)
The most reliable methods for predicting protein structure are by way of homologous extension, using structural information from a closely related protein, or by "threading" through a set of predefined protein folds ("inverse folding"). Both sets of methods provide a model for the core of the protein: the structurally conserved secondary structures. Due to the large variability both in sequence and size of the loops that connect these secondary structures, they generally cannot be modeled using these techniques. Loop-closure algorithms are aimed at predicting loop structures, given their end-to-end distance. Various such algorithms have been described, and all have been tested by predicting the structure of a single loop in a known protein. In this paper we propose a method, which is based on the bond-scaling-relaxation loop-closure algorithm, for simultaneously predicting the structures of multiple loops, and demonstrate that, for two spatially close loops, simultaneous closure invariably leads to more accurate predictions than sequential closure. The accuracy of the predictions obtained for pairs of loops in the size range of 5-7 residues each is comparable to that obtained by other methods, when predicting the structures of single loops: the RMS deviations from the native conformations of various test cases modeled are approximately 0.6-1.7 Å for backbone atoms and 1.1-3.3 Å for all-atoms.

15.
A common feature of alpha-helices in proteins is a loop at the C-terminal end, with a characteristic hydrogen bond pattern. It is noted that several loops with the same structural features occur independently of alpha-helices; two are even situated at the loop ends of beta-hairpins. The name paperclip is suggested for loops possessing the appropriate hydrogen bonds. A number of features of paperclips are described: they exist in two classes, depending on the number of residues at the loop end; one class is very much commoner than the other. Two paperclips are found that belong to the common class, except that the main-chain conformation of each is the mirror image of that normally found. The majority of paperclips are shown to have tightly clustered sets of main-chain dihedral angles. These are somewhat similar to, but distinct from, a subgroup of another common family of loops that have been called beta-bulge loops; in the latter, the dihedral angles are also tightly clustered. The high degree of clustering in both cases is likely to be a result of steric constraints associated with hydrogen bond patterns at the ends of loops.

16.
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).

17.
In protein structure prediction, a central problem is defining the structure of a loop connecting 2 secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.

18.
Subtilases are members of the family of subtilisin-like serine proteases. Presently, more than 50 subtilases are known, more than 40 of them with complete amino acid sequences. We have compared these sequences and the available three-dimensional structures (subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K). The mature enzymes contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation-peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, processing sites and membrane anchor segments. Multiple sequence alignment of the N-terminal catalytic domains allows the definition of two main classes of subtilases. A structurally conserved framework of 191 core residues has been defined from a comparison of the four known three-dimensional structures. Eighteen of these core residues are highly conserved, nine of which are glycines. While the alpha-helix and beta-sheet secondary structure elements show considerable sequence homology, this is less so for peptide loops that connect the core secondary structure elements. These loops can vary in length by more than 150 residues. While the core three-dimensional structure is conserved, insertions and deletions are preferentially confined to surface loops. From the known three-dimensional structures various predictions are made for the other subtilases concerning essential conserved residues, allowable amino acid substitutions, disulphide bonds, Ca²⁺-binding sites, substrate-binding site residues, ionic and aromatic interactions, proteolytically susceptible surface loops, etc. These predictions form a basis for protein engineering of members of the subtilase family, for which no three-dimensional structure is known.

19.
We have carried out a systematic analysis in order to evaluate whether Intra-Chain Disulfide Bridged Peptides (ICDBPs) observed in proteins of known three-dimensional structure adopt structurally similar conformations as they may correspond to structural/functional motifs. 406 representative ICDBPs comprising between 3 and 17 amino acid residues could be classified according to peptide sequence length and main-chain secondary structure conformation into 146 classes. ICDBPs comprising 6 amino acid residues are maximally represented in the Protein Data Bank. They also represent the maximum number of main-chain secondary structure conformational classes. Individual ICDBPs in each class represent different protein superfamilies and correspond to different amino acid sequences. We identified 145 ICDBP pairs that had not less-than 0.5 Å root mean square deviation value corresponding to their equivalent peptide backbone atoms. We believe these ICDBPs represent structural motifs and possible candidates in order to further explore their structure/function role in the corresponding proteins. The common conformational classes observed for ICDBPs, defined according to the main-chain secondary structure conformations H (helix), B (residue in an isolated beta bridge), C (coil), E (extended beta strand), G (3₁₀ helix), I (pi helix), S (bend), T (hydrogen-bonded turn), were: "CHHH", "CTTC", "CSSS" and "CSSC" (for ICDBP length 4), "CSSCC" (length 5), "EETTEE", "CCSSCC", "CCSSSC" (length 6), "EETTTEE" (length 7), "EETTTTEE" (length 8), "EEEETTEEEE" (length 10), "EEEETTTEEEE" (length 11) and "EEEETTTTEEEE" (length 12).
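Classifying peptides by sequence length and main-chain secondary structure string, as described above, amounts to grouping on a (length, string) key. A minimal sketch follows, assuming each peptide is supplied as a hypothetical identifier paired with its per-residue secondary structure string (for example "CTTC"); how the authors assigned the strings or handled near-identical conformations is not stated in the abstract.

```python
from collections import defaultdict


def classify_by_conformation(peptides):
    """Group peptides into classes keyed by (length, secondary structure string).

    peptides -- iterable of (identifier, ss_string) pairs; the identifiers
                are placeholders chosen for illustration
    Returns a dict mapping (length, ss_string) to the list of identifiers.
    """
    classes = defaultdict(list)
    for identifier, ss in peptides:
        classes[(len(ss), ss)].append(identifier)
    return dict(classes)
```

Counting the members of each class then shows which conformational strings are most populated at a given peptide length.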

20.
The SH3 domain, comprised of approximately 60 residues, is found within a wide variety of proteins, and is a mediator of protein-protein interactions. Due to the large number of SH3 domain sequences and structures in the databases, this domain provides one of the best available systems for the examination of sequence and structural conservation within a protein family. In this study, a large and diverse alignment of SH3 domain sequences was constructed, and the pattern of conservation within this alignment was compared to conserved structural features, as deduced from analysis of eighteen different SH3 domain structures. Seventeen SH3 domain structures solved in the presence of bound peptide were also examined to identify positions that are consistently most important in mediating the peptide-binding function of this domain. Although residues at the two most conserved positions in the alignment are directly involved in peptide binding, residues at most other conserved positions play structural roles, such as stabilizing turns or comprising the hydrophobic core. Surprisingly, several highly conserved side-chain to main-chain hydrogen bonds were observed in the functionally crucial RT-Src loop between residues with little direct involvement in peptide binding. These hydrogen bonds may be important for maintaining this region in the precise conformation necessary for specific peptide recognition. In addition, a previously unrecognized yet highly conserved beta-bulge was identified in the second beta-strand of the domain, which appears to provide a necessary kink in this strand, allowing it to hydrogen bond to both sheets comprising the fold.
