Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Li W  Liang S  Wang R  Lai L  Han Y 《Protein engineering》1999,12(12):1075-1086
Loops are structurally variable regions, but the secondary structural elements bracing loops are often conserved. Motifs with similar secondary structures exist in the same and different protein families. In this study, we performed an analysis of the entire PDB and produced 495 motif families accessible from the Internet. Every motif family contains some variable loops spanning a common framework (a pair of secondary structures). The diversity of loops and the convergence of frameworks were examined. In addition, we identified 119 loops with conformational changes in different PDB files. These materials can give some directions for functional loop design and flexible docking.

2.
The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

3.
Most of the hairpin, internal and junction loops that appear single-stranded in standard RNA secondary structures form recurrent 3D motifs, where non-Watson–Crick base pairs play a central role. Non-Watson–Crick base pairs also play crucial roles in tertiary contacts in structured RNA molecules. We previously classified RNA base pairs geometrically so as to group together those base pairs that are structurally similar (isosteric) and therefore able to substitute for each other by mutation without disrupting the 3D structure. Here, we introduce a quantitative measure of base pair isostericity, the IsoDiscrepancy Index (IDI), to more accurately determine which base pair substitutions can potentially occur in conserved motifs. We extract and classify base pairs from a reduced-redundancy set of RNA 3D structures from the Protein Data Bank (PDB) and calculate centroids (exemplars) for each base combination and geometric base pair type (family). We use the exemplars and IDI values to update our online Basepair Catalog and the Isostericity Matrices (IM) for each base pair family. From the database of base pairs observed in 3D structures we derive base pair occurrence frequencies for each of the 12 geometric base pair families. In order to improve the statistics from the 3D structures, we also derive base pair occurrence frequencies from rRNA sequence alignments.

4.
The tRNA anticodon loop always comprises seven nucleotides and is involved in many recognition processes with proteins and RNA fragments. We have investigated the nature and the possible interactions between the first (32) and last (38) residues of the loop on the basis of the available sequences and crystal structures. The data demonstrate the conservation of a bifurcated hydrogen bond interaction between residues 32 and 38, located at the stem/loop junction. This interaction leads to the formation of a non-canonical base-pair which is preserved in the known crystal structures of tRNA/synthetase complexes. Among the tRNA and tDNA sequences, 93 % of the 32.38 oppositions can be assigned to two families of isosteric base-pairs, one with a large (86 %) and the other with a much smaller (7 %) population. The remainder (7 %) of the oppositions have been assigned to a third family due to the lack of evidence for assigning them to the first two sets. In all families, the Y32.R38 base-pairs are not isosteric upon reversal (like the sheared G.A or wobble G.U pairs), explaining the strong conservation of a pyrimidine at position 32. Thus, the 32.38 interaction extends the sequence signature of the anticodon loop beyond the conserved U-turn at position 33 and the usually modified purine at position 37. A comparison with other loops containing both a singly hydrogen-bonded base-pair and a U-turn suggests that the 32.38 pair could be involved in the formation of a base triple with a residue in a ribosomal RNA component. It is also observed that two crystal structures of ribozymes (hammerhead and leadzyme) present similar base-pairs at the cleavage site.

5.
A thermodynamic study of unusually stable RNA and DNA hairpins.   Cited by: 11 (self-citations: 0; by others: 11)
V P Antao  S Y Lai  I Tinoco Jr 《Nucleic acids research》1991,19(21):5901-5905
About 70% of the RNA tetra-loop sequences identified in ribosomal RNAs from different organisms fall into either (UNCG) or (GNRA) families (where N = A, C, G, or U; and R = A or G). RNA hairpins with these loop sequences form unusually stable tetra-loop structures. We have studied the RNA hairpin GGAC(UUCG)GUCC and several sequence variants to determine the effect of changing the loop sequence and the loop-closing base pair on the thermodynamic stability of (UNCG) tetra-loops. The hairpin GGAG(CUUG)CUCC with the conserved loop G(CUUG)C was also unusually stable. We have determined melting temperatures (Tm), and obtained thermodynamic parameters for DNA hairpins with sequences analogous to stable RNA hairpins with (UNCG), C(GNRA)G, C(GAUA)G, and G(CUUG)C loops. DNA hairpins with (TTCG), (dUdUCG), and related sequences in the loop, unlike their RNA counterparts, did not form unusually stable hairpins. However, DNA hairpins with the consensus loop sequence C(GNRA)G were very stable compared to hairpins with C(TTTT)G or C(AAAA)G loops. The C(GATA)G and G(CTTG)C loops were also extra stable. The relative stabilities of the unusually stable DNA hairpins are similar to those observed for their RNA analogs.
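
The family notation in this abstract, (UNCG) and (GNRA) with N = A, C, G, or U and R = A or G, can be read as a small pattern language. The following is a minimal Python sketch of that reading, not the authors' method; the names `IUPAC`, `family_regex`, `FAMILIES` and `classify_tetraloop` are hypothetical and introduced only for illustration.

```python
import re

# Degenerate codes as defined in the abstract: N = A, C, G or U; R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "[ACGU]", "R": "[AG]"}


def family_regex(pattern):
    """Compile a loop pattern such as 'GNRA' into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))


# The two tetraloop families named in the abstract (illustrative table).
FAMILIES = {name: family_regex(name) for name in ("UNCG", "GNRA")}


def classify_tetraloop(loop):
    """Return the matching family name for a 4-nt RNA loop, or None."""
    loop = loop.upper()
    if len(loop) != 4:
        raise ValueError("expected a four-nucleotide loop")
    for name, rx in FAMILIES.items():
        if rx.fullmatch(loop):
            return name
    return None


print(classify_tetraloop("UUCG"))  # loop of the hairpin GGAC(UUCG)GUCC
print(classify_tetraloop("GAAA"))
print(classify_tetraloop("CUUG"))  # reported as stable, but in neither family
```

Loops such as CUUG and GAUA fall outside both patterns, which is consistent with the abstract listing them separately from the two families.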

6.
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence–structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.

7.
A homology-based 3D structural model of the immunodominant major surface antigen OmpC from Salmonella typhi, an obligate human pathogen, was built to understand the possible unique conformational features of its antigenic loops with respect to other immunologically cross-reacting porins. The homology model was built based on the known crystal structures of the E. coli porins OmpF and PhoE. Structure-based sequence alignment helped to define the structurally conserved regions (SCRs). The SCRs of OmpC were modelled using the coordinates of corresponding regions from reference proteins. Surface-exposed variable regions were modelled based on the sequence similarity and loop search in PDB. Structural refinement based on symmetry-restrained energy minimization resulted in an agreeable model for the trimer of OmpC. The resulting model was compared with other porin structures, which have a β-barrel fold with 16 transmembrane beta-strands, and the variable regions were found to be unique in terms of sequence and structure. A ranking of the loops taking into account the antigenic index, the sequence variability, the surface accessibility in the context of the trimer, and the structural variability suggests that loop 4 (151-172), loop 5 (194-218) and loop 6 (237-264) are the best ranked B-cell epitopes. The model provides possible explanations for the functional and unique immunological properties associated with the surface-exposed regions and outlines the implications for structure-based experimental design.

8.
Raval S  Gowda SB  Singh DD  Chandra NR 《Glycobiology》2004,14(12):1247-1263
Lectins are known to be important for many biological processes, due to their ability to recognize cell surface carbohydrates with high specificity. Plant lectins have been model systems to study protein-carbohydrate recognition, because individually they exhibit high sensitivity and as a group large diversity in recognizing carbohydrate structures. Although extensive studies have been carried out for legume lectins that have led to interesting insights into the sequence determinants of sugar recognition in them, frameworks with such specific correlations are not available for other plant lectin families. This study reports a large-scale data acquisition and extensive analysis of sequences and structures of beta-prism-I or jacalin-related lectins (JRLs) and shows that hypervariability in the binding site loops generates carbohydrate recognition diversity, a strategy analogous to that in legume lectins. Analyses of the size, conformation, and sequence variability in key regions reveal the existence of a common theme, encoded as a set of structural features over a common scaffold, in defining specificity. This study also points to the remarkable range of domain architectures, often arising out of gene duplication events in lectins of this family. The data analyzed here also indicate a spectacular variety of quaternary associations possible in this family of lectins that have implications for glycan recognition. These results thus provide sequence-structure-function correlations, an understanding of the molecular basis of carbohydrate recognition by beta-prism-I lectins, and also a rationale for engineering specific recognition capabilities in relevant molecules.

9.
10.
Dengler U  Siddiqui AS  Barton GJ 《Proteins》2001,42(3):332-344
The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. 
A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

11.
Understanding the conformational propensities of proteins is key to solving many problems in structural biology and biophysics. The co‐variation of pairs of mutations contained in multiple sequence alignments of protein families can be used to build a Potts Hamiltonian model of the sequence patterns which accurately predicts structural contacts. This observation paves the way to develop deeper connections between evolutionary fitness landscapes of entire protein families and the corresponding free energy landscapes which determine the conformational propensities of individual proteins. Using statistical energies determined from the Potts model and an alignment of 2896 PDB structures, we predict the propensity for particular kinase family proteins to assume a “DFG‐out” conformation implicated in the susceptibility of some kinases to type‐II inhibitors, and validate the predictions by comparison with the observed structural propensities of the corresponding proteins and experimental binding affinity data. We decompose the statistical energies to investigate which interactions contribute the most to the conformational preference for particular sequences and the corresponding proteins. We find that interactions involving the activation loop and the C‐helix and HRD motif are primarily responsible for stabilizing the DFG‐in state. This work illustrates how structural free energy landscapes and fitness landscapes of proteins can be used in an integrated way, and in the context of kinase family proteins, can potentially impact therapeutic design strategies.

12.
Hairpin loops are among the most important structural motifs in folded nucleic acids. The d(GNA) sequence in DNA can form very stable trinucleotide hairpin loops, although their stability depends strongly on the closing base pair. Replica-exchange molecular dynamics (REMD) simulations were employed to study hairpin folding of two DNA sequences, d(gcGCAgc) and d(cgGCAcg), with the same central loop motif but different closing base pairs, starting from single-stranded structures. In both cases, conformations of the most populated conformational cluster at the lowest temperature showed close agreement with available experimental structures. For the loop sequence with the less stable G:C closing base pair, an alternative loop topology accumulated as the second most populated conformational state, indicating a possible loop structural heterogeneity. Comparative free energy simulations on induced loop unfolding indicated higher stability of the loop with a C:G closing base pair by ~3 kcal mol⁻¹ (compared to a G:C closing base pair), in very good agreement with experiment. The comparative energetic analysis of sampled unfolded, intermediate and folded conformational states identified electrostatic and packing interactions as the main contributions to the closing base pair dependence of the d(GCA) loop stability.
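
To give a feel for the size of the ~3 kcal/mol stability difference quoted above, the sketch below converts a free energy difference into a ratio of folding equilibrium constants using the standard relation ratio = exp(ΔΔG / RT). This is an illustration only: the temperature of 298.15 K and the simple two-state picture are assumptions not stated in the abstract, and `stability_ratio` is a hypothetical helper name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def stability_ratio(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of folding equilibrium constants implied by a stability difference.

    Assumes a two-state (folded/unfolded) equilibrium; the default
    temperature is an assumption, not a value taken from the abstract.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


ratio = stability_ratio(3.0)
print(f"about {ratio:.0f}-fold difference in folding equilibrium constant")
```

Under these assumptions a 3 kcal/mol difference corresponds to a ratio on the order of a hundred-fold, which is why a change of closing base pair alone can matter so much for such a short hairpin.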

13.
RNA folding is assumed to be a hierarchical process. The secondary structure of an RNA molecule, signified by base-pairing and stacking interactions between the paired bases, is formed first. Subsequently, the RNA molecule adopts an energetically favorable three-dimensional conformation in the structural space determined mainly by the rotational degrees of freedom associated with the backbone of regions of unpaired nucleotides (loops). To what extent the backbone conformation of RNA loops also results from interactions within the local sequence context or rather follows global optimization constraints alone has not been addressed yet. Because the majority of base stacking interactions are exerted locally, a critical influence of local sequence on local structure appears plausible. Thus, local loop structure ought to be predictable, at least in part, from the local sequence context alone. To test this hypothesis, we used Random Forests on a nonredundant data set of unpaired nucleotides extracted from 97 X-ray structures from the Protein Data Bank (PDB) to predict discrete backbone angle conformations given by the discretized η/θ-pseudo-torsional space. Predictions on balanced sets with four to six conformational classes using local sequence information yielded average accuracies of up to 55%, thus significantly better than expected by chance (17%-25%). Bases close to the central nucleotide appear to be most tightly linked to its conformation. Our results suggest that RNA loop structure does not only depend on long-range base-pairing interactions; instead, it appears that local sequence context exerts a significant influence on the formation of the local loop structure.
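
The chance range of 17%-25% quoted above follows directly from the number of balanced classes: random guessing among k equally sized classes is right 1/k of the time. A minimal Python sketch of that arithmetic, with `chance_accuracy` as a hypothetical helper name:

```python
def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing on a balanced n-class set."""
    if n_classes < 1:
        raise ValueError("need at least one class")
    return 1.0 / n_classes


# Four to six conformational classes, as in the abstract.
for k in (4, 5, 6):
    print(f"{k} classes: chance accuracy {chance_accuracy(k):.1%}")
```

Six classes give roughly 17% and four classes give 25%, matching the bounds in the abstract. The abstract does not say which class count produced the 55% figure, so no per-setting improvement is computed here.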

14.
A Tramontano  A M Lesk 《Proteins》1992,13(3):231-245
Using database screening techniques we have examined the relationship between antigen-binding loops in immunoglobulins, and regions of similar conformation in other protein families. The conformations of most antigen-binding loops are not unique to immunoglobulins. But in many cases, the geometrical relationship between the loop and the peptides flanking it differs between the immunoglobulins and other structures with the same loop. We assess model building by database screening, compared with that based on canonical structures.

15.
Kosloff M  Kolodny R 《Proteins》2008,71(2):891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

16.
In this work we examine how protein structural changes are coupled with sequence variation in the course of evolution of a family of homologs. The sequence-structure correlation analysis performed on 81 homologous protein families shows that the majority of them exhibit statistically significant linear correlation between the measures of sequence and structural similarity. We observed, however, that there are cases where structural variability cannot be mainly explained by sequence variation, such as protein families with a number of disulfide bonds. To understand whether structures from different families and/or folds evolve in the same manner, we compared the degrees of structural change per unit of sequence change ("the evolutionary plasticity of structure") between those families with a significant linear correlation. Using rigorous statistical procedures we find that, with a few exceptions, evolutionary plasticity does not show a statistically significant difference between protein families. Similar sequence-structure analysis performed for protein loop regions shows that evolutionary plasticity of loop regions is greater than for the protein core.

17.
Recent work has suggested that proteins in early evolution went through a stage of closed loop elements with a typical contour size of 25-35 residues. These closed loops are still the elementary protein units to this day, and can be used to spell out protein sequence/structure relationships through a relatively small number of protein prototypes. In this study we aimed to identify the sequences that are used to lock the loop ends to one another, and to show how an extensive dictionary of such locking pairs can be created using positional correlation data from a large proteome database, and structural data from PDB databases. Such a dictionary can be used in reconstructing the evolutionary pathway that modern proteins have followed, and in identifying closed loop elements in modern proteins with as yet unknown 3D structure.

18.
The HSSP database of protein structure-sequence alignments.   Cited by: 2 (self-citations: 0; by others: 2)
HSSP is a derived database merging structural three-dimensional (3-D) and sequence one-dimensional (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 27% of all Swissprot-stored sequences.

19.
The HSSP database of protein structure-sequence alignments.   Cited by: 4 (self-citations: 0; by others: 4)
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.

20.
Huntley MA  Golding GB 《Proteins》2002,48(1):134-140
Simple sequences are abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequences, are not present at the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly fewer highly repetitive simple sequences than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain a highly repetitive simple sequence are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of a simple sequence. This lack of simple sequences both in the PDB database and in the structural information suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
