Similar documents
20 similar documents found
1.
It is observed that during the divergent evolution of two proteins with a common phylogenetic origin, the structural similarity of their backbones is often preserved even when their sequence similarity decreases to a virtually undetectable level. Here we analyzed whether the conservation of structure during evolution also extends to the local atomic structures at the interfaces between secondary structural elements. As a case study we used one protein family, the proteasomal subunits, for which 17 crystal structures are known: 14 different subunits of Saccharomyces cerevisiae, 2 subunits of Thermoplasma acidophilum, and one subunit of Escherichia coli. The structural core of the 17 proteasomal subunits has 23 secondary structural elements. Any two adjacent secondary structural elements form a molecular interface consisting of two molecular patches. We found 61 interfaces that occur in all 17 subunits. The 3D shapes of equivalent molecular patches from different proteasomal subunits were compared by superposition. Our results demonstrate that pairs of equivalent molecular patches show a lower RMSD than randomly chosen patches from unrelated proteins. This holds even when comparisons of patches with identical residues are excluded from the analysis. Furthermore, it is known that sequence dissimilarity correlates with the RMSD between the backbones of members of a protein family. The question arises whether this is also true for local atomic structures. The results show that the correlation between individual patch RMSD values and local sequence dissimilarities is low, ranging from 0 to 0.41. Surprisingly, however, there is a good correlation between the average RMSD of all corresponding patches and the global sequence dissimilarity. This average patch RMSD correlates slightly more strongly with the global sequence dissimilarity than the Cα-trace RMSD does.
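A minimal sketch of the two quantities compared above, assuming the patches have already been superposed and their atoms paired one-to-one (the superposition itself, e.g. by the Kabsch algorithm, is omitted) and assuming Pearson's coefficient as the correlation measure, which the abstract does not specify. The function names are hypothetical.

```python
import math

def patch_rmsd(patch_a, patch_b):
    """RMSD between two equivalent patches given as lists of (x, y, z) tuples.
    Assumes atoms are already paired and the patches already superposed."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and equally sized")
    sq = sum((pa - pb) ** 2
             for a, b in zip(patch_a, patch_b)
             for pa, pb in zip(a, b))
    return math.sqrt(sq / len(patch_a))

def average_patch_rmsd(patch_pairs):
    """Mean RMSD over all corresponding patch pairs of two subunits."""
    return sum(patch_rmsd(a, b) for a, b in patch_pairs) / len(patch_pairs)

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure) between two series,
    e.g. average patch RMSD vs. global sequence dissimilarity per subunit pair."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, one would compute `average_patch_rmsd` for every pair of subunits and correlate those values with the corresponding global sequence dissimilarities.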

2.
The microtubule-associated protein tau is believed to be a natively unfolded molecule with virtually no secondary structure. However, this protein self-associates into filamentous forms in various neurodegenerative diseases. Because these filaments show a remarkable degree of higher-order organization, with regular widths and periodicity, it is widely speculated that tau does contain secondary structures that come together to form tertiary and quaternary structures in the filamentous form. The purpose of this review is to use the primary sequence of tau, together with predictive methods, to identify potential secondary structural elements that could be involved in its normal and pathological functions. Although few structural elements are predicted in the tau molecule, these analyses should lead to a better understanding of the structure/function relationships that regulate the behavior of tau.

3.
Li W, Liu Z, Lai L. Biopolymers, 1999, 49(6): 481-495
A general problem in comparative modeling and protein design is the conformational evaluation of loops with a given sequence in specific protein frameworks. Loops of different sequences and structures on similar scaffolds are common in the Protein Data Bank (PDB). To explore both their structural and sequence diversity, a database of loops connecting similar secondary structure fragments was constructed by searching the PDB and a database of families of structurally similar proteins. A total of 84 loop families with 2-13 residues were found among well-determined structures with a resolution better than 2.5 Å: 8 alpha-alpha, 20 alpha-beta, 19 beta-alpha, and 37 beta-beta families. Every family contains more than 5 loop motifs. Within each family, no two loops share the same sequence, and all the frameworks superimpose well. Forty-three new loop classes are distinguished in the database. The structural variability of loops in homologous proteins is examined and shown for 44 families. Motif families are characterized by geometric parameters and sequence patterns. The loop conformations in each family are clustered into subfamilies using average-linkage cluster analysis. The database provides information such as geometric properties, sequence profiles, sequence and structural variability within loops, structural alignment parameters, sequence similarities, and clustering results. Correlations between loop conformation and loop sequence, motif sequence, and the global sequence of the PDB chain are examined to determine how loop structures depend on their sequences and how they are affected by the local and global environment. Strong correlations (R > 0.75) are found in only 24 families. The best R value is 0.98. The database is available through the Internet.
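The average-linkage clustering step can be sketched as follows. This is a generic textbook version, not the authors' code: it assumes a precomputed symmetric matrix of pairwise loop distances (for example loop RMSDs) and a hypothetical stopping `cutoff`; the abstract does not state which distance or threshold was used.

```python
def average_linkage(dist, cutoff):
    """Agglomerative average-linkage clustering.

    dist: symmetric matrix of pairwise distances between loops.
    cutoff: hypothetical threshold; merging stops once the two closest
    clusters are farther apart than this value.
    Returns a list of clusters, each a sorted list of loop indices."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster member pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break
        # y > x, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

The naive pairwise search is cubic in the number of loops, which is adequate for families of a few dozen members.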

4.
Protein folding involves the formation of secondary structural elements from the primary sequence and their association into tertiary assemblies. How this primary sequence relates to a specific folded protein structure remains a central question in structural biology. An increasing body of evidence suggests that variations in homologous sequences, ranging from point mutations to substantial insertions or deletions, can yield stable proteins with markedly different folds. Here we report the structural characterization of domain IV (D4) and ΔD4 (polypeptides of 222 and 160 amino acids, respectively), which differ by an N-terminal deletion of 62 amino acids (28% of the overall D4 sequence). The high-resolution crystal structures of the monomeric D4 and the dimeric ΔD4 reveal substantially different folds despite an overall conservation of secondary structure. These structures show that the formation of tertiary structure, even in extended polypeptide sequences, can be highly context dependent, and they serve as a model for structural plasticity in protein isoforms.

5.
We developed a dynamic programming approach for computing common sequence-structure patterns between two RNAs, given their primary sequences and their secondary structures. Common patterns between two RNAs are defined as regions that share the same local sequential and structural properties. Locality is based on the connections between nucleotides given by their phosphodiester and hydrogen bonds. Interpreting secondary structures as chains of structure elements led us to an efficient dynamic programming approach that runs in O(nm) time and O(nm) space, where n and m are the lengths of the two RNAs. The biological motivation is to detect common local regions of RNAs that do not necessarily share global sequential and structural properties. This can happen when RNAs fold into different structures but share many stable local regions. Here, we illustrate our algorithm on hepatitis C virus internal ribosome entry sites. Our method is also useful for detecting and describing local motifs. An implementation in C++ is available on request from the authors.
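To illustrate the O(nm) time and space bound, here is a deliberately simplified dynamic program, not the authors' algorithm: it finds the longest contiguous region on which two RNAs agree in both nucleotide and dot-bracket symbol. Unlike the method described above, it does not check that base-paired partners also correspond, so it only approximates locality along the backbone. The dot-bracket input format and function name are assumptions.

```python
def common_local_pattern(seq1, db1, seq2, db2):
    """Longest run where two RNAs match in nucleotide AND structure symbol.

    seq1/seq2: nucleotide strings; db1/db2: dot-bracket strings of equal length.
    Returns (length, start_in_rna1, start_in_rna2). O(nm) time and space."""
    n, m = len(seq1), len(seq2)
    # table[i][j] = length of the matching run ending at seq1[i-1], seq2[j-1]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    best, end1, end2 = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq1[i - 1] == seq2[j - 1] and db1[i - 1] == db2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                if table[i][j] > best:
                    best, end1, end2 = table[i][j], i, j
    return best, end1 - best, end2 - best
```

For example, a hairpin `GGGAAACCC` / `(((...)))` embedded in a longer RNA is recovered in full, with its start offset in each molecule.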

6.
Proteins that contain similar structural elements often have analogous functions regardless of their degree of sequence similarity or the connectivity of their structural elements in space. In general, protein structure comparison (PSC) provides biologists with a straightforward way to determine critical aspects of structure and function. Here, we developed a novel PSC technique based on angle-distance image (A-D image) transformation and matching, which is independent of sequence similarity and of the connectivity of secondary structure elements (SSEs). An A-D image is constructed from protein secondary structure information. According to the types of SSEs involved, the mutual SSE pairs of the query protein are classified into three different types of sub-images. Corresponding sub-images of the query and target protein structures are then compared using modified cross-correlation approaches to identify the similarity of various patterns. Structural relationships among proteins are displayed as hierarchical clustering trees, which facilitate the establishment of evolutionary relationships between the structure and function of various proteins. Four standard test datasets and one newly created dataset were used to evaluate the proposed method. The results demonstrate that proteins from these five datasets can be categorized in accordance with the spatial distribution of their SSEs. Moreover, for proteins with low sequence identity but high structural similarity, the proposed algorithm is an efficient and effective method for structural comparison.
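A rough sketch of the angle-distance idea, under stated assumptions: each SSE is reduced to an axis from a start point to an end point, every SSE pair contributes one count to a 2D histogram of (inter-axis angle, midpoint distance), and two images are compared by plain zero-lag normalized cross-correlation. The three SSE-type sub-images and the authors' "modified" cross-correlation are not reproduced; bin counts and `max_dist` are hypothetical.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def angle_distance(sse_a, sse_b):
    """Angle (degrees) between two SSE axes and distance between their midpoints.
    Each SSE is (start_xyz, end_xyz)."""
    va, vb = _sub(sse_a[1], sse_a[0]), _sub(sse_b[1], sse_b[0])
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    mid_a = [(s + e) / 2 for s, e in zip(*sse_a)]
    mid_b = [(s + e) / 2 for s, e in zip(*sse_b)]
    return angle, math.dist(mid_a, mid_b)

def ad_image(sses, angle_bins=6, dist_bins=6, max_dist=30.0):
    """2D histogram (angle bin x distance bin) over all SSE pairs."""
    image = [[0] * dist_bins for _ in range(angle_bins)]
    for i in range(len(sses)):
        for j in range(i + 1, len(sses)):
            angle, dist = angle_distance(sses[i], sses[j])
            a = min(int(angle / 180.0 * angle_bins), angle_bins - 1)
            d = min(int(dist / max_dist * dist_bins), dist_bins - 1)
            image[a][d] += 1
    return image

def ncc(img1, img2):
    """Zero-lag normalized cross-correlation of two equally sized images."""
    x = [v for row in img1 for v in row]
    y = [v for row in img2 for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0
```

Because the image depends only on angles and distances between SSEs, it is unchanged by rigid translation of the structure and ignores the order in which SSEs occur along the chain.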

7.
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models with discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class with the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the secondary structural elements defined in the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the entire amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein with a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
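The filtering and smoothing steps correspond to the standard forward and backward passes of a discrete hidden Markov model. The sketch below is the generic scaled forward-backward algorithm, not the authors' structural class models: model classification picks the largest log-likelihood from the forward pass, and per-residue posteriors come from combining both passes. Any state names, alphabets and probabilities passed in (e.g. a two-state helix/strand toy model over a hydrophobic/polar alphabet) are hypothetical.

```python
import math

def forward_backward(obs, states, start, trans, emit):
    """Scaled forward (filtering) and backward passes for a discrete HMM.

    start[s], trans[s][t], emit[s][symbol] are probabilities.
    Returns (log-likelihood of obs, list of per-position posterior dicts)."""
    alpha, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = {s: start[s] * emit[s][o] for s in states}
        else:
            row = {s: emit[s][o] * sum(alpha[-1][r] * trans[r][s] for r in states)
                   for s in states}
        c = sum(row.values())          # scaling factor keeps numbers in range
        scales.append(c)
        alpha.append({s: v / c for s, v in row.items()})
    beta = [None] * len(obs)
    beta[-1] = {s: 1.0 for s in states}
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = {
            s: sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in states)
            / scales[t + 1]
            for s in states
        }
    posteriors = []
    for a, b in zip(alpha, beta):      # smoothing: combine both passes
        z = sum(a[s] * b[s] for s in states)
        posteriors.append({s: a[s] * b[s] / z for s in states})
    return sum(math.log(c) for c in scales), posteriors

def classify(obs, models):
    """models: name -> (states, start, trans, emit). Picks the most likely model."""
    return max(models, key=lambda name: forward_backward(obs, *models[name])[0])
```

The product of the scaling factors equals the sequence likelihood, so summing their logarithms gives the log-likelihood without numerical underflow on long sequences.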

8.
One of the most difficult problems in studying multi-state protein folding is determining the sequence in which the protein's intermediate states form. An even more complex issue is determining at which stages of folding the various parts of the protein (its secondary structure elements) develop. The structure and properties of the different intermediate states depend in particular on these parts. In this study we used an experimental approach, named μ-analysis, that reveals the order in which structural elements form during the folding of a multi-state protein. In this approach, the same secondary structure elements of the protein are “tested” both by substitutions of single hydrophobic amino acids and by the introduction of cysteine (disulfide) bridges. Single substitutions of hydrophobic amino acids provide information on the late stages of protein folding, whereas the introduction of S-S bridges provides data on the initial stages. As a result of this μ-analysis, we have determined the order in which beta-hairpins form during the folding of green fluorescent protein.

9.
Detailed structural analysis of a protein requires investigation at the primary, secondary, and tertiary levels. Insight into protein secondary structure paves the way for understanding the types of secondary structural elements involved (α-helices, β-strands, etc.), the amino acid sequences that encode them, and the number of residues, length, and percentage composition of each element in the protein. Here we present a standalone tool, "ExSer", which automates the extraction of the amino acid sequences encoding the secondary structural regions of a protein from its Protein Data Bank (PDB) file. AVAILABILITY: ExSer is freely downloadable from http://code.google.com/p/tool-exser/
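A minimal sketch of this kind of extraction, not ExSer's own code: it assumes the fixed-column HELIX, SHEET and ATOM records of the legacy PDB format, takes residue identities from CA atoms, ignores insertion codes and alternate locations, and maps non-standard residues to `X`. The function names are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def residues_by_chain(lines):
    """Map chain ID -> {residue number: one-letter code}, taken from CA atoms."""
    chains = {}
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain, num = line[21], int(line[22:26])      # columns 22 and 23-26
            chains.setdefault(chain, {})[num] = THREE_TO_ONE.get(line[17:20], "X")
    return chains

def sse_sequences(lines):
    """Yield (kind, chain, start, end, sequence) for each HELIX/SHEET record."""
    chains = residues_by_chain(lines)
    for line in lines:
        if line.startswith("HELIX"):
            # initChainID col 20, initSeqNum cols 22-25, endSeqNum cols 34-37
            kind, chain, start, end = "helix", line[19], int(line[21:25]), int(line[33:37])
        elif line.startswith("SHEET"):
            # initChainID col 22, initSeqNum cols 23-26, endSeqNum cols 34-37
            kind, chain, start, end = "strand", line[21], int(line[22:26]), int(line[33:37])
        else:
            continue
        residues = chains.get(chain, {})
        seq = "".join(residues[n] for n in range(start, end + 1) if n in residues)
        yield kind, chain, start, end, seq
```

Usage: `list(sse_sequences(open(path).read().splitlines()))`, where `path` is any PDB file.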

10.
11.
12.
Structural alignment of proteins is widely used in various fields of structural biology. To further improve the quality of alignment, we describe an algorithm for structural alignment based on text modelling techniques. The technique first superimposes the secondary structure elements of two proteins and then models the 3D structure of each protein as a sequence of letters from an alphabet. These sequences are then used by a step-by-step sequence alignment procedure to align the two protein structures. A benchmark on a set of 200 non-homologous proteins was used to evaluate the program and compare it with state-of-the-art programs such as CE, SAL, TM-align and 3D-BLAST. On average, in all-against-all structure comparison the program achieves accuracy competitive with CE and TM-align, while running at a high speed comparable to that of 3D-BLAST.
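Once each structure has been rewritten as a string of letters, aligning the two strings is an ordinary sequence alignment problem. The sketch below uses textbook Needleman-Wunsch global alignment with linear gap penalties as a stand-in; the abstract does not specify the authors' alignment procedure or scores, so the letters and the `match`, `mismatch` and `gap` values are hypothetical, and the preceding SSE superposition step is not shown.

```python
def global_align(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch alignment of two structural-alphabet strings.
    Returns (score, aligned_a, aligned_b) with '-' marking gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligned letter pairs can then be mapped back to residue pairs to obtain the structural alignment.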

13.
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the only means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences, ranging from 40 to 200 residues, that cover the PDB at 30% sequence identity and adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation (RMSD) from native, the best of the top five ranked TASSER models has a global fold significantly close to the native structure for 25% (16%) of the sequences, and correctly identifies the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because a helical protein has fewer secondary structural elements to arrange than a beta protein of equal length, since the average helix is longer than the average strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands.
Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.
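The TM-score threshold above refers to the standard TM-score, TM = (1/L) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24·(L − 15)^(1/3) − 1.8, where L is the native length and d_i the deviation of the i-th aligned residue pair. The sketch below evaluates it for one given superposition only; the reference TM-score program additionally searches for the superposition that maximizes the score. The lower bound of 0.5 Å on d0 for short chains is included as a stated assumption, and the function names are hypothetical.

```python
def tm_d0(length):
    """Distance scale d0 in Å; clamped to 0.5 Å for short chains (assumption)."""
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, length):
    """TM-score for a fixed superposition.

    distances: Cα deviations (Å) of the aligned residue pairs.
    length: length of the native structure (normalization)."""
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length
```

Unaligned native residues contribute nothing to the sum but still count in the normalization, which is why partial models score lower.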

14.
Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates that are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundancy of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better meet the increased functional demands if they are structurally less constrained (i.e. intrinsically disordered). To test this idea, we used computational tools to describe the structural properties of the protein segments encoded by SCEs. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared with reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when multiple overlapping functions are accommodated in coding regions.
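The abstract does not name the sequence-complexity measure used. One common way to quantify compositional bias is the Shannon entropy of residue composition over a sliding window, sketched below as an assumption; the default window size of 12 is hypothetical. Low entropy means the window is dominated by a few residue types.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def window_entropies(sequence, window=12):
    """Entropy of every full-length sliding window (window size is hypothetical)."""
    return [shannon_entropy(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]
```

Comparing the distribution of these values between SCE-encoded segments and length-matched reference segments would be one way to test for enrichment in compositional bias.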

15.
MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could therefore significantly enhance the capacity of protein structure prediction methods. In this paper, the predictive capacity of a database (LSBSP2) that organizes the local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity to predict protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pairwise structural alignments of local structures with two consecutive secondary structure elements. One computational procedure uses the PSI-BLAST alignment program to predict local structures for test sequence fragments by matching them against the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that closely match their native local structures. The other computational procedure is a filter system that removes as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that the PSI-BLAST + filter system improves prediction specificity up to two-fold over that of PSI-BLAST alone for distantly related protein pairs. Tests with these two computational procedures demonstrate that local sequence-structure relationships can indeed enhance our capacity for protein structure prediction.
The results also indicate that local sequences encoding strong local structure propensities play an important role in determining the native-state folding topology.

16.
Disulfide bridges have an enormous impact on the structure of a large number of proteins and polypeptides. Understanding the structural basis that regulates their formation may be important for the design of novel peptide-based molecules with a specific fold and stability. Here we report a statistical analysis of the relationships between secondary structure and disulfide bond formation, carried out using a large database of protein structures. Our analyses confirm the observation, sporadically reported in previous investigations, that cysteine residues located in alpha-helices display a limited tendency to form disulfide bridges. The very low occurrence of disulfide bonds in all-alpha chains compared with all-beta chains indicates that this property is also evident when proteins with different topologies are investigated. Taking advantage of the large database, which made it possible to analyze relatively rare motifs, we demonstrate that cysteine residues embedded in 3(10) helices show a good tendency to form disulfide bonds. This result is somewhat surprising, since 3(10) helices are commonly grouped with alpha-helices. A plausible structural explanation for the observed data has been derived by combining analyses of disulfide bond sequence separation and of the length of the different secondary structure elements.
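A simple way to express the "tendency to form disulfide bonds" per secondary structure type is the fraction of cysteines in that type that are bonded, divided by the overall bonded fraction. The sketch below assumes a hypothetical input of one `(sse_type, is_bonded)` pair per cysteine; it is an illustrative propensity measure, not necessarily the statistic used in the study.

```python
from collections import defaultdict

def bonding_fraction(cysteines):
    """cysteines: iterable of (sse_type, is_bonded) pairs, one per cysteine.

    Returns {sse_type: (bonded_fraction, propensity)}, where propensity is the
    bonded fraction divided by the overall bonded fraction (>1 = above average)."""
    totals, bonded = defaultdict(int), defaultdict(int)
    for sse, is_bonded in cysteines:
        totals[sse] += 1
        bonded[sse] += bool(is_bonded)
    overall = sum(bonded.values()) / sum(totals.values())
    result = {}
    for sse in totals:
        frac = bonded[sse] / totals[sse]
        result[sse] = (frac, frac / overall if overall else float("nan"))
    return result
```

With a large database, even rare types such as 3(10) helices accumulate enough cysteines for such ratios to be meaningful.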

17.
Shestopalov BV. Tsitologiia, 2003, 45(7): 702-706
The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem yet to be solved. This paper presents the principles of the code theory of protein secondary structure and their consequence: the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The theory is based on the following principles:
1) the term secondary structure is assigned to the conformation stabilized only by nearest-neighbor (intraresidual) and middle-range (at a distance no greater than that between residues i and i + 5) interactions;
2) secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments;
3) alpha-helices, beta-strands, and coil segments are encoded by residue pairs (i, i + 4), (i, i + 2), and (i, i + 1), respectively, according to the numbers of residues per period: 3.6, 2, and 1;
4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons;
5) codons are divided into 21 types according to their strength, i.e. their encoding capability;
6) overlapping structurons of the same structure generate longer segments of that structure;
7) overlapping of structurons of different structures is forbidden, so codons must be selected, and this codon selection is hierarchical;
8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure.
Models based on the theory can be constructed in two ways: physically, using the physical properties of amino acid residues, or statistically, using the results of statistical analysis of a large body of structural data.
Some evident consequences of the theory are: a) the theory can be used to calculate secondary structure from the amino acid sequence as a partial solution to the problem of calculating protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used to simulate the next step of protein folding; b) the same secondary structures may fold into different tertiary structures and, conversely, different secondary structures may fold into the same tertiary structures, provided codon distributions are also considered; c) codons can be considered the first elements of a language of protein three-dimensional structure.
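Principle 3 can be made concrete by listing the residue doublets that act as codons for each structure type. The sketch below only enumerates the (i, i + 4), (i, i + 2) and (i, i + 1) pairs given in the abstract; the 21 codon strength types, the overlap rules and the hierarchical codon selection are not reproduced because the abstract does not specify them. The function name and dictionary keys are hypothetical.

```python
# Residue-pair offsets from principle 3: helix (i, i+4), strand (i, i+2), coil (i, i+1).
CODON_OFFSETS = {"alpha": 4, "beta": 2, "coil": 1}

def codons(sequence, structure):
    """List (position, residue_i, residue_j) doublets that are codons
    for the given structure type ("alpha", "beta" or "coil")."""
    k = CODON_OFFSETS[structure]
    return [(i, sequence[i], sequence[i + k]) for i in range(len(sequence) - k)]
```

A full model would then assign each doublet a strength class and resolve overlaps between structurons of different structures.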

18.
19.
Recent sequence analysis of complete prokaryotic proteomes suggests that at early evolutionary stages proteins were rather small, about 25-35 amino acids long. Corroborating evidence comes from protein crystal data, which indicate this size for closed loops, the universal structural units of globular proteins. In the latest development, we were able to derive and structurally characterize several sequence/structure prototypes that apparently represent early protein units. Structurally, the prototypes appear as closed loops stabilized by end-to-end van der Waals interactions. While nearly standard in size, the loops are highly diverse in terms of their secondary structure. Here we describe in detail the first representation of a protein as an assembly of descendants of these prototypes. The sequence and structure of the ATP-binding subunit of histidine permease of S. typhimurium are shown to contain several modified copies of different prototype elements (closed loops) and can thus be spelled as x-PI-x-PIV-PVI-PII-PVII-x, where PI-PVII are the prototype elements. This study sets out the basic principles for the sequence/structure prototype spelling of globular proteins.
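Candidate closed loops can be located from Cα coordinates by looking for residue pairs whose sequence separation falls in the 25-35 residue range mentioned above and whose Cα atoms are in close contact. This is a simplified sketch: the end-to-end contact cutoff `max_end_dist` (10 Å by default) is a hypothetical parameter, and the sketch does not model van der Waals contacts explicitly.

```python
import math

def closed_loop_candidates(ca_coords, min_len=25, max_len=35, max_end_dist=10.0):
    """Return (i, j) residue index pairs with min_len <= j - i <= max_len
    whose Cα atoms lie within max_end_dist Å (hypothetical cutoff)."""
    pairs = []
    last = len(ca_coords) - 1
    for i in range(len(ca_coords)):
        for j in range(i + min_len, min(i + max_len, last) + 1):
            if math.dist(ca_coords[i], ca_coords[j]) <= max_end_dist:
                pairs.append((i, j))
    return pairs
```

Overlapping candidates would still need to be resolved into a non-redundant set of loops before a protein could be "spelled" in terms of prototypes.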

20.
Two geometric parameters describing polypeptide structure, V (the dihedral angle between two sequential peptide-bond planes) and R (the radius of curvature), are used for the structural classification of polypeptide chains in proteins. The relation between these two parameters was the basis for defining the conformational sub-space of early-stage structural forms. Cluster analysis of V and ln R, applied both to selected proteins with well-defined secondary structure (according to the DSSP classification) and to proteins without any prior classification, revealed that several of the discriminated groups agree with the assumed model of the early-stage conformational sub-space. This analysis shows that protein structures may be represented in VR space instead of (Phi, Psi) angle space, thus lowering the dimensionality of the conformational space. The VR model allows classification of traditional secondary structure elements as well as different random-coil motifs, which broadens the range of recognized structural categories compared with the standard secondary structure elements.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417