Similar literature
 20 similar documents found (search time: 31 ms)
1.
A method is described for the prediction of probable folding pathways of globular proteins, based on the analysis of distance maps. It is applicable to proteins of unknown spatial structure but known amino acid sequence as well as to proteins of known structure. It is based on an objective procedure for the determination of the boundary of compact regions that contain high densities of interresidue contacts on the distance map of a globular protein. The procedure can be used both with contact maps derived from a known three-dimensional protein structure and with predicted contact maps computed by means of a statistical procedure from the amino acid sequence alone. The computed contact map can also be used to predict the location of compact short-range structures, viz. α-helices and β-turns, thereby complementing other statistical predictive procedures. The method provides an objective basis for the derivation of a theoretically predicted pathway of protein folding, proposed by us earlier [Tanaka and Scheraga (1977) Macromolecules 10, 291–304; Némethy and Scheraga (1979) Proc. Natl. Acad. Sci., U.S.A. 76, 6050–6054].

2.
A method is proposed for predicting the adjacency order in which strands pack in a β-sheet in a protein, on the basis of its amino acid sequence alone. The method is based on the construction of a predicted contact map for the protein, in which the probability that various residue pairs are close to each other is computed from statistically determined average distances of residue pairs in globular proteins of known structure. Compact regions, i.e., portions of the sequence with many interresidue contacts, are determined on the map by using an objective search procedure. The proximity of strands in a β-sheet is predicted from the density of contacts in compact regions associated with each pair of strands. The most probable β-sheet structures are those with the highest density of contacts. The method has been tested by computing the probable strand arrangements in a five-strand β-sheet in five proteins or protein domains, containing 62–138 residues. Of the theoretically possible 60 strand arrangements, the method selects two to eight arrangements as most probable; i.e., it leads to a large reduction in the number of possibilities. The native strand arrangement is among those predicted for three of the five proteins. For the other two, it would be included in the prediction by a slight relaxation of the cutoff criteria used to analyze the density of contacts.
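
The figure of 60 possible arrangements for a five-strand sheet follows from counting strand orders up to reversal (5!/2 = 60). The Python sketch below enumerates those orders and ranks them by contact density between neighbouring strands. It is only an illustration under stated assumptions: the function names, the scoring rule (sum of block densities for adjacent strands), and the binary contact map are hypothetical and are not the cutoff criteria used in the abstract above.

```python
from itertools import permutations


def strand_orders(n):
    """All left-to-right orders of n strands, counting an order and its reverse once."""
    orders = []
    for p in permutations(range(n)):
        # Keep one representative of each (order, reversed order) pair.
        if p[0] < p[-1]:
            orders.append(p)
    return orders


def contact_density(contact_map, a, b):
    """Fraction of residue pairs (i in strand a, j in strand b) marked as in contact.

    Strands are given as inclusive (first, last) residue index ranges.
    """
    (a0, a1), (b0, b1) = a, b
    pairs = [(i, j) for i in range(a0, a1 + 1) for j in range(b0, b1 + 1)]
    return sum(contact_map[i][j] for i, j in pairs) / len(pairs)


def score_order(order, strands, contact_map):
    """Assumed score: sum of contact densities over neighbouring strands in the order."""
    return sum(
        contact_density(contact_map, strands[s], strands[t])
        for s, t in zip(order, order[1:])
    )


def rank_orders(strands, contact_map):
    """Strand orders sorted from highest to lowest assumed score."""
    orders = strand_orders(len(strands))
    return sorted(orders, key=lambda o: score_order(o, strands, contact_map), reverse=True)
```

Calling `rank_orders` with a symmetric 0/1 contact map and a list of strand ranges returns the candidate orders best-first; a cutoff on the score would then select the small set of most probable arrangements.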

3.
Understanding the folding mechanism of a protein is one of the goals of bioinformatics. At present, it remains difficult to extract folding information from an amino acid sequence using standard bioinformatics techniques, or even experimental protocols, which can be time consuming. To overcome these problems, we aim to extract the initial folding unit for titin protein (Ig and fnIII domains) by means of inter-residue average distance statistics, namely the Average Distance Map (ADM) and contact frequency analysis (F-value). The TI I27 and TNfn3 domains are used to represent the Ig domain and the fnIII domain, respectively. Beta-strands 2, 3, 5, and 6 are significant for the initial folding processes of TI I27. The central strands of TNfn3 were predicted to be a primary folding segment. Domains of known and unknown 3D structure were investigated by structure-based and non-structure-based multiple sequence alignment, respectively, to identify the conserved hydrophobic residues and predicted compact regions relevant to evolution. Our results show good correspondence with experimental data, namely phi-values and protection factors from H-D exchange experiments. The significance of conserved hydrophobic residues near F-value peaks for structural stability through hydrophobic packing is confirmed. Our prediction methods could once again extract a folding mechanism from the amino acid sequence alone.

4.
It has been shown for 20 proteins that amino acid residues included in the experimentally determined protein folding nucleus are often involved in the theoretically determined amyloidogenic fragments. For 18 proteins, Φ-values, which indicate the extent of residue involvement in the folding nucleus, are on average higher for amino acid residues within amyloidogenic regions. Amyloidogenic fragments were predicted for 20 proteins by two methods chosen from four on the basis of a comparison of their predictions with amyloidogenic regions known from experimental data. Since theoretical folding nuclei are detected from the protein three-dimensional structure and amyloidogenic regions from the primary structure of the protein chain, the detected regularity makes it possible to predict folding nucleation sites on the basis of amino acid sequence.

5.
It is known that the backbone conformation of a protein can be reproduced with precision once a correct contact map (two-dimensional representation showing residue pairs in contact) is given as geometrical constraints. There is, however, no way to infer the correct contact map for a protein of unknown structure. We started with one-dimensional constraints using the quantity N14 (the number of neighboring residues within a radius of 14 Å). Since the plot of N14 along a chain shows a good correlation with the corresponding amino acid sequence, the N14 profile obtained from the X-ray structure is predictable from the sequence. Construction of backbone conformations under a given N14 profile was carried out in the following two steps: (1) a contact map was produced from the N14 profile by taking the product of the N14 values of every two residues; (2) backbone conformations were generated by applying the distance geometry technique to distance constraints given by the contact map. If present, disulfide bonds in a protein, as well as the secondary structure, were treated as additional constraints, and cases both with and without the additional information were examined. The method was tested for 11 proteins of known structure, and the results indicated that the reproduced conformation was fairly good, using an X-ray structure for comparison, for small proteins less than 80 residues long. The basic assumption and effectiveness of the present method were compared with those of previous studies employing the geometrical constraint approach. It has become clear that specific, one-dimensional information (e.g., the N14 profile) is more effective than nonspecific, two-dimensional constraints, such as average interresidue distances between particular types of amino acids. © 1993 Wiley-Liss, Inc.
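
Step (1) of the abstract above, going from an N14 profile to a contact map via pairwise products, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the choice to exclude the residue itself from its own count, the use of one representative point per residue, and the `threshold` that turns a product into a contact are all assumptions introduced here.

```python
import math


def n14_profile(coords, radius=14.0):
    """Count, for each residue, the other residues within `radius` angstroms.

    `coords` holds one (x, y, z) point per residue (for example C-alpha positions).
    Excluding the residue itself is an assumption of this sketch.
    """
    profile = []
    for i, ci in enumerate(coords):
        count = sum(
            1
            for j, cj in enumerate(coords)
            if j != i and math.dist(ci, cj) <= radius
        )
        profile.append(count)
    return profile


def product_contact_map(profile, threshold):
    """Mark residues i and j as in contact when N14[i] * N14[j] >= threshold.

    The threshold is a hypothetical parameter; the abstract only states that the
    map is derived from the product of the N14 values of every two residues.
    """
    n = len(profile)
    return [
        [1 if i != j and profile[i] * profile[j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]
```

The resulting 0/1 map would then serve as the distance constraints for step (2), the distance geometry calculation, which is not sketched here.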

6.
The primary structure of ovomucoid shows considerable sequence homology at three contiguous regions which form structural domains I, II and III. In order to see whether or not the three domains fold similarly and acquire a similar overall native conformation/shape, two fragments A and C were obtained by controlled peptic digestion of ovomucoid. The two fragments were investigated for their chemical composition, molecular weight, anti-tryptic activity, hydrodynamic behaviour, optical properties and acid denaturation. Results on molecular weight, amino acid composition and inhibitory activity show that the fragments A and C correspond respectively to domain I-II and domain III. Optical data suggested more exposure of tyrosine residues in the fragments than in the intact molecule. Domain III exists in a compact and globular conformation under native conditions whereas domain I-II and ovomucoid appear to possess an asymmetric conformation. Results on acid denaturation show that the process is thermodynamically reversible and that inter-domain separation probably precedes denaturation of domains during acidification of ovomucoid.

7.
Eicosapenta peptide repeats (EPRs) occur exclusively in flowering plant genomes and exhibit very high amino acid residue conservation across occurrences. DNA and amino acid sequence searches yielded no indications about their function due to the absence of similarity to known sequences. The tertiary structure of an EPR protein coded by rice (Oryza sativa japonica) cDNA (GI: 32984786) was determined by ab initio methodology in order to draw clues on the functional significance of EPRs. The resultant structure comprised seven α-helices and thirteen anti-parallel β-sheets. Surface-mapping of conserved residues onto the structure deduced that (i) regions equivalent to β α4- the primary function of EPR protein could be Ca2+ binding, and (iii) the putative EPR Ca2+ binding domain is structurally similar to calcium-binding domains of plant lectins. Additionally, the phylogenetic analysis showed an evolving taxa-specific distribution of EPR proteins observed in some GNA-like lectins.

8.
One of the goals of molecular bioinformatics is decoding amino acid sequences to extract information on the principles of protein folding. However, this is difficult to perform with standard bioinformatics techniques such as multiple sequence alignment. Thus, we propose a technique based on inter-residue average distance statistics to make predictions regarding the protein folding mechanisms of amino acid sequences. Our method involves constructing a kind of predicted contact map called an Average Distance Map (ADM) based on average distance statistics to pinpoint regions of possible folding nuclei for proteins. Only information on the amino acid sequence of a given protein is required for the present method. In this article, we summarize the results of studies using our method to analyze how specific protein sequences affect folding properties. In particular, we present studies on proteins in the phage lysozyme, such as the globin, fatty acid binding protein-like, and the cupredoxin-like fold families. In the present review, we characterize the 3D architectures of these proteins through the properties of the protein ADMs. Furthermore, we combine the information on the conserved residues within the regions predicted by the ADMs with our results obtained so far. Such information may help identify the folding characteristics of each protein. We discuss this possibility in the present review.

9.
The complete amino acid sequence of wheat germ agglutinin isolectin 2 has been determined by the method of sequential Edman degradation and with the aid of the three-dimensional structure known from X-ray crystallography. Peptides ranging from 2 to 18 residues in length were obtained by thermolysin digestion of the S-carboxymethylated protein and purified by gel filtration and high-performance liquid chromatography. The peptide order was established primarily by matching (carboxymethyl)cysteines with the clearly defined half-cystine positions in the X-ray structure, thereby satisfying the disulfide repeat pattern observed in all four isostructural domains (A, B, C, and D) of wheat germ agglutinin, and by examination of amino acid compositions and terminal sequences of ten tryptic peptides. The unique assignment of peptides to these domains was consistent with all invariant half-cystines and glycines, as well as the single tryptophan, the two closely spaced histidines, and a number of other residues clearly identified in the X-ray structure analysis. Discrepancies between the chemical and X-ray sequences lie exclusively in poorly defined regions of the electron density map, at the N- and C-termini, and at the first intercystine loop of each domain. The latter loop was found to be eight instead of six residues in length, thus extending the size of domains A, B, and C from 41 to 43 residues and that of domain D to 42 residues. Regions of extensive interdomain homology, in addition to that of the half-cystines, are clustered at the central portion of each domain fold and are likely to be important for the integrity of the three-dimensional structure of the dimer molecule.

10.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for proteins with a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those of the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length, with fully known secondary structural information, we end up predicting model structures far from their corresponding native structures.

12.
Structural and functional relations among thioredoxins of different species   Total citations: 24, self-citations: 0, citations by others: 24
Three-dimensional models have been constructed of homologous thioredoxins and protein disulfide isomerases based on the high resolution x-ray crystallographic structure of the oxidized form of Escherichia coli thioredoxin. The thioredoxins, from archaebacteria to humans, have 27-69% sequence identity to E. coli thioredoxin. The models indicate that all the proteins have similar three-dimensional structures despite the large variation in amino acid sequences. As expected, residues in the active site region of thioredoxins are highly conserved. These include Asp-26, Ala-29, Trp-31, Cys-32, Gly-33, Pro-34, Cys-35, Asp-61, Pro-76, and Gly-92. Similar residues occur in most protein disulfide isomerase sequences. Most of these residues form the surface around the active site that appears to facilitate interactions with other enzymes. Other structurally important residues are also conserved. A proline at position 40 causes a kink in the alpha-2 helix and thus provides the proper position of the active site residues at the amino end of this helix. Pro-76 is important in maintaining the native structure of the molecule. In addition, residues forming the internal contact surfaces between the secondary structural elements, such as Phe-12, Val-25, and Phe-27, are generally unchanged.

13.
The ras-oncogene-encoded p21 protein becomes oncogenic if amino acid substitutions occur at critical positions in the polypeptide chain. The most commonly found oncogenic forms contain Val in place of Gly 12 or Leu in place of Gln 61. To determine the effects of these substitutions on the three-dimensional structure of the whole p21 protein, we have performed molecular dynamics calculations on each of these three proteins bound to GDP and magnesium ion to compute the average structures of each of the three forms. Comparisons of the computed average structures show that both oncogenic forms, with Val 12 and Leu 61, differ substantially in structure from the wild type (containing Gly 12 and Gln 61) in discrete regions: residues 10–16, 32–47, 55–74, 85–89, 100–110, and 119–134. All of these regions occur in exposed loops, and several of them have already been found to be involved in the cellular functioning of the p21 protein. These regions have also previously been identified as the most flexible domains of the wild-type protein and have been found to be the same ones that differ in conformation between transforming and nontransforming p21 mutant proteins, neither of which binds nucleotide. The two oncogenic forms have similar conformations in their carboxyl-terminal domains, but differ in conformation at residues 32–47 and 55–74. The former region is known to be involved in the interaction with at least three downstream effector target proteins. Thus, differences in structure between the two oncogenic proteins may reflect different relative affinities of each oncogenic protein for each of these effector targets. The latter region, 55–74, is known to be a highly mobile segment of the protein. The results strongly suggest that critical oncogenic amino acid substitutions in the p21 protein cause changes in the structures of vital domains of this protein.

14.
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within +/-20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple, but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. 
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could also improve the results of domain meta-predictors that incorporate its predictions.
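
The pipeline described in this abstract (amino acid index, smoothed numeric profile, Z-score cutoff) can be illustrated with a short Python sketch. It is not Armadillo's code: the sliding-window mean, the window size, the Z-score cutoff, and the toy index values below are all assumptions made for illustration. The toy index only mirrors the abstract's qualitative statement that Pro and Gly favour linkers; it is not the published DLI.

```python
from statistics import mean, pstdev

# Hypothetical index values, NOT the published domain linker propensity index.
TOY_INDEX = {"P": 1.0, "G": 1.0, "A": 0.0, "V": 0.0, "L": 0.0}


def smoothed_profile(sequence, index, window=5):
    """Convert a sequence to index values and smooth them with a centred moving mean."""
    values = [index[aa] for aa in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(mean(values[lo:hi]))  # window is truncated at the chain ends
    return profile


def z_scores(profile):
    """Standardise a profile; a flat profile yields all zeros."""
    mu, sigma = mean(profile), pstdev(profile)
    if sigma == 0:
        return [0.0] * len(profile)
    return [(v - mu) / sigma for v in profile]


def candidate_linker_positions(sequence, index, window=5, z_cut=1.0):
    """Residue positions (0-based) whose smoothed score lies at least z_cut SDs above the mean."""
    z = z_scores(smoothed_profile(sequence, index, window))
    return [i for i, score in enumerate(z) if score >= z_cut]
```

For a toy sequence such as `"AVLAVLAVPGPGPAVLAVLAV"`, `candidate_linker_positions(seq, TOY_INDEX)` flags the Pro/Gly-rich stretch in the middle; contiguous runs of flagged positions would then be reported as predicted linker boundaries.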

15.
Subtilases are members of the family of subtilisin-like serine proteases. Presently, more than 50 subtilases are known, more than 40 of them with complete amino acid sequences. We have compared these sequences and the available three-dimensional structures (subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K). The mature enzymes contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation-peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, processing sites and membrane anchor segments. Multiple sequence alignment of the N-terminal catalytic domains allows the definition of two main classes of subtilases. A structurally conserved framework of 191 core residues has been defined from a comparison of the four known three-dimensional structures. Eighteen of these core residues are highly conserved, nine of which are glycines. While the alpha-helix and beta-sheet secondary structure elements show considerable sequence homology, this is less so for the peptide loops that connect the core secondary structure elements. These loops can vary in length by more than 150 residues. While the core three-dimensional structure is conserved, insertions and deletions are preferentially confined to surface loops. From the known three-dimensional structures, various predictions are made for the other subtilases concerning essential conserved residues, allowable amino acid substitutions, disulphide bonds, Ca(2+)-binding sites, substrate-binding site residues, ionic and aromatic interactions, proteolytically susceptible surface loops, etc. These predictions form a basis for protein engineering of members of the subtilase family for which no three-dimensional structure is known.

18.
A review of the structural properties of the photosystem II chlorophyll binding proteins, CP47 and CP43, is given and a model of the transmembrane helical domains of CP47 has been constructed. The model is based on (i) the amino acid sequence of the spinach protein, (ii) an 8 Å three-dimensional electron density map derived from electron crystallography and (iii) the structural homology which the membrane spanning region of CP47 shares with the six N-terminal transmembrane helices of the PsaA/PsaB proteins of photosystem I. Particular emphasis has been placed on the position of chlorophyll molecules assigned in the 8 Å three-dimensional map of CP47 (K.-H. Rhee, E.P. Morris, J. Barber, W. Kühlbrandt, Nature 396 (1998) 283-286) relative to histidine residues located in the transmembrane regions of this protein which are likely to form axial ligands for chlorophyll binding. Of the 14 densities assigned to chlorophyll, the model predicted that five have their magnesium ions within 4 Å of the imidazole nitrogens of histidine residues. For the remaining seven histidine residues the densities attributed to chlorophylls were within 4–8 Å of the imidazole nitrogens and thus too far apart for direct ligation with the magnesium ion within the tetrapyrrole head group. Improved structural resolution and reconsiderations of the orientation of the porphyrin rings will allow further refinement of the model.

19.
Ishida T, Nakamura S, Shimizu K. Proteins, 2006, 64(4): 940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined as the number of residues around that residue. First, the contact number of each residue is predicted using SVR from the amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures, both for identifying the native structure among other structures and for discriminating near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.

20.
Intrinsically disordered proteins (IDPs) lack a well-defined three-dimensional structure under physiological conditions. Intrinsic disorder is a common phenomenon, particularly in multicellular eukaryotes, and is responsible for important protein functions including regulation and signaling. Many disease-related proteins are likely to be intrinsically disordered or to have disordered regions. In this paper, a new predictor model based on the Bayesian classification methodology is introduced to predict whether a given protein or protein region is intrinsically disordered or ordered using only its primary sequence. The method allows the incorporation of length-dependent amino acid compositional differences of disordered regions by including separate statistical representations for short, middle and long disordered regions. The predictor was trained on a constructed data set of protein regions with known structural properties. In a jack-knife test, the predictor achieved a sensitivity of 89.2% for disordered and 81.4% for ordered regions. Our method outperformed several reported predictors when evaluated on the previously published data set of Prilusky et al. [2005. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 21 (16), 3435-3438]. A further strength of our approach is its ease of implementation.
