20 similar documents found (search time: 15 ms)
1.
Sucha Sudarsanam 《Proteins》1998,30(3):228-231
One of the most important questions in the protein folding problem is whether secondary structures are formed entirely by local interactions. One way to answer this question is to compare identical subsequences of proteins to see if they have identical structures. Such an exercise would also reveal a lower limit on the number of amino acids needed to form unique secondary structures. In this context, we have searched the April 1996 release of the Protein Data Bank for sequentially identical subsequences of proteins and compared their structures. We find that identical octamers can have different conformations. In addition, there are several examples of identical heptamers with different conformations, and the number of identical hexamers with different conformations has increased since the previous PDB releases. These observations imply that secondary structure can be formed entirely by non-local interactions and that an identical match of up to eight amino acids may not imply structural similarity. In addition to the larger context of the protein folding problem, these observations have implications for protein structure prediction methods. Proteins 30:228–231, 1998. © 1998 Wiley-Liss, Inc.
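The subsequence search described above can be sketched as a k-mer index. This minimal Python sketch makes several assumptions. Sequences are given as one-letter strings keyed by hypothetical identifiers. The sketch only finds identical k-residue stretches (k = 8 for octamers) that occur in more than one place. Comparing the structures of the matched segments needs coordinates from the PDB entries and is out of scope here.

```python
from __future__ import annotations

from collections import defaultdict


def shared_kmers(sequences: dict[str, str], k: int = 8) -> dict[str, list[tuple[str, int]]]:
    """Return every length-k subsequence that occurs at more than one position,
    mapped to its (sequence id, 0-based start) occurrences."""
    if k < 1:
        raise ValueError("k must be positive")
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in sequences.items():
        # Slide a window of width k along the sequence.
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((seq_id, start))
    # Keep only k-mers seen more than once (within or across sequences).
    return {kmer: hits for kmer, hits in index.items() if len(hits) > 1}


if __name__ == "__main__":
    toy = {"protA": "MKACDEFGHIL", "protB": "ACDEFGHIWW"}  # hypothetical sequences
    print(shared_kmers(toy, k=8))
```

Each reported k-mer is only a candidate. The point of the study is that such identical matches may still adopt different conformations.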
2.
In recent years, the protein-folding problem has attracted the attention of molecular biologists. Efforts have focused on developing heuristic and energy-based algorithms to predict the three-dimensional structure of a protein from its amino acid sequence. We have applied a series of heuristic algorithms to the sequence of human growth hormone. A family of five structures which are generically right-handed fourfold alpha-helical bundles are found from an investigation of approximately 10⁸ structures. A plausible receptor binding site is suggested. Independent crystallographic analysis confirms some aspects of these predictions. These methods only deal with the "core" structure, and conformations of many residues are not defined. Further work is required to identify a unique set of coordinates and to clarify the topological alternatives available to alpha-helical proteins.
3.
4.
5.
O. B. Ptitsyn 《Journal of biosciences》1985,8(1-2):1-13
Physical principles determining protein structure and protein folding are reviewed: (i) the molecular theory of protein secondary structure and the method of its prediction based on this theory; (ii) the existence of a limited set of thermodynamically favourable folding patterns of α- and β-regions in a compact globule, which does not depend on the details of the amino acid sequence; (iii) modern approaches to the prediction of the folding patterns of α- and β-regions in specific proteins; (iv) experimental approaches to the mechanism of protein folding. The review reflects theoretical and experimental work by the author and his collaborators as well as by other groups.
6.
Retrospective analysis of a secondary structure prediction: the catalytic domain of matrix metalloproteinases.
E. E. Hodgkin I. C. Gillman R. J. Gilbert 《Protein science : a publication of the Protein Society》1994,3(6):984-986
Secondary structure prediction of the catalytic domain of matrix metalloproteinases is evaluated in the light of recently published experimentally determined structures. The prediction was made by combining conformational propensity, surface probability, and residue conservation calculated for an alignment of 19 sequences. The position of each observed secondary structure element was correctly predicted with a high degree of accuracy, with a single beta-strand falsely predicted. The domain fold was also anticipated from the prediction by analogy with the structural elements found in the distantly related metalloproteinases thermolysin, astacin, and adamalysin.
7.
Database of homology-derived protein structures and the structural meaning of sequence alignment (total citations: 85; self-citations: 0; citations by others: 85)
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
8.
John-Marc Chandonia Martin Karplus 《Protein science : a publication of the Protein Society》1995,4(2):275-285
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of “jackknife” cross-validation (testing each protein in the database individually).
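The rolling-average output filter mentioned above can be sketched as a centred moving average over per-residue network outputs. This is a hedged illustration, not the authors' filter. Three details are assumptions that the abstract does not give: the window width (5 by default), the window shrinking at the chain ends, and the helix/strand/coil labels "HEC".

```python
from __future__ import annotations


def rolling_average(outputs: list[list[float]], window: int = 5) -> list[list[float]]:
    """Smooth per-residue class scores (one row per residue, one column per class)
    with a centred moving average; the window is truncated at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(outputs)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        rows = outputs[lo:hi]
        # Average each class column over the residues inside the window.
        smoothed.append([sum(col) / len(rows) for col in zip(*rows)])
    return smoothed


def predict_states(outputs: list[list[float]], labels: str = "HEC", window: int = 5) -> str:
    """Assign each residue the label of its highest smoothed score."""
    return "".join(labels[row.index(max(row))] for row in rolling_average(outputs, window))
```

Smoothing removes isolated single-residue assignments, such as a lone coil call inside a helix, which a raw per-residue argmax would keep.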
9.
We present heuristic-based predictions of the secondary and tertiary structures of the cyclins A, B, and D, representatives of the cyclin superfamily. The list of suggested constraints for tertiary structure assembly was left unrefined in order to submit this report before an announced crystal structure for cyclin A becomes available. To predict these constraints, a master sequence alignment over 270 positions of cyclin types A, B, and D was adjusted based on individual secondary structure predictions for each type. We used new heuristics for predicting aromatic residues at protein-protein interfaces and to identify sequentially distinct regions in the protein chain that cluster in the folded structure. The boundaries of two conjectured domains in the cyclin fold were predicted based on experimental data in the literature. The domain that is important for interaction of the cyclins with cyclin-dependent kinases (CDKs) is predicted to contain six helices; the second domain in the consensus model contains both helices and a β-sheet that is formed by sequentially distant regions in the protein chain. A plausible phosphorylation site is identified. This work represents a blinded test of the method for prediction of secondary and, to a lesser extent, tertiary structure from a set of homologous protein sequences. Evaluation of our predictions will become possible with the publication of the announced crystal structure.
10.
Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein‐structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues that have yet to be systematically examined. In this work, we developed a gapless‐threading method to generate position‐specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved accuracy comparable to that of machine‐learning‐based methods although the fragments were taken from unrelated proteins. When measured by the accuracies of both the derived distance profiles and the torsion angle pairs, we consistently find that the optimal fragment length for structural assembly is around 10, and that at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.
11.
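We propose the concept of a compact domain: the largest spatially close-packed folding unit formed by one or several consecutive stretches of α-helices and β-strands in the secondary-structure sequence. Using three kinds of compact domains (α domains, β domains, and α/β domains), we define five structural types of globular proteins: α-type, β-type, α/β-type, multidomain, and ζ-type proteins. We classified 1,261 representative proteins (1,022 families) and compared the results with the SCOP classification. We also carried out an analysis with sequence redundancy removed. On this basis, we propose a scheme for predicting structural type, with a success rate of 82% to 85%.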
12.
Amin Ahmadi Adl Abbas Nowzari-Dalini Bin Xue Vladimir N. Uversky 《Journal of biomolecular structure & dynamics》2013,31(6):1127-1137
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods directly based on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data-sets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy. This is a more difficult problem than structural class prediction for the commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to achieving high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.
13.
The tertiary structure of the alpha-subunit of tryptophan synthase was proposed using a combination of experimental data and computational methods. The vacuum-ultraviolet circular dichroism spectrum was used to assign the protein to the alpha/beta-class of supersecondary structures. The two-domain structure of the alpha-subunit (Miles et al.: Biochemistry 21:2586, 1982; Beasty and Matthews: Biochemistry 24:3547, 1985) eliminated consideration of a barrel structure and focused attention on a beta-sheet structure. An algorithm (Cohen et al.: Biochemistry 22:4894, 1983) was used to generate a secondary structure prediction that was consistent with the sequence data of the alpha-subunit from five species. Three potential secondary structures were then packed into tertiary structures using other algorithms. The assumption of nearest neighbors from second-site revertant data eliminated 97% of the possible tertiary structures; consideration of conserved hydrophobic packing regions on the beta-sheet eliminated all but one structure. The native structure is predicted to have a parallel beta-sheet flanked on both sides by alpha-helices, and is consistent with the available data on chemical cross-linking, chemical modification, and limited proteolysis. In addition, an active site region containing appropriate residues could be identified as well as an interface for beta 2-subunit association. The ability of experimental data to facilitate the prediction of protein structure is discussed.
14.
The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least-squares sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and also takes into account possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types, respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of the Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤1.8 Å, ≤2.0 Å, ≤2.5 Å, and ≤3.0 Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically. 
Whereas in the self-consistency test (learning with the protein to be predicted) a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leaving the predicted protein out) yielded the best results for the largest dataset (≤3.0 Å, almost no difference from the self-consistency test!), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fractions of helix, sheet, and coil from the amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6–11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1–3% as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques. This implies that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence, not for determining the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service. © 1996 Wiley-Liss, Inc.
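The first method described above represents a query composition as the best least-squares linear combination of characteristic helix, sheet, and coil compositions. It can be sketched by solving the normal equations with Gaussian elimination. This is an unconstrained fit under stated assumptions. The abstract does not say whether the fitted fractions are constrained to be non-negative or to sum to one. The four-letter "compositions" in the test are hypothetical toys rather than real 20-residue profiles.

```python
from __future__ import annotations


def fit_content(query: list[float], basis: list[list[float]]) -> list[float]:
    """Return weights x minimising ||sum_j x[j] * basis[j] - query||^2
    (plain unconstrained least squares via the normal equations)."""
    m = len(basis)
    # Normal equations G x = r with Gram matrix G[i][j] = <b_i, b_j> and r[i] = <b_i, q>.
    gram = [[sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(m)] for i in range(m)]
    rhs = [sum(a * b for a, b in zip(basis[i], query)) for i in range(m)]
    aug = [gram[i] + [rhs[i]] for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis compositions are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x
```

With `basis = [helix, sheet, coil]`, the returned weights estimate the helix, sheet, and coil fractions of the query protein.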
15.
Aldehyde dehydrogenases: widespread structural and functional diversity within a shared framework. (total citations: 13; self-citations: 3; citations by others: 13)
J. Hempel H. Nicholas R. Lindahl 《Protein science : a publication of the Protein Society》1993,2(11):1890-1900
Sequences of 16 NAD and/or NADP-linked aldehyde oxidoreductases are aligned, including representative examples of all aldehyde dehydrogenase forms with wide substrate preferences as well as additional types with distinct specificities for certain metabolic aldehyde intermediates, particularly semialdehydes, yielding pairwise identities from 15 to 83%. Eleven of 23 invariant residues are glycine and three are proline, indicating evolutionary restraint against alteration of peptide chain-bending points. Additionally, another 66 positions show high conservation of residue type, mostly hydrophobic residues. Ten of these occur in predicted beta-strands, suggesting important interior-packing interactions. A single invariant cysteine residue is found, further supporting its catalytic role. A previously identified essential glutamic acid residue is conserved in all but methyl malonyl semialdehyde dehydrogenase, which may relate to formation by that enzyme of a CoA ester as a product rather than a free carboxylate species. Earlier, similarity to a GXGXXG segment expected in the NAD-binding site was noted from alignments with fewer sequences. The same region continues to be indicated, although now only the first glycine residue is strictly conserved and the second (usually threonine) is not present at all, suggesting greater variance in coenzyme-binding interactions.
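Pairwise identities and invariant positions like those reported above can be computed from a multiple alignment. This sketch assumes gaps are written as '-'. It measures identity only over columns where neither sequence has a gap. The abstract does not state which identity definition the authors used, and the alignment in the test is a hypothetical toy.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def invariant_columns(alignment: list[str]) -> list[tuple[int, str]]:
    """Return (column index, residue) for gap-free columns where all sequences agree."""
    return [(i, col[0]) for i, col in enumerate(zip(*alignment))
            if col[0] != "-" and all(r == col[0] for r in col)]


def identity_range(alignment: list[str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity over all sequence pairs."""
    values = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    return min(values), max(values)
```

Counting how many invariant columns hold G or P then follows directly from `invariant_columns`.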
16.
A systematic study of helix-helix packing in a comprehensive database of protein structures revealed that the side chains inside helix-helix interfaces on average are shorter than those in the noninterface parts of the helices. The study follows our earlier study of this effect in transmembrane helices. The results obtained on the entire database of protein structures are consistent with those obtained on the transmembrane helices. The difference in the length of interface and noninterface side chains is small but statistically significant. It indicates that helices, if viewed along their main axis, statistically are not circular, but have a flattened interface. This effect brings the helices closer to each other and creates a tighter structural packing. The results provide an interesting insight into the aspects of protein structure and folding.
17.
The hierarchy of lattice Monte Carlo models described in the accompanying paper (Kolinski, A., Skolnick, J. Monte Carlo simulations of protein folding. I. Lattice model and interaction scheme. Proteins 18:338–352, 1994) is applied to the simulation of protein folding and the prediction of 3-dimensional structure. Using sequence information alone, three proteins have been successfully folded: the B domain of staphylococcal protein A, a 120 residue, monomeric version of ROP dimer, and crambin. Starting from a random expanded conformation, the model proteins fold along relatively well-defined folding pathways. These involve a collection of early intermediates, which are followed by the final (and rate-determining) transition from compact intermediates closely resembling the molten globule state to the native-like state. The predicted structures are rather unique, with native-like packing of the side chains. The accuracy of the predicted native conformations is better than that obtained in previous folding simulations. The best (but by no means atypical) folds of protein A have a coordinate rms of 2.25 Å from the native Cα trace, and the best coordinate rms for crambin is 3.18 Å. For ROP monomer, the lowest coordinate rms from equivalent Cαs of ROP dimer is 3.65 Å. Thus, for two simple helical proteins and a small α/β protein, the ability to predict protein structure from sequence has been demonstrated. © 1994 John Wiley & Sons, Inc.
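The coordinate rms values quoted above compare predicted and native Cα positions. Below is a minimal sketch of that quantity. It assumes the two traces are equally long, residue-matched, and already optimally superimposed. The superposition step itself, for example the Kabsch algorithm, is not implemented here.

```python
from __future__ import annotations

import math

Point = tuple[float, float, float]


def ca_rmsd(predicted: list[Point], native: list[Point]) -> float:
    """Root-mean-square deviation between matched Cα coordinates (in Å),
    assuming the structures have already been superimposed."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("traces must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, native))
    return math.sqrt(squared / len(predicted))
```

Without a prior optimal superposition, this value overestimates the rms that such studies report.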
18.
Macdonald JR Johnson WC 《Protein science : a publication of the Protein Society》2001,10(6):1172-1177
We have investigated amino acid features that determine secondary structure: (1) the solvent accessibility of each side chain, and (2) the interaction of each side chain with others one to four residues apart. Solvent accessibility is a simple model that distinguishes residue environment. The pairwise interactions represent a simple model of local side chain to side chain interactions. To test the importance of these features we developed an algorithm to separate alpha-helices, beta-strands, and "other" structure. Single residue and pairwise probabilities were determined for 25,141 samples from proteins with <30% homology. Combining the features of solvent accessibility with pairwise probabilities allows us to distinguish the three structures after cross validation at the 82.0% level. We gain 1.4% to 2.0% accuracy by optimizing the propensities, demonstrating that probabilities do not necessarily reflect propensities. Optimization of residue exposures, weights of all probabilities, and propensities increased accuracy to 84.0%.
19.
Background
Since experimental techniques are time- and cost-consuming, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues, including poor performance when target lengths exceed 100 residues, excessive running times, and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints into their fragment selection process.
Results
Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH- or SCOP-based structural class annotations, the enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although the CATH and SCOP classifications differ, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations that are more relevant to the user and converge more quickly towards the best model as estimated by GDT_TS (by up to 10% on average). This substantiates our hypothesis that using structurally relevant templates not only reduces the size of the conformation space to be explored but also focuses the search on a more relevant area.
Conclusions
Since our methodology produces models whose quality is up to 7% higher on average than those generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for medium-sized and large proteins. Apart from improving search strategies and energy functions, integrating additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0576-2) contains supplementary material, which is available to authorized users.
20.
Marsden RL McGuffin LJ Jones DT 《Protein science : a publication of the Protein Society》2002,11(12):2814-2824
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
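The ±20-residue success criterion above can be sketched as a per-chain check. The abstract gives only the tolerance, so the matching rule is an assumption: a prediction counts as correct when it has the same number of boundaries and each is compared with its counterpart in sorted order. The function names are hypothetical.

```python
from __future__ import annotations


def boundaries_correct(predicted: list[int], observed: list[int], tolerance: int = 20) -> bool:
    """True if the boundary count matches (hence the domain number for continuous
    domains) and each sorted predicted boundary lies within `tolerance` residues
    of the corresponding sorted observed boundary."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance for p, o in zip(sorted(predicted), sorted(observed)))


def success_rate(cases: list[tuple[list[int], list[int]]], tolerance: int = 20) -> float:
    """Percentage of (predicted, observed) boundary lists judged correct."""
    if not cases:
        raise ValueError("no cases to evaluate")
    hits = sum(boundaries_correct(p, o, tolerance) for p, o in cases)
    return 100.0 * hits / len(cases)
```

Boundary positions here are residue indices where one continuous domain ends and the next begins, so a two-domain chain has one boundary.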