Found 20 similar articles (search time: 15 ms)
1.
A number of methods for predicting the folding type of a protein from its amino acid composition have been developed during the past few years. In order to perform an objective and fair comparison of different prediction methods, a Monte Carlo simulation method was proposed to calculate the asymptotic limit of the prediction accuracy [Zhang and Chou (1992), Biophys. J. 63, 1523–1529, referred to as simulation method I]. However, simulation method I was based on an oversimplified assumption, namely that there are no correlations between the compositions of different amino acids. By taking such correlations into account, a new method, referred to as simulation method II, has been proposed to recalculate the objective accuracy of prediction for the least Euclidean distance method [Nakashima et al. (1986), J. Biochem. 99, 152–162] and the least Minkowski distance method [Chou (1989), Prediction in Protein Structure and the Principles of Protein Conformation, Plenum Press, New York, pp. 549–586], respectively. The results show that the prediction accuracy of the former is still better than that of the latter, as found by simulation method I; however, after incorporating the correlative effect, the objective prediction accuracies become lower for both methods. The reason for this phenomenon is discussed in detail. The simulation method and the idea developed in this paper can be applied to examine any other statistical prediction method, including the computer-simulated neural network method.
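As an illustration of the least Euclidean distance idea mentioned in this abstract, the following minimal sketch assigns a sequence to the class whose average amino acid composition (its "norm") is closest. The function names and the toy training sequences are invented for this example and are not taken from the cited papers.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the 20 amino acid fractions of a sequence, in AMINO_ACIDS order."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def euclidean(u, v):
    """Plain Euclidean distance between two composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def class_norms(training):
    """Average composition per class; training maps class label -> sequences."""
    norms = {}
    for label, sequences in training.items():
        comps = [composition(s) for s in sequences]
        norms[label] = [sum(col) / len(comps) for col in zip(*comps)]
    return norms


def predict(sequence, norms):
    """Assign the class whose norm is nearest to the query composition."""
    comp = composition(sequence)
    return min(norms, key=lambda label: euclidean(comp, norms[label]))


# Toy data, invented purely for illustration (not real proteins).
toy_training = {
    "alpha": ["AAAALLLLEEEE", "AALLEEKKAALL"],
    "beta": ["VVVVIIIITTTT", "VVIITTYYVVII"],
}
toy_norms = class_norms(toy_training)
print(predict("AALLEEAALL", toy_norms))
```

Because this distance treats the 20 components as independent, it ignores exactly the inter-residue correlations that simulation method II is said to account for.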
2.
3.
A novel approach to predicting protein structural classes in a (20–1)-D amino acid composition space
Kuo-Chen Chou 《Proteins》1995,21(4):319-344
The development of prediction methods based on statistical theory generally consists of two parts: one is focused on the exploration of new algorithms, and the other on the improvement of a training database. The current study is devoted to improving the prediction of protein structural classes from both aspects. To explore a new algorithm, a method has been developed that takes into account the coupling effect among different amino acid components of a protein by means of a covariance matrix. To improve the training database, the selection of proteins is carried out so that they have (1) as many non-homologous structures as possible, and (2) a good quality of structure. Thus, 129 representative proteins are selected. They are classified into 30 α, 30 β, 30 α + β, 30 α/β, and 9 ζ (irregular) proteins according to a new criterion that better reflects the feature of the structural classes concerned. The average accuracy of prediction by the current method for the 4 × 30 regular proteins is 99.2%, and that for 64 independent testing proteins not included in the training database is 95.3%. To further validate its efficiency, a jackknife analysis has been performed for the current method as well as the previous ones, and the results are also much in favor of the current method. To complete the mathematical basis, a theorem is presented and proved in Appendix A that is instructive for understanding the novel method at a deeper level. © 1995 Wiley-Liss, Inc.
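The abstract states only that coupling among amino acid components is handled through a covariance matrix in a (20–1)-D space. One common way to do this is a Mahalanobis-type distance, and that choice is an assumption of the sketch below, not a confirmed description of the paper's discriminant. Because compositions sum to 1, one component is redundant and is dropped before the covariance matrix is inverted. The toy vectors use 3 components instead of 20 for brevity.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(x, rows):
    """Squared Mahalanobis distance from composition x to a class,
    computed after dropping the last (redundant) component."""
    reduced = [r[:-1] for r in rows]
    mu = mean_vector(reduced)
    diff = [a - b for a, b in zip(x[:-1], mu)]
    y = solve(covariance(reduced), diff)  # y = C^{-1} diff
    return sum(d * v for d, v in zip(diff, y))


# Toy 3-component "compositions", invented for illustration only.
class_a = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.55, 0.3, 0.15], [0.45, 0.35, 0.2]]
class_b = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.2, 0.6, 0.2], [0.15, 0.5, 0.35]]
query = [0.5, 0.3, 0.2]
print(mahalanobis_sq(query, class_a) < mahalanobis_sq(query, class_b))
```

With real 20-component data, each class needs enough training proteins for its 19 × 19 covariance matrix to be invertible; the sketch raises an error otherwise.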
4.
An optimization approach to predicting protein structural class from amino acid composition.
Proteins are generally classified into four structural classes: all-alpha proteins, all-beta proteins, alpha + beta proteins, and alpha/beta proteins. In this article, a protein is expressed as a vector in 20-dimensional space, whose 20 components are defined by the composition of its 20 amino acids. Based on this, a new method, the so-called maximum component coefficient method, is proposed for predicting the structural class of a protein according to its amino acid composition. In comparison with the existing methods, the new method yields a higher general accuracy of prediction. Especially for the all-alpha proteins, the rate of correct prediction obtained by the new method is much higher than that by any of the existing methods. For instance, for the 19 all-alpha proteins investigated previously by P.Y. Chou, the rate of correct prediction by means of his method was 84.2%, but the correct rate when predicted with the new method would be 100%! Furthermore, the new method is characterized by an explicable physical picture. This is reflected by the process in which the vector representing a protein to be predicted is decomposed into four component vectors, each of which corresponds to one of the norms of the four protein structural classes.
5.
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long‐range and short‐range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. An optimal feature selection procedure was adopted by combining forward feature selection with sequential backward selection. Using the jackknife cross validation test, the method was demonstrated on a large dataset. The predictor was built from numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci‐bioinfo.cn/swfrate/input.jsp . Proteins 2013. © 2012 Wiley Periodicals, Inc.
6.
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network, and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
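The homolog-averaged composition described in this abstract can be sketched as follows. The sketch assumes an unweighted mean over the aligned sequences and that gap characters are simply skipped; the abstract does not specify either detail, and the function names and toy alignment are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """20 amino acid fractions; gaps ('-') and other symbols are ignored."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def homolog_averaged_composition(aligned_sequences):
    """Unweighted mean of the per-sequence compositions (an assumption)."""
    comps = [composition(s) for s in aligned_sequences]
    return [sum(col) / len(comps) for col in zip(*comps)]


# Toy alignment rows, invented for illustration (query first, then one homolog).
toy_alignment = ["ACDA", "A-CC"]
averaged = homolog_averaged_composition(toy_alignment)
print(dict(zip(AMINO_ACIDS, averaged)))
```

The resulting 20 numbers would then serve as the input features for whichever regression model is used.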
7.
The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least square sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and takes into account also possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤1.8Å, ≤2.0Å, ≤2.5Å, and ≤3.0Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically. 
Whereas in the self-consistency test (learning with the protein to be predicted), a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leave the predicted protein out) yielded best results for the largest dataset (≤3.0 Å, almost no difference to the self-consistency test!), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fraction of helix, sheet, and coil from the amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6–11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1–3% as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determination of the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service. © 1996 Wiley-Liss, Inc.
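The first method in this abstract represents a query composition as the best least-squares linear combination of three characteristic compositions (helix, sheet, coil). A minimal sketch of that fit, solving the 3 × 3 normal equations by Cramer's rule, is given below. It is not the SSCP program; the function names are hypothetical and the toy vectors use 4 components instead of 20.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def least_squares_content(query, basis):
    """Coefficients c minimizing ||query - sum_k c[k] * basis[k]||^2.

    basis holds three characteristic compositions (helix, sheet, coil).
    Solves the normal equations G c = r with G[i][j] = basis[i] . basis[j].
    """
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, query) for bi in basis]
    d = det3(gram)
    if abs(d) < 1e-12:
        raise ValueError("characteristic compositions are linearly dependent")
    coeffs = []
    for k in range(3):
        replaced = [row[:] for row in gram]
        for i in range(3):
            replaced[i][k] = rhs[i]  # Cramer's rule: swap column k for rhs
        coeffs.append(det3(replaced) / d)
    return coeffs


# Toy characteristic compositions, invented for illustration only.
helix = [0.5, 0.3, 0.1, 0.1]
sheet = [0.1, 0.1, 0.5, 0.3]
coil = [0.2, 0.2, 0.2, 0.4]
toy_query = [0.5 * h + 0.3 * s + 0.2 * c for h, s, c in zip(helix, sheet, coil)]
print(least_squares_content(toy_query, [helix, sheet, coil]))
```

This unconstrained fit does not force the coefficients to be non-negative or to sum to 1; whether the original method imposes such constraints is not stated in the abstract.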
8.
Folding type-specific secondary structure propensities of 20 naturally occurring amino acids have been derived from α-helical, β-sheet, α/β, and α+β proteins of known structures. These data show that each amino acid residue type has intrinsic propensities in different regions of secondary structures for different folding types of proteins. Each of the folding types shows markedly different rank ordering, indicating folding type-specific effects on the secondary structure propensities of amino acids. Rigorous statistical tests have been made to validate the folding type-specific effects. It should be noted that α and β proteins have relatively small α-helix- and β-strand-forming propensities, respectively, compared with those of α+β and α/β proteins. This may suggest that, with more complex architectures than α and β proteins, α+β and α/β proteins require larger propensities to distinguish from interacting α-helices and β-strands. Our finding of folding type-specific secondary structure propensities suggests that sequence space accessible to each folding type may have differing features. Differing sequence space features might be constrained by topological requirement for each of the folding types. Almost all strong β-sheet forming residues are hydrophobic in character regardless of folding types, thus suggesting the hydrophobicities of side chains as a key determinant of β-sheet structures. In contrast, conformational entropy of side chains is a major determinant of the helical propensities of amino acids, although other interactions such as hydrophobicities and charged interactions cannot be neglected. These results will be helpful to protein design, class-based secondary structure prediction, and protein folding. © 1998 John Wiley & Sons, Inc. Biopoly 45: 35–49, 1998
9.
The presence of non-native kinetic traps in the free energy landscape of a protein may significantly lengthen the overall folding time so that the folding process becomes unreliable. We use a computational model alpha-helical hairpin peptide to calculate structural free energy landscapes and relate them to the kinetics of folding. We show how protein engineering through strategic changes in only a few amino acid residues along the primary sequence can greatly increase the speed and reliability of the folding process, as seen experimentally. These strategic substitutions also prevent the formation of long-lived misfolded configurations that can cause unwanted aggregations of peptides. These results support arguments that removal of kinetic traps, obligatory or nonobligatory, is crucial for fast folding.
10.
We have performed molecular dynamics (MD) simulations to study the dimerization, folding, and binding to a protein of peptides containing an unnatural amino acid. NMR studies have shown that the substitution of one residue in a tripeptide beta-strand by the unnatural amino acid Hao (5-HO2CCONH-2-MeO-C6H3-CO-NHNH2) modifies the conformational flexibility of the beta-strand and the hydrogen-bonding properties of its two edges: The number of hydrogen-bond donors and acceptors increases at one edge, whereas at the other, they are sterically hindered. In simulations in chloroform, the Hao-containing peptide 9 (i-PrCO-Phe-Hao-Val-NHBu) forms a beta-sheet-like hydrogen-bonded dimer, in good agreement with the available experimental data. Addition of methanol to the solution induces instability of this beta-sheet, as confirmed by the experiments. MD simulations also reproduce the folding of the synthetic peptide 1a (i-PrCO-Hao-Ut-Phe-Ile-Leu-NHMe) into a beta-hairpin-like structure in chloroform. Finally, the Hao-containing peptide, Ac-Ala-Hao-Ala-NHMe, is shown to form a stable complex with the Ras analogue, Rap1 A, in water at room temperature. Together with the available experimental data, these simulation studies indicate that Hao-containing peptides may serve as inhibitors of beta-sheet interactions between proteins.
11.
The secondary structure of DnaA protein and its interaction with DNA and ribonucleotides have been predicted using biochemical and biophysical techniques and prediction methods based on multiple-sequence alignment and neural networks. The core of all proteins from the DnaA family consists of an “open twisted α/β structure,” containing five α-helices alternating with five β-strands. In our proposed structural model the interior of the core is formed by a parallel β-sheet, whereas the α-helices are arranged on the surface of the core. The ATP-binding motif is located within the core, in a loop region following the first β-strand. The N-terminal domain (80 aa) is composed of two α-helices, the first of which contains a potential leucine zipper motif for mediating protein-protein interaction, followed by a β-strand and an additional α-helix. The N-terminal domain and the α/β core region of DnaA are connected by a variable loop (45–70 aa); major parts of the loop region can be deleted without loss of protein activity. The C-terminal DNA-binding domain (94 aa) is mostly α-helical and contains a potential helix-loop-helix motif. DnaA protein does not dimerize in solution; instead, the two longest C-terminal α-helices could interact with each other, forming an internal “coiled coil” and exposing highly basic residues of a small loop region on the surface, probably responsible for DNA backbone contacts. © 1997 Wiley-Liss Inc.
12.
Jian Lei, Yan‐Feng Zhou, Lan‐Fen Li, Xiao‐Dong Su 《Protein science : a publication of the Protein Society》2009,18(8):1792-1800
Bacillus subtilis is one of the most studied gram‐positive bacteria. In this work, YvgN and YtbE from B. subtilis, assigned as AKR5G1 and AKR5G2 of the aldo‐keto reductase (AKR) superfamily, were studied by crystallographic and enzymatic analyses. AKRs catalyze the NADPH‐dependent reduction of aldehyde or aldose substrates to alcohols. The apo structures of these proteins were determined by molecular replacement, and the structure of holoenzyme YvgN with NADPH was also solved, revealing the conformational changes upon cofactor binding. Our biochemical data suggest both YvgN and YtbE have preferential specificity for derivatives of benzaldehyde, such as nitryl or halogen group substitution at the 2 or 4 positions. These proteins also showed broad catalytic activity on many standard substrates of AKR, such as glyoxal, dihydroxyacetone, and DL‐glyceraldehyde, suggesting a possible role in bacterial detoxification.
13.
A major bottleneck in the field of biochemistry is our limited understanding of the processes by which a protein folds into its native conformation. Much of the work on this issue has focused on the conserved core of the folded protein. However, one might imagine that a ubiquitous motif for unaided folding or for the recognition of chaperones may involve regions on the surface of the native structure. We explore this possibility by an analysis of the spatial distribution of regions with amphiphilic α-helical potential on the surface of β-sheet proteins. All proteins, including β-sheet proteins, contain regions with amphiphilic α-helical potential. That is, any α-helix formed by that region would be amphiphilic, having both hydrophobic and hydrophilic surfaces. In the three-dimensional structure of all β-sheet proteins analyzed, we have found a distinct pattern in the spatial distribution of sequences with amphiphilic α-helical potential. The amphiphilic regions occur in ring-shaped clusters approximately 20 to 30 Å in diameter on the surface of the protein. In addition, these regions have a strong preference for positively charged amino acids and a lower preference for residues not favorable to α-helix formation. Although the purpose of these amphiphilic regions, which are not associated with naturally occurring α-helices, is unknown, they may play a critical role in highly conserved processes such as protein folding. © 1996 Wiley-Liss, Inc.
14.
Cheom Gil Cheong, Soo Hyun Eom, Changsoo Chang, Dong Hae Shin, Hyun Kyu Song, Kyeongsik Min, Jin Ho Moon, Kyeong Kyu Kim, Kwang Yeon Hwang, Se Won Suh 《Proteins》1995,21(2):105-117
Sweet potato β-amylase is a tetramer of identical subunits, which are arranged to exhibit 222 molecular symmetry. Its subunit consists of 498 amino acid residues (Mr 55,880). It has been crystallized at room temperature using polyethylene glycol 1500 as precipitant. The crystals, growing to dimensions of 0.4 mm × 0.4 mm × 1.0 mm within 2 weeks, belong to the tetragonal space group P4₂2₁2 with unit cell dimensions of a = b = 129.63 Å and c = 68.42 Å. The asymmetric unit contains 1 subunit of β-amylase, with a crystal volume per protein mass (VM) of 2.57 Å³/Da and a solvent content of 52% by volume. The three-dimensional structure of the tetrameric β-amylase from sweet potato has been determined by molecular replacement methods using the monomeric structure of soybean enzyme as the starting model. The refined subunit model contains 3,863 nonhydrogen protein atoms (488 amino acid residues) and 319 water oxygen atoms. The current R-value is 20.3% for data in the resolution range of 8–2.3 Å (with 2 σ cut-off) with good stereochemistry. The subunit structure of sweet potato β-amylase (crystallized in the absence of α-cyclodextrin) is very similar to that of soybean β-amylase (complexed with α-cyclodextrin). The root-mean-square (RMS) difference for 487 equivalent Cα atoms of the two β-amylases is 0.96 Å. Each subunit of sweet potato β-amylase is composed of a large (α/β)₈ core domain, a small one made up of three long loops [L3 (residues 91–150), L4 (residues 183–258), and L5 (residues 300–327)], and a long C-terminal loop formed by residues 445–493. Conserved Glu 187, believed to play an important role in catalysis, is located at the cleft between the (α/β)₈ barrel core and a small domain made up of three long loops (L3, L4, and L5). Conserved Cys 96, important in the inactivation of enzyme activity by sulfhydryl reagents, is located at the entrance of the (α/β)₈ barrel. © 1995 Wiley-Liss, Inc.
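The crystal volume per protein mass and the solvent content quoted in this abstract can be checked with a short calculation. The sketch below assumes eight asymmetric units per unit cell for the reported tetragonal space group and uses the conventional Matthews approximation, solvent fraction ≈ 1 − 1.23/VM; both are assumptions of this example and are not stated in the abstract. The function names are hypothetical.

```python
def matthews_coefficient(a, b, c, asym_units_per_cell, mol_weight_da,
                         copies_per_asym_unit=1):
    """Crystal volume per unit protein mass, in cubic angstroms per dalton.

    The cell volume a * b * c is valid only when all cell angles are 90
    degrees, as in a tetragonal cell.
    """
    cell_volume = a * b * c
    protein_mass = asym_units_per_cell * copies_per_asym_unit * mol_weight_da
    return cell_volume / protein_mass


def solvent_fraction(vm):
    """Approximate solvent content from VM (conventional 1.23 constant)."""
    return 1.0 - 1.23 / vm


# Values from the abstract: a = b = 129.63, c = 68.42 (angstroms), Mr 55,880,
# one subunit per asymmetric unit. Eight asymmetric units per cell is assumed.
vm = matthews_coefficient(129.63, 129.63, 68.42, 8, 55880)
print(round(vm, 2), round(100 * solvent_fraction(vm)))
```

Under these assumptions the calculation gives about 2.57 Å³/Da and about 52% solvent, in line with the reported figures.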
15.
Packing and hydrophobicity effects on protein folding and stability: effects of beta-branched amino acids, valine and isoleucine, on the formation and stability of two-stranded alpha-helical coiled coils/leucine zippers.
B. Y. Zhu, N. E. Zhou, C. M. Kay, R. S. Hodges 《Protein science : a publication of the Protein Society》1993,2(3):383-394
The aim of this study was to examine the differences between hydrophobicity and packing effects in specifying the three-dimensional structure and stability of proteins when mutating hydrophobes in the hydrophobic core. In DNA-binding proteins (leucine zippers), Leu residues are conserved at positions "d," and beta-branched amino acids, Ile and Val, often occur at positions "a" in the hydrophobic core. In order to discern what effect this selective distribution of hydrophobes has on the formation and stability of two-stranded alpha-helical coiled coils/leucine zippers, three Val or three Ile residues were simultaneously substituted for Leu at either positions "a" (9, 16, and 23) or "d" (12, 19, and 26) in both chains of a model coiled coil. The stability of the resulting coiled coils was monitored by CD in the presence of Gdn.HCl. The results of the mutations of Ile to Val at either positions "a" or "d" in the reduced or oxidized coiled coils showed a significant hydrophobic effect with the additional methylene group in Ile stabilizing the coiled coil (delta delta G values range from 0.45 to 0.88 kcal/mol/mutation). The results of mutations of Leu to Ile or Val at positions "a" in the reduced or oxidized coiled coils showed a significant packing effect in stabilizing the coiled coil (delta delta G values range from 0.59 to 1.03 kcal/mol/mutation). Our results also indicate the subtle control hydrophobic packing can have not only on protein stability but on the conformation adopted by the amphipathic alpha-helices. These structural findings correlate with the observation that in DNA-binding proteins, the conserved Leu residues at positions "d" are generally less tolerant of amino acid substitutions than the hydrophobic residues at positions "a."
16.
Gheorghe Benga, Victor Ioan Pop, Octavian Popescu, Ileana Benga, William Ferdinand 《Bioscience reports》1991,11(1):53-57
Amino acid analyses of the band 3 protein purified from erythrocyte membranes of control and epileptic children showed that no major structural abnormalities of this protein could be linked with the red blood cell membrane alterations previously described in child epilepsy and, consequently, the molecular basis of these alterations should be looked for elsewhere.
17.
The labyrinthopeptins are a new class of lantibiotics containing two identical quaternary α,α‐disubstituted amino acids, named labionin (Lab). The synthetic formation of this unique structural feature represents the key step in the total synthesis of these polycyclic peptides. In this report we describe the synthesis of an orthogonally protected α,α‐disubstituted amino acid building block serving as labionin precursor for the future assembly of labyrinthopeptin A2 and of other labyrinthopeptin derivatives. Copyright © 2011 European Peptide Society and John Wiley & Sons, Ltd.
18.
Michael Kokkinidis, Metaxia Vlassi, Yannis Papanikolaou, Dina Kotsifaki, Adrian Kingswell, Demetrius Tsernoglou, Hans-Jürgen Hinz 《Proteins》1993,16(2):214-216
Six variants of the ROP protein, designed with the aim of analyzing loop formation and core packing interactions in 4-α-helical bundles by X-ray crystallography, have been purified, and a search for their crystallization conditions has been carried out. Five mutants yield crystals that are suitable for medium- to high-resolution X-ray diffraction studies. For all mutants, crystal size, sensitivity to X-irradiation, and diffraction limit are correlated with their stability as determined by differential scanning calorimetry, in a manner which is not yet understood in detail. © Wiley-Liss, Inc.
19.
M. H. P. W. Visker, B. W. Dibbits, S. M. Kinders, H. J. F. van Valenberg, J. A. M. van Arendonk, H. Bovenhuis 《Animal genetics》2011,42(2):212-218
The aim of this study was to detect new polymorphisms in the bovine β‐casein (β‐CN) gene and to evaluate association of (new) β‐CN protein variants with milk production traits and milk protein composition. Screening of the β‐CN gene in genomic DNA from 72 Holstein Friesian (HF) bulls resulted in detection of 19 polymorphisms and revealed the presence of β‐CN protein variant I in the Dutch HF population. Studies of association of β‐CN protein variants with milk composition usually do not discriminate protein variant I from variant A2. Association of β‐CN protein variants with milk composition was studied in 1857 first‐lactation HF cows and showed that associations of protein variants A2 and I were quite different for several traits. β‐CN protein variant I was significantly associated with protein percentage and protein yield, and with αs1‐casein (αs1‐CN), αs2‐casein (αs2‐CN), κ‐casein (κ‐CN), α‐lactalbumin (α‐LA), β‐lactoglobulin (β‐LG), casein index and casein yield. Inferring β‐κ‐CN haplotypes showed that β‐CN protein variant I occurred only with κ‐CN variant B. Consequently, associations of β‐κ‐CN haplotype IB with protein percentage, κ‐CN, α‐LA, β‐LG and casein index are likely resulting from associations of κ‐CN protein variant B, while associations of β‐κ‐CN haplotype IB with αs1‐CN and αs2‐CN seem to be resulting from associations of β‐CN variant I.
20.
Protein folding has been studied extensively for decades, yet our ability to predict how proteins reach their native state from a mechanistic perspective is still rudimentary at best, limiting our understanding of folding‐related processes in vivo and our ability to manipulate proteins in vitro. Here, we investigate the in vitro refolding mechanism of a large β‐helix protein, pertactin, which has an extended, elongated shape. At 55 kDa, this single domain, all‐β‐sheet protein allows detailed analysis of the formation of β‐sheet structure in larger proteins. Using a combination of fluorescence and far‐UV circular dichroism spectroscopy, we show that the pertactin β‐helix refolds remarkably slowly, with multiexponential kinetics. Surprisingly, despite the slow refolding rates, large size, and β‐sheet‐rich topology, pertactin refolding is reversible and not complicated by off‐pathway aggregation. The slow pertactin refolding rate is not limited by proline isomerization, and 30% of secondary structure formation occurs within the rate‐limiting step. Furthermore, site‐specific labeling experiments indicate that the β‐helix refolds in a multistep but concerted process involving the entire protein, rather than via initial formation of the stable core substructure observed in equilibrium titrations. Hence pertactin provides a valuable system for studying the refolding properties of larger, β‐sheet‐rich proteins, and raises intriguing questions regarding the prevention of aggregation during the prolonged population of partially folded, β‐sheet‐rich refolding intermediates. Proteins 2010. © 2009 Wiley‐Liss, Inc.