Similar articles
 20 similar articles found (search time: 31 ms)
1.
Knowledge of amino acid composition, alone, is verified here to be sufficient for recognizing the structural class, α, β, α+β, or α/β of a given protein with an accuracy of 81%. This is supported by results from exhaustive enumerations of all conformations for all sequences of simple, compact lattice models consisting of two types (hydrophobic and polar) of residues. Different compositions exhibit strong affinities for certain folds. Within the limits of validity of the lattice models, two factors appear to determine the choice of particular folds: 1) the coordination numbers of individual sites and 2) the size and geometry of non-bonded clusters. These two properties, collectively termed the distribution of non-bonded contacts, are quantitatively assessed by an eigenvalue analysis of the so-called Kirchhoff or adjacency matrices obtained by considering the non-bonded interactions on a lattice. The analysis permits the identification of conformations that possess the same distribution of non-bonded contacts. Furthermore, some distributions of non-bonded contacts are favored entropically, due to their high degeneracies. Thus, a competition between enthalpic and entropic effects is effective in determining the choice of a distribution for a given composition. Based on these findings, an analysis of non-bonded contacts in protein structures was made. The analysis shows that proteins belonging to the four distinct folding classes exhibit significant differences in their distributions of non-bonded contacts, which more directly explains the success in predicting structural class from amino acid composition. Proteins 29:172–185, 1997. Published 1997 Wiley-Liss, Inc.
  • 1 This article is a US Government work and, as such, is in the public domain in the United States of America.
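As an illustration of the contact analysis described in entry 1, the sketch below builds a Kirchhoff matrix of non-bonded contacts for a chain on a 2D square lattice. The function name `kirchhoff_matrix`, the example conformation, and the sign convention (off-diagonal -1 for a non-bonded contact, diagonal equal to the number of non-bonded contacts at a site) are assumptions made for illustration, not the authors' code. The eigenvalue step mentioned in the abstract is not reproduced here.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def kirchhoff_matrix(conformation: List[Coord]) -> List[List[int]]:
    """Kirchhoff (Laplacian) matrix of non-bonded lattice contacts.

    Assumed convention: entry (i, j) is -1 when residues i and j sit on
    adjacent lattice sites but are not chain neighbours (|i - j| > 1);
    the diagonal holds each site's number of non-bonded contacts.
    """
    n = len(conformation)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # skip the bonded neighbour i + 1
            (xi, yi), (xj, yj) = conformation[i], conformation[j]
            if abs(xi - xj) + abs(yi - yj) == 1:  # adjacent lattice sites
                matrix[i][j] = matrix[j][i] = -1
                matrix[i][i] += 1
                matrix[j][j] += 1
    return matrix


# Hypothetical compact conformation: a 6-residue chain filling a 2x3 block.
chain = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
K = kirchhoff_matrix(chain)
coordination = [K[i][i] for i in range(len(chain))]
```

Under this convention every row of the matrix sums to zero, and the diagonal gives the per-site count of non-bonded contacts, which is one way to read the "coordination numbers" mentioned in the abstract.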

    2.
    Interresidue pair contacts were analyzed in detail for four pairs of protein structures solved using X-ray analysis (X-ray) and nuclear magnetic resonance (NMR). In the four NMR structures, at distances of ≤4.0 Å, the total number of pair contacts was 4–9% lower and, in general, the pair contacts were 0.02–0.16 Å shorter compared to the X-ray structures. Each of the four structural pairs contained 83–94% common pair contacts (CPCs), which were formed by identical residues in both structures; the other 6–17% were longer intrinsic pair contacts (IPCs) formed by different residues in NMR and X-ray structures, while the latter contained more IPC. Every NMR structure contained three types of CPC that were shorter, longer, or equal to the identical contact pairs in the X-ray structure of this protein. Methodologically different short CPCs prevailed at a known distance dependence of the interresidue contact density in 60–61 pairs of NMR/X-ray structures. Among the analyzed four structural pairs, contact shortening appeared upon the energy minimization of the crambin NMR structure and upon solving the ubiquitin, hen lysozyme, and monomeric hemoglobin NMR structures using X-PLOR software with decreased van der Waals atomic radii. The degree of contact shortening in the NMR structures diminished with an increase in the NMR data used to solve these structures. Among the 60 pairs of NMR/X-ray structures, the major difference between α-helical and β-structural proteins in the dependences on interresidue distances of average contact density appeared due to strong α/β differences in the backbone local geometry.

    3.
    Structural trees for large protein superfamilies, such as β proteins with the aligned β sheet packing, β proteins with the orthogonal packing of α helices, two-layer and three-layer α/β proteins, have been constructed. The structural motifs having unique overall folds and a unique handedness are taken as root structures of the trees. The larger protein structures of each superfamily are obtained by a stepwise addition of α helices and/or β strands to the corresponding root motif, taking into account a restricted set of rules inferred from known principles of the protein structure. Among these rules, prohibition of crossing connections, attention to handedness and compactness, and a requirement for α helices to be packed in α-helical layers and β strands in β layers are the most important. Proteins and domains whose structures can be obtained by stepwise addition of α helices and/or β strands to the same root motif can be grouped into one structural class or a superfamily. Proteins and domains found within branches of a structural tree can be grouped into subclasses or subfamilies. Levels of structural similarity between different proteins can easily be observed by visual inspection. Within one branch, protein structures having a higher position in the tree include the structures located lower. Proteins and domains of different branches have the structure located in the branching point as the common fold. Proteins 28:241–260, 1997. © 1997 Wiley-Liss Inc.

    4.
    5.
    Here we present a systematic analysis of accessible surface areas and hydrogen bonds of 2554 globular proteins from four structural classes (all-α, all-β, α/β and α+β proteins). The analysis aims to determine in which structural class the accessible surface area increases most rapidly with increasing protein molecular mass, and what structural peculiarities are responsible for this effect. The beta structural class of proteins was found to be the leader, with the following possible explanations. First, in beta structural proteins, the fraction of residues not included in the regular secondary structure is the largest, and second, the accessible surface area of packaged elements of the beta-structure increases more rapidly with increasing molecular mass in comparison with the alpha-structure. Moreover, in the beta structure, the probability of formation of backbone hydrogen bonds is higher than that in the alpha helix for all residues of α+β proteins (the average probability is 0.73±0.01 for the beta-structure and 0.60±0.01 for the alpha-structure without proline) and α/β proteins, except for asparagine, aspartic acid, glycine, threonine, and serine (0.70±0.01 for the beta-structure and 0.60±0.01 for the alpha-structure without the proline residue). There is a linear relationship between the number of hydrogen bonds and the number of amino acid residues in the protein (number of hydrogen bonds = 0.678 × number of residues − 3.350).
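The linear relationship quoted at the end of entry 5 can be evaluated directly. The following is a minimal sketch; the function name `predicted_hydrogen_bonds` and the 100-residue example are illustrative assumptions, and only the two coefficients come from the abstract.

```python
def predicted_hydrogen_bonds(n_residues: int) -> float:
    """Linear fit quoted in the abstract: 0.678 * residues - 3.350."""
    return 0.678 * n_residues - 3.350


# Hypothetical 100-residue protein: roughly 64 hydrogen bonds by this fit.
estimate = predicted_hydrogen_bonds(100)
```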

    6.
    We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (33–300 residues), amino acid compositions, and numbers of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point-per-residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all-α: 4.77 Å, all-β: 2.93 Å, α/β: 3.09 Å, α+β: 4.89 Å on average, and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% of the all-α, all-β, α/β and α+β classes, respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of α and β folds. We find that α/β proteins with alternating α and β segments (such as the β-barrel) are more stable than proteins in other fold classes.

    7.
    The secondary structure of DnaA protein and its interaction with DNA and ribonucleotides has been predicted using biochemical, biophysical techniques, and prediction methods based on multiple-sequence alignment and neural networks. The core of all proteins from the DnaA family consists of an “open twisted α/β structure,” containing five α-helices alternating with five β-strands. In our proposed structural model the interior of the core is formed by a parallel β-sheet, whereas the α-helices are arranged on the surface of the core. The ATP-binding motif is located within the core, in a loop region following the first β-strand. The N-terminal domain (80 aa) is composed of two α-helices, the first of which contains a potential leucine zipper motif for mediating protein-protein interaction, followed by a β-strand and an additional α-helix. The N-terminal domain and the α/β core region of DnaA are connected by a variable loop (45–70 aa); major parts of the loop region can be deleted without loss of protein activity. The C-terminal DNA-binding domain (94 aa) is mostly α-helical and contains a potential helix-loop-helix motif. DnaA protein does not dimerize in solution; instead, the two longest C-terminal α-helices could interact with each other, forming an internal “coiled coil” and exposing highly basic residues of a small loop region on the surface, probably responsible for DNA backbone contacts. © 1997 Wiley-Liss Inc.

    8.
    One major problem with existing algorithms for the prediction of protein structural classes is their low accuracy for proteins from the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rationally designed features, an 11-dimensional vector prediction method was proposed. With this method, the overall prediction accuracy on the 25PDB dataset was 1.5% higher than that of the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from the α+β class on the 25PDB dataset was 5% higher than that of the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.

    9.
    The knowledge collated from the known protein structures has revealed that proteins are usually folded into four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict a protein's structural class from its primary structure; however, it has been observed that these methods fail or perform poorly in the cases of distantly related sequences. In this paper, we propose a new method for protein structural class prediction using a dataset of low-homology (twilight-zone) protein sequences. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method for protein structural class prediction that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. Examination of individual features as well as feature combinations revealed that the combination of secondary structural content and the secondary structural and solvent accessibility state frequencies of amino acids gave rise to the best leave-one-out cross-validation accuracy of ~81%, which is comparable to the best accuracy reported in the literature so far.

    10.
    《Biochimie》2013,95(9):1741-1744
    In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for the α/β and α + β classes based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 40% and 25%, respectively. Our method outperforms the recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on the prediction accuracies, especially for the α/β and α + β classes.

    11.
    Identification and study of the main principles underlying the kinetics and thermodynamics of protein folding generate a new insight into the factors that control this process. Statistical analysis of the radius of gyration for 3769 protein domains of four major classes (α, β, α/β, and α + β) showed that each class has a characteristic radius of gyration that determines the protein structure compactness. For instance, α proteins have the highest radius of gyration throughout the protein size range considered, suggesting a less tight packing as compared with β- and (α + β)-proteins. The lowest radius of gyration and, accordingly, the tightest packing are characteristic of α/β-proteins. The protein radius of gyration normalized by the radius of gyration of a ball with the same volume is independent of the protein size, in contrast to compactness and the number of contacts per residue.
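The two quantities compared in entry 11 can be sketched as follows, assuming equal-mass points and a uniform solid ball (for which the radius of gyration is sqrt(3/5) times the ball radius). The function names and the toy coordinates are hypothetical, and the abstract does not say how the protein volume was obtained, so the volume is left as an input.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(points: Sequence[Point]) -> float:
    """Root-mean-square distance of equal-mass points from their centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    squared = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in points
    )
    return math.sqrt(squared / n)


def ball_radius_of_gyration(volume: float) -> float:
    """Radius of gyration of a uniform solid ball with the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return math.sqrt(3.0 / 5.0) * radius


# Toy coordinates (not protein data): four points one unit from the origin.
rg = radius_of_gyration([(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)])
ball_rg = ball_radius_of_gyration(4.0 / 3.0 * math.pi)  # unit-radius ball
normalized = rg / ball_rg
```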

    12.
    Based on available experimental data and using a theoretical model of protein folding, we demonstrate that there is an optimal ratio between the average conformational entropy and the average contact energy per residue for fast protein folding. A statistical analysis of the conformational entropy and the number of contacts per residue for 5829 protein domains from four main classes (α, β, α/β, α+β) shows that each class has its own characteristic average number of contacts per residue and average conformational entropy per residue. These class-specific characteristics determine the protein folding rates: α-proteins are the fastest to fold, β-proteins are the second fastest, α+β-proteins are the third, and α/β-proteins are the last to fold.

    13.
    The three-dimensional solution structure of maize nonspecific lipid transfer protein (nsLTP) obtained by nuclear magnetic resonance (NMR) is compared to the X-ray structure. Although both structures are very similar, some local structural differences are observed in the first and the fourth helices and in several side-chain conformations. These discrepancies arise partly from intermolecular contacts in the crystal lattice. The main characteristic of nsLTP structures is the presence of an internal hydrophobic cavity whose volume was found to vary from 237 to 513 Å³ without major variations in the 15 solution structures. Comparison of crystal and NMR structures shows the existence of another small hollow at the periphery of the protein containing a water molecule in the X-ray structure, which could play an important structural role. A model of maize nsLTP complexed with α-lysopalmitoylphosphatidylcholine was built by docking the lipid inside the protein cavity of the NMR structure. The main structural feature is a hydrogen bond, also found in the X-ray structure of the maize nsLTP/palmitate complex, between the hydroxyl of Tyr81 and the carbonyl of the lipid. Comparison of 12 primary sequences of nsLTPs emphasizes that all residues delineating the cavities calculated on solution and X-ray structures are conserved, which suggests that this large cavity is a common feature of all compared plant nsLTPs. Furthermore, several conserved basic residues seem to be involved in the stabilization of the protein architecture. Proteins 31:160–171, 1998. © 1998 Wiley-Liss, Inc.

    14.
    Rahul Kaushik  Kam Y. J. Zhang 《Proteins》2020,88(10):1271-1284
    The infinitesimally small sequence space naturally scouted in the millions of years of evolution suggests that the natural proteins are constrained by some functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that implements sequence and corresponding secondary structural information at tripeptide levels to differentiate natural and nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210 000 natural and nonnatural protein sequences and benchmarked with existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and percentage accuracy (0.81%, 0.95%, and 91% respectively) of the fitness function demonstrates its potential application for sampling the protein sequences with higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are separately analyzed and β proteins are found to score slightly lower as compared to other classes. Further, an analysis of about 250 designed proteins (adopted from previously reported cases) helped to define the boundaries for sampling the ideal protein sequences. The protein sequence characterization aided by the proposed fitness function could facilitate the exploration of new perspectives in the design of novel functional proteins.
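For readers unfamiliar with the three metrics reported in entry 14, the sketch below computes them from confusion-matrix counts using the standard definitions. The function name and the counts are hypothetical and are not taken from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard sensitivity, specificity and accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of positives recovered
        "specificity": tn / (tn + fp),  # fraction of negatives recovered
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 50 natural and 50 nonnatural sequences.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```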

    15.
    Chao Zhang 《Proteins》1998,31(3):299-308
    In this study, we exploited an elementary 2-dimensional square lattice model of HP polymers to test the premise of extracting contact energies from protein structures. Given a set of prespecified energies for H–H, H–P, and P–P contacts, all possible sequences of various lengths were exhaustively enumerated to find sequences that have unique lowest-energy conformations. The lowest-energy structures (or native structures) of such (native) sequences were used to extract contact energies using the Miyazawa-Jernigan procedure and here-defined reference state. The relative magnitudes of the original energies were restored reasonably well, but the extracted contact energies were independent of the absolute magnitudes of the initial energies. We turned to a more detailed characterization of the energy landscapes of the native sequences in light of a new theoretical framework on protein folding. Foldability of such sequences imposes two limits on the absolute value of the prespecified energies: a lower bound entailed by the minimum requirement for thermodynamic stability and an upper bound associated with the entrapment of the chain in local minima. We found that these two limits confine the prespecified energy values to a rather narrow range which, surprisingly, also contains the extracted energies in all the cases examined. These results indicate that the quasi-chemical approximation can be used to connect quantitatively the occurrence of various residue–residue contacts in an ensemble of native structures with the energies of the contacts. More importantly, they suggest that the extracted contact energies do contain information on structural stability and can be used to estimate actual structural energetics. This study also encourages the use of structure-derived contact energies in threading. The finding that there is a rather narrow range of energies that are optimal for folding a sequence also cautions against the use of an arbitrary energy Hamiltonian in minimal folding models. Proteins 31:299–308, 1998. © 1998 Wiley-Liss, Inc.
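The exhaustive enumeration described in entry 15 can be sketched for a single short sequence as follows. This is a minimal illustration rather than the author's procedure: the function names, the contact energies, and the example sequence are assumptions, and the Miyazawa-Jernigan extraction step is not shown. Rotations and mirror images are removed by fixing the first bond and the direction of the first turn.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(length: int) -> List[List[Coord]]:
    """Self-avoiding walks with `length` >= 2 sites, one per symmetry class."""
    results: List[List[Coord]] = []

    def extend(path: List[Coord], turned: bool) -> None:
        if len(path) == length:
            results.append(list(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied
            if not turned and dy == -1:
                continue  # first turn must go "up": removes mirror images
            path.append(nxt)
            extend(path, turned or dy != 0)
            path.pop()

    extend([(0, 0), (1, 0)], False)  # first bond fixed: removes rotations
    return results


def energy(seq: str, path: List[Coord], contact: Dict[str, float]) -> float:
    """Sum of contact energies over non-bonded nearest-neighbour pairs."""
    total = 0.0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            (xi, yi), (xj, yj) = path[i], path[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                total += contact["".join(sorted(seq[i] + seq[j]))]
    return total


def ground_state(seq: str, contact: Dict[str, float]) -> Tuple[float, int]:
    """Lowest energy and the number of conformations that reach it."""
    energies = [energy(seq, p, contact) for p in conformations(len(seq))]
    lowest = min(energies)
    return lowest, sum(1 for e in energies if e == lowest)


# Hypothetical prespecified energies; keys are sorted residue pairs.
ENERGIES = {"HH": -1.0, "HP": 0.0, "PP": 0.0}
lowest, degeneracy = ground_state("HPPH", ENERGIES)
```

A sequence counts as "native" in the sense of the abstract when the returned degeneracy is 1, that is, when its lowest-energy conformation is unique. The brute-force search grows exponentially with chain length, so this form is only practical for short chains.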

    16.
    R. Rajgaria  Y. Wei  C. A. Floudas 《Proteins》2010,78(8):1825-1846
    An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as the sum of a Cα–Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β-sheet alignments. These β-sheet alignments are used as constraints for contacts between residues of β-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was ~61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. Proteins 2010. © 2010 Wiley-Liss, Inc.

    17.
    Importance of long-range interactions in protein folding   Cited by: 2 (self-citations: 0, citations by others: 2)
    Long-range interactions play an active role in the stability of protein molecules. In this work, we have analyzed the importance of long-range interactions in different structural classes of globular proteins in terms of residue distances. We found that 85% of residues are involved in long-range contacts. The residues occurring in the range of 4-10 residues apart contribute more towards long-range contacts in all-alpha proteins while the range is 11-20 in all-beta proteins. The hydrophobic residues Cys, Ile and Val prefer the 11-20 range and all other residues prefer the 4-10 range. The residues in all-beta proteins have an average of 3-8 long-range contacts whereas the residues in other classes have 1-4 long-range contacts. Furthermore, the preference of residue pairs to the folding and stability will be discussed.

    18.
    Proteins are generally classified into four structural classes: all-alpha proteins, all-beta proteins, alpha + beta proteins, and alpha/beta proteins. In this article, a protein is expressed as a vector of 20-dimensional space, in which its 20 components are defined by the composition of its 20 amino acids. Based on this, a new method, the so-called maximum component coefficient method, is proposed for predicting the structural class of a protein according to its amino acid composition. In comparison with the existing methods, the new method yields a higher general accuracy of prediction. Especially for the all-alpha proteins, the rate of correct prediction obtained by the new method is much higher than that by any of the existing methods. For instance, for the 19 all-alpha proteins investigated previously by P.Y. Chou, the rate of correct prediction by means of his method was 84.2%, but the correct rate when predicted with the new method would be 100%! Furthermore, the new method is characterized by an explicable physical picture. This is reflected by the process in which the vector representing a protein to be predicted is decomposed into four component vectors, each of which corresponds to one of the norms of the four protein structural classes.
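The 20-dimensional representation used in entry 18 can be sketched as below. Only the composition vector is shown; the abstract does not give enough detail to reproduce the maximum component coefficient decomposition itself. The function name, the alphabetical residue ordering, and the toy sequence are assumptions.

```python
from collections import Counter
from typing import List

# One-letter codes of the 20 standard amino acids (ordering is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence: str) -> List[float]:
    """Fraction of each standard amino acid; other symbols are ignored."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, not a real protein: half alanine, half cysteine.
vector = composition_vector("ACCA")
```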

    19.
    Recent large-scale data sets of protein complex purifications have provided unprecedented insights into the organization of cellular protein complexes. Several computational methods have been developed to detect co-complexed proteins in these data sets. Their common aim is the identification of biologically relevant protein complexes. However, much less is known about the network of direct physical protein contacts within the detected protein complexes. Therefore, our work investigates whether direct physical contacts can be computationally derived by combining raw data of large-scale protein complex purifications. We assess four established scoring schemes and introduce a new scoring approach that is specifically devised to infer direct physical protein contacts from protein complex purifications. The physical contacts identified by the five methods are comprehensively benchmarked against different reference sets that provide evidence for true physical contacts. Our results show that raw purification data can indeed be exploited to determine high-confidence physical protein contacts within protein complexes. In particular, our new method outperforms competing approaches at discovering physical contacts involving proteins that have been screened multiple times in purification experiments. It also excels in the analysis of recent protein purification screens of molecular chaperones and protein kinases. In contrast to previous findings, we observe that physical contacts inferred from purification experiments of protein complexes can be qualitatively comparable to binary protein interactions measured by experimental high-throughput assays such as yeast two-hybrid. This suggests that computationally derived physical contacts might complement binary protein interaction assays and guide large-scale interactome mapping projects by prioritizing putative physical contacts for further experimental screens.

    20.
    Search and study of the general principles that govern kinetics and thermodynamics of protein folding generate a new insight into the factors controlling this process. Here, based on the known experimental data and using theoretical modeling of protein folding, we demonstrate that there exists an optimal relationship between the average conformational entropy and the average energy of contacts per residue (that is, an entropy capacity) for fast protein folding. Statistical analysis of conformational entropy and number of contacts per residue for 5829 protein structures from four general structural classes (all-alpha, all-beta, alpha/beta, alpha+beta) demonstrates that each class of proteins has its own class-specific average number of contacts (class alpha/beta has the largest number of contacts) and average conformational entropy per residue (class all-alpha has the largest number of rotatable angles phi, psi, and chi per residue). These class-specific features determine the folding rates: alpha proteins are the fastest folding proteins, then follow beta and alpha+beta proteins, and finally alpha/beta proteins are the slowest ones. Our result is in agreement with the experimental folding rates for 60 proteins. This suggests that structural and sequence properties are important determinants of protein folding rates.
