Similar literature
 Found 20 similar articles; search time: 15 ms
1.
Approaching a complete classification of protein secondary structure   Total citations: 2, self-citations: 0, citations by others: 2
A complete classification of protein secondary structure types is developed on the basis of computer analysis of the crystallographic structural data deposited in the Protein Data Bank. The majority of amino acid residues fall into five conformation types. A conclusion is drawn that the number of sequence variants of the torsion angles phi, psi in globular proteins is limited and is substantially smaller than the number of possible amino acid sequences for the same chain length. Along with alpha-helix and beta-structure, the distribution analysis, which assigns every maximum of the distribution of amino acid conformations on the Ramachandran map to a certain type of secondary structure, revealed a third type of secondary structure that was previously neglected. This type of structure is an extended left-handed helical conformation, designated the mobile (M-) conformation. A full set of M-conformation fragments, which seem to play a major role in protein globule dynamics, has been obtained, and a small radius of correlation for the polypeptide chain in M-conformation is demonstrated. This explains the prevalence of short segments of mobile conformation found in globular proteins. For each secondary structure type, the frequency of occurrence of amino acid residues has been computed.

2.
In this study we present an accurate secondary structure prediction procedure by using a query and related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence rather than utilization of a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states after jack-knife tests. The gain in prediction accuracy compared with the existing techniques, which are at best 72%, is achieved by secondary structure propensities based on both local and long-range effects, utilization of similar sequence information in the form of carefully selected pairwise alignment fragments, and reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable. Proteins 27:329–335, 1997. © 1997 Wiley-Liss, Inc.
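The "residue-by-residue accuracy in three structural states" quoted in entry 2 is commonly written Q3. As a minimal sketch, assuming structures are encoded as equal-length strings over H (helix), E (strand) and C (coil), it can be computed as below. The function name `q_accuracy` and the example strings are illustrative assumptions, not taken from the paper.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state.

    With a three-letter alphabet (H/E/C) this is the Q3 score; the same
    function gives Q8 when both strings use an eight-state alphabet.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example (hypothetical data): 16 residues, two of them mispredicted.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
print(f"Q3 = {q_accuracy(predicted, observed):.3f}")
```

Multiplying the returned fraction by 100 gives the percentage form used in the abstracts.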

3.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on the 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for proteins that have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those in the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length and with fully known secondary structure information, we end up predicting model structures that are far from their corresponding native structures.

4.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on the 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for proteins that have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those in the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length and with fully known secondary structure information, we end up predicting model structures that are far from their corresponding native structures.

5.
Protein eight-state secondary structure prediction is challenging, but is necessary to determine protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully utilized to predict three-state secondary structures. Now we employ an eight-state SPSSM as a feature that is obtained from sequence-structure alignment against a large database of 9 million sequences with putative structural information. SPSSM8 uses a low sequence identity dataset (9062 entries) as a training set and a conditional random field as the classification algorithm. SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) on an independent testing set (463 entries), an improvement in accuracy of 10.1% and 4.6% compared with SSPro8 and CNF, respectively, which significantly improves the accuracy of eight-state secondary structure prediction. On the CASP9 dataset (92 entries), SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). SPSSM8 was confirmed as an outstanding predictor of eight-state secondary structures of proteins. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
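Entry 5 reports both Q8 and Q3 for the same predictions, which implies collapsing the eight states into three. A minimal sketch of one widely used reduction of DSSP codes follows; the abstract does not say which mapping SPSSM8 uses, so the table below (H, G, I to helix; E, B to strand; everything else to coil) is an assumption, and other conventions exist.

```python
# Assumed reduction of DSSP eight-state codes to three states.
# H = alpha-helix, G = 3-10 helix, I = pi-helix, E = strand, B = bridge,
# T = turn, S = bend, "-" or " " = no assigned structure.
DSSP_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "-": "C", " ": "C",
}


def reduce_to_q3(dssp8: str) -> str:
    """Map an eight-state string to H/E/C; raises KeyError on unknown codes."""
    return "".join(DSSP_TO_Q3[state] for state in dssp8)


# Toy example (hypothetical assignment string).
print(reduce_to_q3("-HHHHGGGTTSEEEB-"))
```

Because the choice of mapping changes the resulting Q3 value, comparisons between methods are only meaningful when the same reduction is applied to all of them.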

6.
Paramagnetic relaxation has been used to monitor the formation of structure in the folding peptide chain of guanidinium chloride-denatured acyl-coenzyme A-binding protein. The spin label (1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl)methanesulfonate (MTSL) was covalently bound to a single cysteine residue introduced into five different positions in the amino acid sequence. It was shown that the formation of structure in the folding peptide chain under conditions where 95% of the sample is unfolded brings the relaxation probe close to a wide range of residues in the peptide chain, which are not affected in the native folded structure. It is suggested that the experiment is recording the formation of many discrete and transient structures in the polypeptide chain in the preface of protein folding. Analysis of secondary chemical shifts shows a high propensity for alpha-helix formation in the C-terminal part of the polypeptide chain, which forms an alpha-helix in the native structure, and a high propensity for turn formation in two regions of the polypeptide that form turns in the native structure. The results contribute to the idea that native-like structural elements form transiently in the unfolded state, and that these may be of importance to the initiation of protein folding.

7.
The secondary and tertiary structure of T4 bacteriophage dihydrofolate reductase is investigated by vacuum ultraviolet circular dichroism (CD) spectroscopy and probability analysis of the primary amino acid sequence. The far ultraviolet CD spectrum of the enzyme in the range of 260-178 nm is analyzed by the generalized inverse and variable selection methods developed by our laboratory. Variable selection yields an average content of 26% alpha-helix, 21% antiparallel beta-sheet, 10% parallel beta-sheet, 20% beta-turns, and 32% "other" structures within the T4 protein. The characteristic peaks of the CD spectrum indicate that the enzyme has a lot of antiparallel beta-sheet, which is typical of the alpha + beta tertiary class of globular proteins. The secondary structure of the protein is also analyzed by using four statistical methods on the amino acid sequence. Although the secondary structures predicted by each individual statistical method vary to a considerable extent, the fractions of each structure jointly predicted by a majority of the methods are in excellent agreement with our CD analysis. The alternating arrangement for some segments of alpha-helix and beta-sheet predicted from primary structure to be within the enzyme is characteristic of proteins containing parallel beta-sheet. This supports our conclusion that the protein contains both parallel and antiparallel beta-sheet structures, but finding both types of beta-sheet also means that the protein may have the variation on alpha/beta tertiary structure recently found in EcoRI endonuclease and thymidylate synthase. These observations, in conjunction with other physical properties of the T4 reductase, suggest that the enzyme perhaps shares an evolution in common with the dihydrofolate reductases derived from type I R-plasmids rather than with the host-cell protein.

8.
Lee S, Lee BC, Kim D. Proteins 2006, 62(4):1107-1114
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
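The "20 numbers" in entry 8 are the fractions of each standard amino acid, averaged over a protein and its homologs. A minimal sketch is given below. The abstract does not specify how the averaging over the PSI-BLAST alignment is weighted, so an unweighted mean of per-sequence compositions (ignoring gaps and non-standard letters) is assumed here, and the function names are illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def composition(sequence: str) -> list[float]:
    """Fraction of each standard amino acid; gaps and other symbols are ignored."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def homolog_averaged_composition(sequences: list[str]) -> list[float]:
    """Unweighted mean of per-sequence compositions (assumed averaging scheme)."""
    if not sequences:
        raise ValueError("need at least one sequence")
    vectors = [composition(seq) for seq in sequences]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(AMINO_ACIDS))]


# Toy aligned homologs (hypothetical); '-' marks alignment gaps.
homologs = ["MKV-LAAG", "MKVALSAG", "MRV-LAAG"]
features = homolog_averaged_composition(homologs)
print(dict(zip(AMINO_ACIDS, (round(x, 3) for x in features))))
```

The resulting 20-element vector is the kind of input that a regression model could then map to helix and strand content.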

9.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of C alpha = 0.41, C beta = 0.31 and C coil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method on homologous proteins is much better than on non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.

10.
As a result of statistical analysis of the Protein Data Bank, a new type of secondary structure was found in globular proteins. It is the mobile (M) conformation, characterised by noncooperative hydration and increased dynamical properties of the chain. The percentage distribution of amino acid residues between the main secondary structure types is 42.7% for alpha-helix, 19.6% for beta-structure and 19.1% for M-conformation. The most frequently occurring amino acids in the M-conformation are proline, cysteine and serine. Fragments of mobile conformation seem to play a major part in the local and domain dynamics of the protein globule.

11.
Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein’s function in the cell. Understanding a protein’s secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By assessing our method on 2 test sets, we show how incorporation of multiple sequence alignment data, similarly to PSIPRED, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.

12.
Only a minute fraction of all possible protein sequences can exist in the genomes of all life forms. To explore whether physicochemical constraints or a lack of need causes the paucity of different protein folds, we set out to construct protein libraries without any restriction of topology. We generated different libraries (all alpha-helix, all beta-strand, and alpha-helix plus beta-strand) with an average length of 100 amino acid residues, composed of designed secondary structure modules (alpha-helix, beta-strand, and beta-turn) in various proportions, based primarily on the patterning of polar and nonpolar residues. We wished to explore that part of sequence space that is rich in secondary structure. The analysis of randomly chosen clones from each of the libraries showed that, despite the low sequence homology to known protein sequences, a substantial proportion of the library members containing alpha-helix modules were indeed helical, possessed a defined oligomerization state, and showed cooperative chemical unfolding behavior. On the other hand, proteins composed of mainly beta-strand modules tended to form amyloid-like fibrils and were among the least soluble proteins ever reported. We found that a large fraction of members in non-beta-strand-containing protein libraries that are distant from natural proteins in sequence space possess unexpectedly favorable properties. These results reinforce the efficacy of applying binary patterning to design proteins with native-like properties despite lack of restriction in topology. Because of the intrinsic tendency of beta-strand modules to aggregate, their presence requires precise topologic arrangement to prevent fibril formation.

13.
Prediction of the Secondary Structure of Myelin Basic Protein   Total citations: 14, self-citations: 10, citations by others: 4
An investigation into the probable secondary structure of the myelin basic protein was carried out by the application of three procedures currently in use to predict the secondary structures of proteins from knowledge of their amino acid sequences. In order to increase the accuracy of the predictions, the amino acid substitutions that occur in the basic protein from different species were incorporated into the predictive algorithms. It was possible to locate regions of probable alpha-helix, beta-structure, beta-turn, and unordered conformation (coil) in the protein. One of the predictive methods introduces a bias into the algorithm to maximize or minimize the amounts of alpha-helix and/or beta-structure present; this made it possible to assess how conditions such as pH and protein concentration or the presence of anionic amphiphilic molecules could influence the protein's secondary structure. The predictions made by the three methods were in reasonably good agreement with one another. They were consistent with experimental data, provided that the stabilizing or destabilizing effects of the environment were taken into account. According to the predictions, the extent of possible alpha-helix and beta-structure formation in the protein is severely restricted by the low frequency and extensive scattering of hydrophobic residues, along with a high frequency and extensive scattering of residues that favor the formation of beta-turns and coils. Neither prolyl residues nor cationic residues per se are responsible for the low content of alpha-helix predicted in the protein. The principal ordered conformation predicted is the beta-turn. Many of the predicted beta-turns overlap extensively, involving in some cases up to 10 residues. In some of these structures it is possible for the peptide backbone to oscillate in a sinusoidal manner, generating a flat, pleated sheetlike structure. Cationic residues located in these structures would appear to be ideally oriented for interaction with lipid phosphate groups located at the cytoplasmic surface of the myelin membrane. An analysis of possible and probable conformations that the triproline sequence could assume questions the popular notion that this sequence produces a hairpin turn in the basic protein.

14.
Wrabl JO, Grishin NV. Proteins 2005, 61(3):523-534
Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal properties was more complex. Although the principal components accounting for the largest variances exhibited modest correlations with hydrophobicity and conservation of glycine, in general principal components did not correspond to physical properties of amino acids. Although not intuitive, these amino acid mathematical properties were demonstrated to be robust and to improve local pairwise alignment accuracy, relative to 20 amino acid frequencies alone, for a simple test case.

15.
A suite of FORTRAN programs, PREF, is described for calculating preference functions from the data base of known protein structures and for comparing smoothed profiles of sequence-dependent preferences in proteins of unknown structure. Amino acid preferences for a secondary structure are considered as functions of a sequence environment. The sequence environment of an amino acid residue in a protein is defined as an average over some physical, chemical, or statistical property of its primary structure neighbors. The frequency distribution of sequence environments in the data base of soluble protein structures is approximately normal for each amino acid type of known secondary conformation. An analytical expression for the dependence of preferences on sequence environment is obtained after each frequency distribution is replaced by the corresponding Gaussian function. The preference for the α-helical conformation increases for each amino acid type with the increase of sequence environment of buried solvent-accessible surface areas. We show that a set of preference functions based on buried surface area is useful for predicting folding motifs in α-class proteins and in integral membrane proteins. The prediction accuracy for helical residues is 79% for 5 integral membrane proteins and 74% for 11 α-class soluble proteins. Most residues found in transmembrane segments of membrane proteins with known α-helical structure are predicted to be indeed in the helical conformation because of very high middle helix preferences. Both extramembrane and transmembrane helices in the photosynthetic reaction center M and L subunits are correctly predicted. We point out in the discussion that our method of conformational preference functions can identify what physical properties of the amino acids are important in the formation of particular secondary structure elements. © 1993 John Wiley & Sons, Inc.
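The "sequence environment" in entry 15 is an average of a per-residue property over a residue's sequence neighbors. The sketch below is one possible reading under stated assumptions: a symmetric window of `half_window` residues on each side, the central residue excluded, and windows truncated at the chain ends. The abstract uses buried surface area as the property; the Kyte-Doolittle hydropathy scale is substituted here purely as a stand-in, and none of this reproduces the PREF programs themselves.

```python
# Kyte-Doolittle hydropathy values, used only as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_environment(sequence: str, scale: dict[str, float],
                         half_window: int = 3) -> list[float]:
    """Mean property value of each residue's neighbors (residue itself excluded).

    Windows are truncated at the chain ends; a residue with no neighbors
    (single-residue chain) gets 0.0. Both choices are assumptions.
    """
    environment = []
    for i in range(len(sequence)):
        lo = max(0, i - half_window)
        hi = min(len(sequence), i + half_window + 1)
        neighbors = [scale[sequence[j]] for j in range(lo, hi) if j != i]
        environment.append(sum(neighbors) / len(neighbors) if neighbors else 0.0)
    return environment


# Toy sequence (hypothetical): a hydrophobic stretch flanked by charged residues.
profile = sequence_environment("KKDLLIVALLGEEK", KYTE_DOOLITTLE)
print([round(x, 2) for x in profile])
```

In the approach the abstract describes, values like these would be the argument of a per-residue preference function; fitting those functions requires structural data not shown here.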

16.
Shestopalov BV. Tsitologiia 2007, 49(7):594-600
One possible route to a complete and final solution of the problem of determining the three-dimensional structure of proteins from amino acid sequence is simulation of the formation of protein three-dimensional structure. The code physics method developed by the author is proposed for this task. The simulation of alpha-helix and beta-hairpin formation in water-soluble proteins, as a first step in realizing this plan, is described here. The results of the simulation were compared with the experimental data for 14 proteins of no more than 50 amino acids, and therefore with a small number of alpha-helices and beta-strands (to meet the limits of the simulation process), and with secondary structure predictions by the best methods of protein secondary structure prediction to date: PSIpred, PORTER and PROFsec. The secondary structure of the proteins obtained by simulating alpha-helix and beta-hairpin formation with the code physics method corresponded completely to the experimental data, while the secondary structure predicted by the PSIpred, PORTER and PROFsec methods differed from these data significantly.

17.
A neural network-based method has been developed for the prediction of beta-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used, where the first, sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second, structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Q(pred), Q(obs), and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published beta-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach.
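The four figures quoted in entry 17 can all be derived from one two-class confusion table. A minimal sketch, assuming the definitions usually used for beta-turn prediction: overall accuracy is (TP+TN)/N, Q(pred) is TP/(TP+FP), Q(obs) is TP/(TP+FN), and the Matthews correlation coefficient is (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The string encoding ("T" for turn, "N" for non-turn) and the function name are illustrative assumptions.

```python
import math


def turn_metrics(predicted: str, observed: str, positive: str = "T") -> dict[str, float]:
    """Two-class quality measures for per-residue turn / non-turn labels."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == positive and o == positive:
            tp += 1          # turn predicted, turn observed
        elif p == positive:
            fp += 1          # turn predicted, non-turn observed
        elif o == positive:
            fn += 1          # turn missed
        else:
            tn += 1          # non-turn correctly predicted

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # convention: 0 when undefined

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "q_total": ratio(tp + tn, tp + fp + tn + fn),
        "q_pred": ratio(tp, tp + fp),
        "q_obs": ratio(tp, tp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy labels (hypothetical): 4 observed turn residues out of 10.
metrics = turn_metrics(predicted="TTNNTNNNNN", observed="TTTTNNNNNN")
print({name: round(value, 3) for name, value in metrics.items()})
```

Reporting the correlation coefficient alongside the percentages matters because turn residues are a minority class, so overall accuracy alone can look high for a predictor that rarely predicts turns.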

18.
1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is explored as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

19.
Secondary structure prediction from amino acid sequence is a key component of protein structure prediction, with current accuracy at approximately 75%. We analysed two state-of-the-art secondary structure prediction methods, PHD and JPRED, comparing predictions with secondary structure assigned by the algorithms DSSP and STRIDE. The specific focus of our study was alpha-helix N-termini, as empirical free energy scales are available for residue preferences at N-terminal positions. Although these prediction methods perform well in general at predicting the alpha-helical locations and length distributions in proteins, they perform less well at predicting the correct helical termini. For example, although most predicted alpha-helices overlap a real alpha-helix (with relatively few completely missed or extra predicted helices), only one-third of JPRED and PHD predictions correctly identify the N-terminus. Analysis of neighbouring N-terminal sequences to predicted helical N-termini shows that the correct N-terminus is often within one or two residues. More importantly, the true N-terminal motif is, on average, more favourable as judged by our experimentally measured free energies. This suggests a simple, but powerful, strategy to improve secondary structure prediction using empirically derived energies to adjust the predicted output to a more favourable N-terminal sequence.

20.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号