Similar literature
20 similar articles found (search time: 31 ms)
1.
Relationship of protein flexibility to thermostability   Total citations: 11 (self-citations: 0, citations by others: 11)
Thermostability of proteins arises from the simultaneous effect of several forces, which in fact lead to decreased flexibility of the polypeptide chain. This is verified by flexibility indices, which are derived from normalized B-values of individual amino acids in several refined three-dimensional structures. Flexibility indices show that overall flexibility is reduced when thermostability is increased. Protein molecules require both flexibility and rigidity to function, but the higher the temperature optimum and stability the more rigid is the structure needed to compensate for increased thermal fluctuations. Flexibilities of proteins performing the same catalytic activity seem to be about the same at their temperature optima, but the more rigid thermostable proteins reach the flexibility of thermolabile proteins at higher temperatures. In several proteins such as allosteric enzymes, some local sites of flexibility are highly conserved. The relevance of reduced flexibility to overall stability of proteins is also discussed. Flexibility indices and profiles can be used in the design of more stable proteins by site-directed mutagenesis.  相似文献   

2.
Protein molecules exhibit varying degrees of flexibility throughout their three-dimensional structures, with some segments showing little mobility while others may be so disordered as to be unresolvable by techniques such as X-ray crystallography. Atomic displacement parameters, or B-factors, from X-ray crystallographic studies give an experimentally determined indication of the degree of mobility in a protein structure. To provide better estimators of amino acid flexibility, we have examined B-factors from a large set of high-resolution crystal structures. Because of the differences among structures, it is necessary to normalize the B-factors. However, many proteins have segments of unusually high mobility, which must be accounted for before normalization can be performed. Accordingly, a median-based method from quality control studies was used to identify outliers. After removal of outliers from, and normalization of, each protein chain, the B-factors were collected for each amino acid in the set. It was found that the distribution of normalized B-factors followed a Gumbel, or extreme value distribution, and the location parameter, or mode, of this distribution was used as an estimator of flexibility for the amino acid. These new parameters have a higher correlation with experimentally determined B-factors than parameters from earlier methods.  相似文献   
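
The outlier-removal and normalization steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that a median-based method from quality control studies was used, so the modified z-score rule (median absolute deviation with a cutoff of 3.5) is an assumption, as are the function name `normalize_b_factors` and the toy B-factor values. Fitting the Gumbel distribution to the normalized values is not shown.

```python
import statistics


def normalize_b_factors(b_factors, cutoff=3.5):
    """Remove median-based outliers from one chain, then z-score the rest.

    Assumption: outliers are flagged with the modified z-score
    0.6745 * (b - median) / MAD; the abstract does not name the exact rule.
    Returns a dict mapping the original residue index to its normalized value.
    """
    med = statistics.median(b_factors)
    mad = statistics.median([abs(b - med) for b in b_factors])

    kept = {}
    for i, b in enumerate(b_factors):
        # With MAD == 0 the score is undefined, so nothing is flagged.
        score = 0.0 if mad == 0 else 0.6745 * (b - med) / mad
        if abs(score) <= cutoff:
            kept[i] = b

    mean = statistics.mean(kept.values())
    sd = statistics.pstdev(kept.values())
    if sd == 0:
        return {i: 0.0 for i in kept}
    return {i: (b - mean) / sd for i, b in kept.items()}


# Toy chain: the last residue has an unusually high B-factor.
chain = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 90.0]
normalized = normalize_b_factors(chain)
```

In this toy chain the final residue is dropped before normalization, so the mean and standard deviation are computed from the six well-ordered residues only.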

3.
We suggest an algorithm that inputs a protein sequence and outputs a decomposition of the protein chain into a regular part including secondary structures and a nonregular part corresponding to loop regions. We have analyzed loop regions in a protein dataset of 3,769 globular domains and defined the optimal parameters for this prediction: the threshold between regular and nonregular regions and the optimal window size for averaging procedures using the scale of the expected number of contacts in a globular state and entropy scale as the number of degrees of freedom for the angles phi, psi, and chi for each amino acid. Comparison with known methods demonstrates that our method gives the same results as the well-known ALB method based on physical properties of amino acids (the percentage of true predictions is 64% against 66%), and worse prediction for regular and nonregular regions than PSIPRED (Protein Structure Prediction Server) without alignment of homologous proteins (the percentage of true predictions is 73%). The potential advantage of the suggested approach is that the predicted set of loops can be used to find patterns of rigid and flexible loops as possible candidates to play a structure/function role as well as a role of antigenic determinants.  相似文献   
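
The window-averaging and thresholding idea in this abstract can be illustrated with a short sketch. Everything specific here is hypothetical: the two-letter toy scale, the window size, the threshold, and the function names are placeholders, not the contact or entropy scales and optimal parameters derived in the paper.

```python
def window_average(values, window):
    """Average each position over a centered window, truncated at the ends."""
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def split_regular_nonregular(sequence, scale, window, threshold):
    """Label each residue 'R' (regular) or 'L' (loop / nonregular).

    Assumption: a smoothed scale value at or above the threshold means
    'regular'; the direction and the threshold value are illustrative only.
    """
    profile = window_average([scale[aa] for aa in sequence], window)
    return "".join("R" if v >= threshold else "L" for v in profile)


# Hypothetical per-residue scale (not taken from the paper).
toy_scale = {"A": 1.0, "G": 0.0}
labels = split_regular_nonregular("AAAAGGGG", toy_scale, window=3, threshold=0.5)
print(labels)  # RRRRLLLL
```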

4.
Hua Zhang  Lukasz Kurgan 《Amino acids》2014,46(12):2665-2680
Knowledge of protein flexibility is vital for deciphering the corresponding functional mechanisms. This knowledge would help, for instance, in improving computational drug design and refinement in homology-based modeling. We propose a new predictor of residue flexibility, expressed as B-factors, from protein chains; it uses local (in the chain) predicted (or native) relative solvent accessibility (RSA) and custom-derived amino acid (AA) alphabets. Our predictor is implemented as a two-stage linear regression model that takes as inputs an RSA-based space in a local sequence window in the first stage and a reduced AA pair-based space in the second stage. The method has an easy-to-comprehend, explicit linear form in both stages. Particle swarm optimization was used to find an optimal reduced AA alphabet to simplify the input space and improve the prediction performance. The average correlation coefficients between the native and predicted B-factors measured on a large benchmark dataset are improved from 0.65 to 0.67 when using the native RSA values and from 0.55 to 0.57 when using the predicted RSA values. Blind tests performed on two independent datasets show consistent improvements in the average correlation coefficients by a modest value of 0.02 for both native and predicted RSA-based predictions.

5.
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The C(alpha) RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.  相似文献   

6.
7.
Schlessinger A  Rost B 《Proteins》2005,61(1):115-126
Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B-values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large-scale analysis of B-values. We used this analysis to develop a neural network-based method that predicts flexible-rigid residues from amino acid sequence. The system uses both global and local information (i.e., features from the entire protein such as secondary structure composition, protein length, and fraction of surface residues, and features from a local window of sequence-consecutive residues). The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to 4 different case studies, each of which related our predictions to aspects of function. The first 2 were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions, the rigidity of which is crucial for their function (tunnel in propeller folds). Both were correctly captured by our method. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly low B-values. The final study demonstrated how well our predictions correlated with NMR order parameters to reflect motion. Our method had not been set up to address any of the tasks in those 4 case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function.  相似文献   

8.
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc.  相似文献   

9.
Chen H  Kihara D 《Proteins》2008,71(3):1255-1274
The error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most of the current prediction algorithms. The estimated error of a predicted structure is crucial information for experimental biologists who use the prediction model for the design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only to alignment shift errors but also to structure-level errors, namely the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors) and local errors at each residue position. We have further compared SPAD with seven other quality measures, six sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed the best correlation to the RMSD; at the superfamily level, SPAD was the best; and at the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best at all of the family, superfamily and fold levels. Using the discovered correlation, we have also predicted the global and local error of our predicted structures of CASP7 targets by SPAD. Finally, we proposed a sausage representation of predicted tertiary structures which intuitively indicates the predicted structure and the estimated error range of the structure simultaneously.

10.
Fulle S  Gohlke H 《Biophysical journal》2008,94(11):4202-4219
RNA requires conformational dynamics to undergo its diverse functional roles. Here, a new topological network representation of RNA structures is presented that allows analyzing RNA flexibility/rigidity based on constraint counting. The method extends the FIRST approach, which identifies flexible and rigid regions in atomic detail in a single, static, three-dimensional molecular framework. Initially, the network rigidity of a canonical A-form RNA is analyzed by constraint counting on network elements of increasing size. These considerations demonstrate that it is the inclusion of hydrophobic contacts into the RNA topological network that is crucial for an accurate flexibility prediction. The counting also explains why a protein-based parameterization results in overly rigid RNA structures. The new network representation is then validated on a tRNA(Asp) structure and all NMR-derived ensembles of RNA structures currently available in the Protein Data Bank (with chain length ≥40). The flexibility predictions demonstrate good agreement with experimental mobility data, and the results are superior compared to predictions based on two previously used network representations. Encouragingly, this holds for flexibility predictions as well as mobility predictions obtained by constrained geometric simulations on these networks. Potential applications of the approach to analyzing the flexibility of DNA and RNA/protein complexes are discussed.

11.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on the 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for proteins that have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those of the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length, with fully known secondary structural information, we end up predicting model structures far from their corresponding native structures.

12.
13.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on the 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for proteins that have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those of the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length, with fully known secondary structural information, we end up predicting model structures far from their corresponding native structures.

14.
Chen H  Zhou HX 《Proteins》2005,61(1):21-35
The number of structures of protein-protein complexes deposited to the Protein Data Bank is growing rapidly. These structures embed important information for predicting structures of new protein complexes. This motivated us to develop the PPISP method for predicting interface residues in protein-protein complexes. In PPISP, sequence profiles and solvent accessibility of spatially neighboring surface residues were used as input to a neural network. The network was trained on native interface residues collected from the Protein Data Bank. The prediction accuracy at the time was 70% with 47% coverage of native interface residues. Now we have extensively improved PPISP. The training set now consists of 1156 nonhomologous protein chains. A test on a set of 100 nonhomologous protein chains showed that the prediction accuracy has increased to 80% with 51% coverage. To solve the problem of over-prediction and under-prediction associated with individual neural network models, we developed a consensus method that combines predictions from multiple models with different levels of accuracy and coverage. Applied to a benchmark set of 68 proteins for protein-protein docking, the consensus approach outperformed the best individual models by 3-8 percentage points in accuracy. To demonstrate the predictive power of cons-PPISP, eight complex-forming proteins with interfaces characterized by NMR were tested. These proteins are nonhomologous to the training set and have a total of 144 interface residues identified by chemical shift perturbation. cons-PPISP predicted 174 interface residues with 69% accuracy and 47% coverage and promises to complement experimental techniques in characterizing protein-protein interfaces.

15.
The linear IgE-binding epitopes of non-specific lipid transfer proteins (nsLTP) from plants were predicted using a combination of predictive tools including (1) the hydropathic profiles based on different scales of hydrophilicity, flexibility and exposure to the solvent, (2) the hydrophobic cluster analysis plots, (3) the occurrence of charged residues in the predicted amino acid sequence stretches and, (4) the exposition of the predicted linear IgE-binding epitopes checked on the three-dimensional models built for the nsLTP. A reliable prediction was obtained for nsLTP as compared with the previously characterized IgE-binding epitopes of various proteins. A consensual IgE-binding epitope occurring in other plant nsLTP and responsible for some IgE-binding cross-reactivity among fruit nsLTP has been identified and characterized. Despite some discrepancies, a fairly good prediction resulted in applying our combination of predictive methods to longer nsLTP or plant profilins.  相似文献   

16.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
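
The "error insensitivity" metaparameter mentioned in this abstract refers to the epsilon-insensitive loss used in support vector regression, which ignores residuals smaller than epsilon, whereas least squares penalizes every residual quadratically. The sketch below only contrasts the two loss functions on made-up RSA values; it does not train a model, and the function names, the data, and the choice of epsilon are all assumptions for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of max(0, |error| - epsilon): errors within epsilon cost nothing."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))


def squared_loss(y_true, y_pred):
    """Sum of squared errors, as minimized by least squares regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Made-up real-valued RSA targets and predictions for three residues.
rsa_true = [0.20, 0.50, 0.90]
rsa_pred = [0.25, 0.70, 0.90]

svr_loss = epsilon_insensitive_loss(rsa_true, rsa_pred, epsilon=0.1)
ls_loss = squared_loss(rsa_true, rsa_pred)
```

Here only the second residue, whose error of 0.2 exceeds epsilon, contributes to the SVR loss, while least squares also charges for the small 0.05 error on the first residue.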

17.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
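
The two quantities named in this abstract can be sketched in a few lines. The abstract does not give formulas, so both definitions below are assumptions: Q16 is taken as the fraction of positions whose predicted Protein Block matches the true one, and Neq as the exponential of the Shannon entropy of the 16 predicted block probabilities at a position (so it ranges from 1, a single confident block, to 16, a uniform prediction). The function names and example data are hypothetical.

```python
import math


def q16(true_blocks, predicted_blocks):
    """Fraction of positions where the predicted block equals the true block."""
    matches = sum(t == p for t, p in zip(true_blocks, predicted_blocks))
    return matches / len(true_blocks)


def neq(probabilities):
    """Equivalent number of blocks: exp of the Shannon entropy (assumed form).

    Zero probabilities are skipped because p * ln(p) tends to 0 as p -> 0.
    """
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(entropy)


# Protein Blocks are conventionally labeled with the letters a to p.
rate = q16("mmmmnopa", "mmmmnopc")    # 7 of 8 positions agree
uniform = neq([1.0 / 16] * 16)        # maximally uncertain prediction
confident = neq([1.0] + [0.0] * 15)   # a single block predicted with certainty
```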

18.
Qiu J  Sheffler W  Baker D  Noble WS 《Proteins》2008,71(3):1175-1182
Protein structure prediction is an important problem of both intellectual and practical interest. Most protein structure prediction approaches generate multiple candidate models first, and then use a scoring function to select the best model among these candidates. In this work, we develop a scoring function using support vector regression (SVR). Both consensus-based features and features from individual structures are extracted from a training data set containing native protein structures and predicted structural models submitted to CASP5 and CASP6. The SVR learns a scoring function that is a linear combination of these features. We test this scoring function on two data sets. First, when used to rank server models submitted to CASP7, the SVR score selects predictions that are comparable to the best performing server in CASP7, Zhang-Server, and significantly better than all the other servers. Even if the SVR score is not allowed to select Zhang-Server models, the SVR score still selects predictions that are significantly better than all the other servers. In addition, the SVR is able to select significantly better models and yield significantly better Pearson correlation coefficients than the two best Quality Assessment groups in CASP7, QA556 (LEE), and QA634 (Pcons). Second, this work aims to improve the ability of the Robetta server to select best models, and hence we evaluate the performance of the SVR score on ranking the Robetta server template-based models for the CASP7 targets. The SVR selects significantly better models than the Robetta K*Sync consensus alignment score.  相似文献   

19.
20.
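Objective: To predict the B-cell epitopes of Staphylococcus aureus enterotoxin A (SEA) protein. Methods: Using genomic DNA of M3, a Staphylococcus aureus strain isolated from milk in Hefei, as the template, the SEA gene was amplified by PCR, sequenced, and analyzed. The DNAstar Protean software was used to analyze multiple parameters of the SEA protein together, including secondary structure, flexibility, hydrophilicity, surface probability, and antigenic index, in order to predict its B-cell epitopes. Results: The SEA gene of isolate M3 is 774 bp long and encodes an SEA protein of 257 amino acids with a relative molecular mass of 29.67 kDa. The nucleotide and amino acid sequence homologies between the SEA gene of isolate M3 and that of the reference strain were 98.7% and 98.4%, respectively. The dominant B-cell epitopes of the SEA protein are located in segments 64-68, 100-107, 138-141, 156-160, 166-173, 213-217, and 237-244 of the peptide chain. Conclusion: Seven dominant B-cell epitopes of the SEA protein were predicted, laying a foundation for the subsequent cloning and expression of epitope proteins and the preparation of monoclonal antibodies against SEA epitopes.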
