Similar literature
 20 similar articles found (search time: 62 ms)
1.

Background  

Solvent accessibility (ASA) of amino acid residues is often transformed from absolute values of exposed surface area to normalized relative values. This normalization is typically attained by assuming a highest-exposure conformation based on the extended state of the residue when it is flanked by Ala or Gly on both sides, i.e., the solvent-exposed area in Ala-X-Ala or Gly-X-Gly. The exact sequence context, the folding state of the residues, and the actual environment of a folded protein, which impose additional constraints on the highest possible (or highest observed) values of ASA, are currently ignored. Here, we analyze the statistics of these constraints and examine how normalizing absolute ASA values by the context-dependent Highest Observed ASA (HOA), instead of the context-free extended state ASA (ESA) of residues, can influence the performance of sequence-based prediction of solvent accessibility. Characterization of buried and exposed states of residues based on this normalization has also been shown to provide better enrichment of DNA-binding sites in exposed residues.
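
The normalization described in this abstract can be sketched as a simple division of the absolute ASA by a per-residue reference maximum. The function name `relative_asa` and both reference tables below are assumptions for illustration; in particular, the numeric values are hypothetical placeholders, not the ESA or HOA values from the paper.

```python
def relative_asa(absolute_asa, residue, reference_max):
    """Divide an absolute ASA by the reference maximum for its residue type."""
    max_asa = reference_max[residue]
    if max_asa <= 0:
        raise ValueError("reference maximum must be positive")
    return absolute_asa / max_asa


# Hypothetical reference maxima (square angstroms), for illustration only.
extended_state = {"ALA": 110.0, "LYS": 200.0}    # context-free (ESA-like)
highest_observed = {"ALA": 100.0, "LYS": 160.0}  # context-dependent (HOA-like)

# The same absolute ASA yields a different relative value under each reference.
rsa_esa = relative_asa(80.0, "LYS", extended_state)
rsa_hoa = relative_asa(80.0, "LYS", highest_observed)
```

With a smaller reference maximum the same residue appears more exposed, which is why the choice of reference can shift residues across a burial threshold.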

2.
We analyzed the total, hydrophobic, and hydrophilic accessible surfaces (ASAs) of residues from a nonredundant bank of 587 protein 3D structures. In an extended fold, residues are classified into three families with respect to their hydrophobicity balance. As expected, residues lose part of their solvent-accessible surface with folding but the three groups remain. The decrease of accessibility is more pronounced for hydrophobic than hydrophilic residues. Amazingly, Lysine is the residue with the largest hydrophobic accessible surface in folded structures. Our analysis points out a clear difference between the mean (other studies) and median (this study) ASA values of hydrophobic residues, which should be taken into consideration in future investigations of protein-accessible surfaces, in order to improve predictions requiring ASA values. The different secondary structures correspond to different accessibility of residues. Random coils, turns, and beta-structures (outside beta-sheets) are the most accessible folds, with an average of 30% accessibility. The helical residues are about 20% accessible, and the difference between the hydrophobic and the hydrophilic residues illustrates the amphipathy of many helices. Residues from beta-sheets are the most inaccessible to solvent (10% accessible). Hence, beta-sheets are the most appropriate structures to shield the hydrophobic parts of residues from water. We also show that there is an equal balance between the hydrophobic and the hydrophilic accessible surfaces of the 3D protein surfaces irrespective of the protein size. This results in a patchwork surface of hydrophobic and hydrophilic areas, which could be important for protein interactions and/or activity.

3.
We investigate the relationship between flexibility, expressed as the B‐factor, and relative solvent accessibility (RSA) in the context of the local (with respect to the sequence) neighborhood and related concepts such as residue depth. We observe that the flexibility of a given residue is strongly influenced by the solvent accessibility of the adjacent neighbors. The mean normalized B‐factor of the exposed residues with two buried neighbors is smaller than that of the buried residues with two exposed neighbors. Inclusion of the RSA of the neighboring residues (local RSA) significantly increases correlation with the B‐factor. Correlation between the local RSA and B‐factor is shown to be stronger than the correlation that considers local distance‐ or volume‐based residue depth. We also found that the correlation coefficients between B‐factor and RSA for the 20 amino acids, called the flexibility‐exposure correlation index, are strongly correlated with the stability scale that characterizes the average contributions of each amino acid to the folding stability. Our results reveal that the predicted RSA could be used to distinguish between the disordered and ordered residues and that the inclusion of local predicted RSA values helps provide a better contrast between these two types of residues. Prediction models developed based on local actual RSA and local predicted RSA show similar or better results in the context of B‐factor and disorder predictions when compared with several existing approaches. We validate our models using three case studies, which show that this work provides useful clues for deciphering the structure–flexibility–function relation. Proteins 2009. © 2009 Wiley‐Liss, Inc.
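
One plausible reading of "local RSA" is an average of the RSA over a residue and its immediate sequence neighbors. The sketch below assumes a plain unweighted window average truncated at the chain ends; the abstract does not state the window size or weighting, so `half_window` and the averaging rule are assumptions, and `local_rsa` is a hypothetical name.

```python
def local_rsa(rsa, half_window=1):
    """Average RSA over each residue and its sequence neighbors.

    Windows are truncated at the chain termini, so terminal residues
    are averaged over fewer positions.
    """
    n = len(rsa)
    averaged = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        segment = rsa[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged


# A buried residue (0.0) next to exposed neighbors gets a higher local value.
profile = local_rsa([0.6, 0.0, 0.6])
```

The resulting profile could then be correlated with normalized B-factors in place of the per-residue RSA.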

4.
The solvent accessibility of each residue is predicted on the basis of the protein sequence. A set of 338 monomeric, non-homologous and high-resolution protein crystal structures is used as a learning set and a jackknife procedure is applied to each entry. The prediction is based on the comparison of the observed and the average values of the solvent-accessible area. It appears that the prediction accuracy is significantly improved by considering the residue types preceding and/or following the residue whose accessibility must be predicted. In contrast, the separate treatment of different secondary structural types does not improve the quality of the prediction. It is furthermore shown that the residue accessibility is much better predicted in small than in larger proteins. Such a discrepancy must be carefully considered in any algorithm for predicting residue accessibility.

5.
A major challenge in the development of antibody biotherapeutics is their tendency to aggregate. One root cause of aggregation is exposure of hydrophobic surface regions to the solvent. Many current techniques predict the relative aggregation propensity of antibodies via precalculated scales for the hydrophobicity or aggregation propensity of single amino acids. However, those scales cannot describe the nonadditive effects of a residue’s surroundings on its hydrophobicity. Therefore, they are inherently limited in their ability to describe the impact of subtle differences in molecular structure on the overall hydrophobicity. Here, we introduce a physics-based approach to describe hydrophobicity in terms of the hydration free energy using grid inhomogeneous solvation theory (GIST). We apply this method to assess the effects of starting structures, conformational sampling, and protonation states on the hydrophobicity of antibodies. Our results reveal that high-quality starting structures, i.e., crystal structures, are crucial for the prediction of hydrophobicity and that conformational sampling can compensate for errors introduced by the starting structure. On the other hand, sampling of protonation states only leads to good results when combined with high-quality structures, whereas it can even be detrimental otherwise. We conclude by pointing out that a single static homology model may not be adequate for predicting hydrophobicity.

6.
Ahmad S  Gromiha MM  Sarai A 《Proteins》2003,50(4):629-635
The solvent accessibility of amino acid residues has been predicted in the past by classifying them into exposure states with varying thresholds. This classification provides a wide range of values for the accessible surface area (ASA) within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from the sequence information without a priori classification into exposure states. Here, we present a new method with which to predict real value ASAs for residues, based on neighborhood information. Our real value prediction neural network could estimate the ASA for four different nonhomologous, nonredundant data sets of varying size, with 18.0-19.5% mean absolute error, defined as per residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. It was observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included. Prediction of real values answers the issue of arbitrary choice of ASA state thresholds, and carries more information than category prediction. Prediction error for each residue type strongly correlates with the variability in its experimental ASA values.
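
The two evaluation measures quoted in this abstract, the mean absolute error over residues and the correlation between predicted and experimental relative ASA, can be sketched as follows. The function names are hypothetical, and the correlation is assumed to be the Pearson coefficient, which the abstract does not state explicitly.

```python
import math


def mean_absolute_error(predicted, observed):
    """Mean over residues of |predicted - observed| relative ASA."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(predicted)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Both functions take relative ASA values on the same scale (for example percent), so an error of 18.0 corresponds to 18.0 percentage points per residue.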

7.
Adamczak R  Porollo A  Meller J 《Proteins》2005,59(3):467-475
Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. 
While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org.
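
The two-class projection mentioned in this abstract can be sketched as a threshold on real-valued RSA. The 25% threshold comes from the abstract; the function name `to_two_state` and the choice to treat a value exactly at the threshold as buried are assumptions.

```python
def to_two_state(rsa_percent, threshold=25.0):
    """Project real-valued RSA (in percent) onto buried/exposed labels.

    Values strictly above the threshold are labeled exposed; the handling
    of values exactly at the threshold is an assumption of this sketch.
    """
    return ["exposed" if value > threshold else "buried" for value in rsa_percent]


labels = to_two_state([5.0, 25.0, 60.0])
```

The projection discards how far each residue lies from the threshold, which is consistent with the abstract's observation that the two-class form is less useful than the real-valued RSA as an input feature.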

8.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
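
The "error insensitivity" metaparameter mentioned in this abstract can be illustrated by comparing the squared loss of least squares regression with the standard epsilon-insensitive loss used in support vector regression. This is a generic textbook sketch, not the authors' exact L1-SVR formulation; the function names and the example values are assumptions.

```python
def squared_loss(predicted, observed):
    """Per-residue loss minimized by least squares regression."""
    return (predicted - observed) ** 2


def epsilon_insensitive_loss(predicted, observed, epsilon):
    """Per-residue SVR loss: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(predicted - observed) - epsilon)


# A small error is ignored by the SVR loss but still penalized by least squares.
small_error_svr = epsilon_insensitive_loss(0.30, 0.25, epsilon=0.1)
small_error_ls = squared_loss(0.30, 0.25)
```

In a full SVR objective this loss is summed over residues and scaled by an error penalization term, so the two metaparameters together control how strongly deviations beyond the insensitive band are punished.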

9.
10.
We perform a statistical analysis of solvent accessibility and hydrophobicity profiles of a representative set of proteins. The joint probability distribution is well fitted to a multivariable Gaussian, which takes a relatively simple form when expressed in terms of the Fourier transforms of the profiles. This allows us to quantify the asymmetric manner by which these profiles influence each other. For example, the α‐helix periodicity in sequence hydrophobicity is dictated by the solvent accessibility of structures, and not vice versa, possibly indicating the faster evolution of sequences compared to structures. The decorrelated hydrophobicity and solvent accessibility profiles show distinct behaviors at long periods, where sequence hydrophobicity fluctuates less, while solvent accessibility fluctuates more than average. The correlations between the two profiles can be interpreted as the Boltzmann weight of the solvation energy at room temperature, consistent with earlier observations. Proteins 2006. © 2005 Wiley‐Liss, Inc.

11.
Wang JY  Lee HM  Ahmad S 《Proteins》2005,61(3):481-491
A multiple linear regression method was applied to predict real values of solvent accessibility from the sequence and evolutionary information. This method allowed us to obtain coefficients of regression and correlation between the occurrence of an amino-acid residue at a specific target and its sequence neighbor positions on the one hand, and the solvent accessibility of that residue on the other. Our linear regression model based on sequence information and evolutionary models was found to predict residue accessibility with 18.9% and 16.2% mean absolute error respectively, which is better than or comparable to the best available methods. A correlation matrix for several neighbor positions to examine the role of evolutionary information at these positions has been developed and analyzed. As expected, the effective frequency of hydrophobic residues at target positions shows a strong negative correlation with solvent accessibility, whereas the reverse is true for charged and polar residues. The correlation of solvent accessibility with effective frequencies at neighboring positions falls abruptly with distance from target residues. Longer protein chains have been found to be more accurately predicted than their smaller counterparts.

12.
ABSTRACT: BACKGROUND: Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). RESULTS: Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprising nearly 600 yeast genes, and find that an evolutionary-rate ratio omega that varies linearly with RSA provides a better model fit than an RSA-independent omega or an omega that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio kappa also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between omega and RSA, and gene expression level affects both the intercept and the slope. CONCLUSIONS: Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between omega and RSA implies that genes are better characterized by their omega slope and intercept than by just their mean omega.
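
The linear dependence of omega on RSA described in this abstract can be written compactly as below. The symbols \omega_0 (intercept) and \omega_1 (slope) are notation assumed here for illustration; the abstract names only "intercept" and "slope".

```latex
% Assumed notation: \omega_0 is the intercept, \omega_1 the slope.
\[
  \omega(\mathrm{RSA}) = \omega_0 + \omega_1 \cdot \mathrm{RSA},
  \qquad 0 \le \mathrm{RSA} \le 1 .
\]
```

Under this form a gene is summarized by the pair (\omega_0, \omega_1) rather than by a single mean omega, which is the characterization the conclusions recommend.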

13.
Fan  Chao  Liu  Diwei  Huang  Rui  Chen  Zhigang  Deng  Lei 《BMC bioinformatics》2016,17(1):85-95
Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although several methods for protein solvent accessibility prediction have been presented in recent years, their performance is far from satisfactory. In this work, we propose PredRSA, a computational method that can accurately predict the relative solvent accessible surface area (RSA) of residues by exploring various local and global sequence features which have been observed to be associated with solvent accessibility. Based on these features, a novel and efficient approach, Gradient Boosted Regression Trees (GBRT), is first adopted to predict RSA. Experimental results obtained from 5-fold cross-validation based on the Manesh-215 dataset show that the mean absolute error (MAE) and the Pearson correlation coefficient (PCC) of PredRSA are 9.0% and 0.75, respectively, which are better than those of the existing methods. Moreover, we evaluate the performance of PredRSA using an independent test set of 68 proteins. Compared with the state-of-the-art approaches (SPINE-X and ASAquick), PredRSA achieves a significant improvement in prediction quality. Our experimental results show that the Gradient Boosted Regression Trees algorithm and the novel feature combination are quite effective in relative solvent accessibility prediction. The proposed PredRSA method could be useful in assisting the prediction of protein structures by applying the predicted RSA as useful restraints.

14.
Wang JY  Lee HM  Ahmad S 《Proteins》2007,68(1):82-91
A number of methods for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins have been developed. These methods either predict regularly spaced states of relative solvent accessibility or an analogue real value indicating relative solvent accessibility. While discrete states of exposure can be easily obtained by post-prediction assignment of thresholds to the predicted or computed real values of ASA, the reverse, that is, obtaining a real value from quantized states of predicted ASA, is not straightforward, as a two-state prediction in such cases would give large real-valued errors. However, prediction of ASA into a larger number of ASA states and then finding a corresponding scheme for real value prediction may be helpful in integrating the two approaches of ASA prediction. We report a novel method of obtaining numerical real values of solvent accessibility, using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete states of ASA of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The resulting performance of such a rigorous approach using 13-state ASA prediction is at least comparable with the best methods of ASA prediction reported so far. The mean absolute error in this method reaches the best performance of 15.1% on the tested data set of 502 proteins with a coefficient of correlation equal to 0.66. Since the method starts with the prediction of discrete states of ASA and leads to real value predictions, the prediction performances for binary states and real values are simultaneously optimized.
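
The point about quantization error can be illustrated with a minimal sketch that bins relative ASA into equal-width states and maps each state back to its bin midpoint. This is not the SVM-Cabins accumulation cutoff scheme, whose details the abstract does not give; equal-width bins, the midpoint mapping, and the function names are assumptions.

```python
def to_state(rsa_percent, n_states=13):
    """Index (0..n_states-1) of the equal-width bin containing an RSA in [0, 100]."""
    if not 0.0 <= rsa_percent <= 100.0:
        raise ValueError("RSA must lie in [0, 100]")
    width = 100.0 / n_states
    # Clamp so that an RSA of exactly 100 falls into the last bin.
    return min(int(rsa_percent // width), n_states - 1)


def state_midpoint(state, n_states=13):
    """Real value assigned to a state: the midpoint of its bin."""
    width = 100.0 / n_states
    return (state + 0.5) * width


def worst_case_error(n_states):
    """Largest possible gap between a true RSA and its bin midpoint."""
    return 100.0 / (2 * n_states)
```

Even with perfectly predicted states, two states leave a worst-case error of 25 percentage points, while 13 states reduce it to below 4, which is why a finer state prediction makes the mapping back to real values practical.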

15.
A new triclinic crystal form of porcine pancreatic procarboxypeptidase B (PCPB) was obtained at higher resolution than the previously known tetragonal crystal structure. This new crystal polymorph has allowed for a corrected, accurate assignment of residues along the polypeptide chain based on the currently available gene sequence information and crystallographic data. The present structure shows unbound PCPB in a distinct molecular packing as compared to the previous benzamidine-complexed form. Its catalytically important Tyr248 residue is oriented toward and hydrogen‐bonded to solvent water molecules, and is located furthest from the catalytic zinc ion as compared to previous structures. A relatively long stretch of residues flanking Tyr248 and guarding the access to the catalytic zinc ion was found to be sequentially unique to the M14 family of peptidases. Predictions from a normal mode analysis indicated that this stretch of residues belongs to a rigid subdomain in the protein structure. The specific presence of a tyrosyl residue at the most exposed position in this region would allow for a delicate balance between extreme hydrophobicity and hydrophilicity, and affect substrate binding and the kinetic efficiency of the enzyme. © 2009 Wiley Periodicals, Inc. Biopolymers 93: 178–185, 2010.

16.
In site-directed spin labeling, the relative solvent accessibility of spin-labeled side chains is taken to be proportional to the Heisenberg exchange rate (W(ex)) of the nitroxide with a paramagnetic reagent in solution. In turn, relative values of W(ex) are determined by continuous wave power saturation methods and expressed as a proportional and dimensionless parameter Pi. In the experiments presented here, NiEDDA is characterized as a paramagnetic reagent for solvent accessibility studies, and it is shown that absolute values of W(ex) can be determined from Pi, and that the proportionality constant relating them is independent of the paramagnetic reagent and mobility of the nitroxide. Based on absolute exchange rates, an accessibility factor is defined (0 < rho < 1) that serves as a quantitative measure of side-chain solvent accessibility. The accessibility factors for a nitroxide side chain at 14 different sites in T4 lysozyme are shown to correlate with a structure-based accessibility parameter derived from the crystal structure of the protein. These results provide a useful means for relating crystallographic and site-directed spin labeling data, and hence comparing crystal and solution structures.

17.

Background  

Contradicting evidence has been presented in the literature concerning the effectiveness of empirical contact energies for fold recognition. Empirical contact energies are calculated on the basis of information available from selected protein structures, with respect to a defined reference state, according to the quasi-chemical approximation. Protein-solvent interactions are estimated from residue solvent accessibility.

18.
Atom depth as a descriptor of the protein interior

19.
Multiprotein systems mediate most regulatory processes in living organisms. Although the structures of the individual proteins are often defined, less is known of the structures of multiprotein systems. Computational methods for predicting interfaces, using evolutionary conservation and/or physicochemical data, have been developed. Here we consider the use of solvent accessibility, residue propensity, and hydrophobicity, in conjunction with secondary structure data, as prediction parameters. We analyze the influence of residue type and secondary structure on solvent accessibility and define a measure of "relative exposedness." Clustering abnormally high scoring residues provides a basis for predicting interaction sites. The analysis is extended to investigate abnormally exposed secondary structure elements, particularly beta-sheet strands. We show that surface-exposed beta-strands lacking protective features are more likely to be found at protein-protein interfaces, allowing us to create an algorithm with approximately 68% and approximately 75% accuracy in differentiating between interacting and edge strands in isolated beta-strands and beta-sheet strands, respectively. These methods of identifying abnormally exposed surface regions are combined in an algorithm, which, on a data set of 77 unbound and disjoint (single chain extracted from complex) structures, predicts 79% of the protein-protein interfaces correctly. If enzyme-inhibitor complexes, where the inhibitor mimics a nonprotein substrate, are excluded, the accuracy increases to 85%.

20.

Background  

Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design.
