Similar literature
1.
Ahmad S, Gromiha MM, Sarai A. Proteins 2003;50(4):629-635
The solvent accessibility of amino acid residues has previously been predicted by classifying residues into exposure states using varying thresholds. Such a classification leaves a wide range of accessible surface area (ASA) values within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from sequence information without a priori classification into exposure states. Here, we present a new method for predicting real-value ASAs for residues, based on neighborhood information. Our real-value prediction neural network estimated the ASA for four different nonhomologous, nonredundant data sets of varying size with 18.0-19.5% mean absolute error, defined as the per-residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. The ASA of a residue could be predicted within a 23.7% mean absolute error even when no information about its neighbors was included. Prediction of real values resolves the issue of the arbitrary choice of ASA state thresholds and carries more information than category prediction. The prediction error for each residue type correlates strongly with the variability in its experimental ASA values.
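The two evaluation measures named in this abstract, the per-residue mean absolute error and the correlation between predicted and experimental relative ASA, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the sample values are assumptions made up for the example, and the correlation is assumed to be the Pearson coefficient.

```python
from math import sqrt


def mean_absolute_error(predicted, observed):
    """Mean per-residue |predicted - observed| for relative ASA values (in %)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / sqrt(var_p * var_o)


# Hypothetical relative ASA values (%), invented for illustration only.
predicted_rsa = [10.0, 40.0, 25.0, 60.0]
observed_rsa = [5.0, 50.0, 20.0, 80.0]

mae = mean_absolute_error(predicted_rsa, observed_rsa)
r = pearson_correlation(predicted_rsa, observed_rsa)
print(f"MAE = {mae:.1f}%, r = {r:.3f}")
```

Reporting both numbers is useful because a predictor can have a low mean absolute error while still tracking residue-to-residue variation poorly, or the reverse.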

2.
Nguyen MN, Rajapakse JC. Proteins 2006;63(3):542-550
We address the problem of predicting the solvent accessible surface area (ASA) of amino acid residues in protein sequences without classifying them into buried and exposed types. A two-stage support vector regression (SVR) approach is proposed to predict real values of ASA from the position-specific scoring matrices generated from PSI-BLAST profiles. By adding a second SVR stage to capture the influence of neighboring residues' ASA values on the ASA value of a residue, the two-stage approach improves mean absolute errors by up to 3.3% and achieves correlation coefficients of 0.66, 0.68, and 0.67 on the Manesh dataset of 215 proteins, the Barton dataset of 502 nonhomologous proteins, and the Carugo dataset of 338 proteins, respectively; these scores are better than those published earlier on these datasets. A Web server for protein ASA prediction using the two-stage SVR method has been developed and is available (http://birc.ntu.edu.sg/~pas0186457/asa.html).

3.
Wang JY, Lee HM, Ahmad S. Proteins 2007;68(1):82-91
A number of methods for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins have been developed. These methods predict either regularly spaced states of relative solvent accessibility or an analogue real value indicating relative solvent accessibility. While discrete states of exposure can easily be obtained by post-prediction assignment of thresholds to the predicted or computed real values of ASA, the reverse, that is, obtaining a real value from quantized states of predicted ASA, is not straightforward, because a two-state prediction would give large real-valued errors. However, predicting ASA in a larger number of states and then finding a corresponding scheme for real-value prediction may help integrate the two approaches to ASA prediction. We report a novel method of obtaining numerical real values of solvent accessibility, using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete states of ASA of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The resulting performance of this approach using 13-state ASA prediction is at least comparable with the best methods of ASA prediction reported so far. The mean absolute error of this method reaches a best performance of 15.1% on the tested data set of 502 proteins, with a correlation coefficient of 0.66. Since the method starts with the prediction of discrete states of ASA and leads to real-value predictions, prediction performance in binary states and in real values is optimized simultaneously.

4.
Ishida T, Nakamura S, Shimizu K. Proteins 2006;64(4):940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined as the number of residues around that residue. First, the contact number of each residue is predicted using SVR from the amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with that of other score functions on decoy structures, testing the ability to distinguish both native structures from other structures and near-native structures from nonnative structures. The potential improves both abilities.
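The contact number defined in this abstract can be illustrated with a short sketch. The abstract does not say which atom represents a residue or what distance counts as "around", so the single point per residue and the cutoff radius below are assumptions chosen only for illustration, as are the function name and coordinates.

```python
from math import dist


def contact_numbers(coords, cutoff=10.0):
    """Count, for each residue, how many other residues lie within `cutoff`.

    `coords` holds one (x, y, z) point per residue. Which atom is used and
    the cutoff value are assumptions, not taken from the paper.
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(1 for j, b in enumerate(coords) if j != i and dist(a, b) <= cutoff)
        counts.append(n)
    return counts


# Hypothetical coordinates (in angstroms): three residues in a row, one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
numbers = contact_numbers(coords, cutoff=5.0)
print(numbers)
```

The double loop is quadratic in the number of residues, which is acceptable for a single protein; a spatial grid or k-d tree would be the usual choice for large decoy sets.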

5.
In this work, we explore a novel method to broaden the scope of sequence-based predictions of solvent accessibility or accessible surface area (ASA) to the atomic level. All 167 heavy atoms from the 20 types of amino acid residues in proteins have been studied. The ASA distribution of these atomic groups in different proteins has been analyzed and rotamer-style libraries have been developed. We observe that the ASA of some atomic groups (e.g., backbone C and N atoms) can be estimated from the sequence environment within a mean absolute error of 2-3 Å². However, some side chain atoms such as CG in Pro, NH1 in Arg and NE2 in Gln show strong variability, making it more difficult to estimate their ASA from the sequence environment. In general, the prediction of ASA becomes more difficult for atomic positions at the side chain extremities of long amino acid residues (aromatic side chain terminals being the exception). Several atomic groups are frequently exposed to solvent. Some of them have a bimodal distribution, suggesting two stable conformations in terms of their solvent exposure. More detailed understanding and prediction of solvent accessibility, that is, at an atomic level, is expected to help in bioinformatics approaches to structure prediction, the functional relevance of atomic solvent accessibilities, and other interaction analyses.

6.
7.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.

8.
Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.

9.
Yuan Z, Bailey TL, Teasdale RD. Proteins 2005;58(4):905-912
The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods.

10.
Wu S, Zhang Y. PLoS ONE 2008;3(10):e3400
We developed a composite machine-learning based algorithm, called ANGLOR, to predict real-value protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is approximately 10% lower than that generated by software in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor) with p-value < 1.0 × 10^-300 (or < 1.0 × 10^-148) by the Wilcoxon signed rank test. For some residues (ILE, LEU, PRO and VAL), and especially for residues in helix and buried regions, the MAE of phi angles is much smaller (10-20°) than in other environments. Thus, although the average accuracy of the ANGLOR prediction is still low, the portion of accurately predicted dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling.

11.
We analyzed the total, hydrophobic, and hydrophilic accessible surfaces (ASAs) of residues from a nonredundant bank of 587 3D structure proteins. In an extended fold, residues are classified into three families with respect to their hydrophobicity balance. As expected, residues lose part of their solvent-accessible surface with folding but the three groups remain. The decrease of accessibility is more pronounced for hydrophobic than hydrophilic residues. Amazingly, Lysine is the residue with the largest hydrophobic accessible surface in folded structures. Our analysis points out a clear difference between the mean (other studies) and median (this study) ASA values of hydrophobic residues, which should be taken into consideration for future investigations on a protein-accessible surface, in order to improve predictions requiring ASA values. The different secondary structures correspond to different accessibility of residues. Random coils, turns, and beta-structures (outside beta-sheets) are the most accessible folds, with an average of 30% accessibility. The helical residues are about 20% accessible, and the difference between the hydrophobic and the hydrophilic residues illustrates the amphipathy of many helices. Residues from beta-sheets are the most inaccessible to solvent (10% accessible). Hence, beta-sheets are the most appropriate structures to shield the hydrophobic parts of residues from water. We also show that there is an equal balance between the hydrophobic and the hydrophilic accessible surfaces of the 3D protein surfaces irrespective of the protein size. This results in a patchwork surface of hydrophobic and hydrophilic areas, which could be important for protein interactions and/or activity.

12.
SUMMARY: RVP-net is an online program for the prediction of real-valued solvent accessibility. All previous methods of accessible surface area (ASA) prediction classify amino acid residues into exposure states and label them buried or exposed based on different thresholds. In some cases, real values were generated by taking the midpoints of these state thresholds. This is the first method that provides a direct prediction of ASA without defining exposure categories, and it achieves results better than 19% mean absolute error. To facilitate batch processing of several sequences, a standalone version of this tool is also provided. AVAILABILITY: Online predictions are available at http://www.netasa.org/rvp-net/. A standalone version of the program can be obtained from the corresponding author by e-mail request.
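The older workaround mentioned here, turning a predicted exposure state into a real value by taking the midpoint of its threshold interval, can be sketched as below. This illustrates that earlier practice, not RVP-net itself. The threshold values, the 0-100% bounds, and the function names are assumptions made for the example.

```python
def state_midpoints(thresholds, lower=0.0, upper=100.0):
    """Return the midpoint of each exposure state defined by the thresholds.

    The states are the intervals between consecutive edges
    [lower, t1, t2, ..., upper]. The 0-100% bounds are an assumption.
    """
    edges = [lower] + sorted(thresholds) + [upper]
    return [(lo + hi) / 2.0 for lo, hi in zip(edges, edges[1:])]


def state_to_real_value(state_index, thresholds):
    """Map a predicted state index (0 = most buried) to its interval midpoint."""
    return state_midpoints(thresholds)[state_index]


# Hypothetical three-state scheme with thresholds at 9% and 36% relative ASA.
midpoints = state_midpoints([9.0, 36.0])
print(midpoints)
```

With only two or three states the intervals are wide, so every residue assigned to a state receives the same coarse value. That coarseness is the limitation a direct real-value prediction avoids.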

13.
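Serum from immunologically infertile patients (IPS) can completely (100%) inhibit human in vitro fertilization. Screening a human testis cDNA expression library with this serum identified a novel testis-specific antigen, designated C2. DNA sequencing and comparison with homologous gene data in the gene databank confirmed that C2 is a novel specific protein. The C2 gene hybridized only with mRNA from testis tissue. The fusion protein expressed by this clone in Escherichia coli was recognized by sera from three different infertile patients. Mapping of the antigenic determinant by nested deletion and Western blotting showed that the B-cell epitope lies within the 29 carboxy-terminal amino acids. The hydrophilicity and hydrophobicity of the antigen's amino acids and their probability of lying on the protein surface were analyzed with the GCG software, which identified 15 of these amino acids as essential for the epitope, and a polypeptide was synthesized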

14.
Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues, with an approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction.

15.
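Objective: To predict the B-cell epitopes of Staphylococcus aureus enterotoxin A (SEA). Methods: The SEA gene was amplified by PCR using genomic DNA of S. aureus isolate M3, a milk-derived isolate from Hefei, as the template, and was then sequenced and analyzed. The DNAstar protean software was used for a combined multi-parameter analysis of the secondary structure, flexibility, hydrophilicity, surface probability, and antigenic index of the SEA protein to predict its B-cell epitopes. Results: The SEA gene of isolate M3 is 774 bp long and encodes an SEA protein of 257 amino acids with a relative molecular mass of 29.67 kDa. The nucleotide and amino acid sequence homologies between the SEA gene of isolate M3 and that of the reference strain were 98.7% and 98.4%, respectively. The dominant B-cell epitopes of SEA are located at residues 64-68, 100-107, 138-141, 156-160, 166-173, 213-217, and 237-244 of the peptide chain. Conclusion: Seven dominant B-cell epitopes of SEA were predicted, laying a foundation for the subsequent cloning and expression of epitope proteins and the preparation of monoclonal antibodies against SEA epitopes.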

16.
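Using the atomic coordinates of PrPC provided by the PDB, the solvent accessible surface areas of PrPC amino acid residues were calculated and analyzed with the MSMS program. The results show that: (1) the accessibility of PrPC amino acid residues is consistent overall; (2) the non-conserved residues in the PrPC protein sequence are related to some extent to the species barrier; (3) during the conversion of PrPC to PrPSc, the binding of protein X may cause some conformational change in PrPC, and this change favors the conversion of PrPC to PrPSc.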

17.
The solvent accessible surface area (ASA) of the polysaccharides (i) carrageenan (1CAR), (ii) agarose (1AGA), (iii) guaran (GUR), (iv) capsular polysaccharide (1CAP), and (v) hyaluronan (1HUA) has been computed using the solvent accessibility technique of Lee and Richards. The results show that the average variation of ASA for the various atoms in the molecules lies in the range 1-30 Å². Irrespective of the position of sulfation, at either position two or position four in the sugar residues of 1CAR, the charged groups interact almost equally with the solvent. The ASA values for chains A and B in 1AGA and 1CAR indicate that there is little interchain interaction and that the chains in both molecules interact equally with the solvent. Residue-wise analysis indicates that the ASAs of residues alternate in a high-low-high pattern similar to the hydrophobic behaviour of beta-strands in proteins. The results also suggest that in these polysaccharides D-configuration residues have higher ASA than L-configuration residues.

18.

Background  

Solvent accessibility (ASA) of amino acid residues is often transformed from absolute values of exposed surface area to normalized relative values. This normalization is typically attained by assuming a highest-exposure conformation based on the extended state of that residue when it is surrounded by Ala or Gly on both sides, i.e., the Ala-X-Ala or Gly-X-Gly solvent exposed area. The exact sequence context, the folding state of the residues, and the actual environment of a folded protein, which impose additional constraints on the highest possible (or highest observed) values of ASA, are currently ignored. Here, we analyze the statistics of these constraints and examine how the normalization of absolute ASA values using context-dependent Highest Observed ASA (HOA) instead of context-free extended state ASA (ESA) of residues can influence the performance of sequence-based prediction of solvent accessibility. Characterization of buried and exposed states of residues based on this normalization has also been shown to provide better enrichment of DNA-binding sites in exposed residues.
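The normalization discussed in this abstract amounts to dividing an absolute ASA by a reference maximum for the residue. The sketch below shows how the choice of reference changes the result. All reference values are made-up placeholders, not the published ESA or HOA tables. Keying the HOA table by residue type alone is also a simplification, since the paper's HOA depends on sequence context.

```python
def relative_asa(absolute_asa, reference_asa):
    """Relative ASA in percent: 100 * absolute ASA / reference ASA."""
    if reference_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return 100.0 * absolute_asa / reference_asa


# Hypothetical reference values in square angstroms (placeholders only).
extended_state_asa = {"ALA": 110.0, "LYS": 200.0}    # context-free ESA
highest_observed_asa = {"ALA": 100.0, "LYS": 160.0}  # simplified HOA

absolute = 80.0  # hypothetical absolute ASA of one Lys residue
rsa_esa = relative_asa(absolute, extended_state_asa["LYS"])
rsa_hoa = relative_asa(absolute, highest_observed_asa["LYS"])
print(rsa_esa, rsa_hoa)
```

Because the highest observed value cannot exceed what the extended state allows in this toy table, the same residue looks more exposed under HOA normalization. That shift can move it across a fixed buried/exposed threshold.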

19.
The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least square sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and takes into account also possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤1.8Å, ≤2.0Å, ≤2.5Å, and ≤3.0Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically. 
Whereas in the self-consistency test (learning with the protein to be predicted) a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leaving the predicted protein out) yielded the best results for the largest dataset (≤3.0 Å, with almost no difference from the self-consistency test), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fractions of helix, sheet, and coil from the amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6-11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1-3%, as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determining the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service. © 1996 Wiley-Liss, Inc.
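The first method in this abstract, a least-squares linear combination of characteristic compositions, can be written compactly. The following is a notational sketch only. The symbols are introduced here for illustration and do not come from the paper. Any constraints the authors may place on the coefficients are not stated in the abstract and are therefore left out.

```latex
% c        : 20-component amino acid composition of the query protein
% c^{(s)}  : characteristic composition of secondary structural type s
%            (H = helix, E = sheet, C = coil), computed from the learning set
% f_s      : coefficient interpreted as the predicted fraction of type s
(\hat{f}_H, \hat{f}_E, \hat{f}_C)
  = \arg\min_{f_H,\, f_E,\, f_C}
    \left\| \mathbf{c} - \sum_{s \in \{H, E, C\}} f_s \, \mathbf{c}^{(s)} \right\|_2^2
```

Read this way, the prediction is an ordinary linear least-squares fit with three unknowns and twenty equations, one per amino acid type.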

20.
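Pheromone-binding protein 3 (Hass PBP3) was amplified by RT-PCR from the antennae of male oriental tobacco budworm, Helicoverpa assulta (Hass). Cloning and sequencing showed that the nucleotide sequence of the gene is 495 bp long and encodes 164 amino acid residues, with a predicted molecular mass of 18.5 kD. The N-terminal hydrophobic region is predicted to contain a signal peptide of 22 amino acids. The mature protein should therefore comprise 142 amino acids, with a predicted molecular mass of 16.1 kD and an isoelectric point of 5.44. Amino acid sequence homology analysis showed that this sequence is highly homologous to known insect PBP3 sequences and has the typical features of odorant-binding proteins. The gene was recombined into the expression vector pGEX-4T-2 for prokaryotic expression. IPTG induction followed by SDS-PAGE analysis and Western blotting showed that the H. assulta PBP3 gene can be expressed in Escherichia coli BL21; electrophoresis detected an exogenous protein band of approximately 42 kD, consistent with the predicted molecular mass of the fusion protein.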
