Similar articles
20 similar articles found (search time: 0 ms)
1.
Wang JY  Lee HM  Ahmad S 《Proteins》2005,61(3):481-491
A multiple linear regression method was applied to predict real values of solvent accessibility from the sequence and evolutionary information. This method allowed us to obtain coefficients of regression and correlation between the occurrence of an amino-acid residue at a specific target and its sequence neighbor positions on the one hand, and the solvent accessibility of that residue on the other. Our linear regression model based on sequence information and evolutionary models was found to predict residue accessibility with 18.9% and 16.2% mean absolute error respectively, which is better than or comparable to the best available methods. A correlation matrix for several neighbor positions to examine the role of evolutionary information at these positions has been developed and analyzed. As expected, the effective frequency of hydrophobic residues at target positions shows a strong negative correlation with solvent accessibility, whereas the reverse is true for charged and polar residues. The correlation of solvent accessibility with effective frequencies at neighboring positions falls abruptly with distance from target residues. Longer protein chains have been found to be more accurately predicted than their smaller counterparts.
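
The regression idea in this abstract can be illustrated with a minimal sketch. It assumes a one-hot encoding of a three-residue window and a plain gradient-descent fit; the toy sequence, the accessibility values, and all function names are hypothetical, and the evolutionary (profile) features used in the paper are not modelled here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(seq, center, half_width):
    """One-hot encode the residues around `center`; positions past the chain ends stay all-zero."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        features.extend(one_hot)
    features.append(1.0)  # bias term
    return features

def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def fit_linear(rows, targets, lr=0.05, epochs=500):
    """Fit weights by stochastic gradient descent on squared error (an assumption, not the paper's solver)."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            err = predict(weights, x) - y
            for i, xi in enumerate(x):
                if xi:
                    weights[i] -= lr * err * xi
    return weights

def mean_absolute_error(weights, rows, targets):
    return sum(abs(predict(weights, x) - y) for x, y in zip(rows, targets)) / len(rows)

# Hypothetical toy chain with made-up relative accessibilities (0 = buried, 1 = exposed).
seq = "MKVLAEDRLLG"
rsa = [0.60, 0.75, 0.10, 0.05, 0.20, 0.80, 0.85, 0.70, 0.08, 0.06, 0.40]
rows = [encode_window(seq, i, 1) for i in range(len(seq))]
weights = fit_linear(rows, rsa)
baseline = [0.0] * len(rows[0])
```

The fitted weights play the role of the regression coefficients: one coefficient per residue type per window position, which is what makes position-by-position correlation analysis possible.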

2.
Characterization of protein surface accessibility represents a new frontier of structural biology. A surface accessibility investigation for two structurally well-defined proteins, tendamistat and bovine pancreatic trypsin inhibitor, is performed here by a combined analysis of water-protein Overhauser effects and paramagnetic perturbation profiles induced by the soluble spin-label 4-hydroxy-2,2,6,6-tetramethyl-piperidine-1-oxyl on NMR spectra. This approach seems to be reliable not only for distinguishing between buried and exposed residues but also for finding molecular locations where a network of more ordered waters covers the protein surface. From the presented set of data, an overall picture of the surface accessibility of the two proteins can be inferred. Detailed knowledge of protein accessibility can form the basis for successful design of mutants with increased activity and/or greater specificity.

3.
Information on relative solvent accessibility (RSA) of amino acid residues in proteins provides valuable clues to the prediction of protein structure and function. A two-stage approach with support vector machines (SVMs) is proposed, where an SVM predictor is introduced to the output of the single-stage SVM approach to take into account the contextual relationships among solvent accessibilities for the prediction. By using the position-specific scoring matrices (PSSMs) generated by PSI-BLAST, the two-stage SVM approach achieves accuracies up to 90.4% and 90.2% on the Manesh data set of 215 protein structures and the RS126 data set of 126 nonhomologous globular proteins, respectively, which are better than the highest published scores on both data sets to date. A Web server for protein RSA prediction using a two-stage SVM method has been developed and is available (http://birc.ntu.edu.sg/~pas0186457/rsa.html).
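
The second stage described here takes the outputs of the first-stage predictor as its inputs. The sketch below shows only that input construction, assuming a symmetric window and zero padding at the chain ends; the function name, the window size, and the example scores are hypothetical, and no SVM is trained.

```python
def second_stage_inputs(first_stage_scores, half_width, pad=0.0):
    """Build one window of neighbouring first-stage outputs per residue.

    Each window would be the feature vector given to the second-stage
    predictor, so that a residue's final label can depend on the
    first-stage predictions of its sequence neighbours.
    """
    n = len(first_stage_scores)
    windows = []
    for center in range(n):
        window = [
            first_stage_scores[p] if 0 <= p < n else pad
            for p in range(center - half_width, center + half_width + 1)
        ]
        windows.append(window)
    return windows

# Hypothetical first-stage decision values for a three-residue chain.
example_windows = second_stage_inputs([0.2, -0.5, 0.9], half_width=1)
```

Any classifier could consume these windows; the abstract states that an SVM was used for both stages.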

4.
Yuan Z  Burrage K  Mattick JS 《Proteins》2002,48(3):566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
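
The two-state labelling and the accuracy figure can be made concrete with a short sketch. It assumes that a residue at or above the cut-off counts as exposed (the abstract does not say how ties are handled); the function names and the example values are hypothetical.

```python
def to_two_state(rsa_percent, cutoff=15.0):
    """Label each residue 'e' (exposed) or 'b' (buried) by its relative accessibility in percent."""
    return ["e" if value >= cutoff else "b" for value in rsa_percent]

def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(observed)

# Hypothetical observed and predicted relative accessibilities for five residues.
observed_states = to_two_state([2.0, 15.0, 40.0, 9.5, 60.0])
predicted_states = to_two_state([5.0, 10.0, 35.0, 20.0, 55.0])
two_state_accuracy = accuracy(predicted_states, observed_states)
```

With a cut-off that splits the data evenly, a predictor that always answers the same state scores 50%, which is the baseline against which the reported 70.1% and 73.9% should be read.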

5.
MOTIVATION: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially in the absence of significant sequence similarity of a query protein to those with known structures. The prediction of solvent accessibility is less accurate than secondary structure prediction despite recent improvements. The k-nearest neighbor method, a simple but powerful classification algorithm, has never been applied to the prediction of solvent accessibility, although it has been used frequently for the classification of biological and medical data. RESULTS: We applied the fuzzy k-nearest neighbor method to the solvent accessibility prediction, using PSI-BLAST profiles as feature vectors, and achieved high prediction accuracies. With leave-one-out cross-validation on the ASTRAL SCOP reference dataset constructed by sequence clustering, our method achieved 64.1% accuracy for a 3-state (buried/intermediate/exposed) prediction (thresholds of 9% for buried/intermediate and 36% for intermediate/exposed) and 86.7, 82.0, 79.0 and 78.5% accuracies for 2-state (buried/exposed) predictions (buried/exposed thresholds of 0, 5, 16 and 25%, respectively). Our method also showed slightly better accuracies than other methods, by about 2-5%, on the RS126 dataset and a benchmarking dataset with 229 proteins. AVAILABILITY: Program and datasets are available at http://biocom1.ssu.ac.kr/FKNNacc/ CONTACT: jul@ssu.ac.kr.
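
A minimal sketch of the fuzzy k-nearest neighbor rule follows. It uses the standard inverse-distance weighting with fuzzifier m, and assumes Euclidean distance, crisp training memberships, and two-dimensional toy vectors in place of PSI-BLAST profiles; all names and data are hypothetical and this is not the authors' program.

```python
import math

def fuzzy_knn(query, train_x, train_memberships, k=3, m=2.0):
    """Return the class memberships of `query` from its k nearest training points.

    Each neighbour votes with weight 1 / d**(2 / (m - 1)), so closer
    neighbours dominate; the memberships over all classes sum to 1.
    """
    neighbors = sorted(
        zip(train_x, train_memberships),
        key=lambda pair: math.dist(query, pair[0]),
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = [
        1.0 / max(math.dist(query, x), 1e-9) ** exponent  # guard against zero distance
        for x, _ in neighbors
    ]
    total = sum(weights)
    classes = neighbors[0][1].keys()
    return {
        c: sum(w * u[c] for w, (_, u) in zip(weights, neighbors)) / total
        for c in classes
    }

# Hypothetical training set: two buried and two exposed residues.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_u = [
    {"buried": 1.0, "exposed": 0.0},
    {"buried": 1.0, "exposed": 0.0},
    {"buried": 0.0, "exposed": 1.0},
    {"buried": 0.0, "exposed": 1.0},
]
memberships = fuzzy_knn((0.2, 0.1), train_x, train_u, k=3)
```

Unlike plain k-nearest neighbor voting, the result is a graded membership per class, which can be thresholded for a 2-state or 3-state call or used as a confidence value.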

6.
Chromatin remodeling enzymes use energy derived from ATP hydrolysis to mobilize nucleosomes and alter their structure to facilitate DNA access. The Remodels the Structure of Chromatin (RSC) complex has been extensively studied, yet aspects of how this complex functionally interacts with nucleosomes remain unclear. We introduce a steric mapping approach to determine how RSC activity depends on interaction with specific surfaces within the nucleosome. We find that blocking SHL +4.5/–4.5 via streptavidin binding to the H2A N-terminal tail domains results in inhibition of RSC nucleosome mobilization. However, restriction enzyme assays indicate that remodeling-dependent exposure of an internal DNA site near the nucleosome dyad is not affected. In contrast, occlusion of both protein faces of the nucleosome by streptavidin attachment near the acidic patch completely blocks both remodeling-dependent nucleosome mobilization and internal DNA site exposure. However, we observed partial inhibition when only one protein surface is occluded, consistent with abrogation of one of two productive RSC binding orientations. Our results indicate that nucleosome mobilization requires RSC access to the trailing but not the leading protein surface, and reveal a mechanism by which RSC and related complexes may drive unidirectional movement of nucleosomes to regulate cis-acting DNA sequences in vivo.

7.
Chen H  Zhou HX 《Nucleic acids research》2005,33(10):3193-3199
Residues that form the hydrophobic core of a protein are critical for its stability. A number of approaches have been developed to classify residues as buried or exposed. In order to optimize the classification, we have refined a suite of five methods over a large dataset and proposed a metamethod based on an ensemble average of the individual methods, leading to a two-state classification accuracy of 80%. Many studies have suggested that hydrophobic core residues are likely sites of deleterious mutations, so we wanted to see to what extent these sites can be predicted from the putative buried residues. Residues that were most confidently classified as buried were proposed as sites of deleterious mutations. This proposition was tested on six proteins for which sites of deleterious mutations have previously been identified by stability measurement or functional assay. Of the total of 130 residues predicted as sites of deleterious mutations, 104 (or 80%) were correct.
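
The metamethod can be sketched as a per-residue average over the individual predictors, followed by a ranking of the most confident buried calls. The sketch assumes each method outputs a buried score between 0 and 1 and that 0.5 is the decision threshold; the scores, names, and threshold are hypothetical and not taken from the paper.

```python
def ensemble_average(per_method_scores):
    """Average the buried scores of all methods, residue by residue."""
    n_methods = len(per_method_scores)
    return [sum(column) / n_methods for column in zip(*per_method_scores)]

def classify(avg_scores, threshold=0.5):
    """Two-state call: 'b' (buried) when the averaged score reaches the threshold."""
    return ["b" if score >= threshold else "e" for score in avg_scores]

def most_confident_buried(avg_scores, top_n):
    """Indices of the residues with the highest averaged buried score."""
    ranked = sorted(range(len(avg_scores)), key=lambda i: avg_scores[i], reverse=True)
    return ranked[:top_n]

# Hypothetical scores from five methods (rows) for four residues (columns).
scores = [
    [0.90, 0.2, 0.6, 0.4],
    [0.80, 0.1, 0.7, 0.5],
    [0.95, 0.3, 0.5, 0.3],
    [0.85, 0.2, 0.4, 0.6],
    [0.90, 0.2, 0.8, 0.2],
]
averaged = ensemble_average(scores)
states = classify(averaged)
candidates = most_confident_buried(averaged, top_n=2)
```

The highest-ranked residues correspond to the "most confidently classified as buried" set that the abstract proposes as candidate sites of deleterious mutations.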

8.
9.
The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which link the carbohydrate GalNAc to the side-chain of certain serine and threonine residues in mucin type glycoproteins, are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serine, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at positions –1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83% of the glycosylated and 90% of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods. Predictions of O-glycosylation sites in the envelope glycoprotein gp120 from the primate lentiviruses HIV-1, HIV-2 and SIV are presented. The most conserved O-glycosylation signals in these evolutionary-related glycoproteins were found in their first hypervariable loop, V1. However, the strain variation for HIV-1 gp120 was significant. A computer server, available through WWW or E-mail, has been developed for prediction of mucin type O-glycosylation sites in proteins based on the amino acid sequence. The server addresses are http://www.cbs.dtu.dk/services/NetOGlyc/ and netOglyc@cbs.dtu.dk.

10.
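Building on full use of the main factors that influence soil quality (soil type, land use, lithology, topography, roads and industry type) and on accurate characterization of the spatial distribution of regional soil quality, mutual information theory was used to screen 13 auxiliary variables (lithology type, land use, soil type, distance to towns, distance to roads, distance to industrial land, distance to rivers, relative elevation, slope, aspect, plan curvature, profile curvature and tangential curvature), and the See5.0 decision tree was then used to predict soil quality in the study area. The results showed that the main factors influencing soil quality in the study area were soil type, land use, lithology type, distance to towns, distance to water bodies, relative elevation, distance to roads and distance to industrial land. The decision tree model that used the factors selected by mutual information as predictors was clearly more accurate than the model that used all factors; for the former, the classification accuracy of both the decision tree and the decision rules exceeded 80%. By making full use of both continuous and categorical data, the combination of mutual information theory and decision trees not only reduces the input parameters of a general decision tree algorithm but also effectively predicts and evaluates regional soil quality grades.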
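
The variable screening step can be illustrated by ranking candidate variables by their mutual information with the soil quality grade. The sketch assumes all variables are already categorical; the variable names, values, and grades below are hypothetical, and the See5.0 decision tree itself is not reproduced.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long sequences of discrete values."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

def rank_variables(variables, target):
    """Sort variable names by decreasing mutual information with the target."""
    scored = {name: mutual_information(values, target) for name, values in variables.items()}
    return sorted(scored, key=scored.get, reverse=True), scored

# Hypothetical sample of four sites.
grade = ["high", "high", "low", "low"]
variables = {
    "soil_type": ["A", "A", "B", "B"],  # fully determines the grade in this toy sample
    "aspect": ["N", "S", "N", "S"],     # independent of the grade in this toy sample
}
ranking, mi_scores = rank_variables(variables, grade)
```

Variables whose score falls below a chosen cut-off would be dropped before the decision tree is trained; the abstract does not state the cut-off that was used.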

11.
A protein's surface influences its role in protein-protein interactions and protein-ligand binding. Mass spectrometry can be used to give low resolution structural information about protein surfaces and conformations when used in combination with derivatization methods that target surface accessible amino acid residues. However, pinpointing the resulting modified peptides upon enzymatic digestion of the surface-modified protein is challenging because of the complexity of the peptide mixture and low abundance of modified peptides. Here a novel hydrazone reagent (NN) is presented that allows facile identification of all modified surface residues through a preferential cleavage upon activation by electron transfer dissociation coupled with a collision activation scan to pinpoint the modified residue in the peptide sequence. Using this approach, the correlation between percent reactivity and surface accessibility is demonstrated for two biologically active proteins, wheat eIF4E and PARP-1 Domain C.

12.
Coupling of a specific ligand to vaccines or drugs can be a powerful aid to route these compounds to a certain target cell population. However, if the targeted receptor is buried in a glycocalyx, binding of the ligand may be sterically hindered or even abolished, especially when the ligand is attached to bulky payloads. The antigen-transporting M cells that cover the gut-associated lymphoid tissue have a less pronounced glycocalyx than neighboring enterocytes. Such architectural differences might provide a possibility for targeting micro- or nanoparticulate vaccines to the mucosal immune system. To investigate the influence of the glycocalyx on the accessibility of cell surface receptors, we developed a system where a monolayer of ligand molecules is coupled in a spatially aligned manner onto the surface of microparticles. On the basis of fluorescent carboxylate-modified particles of 1 micron diameter, different synthetic strategies were tested. Particles were first modified to display aldehyde functions on their surface, then protein ligands were coupled via Schiff base formation. The performance of the particles was tested on cultured mouse fibroblasts using the B subunit of cholera toxin as ligand and the plasma membrane glycolipid ganglioside G(M1) as receptor. Cholera toxin B subunit-coated microparticles generated by one of our synthetic pathways exhibited specific binding to fibroblasts which could be blocked with soluble cholera toxin B subunit. As particles as small as 50 nm and any proteinaceous ligand may be used, this system provides a versatile means for monitoring receptor accessibilities in vitro and in vivo.

13.
Steward RE  Thornton JM 《Proteins》2002,48(2):178-191
An information theory approach was developed to predict the alignment of interacting antiparallel and parallel beta-strands. Information scores were derived for the preference of a residue on a beta-strand to be opposite a sequence of residues on an adjacent beta-strand. These scores were used to predict the interstrand register of interacting beta-strands from 10 alternative offset positions either side of the experimentally observed beta-sheet register. The amino acid sequence of an internal beta-strand can be correctly aligned with two beta-strands in a fixed position either side of the strand in 45% of antiparallel and 48% of parallel arrangements. For comparison, when another beta-strand from a nonhomologous protein substitutes the internal beta-strand, the same register is predicted for only 24 and 36% of antiparallel and parallel arrangements. As expected, alignment of a single fixed strand with just a second beta-strand sequence was more difficult, and gave a correct register in 31 and 37% of antiparallel and parallel beta-pairs, respectively. These scores are 10% higher than for two randomly selected beta-strand sequences. In general, prediction accuracy was not improved by information tables that distinguished hydrogen-bonding patterns or beta-strand order. These results will contribute to predicting the arrangement of beta-strands in beta-pleated sheets and protein topology.

14.
Membrane proteins are gatekeepers to the cell and essential for determination of the function of cells. Identification of the types of membrane proteins is an essential problem in cell biology. It is time-consuming and expensive to identify the type of membrane proteins with traditional experimental methods. The alternative is to design effective computational methods, which can provide quick and reliable predictions. To date, several computational methods have been proposed in this regard. Several of them used the features extracted from the sequence information of individual proteins. Recently, networks have become increasingly popular for tackling different protein-related problems, because they can organize proteins at a system level and give an overview of all proteins. However, such a form weakens the essential properties of proteins, such as their sequence information. In this study, a novel feature fusion scheme was proposed, which integrated the information of protein sequences and the protein-protein interaction network. The fused features of a protein were defined as the linear combination of sequence features of all proteins in the network, where the combination coefficients were the probabilities yielded by the random walk with restart algorithm with the protein as the seed node. Several models with such fused features and different classification algorithms were built and evaluated. Their performance for predicting the type of membrane proteins was improved compared with the models that used only the sequence features or only the network information.
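
The fusion scheme can be sketched in two steps: run a random walk with restart from the seed protein, then use the resulting probabilities as coefficients in a linear combination of every protein's sequence features. The sketch assumes an unweighted, undirected toy network, a restart probability of 0.5, and two-dimensional feature vectors; these values and all names are hypothetical.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until p stops changing.

    W is the column-normalized adjacency matrix, so p stays a probability
    vector as long as every node has at least one neighbour.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        new_p = [
            (1.0 - restart)
            * sum(adjacency[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j])
            + restart * start[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

def fuse_features(probabilities, sequence_features):
    """Linear combination of all proteins' sequence features, weighted by the walk probabilities."""
    dim = len(sequence_features[0])
    return [
        sum(prob * feats[d] for prob, feats in zip(probabilities, sequence_features))
        for d in range(dim)
    ]

# Hypothetical three-protein path network 0 - 1 - 2 with toy sequence features.
network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = random_walk_with_restart(network, seed=0)
fused = fuse_features(probs, features)
```

Because the seed keeps the largest probability, the fused vector stays close to the protein's own sequence features while borrowing from its network neighbourhood in proportion to proximity.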

15.

Background

Self-interacting proteins (SIPs) play a critical role in a range of life functions in most living cells. Research on SIPs is an important part of molecular biology. Although numerous SIP data are available, traditional experimental methods are labor-intensive, time-consuming and costly, and can yield only limited results relative to real-world needs. Hence, it is urgent to develop an efficient computational SIP prediction method to fill the gap. Deep learning technologies have been shown to produce disruptive performance improvements in many areas, but the effectiveness of deep learning methods for SIP prediction has not been verified.

Results

We developed a deep learning model for predicting SIPs by constructing a Stacked Long Short-Term Memory (SLSTM) neural network that contains “dropout”. We extracted features from protein sequences using a novel feature extraction scheme that combined Zernike Moments (ZMs) with the Position Specific Weight Matrix (PSWM). The capability of the proposed approach was assessed on S. cerevisiae and Human SIP datasets. The results indicate that the deep learning approach can effectively resist data skew and achieve good accuracies of 95.69 and 97.88%, respectively. To demonstrate the advantage of deep learning, we compared the results of the SLSTM-based method with those of the widely used Support Vector Machine (SVM) method and several other well-known methods on the same datasets.

Conclusion

The results show that our method is overall superior to the other existing state-of-the-art techniques. As far as we know, this study is the first to apply a deep learning method to predict SIPs, and practical experimental results reveal its potential in SIP identification.

16.
In the era of structural genomics, the prediction of protein interactions using docking algorithms is an important goal. The success of this method critically relies on the identification of good docking solutions among a vast excess of false solutions. We have adapted the concept of mutual information (MI) from information theory to achieve a fast and quantitative screening of different structural features with respect to their ability to discriminate between physiological and nonphysiological protein interfaces. The strategy includes the discretization of each structural feature into distinct value ranges to optimize its mutual information. We have selected 11 structural features and two datasets to demonstrate that the MI is dimensionless and can be directly compared for diverse structural features and between datasets of different sizes. Conversion of the MI values into a simple scoring function revealed that those features with a higher MI are actually more powerful for the identification of good docking solutions. Thus, an MI-based approach allows the rapid screening of structural features with respect to their information content and should therefore be helpful for the design of improved scoring functions in the future. In addition, the concept presented here may also be adapted to related areas that require feature selection for biomolecules or organic ligands.
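
The discretization step can be sketched as a search for the value ranges that maximize mutual information with the interface label. The sketch below simplifies this to a single cut point (two ranges) chosen from the midpoints between observed values; the abstract does not say how many ranges were used or how they were searched, and the feature values, labels, and names are hypothetical.

```python
import math
from collections import Counter

def discrete_mi(bins, labels):
    """Mutual information (in bits) between bin assignments and class labels."""
    n = len(bins)
    count_b, count_l, count_bl = Counter(bins), Counter(labels), Counter(zip(bins, labels))
    return sum(
        (c / n) * math.log2(c * n / (count_b[b] * count_l[l]))
        for (b, l), c in count_bl.items()
    )

def best_cut_point(values, labels):
    """Return (mi, cut) for the single threshold that maximizes mutual information."""
    distinct = sorted(set(values))
    best_mi, best_cut = 0.0, None
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2.0
        bins = [value > cut for value in values]
        mi = discrete_mi(bins, labels)
        if mi > best_mi:
            best_mi, best_cut = mi, cut
    return best_mi, best_cut

# Hypothetical structural feature (e.g. an interface size) and labels:
# 1 = physiological interface, 0 = nonphysiological interface.
feature_values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
interface_labels = [0, 0, 0, 1, 1, 1]
mi_at_best, cut_at_best = best_cut_point(feature_values, interface_labels)
```

Because the score is measured in bits regardless of the feature's units, the optimized values of different features can be compared directly, which is the property the abstract highlights.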

17.
Protein post-translational modifications are crucial to the function of many proteins. In this study, we have investigated the structural environment of 8378 incidences of 44 types of post-translational modifications with 19 different approaches. We show that modified amino acids likely to be involved in protein-protein interactions, such as ester-linked phosphorylation, methylarginine, acetyllysine, sulfotyrosine, hydroxyproline, and hydroxylysine, are clearly surface associated. Other modifications, including O-GlcNAc, phosphohistidine, 4-aspartylphosphate, methyllysine, and ADP-ribosylarginine, are either not surface associated or are in a protein's core. Artifactual modifications were found to be randomly distributed throughout the protein. We discuss how the surface accessibility of post-translational modifications can be important for protein-protein interactivity.

18.
To characterize water binding to proteins, which is fundamental to protein folding, stability and activity, the relationships of 10,837 bound water positions to protein surface shape and residue type were analyzed in 56 high-resolution crystallographic structures. Fractal atomic density and accessibility algorithms provided an objective characterization of deep grooves in solvent-accessible protein surfaces. These deep grooves consistently had approximately the diameter of one water molecule, suggesting that deep grooves are formed by the interactions between protein atoms and bound water molecules. Protein surface topography dominates the chemistry and extent of water binding. Protein surface area within grooves bound three times as many water molecules as non-groove surface; grooves accounted for one-quarter of the total surface area yet bound half the water molecules. Moreover, only within grooves did bound water molecules discriminate between different side-chains. In grooves, main-chain surface was as hydrated as that of the most hydrophilic side-chains, Asp and Glu, whereas outside grooves all main and side-chains bound water to a similar, and much decreased, extent. This identification of the interdependence of protein surface shape and hydration has general implications for modelling and prediction of protein surface shape, recognition, local folding and solvent binding.
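
The factor of three follows directly from the two fractions quoted in the abstract. As a short worked check, let A be the total surface area and W the total number of bound waters (both symbols are introduced here for illustration), and let ρ denote waters bound per unit area:

```latex
\[
\frac{\rho_{\mathrm{groove}}}{\rho_{\mathrm{non\text{-}groove}}}
= \frac{(W/2)\,/\,(A/4)}{(W/2)\,/\,(3A/4)}
= \frac{2W/A}{2W/(3A)}
= 3
\]
```

Grooves hold half the waters on a quarter of the area, and the remaining half is spread over three-quarters of the area, so the per-area hydration in grooves is three times that of the non-groove surface.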

19.
Yuan Z  Huang B 《Proteins》2004,57(3):558-564
A novel support vector regression (SVR) approach is proposed to predict protein accessible surface areas (ASAs) from their primary structures. In this work, we predict the real values of ASA in squared angstroms for residues instead of relative solvent accessibility. Based on protein residues, the mean and median absolute errors are 26.0 Å² and 18.87 Å², respectively. The correlation coefficient between the predicted and observed ASAs is 0.66. Cysteine is the best predicted amino acid (mean absolute error is 13.8 Å² and median absolute error is 8.37 Å²), while arginine is the least well predicted amino acid (mean absolute error is 42.7 Å² and median absolute error is 36.31 Å²). Our work suggests that the SVR approach can be directly applied to the ASA prediction where data preclassification has been used.
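
The three evaluation measures quoted here (mean absolute error, median absolute error, and correlation coefficient) can be computed as in the sketch below. It assumes the correlation is Pearson's r, which the abstract does not state explicitly; the function names and the ASA values are hypothetical.

```python
import math
import statistics

def absolute_errors(predicted, observed):
    """Per-residue absolute difference between predicted and observed ASA."""
    return [abs(p - o) for p, o in zip(predicted, observed)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return covariance / spread

# Hypothetical observed and predicted ASA values in square angstroms.
observed_asa = [10.0, 50.0, 120.0, 30.0]
predicted_asa = [20.0, 40.0, 100.0, 60.0]
errors = absolute_errors(predicted_asa, observed_asa)
mean_abs_error = statistics.fmean(errors)
median_abs_error = statistics.median(errors)
correlation = pearson(predicted_asa, observed_asa)
```

Reporting both the mean and the median is informative because a few badly predicted residues raise the mean much more than the median, as the gap between 26.0 Å² and 18.87 Å² suggests.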

20.
Secondary structures of histones H1, H2A, H2B, H3, H4 and H5 have been calculated by the computer program ALB based on a molecular theory of protein secondary structure. The predicted secondary structures of all histones are predominantly alpha-helical. The calculated secondary structure of linker histones H1 and H5 is close to that previously obtained from two-dimensional NMR data. For each of the core histones (H2A, H2B, H3, H4) one long alpha-helix and several short ones have been predicted. These long helices can be identified with rods in the low-resolution electron density map.
