Similar literature
 Found 20 similar articles; search took 31 ms
1.
Yuan Z  Bailey TL  Teasdale RD 《Proteins》2005,58(4):905-912
The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods.
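The abstract above scores predictions by the Pearson correlation coefficient (CC) between predicted and observed B-factor profiles. The following is a minimal sketch of that metric only, not of the authors' SVR pipeline; the function name `pearson_cc` and the sample values are hypothetical and chosen purely for illustration.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical per-residue B-factors (illustrative values, not from the paper).
observed_b = [12.0, 15.5, 22.1, 30.4, 18.3]
predicted_b = [14.0, 16.0, 20.0, 27.0, 21.0]
cc = pearson_cc(observed_b, predicted_b)
```

A CC of 1 means the predicted profile rises and falls exactly with the observed one, while a value near 0 means no linear relationship.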

2.
We present a set of four parameters that in combination can predict DNA-binding residues on protein structures to a high degree of accuracy. These are the number of evolutionarily conserved residues (N(cons)) and their spatial clustering (ρ(e)), hydrogen bond donor capability (D(p)) and residue propensity (R(p)). We first used these parameters to characterize 130 interfaces in a set of 126 DNA-binding proteins (DBPs). The applicability of these parameters, both individually and in combination, to distinguish the true binding region from the rest of the protein surface was then analyzed. R(p) shows the best performance, identifying the true interface with the top rank in 83% of cases. Importantly, we also used the unbound-bound test cases of the protein-DNA docking benchmark to test the efficacy of our method. When applied to the unbound form of the DBPs, R(p) can distinguish 86% of cases. Finally, we have applied the SVM approach for recognizing the interface region using the above parameters along with the individual amino acid composition as attributes. The accuracy of prediction is 90.5% for the bound structures and 93.6% for the unbound form of the proteins.

3.
Cation-pi interactions play an important role in the stability of protein structures. In our earlier work, we analyzed the influence and energetic contribution of cation-pi interactions in three-dimensional structures of membrane proteins. In this work, we investigate the characteristic features of residues that are involved in cation-pi interactions. We have computed several parameters, such as surrounding hydrophobicity, number of long-range contacts, conservation score and normalized B-factor for all these residues and identified their location, whether in the membrane or at the surface. We found that the cation-pi interactions are mainly formed by long-range interactions. The cationic residues involved in cation-pi interactions have higher surrounding hydrophobicity than their average values in the whole dataset, and an opposite trend is observed for aromatic residues. In transmembrane helical proteins, all residues responsible for cation-pi interactions except Phe are highly conserved across related protein sequences, whereas in transmembrane strand proteins an appreciable conservation is observed only for Arg. The analysis of residue flexibility reveals that the residues forming cation-pi interactions are more stable than other residues. The results obtained in the present study should help in understanding the role of cation-pi interactions in the structure and folding of membrane proteins.

4.
Lin CP  Huang SW  Lai YL  Yen SC  Shih CH  Lu CH  Huang CC  Hwang JK 《Proteins》2008,72(3):929-935
It has recently been shown that in proteins the atomic mean-square displacement (or B-factor) can be related to the number of the neighboring atoms (or protein contact number), and that this relationship allows one to compute the B-factor profiles directly from protein contact number. This method, referred to as the protein contact model, is appealing, since it requires neither trajectory integration nor matrix diagonalization. As a result, the protein contact model can be applied to very large proteins and can be implemented as a high-throughput computational tool to compute atomic fluctuations in proteins. Here, we show that this relationship can be further refined to that between the atomic mean-square displacement and the weighted protein contact-number, the weight being the square of the reciprocal distance between the contacting pair. In addition, we show that this relationship can be utilized to compute the cross-correlation of atomic motion (the B-factor is essentially the auto-correlation of atomic motion). For a nonhomologous dataset comprising 972 high-resolution X-ray protein structures (resolution <2.0 Å and sequence identity <25%), the mean correlation coefficient between the X-ray and computed B-factors based on the weighted protein contact-number model is 0.61, which is better than those of the original contact-number model (0.51) and other methods. We also show that the computed correlation maps based on the weighted contact-number model are globally similar to those computed through normal mode analysis for some selected cases. Our results underscore the relationship between protein dynamics and protein packing. We believe that our method will be useful in the study of the protein structure-dynamics relationship.
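The weighted contact number described in the abstract above sums, for each atom, the squared reciprocal distance to every other atom. The sketch below computes just that quantity from a list of coordinates. The function name, the plain-tuple input format and the toy coordinates are assumptions made for illustration; the abstract does not state how the contact number is converted into a B-factor, so the sketch stops at the contact number itself.

```python
def weighted_contact_numbers(coords):
    """Return one weighted contact number per atom.

    Each value is the sum over all other atoms of 1 / r**2, where r is the
    distance between the pair (the weight named in the abstract).
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(coords):
            if i == j:
                continue
            r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if r2 == 0.0:
                raise ValueError("two atoms share the same coordinates")
            total += 1.0 / r2
        wcn.append(total)
    return wcn


# Toy example: three atoms on a line, 1 unit apart (not real protein data).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_wcn = weighted_contact_numbers(toy_coords)
```

In the toy example the middle atom has two neighbors at distance 1 and so gets the largest value, which matches the intuition that more tightly packed atoms have higher contact numbers. The double loop is quadratic in the number of atoms and needs no trajectory integration or matrix diagonalization.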

5.
Amino acid residues, which play important roles in protein function, are often conserved. Here, we analyze thermodynamic and structural data of protein-DNA interactions to explore a relationship between free energy, sequence conservation and structural cooperativity. We observe that the most stabilizing residues or putative hotspots are those which occur as clusters of conserved residues. The higher packing density of the clusters and available experimental thermodynamic data of mutations suggest cooperativity between conserved residues in the clusters. Conserved singlets contribute to the stability of protein-DNA complexes to a lesser extent. We also analyze structural features of conserved residues and their clusters and examine their role in identifying DNA-binding sites. We show that about half of the observed conserved residue clusters are in the interface with the DNA, which could be identified from their amino acid composition; whereas the remaining clusters are at the protein-protein or protein-ligand interface, or embedded in the structural scaffolds. In protein-protein interfaces, conserved residues are highly correlated with experimental residue hotspots, contributing dominantly and often cooperatively to the stability of protein-protein complexes. Overall, the conservation patterns of the stabilizing residues in DNA-binding proteins also highlight the significance of clustering as compared to single residue conservation.

6.
7.
The structures of DNA-protein complexes have illuminated the diversity of DNA-protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA-protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on approximately 4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to approximately 1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.

8.
Protein-DNA interactions play an essential role in the genetic activities of life. Many structures of protein-DNA complexes are already known, but the common rules on how and where proteins bind to DNA have not emerged. Many attempts have been made to predict protein-DNA interactions using structural information, but the success rate is still about 80%. We analyzed 63 protein-DNA complexes by focusing our attention on the shape of the molecular surface of the protein and DNA, along with the electrostatic potential on the surface, and constructed a new statistical evaluation function to make predictions of DNA interaction sites on protein molecular surfaces. The shape of the molecular surface was described by a combination of local and global average curvature, which are intended to describe the small convex and concave and the large-scale concave curvatures of the protein surface preferentially appearing at DNA-binding sites. Using these structural features, along with the electrostatic potential obtained by solving the Poisson-Boltzmann equation numerically, we have developed prediction schemes with 86% and 96% accuracy for DNA-binding and non-DNA-binding proteins, respectively.

9.
Ho SY  Yu FC  Chang CY  Huang HL 《Bio Systems》2007,90(1):234-241
In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Because the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity to each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than the existing neural network-based method, increasing the NP values for training and test accuracies by up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
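The abstract above defines net prediction (NP) as the average of sensitivity and specificity and motivates it by the imbalance between binding and non-binding residues. A minimal sketch of that definition follows; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def net_prediction(tp, fn, tn, fp):
    """Net prediction: the mean of sensitivity and specificity."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2.0


# Hypothetical counts for 40 binding and 200 non-binding residues.
np_balanced = net_prediction(tp=30, fn=10, tn=160, fp=40)

# A trivial predictor that labels every residue as non-binding.
np_trivial = net_prediction(tp=0, fn=40, tn=200, fp=0)
```

With these illustrative counts the trivial predictor would reach a plain accuracy of 200/240 (about 83%) yet an NP of only 0.5, which is why NP is the more informative target when the classes are this uneven.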

10.
Prediction of DNA-binding residues from sequence   Cited by: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are as yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. RESULTS: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein-DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information. AVAILABILITY: http://cubic.bioc.columbia.edu/services/disis.

11.

Background

Previous studies on protein-DNA interaction mostly focused on the bound structure of DNA-binding proteins but few paid enough attention to the unbound structures. As more new proteins are discovered, it is useful and imperative to develop algorithms for the functional prediction of unbound proteins. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and extract useful statistical and geometric features, and use structural alignment and support vector machines for the prediction of unbound DNA-binding proteins.

Results

The performance of our method is evaluated by discriminating a set of 104 DNA-binding proteins from 401 non-DNA-binding proteins. In the same test, the proposed method outperforms the other method using conditional probability. The results achieved by our proposed method (precision, 83.33%; accuracy, 86.53%; MCC, 0.5368) demonstrate its good performance.

Conclusions

In this study we develop an effective method for the prediction of protein-DNA interactions based on statistical and geometric features and support vector machines. Our results show that interface surface features play an important role in protein-DNA interaction. Our technique is able to predict unbound DNA-binding proteins and to discriminate DNA-binding proteins from proteins that bind other molecules.

12.
13.
As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
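Several abstracts in this list, including the one above, report a Matthews correlation coefficient (MCC) alongside precision and sensitivity. As a reminder of how the three relate to a confusion matrix, here is a minimal sketch using the standard textbook definitions. The function name and the counts are hypothetical; they are illustrative numbers, not the counts behind any result quoted above.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Precision, sensitivity and MCC from binary confusion-matrix counts."""
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        raise ValueError("MCC is undefined when a row or column sum is zero")
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(denom)
    return precision, sensitivity, mcc


# Illustrative counts only: 120 true binders among 1000 proteins.
precision, sensitivity, mcc = binary_metrics(tp=90, fp=10, fn=30, tn=870)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when non-binding proteins far outnumber binding ones, as in the datasets described here.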

14.
Weitao Sun  Jing He 《Biopolymers》2010,93(10):904-916
Residue clusters play an essential role in stabilizing protein structures in the form of complex networks. We show that the cluster sizes in a native protein follow the log‐normal distribution for a dataset consisting of 424 proteins. To our knowledge, this is the first such fit for native structures. Based on the log‐normal model, the asymptotically increasing mean cluster sizes produce a critical protein chain length of about 200 amino acids, beyond which most globular proteins have nearly the same mean cluster sizes. This suggests that the larger proteins use a different packing mechanism than the smaller proteins. We confirmed the scale‐free property of the residue contact network for most of the protein structures in the dataset, although violations were observed for the tightly packed proteins. A residue cluster network wheel (RCNW) is proposed to visualize the relationship between the multiple properties of the residue network such as the cluster size, the residue types and contacts, and the flexibility of the residue. We noticed that the residues with large cluster size have smaller Cα displacement measured using the normal mode analysis. © 2010 Wiley Periodicals, Inc. Biopolymers 93: 904–916, 2010.

15.
Bhardwaj N  Lu H 《FEBS letters》2007,581(5):1058-1066
Protein-DNA interactions are crucial to many cellular activities such as expression-control and DNA-repair. These interactions between amino acids and nucleotides are highly specific and any aberrance at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features collected from 50 proteins are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved with average sensitivity, specificity and accuracy pegged at 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%. Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. Results presented here demonstrate that machine-learning can be applied to automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.

16.
17.
Chen YC  Wu CY  Lim C 《Proteins》2007,67(3):671-680
Binding of polyanionic DNA depends on the cluster of electropositive atoms in the binding site of a DNA-binding protein. Such a cluster of electropositive protein atoms would be electrostatically unfavorable without stabilizing interactions from the respective electronegative DNA atoms and would likely be evolutionarily conserved due to its critical biological role. Consequently, our strategy for predicting DNA-binding residues is based on detecting a cluster of evolutionarily conserved surface residues that are electrostatically stabilized upon mutation to negatively charged Asp/Glu residues. The method requires as input the protein structure and sufficient sequence homologs to define each residue's relative conservation, and it yields as output experimentally testable residues that are predicted to bind DNA. By incorporating characteristic DNA-binding site features (i.e., electrostatic strain and amino acid conservation), the new method yields a prediction accuracy of 83%, which is much higher than methods based on only electrostatic strain (57%) or conservation alone (50%). It is also less sensitive to protein conformational changes upon DNA binding than methods that mainly depend on the 3D protein structure.

18.
19.

Background

DNA-binding proteins perform their functions through specific or non-specific sequence recognition. Although many sequence- or structure-based approaches have been proposed to identify DNA-binding residues on proteins or protein-binding sites on DNA sequences with satisfactory performance, it remains a challenging task to unveil the exact mechanism of protein-DNA interactions without crystal complex structures. Without information from complexes, the linkages between DNA-binding proteins and their binding sites on DNA are still missing.

Methods

While it is still difficult to acquire co-crystallized structures in an efficient way, this study proposes a knowledge-based learning method to effectively predict DNA orientation and base locations around the protein’s DNA-binding sites when given a protein structure. First, the functionally important residues of a query protein are predicted by a sequential pattern mining tool. After that, surface residues falling in the predicted functional regions are determined based on the given structure. These residues are then clustered based on their spatial coordinates and the resultant clusters are ranked by a proposed DNA-binding propensity function. Clusters with high DNA-binding propensities are treated as DNA-binding units (DBUs) and each DBU is analyzed by principal component analysis (PCA) to predict potential orientation of DNA grooves. More specifically, the proposed method is developed to predict the direction of the tangent line to the helix curve of the DNA groove where a DBU is going to bind.
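The last step of the Methods paragraph above applies principal component analysis (PCA) to the coordinates of a residue cluster to estimate a direction. The sketch below shows one way to extract the first principal axis of a set of 3D points with the standard library alone, using power iteration on the covariance matrix. Everything here is an assumption for illustration: the function name, the input format, the start vector and the toy points are hypothetical, and the abstract does not say which principal component the authors map to the groove tangent.

```python
import math


def principal_axis(points, iterations=200):
    """Unit vector along the direction of greatest variance of 3D points.

    The sign of the returned vector is arbitrary, as with any PCA axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in points]
    # 3x3 covariance matrix of the centered coordinates.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(3)]
           for a in range(3)]
    # Power iteration from an arbitrary start vector. A start vector exactly
    # orthogonal to the main axis would fail; that case is not handled here.
    v = [1.0, 0.618, 0.382]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("points have no spread along the start direction")
        v = [x / norm for x in w]
    return v


# Toy cluster stretched along the x axis (not real residue coordinates).
toy_cluster = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, -0.1, 0.0),
               (3.0, 0.05, 0.0), (4.0, 0.0, 0.0)]
axis = principal_axis(toy_cluster)
```

For the toy cluster the returned axis points almost exactly along x (up to sign), since that is the direction in which the points are most spread out.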

Results

This paper proposes a knowledge-based learning procedure to determine the spatial location of the DNA groove with respect to the query protein structure by considering geometric propensity between protein side chains and DNA bases. The 11 test cases used in this study reveal that the location and orientation of the DNA groove around a selected DBU can be predicted with acceptable errors.

Conclusions

This study presents a method to predict the location and orientation of DNA grooves with respect to the structure of a DNA-binding protein. The test cases shown in this study reveal the possibility of imaging protein-DNA binding conformation before a co-crystallized structure can be determined. How the proposed method can be integrated with existing protein-DNA docking tools to study protein-DNA interactions deserves further study in the near future.

20.
The basic DNA-binding modules of 128 protein-DNA interfaces have been analyzed. Although these are less planar, like the protein-protein interfaces, the protein-DNA interfaces can also be dissected into core regions in which all the fully-buried atoms are located, and rim regions having atoms with residual accessibilities. The sequence entropy of the core residues is smaller than that of the rim residues, indicating that the former are better conserved and possibly contribute more towards the binding free energy, as has been implicated in protein-protein interactions. On the protein side, 1014 Å² of the surface is buried, of which 63% belongs to the core. There are some differences in the propensities of residues to occur in the core and the rim. In the DNA strands, the nucleotide(s) containing fully-buried atoms in all three components usually occupy central positions of the binding region. A new classification scheme for the interfaces has been introduced based on the composition of secondary structural elements of residues and the results compared with the conventional classification of DNA-binding proteins, as well as the protein class of the molecule. It appears that a common framework may be developed to understand both protein-protein and protein-DNA interactions.
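The abstract above compares the sequence entropy of core and rim residues, with lower entropy read as better conservation. The exact entropy definition is not given in the abstract, so the sketch below assumes the common choice of Shannon entropy over the residues seen at one position of a multiple sequence alignment. The function name and the example columns are hypothetical, and gap handling is left out.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    if not column:
        raise ValueError("column must contain at least one residue")
    counts = Counter(column)
    total = len(column)
    # Sum of p * log2(1 / p) over the residue types observed in the column.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical alignment columns, one letter per homologous sequence.
core_like = column_entropy("RRRRRRRR")   # fully conserved position
rim_like = column_entropy("RKRKRKRK")    # two residue types, equally common
```

A fully conserved position scores 0 bits and the score grows as more residue types share the position, so averaging this value over core and rim positions gives the kind of comparison the abstract describes.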
