共查询到20条相似文献,搜索用时 0 毫秒
1.
Prediction of DNA-binding residues from sequence 总被引:2,自引:0,他引:2
MOTIVATION: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding essays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. RESULTS: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein-DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information. AVAILABILITY: http://cubic.bioc.columbia.edu/services/disis. 相似文献
2.
Background
The heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. The knowledge of heme binding residues can provide crucial clues to understand these activities and aid in functional annotation, however, insufficient work has been done on the research of heme binding residues from protein sequence information.Methods
We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.Results
Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.Conclusions
The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.3.
Hayat S Walter P Park Y Helms V 《Journal of bioinformatics and computational biology》2011,9(1):43-65
We present BTMX (Beta barrel TransMembrane eXposure), a computational method to predict the exposure status (i.e. exposed to the bilayer or hidden in the protein structure) of transmembrane residues in transmembrane beta barrel proteins (TMBs). BTMX predicts the exposure status of known TM residues with an accuracy of 84.2% over 2,225 residues and provides a confidence score for all predictions. Predictions made are in concert with the fact that hydrophobic residues tend to be more exposed to the bilayer. The biological relevance of the input parameters is also discussed. The highest prediction accuracy is obtained when a sliding window comprising three residues with similar C(α)-C(β) vector orientations is employed. The prediction accuracy of the BTMX method on a separate unseen non-redundant test dataset is 78.1%. By employing out-pointing residues that are exposed to the bilayer, we have identified various physico-chemical properties that show statistically significant differences between the beta strands located at the oligomeric interfaces compared to the non-oligomeric strands. The BTMX web server generates colored, annotated snake-plots as part of the prediction results and is available under the BTMX tab at http://service.bioinformatik.uni-saarland.de/tmx-site/. Exposure status prediction of TMB residues may be useful in 3D structure prediction of TMBs. 相似文献
4.
Predicting DNA-binding residues from a protein three-dimensional structure is a key task of computational structural proteomics. In the present study, based on machine learning technology, we aim to explore a reduced set of weighted average features for improving prediction of DNA-binding residues on protein surfaces. Via constructing the spatial environment around a DNA-binding residue, a novel weighting factor is first proposed to quantify the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. Then, a weighted average scheme is introduced to represent the surface patch of the considering residue. Finally, the classifier is trained on the reduced set of these weighted average features, consisting of evolutionary profile, interface propensity, betweenness centrality and solvent surface area of side chain. Experimental results on 5-fold cross validation and independent tests indicate that the new feature set are effective to describe DNA-binding residues and our approach has significantly better performance than two previous methods. Furthermore, a brief case study suggests that the weighted average features are powerful for identifying DNA-binding residues and are promising for further study of protein structure-function relationship. The source code and datasets are available upon request. 相似文献
5.
Changhui Yan Michael Terribilini Feihong Wu Robert L Jernigan Drena Dobbs Vasant Honavar 《BMC bioinformatics》2006,7(1):262
Background
Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions. 相似文献6.
Pugalenthi G Kumar KK Suganthan PN Gangal R 《Biochemical and biophysical research communications》2008,367(3):630-634
Identification of catalytic residues can provide valuable insights into protein function. With the increasing number of protein 3D structures having been solved by X-ray crystallography and NMR techniques, it is highly desirable to develop an efficient method to identify their catalytic sites. In this paper, we present an SVM method for the identification of catalytic residues using sequence and structural features. The algorithm was applied to the 2096 catalytic residues derived from Catalytic Site Atlas database. We obtained overall prediction accuracy of 88.6% from 10-fold cross validation and 95.76% from resubstitution test. Testing on the 254 catalytic residues shows our method can correctly predict all 254 residues. This result suggests the usefulness of our approach for facilitating the identification of catalytic residues from protein structures. 相似文献
7.
Background
Previous studies on protein-DNA interaction mostly focused on the bound structure of DNA-binding proteins but few paid enough attention to the unbound structures. As more new proteins are discovered, it is useful and imperative to develop algorithms for the functional prediction of unbound proteins. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and extract useful statistical and geometric features, and use structural alignment and support vector machines for the prediction of unbound DNA-binding proteins.Results
The performance of our method is evaluated by discriminating a set of 104 DNA-binding proteins from 401 non-DNA-binding proteins. In the same test, the proposed method outperforms the other method using conditional probability. The results achieved by our proposed method for; precision, 83.33%; accuracy, 86.53%; and MCC, 0.5368 demonstrate its good performance.Conclusions
In this study we develop an effective method for the prediction of protein-DNA interactions based on statistical and geometric features and support vector machines. Our results show that interface surface features play an important role in protein-DNA interaction. Our technique is able to predict unbound DNA-binding protein and discriminatory DNA-binding proteins from proteins that bind with other molecules.8.
9.
10.
MOTIVATION: All residues in a protein are not equally important. Some are essential for the proper structure and function of the protein, whereas others can be readily replaced. Conservation analysis is one of the most widely used methods for predicting these functionally important residues in protein sequences. RESULTS: We introduce an information-theoretic approach for estimating sequence conservation based on Jensen-Shannon divergence. We also develop a general heuristic that considers the estimated conservation of sequentially neighboring sites. In large-scale testing, we demonstrate that our combined approach outperforms previous conservation-based measures in identifying functionally important residues; in particular, it is significantly better than the commonly used Shannon entropy measure. We find that considering conservation at sequential neighbors improves the performance of all methods tested. Our analysis also reveals that many existing methods that attempt to incorporate the relationships between amino acids do not lead to better identification of functionally important sites. Finally, we find that while conservation is highly predictive in identifying catalytic sites and residues near bound ligands, it is much less effective in identifying residues in protein-protein interfaces. AVAILABILITY: Data sets and code for all conservation measures evaluated are available at http://compbio.cs.princeton.edu/conservation/ 相似文献
11.
12.
13.
This article describes DP-Bind, a web server for predicting DNA-binding sites in a DNA-binding protein from its amino acid sequence. The web server implements three machine learning methods: support vector machine, kernel logistic regression and penalized logistic regression. Prediction can be performed using either the input sequence alone or an automatically generated profile of evolutionary conservation of the input sequence in the form of PSI-BLAST position-specific scoring matrix (PSSM). PSSM-based kernel logistic regression achieves the accuracy of 77.2%, sensitivity of 76.4% and specificity of 76.6%. The outputs of all three individual methods are combined into a consensus prediction to help identify positions predicted with high level of confidence. AVAILABILITY: Freely available at http://lcg.rit.albany.edu/dp-bind. SUPPLEMENTARY INFORMATION: http://lcg.rit.albany.edu/dp-bind/dpbind_supplement.html. 相似文献
14.
15.
Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than the other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites. 相似文献
16.
An algorithm was derived to relate the amino acid sequence of a collagen triple helix to its thermal stability. This calculation is based on the triple helical stabilization propensities of individual residues and their intermolecular and intramolecular interactions, as quantitated by melting temperature values of host-guest peptides. Experimental melting temperature values of a number of triple helical peptides of varying length and sequence were successfully predicted by this algorithm. However, predicted T(m) values are significantly higher than experimental values when there are strings of oppositely charged residues or concentrations of like charges near the terminus. Application of the algorithm to collagen sequences highlights regions of unusually high or low stability, and these regions often correlate with biologically significant features. The prediction of stability from sequence indicates an understanding of the major forces maintaining this protein motif. The use of highly favorable KGE and KGD sequences is seen to complement the stabilizing effects of imino acids in modulating stability and may become dominant in the collagenous domains of bacterial proteins that lack hydroxyproline. The effect of single amino acid mutations in the X and Y positions can be evaluated with this algorithm. An interactive collagen stability calculator based on this algorithm is available online. 相似文献
17.
18.
Background
The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. 相似文献19.
In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information. 相似文献
20.
Zinc finger proteins interact via their individual fingers to three base pair subsites on the target DNA. The four key residue positions -1, 2, 3 and 6 on the alpha-helix of the zinc fingers have hydrogen bond interactions with the DNA. Mutating these key residues enables generation of a plethora of combinatorial possibilities that can bind to any DNA stretch of interest. Exploiting the binding specificity and affinity of the interaction between the zinc fingers and the respective DNA can help to generate engineered zinc fingers for therapeutic purposes involving genome targeting. Exploring the structure-function relationships of the existing zinc finger-DNA complexes can aid in predicting the probable zinc fingers that could bind to any target DNA. Computational tools ease the prediction of such engineered zinc fingers by effectively utilizing information from the available experimental data. A study of literature reveals many approaches for predicting DNA-binding specificity in zinc finger proteins. However, an alternative approach that looks into the physico-chemical properties of these complexes would do away with the difficulties of designing unbiased zinc fingers with the desired affinity and specificity. We present a physico-chemical approach that exploits the relative strengths of hydrogen bonding between the target DNA and all combinatorially possible zinc fingers to select the most optimum zinc finger protein candidate. 相似文献