1.

Functional restraints on the patterns of amino acid substitutions: application to sequence-structure homology recognition

Chelliah V Blundell T Mizuguchi K 《Proteins》2005,61(4):722-731

Amino acid residues that are involved in functional interactions in proteins have strong evolutionary pressure to remain unchanged, and consequently their substitution patterns differ from those of noninteracting residues. To characterize and quantify the differences between amino acid substitution patterns due to structural restraints and those under functional restraints, we have made a comparative analysis of families of homologous proteins. Residues classified as having the same amino acid type, secondary structure, accessibility, and side-chain hydrogen bonds are shown to be better conserved if they are close to the active site. We have focused on enzyme families for this analysis since they have functional sites that are easily defined by their catalytic residues. We have derived new sets of environment-specific substitution tables, which we term function-dependent environment-specific substitution tables, where amino acid residues are classified according to their distance from the functional sites. The residues that are within a distance of 9 Å from the active site have distinct amino acid substitution patterns when compared to the other sites. The function-dependent environment-specific substitution tables have been tested using the sequence-structure homology recognition program FUGUE and the results compared with the recognition performance obtained using the standard environment-specific substitution tables. Significant improvements are obtained in both recognition performance and alignment accuracy using the function-dependent environment-specific substitution tables (P-value = 0.02, according to the Wilcoxon signed rank test for alignment accuracy). The alignments near the active site are greatly improved with pronounced improvements at lower percentage identities (less than 30%).
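The distance-based classification described in this abstract can be illustrated with a minimal sketch. It assumes one representative coordinate per residue (the abstract does not say which atoms are used to measure distance), and the function name `within_cutoff` and the toy coordinates are hypothetical.

```python
import math

def within_cutoff(residue_coords, catalytic_coords, cutoff=9.0):
    """Flag each residue that lies within `cutoff` angstroms of any catalytic residue.

    Assumption: each residue is represented by a single (x, y, z) point.
    """
    flags = []
    for r in residue_coords:
        # Distance to the nearest catalytic residue
        nearest = min(math.dist(r, c) for c in catalytic_coords)
        flags.append(nearest <= cutoff)
    return flags

# Toy coordinates in angstroms, invented for illustration only
residues = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
catalytic = [(1.0, 0.0, 0.0)]
labels = within_cutoff(residues, catalytic)
```

Residues flagged `True` would be assigned to the "near the functional site" class when substitution counts are tallied; the rest go to the standard class.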

2.

Functional divergence and comparative in‐silico study of Cas4 proteins of DUF83 class

Vineeta Kaushik Ved Vrat Verma Manisha Goel 《Journal of molecular recognition : JMR》2018,31(5)

Clustered Regularly Interspaced Short Palindromic Repeats‐CRISPR associated (CRISPR‐Cas) systems present in genomes of bacteria and archaea have been the focus of many research studies recently. The Cas4 proteins of these systems are thought to be responsible for the adaptation step in the CRISPR mechanism. Cas4 proteins exhibit low sequence similarity among themselves and are currently classified into 2 main classes: DUF83 and DUF911. The characteristic features of Cas4 proteins belonging to the DUF83 class have been elucidated by determining the structures of Cas4 proteins from Sulfolobus solfataricus and Pyrobaculum calidifontis. Although both structurally characterized Cas4 proteins are of the same DUF83 class, these 2 proteins exhibit significant biochemical and functional differences. The aim of the present study was to explore the structural and evolutionary features responsible for these differences. Our study predicts residues which might be responsible for such differences. Functional divergence analysis was used to predict sites exhibiting type I divergence, where certain amino acids are conserved in 1 clade whereas the same site is highly variable in the other clade. Our intra‐molecular interaction analysis reinforces the influence of such divergence sites on the other functionally important amino acids. In general, this study identifies some of the divergence hotspots that could be the focus of future experimental studies for better understanding of Cas4 enzymatic activity in the CRISPR mechanism.
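The idea of type I divergence (a site conserved in one clade and variable in the other) can be illustrated with a simple entropy heuristic. This is only a sketch and not the statistical method used in the study: the thresholds `low` and `high`, the function names, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def type1_candidates(clade_a, clade_b, low=0.1, high=1.0):
    """Return column indices that are conserved in one clade and variable in the other.

    Assumption: both clades come from the same alignment, so columns line up.
    """
    sites = []
    for i in range(len(clade_a[0])):
        ha = column_entropy([s[i] for s in clade_a])
        hb = column_entropy([s[i] for s in clade_b])
        if (ha <= low and hb >= high) or (hb <= low and ha >= high):
            sites.append(i)
    return sites

# Toy aligned sequences, invented for illustration only
clade_a = ["MKDA", "MKDA", "MKDA", "MKDA"]
clade_b = ["MRDA", "MSDA", "MTDA", "MKDA"]
candidates = type1_candidates(clade_a, clade_b)
```

Here only column 1 is flagged: it is invariant (K) in the first clade and takes four different residues in the second.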

3.

Understanding the functional difference between growth arrest-specific protein 6 and protein S: an evolutionary approach

Romain A. Studer Fred R. Opperdoes Gerry A. F. Nicolaes André B. Mulder René Mulder 《Open biology》2014,4(10)

Although protein S (PROS1) and growth arrest-specific protein 6 (GAS6) proteins are homologous with a high degree of structural similarity, they are functionally different. The objectives of this study were to identify the evolutionary origins from which these functional differences arose. Bioinformatics methods were used to estimate the evolutionary divergence time and to detect the amino acid residues under functional divergence between GAS6 and PROS1. The properties of these residues were analysed in the light of their three-dimensional structures, such as their stability effects, the identification of electrostatic patches and the identification of potential protein–protein interactions. The divergence between GAS6 and PROS1 probably occurred during the whole-genome duplications in vertebrates. A total of 78 amino acid sites were identified to be under functional divergence. One of these sites, Asn463, is involved in N-glycosylation in GAS6, but is mutated in PROS1, preventing this post-translational modification. Sites experiencing functional divergence tend to express a greater diversity of stabilizing/destabilizing effects than sites that do not experience such functional divergence. Three electrostatic patches in the LG1/LG2 domains were found to differ between GAS6 and PROS1. Finally, a surface responsible for protein–protein interactions was identified. These results may help researchers to analyse disease-causing mutations in the light of evolutionary and structural constraints, and link genetic pathology to clinical phenotypes.

4.

Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking

Aloy P Querol E Aviles FX Sternberg MJ 《Journal of molecular biology》2001,311(2):395-408

A major problem in genome annotation is whether it is valid to transfer the function from a characterised protein to a homologue of unknown activity. Here, we show that one can employ a strategy that uses a structure-based prediction of protein functional sites to assess the reliability of functional inheritance. We have automated and benchmarked a method based on the evolutionary trace approach. Using a multiple sequence alignment, we identified invariant polar residues, which were then mapped onto the protein structure. Spatial clusters of these invariant residues formed the predicted functional site. For 68 of 86 proteins examined, the method yielded information about the observed functional site. This algorithm for functional site prediction was then used to assess the validity of transferring the function between homologues. This procedure was tested on 18 pairs of homologous proteins with unrelated function and 70 pairs of proteins with related function, and was shown to be 94% accurate. This automated method could be linked to schemes for genome annotation. Finally, we examined the use of functional site prediction in protein-protein and protein-DNA docking. The use of predicted functional sites was shown to filter putative docked complexes with a discrimination similar to that obtained by manually including biological information about active sites or DNA-binding residues.
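The first step described in this abstract, picking out invariant polar residues from a multiple sequence alignment, can be sketched as follows. The set of residues treated as polar (`POLAR`) is an assumption, since the abstract does not list them, and the function name and toy alignment are hypothetical. Mapping onto the structure and spatial clustering are not covered.

```python
# Assumption: one-letter codes treated as polar; the abstract does not define this set
POLAR = set("DEHKNQRSTY")

def invariant_polar_columns(alignment):
    """Return (column index, residue) pairs where every sequence has the same polar residue."""
    hits = []
    for i, column in enumerate(zip(*alignment)):
        # A column is invariant if only one distinct symbol occurs in it
        if len(set(column)) == 1 and column[0] in POLAR:
            hits.append((i, column[0]))
    return hits

# Toy alignment, invented for illustration only
alignment = ["MHDLS", "MHELS", "MHDVS"]
invariant = invariant_polar_columns(alignment)
```

In the toy alignment, column 0 (M) is invariant but not polar, so only columns 1 (H) and 4 (S) are reported.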

5.

Interactions in native binding sites cause a large change in protein dynamics

Ming D Wall ME 《Journal of molecular biology》2006,358(1):213-223

Cellular functions are regulated by molecules that interact with proteins and alter their activities. To enable such control, protein activity, and therefore protein conformational distributions, must be susceptible to alteration by molecular interactions at functional sites. Here we investigate whether interactions at functional sites cause a large change in the protein conformational distribution. We apply a computational method, called dynamics perturbation analysis (DPA), to identify sites at which interactions have a large allosteric potential D(x), which is the Kullback-Leibler divergence between protein conformational distributions with and without an interaction. In DPA, a protein is decorated with surface points that interact with neighboring protein atoms, and D(x) is calculated for each of the points in a coarse-grained model of protein vibrations. We use DPA to examine hundreds of protein structures from a standard small-molecule docking test set, and find that ligand-binding sites have elevated values of D(x): for 95% of proteins, the probability of randomly obtaining values as high as those in the binding site is 10⁻³ or smaller. We then use DPA to develop a computational method to predict functional sites in proteins, and find that the method accurately predicts ligand-binding-site residues for proteins in the test set. The performance of this method compares favorably with that of a cleft analysis method. The results confirm that interactions at small-molecule binding sites cause a large change in the protein conformational distribution, and motivate using DPA for large-scale prediction of functional sites in proteins. They also suggest that natural selection favors proteins whose activities are capable of being regulated by molecular interactions.
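The Kullback-Leibler divergence underlying D(x) can be illustrated on discrete distributions. This is a generic sketch, not the DPA implementation: DPA works on conformational distributions from a coarse-grained vibrational model, and the argument order used here (perturbed first, unperturbed second) is an assumption. The two-state distributions are invented.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Assumption: p and q have the same length, each sums to 1, and q is nonzero
    wherever p is nonzero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy two-state distributions, invented for illustration only
with_interaction = [0.5, 0.5]
without_interaction = [0.9, 0.1]

d_same = kl_divergence(with_interaction, with_interaction)
d_diff = kl_divergence(with_interaction, without_interaction)
```

The divergence is zero when the interaction leaves the distribution unchanged and grows as the two distributions move apart, which is the sense in which a large D(x) marks a site where an interaction strongly perturbs the protein.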

6.

Lipid-binding surfaces of membrane proteins: evidence from evolutionary and structural analysis

Adamian L Naveed H Liang J 《Biochimica et biophysica acta》2011,1808(4):1092-1102

Membrane proteins function in the diverse environment of the lipid bilayer. Experimental evidence suggests that some lipid molecules bind tightly to specific sites on the membrane protein surface. These lipid molecules often act as co-factors and play important functional roles. In this study, we have assessed the evolutionary selection pressure experienced at lipid-binding sites in a set of α-helical and β-barrel membrane proteins using posterior probability analysis of the ratio of synonymous vs. nonsynonymous substitutions (ω-ratio). We have also carried out a geometric analysis of the membrane protein structures to identify residues in close contact with co-crystallized lipids. We found that residues forming cholesterol-binding sites in both β₂-adrenergic receptor and Na⁺-K⁺-ATPase exhibit strong conservation, which can be characterized by an expanded cholesterol consensus motif for GPCRs. Our results suggest the functional importance of aromatic stacking interactions and interhelical hydrogen bonds in facilitating protein-cholesterol interactions, which is now reflected in the expanded motif. We also find that residues forming the cardiolipin-binding site in formate dehydrogenase-N γ-subunit and the phosphatidylglycerol binding site in KcsA are under strong purifying selection pressure. Although the lipopolysaccharide (LPS)-binding site in ferric hydroxamate uptake receptor (FhuA) is only weakly conserved, we show using a statistical mechanical model that LPS binds to the least stable FhuA β-strand and protects it from the bulk lipid. Our results suggest that specific lipid binding may be a general mechanism employed by β-barrel membrane proteins to stabilize weakly stable regions. Overall, we find that the residues forming specific lipid binding sites on the surfaces of membrane proteins often experience strong purifying selection pressure.

7.

Functional divergence and catalytic properties of dehydroascorbate reductase family proteins from Populus tomentosa

Zhen-Xin Tang Hai-Ling Yang 《Molecular biology reports》2013,40(8):5105-5114

Dehydroascorbate reductase (DHAR) is a key enzyme in the ascorbate–glutathione cycle that maintains reduced pools of ascorbic acid and serves as an important antioxidant. In this study, to investigate functional divergence of the plant DHAR family and catalytic characteristics of the glutathione binding site (G-site) residues of DHAR proteins, we cloned three DHAR genes (PtoDHAR1/2/3) from Populus tomentosa and predicted the G-site residues. PtoDHAR1 protein was localized in the chloroplast, while PtoDHAR2/3 proteins showed cytosolic localizations. The three DHAR proteins showed different enzymatic activities, apparent kinetic characteristics, optimum T_m and pH profiles, indicating their functional divergence. Cys20, Lys8, Pro61, Asp72 and Ser73 of PtoDHAR2 were predicted as G-site residues based on their N-terminal amino acid sequence identity and the available crystal structures of glutathione S-transferases. The biochemical functions of these residues are examined in this study through site-directed mutagenesis. The aforementioned five residues are critical components of active sites that contribute to the enzyme’s catalytic activity. Cys20, Pro61 and Asp72 of PtoDHAR2 are also responsible for maintaining proper protein structure. This study provides new insights into the functional divergence of the plant DHAR family and biochemical properties of the G-site residues in plant DHAR proteins.

8.

Robust recognition of zinc binding sites in proteins

Ebert JC Altman RB 《Protein science : a publication of the Protein Society》2008,17(1):54-65

Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
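The three figures quoted in this abstract (specificity, sensitivity, positive predictive value) are standard confusion-matrix ratios. A minimal sketch of how they relate is shown below; the counts are invented for illustration and are not taken from the paper, and the function name is hypothetical.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and positive predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true sites that are found
        "specificity": tn / (tn + fp),  # fraction of non-sites correctly rejected
        "ppv": tp / (tp + fp),          # fraction of predictions that are true sites
    }

# Invented counts: 100 true sites among 10,100 candidate locations
metrics = classifier_metrics(tp=75, fp=25, tn=9975, fn=25)
```

With many more negatives than positives, even a very high specificity leaves enough false positives to hold the positive predictive value well below 100%, which is why the abstract reports all three numbers together.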

9.

Statistical methods for testing functional divergence after gene duplication (total citations: 11; self-citations: 0; citations by others: 11)

Gu X 《Molecular biology and evolution》1999,16(12):1664-1674

Functional innovations after gene duplication may result in altered functional constraints between member gene clusters of a gene family. This type (type I) of functional divergence is measured by the coefficient of functional divergence (θ_λ), which can be interpreted as the decrease in rate correlation between gene clusters, or the probability that the evolutionary rate at a site is statistically independent between two gene clusters. A simple stochastic model has been developed for estimating θ_λ and testing its statistical significance. The current model includes the model of rate variation among sites as a special case when θ_λ = 0. Moreover, we have developed a site-specific profile based on the hidden Markov model to identify critical amino acid residues that are responsible for these functional differences between two gene clusters, which may have great potential in functional genomics.

10.

Prediction of interaction sites from apo 3D structures when the holo conformation is different

Murga LF Ondrechen MJ Ringe D 《Proteins》2008,72(3):980-992

The predictability of catalytic and binding sites from apo structures is addressed for proteins that undergo significant conformational change upon binding. Theoretical microscopic titration curves (THEMATICS), an electrostatics-based method for the prediction of functional sites, is performed on a test set of 24 proteins with both apo and holo structures available. For 23 of these 24 proteins (96%), THEMATICS predicts the correct catalytic or binding site for both the apo and holo forms. For only one of the 24 proteins, THEMATICS makes the correct prediction for the holo structure but fails for the apo structure. The metrics used by THEMATICS to identify functional residues generally are larger in absolute value for the functional residues in the holo forms compared to the corresponding residues in the apo forms. However, even in the apo forms, these identifying metrics are still statistically significantly larger for functional residues than for residues not involved in catalysis or binding. This indicates that some of the unusual electrostatic properties of functional residues are preserved in the apo conformation. Evidence is presented that certain residues immediately surrounding the active catalytic and binding residues impart functionally important chemical and electrostatic properties to the active residues. At least parts of these microenvironments exist in the unbound conformations, such that THEMATICS is able to distinguish the functional residues even in the apo structures.

11.

Establishment of distinct MyoD, E2A, and twist DNA binding specificities by different basic region-DNA conformations

Kophengnavong T Michnowicz JE Blackwell TK 《Molecular and cellular biology》2000,20(1):261-272

Basic helix-loop-helix (bHLH) proteins perform a wide variety of biological functions. Most bHLH proteins recognize the consensus DNA sequence CANNTG (the E-box consensus sequence) but acquire further functional specificity by preferring distinct internal and flanking bases. In addition, induction of myogenesis by MyoD-related bHLH proteins depends on myogenic basic region (BR) and BR-HLH junction residues that are not essential for binding to a muscle-specific site, implying that their BRs may be involved in other critical interactions. We have investigated whether the myogenic residues influence DNA sequence recognition and how MyoD, Twist, and their E2A partner proteins prefer distinct CANNTG sites. In MyoD, the myogenic BR residues establish specificity for particular CANNTG sites indirectly, by influencing the conformation through which the BR helix binds DNA. An analysis of DNA binding by BR and junction mutants suggests that an appropriate BR-DNA conformation is necessary but not sufficient for myogenesis, supporting the model that additional interactions with this region are important. The sequence specificities of E2A and Twist proteins require the corresponding BR residues. In addition, mechanisms that position the BR allow E2A to prefer distinct half-sites as a heterodimer with MyoD or Twist, indicating that the E2A BR can be directed toward different targets by dimerization with different partners. Our findings indicate that E2A and its partner bHLH proteins bind to CANNTG sites by adopting particular preferred BR-DNA conformations, from which they derive differences in sequence recognition that can be important for functional specificity.
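The E-box consensus CANNTG, where N is any base, can be located in a DNA sequence with a short pattern match. The sketch below only finds consensus matches; it does not model the internal or flanking base preferences discussed in the abstract. The function name and the example sequence are hypothetical.

```python
import re

# CA, any two bases, then TG; the lookahead lets overlapping sites be reported
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_e_boxes(seq):
    """Return (start index, matched site) for every CANNTG occurrence in `seq`."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq.upper())]

# Toy sequence, invented for illustration only
hits = find_e_boxes("ttCAGCTGaaCACGTGcc")
```

The two sites found in the toy sequence differ only in their internal bases (GC versus CG), which is the kind of variation the abstract says different bHLH proteins discriminate between.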

12.

Resolving protein structure‐function‐binding site relationships from a binding site similarity network perspective

Richa Mudgal Narayanaswamy Srinivasan Nagasuma Chandra 《Proteins》2017,85(7):1319-1335

Functional annotation is seldom straightforward, with complexities arising due to functional divergence in protein families or functional convergence between non‐homologous protein families, leading to mis‐annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold‐function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold‐function‐binding site relationships has been systematically generated. A network‐based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered that identify versatile protein folds performing diverse functions, the same function associated with multiple folds, and one‐to‐one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provides clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly‐pharmacology, and designing enzymes with new functional capabilities.

13.

HotPatch: a statistical approach to finding biologically relevant features on protein surfaces

Pettit FK Bare E Tsai A Bowie JU 《Journal of molecular biology》2007,369(3):863-879

We describe a fully automated algorithm for finding functional sites on protein structures. Our method finds surface patches of unusual physicochemical properties on protein structures, and estimates the patches' probability of overlapping functional sites. Other methods for predicting the locations of specific types of functional sites exist, but in previous analyses, it has been difficult to compare methods when they are applied to different types of sites. Thus, we introduce a new statistical framework that enables rigorous comparisons of the usefulness of different physicochemical properties for predicting virtually any kind of functional site. The program's statistical models were trained for 11 individual properties (electrostatics, concavity, hydrophobicity, etc.) and for 15 neural network combination properties, all optimized and tested on 15 diverse protein functions. To simulate what to expect if the program were run on proteins of unknown function, as might arise from structural genomics, we tested it on 618 proteins of diverse mixed functions. In the higher-scoring top half of all predictions, a functional residue could typically be found within the first 1.7 residues chosen at random. The program may or may not use partial information about the protein's function type as an input, depending on which statistical model the user chooses to employ. If function type is used as an additional constraint, prediction accuracy usually increases, and is particularly good for enzymes, DNA-interacting sites, and oligomeric interfaces. The program can be accessed online (at http://hotpatch.mbi.ucla.edu).

14.

An analysis approach to identify specific functional sites in orthologous proteins using sequence and structural information: Application to neuroserpin reveals regions that differentially regulate inhibitory activity

Tet Woo Lee Annie Shu‐Ping Yang Thomas Brittain Nigel P. Birch 《Proteins》2015,83(1):135-152

The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution‐matrix‐based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface‐exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site‐directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins.

15.

SiteMotif: A graph-based algorithm for deriving structural motifs in Protein Ligand binding sites

Santhosh Sankar Nagasuma Chandra 《PLoS computational biology》2022,18(2)

Studying similarities in protein molecules has become a fundamental activity in much of biology and biomedical research, for which methods such as multiple sequence alignments are widely used. Most methods available for such comparisons cater to studying proteins which have clearly recognizable evolutionary relationships but not to proteins that recognize the same or similar ligands but do not share similarities in their sequence or structural folds. In many cases, proteins in the latter class share structural similarities only in their binding sites. While several algorithms are available for comparing binding sites, there are none for deriving structural motifs of the binding sites, independent of the whole proteins. We report the development of SiteMotif, a new algorithm that compares binding sites from multiple proteins and derives sequence-order independent structural site motifs. We have tested the algorithm at multiple levels of complexity and demonstrate its performance in different scenarios. We have benchmarked against 3 current methods available for binding site comparison and demonstrate superior performance of our algorithm. We show that SiteMotif identifies new structural motifs of spatially conserved residues in proteins, even when there is no sequence or fold-level similarity. We expect SiteMotif to be useful for deriving key mechanistic insights into the mode of ligand interaction, predicting the ligand type that a protein can bind, and improving the sensitivity of functional annotation.

16.

A novel tripartite motif involved in aquaporin topogenesis, monomer folding and tetramerization

Buck TM Wagner J Grund S Skach WR 《Nature structural & molecular biology》2007,14(8):762-769

Aquaporin (AQP) folding in the endoplasmic reticulum is characterized by two distinct pathways of membrane insertion that arise from divergent residues within the second transmembrane segment. We now show that in AQP1 these residues (Asn49 and Lys51) interact with Asp185 at the C terminus of TM5 to form a polar, quaternary structural motif that influences multiple stages of folding. Asn49 and Asp185 form an intramolecular hydrogen bond needed for proper helical packing, monomer formation and function. In contrast, Lys51 interacts with Asp185 on an adjacent monomer to stabilize the AQP1 tetramer. Although these residues are unique to AQP1, they share a highly conserved architecture whose functional properties can be transferred to other family members. These findings suggest a general mechanism by which evolutionary divergence of membrane proteins can confer new functional properties via alternative folding pathways that give rise to a common final structure.

17.

Identification of functional subclasses in the DJ-1 superfamily proteins

Wei Y Ringe D Wilson MA Ondrechen MJ 《PLoS computational biology》2007,3(1):e10

Genomics has posed the challenge of determination of protein function from sequence and/or 3-D structure. Functional assignment from sequence relationships can be misleading, and structural similarity does not necessarily imply functional similarity. Proteins in the DJ-1 family, many of which are of unknown function, are examples of proteins with both sequence and fold similarity that span multiple functional classes. THEMATICS (theoretical microscopic titration curves), an electrostatics-based computational approach to functional site prediction, is used to sort proteins in the DJ-1 family into different functional classes. Active site residues are predicted for the eight distinct DJ-1 proteins with available 3-D structures. Placement of the predicted residues onto a structural alignment for six of these proteins reveals three distinct types of active sites. Each type overlaps only partially with the others, with only one residue in common across all six sets of predicted residues. Human DJ-1 and YajL from Escherichia coli have very similar predicted active sites and belong to the same probable functional group. Protease I, a known cysteine protease from Pyrococcus horikoshii, and PfpI/YhbO from E. coli, a hypothetical protein of unknown function, belong to a separate class. THEMATICS predicts a set of residues that is typical of a cysteine protease for Protease I; the prediction for PfpI/YhbO bears some similarity. YDR533Cp from Saccharomyces cerevisiae, of unknown function, and the known chaperone Hsp31 from E. coli constitute a third group with nearly identical predicted active sites. While the first four proteins have predicted active sites at dimer interfaces, YDR533Cp and Hsp31 both have predicted sites contained within each subunit. Although YDR533Cp and Hsp31 form different dimers with different orientations between the subunits, the predicted active sites are superimposable within the monomer structures. Thus, the three predicted functional classes form four different types of quaternary structures. The computational prediction of the functional sites for protein structures of unknown function provides valuable clues for functional classification.

18.

Text mining improves prediction of protein functional sites

Verspoor KM Cohn JD Ravikumar KE Wall ME 《PloS one》2012,7(2):e32171

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

19.

Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions

Bhardwaj N Lu H 《FEBS letters》2007,581(5):1058-1066

Protein-DNA interactions are crucial to many cellular activities such as expression-control and DNA-repair. These interactions between amino acids and nucleotides are highly specific and any aberrance at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved with average sensitivity, specificity and accuracy pegged at 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%. Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. Results presented here demonstrate that machine-learning can be applied to automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.

20.

From fold predictions to function predictions: automation of functional site conservation analysis for functional genome predictions.

B. Zhang L. Rychlewski K. Pawłowski J. S. Fetrow J. Skolnick A. Godzik 《Protein science : a publication of the Protein Society》1999,8(5):1104-1115
