20 similar documents found.
1.
The Medium-Chain Dehydrogenase/Reductase Engineering Database (MDRED, http://www.mdred.uni-stuttgart.de) has been established to serve as an analysis tool for a systematic investigation of sequence-structure-function relationships. It includes sequence and structure information on 2684 and 42 medium-chain dehydrogenases/reductases (MDRs), respectively. Although MDRs are very diverse in sequence, they have a conserved tertiary structure. MDRs are assigned to 199 homologous families and 29 superfamilies. For each family, annotated multiple sequence alignments are provided, and functionally relevant residues are annotated. Twenty-five superfamilies were classified as zinc-containing MDRs, four as non-zinc-containing MDRs. For the zinc-containing MDRs, three subclasses were identified by systematic analysis of a variable loop region, the quaternary structure determining loop (QSDL): the classes of short, medium, and long QSDL, which include 11, 3, and 5 superfamilies, respectively. The length of the QSDL is predictive of tetramer (short QSDL) and dimer (long QSDL) formation. The class of medium QSDL includes both tetrameric and dimeric MDRs. The shape of the substrate-binding site is highly conserved in all zinc-containing MDRs with the exception of two variable regions, the substrate recognition sites (SRS): two residues located on the QSDL (SRS1) and, for the class of long QSDL, one residue located in the catalytic domain (SRS2). The MDRED is the first online-accessible resource of MDRs that integrates information on sequence, structure, and function. Annotation of functionally relevant residues assists the understanding of sequence-structure-function relationships. Thus, the MDRED serves as a valuable tool to identify potential hotspots for engineering properties such as substrate specificity.
2.
Wu Y Lu M Chen M Li J Ma J 《Protein science : a publication of the Protein Society》2007,16(7):1449-1463
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Calpha positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Calpha trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. From the testing of decoy recognition on a number of commonly used decoy sets, it is shown that the new potential function outperforms all known Calpha-based potentials and most other coarse-grained ones that require more information than Calpha positions. We hope that this potential function adds a new tool for protein structural modeling.
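Decoy recognition tests like those mentioned above are commonly summarized in two ways: by the rank of the native structure among its decoys, and by the Z-score of the native energy relative to the decoy energies. The sketch below is minimal. It assumes that lower energy means more native-like and that energies were already computed by some potential. The function names are hypothetical and are not part of OPUS-Ca.

```python
import statistics

def native_rank(native_energy, decoy_energies):
    """Rank of the native structure (1 = best) when lower energy is better."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)

def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.

    Strongly negative values mean the native lies well below the decoys.
    """
    mu = statistics.mean(decoy_energies)
    sigma = statistics.stdev(decoy_energies)  # sample standard deviation
    return (native_energy - mu) / sigma

# Hypothetical energies for one native structure and four decoys.
decoys = [-95.0, -100.0, -105.0, -110.0]
print(native_rank(-120.0, decoys))  # 1
print(native_z_score(-120.0, decoys))
```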
3.
The DOcking decoy‐based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance‐dependent atom‐pair interactions. To optimize the atom‐pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motifs involving complete secondary structure elements. They constitute near‐native ligand–receptor systems (or just pairs). Thus, a total of 8609 ligand–receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand–receptor systems, 1000 evenly sampled docking decoys with 0–10 Å interface root‐mean‐square‐deviation (iRMSD) were generated with a method previously used for protein–protein docking. A neural network‐based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel‐like energy landscape for the interaction between these hypothetical ligand–receptor systems. Thus, our method hierarchically models the overall funnel‐like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term which describes the local conformational preference, the atom‐pair‐based potential outperforms other reported statistical energy functions in correct ranking of native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take into account side chain orientation‐dependence explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand–receptor systems and their decoys are freely available at http://agknapp.chemie.fu‐berlin.de/doop/ . Proteins 2015; 83:881–890. © 2015 Wiley Periodicals, Inc.
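RMSD values, including the iRMSD used above, are computed after optimal rigid-body superposition. Below is a minimal sketch of the standard Kabsch procedure with NumPy. It takes two (N, 3) coordinate arrays with known atom correspondence. For iRMSD, the arrays would first be restricted to interface atoms; how the interface is defined is an assumption left to the caller.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between coordinate sets P and Q (both N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and compute the root-mean-square deviation.
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Usage: a rotated and translated copy superimposes exactly.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(P, Q))
```

The reflection check matters for chiral structures: a mirror image of a protein is not superimposable, and the RMSD should reflect that.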
4.
One strategy for ab initio protein structure prediction is to generate a large number of possible structures (decoys) and select the most fitting ones based on a scoring or free energy function. The conformational space of a protein is huge, and chances are rare that any heuristically generated structure will directly fall in the neighborhood of the native structure. It is desirable that, instead of being thrown away, the unfitting decoy structures can provide insights into native structures so prediction can be made progressively. First, we demonstrate that a recently parameterized physics-based effective free energy function based on the GROMOS96 force field and a generalized Born/surface area solvent model is, as several other physics-based and knowledge-based models, capable of distinguishing native structures from decoy structures for a number of widely used decoy databases. Second, we observe a substantial increase in correlations of the effective free energies with the degree of similarity between the decoys and the native structure, if the similarity is measured by the content of native inter-residue contacts in a decoy structure rather than its root-mean-square deviation from the native structure. Finally, we investigate the possibility of predicting native contacts based on the frequency of occurrence of contacts in decoy structures. For most proteins contained in the decoy databases, a meaningful amount of native contacts can be predicted based on plain frequencies of occurrence at a relatively high level of accuracy. Relative to using plain frequencies, overwhelming improvements in sensitivity of the predictions are observed for the 4_state_reduced decoy sets by applying energy-dependent weighting of decoy structures in determining the frequency. There, approximately 80% of native contacts can be predicted at an accuracy of approximately 80% using energy-weighted frequencies. The sensitivity of the plain frequency approach is much lower (20% to 40%). Such improvements are, however, not observed for the other decoy databases. The rationalization and implications of the results are discussed.
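The frequency-based contact prediction described above can be sketched in a few lines. This is an illustration only, and several details are assumptions, so the paper's contact definition and weighting scheme may differ:

- decoys are Cα-only;
- a contact is a pair within a hypothetical 8 Å Cα–Cα cutoff, at least 3 residues apart in sequence;
- the energy-weighted variant uses Boltzmann-like weights exp(-E/kT).

```python
import math
from collections import defaultdict

def contacts(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j) whose C-alpha atoms lie within `cutoff` angstroms."""
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs

def contact_frequencies(decoys, energies=None, kT=1.0):
    """Fraction of (optionally energy-weighted) decoys that contain each contact."""
    if energies is None:
        weights = [1.0] * len(decoys)  # plain frequencies
    else:
        e_min = min(energies)  # shift energies for numerical stability
        weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    freq = defaultdict(float)
    for coords, w in zip(decoys, weights):
        for pair in contacts(coords):
            freq[pair] += w / total
    return dict(freq)

def predict_contacts(freq, threshold=0.5):
    """Predict as native every contact whose frequency reaches `threshold`."""
    return {pair for pair, f in freq.items() if f >= threshold}

# Two hypothetical 5-residue decoys: one compact, one fully extended.
compact = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 0, 3.8)]
extended = [(3.8 * k, 0, 0) for k in range(5)]
plain = contact_frequencies([compact, extended])
weighted = contact_frequencies([compact, extended], energies=[-10.0, 0.0])
```

With the energies shown, the low-energy compact decoy dominates the weighted frequencies. This mirrors the way energy weighting sharpened the predictions for some decoy sets above.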
5.
6.
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three‐dimensional context in proteins. We have constructed a knowledge‐based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon‐alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α‐carbon virtual bond opening and dihedral angles, pair‐wise contacts and hydrogen bond donor‐acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α‐carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full‐length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. Proteins 2009. © 2008 Wiley‐Liss, Inc.
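CALF simulates folding with Brownian dynamics. The sketch below is a generic illustration and does not use CALF's force field or integrator settings. It applies an overdamped Langevin (Brownian dynamics) update with the Euler-Maruyama scheme, x(t+Δt) = x(t) + (D/kT)·F(x)·Δt + sqrt(2·D·Δt)·ξ, where ξ is standard Gaussian noise. It runs in one dimension, in reduced units, with a hypothetical harmonic force.

```python
import math
import random

def brownian_dynamics(force, x0, n_steps, dt=1e-3, diffusion=1.0, kT=1.0, seed=0):
    """Overdamped Langevin trajectory integrated with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x = x0
    traj = [x]
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    for _ in range(n_steps):
        drift = diffusion / kT * force(x) * dt  # deterministic step along the force
        x = x + drift + noise_scale * rng.gauss(0.0, 1.0)  # plus thermal noise
        traj.append(x)
    return traj

def harmonic_force(x, k=1.0):
    """Hypothetical restoring force toward x = 0 (spring constant k)."""
    return -k * x

# Start far from the minimum; after relaxation, samples should have variance near kT/k = 1.
traj = brownian_dynamics(harmonic_force, x0=5.0, n_steps=200_000, dt=0.01)
```

A quick check of any Brownian dynamics code is to confirm that a simple potential reproduces its known equilibrium distribution. For this harmonic force the expected distribution is a Gaussian with variance kT/k.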
7.
We investigated the role of W140 in the folding of staphylococcal nuclease. For this purpose, we constructed the 19 possible substitution mutations at residue 140. Only three mutants, W140F, W140H, and W140Y, adopted native-like structures under physiological conditions and showed native-like enzymatic activities. In contrast, the other 16 mutants took on compact unfolded structures under physiological conditions and the enzymatic activities of these mutants were decreased to approximately 70% of wild-type levels. These 16 mutants maintained substrate-induced foldability. These results strongly indicate that the side-chain information encoded by residue 140 is essential to maintain a stable native structure, and that this residue must be an aromatic side chain. The order of thermal stability was wild type > W140H > W140F = W140Y. Therefore, the five-membered nitrogen-containing ring of the indole is thought to bear the essential information. In the crystal structure of staphylococcal nuclease, the five-membered ring is at the local center of the C-terminal cluster through hydrophobic interactions. This cluster plays a key role in the interaction connecting the C-terminal region and the N-terminal beta-core. Mutants other than W140H, W140F, and W140Y lost the ability to form the local core, which caused the loss of the long-range interactions between the C-terminal and N-terminal regions. Inhibitor or substrate binding to these mutants compensates for the lack of long-range interactions generated by W140.
8.
Taylor WR 《Journal of molecular biology》2006,357(2):676-699
A method is described to construct sets of decoy models that can be used to generate a background score distribution for protein structure comparison. The models are derived directly from the two proteins being compared and retain all the essential properties of the structures, including length, density, shape and secondary structure composition but have different folds. As each comparison involves a pair of proteins of the same length, no explicit normalisation is required to adjust for the length of the proteins being compared. This allows substructure (or domain) matches to score almost equally to the comparison of isolated domains. A normalised probability measure was derived that allows joint family/family comparison. The method was applied to some of the CASP6 models for targets with new folds.
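Once a background distribution of scores has been generated from decoy models, the significance of an observed comparison score can be estimated empirically. The sketch below is minimal. It assumes that higher scores mean greater structural similarity and uses the common (k + 1)/(n + 1) pseudocount estimate. This is a generic estimate, not the normalised probability measure derived in the paper, and the function name is hypothetical.

```python
def empirical_p_value(observed, decoy_scores):
    """Probability of a decoy scoring at least as high as `observed`.

    Uses a pseudocount so the estimate is never exactly zero.
    """
    k = sum(1 for s in decoy_scores if s >= observed)
    return (k + 1) / (len(decoy_scores) + 1)

# Hypothetical background scores from decoys built for one protein pair.
background = [1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(10.0, background))  # 0.2
```

Because each decoy set is built for a specific protein pair, the background already matches that pair's length. This is why the abstract notes that no separate length normalisation is needed.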
9.
In this work we examine how protein structural changes are coupled with sequence variation in the course of evolution of a family of homologs. The sequence-structure correlation analysis performed on 81 homologous protein families shows that the majority of them exhibit statistically significant linear correlation between the measures of sequence and structural similarity. We observed, however, that there are cases where structural variability cannot be mainly explained by sequence variation, such as protein families with a number of disulfide bonds. To understand whether structures from different families and/or folds evolve in the same manner, we compared the degrees of structural change per unit of sequence change ("the evolutionary plasticity of structure") between those families with a significant linear correlation. Using rigorous statistical procedures we find that, with a few exceptions, evolutionary plasticity does not show a statistically significant difference between protein families. Similar sequence-structure analysis performed for protein loop regions shows that evolutionary plasticity of loop regions is greater than for the protein core.
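The "evolutionary plasticity" above is the slope of structural change against sequence change within a family. The sketch below shows that calculation and is minimal. It assumes that pairwise sequence distances (for example, 1 minus fractional identity) and structural distances (for example, RMSD) have already been computed for the family members. The choice of distance measures is an assumption, and the study's own statistical tests are not reproduced.

```python
import math

def plasticity(seq_dist, struct_dist):
    """Least-squares slope and Pearson r of structural vs. sequence distance."""
    n = len(seq_dist)
    mx = sum(seq_dist) / n
    my = sum(struct_dist) / n
    sxx = sum((x - mx) ** 2 for x in seq_dist)
    syy = sum((y - my) ** 2 for y in struct_dist)
    sxy = sum((x - mx) * (y - my) for x, y in zip(seq_dist, struct_dist))
    slope = sxy / sxx  # structural change per unit of sequence change
    r = sxy / math.sqrt(sxx * syy)  # strength of the linear relationship
    return slope, r

# Hypothetical pairwise distances for one family.
slope, r = plasticity([0.1, 0.2, 0.3, 0.4], [0.5, 1.0, 1.5, 2.0])
```

Comparing families then amounts to comparing their slopes. Per the abstract, this comparison is only meaningful for families that show a significant linear correlation in the first place.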
10.
Gilberto Sánchez‐González Jae‐Kwan Kim Deok‐Soo Kim Ramón Garduño‐Juárez 《Proteins》2013,81(8):1420-1433
We present a new four‐body knowledge‐based potential for recognizing the native state of proteins from their misfolded states. This potential was extracted from a large set of protein structures determined by X‐ray crystallography using BetaMol, a software based on the recent theory of the beta‐complex (β‐complex) and quasi‐triangulation of the Voronoi diagram of spheres. This geometric construct reflects the size difference among atoms in their full Euclidean metric, a property not accounted for in a typical 3D Delaunay triangulation. The ability of this potential to identify the native conformation over a large set of decoys was evaluated. Experiments show that this potential outperforms a potential constructed with a classical Delaunay triangulation in decoy discrimination tests. The addition of a statistical hydrogen bond potential to our four‐body potential significantly improves decoy discrimination, so that the native structure is successfully predicted in 90% of cases. Proteins 2013; 81:1420–1433. © 2013 Wiley Periodicals, Inc.
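Four-body potentials of this kind score each tetrahedron of a tessellation by the residue types at its four vertices. The sketch below is minimal and follows the classical log-odds form used with ordinary Delaunay tessellations, the baseline this paper compares against:

- each quadruplet score is q = log(f_obs / p_exp);
- p_exp = C · a_i · a_j · a_k · a_l, where the a values are amino-acid composition fractions;
- C = 4!/∏ t_m! counts the orderings of the residue-type multiset.

The sketch assumes the tetrahedra are already available as quadruplets of residue indices, for example from `scipy.spatial.Delaunay(coords).simplices` on Cα coordinates. It does not implement the β-complex used by BetaMol, and the function names are hypothetical.

```python
import math
from collections import Counter

def quad_key(types):
    """Order-independent key for the residue types at a tetrahedron's vertices."""
    return tuple(sorted(types))

def four_body_potential(structures):
    """Log-odds four-body scores from (sequence, tetrahedra) training pairs.

    `sequence` is a string of one-letter residue codes; `tetrahedra` is a list
    of 4-tuples of residue indices. Positive scores mean over-represented.
    """
    quad_counts = Counter()
    residue_counts = Counter()
    for sequence, tetrahedra in structures:
        residue_counts.update(sequence)
        for tet in tetrahedra:
            quad_counts[quad_key(sequence[i] for i in tet)] += 1
    n_quads = sum(quad_counts.values())
    n_res = sum(residue_counts.values())
    comp = {aa: c / n_res for aa, c in residue_counts.items()}
    scores = {}
    for key, count in quad_counts.items():
        # Number of distinct orderings of this residue-type multiset.
        multiplicity = math.factorial(4)
        for c in Counter(key).values():
            multiplicity //= math.factorial(c)
        expected = multiplicity * math.prod(comp[aa] for aa in key)
        scores[key] = math.log((count / n_quads) / expected)
    return scores

def score_structure(sequence, tetrahedra, scores):
    """Total four-body score of a model; unseen quadruplets contribute zero here."""
    return sum(scores.get(quad_key(sequence[i] for i in tet), 0.0) for tet in tetrahedra)

# Toy training set: one 4-residue "structure" forming a single tetrahedron.
scores = four_body_potential([("AABB", [(0, 1, 2, 3)])])
```

In decoy discrimination, the native is expected to have the highest total score under this sign convention.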
11.
Coarse‐grained models for protein structure are increasingly used in simulations and structural bioinformatics. In this study, we evaluated the effectiveness of three granularities of protein representation based on their ability to discriminate between correctly folded native structures and incorrectly folded decoy structures. The three levels of representation used one bead per amino acid (coarse), two beads per amino acid (medium), and all atoms (fine). Multiple structure features were compared at each representation level including two‐body interactions, three‐body interactions, solvent exposure, contact numbers, and angle bending. In most cases, the all‐atom level was most successful at discriminating decoys, but the two‐bead level provided a good compromise between the number of model parameters which must be estimated and the accuracy achieved. The most effective feature type appeared to be two‐body interactions. Considering three‐body interactions increased accuracy only marginally when all atoms were used and not at all in medium and coarse representations. Though two‐body interactions were most effective for the coarse representations, the accuracy loss for using only solvent exposure or contact number was proportionally less at these levels than in the all‐atom representation. We propose an optimization method capable of selecting bead types of different granularities to create a mixed representation of the protein. We illustrate its behavior on decoy discrimination and discuss implications for data‐driven protein model selection. Proteins 2013. © 2012 Wiley Periodicals, Inc.
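The coarse and medium granularities compared above can be illustrated by mapping all-atom coordinates onto beads. The sketch below is minimal, and bead placement in the study itself may differ. Its assumptions are:

- one bead per residue at the Cα atom (coarse);
- two beads per residue, at the Cα atom and at the centroid of the side-chain heavy atoms (medium);
- glycine's side-chain bead placed on Cα.

Residues are given as dicts from PDB atom names to (x, y, z) tuples, with hydrogens already removed.

```python
BACKBONE = {"N", "CA", "C", "O", "OXT"}

def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def one_bead(residues):
    """Coarse: one bead per residue, at the C-alpha atom."""
    return [res["CA"] for res in residues]

def two_beads(residues):
    """Medium: a backbone bead (C-alpha) and a side-chain centroid bead per residue."""
    beads = []
    for res in residues:
        side = [xyz for name, xyz in res.items() if name not in BACKBONE]
        beads.append(res["CA"])
        # Glycine has no side-chain heavy atoms; fall back to C-alpha (assumption).
        beads.append(centroid(side) if side else res["CA"])
    return beads

# Hypothetical alanine and glycine residues (heavy atoms only).
ala = {"N": (0, 0, 0), "CA": (1, 0, 0), "C": (2, 0, 0), "O": (3, 0, 0), "CB": (1, 1, 0)}
gly = {"N": (4, 0, 0), "CA": (5, 0, 0), "C": (6, 0, 0), "O": (7, 0, 0)}
```

The feature types in the abstract, such as two-body contacts and contact numbers, would then be computed over whichever bead set is chosen.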
12.
We developed a new representation of local amino acid environments in protein structures called the Side‐chain Depth Environment (SDE). An SDE defines a local structural environment of a residue considering the coordinates and the depth of amino acids that are located in the vicinity of the side‐chain centroid of the residue. SDEs are general enough that similar SDEs are found in protein structures with globally different folds. Using SDEs, we developed a procedure called PRESCO (Protein Residue Environment SCOre) for selecting native or near‐native models from a pool of computational models. The procedure searches a set of representative native protein structures for residue environments similar to those observed in a query model, to quantify how native‐like the SDEs in the model are. When benchmarked on commonly used computational model datasets, PRESCO compared favorably with other existing scoring functions in selecting native and near‐native models. Proteins 2014; 82:3255–3272. © 2014 Wiley Periodicals, Inc.
13.
In this report, we demonstrate that phylogenetic motifs, sequence regions conserving the overall familial phylogeny, represent a promising approach to protein functional site prediction. Across our structurally and functionally heterogeneous data set, phylogenetic motifs consistently correspond to functional sites defined by both surface loops and active site clefts. Additionally, the partially buried prosthetic group regions of cytochrome P450 and succinate dehydrogenase are identified as phylogenetic motifs. In nearly all instances, phylogenetic motifs are structurally clustered, despite little overall sequence proximity, around key functional site features. Based on calculated false-positive expectations and standard motif identification methods, we show that phylogenetic motifs are generally conserved in sequence. This result implies that they can be considered motifs in the traditional sense as well. However, there are instances where phylogenetic motifs are not (overall) well conserved in sequence. This point is enticing, because it implies that phylogenetic motifs are able to identify key sequence regions that traditional motif-based approaches would not. Further, phylogenetic motif results are also shown to be consistent with evolutionary trace results, and bootstrapping is used to demonstrate tree significance.
14.
Fabian Dey Qiangfeng Cliff Zhang Donald Petrey Barry Honig 《Protein science : a publication of the Protein Society》2013,22(4):359-366
We outline a set of strategies to infer protein function from structure. The overall approach depends on extensive use of homology modeling, the exploitation of a wide range of global and local geometric relationships between protein structures and the use of machine learning techniques. The combination of modeling with broad searches of protein structure space defines a “structural BLAST” approach to infer function with high genomic coverage. Applications are described to the prediction of protein–protein and protein–ligand interactions. In the context of protein–protein interactions, our structure‐based prediction algorithm, PrePPI, has comparable accuracy to high‐throughput experiments. An essential feature of PrePPI involves the use of Bayesian methods to combine structure‐derived information with non‐structural evidence (e.g. co‐expression) to assign a likelihood for each predicted interaction. This, combined with a structural BLAST approach, significantly expands the range of applications of protein structure in the annotation of protein function, including systems level biological applications where it has previously played little role.
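A common way to combine independent lines of evidence in a Bayesian framework is a naive Bayes classifier. Each evidence source contributes a likelihood ratio, and the ratios multiply. The sketch below is minimal and assumes the evidence sources are conditionally independent. The likelihood ratios and prior in the usage line are hypothetical numbers, not values from PrePPI.

```python
import math

def combined_likelihood_ratio(ratios):
    """Naive Bayes: likelihood ratios from independent evidence sources multiply."""
    return math.prod(ratios)

def posterior_probability(prior, ratios):
    """Posterior probability of an interaction given a prior and likelihood ratios."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * combined_likelihood_ratio(ratios)
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical: structural evidence LR = 50, co-expression LR = 4, prior 1/600.
p = posterior_probability(1 / 600, [50.0, 4.0])
```

Working in odds keeps the update a simple product. The independence assumption is the main caveat: correlated evidence sources would be over-counted.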
15.
The conformations of loops are determined by the water-mediated interactions between amino acid residues. Energy functions that describe the interactions can be derived either from physical principles (physical-based energy function) or statistical analysis of known protein structures (knowledge-based statistical potentials). It is commonly believed that statistical potentials are appropriate for coarse-grained representation of proteins but are not as accurate as physical-based potentials when atomic resolution is required. Several recent applications of physical-based energy functions to loop selections appear to support this view. In this article, we apply a recently developed DFIRE-based statistical potential to three different loop decoy sets (RAPPER, Jacobson, and Forrest-Woolf sets). Together with a rotamer library for side-chain optimization, the performance of the DFIRE-based potential in the RAPPER decoy set (385 loop targets) is comparable to that of AMBER/GBSA for short loops (two to eight residues). The DFIRE is more accurate for longer loops (9 to 12 residues). A similar trend is observed when comparing DFIRE with another physical-based OPLS/SGB-NP energy function in the large Jacobson decoy set (788 loop targets). In the Forrest-Woolf decoy set for the loops of membrane proteins, the DFIRE potential performs substantially better than the combination of the CHARMM force field with several solvation models. The results suggest that a single-term DFIRE-statistical energy function can provide an accurate loop prediction at a fraction of the computing cost required for more complicated physical-based energy functions. A Web server for academic users is established for loop selection at the softwares/services section of the Web site http://theory.med.buffalo.edu/.
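Distance-dependent statistical potentials such as DFIRE convert observed atom-pair distance counts into energies by comparing them with a reference state. The sketch below is minimal and follows the DFIRE-style form. It assumes the commonly cited distance-scaled finite ideal-gas reference state, with exponent α = 1.61 and reference distance 14.5 Å. The energy scale, bin widths and counts are illustrative. The real potential is tabulated over residue-specific atom types, and loop selection would sum such terms over all atom pairs.

```python
import math

# Reference-state constants as commonly cited for DFIRE (assumed here).
ALPHA = 1.61  # exponent of the distance-scaled finite ideal-gas reference
R_CUT = 14.5  # reference (cutoff) distance in angstroms

def dfire_like_energy(n_obs_r, r, dr, n_obs_cut, dr_cut, scale=1.0):
    """Statistical energy of one atom-pair type in the distance bin at r.

    n_obs_r   -- observed count in the bin at distance r (bin width dr)
    n_obs_cut -- observed count in the reference bin at R_CUT (bin width dr_cut)
    scale     -- overall energy scale (arbitrary units here)
    Counts must be positive; a real implementation needs a rule for empty bins.
    """
    # Count expected if pairs were distributed as in the reference state.
    expected = (r / R_CUT) ** ALPHA * (dr / dr_cut) * n_obs_cut
    return -scale * math.log(n_obs_r / expected)

def select_loop(candidates, energy_fn):
    """Pick the candidate loop conformation with the lowest total energy."""
    return min(candidates, key=energy_fn)
```

Pairs seen more often than the reference predicts get negative (favorable) energies. Loop selection then simply keeps the lowest-energy candidate.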
16.
17.
Antigenic peptides bind to major histocompatibility complex (MHC) molecules as a prerequisite for their presentation to T cells. In this study, we investigate possible structural preferences of MHC-binding peptides by examining the conformation space defined by the structures of these peptides within their native source proteins. Comparison of the conformation space of the native structures of MHC-binding nonamers and a corresponding conformation space defined by a random set of nonamers showed no significant difference. This suggests that the environment of the MHC binding groove has evolved to bind peptides with essentially any "structural background." A slight tendency for an extended beta-conformation at positions 8 and 9 was observed for the set of native structures. We suggest that such a preference may facilitate the binding of the C-terminal anchor position of processed peptides into the corresponding specificity pocket. MHC-binding peptides represent examples of short subsequences that are present in two different structural environments: within their native protein and within the MHC binding groove. Comparison of the native and of the bound structure of the peptides showed that peptides up to 14 residues long may adopt different conformations within different protein environments. This has direct implications for structure prediction algorithms.
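Comparing peptide conformations, as above, usually starts from backbone dihedral angles. The sketch below is minimal. It computes a dihedral angle from four points: C(i−1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. It then flags a residue as extended and β-like using a deliberately crude, hypothetical region (φ < −90°, ψ > 90°). Published studies use more careful region definitions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) for points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def is_extended(phi, psi):
    """Crude, hypothetical test for the extended beta region."""
    return phi < -90.0 and psi > 90.0
```

The sign convention follows the usual IUPAC definition: the angle is positive when the front bond must turn clockwise, viewed along the central bond, to eclipse the back bond.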
18.
Computational prediction of protein structures is a difficult task, which involves fast and accurate evaluation of candidate model structures. We propose to enhance single‐model quality assessment with a functionality evaluation phase for proteins whose quantitative functional characteristics are known. In particular, this idea can be applied to evaluation of structural models of ion channels, whose main function, conducting ions, can be quantitatively measured with the patch‐clamp technique, which provides the current–voltage characteristics. The study was performed on a set of KcsA channel models obtained from complete and incomplete contact maps. A fast continuous electrodiffusion model was used for calculating the current–voltage characteristics of structural models. We found that the computed charge selectivity and total current were sensitive to structural and electrostatic quality of models. In practical terms, we show that evaluating predicted conductance values is an appropriate method to eliminate models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on the basis of their predicted charge selectivity results in a substantial enrichment of the candidate set in highly accurate models. Tests on three other ion channels indicate that, in addition to being a proof of the concept, our function‐oriented single‐model quality assessment method can be directly applied to evaluation of structural models of some classes of protein channels. Finally, our work raises the important question of whether a computational validation of functionality should be included in the evaluation process of structural models, whenever possible. Proteins 2016; 84:217–231. © 2015 Wiley Periodicals, Inc.
19.
Taylor WR Jones DT Sadowski MI 《Protein science : a publication of the Protein Society》2012,21(2):299-305
Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length, which is larger than any structure previously predicted using sequence data alone.
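The first step of the approach above groups predicted residue contacts by the secondary structure elements (SSEs) they connect. The sketch below illustrates that step under assumptions. SSEs are given as hypothetical (label, start, end) residue ranges, and contacts as (i, j, weight) triples. Contacts involving residues outside any SSE are ignored. The combinatorial enumeration and reweighting of packing arrangements are not shown.

```python
from collections import defaultdict

def sse_of(residue, sses):
    """Label of the SSE containing `residue`, or None if it lies in a loop."""
    for label, start, end in sses:
        if start <= residue <= end:
            return label
    return None

def group_contacts(contacts, sses):
    """Sum predicted contact weights for each unordered pair of distinct SSEs."""
    totals = defaultdict(float)
    for i, j, weight in contacts:
        a, b = sse_of(i, sses), sse_of(j, sses)
        if a is None or b is None or a == b:
            continue  # skip loop residues and intra-element contacts
        totals[tuple(sorted((a, b)))] += weight
    return dict(totals)

# Hypothetical SSE ranges and predicted contacts (i, j, confidence).
sses = [("H1", 1, 10), ("E1", 15, 20), ("H2", 25, 35)]
predicted = [(3, 17, 0.9), (5, 18, 0.6), (12, 30, 0.8), (2, 8, 0.5), (16, 28, 0.4)]
grouped = group_contacts(predicted, sses)
```

Summing weights per element pair turns many sparse, noisy residue contacts into fewer, more robust signals. Each candidate packing arrangement can then be checked against these signals.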
20.
Tobias Olenyi Céline Marquet Michael Heinzinger Benjamin Kröger Tiha Nikolova Michael Bernhofer Philip Sändig Konstantin Schütze Maria Littmann Milot Mirdita Martin Steinegger Christian Dallago Burkhard Rost 《Protein science : a publication of the Protein Society》2023,32(1):e4524
The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location), and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha-helical and beta-barrel transmembrane segments; signal-peptides; variant effect) in seconds. The structure prediction provided by LambdaPP, leveraging ColabFold and computed in minutes, is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use at embed.predictprotein.org, and the interactive results for the case study can be found at https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org), and can be freely used and distributed under the academic free use license (AFL-2). For high-throughput applications, all methods can be executed locally via the bio-embeddings (bioembeddings.com) Python package, or the Docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.