Similar literature
1.
Catalytic site structure is normally highly conserved between distantly related enzymes. As a consequence, templates representing catalytic sites have the potential to succeed at function prediction in cases where methods based on sequence or overall structure fail. There are many methods for searching protein structures for matches to structural templates, but few validated template libraries to use with these methods. We present a library of structural templates representing catalytic sites, based on information from the scientific literature. Furthermore, we analyse homologous template families to discover the diversity within families and the utility of templates for active site recognition. Templates representing the catalytic sites of homologous proteins mostly differ by less than 1 Å root mean square deviation, even when the sequence similarity between the two proteins is low. Within these sets of homologues there is usually no discernible relationship between catalytic site structure similarity and sequence similarity. Because of this structural conservation of catalytic sites, the templates can discriminate between matches to related proteins and random matches with over 85% sensitivity and predictive accuracy. Templates based on protein backbone positions are more discriminating than those based on side-chain atoms. These analyses show encouraging prospects for prediction of functional sites in structural genomics structures of unknown function, and will be of use in analyses of convergent evolution and in exploring relationships between active site geometry and chemistry. The template library can be queried via a web server and is available for download.
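The 1 Å similarity criterion mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how the deviation is computed, so the code below swaps in a distance RMSD (dRMSD), which compares intra-template inter-atomic distances and therefore needs no superposition step; it is not the superposition-based RMSD the authors most likely used. The function names, the tuple-based coordinate format and the default cutoff argument are assumptions made for illustration only.

```python
import math
from itertools import combinations


def distance_rmsd(site_a, site_b):
    """Distance RMSD between two templates.

    Each template is a list of (x, y, z) tuples in Angstrom, with atoms
    listed in equivalent order. Assumption: atom correspondence is known.
    """
    if len(site_a) != len(site_b):
        raise ValueError("templates must contain the same number of atoms")
    if len(site_a) < 2:
        raise ValueError("need at least two atoms per template")
    squared_diffs = []
    for i, j in combinations(range(len(site_a)), 2):
        d_a = math.dist(site_a[i], site_a[j])  # distance inside template A
        d_b = math.dist(site_b[i], site_b[j])  # same atom pair in template B
        squared_diffs.append((d_a - d_b) ** 2)
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


def is_match(site_a, site_b, cutoff=1.0):
    """Hypothetical match rule: dRMSD below a cutoff (1 A by default)."""
    return distance_rmsd(site_a, site_b) < cutoff
```

Because dRMSD depends only on internal distances, a rigidly translated or rotated copy of a template scores zero. One known limitation is that it cannot tell a site from its mirror image, which a superposition-based RMSD can.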

2.
Summary: One of the key ingredients in drug discovery is the derivation of conceptual templates called pharmacophores. A pharmacophore model characterizes the physicochemical properties common to all active molecules, called ligands, bound to a particular protein receptor, together with their relative spatial arrangement. Motivated by this important application, we develop a Bayesian hierarchical model for the derivation of pharmacophore templates from multiple configurations of point sets, partially labeled by the atom type of each point. The model is implemented through a multistage template hunting algorithm that produces a series of templates that capture the geometrical relationship of atoms matched across multiple configurations. Chemical information is incorporated by distinguishing between atoms of different elements, whereby atoms of different elements are less likely to be matched than atoms of the same element. We illustrate our method through examples of deriving templates from sets of ligands that all bind structurally related protein active sites and show that the model is able to retrieve the key pharmacophore features in two test cases.

3.
Structural templates are 3D signatures representing protein functional sites, such as ligand binding cavities, metal coordination motifs, or catalytic sites. Here we explore methods to generate template libraries and algorithms to query structures for conserved 3D motifs. Applications of templates are discussed, as well as some exemplar cases for examining evolutionary links in enzymes. We also introduce the concept of using more than one template per structure to represent flexible sites, as an approach to better understand catalysis through snapshots captured in enzyme structures. Functional annotation from structure is an important topic that has recently resurfaced due to new, more accurate methods of protein structure prediction. Therefore, we anticipate that template‐based functional site detection will be a powerful tool in the task of characterizing a vast number of new protein models.

4.
Peng J, Xu J. Proteins 2011, 79(6): 1930-1939
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. Therefore, a natural question to ask is whether we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models with better quality than single-template models even if they are built from the best single templates (P-value < 10⁻⁶), while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.

5.
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As protein sequence information accumulates far faster than structural information, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genomics projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using Modeller, we determine how accurately functional surfaces can be reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.

6.
Protein function prediction using local 3D templates   Cited by: 8 (self-citations: 0; citations by others: 8)
The prediction of a protein's function from its 3D structure is becoming more and more important as the worldwide structural genomics initiatives gather pace and continue to solve 3D structures, many of which are of proteins of unknown function. Here, we present a methodology for predicting function from structure that shows great promise. It is based on 3D templates that are defined as specific 3D conformations of small numbers of residues. We use four types of template, covering enzyme active sites, ligand-binding residues, DNA-binding residues and reverse templates. The latter are templates generated from the target structure itself and scanned against a representative subset of all known protein structures. Together, the templates provide a fairly thorough coverage of the known structures and ensure that if there is a match to a known structure it is unlikely to be missed. A new scoring scheme provides a highly sensitive means of discriminating between true positive and false positive template matches. In all, the methodology provides a powerful new tool for function prediction to complement those already in use.

7.
Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single-template models used in Pcons.net. In particular, for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved by a few percent over the highest ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server. AVAILABILITY AND IMPLEMENTATION: PconsM is freely available from http://pcons.net/.

8.
One approach to predicting a protein fold from a sequence (a target) is based on structures of related proteins that are used as templates. We present an algorithm that examines a set of candidates for templates, builds from each of the templates an atomically detailed model, and ranks the models. The algorithm performs a hierarchical selection of the best model using a diverse set of signals. After a quick and suboptimal screening of template candidates from the Protein Data Bank, the current method fine‐tunes the selection to a few models. More detailed signals test the compatibility of the sequence and the proposed structures, and are merged to give a global fitness measure using linear programming. This algorithm is a component of the prediction server LOOPP ( http://www.loopp.org ). Large‐scale training and test sets were designed and are presented. Recent results of the LOOPP server in CASP8 are discussed. Proteins 2009. © 2009 Wiley‐Liss, Inc.

9.
Structure-based protein NMR assignments using native structural ensembles   Cited by: 1 (self-citations: 0; citations by others: 1)
An important step in NMR protein structure determination is the assignment of resonances and NOEs to corresponding nuclei. Structure-based assignment (SBA) uses a model structure ("template") for the target protein to expedite this process. Nuclear vector replacement (NVR) is an SBA framework that combines multiple sources of NMR data (chemical shifts, RDCs, sparse NOEs, amide exchange rates, TOCSY) and has high accuracy when the template is close to the target protein's structure (less than 2 Å backbone RMSD). However, a close template may not always be available. We extend the circle of convergence of NVR for distant templates by using an ensemble of structures. This ensemble corresponds to the low-frequency perturbations of the given template and is obtained using normal mode analysis (NMA). Our algorithm assigns resonances and sparse NOEs using each of the structures in the ensemble separately, and aggregates the results using a voting scheme based on maximum bipartite matching. Experimental results on human ubiquitin, using four distant template structures, show an increase in the assignment accuracy. Our algorithm also improves the robustness of NVR with respect to structural noise. We provide a confidence measure for each assignment using the percentage of the structures that agree on that assignment. We use this measure to assign a subset of the peaks with even higher accuracy. We further validate our algorithm on data for two additional proteins with NVR. We then show the general applicability of our approach by applying our NMA ensemble-based voting scheme to another SBA tool, MARS. For three test proteins with corresponding templates, including the 370-residue maltose binding protein, we increase the number of reliable assignments made by MARS. Finally, we show that our voting scheme is sound and optimal, by proving that it is a maximum likelihood estimator of the correct assignments.
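The ensemble voting idea in this abstract can be sketched roughly as follows. Each ensemble structure contributes one vote per peak-to-residue assignment, the one-to-one assignment with the most total votes is selected, and the fraction of agreeing structures serves as the confidence. This is only an illustration under stated assumptions: the function name and data layout are hypothetical, and the maximum-weight matching is found by brute force over permutations (feasible for toy inputs only), not by whatever matching algorithm the authors used.

```python
from itertools import permutations


def vote_assignments(ensemble_assignments, peaks, residues):
    """Aggregate per-structure assignments into one consensus assignment.

    ensemble_assignments: list of dicts mapping peak -> residue, one dict
    per ensemble structure. Requires len(peaks) <= len(residues).
    Returns (consensus, confidence) as two dicts keyed by peak.
    """
    if not ensemble_assignments:
        raise ValueError("need at least one ensemble member")
    # Count how many structures vote for each (peak, residue) pair.
    votes = {(p, r): 0 for p in peaks for r in residues}
    for assignment in ensemble_assignments:
        for peak, residue in assignment.items():
            if (peak, residue) in votes:
                votes[(peak, residue)] += 1
    # Brute-force maximum-weight one-to-one matching (toy sizes only).
    best, best_score = None, -1
    for perm in permutations(residues, len(peaks)):
        score = sum(votes[(p, r)] for p, r in zip(peaks, perm))
        if score > best_score:
            best, best_score = dict(zip(peaks, perm)), score
    n = len(ensemble_assignments)
    # Confidence: fraction of structures agreeing with the consensus.
    confidence = {p: votes[(p, r)] / n for p, r in best.items()}
    return best, confidence
```

For realistic protein sizes the permutation search would have to be replaced by a polynomial-time assignment solver; the brute-force loop is kept here only so the sketch stays dependency-free.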

10.
Homology modeling predicts protein structures using known structures of related proteins as templates. We developed MULTIDOMAIN ASSEMBLER (MDA) to address the special problems that arise when modeling proteins with large numbers of domains, such as fibronectin with 30 domains, as well as cases with hundreds of templates. These problems include how to spatially arrange nonoverlapping template structures, and how to get the best template coverage when some sequence regions have hundreds of available structures while other regions have a few distant homologs. MDA automates the tasks of template searching, visualization, and selection followed by multidomain model generation, and is part of the widely used molecular graphics package UCSF CHIMERA (University of California, San Francisco). We demonstrate applications and discuss MDA’s benefits and limitations.

11.
We assume that each class of protein has a core structure that is defined by internal residues, and that the external, solvent-contacting residues contribute to the stability of the structure, are of primary importance to function, but do not determine the architecture of the core portions of the polypeptide chain. An algorithm has been developed to supply a list of permitted sequences of internal residues compatible with a known core structure. This list is referred to as the tertiary template for that structure. In general the positions in the template are not sequentially adjacent and are distributed throughout the polypeptide chain. The template is derived using the fixed positions for the main-chain and beta-carbon atoms in the test structure and selected stereochemical rules. The focus of this paper is on the use of two packing criteria: avoidance of steric overlap and complete filling of available space. The program also notes potential polar group interactions and disulfide bonds as well as possible burial of formal charges. Central to the algorithm is the side-chain rotamer library. In an update of earlier studies by others, we show that 17 of the 20 amino acids (omitting Met, Lys and Arg) can be represented adequately by 67 side-chain rotamers. A list of chi angles and their standard deviations is given. The newer, high-resolution, refined structures in the Brookhaven Protein Data Bank show similar mean chi values, but have much smaller deviations than those of earlier studies. This suggests that a rotamer library may be a better structural approximation than was previously thought. In using packing constraints, it has been found essential to include all hydrogen atoms specifically. The "unified atom" representation is not adequate. The permitted rotamer sequences are severely restricted by the main-chain plus beta-carbon atoms of the test structure. Further restriction is introduced if the full set of atoms of the external residues are held fixed, the full-chain model. The space-filling requirement has a major role in restricting the template lists. The preliminary tests reported here make it appear likely that templates prepared from the currently known core structures will be able to discriminate between these structures. The templates should thus be useful in deciding whether a sequence of unknown tertiary structure fits any of the known core classes and, if a fit is found, how the sequence should be aligned in three dimensions to fit the core of that class. (ABSTRACT TRUNCATED AT 400 WORDS)

12.

Background

Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms for selecting and combining multiple templates are available.

Results

Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10⁻⁴). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure.

Conclusion

We have developed a novel multi-template algorithm to improve protein comparative modeling.
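The selection rule described in the Results of this abstract (whole alignments for templates scoring close to the top hit, fragments only where a weaker template covers a sizable uncovered region) might be sketched as below. Everything concrete here is an assumption for illustration: the `Alignment` record, a score where higher means more similar, a single contiguous target range per alignment, and the `score_window` and `min_fragment` defaults. The abstract does not give the actual scoring function or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    template: str
    score: float  # hypothetical significance score; higher = more similar
    start: int    # first aligned target position (inclusive)
    end: int      # last aligned target position (inclusive)


def uncovered_runs(start, end, covered):
    """Return contiguous (first, last) runs in [start, end] not yet covered."""
    runs, run_start = [], None
    for pos in range(start, end + 2):  # one extra step flushes the last run
        free = pos <= end and pos not in covered
        if free and run_start is None:
            run_start = pos
        elif not free and run_start is not None:
            runs.append((run_start, pos - 1))
            run_start = None
    return runs


def combine_templates(alignments, score_window=5.0, min_fragment=10):
    """Pick (template, first, last) pieces following the rule sketched above."""
    ranked = sorted(alignments, key=lambda a: a.score, reverse=True)
    if not ranked:
        return []
    top_score = ranked[0].score
    covered, picks = set(), []
    for aln in ranked:
        if top_score - aln.score <= score_window:
            # Close to the top hit: keep the whole alignment.
            fragments = [(aln.start, aln.end)]
        else:
            # Less similar: keep only sizable, still-uncovered fragments.
            fragments = [
                (s, e)
                for s, e in uncovered_runs(aln.start, aln.end, covered)
                if e - s + 1 >= min_fragment
            ]
        for s, e in fragments:
            picks.append((aln.template, s, e))
            covered.update(range(s, e + 1))
    return picks
```

Sorting by score first means every whole-alignment template is processed before any fragment donor, so fragments are only ever taken from regions the better templates leave open.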

13.
We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures. The largest errors occur in the regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of the mainchain atoms modeled with an rms deviation from the X-ray structure of ≈ 1 Å, in large part because the templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein. © 1995 Wiley-Liss, Inc.

14.
It is well established that sequence templates (e.g., PROSITE) and databases are powerful tools for identifying biological function and tertiary structure for an unknown protein sequence. Here we describe a method for automatically deriving 3D templates from the protein structures deposited in the Brookhaven Protein Data Bank. As an example, we describe a template derived for the Ser-His-Asp catalytic triad found in the serine proteases and triacylglycerol lipases. We find that the resultant template provides a highly selective tool for automatically differentiating between catalytic and noncatalytic Ser-His-Asp associations. When applied to nonproteolytic proteins, the template picks out two "non-esterase" catalytic triads that may be of biological relevance. This suggests that the development of databases of 3D templates, such as those that currently exist for protein sequence templates, will help identify the functions of new protein structures as they are determined and pinpoint their functionally important regions.

15.
Moreno E, León K. Proteins 2002, 47(1): 1-13
We present a new method for representing the binding site of a protein receptor that allows the use of the DOCK approach to screen large ensembles of receptor conformations for ligand binding. The site points are constructed from templates of what we call "attached points" (ATPTS). Each template (one for each type of amino acid) is composed of a set of representative points that are attached to side-chain and backbone atoms through internal coordinates, carry chemical information about their parent atoms, and are intended to cover positions that might be occupied by ligand atoms when complexed to the protein. This method is completely automatic and proved to be extremely fast. With the aim of obtaining an experimental basis for this approach, the Protein Data Bank was searched for proteins in complex with small molecules, to study the geometry of the interactions between the different types of protein residues and the different types of ligand atoms. As a result, well-defined patterns of interaction were obtained for most amino acids. These patterns were then used for constructing a set of templates of attached points, which constitute the core of the ATPTS approach. The quality of the ATPTS representation was demonstrated by using this method, in combination with the DOCK matching and orientation algorithms, to generate correct ligand orientations for >1000 protein-ligand complexes.

16.
Certain protein‐design calculations involve using an experimentally determined high‐resolution structure as a template to identify new sequences that can adopt the same fold. This approach has led to the successful design of many novel, well‐folded, native‐like proteins. Although any atomic‐resolution structure can serve as a template in such calculations, most successful designs have used high‐resolution crystal structures. Because there are many proteins for which crystal structures are not available, it is of interest whether nuclear magnetic resonance (NMR) templates are also appropriate. We have analyzed differences between using X‐ray and NMR templates in side‐chain repacking and design calculations. We assembled a database of 29 proteins for which both a high‐resolution X‐ray structure and an ensemble of NMR structures are available. Using these pairs, we compared the rotamericity, χ1‐angle recovery, and native‐sequence recovery of X‐ray and NMR templates. We carried out design using RosettaDesign on both types of templates, and compared the energies and packing qualities of the resulting structures. Overall, the X‐ray structures were better templates for use with Rosetta. However, for ~20% of proteins, a member of the reported NMR ensemble gave rise to designs with similar properties. Re‐evaluating RosettaDesign structures with other energy functions indicated much smaller differences between the two types of templates. Ultimately, experiments are required to confirm the utility of particular X‐ray and NMR templates. But our data suggest that the lack of a high‐resolution X‐ray structure should not preclude attempts at computational design if an NMR ensemble is available. Proteins 2009. © 2009 Wiley‐Liss, Inc.

17.
MOTIVATION: The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases, and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). RESULTS: The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced-representation search engine, CABS, to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.

18.
Sadowski MI, Jones DT. Proteins 2007, 69(3): 476-485
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exists with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined, and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence-template sequence similarity, an extra 7% of "best" models can be found.

19.
In modern biology, there is a critical need to develop a high-throughput and inexpensive platform for DNA sequencing. Pyrosequencing is a nonelectrophoretic single-tube DNA sequencing method that takes advantage of cooperativity between four enzymes to monitor DNA synthesis. In these studies, single-stranded DNA-binding protein (SSB) was added to the primed DNA template prior to the Pyrosequencing reaction. The addition of SSB to a Pyrosequencing reaction system resulted in a read length of more than 30 nucleotides. Improvements were observed as: (i) increased efficiency of the enzymes, (ii) reduced mispriming, as measured by nonspecific signals, (iii) an increase in signal intensity during the reaction, (iv) higher accuracy in reading the number of identical adjacent nucleotides in difficult templates, and (v) longer reads. The usefulness of these results for future Pyrosequencing applications is discussed.

20.
Small-angle x-ray scattering (SAXS) is able to extract low-resolution protein shape information without requiring a specific crystal formation. However, it has found little use in atomic-level protein structure determination due to the uncertainty of residue-level structural assignment. We developed a new algorithm, SAXSTER, to couple the raw SAXS data with protein-fold-recognition algorithms and thus improve template-based protein-structure predictions. We designed nine different scoring functions for matching template and experimental SAXS profiles. The logarithm of the integrated correlation score showed the best template recognition ability and had the highest correlation with the true template modeling (TM)-score of the target structures. We tested the method in large-scale protein-fold-recognition experiments and achieved significant improvements in prioritizing the best template structures. When SAXSTER was applied to the proteins of asymmetric SAXS profile distributions, the average TM-score of the top-ranking templates increased by 18% after homologous templates were excluded, which corresponds to a p-value < 10⁻⁹ in Student's t-test. These data demonstrate a promising use of SAXS data to facilitate computational protein structure modeling, which is expected to work most efficiently for proteins of irregular global shape and/or multiple-domain protein complexes.
