Similar literature
 Found 20 similar articles; search time 31 ms
1.
The accuracy of comparative models of proteins is addressed here. A set of 12732 single-template models of sequences of known high-resolution structures was built by an automated procedure. Accuracy of several structure-derived properties, such as surface area, residue accessibility, presence of pockets, electrostatic potential and others, was determined as a function of template:target sequence identity by comparing models with their corresponding experimental structures. As expected, the average accuracy of structure-derived properties always increases with higher template:target sequence identity, but the exact shape of this relationship can differ from one property to another. A comparison of structure-derived properties measured from NMR and X-ray structures of the same protein shows that for most properties, the NMR/X-ray difference is of the same order as the error in models based on ~40% template:target sequence identity. The exact sequence identity at which properties reach that accuracy varies between 25 and 50%, depending on the property being analyzed. A general characteristic of simple comparative models is that their surface has increased area as a consequence of being more rugged than that of experimental structures. This suggests that including solvent effects during model building or refinement could significantly improve the accuracy of surface properties in comparative models.
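The template:target sequence identity used as the independent variable above can be illustrated with a minimal sketch. The abstract does not say which definition of identity was used, so the choice here (identical residues divided by the number of aligned, non-gap columns) is an assumption, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment.

    Assumption: the denominator is the number of columns in which neither
    sequence has a gap. Other conventions (e.g. dividing by the shorter
    sequence length) give different numbers.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Hypothetical alignment: 6 aligned columns, 5 of them identical.
identity = percent_identity("MKT-AYIA", "MKSGAY-A")
print(round(identity, 1))
```

Under this convention the example yields 5 identical residues out of 6 aligned columns; a different denominator would shift a model between identity bins, which matters when accuracy is reported per bin.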

2.
Sequences of the ubiquitin-conjugating enzyme (UBC or E2) family were used as a test set to investigate issues associated with the high-throughput comparative modelling of protein structures. A semi-automatic method was initially developed with particular emphasis on producing models of a quality suitable for structural comparison. Structural and sequence features of the E2 family were used to improve the sequence alignment and the quality of the structural templates. Initially, failure to correct for subtle structural inconsistencies between templates led to problems in the comparative analysis of the UBC electrostatic potentials. Modelling of known UBC structures using Modeller 4.0 showed that multiple templates produced, on average, no better models than the use of just one template, as judged by the root-mean-squared deviation between the comparative model and crystal structure backbones. Using four different quality-checking methods, for a given target sequence, it was not possible to distinguish the model most similar to the experimental structure. The UBC models were thus finally modelled using only the crystal structure template with the highest sequence identity to the target to be modelled, and producing only one model solution. Quality checking was used to reject models with obvious structural anomalies (e.g., bad side-chain packing). The resulting models have been used for a comparison of UBC structural features and of their electrostatic potentials. The work was extended through the development of a fully automated pipeline that identifies E2 sequences in the sequence databases, aligns and models them, and calculates the associated electrostatic potential.

3.
4.
Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single-template models used in Pcons.net. In particular, for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved by a few percent over the highest-ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server. AVAILABILITY AND IMPLEMENTATION: PconsM is freely available from http://pcons.net/.

5.
Zhu M, Li M. Molecular BioSystems, 2012, 8(6): 1686-1693
G-protein coupled receptors (GPCRs) constitute the largest family of membrane proteins. Because far fewer crystal structures than amino acid sequences are available, homology modeling offers a reasonable and feasible route to theoretical GPCR coordinates. With several new crystal structures resolved recently, we examined how to use them as templates for homology modeling in four respects: (1) various sequence alignment methods; (2) protein weight matrix; (3) different sets of multiple templates; (4) active and inactive states of templates. The accuracy of the models was evaluated by comparing the similarity of conformation and the molecular docking results between the models and the experimental structure of the Meleagris gallopavo β1-adrenergic receptor (Mg_Adrb1), which we used as an example. Our results suggest that: (1) Cobalt and MAFFT, two sequence alignment algorithms, are suitable for single- and multiple-template modeling, respectively; (2) Blosum30 is applicable for aligning sequences with low sequence identity; (3) multiple-template modeling is not always better than single-template modeling; (4) the state of the template is also an influential factor in modeling GPCR structures.

6.
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.

7.
Peng J, Xu J. Proteins, 2011, 79(6): 1930-1939
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. Therefore, a natural question to ask is whether we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models with better quality than single-template models even if they are built from the best single templates (P-value < 10^-6), while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.

8.
We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures. The largest errors occur in the regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of the mainchain atoms modeled with an rms deviation from the X-ray structure of ≈ 1 Å, in large part because the templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein. © 1995 Wiley-Liss, Inc.
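The rms deviation quoted above can be made concrete with a minimal sketch. It assumes the model and the X-ray structure have already been superposed and that atoms are paired by index; the superposition step itself (for example a least-squares fit) is not shown, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both structures are already superposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical main-chain atoms: every atom is displaced by exactly 1 Å along y.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
xray = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(model, xray))
```

Because the squared deviations are averaged before the square root is taken, a few badly placed loop atoms can dominate the value, which is one reason such figures are often reported for a well-modeled subset of main-chain atoms (about 90% in the abstract above).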

9.
The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties – secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five‐fold cross‐validation on a large set of proteins allowing less than 25% sequence identity between training and test set and query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state‐of‐the‐art secondary structure predictors (Jpred 3 and PSIPRED) and two other systems based on PSI‐BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. Significant improvement over the extraction of structural information directly from PDB templates suggests that the combination of sequence and template information is more informative than templates alone. Proteins 2009. © 2009 Wiley‐Liss, Inc.

10.

Background  

Although multiple templates are frequently used in comparative modeling, the effect of inclusion of additional template(s) on model accuracy (when compared to that of corresponding single-template based models) is not clear. To address this, we systematically analyze two-template models, the simplest case of multiple-template modeling. For an existing target-template pair (single-template modeling), a two-template based model of the target sequence is constructed by including an additional template without changing the original alignment to measure the effect of the second template on model accuracy.

11.
MOTIVATION: There are two main areas of difficulty in homology modelling that are particularly important when sequence identity between target and template falls below 50%: sequence alignment and loop building. These problems become magnified with automatic modelling processes, as there is no human input to correct mistakes. As such we have benchmarked several stand-alone strategies that could be implemented in a workflow for automated high-throughput homology modelling. These include three new sequence-structure alignment programs: 3D-Coffee, Staccato and SAlign, plus five homology modelling programs and their respective loop building methods: Builder, Nest, Modeller, SegMod/ENCAD and Swiss-Model. The SABmark database provided 123 targets with at least five templates from the same SCOP family and sequence identities

12.

Background  

For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significantly depending on our choice of various alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment.
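The sensitivity to gap opening and gap extension penalties described above can be sketched by scoring two alternative alignments of the same pair of sequences under different parameters. This is an illustrative scoring function, not the method of the article; the match and mismatch values, the affine convention (the first position of a gap costs `gap_open`, each further position costs `gap_extend`) and the toy sequences are all assumptions.

```python
def affine_alignment_score(a, b, match=2, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score a given pairwise alignment under affine gap penalties.

    Assumed convention: the first position of a gap costs gap_open and
    every following position of the same gap costs gap_extend.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_in_a = gap_in_b = False  # whether the previous column was a gap in a / in b
    for x, y in zip(a, b):
        if x == "-":
            score += gap_extend if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score += gap_extend if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch
            gap_in_a = gap_in_b = False
    return score


# Two candidate alignments of the hypothetical sequences AGCT and ACGT.
ungapped = ("AGCT", "ACGT")
gapped = ("AGC-T", "A-CGT")

for gap_open in (-5, -1):
    s_ungapped = affine_alignment_score(*ungapped, gap_open=gap_open)
    s_gapped = affine_alignment_score(*gapped, gap_open=gap_open)
    best = "ungapped" if s_ungapped >= s_gapped else "gapped"
    print(gap_open, s_ungapped, s_gapped, best)
```

With the harsher opening penalty the ungapped alignment scores higher, and with the milder one the gapped alignment wins, so the alignment chosen as optimal changes with the parameters even though the sequences do not.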

13.
Structural genomics is the idea of covering protein space so that every protein sequence comes within model building distance of a protein of known structure. Unfortunately, reproducing the structural alignment of distantly related proteins is a difficult challenge to existing sequence alignment and motif search software. We have developed a new transitive alignment algorithm (MaxFlow), which generates accurate alignments between proteins deep in the twilight zone of sequence similarity, below 20% sequence identity. In particular, MaxFlow reliably identifies conserved core motifs between proteins which are only indirect PSI-Blast neighbours. Based on MaxFlow alignments, useful 3D models can be generated for all members of a superfamily from as few as a single structural template – despite hundreds of representatives at 40% sequence identity level and patchy detection of homology by PSI-Blast. We propose novel strategies for target prioritization using MaxFlow scores to predict the optimal templates in a superfamily. Our results support an increase in the granularity of covering protein space that has potentially enormous economic implications for planning the transition to the full production phase of structural genomics.

14.
So far, 13 groups of mammalian Toll-like receptors (TLRs) have been identified. Most TLRs have been shown to recognize pathogen-associated molecular patterns from a wide range of invading agents and initiate both innate and adaptive immune responses. The TLR ectodomains are composed of varying numbers and types of leucine-rich repeats (LRRs). As the crystal structures are currently missing for most TLR ligand-binding ectodomains, homology modeling enables first predictions of their three-dimensional structures on the basis of the determined crystal structures of TLR ectodomains. However, the quality of the predicted models that are generated from full-length templates can be limited due to low sequence identity between the target and templates. To obtain better templates for modeling, we have developed an LRR template assembly approach. Individual LRR templates that are locally optimal for the target sequence are assembled into multiple templates. This method was validated through the comparison of a predicted model with the crystal structure of mouse TLR3. With this method, we also constructed ectodomain models of human TLR5, TLR6, TLR7, TLR8, TLR9, and TLR10 and mouse TLR11, TLR12, and TLR13 that can be used as first passes for a computational simulation of ligand docking or to design mutation experiments. This template assembly approach can be extended to other repetitive proteins.

15.
The structural biology of proteins mediating iron-sulfur (Fe-S) cluster assembly is central for understanding several important biological processes. Here we present the NMR structure of the 16-kDa protein YgdK from Escherichia coli, which shares 35% sequence identity with the E. coli protein SufE. The SufE X-ray crystal structure was solved in parallel with the YgdK NMR structure in the Northeast Structural Genomics (NESG) consortium. Both proteins (1) are key components of Fe-S metabolism, (2) exhibit the same distinct fold, and (3) belong to a family of at least 70 prokaryotic and eukaryotic sequence homologs. Accurate homology models were calculated for the YgdK/SufE family based on the YgdK NMR and SufE crystal structures. Both structural templates contributed equally, exemplifying synergy of NMR and X-ray crystallography. SufE acts as an enhancer of the cysteine desulfurase activity of SufS by SufE-SufS complex formation. A homology model of CsdA, a desulfurase encoded in the same operon as YgdK, was built using the X-ray structure of SufS as a template. Protein surface and electrostatic complementarities strongly suggest that YgdK and CsdA likewise form a functional two-component desulfurase complex. Moreover, structural features of YgdK and SufS, which can be linked to their interaction with desulfurases, are conserved in all homology models. It thus appears very likely that all members of the YgdK/SufE family act as enhancers of SufS-like desulfurases. The present study exemplifies that "refined" selection of two (or more) targets enables high-quality homology modeling of large protein families.

16.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

17.
Protein comparative modeling has useful applications in large-scale structural initiatives and in rational design of drug targets in medicinal chemistry. The reliability of a homology model is dependent on the sequence identity between the query and the structural homologue used as a template for modeling. Here, we present a method for the utilization and conservation of important structural features of template structures by providing additional spatial restraints in comparative modeling programs like MODELLER. We show that the root mean square deviation at Cα positions between the model and the corresponding experimental structure, and the quality of the models, can be significantly improved for distantly related systems by utilizing additional spatial restraints of the template structures. We demonstrate the influence of such approaches to homology modeling of distantly related proteins on understanding functional properties of proteins, such as ligand binding, using cytochrome P450 as an example.

18.
Opioid receptors are the principal targets for opioids, which have been used as analgesics for centuries. Opioid receptors belong to the rhodopsin family of G-protein coupled receptors (GPCRs). In the absence of crystal structures of opioid receptors, 3D homology models have been reported with bovine rhodopsin as a template, though the sequence homology is low. Recently, it has been reported that use of multiple templates results in a better model for a target having low sequence identity with a single template. With the objective of carrying out a comparative study on the structural quality of the 3D models based on single and multiple templates, the homology models for opioid receptors (mu, delta and kappa) were generated using bovine rhodopsin as single template and the recently deposited crystal structures of squid rhodopsin, turkey β-1 and human β-2 adrenoreceptors along with bovine rhodopsin as multiple templates. In this paper we report the results of comparison between the refined 3D models based on multiple sequence alignment (MSA) and models built with bovine rhodopsin as template, using validation programs PROCHECK, PROSA, Verify 3D, Molprobity and docking studies. The results indicate that homology models of mu and kappa with multiple templates are better than those built with only bovine rhodopsin as template, whereas, in many aspects, the homology model of delta opioid receptor with single template is better than the model based on multiple templates. Three nonselective ligands were docked to both the models of mu, delta and kappa opioid receptors using GOLD 3.1. The results of docking complied well with the pharmacophore reported for nonspecific opioid ligands. The comparison of docking results for models with multiple templates and those with single template is discussed in detail. Three selective ligands for each receptor were also docked. As the crystallographic structures are not yet known, this comparison will help in choosing better homology models of opioid receptors for studying ligand-receptor interactions to design new potent opioid antagonists.

19.
One of the key components in protein structure prediction by the protein threading technique is to choose the best overall template for a given target sequence after all the optimal sequence-template alignments are generated. The chosen template should have the best alignment with the target sequence since the three-dimensional structure of the target sequence is built on the sequence-template alignment. The traditional method for template selection is called Z-score, which uses a statistical test to rank all the sequence-template alignments and then chooses the first-ranked template for the sequence. However, the calculation of Z-score is time-consuming and not suitable for genome-scale structure prediction. Z-scores are also hard to interpret when the threading scoring function is the weighted sum of several energy items of different physical meanings. This paper presents a support vector machine (SVM) regression approach to directly predict the alignment accuracy of a sequence-template alignment, which is used to rank all the templates for a specific target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method. SVM regression also runs much faster than the Z-score method.
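The Z-score ranking mentioned above can be sketched as follows. The abstract does not give the exact statistical test, so this assumes a common form: the raw alignment score of each template is compared with the scores obtained when the target sequence is shuffled and realigned, and templates are ranked by the resulting Z-score. Higher scores are assumed to be better, and the template names and numbers are hypothetical.

```python
import statistics


def z_score(raw_score, shuffled_scores):
    """Z-score of a raw alignment score against scores from shuffled sequences.

    Assumptions: higher scores are better and the background is summarised
    by its mean and population standard deviation.
    """
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mean) / sd


def rank_templates(raw_scores, background_scores):
    """Return template names sorted from highest to lowest Z-score."""
    z = {name: z_score(raw_scores[name], background_scores[name]) for name in raw_scores}
    return sorted(z, key=z.get, reverse=True)


# Hypothetical data: tplB has the higher raw score, but its background is also high.
raw = {"tplA": 50.0, "tplB": 60.0}
background = {"tplA": [10.0, 20.0, 30.0], "tplB": [40.0, 50.0, 60.0]}
print(rank_templates(raw, background))
```

Each background list stands for many additional alignments per template, which illustrates why the abstract calls the Z-score calculation time-consuming at genome scale.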

20.
MOTIVATION: The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). RESULTS: The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine CABS to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.
