Similar articles
Found 20 similar articles; search took 46 ms
1.
ABSTRACT: BACKGROUND: Protein structures can be reliably predicted by template-based modeling (TBM) when experimental structures of homologous proteins are available. However, it is challenging to obtain structures more accurate than the single best template, either by combining information from multiple templates or by modeling regions that vary among templates or are not covered by any template. RESULTS: We introduce GalaxyTBM, a new TBM method in which the more reliable core region is modeled first from multiple templates, and less reliable, variable local regions, such as loops or termini, are then detected and re-modeled by an ab initio method. This TBM method is based on "Seok-server," which was tested in CASP9 and assessed to be among the top TBM servers. The accuracy of the initial core modeling is enhanced by focusing on more conserved regions in the multiple-template selection and multiple sequence alignment stages. Additional improvement is achieved by ab initio modeling of up to 3 unreliable local regions in the fixed framework of the core structure. Overall, GalaxyTBM reproduced the performance of Seok-server: GalaxyTBM and Seok-server achieved average GDT-TS values of 68.1 and 68.4, respectively, when tested on 68 single-domain CASP9 TBM targets. For application to multi-domain proteins, GalaxyTBM must be combined with domain-splitting methods. CONCLUSION: Application of GalaxyTBM to CASP9 targets demonstrates that accurate protein structure prediction is possible by use of a multiple-template-based approach, and ab initio modeling of variable regions can further enhance the model quality.
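
The abstract above reports accuracy as GDT-TS without defining it. As a rough illustration only, the sketch below computes a GDT-TS-like value as the mean fraction of Cα atoms within 1, 2, 4 and 8 Å of their native positions. It assumes the model has already been superposed on the native structure, whereas the full measure searches for the best superposition at each cutoff. The function name `gdt_ts` and the toy coordinates are hypothetical and are not taken from GalaxyTBM.

```python
import math


def gdt_ts(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0-100) for C-alpha coordinates.

    Assumption of this sketch: `model` is already superposed on `native`.
    The real measure maximizes each fraction over many superpositions.
    """
    if len(model) != len(native) or not native:
        raise ValueError("model and native must be non-empty and equal in length")
    # Per-residue distance between equivalent C-alpha atoms.
    distances = [math.dist(m, n) for m, n in zip(model, native)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy, made-up coordinates: per-residue deviations of 0.5, 1.5, 3.0 and 9.0 Å.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
score = gdt_ts(model, native)
```

Because the fractions at the four cutoffs are averaged, a few badly placed residues lower the score only modestly, which is one reason the measure is preferred over a global RMSD when comparing servers.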

2.
Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to use computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM was the most reliable approach to predicting protein structures, and in the absence of reliable templates the modeling accuracy declined sharply. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures or with the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.

3.
M. F. Thorpe, S. Banu Ozkan. Proteins, 2015, 83(12): 2279-2292
The most successful protein structure prediction methods to date have been template‐based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high-accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate for drug‐design studies or molecular dynamics simulations. We have developed a new physics-based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometry-based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr‐REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native‐like structures from a template and to provide a set of persistent contacts to be employed during re‐folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low-resolution models where certain portions of the structure are incorrectly modeled. Proteins 2015; 83:2279–2292. © 2015 Wiley Periodicals, Inc.

4.
Small-angle X-ray scattering (SAXS) can extract low-resolution protein shape information without requiring crystal formation. However, it has found little use in atomic-level protein structure determination due to the uncertainty of residue-level structural assignment. We developed a new algorithm, SAXSTER, to couple the raw SAXS data with protein-fold-recognition algorithms and thus improve template-based protein-structure predictions. We designed nine different scoring functions for matching template and experimental SAXS profiles. The logarithm of the integrated correlation score showed the best template recognition ability and had the highest correlation with the true template modeling (TM)-score of the target structures. We tested the method in large-scale protein-fold-recognition experiments and achieved significant improvements in prioritizing the best template structures. When SAXSTER was applied to proteins with asymmetric SAXS profile distributions, the average TM-score of the top-ranking templates increased by 18% after homologous templates were excluded, which corresponds to a p-value < 10⁻⁹ in Student's t-test. These data demonstrate a promising use of SAXS data to facilitate computational protein structure modeling, which is expected to work most efficiently for proteins of irregular global shape and/or multiple-domain protein complexes.
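
The abstract above ranks templates by TM-score but does not spell the measure out. The sketch below shows the commonly used TM-score formula, the mean over target residues of 1 / (1 + (d_i / d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8, under two stated assumptions: the two structures are already superposed (the full measure maximizes over superpositions), and d0 is floored at 0.5 Å for very short chains, which is a choice made for this sketch. It is not SAXSTER's SAXS profile-matching score, and the names `tm_d0` and `tm_score` are hypothetical.

```python
import math


def tm_d0(length, floor=0.5):
    """Length-dependent distance scale d0 in Å (the floor is an assumption of this sketch)."""
    if length > 15:
        d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = floor
    return max(d0, floor)


def tm_score(model, native):
    """TM-score for one fixed superposition of equivalent C-alpha atoms."""
    length = len(native)
    if length == 0 or len(model) != length:
        raise ValueError("model and native must be non-empty and equal in length")
    d0 = tm_d0(length)
    total = sum(1.0 / (1.0 + (math.dist(m, n) / d0) ** 2) for m, n in zip(model, native))
    return total / length


# Toy 100-residue chain with made-up coordinates.
native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
shift = tm_d0(100)
shifted = [(x, y + shift, z) for x, y, z in native]
identical_score = tm_score(native, native)   # every distance is 0
shifted_score = tm_score(shifted, native)    # every distance equals d0
```

Identical structures score 1, and a model whose residues all sit exactly d0 away scores 0.5, which illustrates how d0 sets the distance scale at which a residue contributes half of its maximum.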

5.
Zhu M, Li M. Molecular BioSystems, 2012, 8(6): 1686-1693
G-protein coupled receptors (GPCRs) constitute the largest family of membrane proteins. Because far fewer crystal structures than amino acid sequences are available, homology modeling offers a reasonable and feasible route to theoretical GPCR coordinates. Using recently solved crystal structures, we examined how to select them as templates for homology modeling in four respects: (1) various sequence alignment methods; (2) protein weight matrix; (3) different sets of multiple templates; (4) active and inactive state of templates. The accuracy of the models was evaluated by comparing the similarity of stereo conformation and molecular docking results between the models and the experimental structure of Meleagris gallopavo β1-adrenergic receptor (Mg_Adrb1), which we used as an example. Our results suggest that: (1) Cobalt and MAFFT, two sequence alignment algorithms, are suitable for single- and multiple-template modeling, respectively; (2) Blosum30 is applicable for aligning sequences with low sequence identity; (3) multiple-template modeling is not always better than single-template modeling; (4) the state of the template is also an influential factor in simulating GPCR structures.

6.
Allergenic proteins must crosslink specific IgE molecules, bound to the surface of mast cells and basophils, to stimulate an immune response. A structural understanding of the allergen–IgE interface is needed to predict cross‐reactivities between allergens and to design hypoallergenic proteins. However, there are fewer than 90 experimentally determined structures available for the approximately 1500 sequences of allergens and isoallergens cataloged in the Structural Database of Allergenic Proteins. To provide reliable structural data for the remaining proteins, we previously produced more than 500 3D models using an automated procedure, with strict controls on template choice and model quality evaluation. Here, we assessed how well the fold and residue surface exposure of 10 of these models correlated with recently published experimental 3D structures determined by X‐ray crystallography or NMR. We also discuss the impact of intrinsically disordered regions on the structural comparison and epitope prediction. Overall, for seven allergens with sequence identities to the original templates higher than 27%, the backbone root‐mean square deviations were less than 2 Å between the models and the subsequently determined experimental structures for the ordered regions. Further, the surface exposure of the known IgE epitopes on the models of three major allergens, from peanut (Ara h 1), latex (Hev b 2), and soy (Gly m 4), was very similar to the experimentally determined structures. For the three remaining allergens with lower sequence identities to the modeling templates, the 3D folds were correctly identified. However, the accuracy of those models is not sufficient for reliable epitope mapping. © Proteins 2013. © 2012 Wiley Periodicals, Inc.

7.
Many proteins need to form oligomers to be functional, so oligomer structures provide important clues to the biological roles of proteins. Prediction of oligomer structures can therefore be a useful tool in the absence of experimentally resolved structures. In this article, we describe the server and human methods that we used to predict oligomer structures in the CASP13 experiment. Performances of the methods on the 42 CASP13 oligomer targets, consisting of 30 homo-oligomers and 12 hetero-oligomers, are discussed. Our server method, Seok-assembly, generated models with an interface contact similarity measure greater than 0.2 as model 1 for 11 homo-oligomer targets when proper templates existed in the database. Model refinement methods such as loop modeling and molecular dynamics (MD)-based overall refinement failed to improve model quality when target proteins had domains not covered by templates or when chains had very small interfaces. In human predictions, additional experimental data such as low-resolution electron microscopy (EM) maps were utilized. EM data could assist oligomer structure prediction by providing a global shape of the complex structure.

8.
Background: Interphase chromosomes adopt a hierarchical structure, and recent data have characterized their chromatin organization at very different scales, from sub-genic regions associated with DNA-binding proteins at the order of tens or hundreds of bases, through larger regions with active or repressed chromatin states, up to multi-megabase-scale domains associated with nuclear positioning, replication timing and other qualities. However, we have lacked detailed, quantitative models to understand the interactions between these different strata. Results: Here we collate large collections of matched locus-level chromatin features and Hi-C interaction data, representing higher-order organization, across three human cell types. We use quantitative modeling approaches to assess whether locus-level features are sufficient to explain higher-order structure, and identify the most influential underlying features. We identify structurally variable domains between cell types and examine the underlying features to discover a general association with cell-type-specific enhancer activity. We also identify the most prominent features marking the boundaries of two types of higher-order domains at different scales: topologically associating domains and nuclear compartments. We find parallel enrichments of particular chromatin features for both types, including features associated with active promoters and the architectural proteins CTCF and YY1. Conclusions: We show that integrative modeling of large chromatin dataset collections using random forests can generate useful insights into chromosome structure. The models produced recapitulate known biological features of the cell types involved, allow exploration of the antecedents of higher-order structures and generate testable hypotheses for further experimental studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0661-x) contains supplementary material, which is available to authorized users.

9.
Comparative docking is based on experimentally determined structures of protein-protein complexes (templates), following the paradigm that proteins with similar sequences and/or structures form similar complexes. Modeling that uses the structural similarity of target monomers to template complexes significantly expands structural coverage of the interactome. Template-based docking by structure alignment can be performed for the entire structures or by aligning targets to the bound interfaces of the experimentally determined complexes. Systematic benchmarking of docking protocols based on full and interface structure alignment showed that both protocols perform similarly, with a top-1 docking success rate of 26%. However, in terms of the models' quality, the interface-based docking performed marginally better. The interface-based docking is preferable when one would suspect a significant conformational change in the full protein structure upon binding, for example, a rearrangement of the domains in multidomain proteins. Importantly, if the same structure is selected as the top template by both full and interface alignment, the docking success rate increases 2-fold for both top-1 and top-10 predictions. Matching structural annotations of the target and template proteins for template detection, as a computationally less expensive alternative to structural alignment, did not improve the docking performance. Sophisticated remote sequence homology detection added templates to the pool of those identified by structure-based alignment, suggesting that for practical docking, the combination of the structure alignment protocols and the remote sequence homology detection may be useful in order to avoid potential flaws in generation of the structural templates library.

10.
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. Because protein sequence information accumulates far faster than structural information, constructing accurate models of protein functional surfaces and identifying their key elements are becoming increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using Modeller, we determine how accurately functional surfaces can be reproduced. We use an alpha-shape-based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.

11.
G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signals into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR).

12.
It is well established that sequence templates (e.g., PROSITE) and databases are powerful tools for identifying biological function and tertiary structure for an unknown protein sequence. Here we describe a method for automatically deriving 3D templates from the protein structures deposited in the Brookhaven Protein Data Bank. As an example, we describe a template derived for the Ser-His-Asp catalytic triad found in the serine proteases and triacylglycerol lipases. We find that the resultant template provides a highly selective tool for automatically differentiating between catalytic and noncatalytic Ser-His-Asp associations. When applied to nonproteolytic proteins, the template picks out two "non-esterase" catalytic triads that may be of biological relevance. This suggests that the development of databases of 3D templates, such as those that currently exist for protein sequence templates, will help identify the functions of new protein structures as they are determined and pinpoint their functionally important regions.

13.
We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures. The largest errors occur in the regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of the mainchain atoms modeled with an rms deviation from the X-ray structure of ≈ 1 Å, in large part because the templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein. © 1995 Wiley-Liss, Inc.
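
To make the "about 90% of the mainchain atoms ... rms deviation ≈ 1 Å" statement concrete, the sketch below computes an RMSD over equivalent atoms and then over only the best-fitting fraction of them. It assumes the model is already superposed on the reference, and the helper `rmsd_of_best_fraction` is one hypothetical reading of how such a figure could be obtained, not the procedure used in the evaluation above. All coordinates are made up.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation over equivalent, already superposed atoms."""
    if len(model) != len(reference) or not reference:
        raise ValueError("model and reference must be non-empty and equal in length")
    squared = [math.dist(m, r) ** 2 for m, r in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rmsd_of_best_fraction(model, reference, fraction=0.9):
    """RMSD over the closest `fraction` of atoms (illustrative helper, an assumption)."""
    if len(model) != len(reference) or not reference:
        raise ValueError("model and reference must be non-empty and equal in length")
    distances = sorted(math.dist(m, r) for m, r in zip(model, reference))
    # Keep at least one atom; the small epsilon guards against float truncation.
    keep = max(1, int(len(distances) * fraction + 1e-9))
    kept = distances[:keep]
    return math.sqrt(sum(d * d for d in kept) / keep)


# Ten atoms: nine deviate by 1 Å, one (say, in a loop) deviates by 10 Å.
reference = [(1.5 * i, 0.0, 0.0) for i in range(10)]
model = [(x, y + 1.0, z) for x, y, z in reference[:9]] + [(13.5, 10.0, 0.0)]
overall = rmsd(model, reference)
core = rmsd_of_best_fraction(model, reference, 0.9)
```

In this toy case a single badly modeled atom raises the overall RMSD to about 3.3 Å while the best 90% remain at 1 Å, which mirrors the abstract's point that errors concentrate in loops and misaligned regions.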

14.
Certain protein‐design calculations involve using an experimentally determined high‐resolution structure as a template to identify new sequences that can adopt the same fold. This approach has led to the successful design of many novel, well‐folded, native‐like proteins. Although any atomic‐resolution structure can serve as a template in such calculations, most successful designs have used high‐resolution crystal structures. Because there are many proteins for which crystal structures are not available, it is of interest whether nuclear magnetic resonance (NMR) templates are also appropriate. We have analyzed differences between using X‐ray and NMR templates in side‐chain repacking and design calculations. We assembled a database of 29 proteins for which both a high‐resolution X‐ray structure and an ensemble of NMR structures are available. Using these pairs, we compared the rotamericity, χ1‐angle recovery, and native‐sequence recovery of X‐ray and NMR templates. We carried out design using RosettaDesign on both types of templates, and compared the energies and packing qualities of the resulting structures. Overall, the X‐ray structures were better templates for use with Rosetta. However, for ~20% of proteins, a member of the reported NMR ensemble gave rise to designs with similar properties. Re‐evaluating RosettaDesign structures with other energy functions indicated much smaller differences between the two types of templates. Ultimately, experiments are required to confirm the utility of particular X‐ray and NMR templates. But our data suggest that the lack of a high‐resolution X‐ray structure should not preclude attempts at computational design if an NMR ensemble is available. Proteins 2009. © 2009 Wiley‐Liss, Inc.

15.
The total number of protein-protein complex structures currently available in the Protein Data Bank (PDB) is six times smaller than the total number of tertiary structures in the PDB, which limits the power of homology-based approaches to complex structure modeling. We present a threading-recombination approach, COTH, to boost the protein complex structure library by combining tertiary structure templates with complex alignments. The query sequences are first aligned to complex templates using a modified dynamic programming algorithm, guided by ab initio binding-site predictions. The monomer alignments are then shifted to the multimeric template framework by structural alignments. COTH was tested on 500 nonhomologous dimeric proteins and successfully detected correct templates for 50% of the cases after homologous templates were excluded, significantly outperforming conventional homology modeling algorithms. It also shows a higher accuracy in interface modeling than rigid-body docking of unbound structures from ZDOCK, although with lower coverage. These data demonstrate new avenues to model complex structures from nonhomologous templates.
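
The abstract above mentions aligning query sequences to templates with a modified dynamic programming algorithm. The sketch below shows only the textbook starting point, Needleman-Wunsch global alignment with a linear gap penalty. It is not COTH's algorithm: it contains no binding-site guidance, and the match, mismatch and gap values are illustrative assumptions rather than published parameters.

```python
def needleman_wunsch(query, template, match=2, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming; returns (score, aligned_query, aligned_template)."""
    rows, cols = len(query) + 1, len(template) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Substitution score for query[i-1] versus template[j-1].
        return match if query[i - 1] == template[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in the template
                score[i][j - 1] + gap,             # gap in the query
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = len(query), len(template)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


result = needleman_wunsch("ACDE", "ADE")
```

A threading method would typically replace the simple match/mismatch term with a position-specific score (profiles, predicted secondary structure, or, as the abstract describes, predicted binding sites); the recurrence and traceback stay the same.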

16.
Recent research indicates that hundreds of thousands of G-rich sequences within the human genome have the potential to form secondary structures known as G-quadruplexes. Telomeric regions, consisting of long arrays of TTAGGG/AATCCC repeats, are among the most likely areas in which these structures might form. Since G-quadruplexes assemble from certain G-rich single-stranded sequences, they might arise when duplex DNA is unwound, such as during replication. Coincidentally, these bulky structures when present in the DNA template might also hinder the action of DNA polymerases. In this study, single-stranded telomeric templates with the potential to form G-quadruplexes were examined for their effects on a variety of replicative and translesion DNA polymerases from humans and lower organisms. Our results demonstrate that single-stranded templates containing four telomeric GGG runs fold into intramolecular G-quadruplex structures. These intramolecular G-quadruplexes are somewhat dynamic in nature and stabilized by increasing KCl concentrations and decreasing temperatures. Furthermore, the presence of these intramolecular G-quadruplexes in the template dramatically inhibits DNA synthesis by various DNA polymerases, including the human polymerase δ employed during lagging strand replication of G-rich telomeric strands and several human translesion DNA polymerases potentially recruited to sites of replication blockage. Notably, misincorporation of nucleotides is observed when certain translesion polymerases are employed on substrates containing intramolecular G-quadruplexes, as is extension of the resulting mismatched base pairs upon dynamic unfolding of this secondary structure. These findings reveal the potential for blockage of DNA replication and genetic changes related to sequences capable of forming intramolecular G-quadruplexes.

17.
So far, 13 groups of mammalian Toll-like receptors (TLRs) have been identified. Most TLRs have been shown to recognize pathogen-associated molecular patterns from a wide range of invading agents and initiate both innate and adaptive immune responses. The TLR ectodomains are composed of varying numbers and types of leucine-rich repeats (LRRs). As the crystal structures are currently missing for most TLR ligand-binding ectodomains, homology modeling enables first predictions of their three-dimensional structures on the basis of the determined crystal structures of TLR ectodomains. However, the quality of the predicted models that are generated from full-length templates can be limited due to low sequence identity between the target and templates. To obtain better templates for modeling, we have developed an LRR template assembly approach. Individual LRR templates that are locally optimal for the target sequence are assembled into multiple templates. This method was validated through the comparison of a predicted model with the crystal structure of mouse TLR3. With this method, we also constructed ectodomain models of human TLR5, TLR6, TLR7, TLR8, TLR9, and TLR10 and mouse TLR11, TLR12, and TLR13 that can be used as first passes for a computational simulation of ligand docking or to design mutation experiments. This template assembly approach can be extended to other repetitive proteins.

18.
G-protein coupled receptors (GPCRs) are thought to be proteins with 7-membered transmembrane helical bundles (7TM proteins). Recently, the X-ray structures have been solved for two such proteins, namely for bacteriorhodopsin (BR) and rhodopsin (Rh), the latter being a GPCR. Despite similarities, the structures are different enough to suggest that 3D models for different GPCRs cannot be obtained directly by employing the 3D structure of BR or Rh as a unique template. The approach to computer modeling of 7TM proteins developed in this work was capable of reproducing the experimental X-ray structure of BR with great accuracy. The combination of helical packing and low-energy loop conformers closest to the X-ray structure has an r.m.s.d. of 3.13 Å. Such a level of accuracy for the 3D-structure prediction of a 216-residue protein has not been achieved, so far, by any available ab initio procedure of protein folding. The approach may also produce other energetically consistent combinations of helical bundles and loop conformers, creating a variety of possible templates for 3D structures of 7TM proteins, including GPCRs. These templates may provide experimentalists with various plausible options for the 3D structure of a given GPCR; in our view, only experiments will determine the final choice of the most reasonable 3D template.

19.
The advent of the complete genome sequences of various organisms in the mid-1990s raised the issue of how one could determine the function of hypothetical proteins. While insight might be obtained from a 3D structure, the chances of being able to predict such a structure are limited for the deduced amino acid sequence of any uncharacterized gene. A template for modeling is required, but there was only a low probability of finding a protein closely related in sequence with an available structure. Thus, in the late 1990s, an international effort known as structural genomics (SG) was initiated, its primary goal being to “fill sequence-structure space” by determining the 3D structures of representatives of all known protein families. This was to be achieved mainly by X-ray crystallography, and it was estimated that at least 5,000 new structures would be required. While the proteins (genes) for SG have subsequently been derived from hundreds of different organisms, extremophiles and particularly thermophiles have been specifically targeted due to the increased stability and ease of handling of their proteins, relative to those from mesophiles. This review summarizes the significant impact that extremophiles and proteins derived from them have had on SG projects worldwide. To what extent SG has influenced the field of extremophile research is also discussed.

20.
Structural characterization of protein–protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template‐free or template‐based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high‐resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have predefined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model‐to‐native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model‐like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu . Proteins 2015; 83:891–897. © 2015 Wiley Periodicals, Inc.
