1.
The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the Protein Data Bank (PDB) has been at a constant 1–2% for the last decade. In contrast, over half of all drugs target MPs, which highlights how little we understand about drug‐specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans‐membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α‐helical MPs and β‐barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase in MP structures has (1) facilitated comparative modeling due to the availability of more and better templates, and (2) improved the statistics for knowledge‐based scoring functions. Moreover, de novo methods have benefited from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade. Proteins 2015; 83:1–24. © 2014 Wiley Periodicals, Inc.
2.
Contemporary template-based modeling techniques allow applications of modeling methods to vast biological problems. However, they tend to fail to provide accurate structures for less-conserved local regions in sequence even when the overall structure can be modeled reliably. We call these regions unreliable local regions (ULRs). Accurate modeling of ULRs is of enormous value because they are frequently involved in functional specificity. In this article, we introduce a new method for modeling ULRs in template-based models by employing a sophisticated loop modeling technique. Combined with our previous study on protein termini, the method is applicable to refinement of both loop and terminus ULRs. A large-scale test carried out in a blind fashion in CASP9 (the 9th Critical Assessment of techniques for protein structure prediction) shows that ULR structures are improved over initial template-based models by refinement in more than 70% of the successfully detected ULRs. It is also notable that successful modeling of several long ULRs over 12 residues is achieved. Overall, the current results show that a careful application of loop and terminus modeling can be a promising tool for model refinement in template-based modeling.
3.
The error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most of the current prediction algorithms. Estimated error of a predicted structure is crucial information for experimental biologists to use the prediction model for design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only to alignment shift errors but also structure-level errors, the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors), and local errors at each residue position. We have further compared SPAD with seven other quality measures, six from sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed a best correlation to the RMSD; in the superfamily level, SPAD was the best; and in the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best in all of the family, the superfamily and the fold levels. Using the discovered correlation, we have also predicted the global and local error of our predicted structures of CASP7 targets by the SPAD. 
Finally, we proposed a sausage representation of predicted tertiary structures that intuitively indicates the predicted structure and its estimated error range simultaneously.
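The global error referred to in this abstract is the RMSD between a predicted model and the native structure. As a minimal sketch of that quantity only (not of SPAD itself), the function below computes the RMSD of two coordinate lists. The name `rmsd` is illustrative, and the code assumes the two structures are already superimposed and matched residue by residue; no optimal superposition is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both structures are already superimposed and the i-th
    point of one corresponds to the i-th point of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice the coordinates would usually be Cα positions after an optimal superposition, which this sketch leaves to the caller.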
4.
Inter-residue interactions play an essential role in driving protein folding, and analysis of these interactions increases our understanding of protein folding and stability and facilitates the development of tools for protein structure and function prediction. In this work, we systematically characterized the change of inter-residue interactions at various sequence separation cutoffs using two protein datasets. The first set included 100 diverse, nonredundant, high-resolution soluble protein structures covering all four major structural classes (all-alpha, alpha/beta, alpha+beta, and all-beta); the second set included 20 diverse, nonredundant, high-resolution membrane protein structures representing 19 unique superfamilies. The average number of inter-residue interactions in structures of both datasets was shown to display power-law behavior. The fitting parameters of the power-law function are directly related to the structural classes analyzed. These findings provide further insight into the distribution of short-, medium-, and long-range inter-residue interactions in both soluble and membrane proteins and could be used for protein structure prediction.
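A power-law relationship of the kind described here can be fitted by linear regression in log-log space. The sketch below assumes the functional form y = a * x^b, where x would be the sequence separation cutoff and y the average number of interactions; the abstract does not give the exact form or the fitted values, so the function name `fit_power_law` and the form itself are assumptions for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b).

    Assumption: all xs and ys are strictly positive, so logarithms exist.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Ordinary least squares on the log-transformed data
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = log of the prefactor
    return a, b
```

Comparing the fitted (a, b) pairs across structural classes would be one way to explore the reported link between fitting parameters and class.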
5.
Template‐based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template‐based methods often perform better than single template‐based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSER^VMT. We first develop an algorithm that improves the target‐template alignment for a given template. The improved alignment, called the SP3 alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge‐based scores. The refined top model is then structurally aligned to the template to produce the SP3 alternative alignment. Templates identified using SP3 threading are combined with the SP3 alternative and HHSEARCH alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full‐length models. Then, the models from all sets of templates are pooled, and the top 20–50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro‐sp3‐TASSER, on a set with 874 easy and 318 hard targets. The average GDT‐TS score improvements for the first model are 3.5 and 4.3% for easy and hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT‐TS scores as compared to pro‐sp3‐TASSER by 8.2 and 9.3% for the 80 easy and 32 hard targets, respectively. It also shows slightly better results than the top-ranked CASP9 Zhang‐Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/ . © 2011 Wiley Periodicals, Inc.
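The GDT‐TS score used in this benchmark averages the percentage of residues that lie within 1, 2, 4, and 8 Å of their native positions. The sketch below is a simplified version that takes per-residue distances from a single, fixed superposition; the full measure searches over many superpositions to maximize each percentage, which is not attempted here. The function name `gdt_ts` is illustrative.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0 to 100) from per-residue model-to-native distances.

    Assumption: distances (in angstroms) come from one fixed superposition,
    so this is a lower bound on the score the full procedure would report.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each distance cutoff
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

For example, a model in which every residue is within 1 Å of the native position scores 100 under this definition.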
6.
Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has been increasing because of the growing number of protein structures solved by structural genomics projects. To capitalize on the significant effort and investment in these projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be identified simply by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We show that suboptimal alignments are often more accurate than the optimal one, and that such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.
7.
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. Therefore, a natural question to ask is whether we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models with better quality than single-template models even if they are built from the best single templates (P-value < 10^-6), while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.
8.
Keehyoung Joo Jinwoo Lee Joo‐Hyun Seo Kyoungrim Lee Byung‐Gee Kim Jooyoung Lee 《Proteins》2009,75(4):1010-1023
We have investigated the effect of rigorous optimization of the MODELLER energy function for possible improvement in protein all‐atom chain‐building. For this we applied the global optimization method called conformational space annealing (CSA) to the standard MODELLER procedure to achieve better energy optimization than what MODELLER provides. The method, which we call MODELLER^CSA, is tested on two benchmark sets. The first is the 298 proteins taken from the HOMSTRAD multiple alignment set. By simply optimizing the MODELLER energy function, we observe significant improvement in side‐chain modeling, where MODELLER^CSA provides about 10.7% (14.5%) improvement for χ1 (χ1 + χ2) accuracy compared to the standard MODELLER modeling. The improvement of backbone accuracy by MODELLER^CSA is shown to be less prominent, and a similar improvement can be achieved by simply generating many standard MODELLER models and selecting the lowest-energy models. However, the level of side‐chain modeling accuracy by MODELLER^CSA could not be matched by extensive MODELLER strategies, side‐chain remodeling by SCWRL3, or copying unmutated rotamers. The identical procedure was successfully applied to 100 CASP7 template‐based modeling domains during the prediction season in a blind fashion, and the results are included here for comparison. From this study, we observe a good correlation between the MODELLER energy and the side‐chain accuracy. Our findings indicate that, when a good alignment between a target protein and its templates is provided, thorough optimization of the MODELLER energy function leads to accurate all‐atom models. Proteins 2009. © 2008 Wiley‐Liss, Inc.
9.
Zen A Carnevale V Lesk AM Micheletti C 《Protein science : a publication of the Protein Society》2008,17(5):918-929
Proteins that show similarity in their equilibrium dynamics can be aligned by identifying regions that undergo similar concerted movements. These movements are computed from protein native structures using coarse-grained elastic network models. We show the existence of common large-scale movements in enzymes selected from the main functional and structural classes. Alignment via dynamics does not require prior detection of sequence or structural correspondence. Indeed, a third of the statistically significant dynamics-based alignments involve enzymes that lack substantial global or local structural similarities. The analysis of specific residue-residue correspondences of these structurally dissimilar enzymes in some cases suggests a functional relationship of the detected common dynamic features. Including dynamics-based criteria in protein alignment thus provides a promising avenue for relating and grouping enzymes in terms of dynamic aspects that often, though not always, assist or accompany biological function.
10.
Allergenic proteins must crosslink specific IgE molecules, bound to the surface of mast cells and basophils, to stimulate an immune response. A structural understanding of the allergen–IgE interface is needed to predict cross‐reactivities between allergens and to design hypoallergenic proteins. However, there are fewer than 90 experimentally determined structures available for the approximately 1500 sequences of allergens and isoallergens cataloged in the Structural Database of Allergenic Proteins. To provide reliable structural data for the remaining proteins, we previously produced more than 500 3D models using an automated procedure, with strict controls on template choice and model quality evaluation. Here, we assessed how well the fold and residue surface exposure of 10 of these models correlated with recently published experimental 3D structures determined by X‐ray crystallography or NMR. We also discuss the impact of intrinsically disordered regions on the structural comparison and epitope prediction. Overall, for seven allergens with sequence identities to the original templates higher than 27%, the backbone root‐mean‐square deviations were less than 2 Å between the models and the subsequently determined experimental structures for the ordered regions. Further, the surface exposure of the known IgE epitopes on the models of three major allergens, from peanut (Ara h 1), latex (Hev b 2), and soy (Gly m 4), was very similar to the experimentally determined structures. For the three remaining allergens with lower sequence identities to the modeling templates, the 3D folds were correctly identified. However, the accuracy of those models is not sufficient for reliable epitope mapping. Proteins 2013. © 2012 Wiley Periodicals, Inc.
11.
As a first step toward a novel de novo structure prediction approach for alpha-helical membrane proteins, we developed coarse-grained knowledge-based potentials to score the mutual configuration of transmembrane (TM) helices. Using a comprehensive database of 71 known membrane protein structures, pairwise potentials depending solely on amino acid types and distances between Cα atoms were derived. To evaluate the potentials, they were used as an objective function for the rigid docking of 442 TM helix pairs. This is by far the largest test data set reported to date for that purpose. After clustering 500 docking runs for each pair and considering the largest cluster, we found solutions with a root mean squared (RMS) deviation <2 Å for about 30% of all helix pairs. Encouragingly, if only clusters that contain at least 20% of all decoys are considered, a success rate >71% (with an RMS deviation <2 Å) is obtained. The cluster size thus serves as a measure of significance to identify good docking solutions. In a leave-one-protein-family-out cross-validation study, more than 2/3 of the helix pairs were still predicted with an RMS deviation <2.5 Å (if only clusters that contain at least 20% of all decoys are considered). This demonstrates the predictive power of the potentials in general, although it is advisable to further extend the knowledge base to derive more robust potentials in the future. When compared to the scoring function of Fleishman and Ben-Tal, a comparable performance is found by our cross-validated potentials. Finally, well-predicted "anchor helix pairs" can be reliably identified for most of the proteins of the test data set. This is important for an extension of the approach towards TM helix bundles because these anchor pairs will act as "nucleation sites" to which more helices will be added subsequently, which alleviates the sampling problem.
12.
13.
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computing amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.
14.
Structural alignment of proteins is widely used in various fields of structural biology. In order to further improve the quality of alignment, we describe an algorithm for structural alignment based on text modelling techniques. The technique first superimposes the secondary structure elements of two proteins and then encodes the 3D structure of each protein as a sequence of letters. These sequences are used by a step-by-step sequence alignment procedure to align the two protein structures. A benchmark test was performed on a set of 200 non-homologous proteins to evaluate the program and compare it to state-of-the-art programs, e.g. CE, SAL, TM-align and 3D-BLAST. On average, the results of all-against-all structure comparison by the program are competitive in accuracy with CE and TM-align, while the algorithm runs at a speed comparable to 3D-BLAST.
15.
Hong Wing Lee Hong Ching Lee Lawrence K. Lee Erdahl T. Teber 《Journal of biomolecular structure & dynamics》2013,31(2):308-318
Major advances have been made in the prediction of soluble protein structures, led by knowledge-based modeling methods that extract useful structural trends from known protein structures and incorporate them into scoring functions. The same cannot be reported for the class of transmembrane proteins, primarily due to the lack of high-resolution structural data for transmembrane proteins, which renders many of the knowledge-based methods unreliable or invalid. We have developed a method that harnesses the vast structural knowledge available in soluble protein data for use in the modeling of transmembrane proteins. At the core of the method is a collection of transmembrane protein decoy sets that allows us to filter features recognized from soluble proteins and train them into a set of scoring functions for transmembrane protein modeling. We have demonstrated that structures of soluble proteins can provide significant insight into transmembrane protein structures. A complementary novel two-stage modeling/selection process that mimics the two-stage folding of helical membrane proteins was developed. Combined with the scoring function, the method was successfully applied to model 5 transmembrane proteins. The root mean square deviations of the predicted models from the native structures ranged from 5.0 to 8.8 Å.
16.
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of including residue-residue contact information on the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help improve the accuracy of protein structure prediction.
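One simple way to turn predicted contacts into a template-selection term is to count how many predicted target contacts map, through the alignment, onto contacts observed in the template. The sketch below shows that idea only; the abstract does not specify the actual scoring term, so the function name `contact_overlap`, its inputs, and the use of a plain fraction are all assumptions for illustration.

```python
def contact_overlap(predicted_contacts, template_contacts, alignment):
    """Fraction of predicted target contacts satisfied by an aligned template.

    predicted_contacts: iterable of (i, j) target residue index pairs
    template_contacts:  iterable of (p, q) template residue index pairs
    alignment:          dict mapping target index -> template index
    All three inputs are hypothetical data structures for this sketch.
    """
    predicted = list(predicted_contacts)
    if not predicted:
        return 0.0
    # Store template contacts order-independently: (p, q) equals (q, p)
    template_pairs = {frozenset(pair) for pair in template_contacts}
    satisfied = 0
    for i, j in predicted:
        # Contacts involving unaligned residues cannot be satisfied
        if i in alignment and j in alignment:
            if frozenset((alignment[i], alignment[j])) in template_pairs:
                satisfied += 1
    return satisfied / len(predicted)
```

A term like this could be added, with some weight, to a baseline threading score when ranking candidate templates.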
17.
Membrane proteins are challenging to study and restraints for structure determination are typically sparse or of low resolution because the membrane environment that surrounds them leads to a variety of experimental challenges. When membrane protein structures are determined by different techniques in different environments, a natural question is “which structure is most biologically relevant?” Towards answering this question, we compiled a dataset of membrane proteins with known structures determined by both solution NMR and X‐ray crystallography. By investigating differences between the structures, we found that RMSDs between crystal and NMR structures are below 5 Å in the membrane region, NMR ensembles have a higher convergence in the membrane region, crystal structures typically have a straighter transmembrane region, have higher stereo‐chemical correctness, and are more tightly packed. After quantifying these differences, we used high‐resolution refinement of the NMR structures to mitigate them, which paves the way for identifying and improving the structural quality of membrane proteins.
18.
The similarity between folding and binding led us to posit the concept that the number of protein-protein interface motifs in nature is limited, and interacting protein pairs can use similar interface architectures repeatedly, even if their global folds completely vary. Thus, known protein-protein interface architectures can be used to model the complexes between two target proteins on the proteome scale, even if their global structures differ. This powerful concept is combined with a flexible refinement and global energy assessment tool. The accuracy of the method is highly dependent on the structural diversity of the interface architectures in the template dataset. Here, we validate this knowledge-based combinatorial method on the Docking Benchmark and show that it efficiently finds high-quality models for benchmark complexes and their binding regions even in the absence of template interfaces having sequence similarity to the targets. Compared to "classical" docking, it is computationally faster; as the number of target proteins increases, the difference becomes more dramatic. Further, it is able to distinguish binders from nonbinders. These features allow large-scale network modeling. The results on an independent target set (proteins in the p53 molecular interaction map) show that the current method can be used to predict whether a given protein pair interacts. Overall, while constrained by the diversity of the template set, this approach efficiently produces high-quality models of protein-protein complexes. We expect that with the growing number of known interface architectures, this type of knowledge-based method will be increasingly used by the broad proteomics community.
19.
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.
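For readers unfamiliar with dynamic programming alignment, the sketch below computes a global alignment score with the classic Needleman-Wunsch recurrence. It is a generic sequence-sequence illustration with a linear gap penalty, not STRUCTFAST's profile-profile engine; the function name and the default scoring values are assumptions.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences (linear gap penalty).

    Keeps only two rows of the dynamic programming table, so memory use
    is proportional to len(seq_b). Scoring values are illustrative.
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]  # column 0: prefix of seq_a against empty seq_b
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

A profile-profile method replaces the match/mismatch term with a score between two profile columns, while the recurrence keeps the same shape.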
20.
We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intra-molecular residue-residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and facilitate the detection of the global optimum. In order to test the overall accuracy of FAST, we determined its sensitivity and specificity with the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database. The overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue-residue correspondence. Typical pairwise alignments take FAST less than a second on a 1.2 GHz Pentium III CPU. FAST software and a web server are available at http://biowulf.bu.edu/FAST/.