Similar documents
1.
Homology modeling is a powerful tool for predicting protein structures; its success depends on obtaining a reasonable alignment between a given structural template and the protein sequence being analyzed. In order to leverage greater predictive power for proteins with few structural templates, we have developed a method to rank homology models based upon their compliance with secondary structure derived from experimental solid-state NMR (SSNMR) data. Such data can be obtained rapidly by simple SSNMR experiments (e.g., 13C–13C 2D correlation spectra). To test our homology model scoring procedure for various amino acid labeling schemes, we generated a library of 7,474 homology models for 22 protein targets culled from the TALOS+/SPARTA+ training set of protein structures. Using subsets of amino acids that are plausibly assigned by SSNMR, we discovered that pairs of the residues Val, Ile, Thr, Ala and Leu (VITAL) emulate an ideal dataset in which all residues are site-specifically assigned. Scoring the models with a predicted VITAL site-specific dataset and calculating secondary structure with the Chemical Shift Index resulted in a Pearson correlation coefficient (−0.75) commensurate with that of the control (−0.77), in which secondary structure was scored site-specifically for all amino acids (ALL 20) using STRIDE. This method promises to accelerate structure determination by SSNMR for proteins with unknown folds by guiding the selection of remotely homologous protein templates and by assessing model quality.
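The scoring idea can be illustrated with a minimal Python sketch. It assumes secondary structure is encoded per residue as H (helix), E (strand) or C (coil), with '-' marking residues that have no SSNMR-derived assignment, and that model quality is summarized by a single number such as RMSD to the native structure. The function names, models and RMSD values are hypothetical; this is not the authors' implementation.

```python
from math import sqrt


def ss_agreement(predicted: str, model: str) -> float:
    """Fraction of assigned residues whose model secondary structure (H/E/C)
    matches the SSNMR-derived prediction; '-' marks unassigned residues."""
    if len(predicted) != len(model):
        raise ValueError("strings must cover the same residues")
    pairs = [(p, m) for p, m in zip(predicted, model) if p != "-"]
    if not pairs:
        return 0.0
    return sum(p == m for p, m in pairs) / len(pairs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sparse (VITAL-like) assignment and three candidate models.
predicted = "-HHH--EE-"
models = {"m1": "CHHHCCEEC", "m2": "CHHCCCECC", "m3": "CCCCCCCCC"}
rmsd_to_native = {"m1": 1.2, "m2": 2.5, "m3": 6.0}  # made-up values in Å

scores = [ss_agreement(predicted, ss) for ss in models.values()]
quality = [rmsd_to_native[name] for name in models]
print(scores, pearson(scores, quality))
```

With a quality measure where lower is better, such as RMSD, a useful compliance score should correlate negatively with it; which quality metric the study actually correlated against is not stated here.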

2.
MOTIVATION: Our aim is to develop a process that automatically defines a repertory of contiguous 3D protein structure fragments that can be used in homology modeling. We present here improvements to the method we introduced previously, the 'hybrid protein model' (de Brevern and Hazout, Theor. Chem. Acc., 106, 36-47 (2001)). The hybrid protein learns a non-redundant databank encoded in a structural alphabet composed of 16 Protein Blocks (PBs; de Brevern et al., Proteins, 41, 271-287 (2000)). Every local fold is learned by looking for the most similar pattern present in the hybrid protein and modifying it slightly. In the end, each position corresponds to a cluster of similar 3D local folds. RESULTS: In this paper, we describe improvements to our method for building an optimal hybrid protein: (i) 'baby training', defined as the introduction of large structure fragments followed by a progressive reduction in the size of the training fragments; and (ii) the deletion of redundant parts of the hybrid protein. This repertory of contiguous 3D protein structure fragments should be a useful tool for molecular modeling.

3.
Structural genomics projects are providing large quantities of new 3D structural data for proteins. To monitor the quality of these data, we have developed the protein structure validation software suite (PSVS) for assessing protein structures generated by NMR or X-ray crystallographic methods. PSVS is broadly applicable for structure quality assessment in structural biology projects. The software integrates under a single interface analyses from several widely used structure quality evaluation tools, including PROCHECK (Laskowski et al., J Appl Crystallog 1993;26:283-291), MolProbity (Lovell et al., Proteins 2003;50:437-450), Verify3D (Luthy et al., Nature 1992;356:83-85), ProsaII (Sippl, Proteins 1993;17:355-362), the PDB validation software, and various structure-validation tools developed in our own laboratory. PSVS provides standard constraint analyses, statistics on goodness-of-fit between structures and experimental data, and knowledge-based structure quality scores in a standardized format suitable for database integration. The analysis provides both global and site-specific measures of protein structure quality. Global quality measures are reported as Z scores, based on calibration with a set of high-resolution X-ray crystal structures. PSVS is particularly useful in assessing protein structures determined by NMR methods, but is also valuable for assessing X-ray crystal structures or homology models. Using these tools, we assessed protein structures generated by the Northeast Structural Genomics Consortium and other international structural genomics projects over a 5-year period. Protein structures produced by structural genomics projects exhibit quality score distributions similar to those of structures produced in traditional structural biology projects during the same time period.
However, while some NMR structures have structure quality scores similar to those seen in higher-resolution X-ray crystal structures, the majority of NMR structures have lower scores. Potential reasons for this "structure quality score gap" between NMR and X-ray crystal structures are discussed.
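PSVS reports global quality measures as Z scores calibrated against high-resolution X-ray structures. A minimal sketch of that normalization, assuming a plain (x − mean)/SD transform with the sample standard deviation and a raw score where higher is better; the calibration values are invented, and PSVS's exact calibration procedure is not reproduced here.

```python
from statistics import mean, stdev


def z_score(raw: float, calibration: list[float]) -> float:
    """Express a raw quality score in standard deviations from the mean
    of a calibration set (sample standard deviation)."""
    if len(calibration) < 2:
        raise ValueError("need at least two calibration values")
    return (raw - mean(calibration)) / stdev(calibration)


# Hypothetical raw scores for a reference set of high-resolution X-ray structures
reference = [-0.2, 0.1, 0.0, 0.3, -0.1, 0.2, -0.3]
nmr_model_score = -0.6
print(round(z_score(nmr_model_score, reference), 2))
```

Under these assumptions, a strongly negative Z score places a structure well below the X-ray reference distribution, which is how a "structure quality score gap" would appear in standardized form.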

4.
The three-dimensional structure of rubredoxin from the hyperthermophilic archaebacterium Pyrococcus furiosus has been modeled from the X-ray crystal structures of three homologous proteins from Clostridium pasteurianum, Desulfovibrio gigas, and Desulfovibrio vulgaris. All three homology models are similar. When the positions of all heavy atoms and essential hydrogen atoms are compared with the recently solved crystal structure (Day, M. W., et al., 1992, Protein Sci. 1, 1494-1507) of the same protein, the homology models differ from the X-ray structure by 2.09 Å root mean square (RMS). The X-ray and the zinc-substituted NMR structures (Blake, P. R., et al., 1992b, Protein Sci. 1, 1508-1521) show a similar level of difference (2.05 Å RMS). On average, the homology models are closer to the X-ray structure than to the NMR structures (2.09 vs. 2.42 Å RMS).
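The RMS figures above are root-mean-square deviations over corresponding atoms. A minimal sketch, assuming the two coordinate sets list the same atoms in the same order and have already been optimally superimposed (for example with the Kabsch algorithm, which is not shown); coordinates are in ångströms and the toy values are hypothetical.

```python
from math import sqrt

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation between matched, pre-superimposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return sqrt(total / len(a))


model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 3.0, 0.0)]
print(rmsd(model, xray))
```

The superposition step matters: applying this function to coordinates that have not been superimposed will generally inflate the value.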

5.
The structural refinement of protein models is a challenging problem in protein structure prediction (Moult et al., Proteins 2003;53(Suppl 6):334-339). Most attempts to refine comparative models degrade rather than improve model quality, so most current comparative modeling procedures omit the refinement step. However, it has been shown that even in the absence of alignment errors and using optimal templates, methods based on a single template have intrinsic limitations, and that refinement is needed to improve model accuracy. The failure of current methods is thought to originate, on the one hand, from the inaccuracy of the effective free energy functions adopted, which do not properly represent the energetic balance in the native state, and, on the other hand, from the difficulty of sampling the high-dimensional and rugged free energy landscape of protein folding in the search for the global minimum. Here, we address this second issue. We define the evolutionary and vibrational harmonics subspace (EVA), a reduced sampling subspace of up to 50 dimensions that combines evolutionarily favored directions, defined by the principal components of the structural variation within a homologous family, with topologically favored directions, derived from the low-frequency normal modes of the vibrational dynamics. This subspace is accurate enough that the cores of most proteins can be represented within 1 Å accuracy, and small enough that Replica Exchange Monte Carlo (REMC; Hukushima and Nemoto, J Phys Soc Jpn 1996;65:1604-1608; Hukushima et al., Int J Mod Phys C: Phys Comput 1996;7:337-344; Mitsutake et al., J Chem Phys 2003;118:6664-6675; Mitsutake et al., J Chem Phys 2003;118:6676-6688) can be applied. REMC is one of the best sampling methods currently available, but its applicability is restricted to spaces of low dimensionality.
We show that the combination of the EVA subspace and REMC can essentially solve the optimization problem for backbone atoms in the reduced sampling subspace, even for rather rugged free energy landscapes. Applications and limitations of this methodology are finally discussed.
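The exchange step that makes REMC effective can be shown on a toy problem. The sketch below runs replica exchange Monte Carlo on a hypothetical one-dimensional tilted double-well energy that stands in for coordinates in a reduced subspace such as EVA; the energy function, inverse temperatures and step size are all assumptions. Neighbouring replicas i and j swap configurations with probability min(1, exp[(β_i − β_j)(E_i − E_j)]).

```python
import math
import random


def energy(x: float) -> float:
    """Hypothetical rugged landscape: tilted double well, global minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def remc(betas: list[float], n_sweeps: int = 2000, step: float = 0.5, seed: int = 0):
    """Replica exchange Monte Carlo; betas are inverse temperatures (cold to hot)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in betas]
    best_x = min(xs, key=energy)
    swaps = 0
    for _ in range(n_sweeps):
        # 1) Independent Metropolis move in every replica at its own temperature.
        for i, beta in enumerate(betas):
            trial = xs[i] + rng.uniform(-step, step)
            d_e = energy(trial) - energy(xs[i])
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                xs[i] = trial
                if energy(trial) < energy(best_x):
                    best_x = trial
        # 2) Propose swapping the configurations of one random neighbouring pair.
        i = rng.randrange(len(betas) - 1)
        delta = (betas[i] - betas[i + 1]) * (energy(xs[i]) - energy(xs[i + 1]))
        if delta >= 0 or rng.random() < math.exp(delta):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
            swaps += 1
    return best_x, swaps


best, n_swaps = remc([10.0, 3.0, 1.0, 0.3])
print(round(best, 2), n_swaps)
```

Hot replicas cross the barrier easily and pass low-energy configurations down to the cold replicas through swaps; the number of replicas needed grows with dimensionality, which is why a reduced subspace helps.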

6.
Pfam family DUF1023 consists entirely of uncharacterized proteins generated by sequencing the genomes of Actinobacteria (Bateman A., et al., Nucleic Acids Res. 2004;32(Database issue):D138-141). Using sequence similarity detection methods, we infer homology between DUF1023 and alpha/beta hydrolases. DUF1023 proteins conserve the core secondary structures of the alpha/beta hydrolase fold and share catalytic machinery similar to that of alpha/beta hydrolases. We predict the spatial structure of DUF1023 and deduce that these proteins function as hydrolases, using a catalytic Ser-His-Asp triad with the serine as the nucleophile.

7.
Scott KA, Daggett V. Biochemistry. 2007;46(6):1545-1556.
The problem of how a protein folds from a linear chain of amino acids to the three-dimensional structure necessary for function is often investigated using proteins with a low degree of sequence identity that adopt different folds. The design of pairs of proteins with a high degree of sequence identity but different folds offers the opportunity for a complementary study: in two highly similar sequences, which residues are the most important in directing folding to a particular structure? Here we use molecular dynamics simulations to characterize the folding-unfolding pathways of a pair of proteins designed by Bryan and co-workers [Alexander, P. A., et al. (2005) Biochemistry 44, 14045-14054; He, Y. N., et al. (2005) Biochemistry 44, 14055-14061]. Despite being 59% identical, the two protein sequences fold to two different structures. The first sequence folds to the alpha+beta protein G structure and the second to the all-alpha-helical protein A structure. We show that the final protein structure is determined early along the folding pathway. In folding to the protein G structure, the single alpha-helix (alpha1) and the beta3-beta4 turn fold early. Formation of the hairpin turn essentially prevents folding to helical structure in this region of the protein. This early structure is then consolidated by formation of long-range hydrophobic interactions between alpha1 and the beta3-beta4 turn. The protein A sequence differs both in the residues that form the beta3-beta4 turn and in many of the residues that form the early hydrophobic interactions in the protein G structure. Instead, in the protein A sequence, a more hierarchical mechanism is observed, with helices folding before many of the tertiary interactions are formed. We find that small but critical sequence differences determine the topology of the protein early along the folding pathway, which helps to explain the process by which one fold can evolve into another.

8.
Modeling of protein loops by simulated annealing.
A method is presented for modeling protein loops for use in homology modeling of proteins. This method employs the ESAP program of Higo et al. (Higo, J., Collura, V., & Garnier, J., 1992, Biopolymers 32, 33-43) and is based on a fast Monte Carlo simulation and a simulated annealing algorithm. The method is tested on different loops or peptide segments from immunoglobulin, bovine pancreatic trypsin inhibitor, and bovine trypsin. The predicted structure is obtained from the ensemble average of the coordinates of the Monte Carlo simulation at 300 K, which exhibits the lowest internal energy. The starting conformation of the loop prior to modeling is chosen to be completely extended, and a closing harmonic potential is applied to the N, CA, C, and O atoms of the terminal residues. A rigid-geometry potential of Robson and Platt (1986, J. Mol. Biol. 188, 259-281) with a united-atom representation is used. We demonstrate that this yields loop structures with good hydrogen bonding and torsion angles in the allowed regions of the Ramachandran map. The average accuracy of the modeling, evaluated on the eight modeled loops, is 1 Å root mean square deviation (rmsd) for the backbone atoms and 2.3 Å rmsd for all heavy atoms.
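Two ingredients of this loop-modeling protocol are simple enough to sketch: the harmonic restraint that closes the loop onto the framework at its terminal N, CA, C and O atoms, and the ensemble averaging of coordinates that yields the predicted structure. The force constant, the dictionary layout and the function names are hypothetical; the ESAP energy function and the Monte Carlo moves themselves are not reproduced.

```python
Coord = tuple[float, float, float]
CLOSURE_ATOMS = ("N", "CA", "C", "O")


def closure_energy(loop_end: dict[str, Coord], anchor: dict[str, Coord],
                   k: float = 10.0) -> float:
    """Harmonic restraint k * d^2 summed over the terminal backbone atoms,
    where d is the distance from each loop atom to its framework target."""
    total = 0.0
    for name in CLOSURE_ATOMS:
        total += k * sum((p - q) ** 2 for p, q in zip(loop_end[name], anchor[name]))
    return total


def ensemble_average(frames: list[list[Coord]]) -> list[Coord]:
    """Average each atom's coordinates over Monte Carlo snapshots of equal size."""
    n = len(frames)
    return [tuple(sum(frame[i][d] for frame in frames) / n for d in range(3))
            for i in range(len(frames[0]))]


anchor = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0),
          "C": (2.0, 1.4, 0.0), "O": (1.4, 2.4, 0.0)}
stretched = dict(anchor, O=(1.4, 2.4, 1.0))  # O atom displaced by 1 Å
print(closure_energy(stretched, anchor))
print(ensemble_average([[(0.0, 0.0, 0.0)], [(2.0, 2.0, 2.0)]]))
```

In an annealing run, a term of this kind would be added to the conformational energy so that an initially extended loop is progressively pulled into a closed conformation.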

9.
Reva B, Finkelstein A, Topiol S. Proteins. 2002;47(2):180-193.
We present a new method for more accurate modeling of protein structure, called threading with chemostructural restrictions. This method addresses cases in which a target sequence has only remote homologues of known structure, for which sequence comparison methods cannot provide accurate alignments. Although remote homologues cannot provide an accurate model for the whole chain, they can be used to construct practically useful models for the most conserved, and often the most interesting, part of the structure. For many proteins of interest, one can suggest certain chemostructural patterns for the native structure based on the available information on the structural superfamily of the protein, the type of activity, the sequence location of the functionally significant residues, and other factors. We use such patterns to restrict (1) the number of possible templates and (2) the number of allowed chain conformations on a template. The latter restrictions are imposed in the form of additional template potentials (including terms acting as sequence anchors) that act on certain residues. This approach is tested on remote homologues of alpha/beta-hydrolases that have significant structural similarity in the positions of their catalytic triads. The study shows that, in spite of significant deviations between the model and the native structures, the surroundings of the catalytic triad (positions of the C(alpha) atoms of 20-30 nearby residues) can be reproduced with an accuracy of 2-3 Å. We then apply the approach to predict the structure of dipeptidylpeptidase IV (DPP-IV). Using experimentally available data identifying the catalytic triad residues of DPP-IV (David et al., J Biol Chem 1993;268:17247-17252), we predict a model structure of the catalytic domain of DPP-IV based on the 3D fold of prolyl oligopeptidase (Fulop et al., Cell 1998;94:161-170) and use this structure to model the interaction of DPP-IV with an inhibitor.

10.
Sippl MJ. Proteins. 1993;17(4):355-362.
A major problem in the determination of the three-dimensional structure of proteins concerns the quality of the structural models obtained from the interpretation of experimental data. New developments in X-ray crystallography and nuclear magnetic resonance spectroscopy have accelerated the process of structure determination, and the biological community is confronted with a steadily increasing number of experimentally determined protein folds. However, in the recent past several experimentally determined protein structures have been shown to contain major errors, indicating that in some cases the interpretation of experimental data is difficult and may yield incorrect models. Such problems can be avoided when computational methods that complement experimental structure determination are employed. A prerequisite for such computational tools is that they be independent of the parameters obtained from a particular experiment. In addition, such techniques can support and accelerate experimental structure determination. Here we present techniques based on knowledge-based mean fields that can be used to judge the quality of protein folds. The methods can be used to identify misfolded structures as well as faulty parts of structural models. The techniques are applicable even in cases where only the Cα trace of a protein conformation is available. The capabilities of the technique are demonstrated using correct and incorrect protein folds. © 1993 Wiley-Liss, Inc.
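Knowledge-based mean-field potentials of this kind are commonly derived by inverse Boltzmann statistics, E(r) = −kT ln[f_obs(r)/f_ref(r)], where f_obs is the frequency of a contact at distance r in known structures and f_ref is a reference frequency. The sketch below applies that formula to binned distance counts; the counts, the pseudocount and kT (about 0.593 kcal/mol at 298 K) are assumptions, and the published potentials distinguish residue pairs and sequence separations, which this sketch omits.

```python
import math


def mean_force_potential(observed: list[int], reference: list[int],
                         kT: float = 0.593, pseudocount: float = 1.0) -> list[float]:
    """Inverse Boltzmann energy per distance bin: -kT * ln(f_obs / f_ref)."""
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    return [-kT * math.log(((o + pseudocount) / obs_total) / ((r + pseudocount) / ref_total))
            for o, r in zip(observed, reference)]


def score_structure(bin_indices: list[int], energies: list[float]) -> float:
    """Sum the potential over all contacts of a model (lower = more native-like)."""
    return sum(energies[b] for b in bin_indices)


# Hypothetical Cβ-Cβ distance histograms (4 bins) from a structure database
energies = mean_force_potential([40, 30, 20, 10], [25, 25, 25, 25])
native_like = score_structure([0, 0, 1], energies)
misfolded = score_structure([3, 3, 2], energies)
print(native_like < misfolded)
```

Because the score depends only on inter-residue distances, a potential of this form can be evaluated on a Cα trace alone, consistent with the applicability noted above.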

11.
Gao C, Stern HA. Proteins. 2007;68(1):67-75.
We perform a systematic examination of the ability of several different high-resolution, atomic-detail scoring functions to discriminate native conformations of loops in membrane proteins from non-native but physically reasonable, or "decoy," conformations. Decoys constructed by changing a loop conformation while keeping the remainder of the protein fixed are a challenging test of energy function accuracy. Nevertheless, the best of the energy functions we examined recognized the native structure as lowest in energy around half the time, and consistently ranked it as a low-energy structure. This suggests that the best present energy functions, even without a representation of the lipid bilayer, are accurate enough to give reasonable confidence in predictions of membrane protein structure. We also constructed homology models for each structure, using other known structures in the same protein family as templates. Homology models were constructed using several scoring functions and modeling programs, but with comparable sampling effort for each procedure. Our results indicate that the quality of the sequence alignment is probably the most important factor in model accuracy for sequence identities of 20-40%; one can expect a reasonably accurate model for membrane proteins when sequence identity is greater than 30%, in agreement with previous studies. Most errors are localized in loop regions, which tend to be found outside the lipid bilayer. For the most discriminative energy functions, errors appear most likely to be due to insufficient sampling, although it should be stressed that present energy functions are still far from perfectly reliable.
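Because the conclusions above are stated in terms of sequence identity (20-40%, with a 30% threshold for reasonably accurate membrane protein models), it helps to be explicit about how identity is computed from an alignment. Definitions vary; the sketch below assumes identity is counted over aligned columns in which neither sequence has a gap, which may differ from the convention used in the study. The sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over gap-free aligned columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


target = "MKV-LLAGFT"
template = "MRVELIAG-T"
pid = percent_identity(target, template)
print(pid, "above 30% threshold" if pid > 30 else "below 30% threshold")
```

The 30% figure is an empirical observation for membrane proteins, not a guarantee; here it serves only as a label.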

12.
Protein disulfide isomerase (PDI), a luminal enzyme of the endoplasmic reticulum (ER), is thought to be involved in the process that ensures that the correct disulfide bonds form as a newly synthesized protein folds into its appropriate three-dimensional structure (Freeman, 1984). In recent years, the ER has been shown to contain at least two additional, distinct PDI-related luminal proteins (Bennett et al., 1988; Mazzarella et al., 1990). As a potential first step toward investigating the structure and function of PDI and of the PDI-related proteins, we have developed a bacterial expression system in Escherichia coli capable of synthesizing significant levels of enzymatically active PDI under the control of the inducible tac promoter. We have observed that the use of this bacterial expression system is complicated by two facts: there is a significant amount of internal initiation of protein synthesis within the PDI coding sequence, and all of the PDI-related expression products are distributed equally between the cytoplasmic and periplasmic fractions owing to a signal peptide-independent mechanism. Our studies with this system have demonstrated that at least some truncated PDI molecules containing the most carboxy-terminal active site have significant PDI activity.

13.
Most proteins found in the outer membrane of gram-negative bacteria share a common domain: the transmembrane β-barrel. These outer membrane β-barrels (OMBBs) occur in multiple sizes and different families with a wide range of functions, evolved independently by amplification from a pool of homologous ancestral ββ-hairpins. This is part of the reason why predicting their three-dimensional (3D) structure, especially by homology modeling, is a major challenge. Recently, DeepMind's AlphaFold v2 (AF2) became the first structure prediction method to reach close-to-experimental atomic accuracy in CASP, even for difficult targets. However, membrane proteins, especially OMBBs, were not abundant in its training data, raising the question of how accurate its predictions are for these families. In this study, we assessed the performance of AF2 in the prediction of OMBBs and OMBB-like folds of various topologies using barrOs, an in-house-developed tool for the analysis of OMBB 3D structures. In agreement with previous studies on other membrane protein classes, our results indicate that AF2 predicts transmembrane β-barrel structures with high accuracy independently of the use of templates, even for novel topologies absent from the training set. These results provide confidence in the models generated by AF2 and open the door to the structural elucidation of novel transmembrane β-barrel topologies identified in high-throughput OMBB annotation studies or designed de novo.

14.
Large-scale sequencing projects are widening the gap between the known protein universe and the fraction for which structural information has been experimentally obtained. Through the application of homology (comparative) modeling and more general structure prediction techniques, this gap can, however, be narrowed, providing indirect structural information for a considerable number of proteins. Moreover, the estimated number of existing protein folds seems to be limited and many of these yet unknown folds should be discovered by dedicated large-scale structural genomics projects. Within this perspective, homology (comparative) modeling will gain in importance, as will the use of models derived by this technique. Here we discuss how well a sequence alignment, the most common starting point for generating a model, reflects the structural conservation between homologous proteins and we show that sequence information is able to direct construction of acceptable models as far as the structural core is concerned. We also show here that the regions surrounding insertions and deletions are much less conserved than the core and discuss the implications of this observation for loop modeling.

15.
Cuticular proteins are one of the determinants of the physical properties of cuticle. A common consensus region (the extended R&R Consensus) in these proteins binds to chitin, the other major component of cuticle. We previously predicted the preponderance of beta-pleated sheet in the consensus region and proposed that it is responsible for the formation of helicoidal cuticle (Iconomidou et al., Insect Biochem. Mol. Biol. 29 (1999) 285). Subsequently, we verified experimentally the abundance of antiparallel beta-pleated sheet in the structure of cuticle proteins (Iconomidou et al., Insect Biochem. Mol. Biol. 31 (2001) 877). Homology modelling of soft (RR-1) cuticular proteins using bovine plasma retinol binding protein (RBP) as a template revealed an antiparallel beta-sheet half-barrel structure as the basic folding motif (Hamodrakas et al., Insect Biochem. Mol. Biol. 32 (2002) 1577). The RR-2 proteins, characteristic of hard cuticle, have a far more conserved consensus and frequently more histidine residues. Extending the modelling to this class of consensus, as done in this work, reveals in detail several unique features of the proposed structural model that allow it to serve as a chitin-binding structural motif, thus providing a basis for elucidating cuticle's overall architecture and chitin-protein interactions in cuticle.

16.
Holm L, Sander C. Proteins. 1992;14(2):213-223.
An unknown protein structure can be predicted with fair accuracy once an evolutionary connection at the sequence level has been made to a protein of known 3-D structure. In model building by homology, one typically starts with a backbone framework, rebuilds new loop regions, and replaces nonconserved side chains. Here, we use an extremely efficient Monte Carlo algorithm in rotamer space with simulated annealing and simple potential energy functions to optimize the packing of side chains on given backbone models. Optimized models are generated within minutes on a workstation with reasonable accuracy (on average, 81% of side-chain χ1 dihedral angles correct in the cores of proteins determined at better than 2.5 Å resolution). As expected, the quality of the models decreases with decreasing accuracy of the backbone coordinates. If the backbone was taken from a homologous rather than the same protein, about 70% of side-chain χ1 angles were modeled correctly in the core in a case of strong homology, and about 60% in a case of medium homology. The algorithm can be used in automated, fast, and reproducible model building by homology.
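The side-chain packing step can be sketched as simulated annealing over discrete rotamer choices. The sketch assumes precomputed energy tables: a self term for each residue's rotamers against the fixed backbone, and pairwise terms between rotamers of interacting residues. The tables, the geometric cooling schedule and the temperatures are hypothetical stand-ins for the simple potential energy functions used in the paper.

```python
import math
import random

SelfTable = list[list[float]]                         # self_e[i][r]
PairTable = dict[tuple[int, int], list[list[float]]]  # pair_e[(i, j)][r_i][r_j], i < j


def total_energy(state: list[int], self_e: SelfTable, pair_e: PairTable) -> float:
    e = sum(self_e[i][r] for i, r in enumerate(state))
    return e + sum(t[state[i]][state[j]] for (i, j), t in pair_e.items())


def anneal_rotamers(self_e: SelfTable, pair_e: PairTable, t_start: float = 5.0,
                    t_end: float = 0.05, n_steps: int = 20000, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    n = len(self_e)
    state = [rng.randrange(len(rots)) for rots in self_e]

    def local_energy(i: int, r: int) -> float:
        # Energy terms involving residue i when it adopts rotamer r.
        e = self_e[i][r]
        for (a, b), table in pair_e.items():
            if a == i:
                e += table[r][state[b]]
            elif b == i:
                e += table[state[a]][r]
        return e

    cooling = (t_end / t_start) ** (1.0 / max(1, n_steps - 1))
    t = t_start
    for _ in range(n_steps):
        i = rng.randrange(n)
        new = rng.randrange(len(self_e[i]))
        d_e = local_energy(i, new) - local_energy(i, state[i])
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):  # Metropolis criterion
            state[i] = new
        t *= cooling  # geometric cooling schedule
    return state


# Two residues with two rotamers each; rotamers (1, 1) pack best.
self_e = [[0.0, 0.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [1.0, -4.0]]}
best = anneal_rotamers(self_e, pair_e)
print(best, total_energy(best, self_e, pair_e))
```

Only the energy terms touching the changed residue are recomputed at each step, which is what keeps a rotamer-space search of this kind cheap.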

17.
Protein structure prediction remains an unsolved problem. Since prediction of the native structure seems very difficult, one usually tries to predict the correct fold of a protein. Here the "fold" is defined by the approximate backbone structure of the protein. However, physicochemical factors that determine the correct fold are not well understood. It has recently been reported that molecular mechanics energy functions combined with effective solvent terms can discriminate the native structures from misfolded ones. Using such a physicochemical energy function, we studied the factors necessary for discrimination of correct and incorrect folds. We first selected correct and incorrect folds by a conventional threading method. Then, all-atom models of those folds were constructed by simply minimizing the atomic overlaps. The constructed correct model representing the native fold has almost the same backbone structure as the native structure but differs in side-chain packing. Finally, the energy values of the constructed models were compared with that of the experimentally determined native structure. The correct model as well as the native structure showed lower energy than misfolded models. However, a large energy gap was found between the native structure and the correct model. By decomposing the energy values into their components, it was found that solvent effects such as the hydrophobic interaction or solvent shielding and the Born energy stabilized the correct model rather than the native structure. The large energetic stabilization of the native structure was attained by specific side-chain packing. The stabilization by solvent effects is small compared to that by side-chain packing. Therefore, it is suggested that in order to confidently predict the correct fold of a protein, it is also necessary to predict correct side-chain packing.

18.
Protein fold recognition using sequence-derived predictions.
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.
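The recommended compatibility function can be sketched as a weighted sum of an existing sequence-structure score and the agreement between the probe's predicted secondary structure and the target's known secondary structure along the alignment. The additive form, the weight, the fold names and the scores below are assumptions for illustration; the paper's exact combination is not reproduced.

```python
def ss_agreement(predicted: str, known: str) -> float:
    """Fraction of aligned positions (no gap in either string) with the same
    secondary-structure state (H, E, or C)."""
    pairs = [(p, k) for p, k in zip(predicted, known) if p != "-" and k != "-"]
    if not pairs:
        return 0.0
    return sum(p == k for p, k in pairs) / len(pairs)


def assign_fold(targets: dict[str, tuple[float, str, str]], weight: float = 2.0) -> str:
    """Pick the library fold with the best combined score.

    targets maps fold name -> (sequence-structure score, predicted SS of the
    probe along the alignment, known SS of the target along the alignment)."""
    def combined(name: str) -> float:
        raw, predicted, known = targets[name]
        return raw + weight * ss_agreement(predicted, known)
    return max(targets, key=combined)


library = {
    "foldA": (10.0, "HHHHCCCC", "EEEECCCC"),  # slightly better raw score, SS disagrees
    "foldB": (9.5, "HHHHCCCC", "HHHHCCCC"),   # SS agrees fully
}
print(assign_fold(library), assign_fold(library, weight=0.0))
```

Setting the weight to zero recovers ranking by the original compatibility score alone, which makes the contribution of the predicted secondary structure easy to isolate.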

19.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure, by sequence comparison methods is limited when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. "Conservatism-of-Conservatism" is an effective profile analysis method for identifying structural features shared by proteins that have the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad hoc matrix. We tested the performance of our method in identifying relationships between proteins with similar folds using a nonredundant subset of sequences sharing less than 40% identity. We then compared our results, using coverage-versus-errors-per-query curves, with those obtained by methods such as PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of the evolutionary information enclosed in protein profiles.
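The profile step can be sketched as a per-column Shannon entropy computed after mapping residues to a reduced amino acid classification, followed by discretizing the entropies into letters (a "pseudocode") that an ordinary sequence comparison program could align. The four-class alphabet, the entropy thresholds and the letters are hypothetical; HIP uses many classifications derived from hundreds of attributes and a dedicated substitution matrix, neither of which is reproduced.

```python
import math
from collections import Counter

# Hypothetical four-class reduced alphabet: hydrophobic, polar, charged, special.
CLASSES = {
    **{aa: "h" for aa in "AVLIMFWC"},
    **{aa: "p" for aa in "STNQYH"},
    **{aa: "c" for aa in "DEKR"},
    **{aa: "s" for aa in "GP"},
}


def column_entropy(column, classes=CLASSES) -> float:
    """Shannon entropy (bits) of one alignment column after class mapping;
    gaps and unknown symbols are ignored."""
    symbols = [classes[aa] for aa in column if aa in classes]
    if not symbols:
        return 0.0
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def entropy_profile(msa: list[str]) -> list[float]:
    """Entropy of every column; all aligned sequences must have equal length."""
    return [column_entropy([seq[k] for seq in msa]) for k in range(len(msa[0]))]


def to_pseudocode(profile: list[float], thresholds=(0.5, 1.0, 1.5), letters="ABCD") -> str:
    """Discretize entropies into letters: A (most conserved) to D (most variable)."""
    return "".join(letters[sum(h >= t for t in thresholds)] for h in profile)


msa = ["AVD", "LVE", "GVK"]  # toy alignment; columns 2 and 3 are class-conserved
profile = entropy_profile(msa)
print(profile, to_pseudocode(profile))
```

Note that column 3 (D, E, K) is variable at the residue level but fully conserved at the class level, which is the kind of signal a reduced classification is meant to expose.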

20.
Fischer D. Proteins. 2003;51(3):434-441.
To gain a better understanding of the biological role of proteins encoded in genome sequences, knowledge of their three-dimensional (3D) structure and function is required. The computational assignment of folds is becoming an increasingly important complement to experimental structure determination. In particular, fold-recognition methods aim to predict approximate 3D models for proteins bearing no sequence similarity to any protein of known structure. However, fully automated structure-prediction methods can currently produce reliable models for only a fraction of these sequences. Using a number of semiautomated procedures, human expert predictors are often able to produce more and better predictions than automated methods. We describe a novel, fully automatic fold-recognition meta-predictor, named 3D-SHOTGUN, which incorporates some of the strategies human predictors have successfully applied. This new method is reminiscent of the so-called cooperative algorithms of computer vision. The inputs to 3D-SHOTGUN are the top models predicted by a number of independent fold-recognition servers. The meta-predictor consists of three steps: (i) assembly of hybrid models, (ii) confidence assignment, and (iii) selection. We have applied 3D-SHOTGUN to an unbiased test set of 77 newly released protein structures sharing no sequence similarity to previously released proteins. Forty-six correct rank-1 predictions were obtained, 30 of which had scores higher than that of the first incorrect prediction, a significant improvement over the performance of all individual servers. Furthermore, the predicted hybrid models were, on average, more similar to their corresponding native structures than those produced by the individual servers. This opens the possibility of generating more accurate, full-atom homology models for proteins with no sequence similarity to proteins of known structure.
These improvements represent a step forward toward the wider applicability of fully automated structure-prediction methods at genome scales.
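The confidence-assignment and selection steps rest on the idea that structural features recurring among independent servers' models are more likely to be correct. A generic structural-consensus selection illustrates that idea; it is not the published 3D-SHOTGUN scoring, and it assumes that all models share residue numbering and have already been superimposed. The 3.5 Å cutoff, the server names and the toy coordinates are assumptions.

```python
from math import dist

Coord = tuple[float, float, float]


def similarity(a: list[Coord], b: list[Coord], cutoff: float = 3.5) -> float:
    """Fraction of corresponding Cα atoms lying within `cutoff` Å of each other."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(dist(p, q) <= cutoff for p, q in zip(a, b)) / n


def select_by_consensus(models: dict[str, list[Coord]]) -> str:
    """Return the model most supported by the other servers' models."""
    def support(name: str) -> float:
        return sum(similarity(models[name], models[other])
                   for other in models if other != name)
    return max(models, key=support)


models = {
    "server1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    "server2": [(0.5, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.5)],
    "server3": [(0.0, 9.0, 0.0), (3.8, 9.0, 0.0), (7.6, 9.0, 0.0)],
}
print(select_by_consensus(models))
```

3D-SHOTGUN goes further by first assembling hybrid models from the recurring fragments (step i), which this sketch does not attempt.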
