Similar literature
 20 similar documents found
1.
    
Taylor WR, Jonassen I. Proteins, 2004, 56(2): 222-234
A method (SPREK) was developed to evaluate the register of a sequence on a structure based on the matching of structural patterns against a library derived from the protein structure databank. The scores obtained were normalized against random background distributions derived from sequence shuffling and permutation methods. 'Random' structures were also used to evaluate the effectiveness of the method. These were generated by a simple random-walk and a more sophisticated structure prediction method that produced protein-like folds. For comparison with other methods, the performance of the method was assessed using collections of models including decoys and models from the CASP-5 exercise. The performance of SPREK on the decoy models was equivalent to (and sometimes better than) that obtained with more complex approaches. The exceptions were the two smallest proteins, for which SPREK did not perform well due to a lack of patterns. Using the best parameter combination from trials on decoy models, the CASP models of intermediate difficulty were evaluated by SPREK and the quality of the top scoring model was evaluated by its CASP ranking. Of the 14 targets in this class, half lie in the top 10% (out of around 140 models for each target). The two worst rankings resulted from the selection by our method of a well-packed model that was based on the wrong fold. Of the other poor rankings, one was the smallest protein and the others were the four largest (all over 250 residues).
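The normalization step described in this abstract (scoring a sequence and comparing it with a background of shuffled sequences) can be illustrated with a minimal sketch. This is not the SPREK implementation: `toy_score` is a hypothetical stand-in for the pattern-matching score, and the number of shuffles and the use of a Z-score are assumptions made for illustration.

```python
import random
import statistics


def shuffled_background_zscore(sequence, score_fn, n_shuffles=200, seed=0):
    """Z-score of score_fn(sequence) against scores of shuffled copies.

    score_fn is any function mapping a sequence string to a number
    (assumed here; the real method scores sequence-structure patterns).
    """
    rng = random.Random(seed)
    observed = score_fn(sequence)
    background = []
    for _ in range(n_shuffles):
        residues = list(sequence)
        rng.shuffle(residues)  # same composition, random order
        background.append(score_fn("".join(residues)))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0  # background is flat, so the score carries no signal
    return (observed - mean) / sd


def toy_score(seq):
    """Hypothetical score: number of adjacent hydrophobic residue pairs."""
    hydrophobic = set("AVILMFWC")
    return sum(1 for a, b in zip(seq, seq[1:])
               if a in hydrophobic and b in hydrophobic)


z = shuffled_background_zscore("AAAAVVVVLLLLDDDDEEEEKKKK", toy_score)
print(f"Z-score against shuffled background: {z:.2f}")
```

Shuffling preserves amino acid composition, so a high Z-score reflects the order of residues rather than their overall frequencies.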

2.
The ability to separate correct models of protein structures from less correct models is of the greatest importance for protein structure prediction methods. Several studies have examined the ability of different types of energy function to detect the native, or native-like, protein structure from a large set of decoys. In contrast to earlier studies, we examine here the ability to detect models that only show limited structural similarity to the native structure. These correct models are defined by the existence of a fragment that shows significant similarity between this model and the native structure. It has been shown that the existence of such fragments is useful for comparing the performance between different fold recognition methods and that this performance correlates well with performance in fold recognition. We have developed ProQ, a neural-network-based method to predict the quality of a protein model that extracts structural features, such as frequency of atom-atom contacts, and predicts the quality of a model, as measured either by LGscore or MaxSub. We show that ProQ performs at least as well as other measures when identifying the native structure and is better at the detection of correct models. This performance is maintained over several different test sets. ProQ can also be combined with the Pcons fold recognition predictor (Pmodeller) to increase its performance, with the main advantage being the elimination of a few high-scoring incorrect models. Pmodeller was successful in CASP5, and results from the latest LiveBench, LiveBench-6, indicate that Pmodeller has a higher specificity than Pcons alone.

3.
We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.”1 The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model, the mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.

4.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Each team correctly predicts a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading cannot presently be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.

5.
    
Zhou H, Zhou Y. Proteins, 2005, 58(2): 321-328
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP(2)). SP(3) was tested on the SALIGN benchmark for alignment accuracy and on the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases, and the comparison is subject to the limitation of the time-dependent sequence and/or structural libraries used by different methods). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP(3) fold-recognition server is available at http://theory.med.buffalo.edu.
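Profile-to-profile alignment by dynamic programming, as mentioned in this abstract, can be sketched in a few lines. The sketch below is a generic global alignment and not the SP(3) scoring scheme: the dot-product column score, the linear gap penalty, and the function name `align_profiles` are all assumptions made for illustration.

```python
def align_profiles(p, q, gap=-1.0):
    """Best global alignment score between two profiles.

    p and q are lists of columns; each column is a list of numbers
    (for example, amino acid frequencies). Columns are compared by
    dot product, and each gap costs a fixed penalty (assumed scheme).
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # leading gaps in q
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # leading gaps in p
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            column_score = sum(a * b for a, b in zip(p[i - 1], q[j - 1]))
            dp[i][j] = max(dp[i - 1][j - 1] + column_score,  # align columns
                           dp[i - 1][j] + gap,               # gap in q
                           dp[i][j - 1] + gap)               # gap in p
    return dp[n][m]


# Toy two-letter alphabet: each column is [freq_of_X, freq_of_Y].
profile_a = [[1, 0], [0, 1], [1, 0]]
profile_b = [[1, 0], [1, 0]]
print(align_profiles(profile_a, profile_a))  # self-alignment
print(align_profiles(profile_a, profile_b))  # one gap is needed
```

In a combined scheme, the column score could be a weighted sum of several profile comparisons (evolution-derived, structure-derived, secondary structure); the weights would have to be fitted and are not given in the abstract.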

6.
    
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates improved recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we have augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues to domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% in protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method in predicting functional roles through select examples for DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated through new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, designed sequences-augmented search databases direct the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.

7.
    
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user‐specified global root‐mean‐squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed‐forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state‐of‐the‐art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.

8.
    
In the fold recognition approach to structure prediction, a sequence is tested for compatibility with an already known fold. For membrane proteins, however, few folds have been determined experimentally. Here the feasibility of computing the vast majority of likely membrane protein folds is tested. The results indicate that conformation space can be effectively sampled for small numbers of helices. The vast majority of potential monomeric membrane protein structures can be represented by about 30 folds for three helices, but this number increases exponentially to about 1,500,000 folds for seven helices. The generated folds could serve as templates for fold recognition or as starting points for conformational searches that are well distributed throughout conformation space.
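As a rough worked example of the growth quoted in this abstract, the two endpoints (about 30 folds for three helices, about 1,500,000 for seven) imply an average multiplication factor per added helix. The sketch assumes a constant factor between the endpoints, which the abstract does not state; the intermediate values are therefore interpolations, not reported results.

```python
def per_helix_growth(n_low, folds_low, n_high, folds_high):
    """Average multiplicative growth in fold count per added helix."""
    return (folds_high / folds_low) ** (1.0 / (n_high - n_low))


def interpolate_folds(n, n_low, folds_low, factor):
    """Fold count at n helices under a constant growth factor (assumed)."""
    return folds_low * factor ** (n - n_low)


factor = per_helix_growth(3, 30, 7, 1_500_000)
print(f"growth per added helix: about {factor:.1f}x")
for helices in range(3, 8):
    estimate = interpolate_folds(helices, 3, 30, factor)
    print(f"{helices} helices: about {estimate:,.0f} folds")
```

Under this assumption each additional helix multiplies the number of representative folds by roughly 15.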

9.
Stephen H. Bryant. Proteins, 1996, 26(2): 172-185
Threading experiments with proteins from the globin family provide an indication of the nature of the structural similarity required for successful fold recognition and accurate sequence-structure alignment. Threading scores are found to rise above the noise of false positives whenever roughly 60% of residues from a sequence can be aligned with analogous sites in the structure of a remote homolog. Fold recognition specificity thus appears to be limited by the extent of structural similarity, regardless of the degree of sequence similarity. Threading alignment accuracy is found to depend more critically on the degree of structural similarity. Alignments are accurate, placing the majority of residues exactly as in structural alignment, only when superposition residuals are less than 2.5 Å. These criteria for successful recognition and sequence-structure alignment appear to be consistent with the successes and failures of threading methods in blind structure prediction. They also suggest a direct assay for improved threading methods: Potentials and alignment models should be tested for their ability to detect less extensive structural similarities, and to produce accurate alignments when superposition residuals for this conserved “core” fall in the range characteristic of remote homologs. © 1996 Wiley-Liss, Inc.
  • 1 This article is a US Government work and, as such, is in the public domain in the United States of America.

    10.
        
    The manganese-stabilizing protein (PsbO) is an essential component of photosystem II (PSII) and is present in all oxyphotosynthetic organisms. PsbO allows correct water splitting and oxygen evolution by stabilizing the reactions driven by the manganese cluster. Despite its important role, its structure and detailed functional mechanism are still unknown. In this article we propose a structural model based on fold recognition and molecular modeling. This model has additional support from a study of the distribution of characteristics of the PsbO sequence family, such as the distribution of conserved, apolar, tree-determinants, and correlated positions. Our threading results consistently showed PsbO as an all-beta protein, with two homologous beta domains of approximately 120 amino acids linked by a flexible Proline-Glycine-Glycine (PGG) motif. These features are compatible with a general elongated and flexible architecture, in which the two domains form a sandwich-type structure with Greek key topology. The first domain is predicted to include 8 to 9 beta-strands, the second domain 6 to 7 beta-strands. An Ig-like beta-sandwich structure was selected as a template to build the 3-D model. The second domain has, between the strands, long loops rich in Pro and Gly that are difficult to model. One of these long loops includes a highly conserved region (between P148 and P174) and a short alpha-helix (between E181 and N188). These regions are characteristic parts of PsbO and show that the second domain is not so similar to the template. Overall, the model was able to account for much of the experimental data reported by several authors, and it would allow the detection of key residues and regions that are proposed in this article as essential for the structure and function of PsbO.

    11.
    12.
        
    The threading approach to protein structure prediction suffers from the limited number of substantially different folds available as templates. A method is presented for the generation of artificial protein structures, amenable to threading, by modification of native ones. The artificial structures so generated are compared to the native ones and it is shown that, within the accuracy of the pseudoenergy function or force field used, these two types of structures appear equally useful for threading. Since a multitude of pseudonative artificial structures can be generated per native structure, the pool of pseudonative template structures for threading can be enormously enlarged by the inclusion of the pseudonative artificial structures. Proteins 28:522–529, 1997. © 1997 Wiley-Liss, Inc.

    13.
        
    Paul Mach, Patrice Koehl. Proteins, 2013, 81(9): 1556-1570
    It is well known that protein fold recognition can be greatly improved if models for the underlying evolution history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that only have a small number of representatives in current sequence databases, we follow an alternate approach in which the benefits of including evolutionary information can be recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self‐consistent mean field approach, and score the fitness of the corresponding models using a semi‐empirical physical potential. Sequences designed for one template are translated into a hidden Markov model‐based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized for an E‐value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences were recognized; the structural classifications of the protein families corresponding to these sequences, however, were correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.

    14.
    15.
        
    Hassan SA, Mehler EL. Proteins, 2002, 47(1): 45-61
    An analysis of the screened Coulomb potential-implicit solvent model (SCP-ISM) is presented showing that general equations for both the electrostatic and solvation free energy can be derived in a continuum approach, using statistical averaging of the polarization field created by the solvent around the molecule. The derivation clearly shows how the concept of boundary, usually found in macroscopic approaches, is eliminated when the continuum model is obtained from a microscopic treatment using appropriate averaging techniques. The model is used to study the alanine dipeptide in aqueous solution, as well as the discrimination of native protein structures from misfolded conformations. For the alanine dipeptide the free energy surface in the phi-psi space is calculated and compared with recently reported results of a detailed molecular dynamics simulation using an explicit representation of the solvent, and with other available data. The study showed that the results obtained using the SCP-ISM are comparable to those of the explicit water calculation and compare favorably with the FDPB approach. Both transition states and energy minima show a high correlation (r > 0.98) with the results obtained in the explicit water analysis. The study of the misfolded structures of proteins comprised the analysis of three standard decoy sets, namely, the EMBL, Park and Levitt, and Baker's CASP3 sets. In all cases the SCP-ISM discriminated well the native structures of the proteins, and the best-predicted structures were always near-native (cRMSD approximately 2 Å).

    16.
    The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly only consider the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function to integrate the two factors, which can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.

    17.
    The dispositions of 39 alpha helices of greater than 2.5 turns and four beta sheets in the major capsid protein (VP5, 149 kDa) of herpes simplex virus type 1 were identified by computational and visualization analysis from the 8.5 Å electron cryomicroscopy structure of the whole capsid. The assignment of helices in the VP5 upper domain was validated by comparison with the recently determined crystal structure of this region. Analysis of the spatial arrangement of helices in the middle domain of VP5 revealed that the organization of a tightly associated bundle of ten helices closely resembled that of a domain fold found in the annexin family of proteins. Structure-based sequence searches suggested that sequences in both the N and C-terminal portions of the VP5 sequence contribute to this domain. The long helices seen in the floor domain of VP5 form an interconnected network within and across capsomeres. The combined structural and sequence-based informatics has led to an architectural model of VP5. This model placed in the context of the capsid provides insights into the strategies used to achieve viral capsid stability.

    18.
        
    Using a combination of theoretical sequence structure recognition predictions and experimental disulfide bond assignments, a three-dimensional (3D) model of human interleukin-7 (hIL-7) was constructed that predicts atypical surface chemistry in helix D that is important for receptor activation. A 3D model of hIL-7 was built using the X-ray crystal structure of interleukin-4 (IL-4) as a template (Walter MR et al., 1992, J Mol Biol. 224:1075-1085; Walter MR et al., 1992, J Biol Chem 267:20371-20376). Core secondary structures were constructed from sequences of hIL-7 predicted to form helices. The model was constructed by superimposing IL-7 helices onto the IL-4 template and connecting them together in an up-up down-down topology. The model was finished by incorporating the disulfide bond assignments (Cys3, Cys142), (Cys35, Cys130), and (Cys48, Cys93), which were determined by MALDI mass spectroscopy and site-directed mutagenesis (Cosenza L, Sweeney E, Murphy JR, 1997, J Biol Chem 272:32995-33000). Quality analysis of the hIL-7 model identified poor structural features in the carboxyl terminus that, when further studied using hydrophobic moment analysis, detected an atypical structural property in helix D, which contains Cys130 and Cys142. This analysis demonstrated that helix D had a hydrophobic surface exposed to bulk solvent that accounted for the poor quality of the model, but was suggestive of a region in IL-7 that may be important for protein interactions. Alanine (Ala) substitution scanning mutagenesis was performed to test if the predicted atypical surface chemistry of helix D in the hIL-7 model is important for receptor activation. This analysis resulted in the construction, purification, and characterization of four hIL-7 variants, hIL-7(K121A), hIL-7(L136A), hIL-7(K140A), and hIL-7(W143A), that displayed reduced or abrogated ability to stimulate murine IL-7-dependent pre-B cell proliferation. The mutant hIL-7(W143A), which is biologically inactive and displaces [125I]-hIL-7, is the first reported IL-7R system antagonist.
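The hydrophobic moment analysis mentioned in this abstract can be illustrated with a minimal sketch of the standard helical hydrophobic moment calculation. The abstract does not say which hydrophobicity scale or normalization the authors used; the Kyte-Doolittle scale, the 100-degree angle per residue for an alpha helix, and the division by segment length are assumptions made here, and the test peptides are invented.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment of a segment placed on an ideal helix.

    Each residue contributes a vector of length H_i pointing at angle
    i * angle_deg around the helix axis; the moment is the length of
    the vector sum divided by the number of residues.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta)
                  for i, aa in enumerate(segment))
    cos_sum = sum(scale[aa] * math.cos(i * delta)
                  for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


# Invented peptides: one amphipathic pattern, one uniform hydrophobic run.
print(round(hydrophobic_moment("LKKLLKKLLKKLLKKLLK"), 2))
print(round(hydrophobic_moment("L" * 18), 2))
```

A large moment indicates that hydrophobic residues cluster on one face of the helix; a helix whose hydrophobic face points into solvent in a model, as reported for helix D, would be flagged by comparing the moment direction with the modeled burial.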

    19.
        
    Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length, which is larger than any structure previously predicted using sequence data alone.

    20.
        
    Bastolla U, Porto M, Ortíz AR. Proteins, 2008, 71(1): 278-299
    We adopt a model of inverse folding in which folding stability results from the combination of the hydrophobic effect with local interactions responsible for secondary structure preferences. Site-specific amino acid distributions can be calculated analytically for this model. We determine optimal parameters for the local interactions by fitting the complete inverse folding model to the site-specific amino acid distributions found in the Protein Data Bank. This procedure reduces drastically the influence on the derived parameters of the preference of different secondary structures for buriedness, which affects local interaction parameters determined through the standard approach based on amino acid propensities. The quality of the fit is evaluated through the likelihood of the observed amino acid distributions given the model and the Bayesian Information Criterion, which indicate that the model with optimal local interaction parameters is strongly preferable to the model where local interaction parameters are determined through propensities. The optimal model yields a mean correlation coefficient r = 0.96 between observed and predicted amino acid distributions. The local interaction parameters are then tested in threading experiments, in combination with contact interactions, for their capacity to recognize the native structure and structures similar to the native against unrelated ones. In a challenging test, proteins structurally aligned with the Mammoth algorithm are scored with the effective free energy function. The native structure gets the highest stability score in 100% of the cases, a high recognition rate comparable to that achieved against easier decoys generated by gapless threading. We then examine proteins for which at least one highly similar template exists. 
In 61% of the cases, the structure with the highest stability score excluding the native belongs to the native fold, compared to 60% if we use local interaction parameters derived from the usual amino acid propensities and 52% if we use only contact interactions. A highly similar structure is present within the five best stability scores in 82%, 81%, and 76% of the cases, for local interactions determined through inverse folding, through propensity, and set to zero, respectively. These results indicate that local interactions substantially improve the performance of contact free energy functions in fold recognition, and that similar structures tend to get high stability scores, although they are often not high enough to discriminate them from unrelated structures. This work highlights the importance of applying more challenging tests, such as the recognition of homologous structures, when testing stability scores for protein folding.
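The top-1 and top-5 recognition percentages quoted in this abstract are simple ranking statistics. A minimal sketch of how such a rate could be computed is given below; the input layout (one list of booleans per query, ordered from best to worst stability score) and the toy data are assumptions for illustration and do not reproduce the authors' data set.

```python
def top_k_recognition_rate(ranked_hits, k):
    """Fraction of queries with a correct template among the k best scores.

    ranked_hits: one list per query, ordered from best to worst score;
    each entry is True if that template shares the query's native fold.
    """
    if not ranked_hits:
        raise ValueError("no queries supplied")
    successes = sum(1 for ranks in ranked_hits if any(ranks[:k]))
    return successes / len(ranked_hits)


# Toy rankings for three hypothetical queries.
toy_rankings = [
    [True, False, False],   # correct fold ranked first
    [False, True, False],   # correct fold ranked second
    [False, False, False],  # correct fold not recovered
]
print(top_k_recognition_rate(toy_rankings, 1))
print(top_k_recognition_rate(toy_rankings, 5))
```

Comparing the rate at k = 1 with the rate at k = 5 shows how often a similar structure scores well without scoring best, which is the gap the abstract points to.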
