首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Two new sets of scoring matrices are introduced: H2 for the protein sequence comparison and T2 for the protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k-residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to the k-values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology between the pair, that were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences were aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested; the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.  相似文献   

2.
Qiu J  Elber R 《Proteins》2005,61(1):44-55
Atomically detailed potentials for recognition of protein folds are presented. The potentials consist of pair interactions between atoms. One or three distance steps are used to describe the range of interactions between a pair. Training is carried out with the mathematical programming approach on the decoy sets of Baker, Levitt, and some of our own design. Recognition is required not only for decoy-native structural pairs but also for pairs of decoy and homologous structures. Performance is tested on the targets of CASP5 using templates from the Protein Data Bank, on two test ab initio decoy sets from Skolnick's laboratory, and on decoy sets from Moult's laboratory. We conclude that the newly derived potentials have significant recognition capacity, comparable to the best models derived from other techniques. The new potentials require a significantly smaller number of parameters. The enhanced recognition capacity extends primarily to the identification of structures generated by ab initio simulation and less to the recognition of approximate shapes created by homology.  相似文献   

3.
We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.”1 The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.  相似文献   

4.
Crippen GM 《Proteins》2005,60(1):82-89
Cluster distance geometry is a recent generalization of distance geometry whereby protein structures can be described at even lower levels of detail than one point per residue. With improvements in the clustering technique, protein conformations can be summarized in terms of alternative contact patterns between clusters, where each cluster contains four sequentially adjacent amino acid residues. A very simple potential function involving 210 adjustable parameters can be determined that favors the native contacts of 31 small, monomeric proteins over their respective sets of nonnative contacts. This potential then favors the native contacts for 174 small, monomeric proteins that have low sequence identity with any of the training set. A broader search finds 698 small protein chains from the Protein Data Bank where the native contacts are preferred over all alternatives, even though they have low sequence identity with the training set. This amounts to a highly predictive method for ab initio protein folding at low spatial resolution.  相似文献   

5.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence‐search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence‐search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales‐up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web‐server that is freely available at http://www.bo‐protscience.fr/forsa .  相似文献   

6.
Vincent J. Hilser 《Proteins》2016,84(4):435-447
Knowing the determinants of conformational specificity is essential for understanding protein structure, stability, and fold evolution. To address this issue, a novel statistical measure of energetic compatibility between sequence and structure was developed using an experimentally validated model of the energetics of the native state ensemble. This approach successfully matched sequences from a diverse subset of the human proteome to their respective folds. Unexpectedly, significant energetic compatibility between ostensibly unrelated sequences and structures was also observed. Interrogation of these matches revealed a general framework for understanding the origins of conformational specificity within a proteome: specificity is a complex function of both the ability of a sequence to adopt folds other than the native, and ability of a fold to accommodate sequences other than the native. The regional variation in energetic compatibility indicates that the compatibility is dominated by incompatibility of sequence for alternative fold segments, suggesting that evolution of protein sequences has involved substantial negative selection, with certain segments serving as “gatekeepers” that presumably prevent alternative structures. Beyond these global trends, a size dependence exists in the degree to which the energetic compatibility is determined from negative selection, with smaller proteins displaying more negative selection. This partially explains how short sequences can adopt unique folds, despite the higher probability in shorter proteins for small numbers of mutations to increase compatibility with other folds. In providing evolutionary ground rules for the thermodynamic relationship between sequence and fold, this framework imparts valuable insight for rational design of unique folds or fold switches. Proteins 2016; 84:435–447. © 2016 Wiley Periodicals, Inc.  相似文献   

7.
Bastolla U  Porto M  Ortíz AR 《Proteins》2008,71(1):278-299
We adopt a model of inverse folding in which folding stability results from the combination of the hydrophobic effect with local interactions responsible for secondary structure preferences. Site-specific amino acid distributions can be calculated analytically for this model. We determine optimal parameters for the local interactions by fitting the complete inverse folding model to the site-specific amino acid distributions found in the Protein Data Bank. This procedure reduces drastically the influence on the derived parameters of the preference of different secondary structures for buriedness, which affects local interaction parameters determined through the standard approach based on amino acid propensities. The quality of the fit is evaluated through the likelihood of the observed amino acid distributions given the model and the Bayesian Information Criterion, which indicate that the model with optimal local interaction parameters is strongly preferable to the model where local interaction parameters are determined through propensities. The optimal model yields a mean correlation coefficient r = 0.96 between observed and predicted amino acid distributions. The local interaction parameters are then tested in threading experiments, in combination with contact interactions, for their capacity to recognize the native structure and structures similar to the native against unrelated ones. In a challenging test, proteins structurally aligned with the Mammoth algorithm are scored with the effective free energy function. The native structure gets the highest stability score in 100% of the cases, a high recognition rate comparable to that achieved against easier decoys generated by gapless threading. We then examine proteins for which at least one highly similar template exists. In 61% of the cases, the structure with the highest stability score excluding the native belongs to the native fold, compared to 60% if we use local interaction parameters derived from the usual amino acid propensities and 52% if we use only contact interactions. A highly similar structure is present within the five best stability scores in 82%, 81%, and 76% of the cases, for local interactions determined through inverse folding, through propensity, and set to zero, respectively. These results indicate that local interactions improve substantially the performances of contact free energy functions in fold recognition, and that similar structures tend to get high stability scores, although they are often not high enough to discriminate them from unrelated structures. This work highlights the importance to apply more challenging tests, as the recognition of homologous structures, for testing stability scores for protein folding.  相似文献   

8.
The manganese-stabilizing protein (PsbO) is an essential component of photosystem II (PSII) and is present in all oxyphotosynthetic organisms. PsbO allows correct water splitting and oxygen evolution by stabilizing the reactions driven by the manganese cluster. Despite its important role, its structure and detailed functional mechanism are still unknown. In this article we propose a structural model based on fold recognition and molecular modeling. This model has additional support from a study of the distribution of characteristics of the PsbO sequence family, such as the distribution of conserved, apolar, tree-determinants, and correlated positions. Our threading results consistently showed PsbO as an all-beta (beta) protein, with two homologous beta domains of approximately 120 amino acids linked by a flexible Proline-Glycine-Glycine (PGG) motif. These features are compatible with a general elongated and flexible architecture, in which the two domains form a sandwich-type structure with Greek key topology. The first domain is predicted to include 8 to 9 beta-strands, the second domain 6 to 7 beta-strands. An Ig-like beta-sandwich structure was selected as a template to build the 3-D model. The second domain has, between the strands, long-loops rich in Pro and Gly that are difficult to model. One of these long loops includes a highly conserved region (between P148 and P174) and a short alpha-helix (between E181 and N188)). These regions are characteristic parts of PsbO and show that the second domain is not so similar to the template. Overall, the model was able to account for much of the experimental data reported by several authors, and it would allow the detection of key residues and regions that are proposed in this article as essential for the structure and function of PsbO.  相似文献   

9.
Paul Mach  Patrice Koehl 《Proteins》2013,81(9):1556-1570
It is well known that protein fold recognition can be greatly improved if models for the underlying evolution history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that only have a small number of representatives in current sequence databases, we follow an alternate approach in which the benefits of including evolutionary information can be recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self‐consistent mean field approach, and score the fitness of the corresponding models using a semi‐empirical physical potential. Sequences designed for one template are translated into a hidden Markov model‐based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized for an E‐value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences were recognized; The structural classification of protein families corresponding to these sequences, however, are correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.  相似文献   

10.
11.
Chen H  Kihara D 《Proteins》2011,79(1):315-334
Computational protein structure prediction remains a challenging task in protein bioinformatics. In the recent years, the importance of template-based structure prediction is increasing because of the growing number of protein structures solved by the structural genomics projects. To capitalize the significant efforts and investments paid on the structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We showed that suboptimal alignments are often more accurate than the optimal one, and such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.  相似文献   

12.
Zhou H  Zhou Y 《Proteins》2004,55(4):1005-1013
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential so that highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting the family, superfamily, fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users on http://theory.med.buffalo.edu.  相似文献   

13.
To explore whether the generation of new protein folds could be linked to metallic cofactor recruitment, we identified the oldest examples of folds for manganese, iron, zinc, and copper proteins by analyzing their fold‐domain mapping patterns. We discovered that the generation of these folds was tightly coupled to corresponding metals. We found that the emerging order for these folds, i.e., manganese and iron protein folds appeared earlier than zinc and copper counterparts, coincides with the putative bioavailability of the corresponding metals in the ancient anoxic ocean. Therefore, we conclude that metallic cofactors, like organic cofactors, play an evolutionary role in the formation of new protein folds. This link could be explained by the emergence of protein structures with novel folds that could fulfill the new protein functions introduced by the metallic cofactors. These findings not only have important implications for understanding the evolutionary mechanisms of protein architectures, but also provide a further interpretation for the evolutionary story of superoxide dismutases.  相似文献   

14.
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.  相似文献   

15.
Loose C  Klepeis JL  Floudas CA 《Proteins》2004,54(2):303-314
A new force field for pairwise residue interactions as a function of C(alpha) to C(alpha) distances is presented. The force field was developed through the solution of a linear programming formulation with large sets of constraints. The constraints are based on the construction of >80,000 low-energy decoys for a set of proteins and requiring the decoy energies for each protein system to be higher than the native conformation of that particular protein. The generation of a robust force field was facilitated by the use of a novel decoy generation process, which involved the rational selection of proteins to add to the training set and included a significant energy minimization of the decoys. The force field was tested on a large set of decoys for various proteins not included in the training set and shown to perform well compared with a leading force field in identifying the native conformation for these proteins.  相似文献   

16.
On the study of protein inverse folding problem, one goal is to find simple and efficient potential to evaluate the compatibility between structure and a given sequence. We present here a novo empirical mean force potential to address the importance of electrostatic interactions in protein inverse folding study. It is based on protein main chain polar fraction and constructed in a way similar with Sippl's from a database of 64 known independent three-dimensional protein structures. This potential was applied to recognize the protein native conformations among a conformation pool. Calculated results show that this potential is powerful in picking out native conformations, in addition it can also find structure similarity between proteins with low sequence similarity. The success of this new potential clearly shows the importance of electrostatic factors in protein inverse folding studies. © 1995 Wiley-Liss, Inc.  相似文献   

17.
Shirota M  Ishida T  Kinoshita K 《Proteins》2011,79(5):1550-1563
In protein structure prediction, it is crucial to evaluate the degree of native-likeness of given model structures. Statistical potentials extracted from protein structure data sets are widely used for such quality assessment problems, but they are only applicable for comparing different models of the same protein. Although various other methods, such as machine learning approaches, were developed to predict the absolute similarity of model structures to the native ones, they required a set of decoy structures in addition to the model structures. In this paper, we tried to reformulate the statistical potentials as absolute quality scores, without using the information from decoy structures. For this purpose, we regarded the native state and the reference state, which are necessary components of statistical potentials, as the good and bad standard states, respectively, and first showed that the statistical potentials can be regarded as the state functions, which relate a model structure to the native and reference states. Then, we proposed a standardized measure of protein structure, called native-likeness, by interpolating the score of a model structure between the native and reference state scores defined for each protein. The native-likeness correlated with the similarity to the native structures and discriminated the native structures from the models, with better accuracy than the raw score. Our results show that statistical potentials can quantify the native-like properties of protein structures, if they fully utilize the statistical information obtained from the data set.  相似文献   

18.
Reddy BV  Li WW  Bourne PE 《Biopolymers》2002,64(3):139-145
By using three-dimensional (3D) structure alignments and a previously published method to determine Conserved Key Amino Acid Positions (CKAAPs) we propose a theoretical method to design mutations that can be used to morph the protein folds. The original Paracelsus challenge, met by several groups, called for the engineering of a stable but different structure by modifying less than 50% of the amino acid residues. We have used the sequences from the Protein Data Bank (PDB) identifiers 1ROP, and 2CRO, which were previously used in the Paracelsus challenge by those groups, and suggest mutation to CKAAPs to morph the protein fold. The total number of mutations suggested is less than 40% of the starting sequence theoretically improving the challenge results. From secondary structure prediction experiments of the proposed mutant sequence structures, we observe that each of the suggested mutant protein sequences likely folds to a different, non-native potentially stable target structure. These results are an early indicator that analyses using structure alignments leading to CKAAPs of a given structure are of value in protein engineering experiments.  相似文献   

19.
UreF is a protein that plays a role in the in vivo urease activation as a chaperone involved in the insertion of two Ni(2+) ions in the apo-urease active site. The molecular details of this process are unknown. In the absence of any molecular information on the UreF protein class, and as a step toward the comprehension of the relationships between UreF function and structure, we applied a structural modeling approach to infer useful biochemical knowledge on Bacillus pasteurii UreF (BpUreF). Similarity searches and multiple alignment of UreF protein sequences indicated that this class of proteins has a low homology with proteins of known structure. Fold recognition methods were therefore used to identify useful protein structural templates to model the structure of BpUreF. In particular, the templates belong to the class of GTPase-activating proteins. Modeling of BpUreF based on these templates was performed using the program MODELLER. The structure validation yielded good statistics, indicating that the model is plausible. This result suggests a role for UreF in urease active site biosynthesis as a regulator of the activity of UreG, a small G protein involved in the in vivo apo-urease activation process and established to catalyze GTP hydrolysis.  相似文献   

20.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号