Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Liang S  Grishin NV 《Proteins》2004,54(2):271-281
We have developed an effective scoring function for protein design. The atomic solvation parameters, together with the weights of energy terms, were optimized so that residues corresponding to the native sequence were predicted with low energy in the training set of 28 protein structures. The solvation energy of non-hydrogen-bonded hydrophilic atoms was considered separately and expressed in a nonlinear way. As a result, our scoring function predicted native residues as the most favorable in 59% of the total positions in 28 proteins. We then tested the scoring function by comparing the predicted stability changes for 103 T4 lysozyme mutants with the experimental values. The correlation coefficients were 0.77 for surface mutations and 0.71 for all mutations. Finally, the scoring function combined with Monte Carlo simulation was used to predict favorable sequences on a fixed backbone. The designed sequences were similar to the natural sequences of the family to which the template structure belonged. The profile of the designed sequences was helpful for identification of remote homologues of the native sequence.  相似文献   

2.
Sun JM  Li TH  Cong PS  Tang SN  Xiong WW 《Molecular & cellular proteomics : MCP》2012,11(7):M111.016808-M111.016808-8
Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. Then, the predicted backbone string is employed to align against a backbone string database and retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to native structures of query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, and whose high-resolution structural determinations were difficult to elucidate both by experiment and prediction. The obtained TM-scores and root mean square deviations of the models confirmed that the models based on the backbone string neighbors retrieved by the BS-align were very close to the native membrane structures although the query and the neighbor shared a very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.  相似文献   

3.
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.  相似文献   

4.
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, allowing exciting insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences is not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, as well as providing insight into protein evolution.  相似文献   
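
The sequence entropies mentioned in this abstract can be illustrated with a minimal sketch. It assumes the designed sequences for one structure are available as equal-length, gap-free strings, and it uses plain per-position Shannon entropy in bits. The function names `column_entropies` and `mean_entropy` are hypothetical, and the authors' actual entropy definition may differ.

```python
import math
from collections import Counter


def column_entropies(sequences):
    """Shannon entropy (in bits) of every column of equal-length sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts.values())
        # H = sum_a p_a * log2(1 / p_a), with p_a = count_a / total
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


def mean_entropy(sequences):
    """Average column entropy: one diversity number per designed structure."""
    ents = column_entropies(sequences)
    return sum(ents) / len(ents)


if __name__ == "__main__":
    designed = ["MKVLA", "MRVLG", "MKILA", "MRVIG"]  # made-up example data
    print(column_entropies(designed))
    print(mean_entropy(designed))
```

Comparing `mean_entropy` values across structures of the same fold and of different folds would be one simple way to look for the clustering the abstract describes.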

5.
Shih CH  Chang CM  Lin YS  Lo WC  Hwang JK 《Proteins》2012,80(6):1647-1657
The knowledge of conserved sequences in proteins is valuable in identifying functionally or structurally important residues. Generating the conservation profile of a sequence requires aligning families of homologous sequences and having knowledge of their evolutionary relationships. Here, we report that the conservation profile at the residue level can be quantitatively derived from a single protein structure with only backbone information. We found that the reciprocal packing density profiles of protein structures closely resemble their sequence conservation profiles. For a set of 554 nonhomologous enzymes, 74% (408/554) of the proteins have a correlation coefficient > 0.5 between these two profiles. Our results indicate that the three-dimensional structure, instead of being a mere scaffold for positioning amino acid residues, exerts such strong evolutionary constraints on the residues of the protein that its profile of sequence conservation essentially reflects that of its structural characteristics.  相似文献   
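
As a rough illustration of comparing a structure-derived profile with a conservation profile, the sketch below counts Cα neighbours within a cutoff, takes a reciprocal, and computes a Pearson correlation. Everything here is an assumption for illustration: the neighbour-count definition of packing density, the 10 Å cutoff, the `+ 1` in the reciprocal, and the function names are not taken from the paper, and the conservation profile is expected to be supplied by the caller.

```python
import math


def neighbor_counts(ca_coords, cutoff=10.0):
    """Number of other C-alpha atoms within `cutoff` of each residue."""
    n = len(ca_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def reciprocal_profile(counts):
    """Reciprocal packing profile; the +1 only avoids division by zero."""
    return [1.0 / (c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(vx * vy)
```

A caller would pass Cα coordinates parsed from a structure file and a per-residue conservation or variability score from a separate alignment-based tool, then check whether the coefficient exceeds a threshold such as the 0.5 quoted above.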

6.
Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or a profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org . Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.

7.
A new computer program (CORE) is described that predicts core hydrophobic sequences of predetermined target protein structures. A novel scoring function is employed, which for the first time incorporates parameters directly correlated to free energies of unfolding (ΔGu), melting temperatures (Tm), and cooperativity. Metropolis-driven simulated annealing and low-temperature Monte Carlo sampling are used to optimize this score, generating sequences predicted to yield uniquely folded, stable proteins with cooperative unfolding transitions. The hydrophobic core residues of four natural proteins were predicted using CORE with the backbone structure and solvent-exposed residues as input. In the two smaller proteins tested (Gβ1, 11 core amino acids; 434 cro, 10 core amino acids), the native sequence was regenerated, as were the sequences of known thermally stable variants that exhibit cooperative denaturation transitions. Previously designed sequences of variants with lower thermal stability and weaker cooperativity were not predicted. In the two larger proteins tested (myoglobin, 32 core amino acids; methionine aminopeptidase, 63 core amino acids), sequences with corresponding side-chain conformations remarkably similar to those of the native protein were predicted.
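
The Metropolis-driven simulated annealing described above can be sketched generically. This is not the CORE program: the scoring function is supplied by the caller, and the hydrophobic alphabet, geometric cooling schedule, temperatures, and the name `anneal_sequence` are all assumptions chosen for illustration.

```python
import math
import random

HYDROPHOBIC = "AFILMVW"  # assumed core alphabet, not taken from the paper


def anneal_sequence(length, score, alphabet=HYDROPHOBIC, steps=2000,
                    t_start=2.0, t_end=0.05, seed=0):
    """Maximise `score(sequence)` by single-site mutations with Metropolis
    acceptance and a geometrically decreasing temperature."""
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in range(length)]
    current_score = score("".join(current))
    best, best_score = "".join(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        pos = rng.randrange(length)
        old = current[pos]
        current[pos] = rng.choice(alphabet)
        new_score = score("".join(current))
        delta = new_score - current_score
        # Metropolis rule: always accept improvements, otherwise accept
        # with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score = new_score
            if current_score > best_score:
                best, best_score = "".join(current), current_score
        else:
            current[pos] = old  # reject the move and restore the residue
    return best, best_score


if __name__ == "__main__":
    # Toy score for demonstration only: reward leucine at every position.
    print(anneal_sequence(10, lambda s: s.count("L")))
```

In a real design setting the toy score would be replaced by an energy or stability estimate evaluated on the fixed backbone; the final low-temperature phase plays the role of the low-temperature Monte Carlo sampling mentioned in the abstract.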

8.
9.
One of the classical DNA-binding proteins, bacteriophage lambda Cro, forms a homodimer with a unique fold of alpha-helices and beta-sheets. We have computationally designed an artificial sequence of 60 amino acid residues to stabilize the backbone tertiary structure of the lambda Cro dimer by simulated annealing using knowledge-based structure-sequence compatibility functions. The designed amino acid sequence has 25% identity with that of natural lambda Cro and preserves Phe58, which is important for formation of the stably folded structure of lambda Cro. The designed dimer protein and its monomeric variant, which was redesigned by the insertion of a beta-hairpin sequence at the C-terminal region to prevent dimerization, were synthesized and biochemically characterized to be well folded. The designed protein was monomeric under a wide range of protein concentrations and its solution structure was determined by NMR spectroscopy. The solved structure is similar to that of a monomeric variant of natural lambda Cro, with a root-mean-square deviation of the polypeptide backbones of 2.1 Å, and has a well-packed protein core. Thus, our knowledge-based functions provide approximate but essential relationships between amino acid sequences and protein structures, and are useful for finding novel sequences that are foldable into a given target structure.

10.
Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences which have only 15–25% identity, the pair-wise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility to that known 3D structure. We have improved the above method by using the predicted secondary structure information and employ it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence to the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in predicting correctly the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web based PSS-3D1D method is freely available in the PredictFold server at .  相似文献   

11.
12.
Multistate computational protein design (MSD) with backbone ensembles approximating conformational flexibility can predict higher quality sequences than single‐state design with a single fixed backbone. However, it is currently unclear what characteristics of backbone ensembles are required for the accurate prediction of protein sequence stability. In this study, we aimed to improve the accuracy of protein stability predictions made with MSD by using a variety of backbone ensembles to recapitulate the experimentally measured stability of 85 Streptococcal protein G domain β1 sequences. Ensembles tested here include an NMR ensemble as well as those generated by molecular dynamics (MD) simulations, by Backrub motions, and by PertMin, a new method that we developed involving the perturbation of atomic coordinates followed by energy minimization. MSD with the PertMin ensembles resulted in the most accurate predictions by providing the highest number of stable sequences in the top 25, and by correctly binning sequences as stable or unstable with the highest success rate (≈90%) and the lowest number of false positives. The performance of PertMin ensembles is due to the fact that their members closely resemble the input crystal structure and have low potential energy. Conversely, the NMR ensemble as well as those generated by MD simulations at 500 or 1000 K reduced prediction accuracy due to their low structural similarity to the crystal structure. The ensembles tested herein thus represent on‐ or off‐target models of the native protein fold and could be used in future studies to design for desired properties other than stability. Proteins 2014; 82:771–784. © 2013 Wiley Periodicals, Inc.  相似文献   

13.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.  相似文献   

14.
For applications such as comparative modelling one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments and the aligned positions were scored using information from the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect those regions of alignments that are reliably aligned and to help predict the quality of an alignment. For those residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also be used to help predict conserved binding sites from sequence alignments. Residues in the template that were identified as binding sites, that aligned to an identical amino acid residue and where the sequence alignment agreed with the structural alignment were in highly conserved, high scoring regions over 80% of the time.
This suggests that many binding sites that are present in both target and template sequences are in sequence-conserved regions and that there is the possibility of translating reliability to binding site prediction.  相似文献   

15.
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions of designed sequences. These two terms, together with optimized reference states for amino-acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, yields a 12% increase (from 34% to 46%) in the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8% (from 22% to 14%) the number of designed sequences that are not homologous to any known protein sequences according to psi-blast. More importantly, the sequences designed by RosettaDesign-SR have 2–3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions. Proteins 2010. © 2010 Wiley-Liss, Inc.

16.
Current methods for identification of domains within protein sequences require either structural information or the identification of homologous domain sequences in different sequence contexts. Knowledge of structural domain boundaries is important for fold recognition experiments and structural determination by X-ray crystallography or nuclear magnetic resonance spectroscopy using the divide-and-conquer approach. Here, a new and conceptually simple method for the identification of structural domain boundaries in multiple protein sequence alignments is presented. Analysis of covariance at positions within the alignment is first used to predict 3D contacts. By the nature of the domain as an independent folding unit, inter-domain predicted contacts are fewer than intra-domain predicted contacts. By analysing all possible domain boundaries and constructing a smoothed profile of predicted contact density (PCD), true structural domain boundaries are predicted as local profile minima associated with low PCD. A training data set is constructed from 52 non-homologous two-domain protein sequences of known 3D structure and used to determine optimal parameters for the profile analysis. The alignments in the training data set contained 48 +/- 17 (mean +/- SD) sequences and lengths of 257 +/- 121 residues. Of the 47 alignments yielding predictions, 35% of true domain boundaries are predicted to within 15 amino acids by the local profile minimum with the lowest profile value. Including predictions from the second- and third-lowest local minima increases the correct domain boundary coverage to 60%, whereas the lowest five local minima cover 79% of correct domain boundaries. Through further profile analysis, criteria are presented which reliably identify subsets of more accurate predictions. Retrospective analysis of CASP3 targets shows predictions of sufficient accuracy to enable dramatically improved fold recognition results. 
Finally, a prediction is made for geminivirus AL1 protein which is in full agreement with biochemical data, yielding a plausible, novel threading result.  相似文献   
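
The profile analysis in this abstract can be sketched under stated assumptions: predicted contacts are given as zero-based residue index pairs, the predicted contact density (PCD) for a candidate boundary is taken here as the number of contacts crossing it divided by the number of possible inter-domain pairs, and smoothing is a simple moving average. The normalisation, window size, and function names are hypothetical and may not match the published method.

```python
def contact_density_profile(n_residues, contacts):
    """PCD for each candidate boundary b (domains [0, b) and [b, n)).

    profile[k] corresponds to boundary b = k + 1.
    """
    profile = []
    for b in range(1, n_residues):
        crossing = sum(1 for i, j in contacts if min(i, j) < b <= max(i, j))
        profile.append(crossing / (b * (n_residues - b)))
    return profile


def smooth(values, window=5):
    """Centred moving average; the window shrinks at the profile ends."""
    half = window // 2
    out = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_minima(values):
    """Indices strictly below both neighbours, lowest profile value first."""
    idx = [k for k in range(1, len(values) - 1)
           if values[k] < values[k - 1] and values[k] < values[k + 1]]
    return sorted(idx, key=lambda k: values[k])


if __name__ == "__main__":
    # Toy example: two three-residue "domains" with no contacts between them.
    contacts = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    profile = contact_density_profile(6, contacts)
    print([k + 1 for k in local_minima(smooth(profile, window=3))])
```

Returning the minima ranked by profile value mirrors the abstract's use of the lowest, second-lowest, and third-lowest local minima as successive boundary candidates.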

17.
Hidetoshi Kono  Junta Doi 《Proteins》1994,19(3):244-255
Globular proteins have high packing densities as a result of residue side chains in the core achieving a tight, complementary packing. The internal packing is considered the main determinant of native protein structure. From that point of view, we present here a method of energy minimization using an automata network to predict a set of amino acid sequences and their side-chain conformations from a desired backbone geometry for de novo design of proteins. Using discrete side-chain conformations, that is, rotamers, the problem of generating sequences from a given backbone geometry becomes a combinatorial problem. We focused on the residues composing the interior core region and predicted a set of amino acid sequences and their side-chain conformations only from a given backbone geometry. The kinds of residues were restricted to six hydrophobic amino acids (Ala, Ile, Met, Leu, Phe, and Val) because the core regions are almost always composed of hydrophobic residues. The obtained sequences were as well packed as the native sequence. The method can be used for automated sequence generation in the de novo design of proteins. © 1994 Wiley-Liss, Inc.

18.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing, for each amino acid position, the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.  相似文献   

19.
Computational protein design methods can complement experimental screening and selection techniques by predicting libraries of low-energy sequences compatible with a desired structure and function. Incorporating backbone flexibility in computational design allows conformational adjustments that should broaden the range of predicted low-energy sequences. Here, we evaluate computational predictions of sequence libraries from different protocols for modeling backbone flexibility using the complex between the therapeutic antibody Herceptin and its target human epidermal growth factor receptor 2 (HER2) as a model system. Within the program RosettaDesign, three methods are compared: The first two use ensembles of structures generated by Monte Carlo protocols for near-native conformational sampling: kinematic closure (KIC) and backrub, and the third method uses snapshots from molecular dynamics (MD) simulations. KIC or backrub methods were better able to identify the amino acid residues experimentally observed by phage display in the Herceptin-HER2 interface than MD snapshots, which generated much larger conformational and sequence diversity. KIC and backrub, as well as fixed backbone simulations, captured the key mutation Asp98Trp in Herceptin, which leads to a further threefold affinity improvement of the already subnanomolar parental Herceptin-HER2 interface. Modeling subtle backbone conformational changes may assist in the design of sequence libraries for improving the affinity of antibody-antigen interfaces and could be suitable for other protein complexes for which structural information is available.  相似文献   

20.
Fischer D 《Proteins》2003,51(3):434-441
To gain a better understanding of the biological role of proteins encoded in genome sequences, knowledge of their three-dimensional (3D) structure and function is required. The computational assignment of folds is becoming an increasingly important complement to experimental structure determination. In particular, fold-recognition methods aim to predict approximate 3D models for proteins bearing no sequence similarity to any protein of known structure. However, fully automated structure-prediction methods can currently produce reliable models for only a fraction of these sequences. Using a number of semiautomated procedures, human expert predictors are often able to produce more and better predictions than automated methods. We describe a novel, fully automatic, fold-recognition meta-predictor, named 3D-SHOTGUN, which incorporates some of the strategies human predictors have successfully applied. This new method is reminiscent of the so-called cooperative algorithms of Computer Vision. The inputs to 3D-SHOTGUN are the top models predicted by a number of independent fold-recognition servers. The meta-predictor consists of three steps: (i) assembly of hybrid models, (ii) confidence assignment, and (iii) selection. We have applied 3D-SHOTGUN to an unbiased test set of 77 newly released protein structures sharing no sequence similarity to proteins previously released. Forty-six correct rank-1 predictions were obtained, 30 of which had scores higher than that of the first incorrect prediction, a significant improvement over the performance of all individual servers. Furthermore, the predicted hybrid models were, on average, more similar to their corresponding native structures than those produced by the individual servers. This opens the possibility of generating more accurate, full-atom homology models for proteins with no sequence similarity to proteins of known structure.
These improvements represent a step forward toward the wider applicability of fully automated structure-prediction methods at genome scales.  相似文献   
