首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.  相似文献   

2.
Ab initio protein structure prediction using pathway models   总被引:1,自引:0,他引:1  
Ab initio prediction is the challenging attempt to predict protein structures based only on sequence information and without using templates. It is often divided into two distinct sub-problems: (a) the scoring function that can distinguish native, or native-like structures, from non-native ones; and (b) the method of searching the conformational space. Currently, there is no reliable scoring function that can always drive a search to the native fold, and there is no general search method that can guarantee a significant sampling of near-natives. Pathway models combine the scoring function and the search. In this short review, we explore some of the ways pathway models are used in folding, in published works since 2001, and present a new pathway model, HMMSTR-CM, that uses a fragment library and a set of nucleation/propagation-based rules. The new method was used for ab initio predictions as part of CASP5. This work was presented at the Winter School in Bioinformatics, Bologna, Italy, 10-14 February 2003.  相似文献   

3.
The structural genomics initiatives have begun with the aim to create a so-called "basic set library" of protein folds that will be used to improve protein prediction methods. Such a library is thought to require the determination of up to 10,000 new structures, including representative structures of several sequence variants from each protein fold. To meet this goal in a reasonable time frame and cost, automated systems must be utilized to clone and to identify the soluble recombinant proteins contained in multiple genomes. This paper presents such a system, developed using the genome of Caenorhabditis elegans (19,099 genes) as a model eukaryotic organism for structural genomics. This system successfully automates nearly all aspects of recombinant protein expression analysis including subcloning, bacterial growth, recombinant protein expression, protein purification, and scoring protein solubility.  相似文献   

4.
We report a novel computational procedure for determining protein native topology, or fold, by defining loop connectivity based on skeletons of secondary structures that can usually be obtained from low to intermediate-resolution density maps. The procedure primarily involves a knowledge-based geometry filter followed by an energetics-based evaluation. It was tested on a large set of skeletons covering a wide range of protein architecture, including one modeled from an experimentally determined 7.6A cryo-electron microscopy (cryo-EM) density map. The results showed that the new procedure could effectively deduce protein folds without high-resolution structural data, a feature that could also be used to recognize native fold in structure prediction and to interpret data in fields like structure genomics. Most importantly, in the energetics-based evaluation, it was revealed that, despite the inevitable errors in the artificially constructed structures and limited accuracy of knowledge-based potential functions, the average energy of an ensemble of structures with slightly different configurations around the native skeleton is a much more robust parameter for marking native topology than the energy of individual structures in the ensemble. This result implies that, among all the possible topology candidates for a given skeleton, evolution has selected the native topology as the one that can accommodate the largest structural variations, not the one rigidly trapped in a deep, but narrow, conformational energy well.  相似文献   

5.
MOTIVATION: Circular Dichroism (CD) spectroscopy is a long-established technique for studying protein secondary structures in solution. Empirical analyses of CD data rely on the availability of reference datasets comprised of far-UV CD spectra of proteins whose crystal structures have been determined. This article reports on the creation of a new reference dataset which effectively covers both secondary structure and fold space, and uses the higher information content available in synchrotron radiation circular dichroism (SRCD) spectra to more accurately predict secondary structure than has been possible with existing reference datasets. It also examines the effects of wavelength range, structural redundancy and different means of categorizing secondary structures on the accuracy of the analyses. In addition, it describes a novel use of hierarchical cluster analyses to identify protein relatedness based on spectral properties alone. The databases are shown to be applicable in both conventional CD and SRCD spectroscopic analyses of proteins. Hence, by combining new bioinformatics and biophysical methods, a database has been produced that should have wide applicability as a tool for structural molecular biology.  相似文献   

6.
QMEAN: A comprehensive scoring function for model quality assessment   总被引:3,自引:0,他引:3  
  相似文献   

7.
The problem of protein tertiary structure prediction from primary sequence can be separated into two subproblems: generation of a library of possible folds and specification of a best fold given the library. A distance geometry procedure based on random pairwise metrization with good sampling properties was used to generate a library of 500 possible structures for each of 11 small helical proteins. The input to distance geometry consisted of sets of restraints to enforce predicted helical secondary structure and a generic range of 5 to 11 A between predicted contact residues on all pairs of helices. For each of the 11 targets, the resulting library contained structures with low RMSD versus the native structure. Near-native sampling was enhanced by at least three orders of magnitude compared to a random sampling of compact folds. All library members were scored with a combination of an all-atom distance-dependent function, a residue pair-potential, and a hydrophobicity function. In six of the 11 cases, the best-ranking fold was considered to be near native. Each library was also reduced to a final ab initio prediction via consensus distance geometry performed over the 50 best-ranking structures from the full set of 500. The consensus results were of generally higher quality, yielding six predictions within 6.5 A of the native fold. These favorable predictions corresponded to those for which the correlation between the RMSD and the scoring function were highest. The advantage of the reported methodology is its extreme simplicity and potential for including other types of structural restraints.  相似文献   

8.
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The C(alpha) RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.  相似文献   

9.
When a protein sequence does not share any significant sequence similarity with a protein of known structure, homology modeling cannot be applied. However, many novel and interesting methods, such as secondary structure prediction, fold recognition, and prediction of long-range interactions, are being developed and have been shown to be reasonably successful in predicting protein structures from sequence data and evolutionary information. The a priori evaluation of the correctness of a prediction obtained by one of these methods is however often problematic. Consequently, it is important to use all available information provided by as many different methods as possible and all the available experimental data about the protein of interest, since the consistency of the results is indicative of the reliability of the prediction. Hence the need has arisen for suitable tools able to compare results provided by different methods and evaluate their consistency. We have therefore constructed GLASS, a general platform to read, visualize, compare, and evaluate prediction results from many different sources and to project these prediction results into three dimensions. In addition, GLASS allows the comparison of selected parameters calculated for a model with the distribution observed in real protein structures, thus providing an easy way to test new methods for evaluating the likelihood of different structural models. GLASS can be considered as a “workbench” for structural predictions useful to both experimentalists and theoreticians. Proteins 30:339–351, 1998. © 1998 Wiley-Liss, Inc.  相似文献   

10.
11.
Kim D  Xu D  Guo JT  Ellrott K  Xu Y 《Protein engineering》2003,16(9):641-650
A new method for fold recognition is developed and added to the general protein structure prediction package PROSPECT (http://compbio.ornl.gov/PROSPECT/). The new method (PROSPECT II) has four key features. (i) We have developed an efficient way to utilize the evolutionary information for evaluating the threading potentials including singleton and pairwise energies. (ii) We have developed a two-stage threading strategy: (a) threading using dynamic programming without considering the pairwise energy and (b) fold recognition considering all the energy terms, including the pairwise energy calculated from the dynamic programming threading alignments. (iii) We have developed a combined z-score scheme for fold recognition, which takes into consideration the z-scores of each energy term. (iv) Based on the z-scores, we have developed a confidence index, which measures the reliability of a prediction and a possible structure-function relationship based on a statistical analysis of a large data set consisting of threadings of 600 query proteins against the entire FSSP templates. Tests on several benchmark sets indicate that the evolutionary information and other new features of PROSPECT II greatly improve the alignment accuracy. We also demonstrate that the performance of PROSPECT II on fold recognition is significantly better than any other method available at all levels of similarity. Improvement in the sensitivity of the fold recognition, especially at the superfamily and fold levels, makes PROSPECT II a reliable and fully automated protein structure and function prediction program for genome-scale applications.  相似文献   

12.
McGuffin LJ  Jones DT 《Proteins》2003,52(2):166-175
If secondary structure predictions are to be incorporated into fold recognition methods, an assessment of the effect of specific types of errors in predicted secondary structures on the sensitivity of fold recognition should be carried out. Here, we present a systematic comparison of different secondary structure prediction methods by measuring frequencies of specific types of error. We carry out an evaluation of the effect of specific types of error on secondary structure element alignment (SSEA), a baseline fold recognition method. The results of this evaluation indicate that missing out whole helix or strand elements, or predicting the wrong type of element, is more detrimental than predicting the wrong lengths of elements or overpredicting helix or strand. We also suggest that SSEA scoring is an effective method for assessing accuracy of secondary structure prediction and perhaps may also provide a more appropriate assessment of the "usefulness" and quality of predicted secondary structure, if secondary structure alignments are to be used in fold recognition.  相似文献   

13.
Protein sequence design based on the topology of the native state structure   总被引:1,自引:0,他引:1  
Computational design of sequences for a given structure is generally studied by exhaustively enumerating the sequence space or by searching in such a large space, which is prohibitively expensive. However, we point out that the protein topology has a wealth of information, which can be exploited to design sequences for a chosen structure. In this paper, we present a computationally efficient method for ranking the residue sites in a given native-state structure, which enables us to design sequences for a chosen structure. The premise for the method is that the topology of the graph representing the energetically interacting neighbours in a protein plays an important role in the inverse-folding problem. While our previous work (which was also based on topology) used eigenspectral analysis of the adjacency matrix of interactions for ranking the residue sites in a given chain, here we use a simple but effective way of assigning weights to the nodes on the basis of secondary connections, along with primary connections. This indirectly accounts for the edge weight in the graph and removes degeneracy in the degree. The new scheme needs only a few multiplications and additions to compute the preferred ranking of the residue sites even for structures of real proteins of sizes of a few hundred amino acid residues. We use HP lattice model examples (for which exhaustive enumeration of sequences is practical) to validate our ranking approach in obtaining sequences of lowest energy for any H-P residue composition for a given native-state structure. Some examples of native structures of real proteins are also included. Quantitative comparison of the efficacy of the new scheme with the earlier schemes is made. The new scheme consistently performs better and with much lower computational cost. An optimization procedure is added to work with the new scheme in a few rare cases wherein the new scheme fails to provide the best sequence, an optimization procedure is added to work with the new scheme.  相似文献   

14.
Protein fold recognition using sequence-derived predictions.   总被引:18,自引:9,他引:9       下载免费PDF全文
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.  相似文献   

15.
With the aim of studying the relationship between protein sequences and their native structures, we adopted vectorial representations for both sequence and structure. The structural representation was based on the principal eigenvector of the fold's contact matrix (PE). As has been recently shown, the latter encodes sufficient information for reconstructing the whole contact matrix. The sequence was represented through a hydrophobicity profile (HP), using a generalized hydrophobicity scale that we obtained from the principal eigenvector of a residue-residue interaction matrix, and denoted as interactivity scale. Using this novel scale, we defined the optimal HP of a protein fold, and, by means of stability arguments, predicted to be strongly correlated with the PE of the fold's contact matrix. This prediction was confirmed through an evolutionary analysis, which showed that the PE correlates with the HP of each individual sequence adopting the same fold and, even more strongly, with the average HP of this set of sequences. Thus, protein sequences evolve in such a way that their average HP is close to the optimal one, implying that neutral evolution can be viewed as a kind of motion in sequence space around the optimal HP. Our results indicate that the correlation coefficient between N-dimensional vectors constitutes a natural metric in the vectorial space in which we represent both protein sequences and protein structures, which we call vectorial protein space. In this way, we define a unified framework for sequence-to-sequence, sequence-to-structure and structure-to-structure alignments. We show that the interactivity scale is nearly optimal both for the comparison of sequences to sequences and sequences to structures.  相似文献   

16.

Background  

Since many of the new protein structures delivered by high-throughput processes do not have any known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs which are often associated to active sites. Unfortunately, there are very few known 3D motifs, which are usually the result of a manual process, compared to the number of sequential motifs already known. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions and evaluate it on a set of adenine based ligands.  相似文献   

17.
Recognizing the fold of a protein structure   总被引:3,自引:0,他引:3  
This paper reports a graph-theoretic program, GRATH, that rapidly, and accurately, matches a novel structure against a library of domain structures to find the most similar ones. GRATH generates distributions of scores by comparing the novel domain against the different types of folds that have been classified previously in the CATH database of structural domains. GRATH uses a measure of similarity that details the geometric information, number of secondary structures and number of residues within secondary structures, that any two protein structures share. Although GRATH builds on well established approaches for secondary structure comparison, a novel scoring scheme has been introduced to allow ranking of any matches identified by the algorithm. More importantly, we have benchmarked the algorithm using a large dataset of 1702 non-redundant structures from the CATH database which have already been classified into fold groups, with manual validation. This has facilitated introduction of further constraints, optimization of parameters and identification of reliable thresholds for fold identification. Following these benchmarking trials, the correct fold can be identified with the top score with a frequency of 90%. It is identified within the ten most likely assignments with a frequency of 98%. GRATH has been implemented to use via a server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/Grath.pl). GRATH's speed and accuracy means that it can be used as a reliable front-end filter for the more accurate, but computationally expensive, residue based structure comparison algorithm SSAP, currently used to classify domain structures in the CATH database. With an increasing number of structures being solved by the structural genomics initiatives, the GRATH server also provides an essential resource for determining whether newly determined structures are related to any known structures from which functional properties may be inferred.  相似文献   

18.
Although most proteins conform to the classical one‐structure/one‐function paradigm, an increasing number of proteins with dual structures and functions have been discovered. In response to cellular stimuli, such proteins undergo structural changes sufficiently dramatic to remodel even their secondary structures and domain organization. This “fold‐switching” capability fosters protein multi‐functionality, enabling cells to establish tight control over various biochemical processes. Accurate predictions of fold‐switching proteins could both suggest underlying mechanisms for uncharacterized biological processes and reveal potential drug targets. Recently, we developed a prediction method for fold‐switching proteins using structure‐based thermodynamic calculations and discrepancies between predicted and experimentally determined protein secondary structure (Porter and Looger, Proc Natl Acad Sci U S A 2018; 115:5968–5973). Here we seek to leverage the negative information found in these secondary structure prediction discrepancies. To do this, we quantified secondary structure prediction accuracies of 192 known fold‐switching regions (FSRs) within solved protein structures found in the Protein Data Bank (PDB). We find that the secondary structure prediction accuracies for these FSRs vary widely. Inaccurate secondary structure predictions are strongly associated with fold‐switching proteins compared to equally long segments of non‐fold‐switching proteins selected at random. These inaccurate predictions are enriched in helix‐to‐strand and strand‐to‐coil discrepancies. Finally, we find that most proteins with inaccurate secondary structure predictions are underrepresented in the PDB compared with their alternatively folded cognates, suggesting that unequal representation of fold‐switching conformers within the PDB could be an important cause of inaccurate secondary structure predictions. These results demonstrate that inconsistent secondary structure predictions can serve as a useful preliminary marker of fold switching.  相似文献   

19.
Kifer I  Nussinov R  Wolfson HJ 《Proteins》2011,79(6):1759-1773
The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently, many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations toward each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology based targets. Since then we have considerably enhanced our method's abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the template-based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for template-based structure prediction, and in particular, the docking-basedranking technique presented here can be incorporated into any profile-profile comparison based method.  相似文献   

20.
Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Also, scoring functions based on accessible surface area and amino acid neighbourhood considerations were used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size, clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also exhibits that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; thus also has a potential to be used in the protein ‘structure prediction’ problem.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号