1.
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods.
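As a rough illustration of how mutation phenotypes could enter a fitness score, the sketch below scores a candidate structure over a small sequence ensemble. Everything in it is an assumption made for illustration: the toy hydrophobic-contact energy, the `margin` and `weight` parameters, and the function names are hypothetical and are not taken from the paper. Neutral mutants are penalized when they raise the energy relative to wild type, and disruptive mutants are penalized when they fail to raise it by at least `margin`.

```python
def contact_energy(sequence, contacts):
    """Toy energy (assumption): -1 for every contact between two H residues."""
    return sum(-1.0 for i, j in contacts if sequence[i] == "H" and sequence[j] == "H")


def ensemble_score(contacts, wild_type, mutants, margin=1.0, weight=1.0):
    """Wild-type energy plus penalties for mutants that contradict their phenotype.

    contacts: list of (i, j) residue index pairs in contact in the candidate structure.
    mutants:  list of (sequence, phenotype), phenotype is "neutral" or "disruptive".
    """
    e_wt = contact_energy(wild_type, contacts)
    penalty = 0.0
    for sequence, phenotype in mutants:
        delta = contact_energy(sequence, contacts) - e_wt
        if phenotype == "neutral":
            # A neutral mutation should not destabilize this structure.
            penalty += max(0.0, delta)
        elif phenotype == "disruptive":
            # A disruptive mutation should destabilize it by at least `margin`.
            penalty += max(0.0, margin - delta)
        else:
            raise ValueError(f"unknown phenotype: {phenotype!r}")
    return e_wt + weight * penalty


# Two candidate structures that the wild-type energy alone cannot tell apart.
wild_type = "HPHH"
mutants = [("HPHP", "disruptive"), ("HPPH", "neutral")]
structure_a = [(0, 3)]  # consistent with both mutant phenotypes
structure_b = [(0, 2)]  # contradicts both mutant phenotypes

print(ensemble_score(structure_a, wild_type, mutants))
print(ensemble_score(structure_b, wild_type, mutants))
```

In this toy case both structures have the same wild-type energy, and only the ensemble term separates them, which is the kind of energetic separation the abstract describes.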
2.
The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently, many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations toward each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology-based targets. Since then we have considerably enhanced our method's abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the template-based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for template-based structure prediction, and in particular, the docking-based ranking technique presented here can be incorporated into any profile-profile comparison based method.
3.
George A. Khoury Phanourios Tamamis Neesha Pinnaduwage James Smadbeck Chris A. Kieslich Christodoulos A. Floudas 《Proteins》2014,82(5):794-814
Protein structure refinement aims to perform a set of operations given a predicted structure to improve model quality and accuracy with respect to the native in a blind fashion. Despite the numerous computational approaches to the protein refinement problem reported in the previous three CASPs, an overwhelming majority of methods degrade models rather than improve them. We initially developed a method tested using blind predictions during CASP10 which was officially ranked in 5th place among all methods in the refinement category. Here, we present Princeton_TIGRESS, which when benchmarked on all CASP7, 8, 9, and 10 refinement targets, increased GDT_TS 76% of the time with an average improvement of 0.83 GDT_TS points per structure. The method was additionally benchmarked on models produced by top performing three‐dimensional structure prediction servers during CASP10. The robustness of the Princeton_TIGRESS protocol was also tested for different random seeds. We make the Princeton_TIGRESS refinement protocol freely available as a web server at http://atlas.princeton.edu/refinement. Using this protocol, one can consistently refine a prediction to help bridge the gap between a predicted structure and the actual native structure. Proteins 2014; 82:794–814. © 2013 Wiley Periodicals, Inc.
4.
Rapid protein domain assignment from amino acid sequence using predicted secondary structure
Marsden RL McGuffin LJ Jones DT 《Protein science : a publication of the Protein Society》2002,11(12):2814-2824
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
5.
Andrew W. Senior Richard Evans John Jumper James Kirkpatrick Laurent Sifre Tim Green Chongli Qin Augustin Žídek Alexander W. R. Nelson Alex Bridgland Hugo Penedones Stig Petersen Karen Simonyan Steve Crossan Pushmeet Kohli David T. Jones David Silver Koray Kavukcuoglu Demis Hassabis 《Proteins》2019,87(12):1141-1148
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
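The idea of plain gradient descent on a potential built from predicted inter-residue distances can be illustrated with a deliberately small toy. The sketch below is not AlphaFold's potential or parameterization: it assumes 2D point coordinates, a simple squared-error potential on target distances, and hypothetical names (`potential`, `gradient_descent`, `targets`) chosen only for this example.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def potential(coords, targets):
    """Sum of squared deviations between current and target pair distances."""
    return sum((distance(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def gradient_descent(coords, targets, learning_rate=0.05, steps=5000):
    """Minimize the potential by moving every point against its gradient."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in coords]
        for (i, j), d in targets.items():
            r = distance(coords[i], coords[j])
            if r == 0.0:
                continue  # gradient is undefined for coincident points
            scale = 2.0 * (r - d) / r
            for k in range(2):
                g = scale * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            p[0] -= learning_rate * g[0]
            p[1] -= learning_rate * g[1]
    return coords


# Target distances of a 3-4-5 triangle, starting from a perturbed guess.
targets = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
start = [(0.0, 0.0), (2.5, 0.3), (0.2, 3.5)]
final = gradient_descent(start, targets)
print(potential(start, targets), potential(final, targets))
```

The recovered coordinates are only defined up to rotation, translation, and reflection, so the check is made on pair distances rather than on absolute positions.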
6.
D. J. Ayers P. R. Gooley A. Widmer-Cooper A. E. Torda 《Protein science : a publication of the Protein Society》1999,8(5):1127-1133
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.
7.
8.
Partial unfolding and refolding for structure refinement: A unified approach of geometric simulations and molecular dynamics
The most successful protein structure prediction methods to date have been template‐based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug‐design studies or molecular dynamics simulations. We have developed a new physics‐based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometry‐based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr‐REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native‐like structures from a template and to provide a set of persistent contacts to be employed during re‐folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low resolution models where certain portions of the structure are incorrectly modeled. Proteins 2015; 83:2279–2292. © 2015 Wiley Periodicals, Inc.
9.
In recent years in silico protein structure prediction reached a level where fully automated servers can generate large pools of near‐native structures. However, the identification and further refinement of the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target‐specific selective refinement (SR) protocol; and (ii) a molecular dynamics (MD) simulation‐based ranking (SMDR) method. In SR the all‐atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best‐refined models are selected with SMDR by testing their relative stability against gradual heating through all‐atom MD simulations. Through extensive testing we have found that Mufold‐MD, our fully automated protein structure prediction server updated with the SR and SMDR modules, consistently outperformed its previous versions. Proteins 2015; 83:1823–1835. © 2015 Wiley Periodicals, Inc.
10.
Background
Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is plausible that predicting other structural features could also provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.
11.
Bagaria A Jaravine V Huang YJ Montelione GT Güntert P 《Protein science : a publication of the Protein Society》2012,21(2):229-238
Large-scale initiatives for obtaining spatial protein structures by experimental or computational means have accentuated the need for the critical assessment of protein structure determination and prediction methods. These include blind test projects such as the critical assessment of protein structure prediction (CASP) and the critical assessment of protein structure determination by nuclear magnetic resonance (CASD-NMR). An important aim is to establish structure validation criteria that can reliably assess the accuracy of a new protein structure. Various quality measures derived from the coordinates have been proposed. A universal structural quality assessment method should combine multiple individual scores in a meaningful way, which is challenging because of their different measurement units. Here, we present a method based on a generalized linear model (GLM) that combines diverse protein structure quality scores into a single quantity with intuitive meaning, namely the predicted coordinate root-mean-square deviation (RMSD) value between the present structure and the (unavailable) "true" structure (GLM-RMSD). For two sets of structural models from the CASD-NMR and CASP projects, this GLM-RMSD value was compared with the actual accuracy given by the RMSD value to the corresponding, experimentally determined reference structure from the Protein Data Bank (PDB). The correlation coefficients between actual (model vs. reference from PDB) and predicted (model vs. "true") heavy-atom RMSDs were 0.69 and 0.76, for the two datasets from CASD-NMR and CASP, respectively, which are considerably higher than those for the individual scores (-0.24 to 0.68). The GLM-RMSD can thus predict the accuracy of protein structures more reliably than individual coordinate-based quality scores.
12.
MQAPsingle: A quasi single‐model approach for estimation of the quality of individual protein structure models
We present a Model Quality Assessment Program (MQAP), called MQAPsingle, for ranking and assessing the absolute global quality of single protein models. MQAPsingle is a quasi single‐model MQAP, a method that combines advantages of both “pure” single‐model MQAPs and clustering MQAPs. This approach results in higher accuracy compared to the state‐of‐the‐art single‐model MQAPs. Notably, the prediction for a given model is the same regardless of whether this model is submitted to our server alone or together with other models. Proteins 2016; 84:1021–1028. © 2015 Wiley Periodicals, Inc.
13.
14.
Computational methods that produce accurate protein structure models from limited experimental data, for example, from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity, and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predicted de novo using a two-stage protocol. First, low-resolution models were generated with the Rosetta de novo method guided by nonambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and nonambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (global distance test total score > 45) and in three cases high-resolution models were achieved (root-mean-square deviation < 3.5 Å). We also show that a meta-approach applying iterative Rosetta + NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.
15.
Evaluating or predicting the quality of protein models (i.e., predicted protein tertiary structures) without knowing their native structures is important for selecting and appropriately using protein models. We describe an iterative approach that improves the performances of protein Model Quality Assurance Programs (MQAPs). Given the initial quality scores of a list of models assigned by an MQAP, the method iteratively refines the scores until the ranking of the models does not change. We applied the method to the model quality assessment data generated by 30 MQAPs during the Eighth Critical Assessment of Techniques for Protein Structure Prediction. To various degrees, our method increased the average correlation between predicted and real quality scores of 25 out of 30 MQAPs and reduced the average loss (i.e., the difference between the top ranked model and the best model) for 28 MQAPs. Particularly, for MQAPs with low average correlations (<0.4), the correlation can be increased several-fold. Similar experiments conducted on the CASP9 MQAPs also demonstrated the effectiveness of the method. Our method is a hybrid method that combines the original method of an MQAP and the pair-wise comparison clustering method. It can achieve a high accuracy similar to a full pair-wise clustering method, but with much less computation time when evaluating hundreds of models. Furthermore, without knowing native structures, the iterative refining method can evaluate the performance of an MQAP by analyzing its model quality predictions.
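The iterate-until-the-ranking-is-stable loop described above can be sketched as follows. The update rule is an assumption made for illustration, not the published algorithm: each model is rescored as its mean similarity to the current top-ranked models, which stands in for the hybrid of MQAP scores and pairwise comparison. The names `refine_scores`, `similarity`, and `top_k` are hypothetical, and the similarity matrix could be, for example, pairwise GDT_TS values scaled to [0, 1].

```python
def ranking(scores):
    """Model indices sorted from best to worst score (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def refine_scores(initial, similarity, top_k=2, max_iter=50):
    """Rescore models against the current top_k models until the ranking is stable."""
    if top_k < 2 or len(initial) < 2:
        raise ValueError("need at least two models and top_k >= 2")
    scores = list(initial)
    order = ranking(scores)
    for _ in range(max_iter):
        refs = order[:top_k]
        new_scores = []
        for i in range(len(scores)):
            # Compare each model with the reference models, excluding itself.
            others = [r for r in refs if r != i]
            new_scores.append(sum(similarity[i][r] for r in others) / len(others))
        new_order = ranking(new_scores)
        scores = new_scores
        if new_order == order:
            break  # ranking no longer changes
        order = new_order
    return scores, order


# Models 0-2 resemble each other; model 3 is an outlier that the
# initial (hypothetical) MQAP scores rank first.
similarity = [
    [1.00, 0.80, 0.70, 0.20],
    [0.80, 1.00, 0.75, 0.30],
    [0.70, 0.75, 1.00, 0.25],
    [0.20, 0.30, 0.25, 1.00],
]
initial = [0.5, 0.6, 0.4, 0.9]
scores, order = refine_scores(initial, similarity)
print(order)
```

Because only `top_k` reference models are compared per iteration, the cost per pass is far below that of a full pairwise clustering, which is the trade-off the abstract points to.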
16.
Knappenberger JA Kraemer-Pecore CM Lecomte JT 《Protein science : a publication of the Protein Society》2004,13(11):2899-2908
Under native conditions, apocytochrome b5 exhibits a stable core and a disordered heme-binding region that refolds upon association with the cofactor. The termini of this flexible region are in close proximity, suggesting that loop closure may contribute to the thermodynamic properties of the apocytochrome. A chimeric protein containing 43 residues encompassing the cytochrome loop was constructed using the cyanobacterial photosystem I accessory protein E (PsaE) from Synechococcus sp. PCC 7002 as a structured scaffold. PsaE has the topology of an SH3 domain, and the insertion was engineered to replace its 14-residue CD loop. NMR and optical spectroscopies showed that the hybrid protein (named EbE1) was folded under native conditions and that it retained the characteristics of an SH3 domain. NMR spectroscopy revealed that structural and dynamic differences were confined near the site of loop insertion. Variable-temperature 1D NMR spectra of EbE1 confirmed the presence of a kinetic unfolding barrier. Thermal and chemical denaturations of PsaE and EbE1 demonstrated cooperative, two-state transitions; the stability of the PsaE scaffold was found only moderately compromised by the insertion, with a ΔTm of 8.3 °C, a ΔCm of 1.5 M urea, and a ΔΔG° of 4.2 kJ/mol. The data implied that the penalty for constraining the ends of the inserted region was lower than the approximately 6.4 kJ/mol calculated for a self-avoiding chain. Extrapolation of these results to cytochrome b5 suggested that the intrinsic stability of the folded portion of the apoprotein reflected only a small detrimental contribution from the large heme-binding domain.
17.
Topological and sequence information predict that foldons organize a partially overlapped and hierarchical structure
It has been suggested that proteins have substructures, called foldons, which can cooperatively fold into the native structure. However, several prior investigations define foldons in various ways, citing different foldon characteristics, thereby making the concept of a foldon ambiguous. In this study, we perform a Gō model simulation and analyze the characteristics of substructures that cooperatively fold into the native‐like structure. Although some results do not agree well with the experimental evidence due to the simplicity of our coarse‐grained model, our results strongly suggest that cooperatively folding units sometimes organize a partially overlapped and hierarchical structure. This view makes it easy to interpret the differing proposals about foldons as differences in hierarchical level. On the basis of this finding, we present a new method to assign foldons and their hierarchy, using structural and sequence information. The results show that the foldons assigned by our method correspond to the intermediate structures identified by some experimental techniques. The new method makes it easy to predict whether a protein folds sequentially into the native structure or whether some foldons fold into the native structure in parallel. Proteins 2015; 83:1900–1913. © 2015 Wiley Periodicals, Inc.
18.
19.
20.
To study protein nascent chain folding during biosynthesis, we investigate the folding behavior of models of hydrophobic and polar (HP) chains at growing length using both a two-dimensional square lattice model and an optimized three-dimensional 4-state discrete off-lattice model. After enumerating all possible sequences and conformations of HP heteropolymers up to length N = 18 and N = 15 in two and three-dimensional space, respectively, we examine changes in adopted structure, stability, and tolerance to single point mutation as the nascent chain grows. In both models, we find that stable model proteins have fewer folded nascent chains during growth, and often will only fold after reaching full length. For the few occasions where partial chains of stable proteins fold, these partial conformations on average are very similar to the corresponding parts of the final conformations at full length. Conversely, we find that sequences with fewer stable nascent chains and sequences with native-like folded nascent chains are more stable. In addition, these stable sequences in general can have many more point mutations and still fold into the same conformation as the wild type sequence. Our results suggest that stable proteins are less likely to be trapped in metastable conformations during biosynthesis, and are more resistant to point mutations. Our results also imply that less stable proteins will require the assistance of chaperones and other factors during nascent chain folding. Taken together with other reported studies, it seems that cotranslational folding may not be a general mechanism of in vivo protein folding for small proteins, and in vitro folding studies are still relevant for understanding how proteins fold biologically.
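For readers unfamiliar with the two-dimensional HP lattice model, the exhaustive enumeration it allows can be sketched in a few lines. The code below uses the standard HP energy (one unit of stabilization per non-bonded H-H lattice contact) and brute-force enumeration of self-avoiding walks. The function names are hypothetical, the walk counts include rotated and mirrored copies of each conformation, and the approach is only practical for short chains, well below the N = 18 reported above.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves):
    """Lattice coordinates for a move string, or None if the chain overlaps itself."""
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def hp_energy(sequence, coords):
    """-1 for each pair of H residues adjacent on the lattice but not along the chain."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair counts once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


def ground_state(sequence):
    """Return (lowest energy, number of self-avoiding walks reaching it)."""
    best, count = 0, 0
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = walk(moves)
        if coords is None:
            continue
        e = hp_energy(sequence, coords)
        if e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count


print(ground_state("HPPH"))
```

Calling `ground_state` on successive prefixes of a sequence gives a crude way to ask whether a growing nascent chain already has a low-degeneracy ground state before it reaches full length.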