Similar documents
1.
Evaluating or predicting the quality of protein models (i.e., predicted protein tertiary structures) without knowing their native structures is important for selecting and appropriately using protein models. We describe an iterative approach that improves the performance of protein Model Quality Assurance Programs (MQAPs). Given the initial quality scores assigned to a list of models by an MQAP, the method iteratively refines the scores until the ranking of the models no longer changes. We applied the method to the model quality assessment data generated by 30 MQAPs during the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8). To varying degrees, our method increased the average correlation between predicted and real quality scores for 25 of the 30 MQAPs and reduced the average loss (i.e., the difference between the top-ranked model and the best model) for 28 MQAPs. In particular, for MQAPs with low average correlations (<0.4), the correlation can be increased several-fold. Similar experiments on the CASP9 MQAPs also demonstrated the effectiveness of the method. Our method is a hybrid that combines the original method of an MQAP with the pairwise-comparison clustering method. It achieves accuracy similar to that of a full pairwise clustering method, but with much less computation time when evaluating hundreds of models. Furthermore, without knowing native structures, the iterative refinement method can evaluate the performance of an MQAP by analyzing its model quality predictions.
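The abstract does not give the refinement rule, so the following is only a minimal sketch of the general idea under a stated assumption: a hypothetical rule in which every model is rescored by its mean structural similarity (for example pairwise GDT-TS, assumed precomputed in `similarity`) to the current top-k models, repeating until the ranking stops changing. The names `iterative_refine`, `top_k` and `max_iter` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def rank(scores: Sequence[float]) -> List[int]:
    """Model indices from best to worst; ties broken by index for a stable order."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def iterative_refine(initial_scores: Sequence[float],
                     similarity: Sequence[Sequence[float]],
                     top_k: int = 3,
                     max_iter: int = 50) -> Tuple[List[float], List[int]]:
    """Hypothetical refinement: rescore all models against the current top-k until the ranking is stable."""
    if len(initial_scores) < 2:
        raise ValueError("need at least two models")
    if top_k < 2:
        raise ValueError("top_k must be >= 2 so every model has a reference other than itself")
    scores = list(initial_scores)  # start from the MQAP's own scores
    order = rank(scores)
    for _ in range(max_iter):
        refs = order[:top_k]
        # Mean similarity to the current reference models, excluding self-comparison.
        scores = [
            sum(similarity[i][r] for r in refs if r != i)
            / sum(1 for r in refs if r != i)
            for i in range(len(scores))
        ]
        new_order = rank(scores)
        if new_order == order:  # ranking unchanged: converged
            break
        order = new_order
    return scores, order
```

In this sketch each model is compared with only k references instead of all n − 1 other models, which is one way such a hybrid could stay cheaper than full pairwise clustering; `max_iter` guards against a ranking that oscillates instead of converging.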

2.

Background  

In the area of protein structure prediction, considerable effort has recently gone into the development of Model Quality Assessment Programs (MQAPs), which distinguish high-quality protein structure models from inferior ones. Here, we propose a new method that uses an MQAP to improve the quality of models. Given a target sequence and a template structure, we construct a number of different alignments and corresponding models for the sequence. The quality of these models is scored with an MQAP and used to choose the most promising model. An SVM-based selection scheme is proposed for combining MQAP partial potentials in order to improve model selection.

3.

Background  

Selecting the highest-quality 3D model of a protein structure from a number of alternatives remains an important challenge in structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed that adopt various strategies to tackle this problem, ranging from so-called "true" MQAPs, which produce a single energy score from a single model, to methods that rely on structural comparisons of multiple models or on additional information from meta-servers. However, it is clear that no current method can consistently separate the most accurate models from the least accurate ones. In this paper, a number of the top-performing MQAP methods are benchmarked in terms of the potential value they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network.

4.
MOTIVATION: The ability of a simple method (MODCHECK) to determine the sequence–structure compatibility of a set of structural models generated by fold recognition is tested in a thorough benchmark analysis. Four Model Quality Assessment Programs (MQAPs) were tested on 188 targets from the latest LiveBench-9 automated structure evaluation experiment. We systematically test and evaluate whether the MQAP methods can successfully detect native-like models. RESULTS: We show that, compared with the other three methods tested, MODCHECK is the most reliable method for consistently selecting the best top model and for ranking the models. In addition, we show that the choice of model similarity score used to assess a model's similarity to the experimental structure can influence the overall performance of these tools. Although these MQAP methods fail to improve model selection for methods that already incorporate three-dimensional (3D) protein structural information, an improvement is observed for purely sequence-based methods, including the best profile–profile methods. This suggests that even the best sequence-based fold recognition methods can still be improved by taking 3D structural information into account. CONTACT: d.jones@cs.ucl.ac.uk

5.
Sadowski MI, Jones DT. Proteins 2007, 69(3):476-485
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown that selecting the correct template is of paramount importance to the quality of the final model. We derived a set of 732 targets for which ten or more templates with 30–80% sequence identity exist and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile–profile alignment, HHpred HMM–HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we investigated whether any structurally defined subset of the sequence could predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases, but that there are examples in which improvement (global RMSD of 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined, and the HHpred and MQAP methods were found to improve ranking when the available templates had 35–40% maximum sequence identity. Structurally defined subsets are generally less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and their quality is assessed in combination with the sequence similarity between target and template, an extra 7% of "best" models can be found.

6.
Computational prediction of protein structures is a difficult task, which involves fast and accurate evaluation of candidate model structures. We propose to enhance single‐model quality assessment with a functionality evaluation phase for proteins whose quantitative functional characteristics are known. In particular, this idea can be applied to the evaluation of structural models of ion channels, whose main function, conducting ions, can be quantitatively measured with the patch‐clamp technique, which provides the current–voltage characteristics. The study was performed on a set of KcsA channel models obtained from complete and incomplete contact maps. A fast continuous electrodiffusion model was used to calculate the current–voltage characteristics of the structural models. We found that the computed charge selectivity and total current were sensitive to the structural and electrostatic quality of the models. In practical terms, we show that evaluating predicted conductance values is an appropriate way to eliminate models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on the basis of their predicted charge selectivity substantially enriches the candidate set in highly accurate models. Tests on three other ion channels indicate that, in addition to being a proof of concept, our function‐oriented single‐model quality assessment method can be applied directly to the evaluation of structural models of some classes of protein channels. Finally, our work raises the important question of whether computational validation of functionality should be included in the evaluation of structural models whenever possible. Proteins 2016; 84:217–231. © 2015 Wiley Periodicals, Inc.

7.
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method that assesses the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e., GDT‐TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross‐validation; the correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlations for template‐based and ab initio targets are 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively: the average difference (or loss) between the scores of the top‐ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably compared with the other methods evaluated on the same dataset. Moreover, the predicted absolute quality scores are comparable across models of different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009. © 2008 Wiley‐Liss, Inc.
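The two evaluation measures used above, the per-target correlation between predicted and true scores and the loss (true score of the best model minus true score of the top-ranked model), can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the helper names are hypothetical, and per-target values would then be averaged over targets.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and true quality scores for one target."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two models")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranking_loss(predicted: Sequence[float], true: Sequence[float]) -> float:
    """True score of the best model minus true score of the model ranked first by prediction."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]
```

A loss of 0 means the predictor put the truly best model first; on a GDT-TS scale of 0–100, a loss of 5.70 means the chosen model is on average 5.70 GDT-TS points below the best available one.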

8.
9.
The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. The first, ModFOLD, is a fast Model Quality Assessment Program (MQAP) for the global assessment of either single or multiple models. The second, ModFOLDclust, is a more intensive method that clusters multiple models and provides per-residue local quality assessment. AVAILABILITY: http://www.biocentre.rdg.ac.uk/bioinformatics/ModFOLD/.
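Clustering-based quality assessment rests on a simple consensus idea: a model that resembles many other models of the same target is more likely to be correct. The sketch below shows only that generic idea; it is not ModFOLDclust's actual scoring function, which the abstract does not describe. It assumes a precomputed, symmetric matrix of pairwise model similarities in [0, 1], and `consensus_scores` is a hypothetical name.

```python
from typing import List, Sequence


def consensus_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Score each model by its mean structural similarity to all other models.

    similarity[i][j] is assumed to be a symmetric model-model score in [0, 1]
    (for example GDT-TS / 100); the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models to compare")
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]
```

Filling `similarity` requires n(n − 1)/2 structural comparisons, which is why this style of method becomes expensive when hundreds of models are evaluated.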

10.
An increasing number of cryo‐electron microscopy (cryo‐EM) density maps are being generated with suitable resolution to trace the protein backbone and guide sidechain placement. Generating and evaluating atomic models based on such maps would be greatly facilitated by independent validation metrics for assessing the fit of the models to the data. We describe such a metric based on the fit of atomic models with independent test maps from single particle reconstructions not used in model refinement. The metric provides a means to determine the proper balance between the fit to the density and model energy and stereochemistry during refinement, and is likely to be useful in determining values of model building and refinement metaparameters quite generally.

11.
Yantao Chen, Jiandong Ding. Proteins 2010, 78(9):2090-2100
To explore the role of non‐native interactions in the helix‐coil transition, a detailed comparison between a Gō‐like model and a non‐Gō model has been performed via lattice Monte Carlo simulations. Only native hydrogen bonding interactions occur in the Gō‐like model, whereas non‐native ones with a sequence interval of more than 4 are also included in the non‐Gō model. Several significant differences were found between the results of the two models. Non‐native hydrogen bonds were found to be most populated at temperatures around the helix‐coil transition. The rearrangement of non‐native hydrogen bonds into native ones during α‐helix formation increases the susceptibility of the chain conformation; for peptides that are not too short, the non‐Gō model even shows two peaks in the susceptibility of the radius of gyration versus temperature, whereas the Gō model shows only a single peak for a single polypeptide chain of various lengths. Non‐native hydrogen bonds complicate the temperature dependence of the Zimm‐Bragg nucleation constant. The increased relative probability of non‐native hydrogen bonding in long polypeptide chains leads to a non‐monotonic effect of chain length on the transition temperature. Proteins 2010. © 2010 Wiley‐Liss, Inc.
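For reference, the Zimm‐Bragg model mentioned above assigns each helical residue a propagation weight s and charges an extra nucleation factor σ whenever a helix starts. The sketch below is the standard textbook transfer-matrix form of that model, not the lattice Monte Carlo method of the paper; the function names are hypothetical. It computes the mean helix fraction as (1/n) d ln Z / d ln s by a central finite difference.

```python
import math


def zimm_bragg_log_z(n: int, s: float, sigma: float) -> float:
    """Return ln Z for an n-residue chain in the Zimm-Bragg helix-coil model.

    Transfer-matrix rows/columns are ordered (helix, coil); entry [a][b] is the
    statistical weight of a residue in state b that follows a residue in state a.
    A helical residue gets weight s, times sigma if it starts a new helix.
    """
    m = [[s, 1.0], [sigma * s, 1.0]]
    vec = [0.0, 1.0]  # the chain is preceded by a virtual coil residue
    log_z = 0.0
    for _ in range(n):
        vec = [vec[0] * m[0][0] + vec[1] * m[1][0],
               vec[0] * m[0][1] + vec[1] * m[1][1]]
        norm = vec[0] + vec[1]
        vec = [v / norm for v in vec]  # rescale to avoid overflow for long chains
        log_z += math.log(norm)
    return log_z


def helix_fraction(n: int, s: float, sigma: float, h: float = 1e-5) -> float:
    """Mean fraction of helical residues, (1/n) d ln Z / d ln s, by central difference."""
    up = zimm_bragg_log_z(n, s * math.exp(h), sigma)
    down = zimm_bragg_log_z(n, s * math.exp(-h), sigma)
    return (up - down) / (2.0 * h * n)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round(helix_fraction(n, s=1.2, sigma=1e-3), 3))
```

A small σ makes helix initiation costly, so short chains stay mostly coil even when s > 1; this chain-length dependence is the baseline against which the paper's non-native effects are discussed.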

12.
Bond‐orientational correlations for finite‐length homopolypeptides and a selected group of denatured proteins are obtained by numerical simulations using a polypeptide model with a potential of mean force. These correlations characterize the stiffness of the polypeptide backbone and are generally described by either an exponential or a power‐law decay in the asymptotic limit. However, for finite‐length polypeptides and unfolded proteins the correlations deviate significantly from both single‐exponential and power‐law behavior. A heuristic model, which depends on the chain length and the side‐chain properties, is developed to analyze the correlations of homopolypeptides. The model contains power‐law and multi‐exponential terms, the latter of which could be interpreted as local persistence lengths. In the asymptotic limit, the model reduces to the expected power‐law behavior. Simulations of denatured proteins show that the power‐law behavior of the correlations is significantly suppressed and only the multi‐exponential term is needed to model the correlations. In addition, average persistence lengths (ranging from 2.0 to 2.5 nm) are obtained from the correlations by fitting single exponentials and are shown to be in general agreement with experiments, which also assume single‐exponential decay. © 2016 Wiley Periodicals, Inc. Biopolymers 105: 312–323, 2016.
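The single-exponential fit used to extract an average persistence length can be sketched as a least-squares line through the origin on ln C(s) = −s / l_p. This is a minimal illustration under assumptions: separations are given in nm (for example number of bonds × the 0.38 nm Cα–Cα spacing), all correlations are positive, and `fit_persistence_length` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Sequence


def fit_persistence_length(separations_nm: Sequence[float],
                           correlations: Sequence[float]) -> float:
    """Fit <cos theta(s)> = exp(-s / l_p) and return l_p in nm.

    Least squares on ln C(s) = -s / l_p, i.e. a straight line through the origin,
    so every correlation value must be positive.
    """
    if len(separations_nm) != len(correlations) or not separations_nm:
        raise ValueError("need equal-length, non-empty inputs")
    if any(c <= 0 for c in correlations):
        raise ValueError("log-linear fit needs positive correlations")
    num = sum(s * math.log(c) for s, c in zip(separations_nm, correlations))
    den = sum(s * s for s in separations_nm)
    slope = num / den  # least-squares estimate of -1 / l_p
    if slope >= 0:
        raise ValueError("correlations do not decay with separation")
    return -1.0 / slope
```

Fitting in log space weights the small, noisy long-range correlations as heavily as the short-range ones; a nonlinear fit on C(s) directly is a common alternative. As the abstract notes, real correlations deviate from a single exponential, so the fitted l_p is only an average.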

13.
Protein–protein interactions are a fundamental aspect of many biological processes. The advent of recombinant protein and computational techniques has allowed for the rational design of proteins with novel binding capabilities. It is therefore desirable to predict which designed proteins are capable of binding in vitro. To this end, we have developed a learned classification model that combines energetic and non‐energetic features. Our feature set is adapted from specialized potentials for aromatic interactions, hydrogen bonds, electrostatics, shape, and desolvation. A binding model built on these features was initially developed for CAPRI Round 21, achieving top results in the independent assessment. Here, we present a more thoroughly trained and validated model, and compare various support‐vector machine kernels. The Gaussian kernel model classified both high‐resolution complexes and designed nonbinders with 79–86% accuracy on independent test data. We also observe that multiple physical potentials for dielectric‐dependent electrostatics and hydrogen bonding contribute to the enhanced predictive accuracy, suggesting that their combined information is much greater than that of any single energetics model. We also study the change in predictive performance as the model features or training data are varied, observing unusual patterns of prediction in designed interfaces as compared with other data types. Proteins 2013; 81:1919–1930. © 2013 Wiley Periodicals, Inc.

14.
Evaluation of protein models against the native structure is essential for the development and benchmarking of protein structure prediction methods. Although a number of evaluation scores have been proposed to date, many aspects of model assessment still lack the desired robustness. In this study we present CAD‐score, a new evaluation function quantifying differences between physical contacts in a model and the reference structure. The new score uses the concept of residue–residue contact area difference (CAD) introduced by Abagyan and Totrov (J Mol Biol 1997; 268:678–685). Contact areas, the underlying basis of the score, are derived using the Voronoi tessellation of protein structure. The newly introduced CAD‐score is a continuous function, confined within fixed limits, and free of any arbitrary thresholds or parameters. The built‐in logic for treating missing residues allows consistent ranking of models of any degree of completeness. We tested CAD‐score on a large set of diverse models and compared it with GDT‐TS, a widely accepted measure of model accuracy. Like GDT‐TS, CAD‐score showed robust performance on single‐domain proteins, but it displayed a stronger preference for physically more realistic models. Unlike GDT‐TS, the new score provided a balanced assessment of domain rearrangement, removing the need for different treatment of single‐domain, multi‐domain, and multi‐subunit structures. Moreover, CAD‐score makes it possible to assess the accuracy of inter‐domain or inter‐subunit interfaces directly. In addition, the approach offers an alternative to superposition‐based model clustering. The CAD‐score implementation is available both as a web server and as a standalone software package at http://www.ibt.lt/bioinformatics/cad‐score/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.
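Following the abstract's description (residue–residue contact areas compared between model and reference, with a result confined to fixed limits), here is a minimal sketch of one plausible form of such a score. Computing the contact areas by Voronoi tessellation is outside its scope, so they are assumed to be given as dictionaries keyed by residue pairs. The exact normalization should be checked against the CAD‐score paper; `cad_score` is a hypothetical helper, not the published software's interface.

```python
from typing import Dict, Tuple

Contact = Tuple[int, int]  # (residue i, residue j), assumed stored with i < j


def cad_score(reference: Dict[Contact, float], model: Dict[Contact, float]) -> float:
    """Contact-area-difference score in [0, 1]; 1 means identical contact areas.

    Each per-contact difference is capped at the reference area, so a missing or
    wildly wrong contact costs at most its own reference area. Contacts that occur
    only in the model are ignored in this sketch.
    """
    total = sum(reference.values())
    if total <= 0:
        raise ValueError("reference must contain at least one contact with positive area")
    penalty = sum(min(abs(area - model.get(pair, 0.0)), area)
                  for pair, area in reference.items())
    return 1.0 - penalty / total
```

Capping each difference at the reference area is what keeps the score between 0 (no reference contact reproduced) and 1 (all reference contact areas matched) without any distance thresholds.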

15.
The equilibrium properties of an HIV‐1 protease precursor are studied by means of an efficient molecular dynamics scheme that allows simulation of the folding of the protein monomers and their dimerization into an active form, and are compared with those of the mature protein. With atomic detail, the results of the model account for several experimental findings, including the NMR conformation of the mature dimer, the calorimetric properties of the system, the effect of the precursor tail on the dimerization constant, the secondary chemical shifts of the monomer, and the paramagnetic relaxation enhancement data associated with the conformations of the precursor. We find that although the mature protein dimerizes in a single, unique way, the precursor populates several dimeric conformations in which the monomers are always native‐like but their binding can be non‐native. Proteins 2014; 82:633–639. © 2013 Wiley Periodicals, Inc.

16.
Growth of the young is an important part of the life history of birds. However, modelling methods have paid little attention to the choice of regression model used to describe its pattern. The aim of this study was to evaluate whether a single sigmoid model with an upper asymptote could describe avian growth adequately. We compared unified versions of five growth models of the Richards family (the four‐parameter U‐Richards and the three‐parameter U‐logistic, U‐Gompertz, U‐Bertalanffy and U4‐models) for three traits (body mass, tarsus length and wing length) in 50 passerine species, including species with varied morphologies and life histories. The U‐family models share a unified set of parameters. The four‐parameter U‐Richards model proved a good choice for fitting growth curves to various traits: its extra d‐parameter allows flexible placement of the inflection point. Which three‐parameter U‐model performed best varied greatly between species and between traits, because each three‐parameter model has a different fixed relative inflection value (fraction of the upper asymptote), implying a different growth pattern. Fixing the asymptotes to average adult trait values generally shifted the model preference towards models with lower relative inflection values. Our results illustrate an overlooked difficulty in the analysis of organismal growth, namely that no single traditional three‐parameter model suits all growth data. This is mostly due to differences in inflection placement. Moreover, some biometric traits require more attention when estimating growth rates and other growth‐curve characteristics. We recommend fitting either several three‐parameter models from the U‐family, whose parameters are comparable between models, or only the U‐Richards model.
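The "fixed relative inflection value" can be made concrete. In the classic Richards parameterization the shape parameter d fixes where the curve inflects relative to its upper asymptote A, at W_i / A = d^(1/(1 − d)); the three-parameter special cases are d = 2 (logistic, 1/2), d → 1 (Gompertz, 1/e ≈ 0.368) and d = 2/3 (von Bertalanffy, 8/27 ≈ 0.296). The sketch below assumes that the U‐family reparameterization keeps these shapes, as the abstract implies; the U4‐model is left out because its d value is not stated here, and `relative_inflection` is a hypothetical name.

```python
import math


def relative_inflection(d: float) -> float:
    """Fraction of the upper asymptote at which a Richards-family curve inflects.

    Uses W_inf / A = d ** (1 / (1 - d)); d -> 1 is the Gompertz limit, 1/e.
    Only d > 0 is handled in this sketch.
    """
    if d <= 0:
        raise ValueError("shape parameter d must be positive")
    if abs(d - 1.0) < 1e-12:
        return math.exp(-1.0)
    return d ** (1.0 / (1.0 - d))


for name, d in [("logistic", 2.0), ("Gompertz", 1.0), ("von Bertalanffy", 2.0 / 3.0)]:
    print(f"{name:16s} d={d:.3f}  inflection at {relative_inflection(d):.3f} of asymptote")
```

Because each three-parameter model locks d, and therefore the inflection fraction, to a single value, a free d as in the four-parameter U‐Richards model is what lets the inflection point move to match the data.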

17.
Coarse‐grained Go models have been widely used to study protein‐folding mechanisms. Despite their simplicity, these models can reproduce the essential features of a protein's folding process. However, side chains are also known to contribute significantly to the folding mechanism, so it is desirable to incorporate side‐chain effects into a coarse‐grained Go model. In this study, to distinguish the effects of side‐chain orientation and to understand how they contribute to folding mechanisms, we incorporate into a Cα Go model not only heterogeneous contact energies but also geometrical restraints around pairs of Cα atoms in contact with each other. We confirm that the heterogeneity of contact energies governs the folding pathway of a protein and that the geometric restraints attributed to side chains reproduce cooperative transitions in folding. Proteins 2013; 81:1434–1445. © 2013 Wiley Periodicals, Inc.

18.
In the absence of an experimentally determined protein structure, many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge‐based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue‐residue or atom‐atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation‐based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single‐model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa . Proteins 2017; 85:1131–1145. © 2017 Wiley Periodicals, Inc.

19.
A two‐conformation, four‐state model has been proposed to describe protein adsorption and unfolding behavior on hydrophobic interaction chromatography (HIC) resins. In this work, we build on previous studies of the four‐state model and apply it to the effect of salt concentration on the adsorption and unfolding of the model protein α‐lactalbumin in HIC. Contributions of the native and unfolded conformations to the apparent adsorption strength of the wild‐type protein, obtained using a deuterium labeling technique, reveal the free energy change and kinetics of unfolding on the resin and demonstrate that surface unfolding is reversible. Additionally, variants of α‐lactalbumin in which one of the disulfide bonds is reduced were synthesized to examine the effects of conformational stability on apparent retention. Below the melting temperatures of the wild‐type protein and variants, reduction of a single disulfide bond significantly increases the apparent adsorption strength (~6–8 kJ/mol) owing to the protein's decreased stability. Finally, the four‐state model is used to accurately predict the apparent adsorption strength of a disulfide bond‐reduced variant. Biotechnol. Bioeng. 2009;102: 1416–1427. © 2008 Wiley Periodicals, Inc.

20.
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably preferable because it involves specifying an explicit imputation model, FCS‐MI is pragmatically appealing because of its flexibility in handling different types of variables. JM‐MI was developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers for both single-level and multilevel multiple imputation.
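Whichever strategy generates the imputations (JM‐MI or FCS‐MI), the m completed-data analyses are commonly combined with Rubin's rules. As background only, here is a minimal Python sketch of that pooling step; it is not code from jomo, which is an R package, and `pool_rubin` is a hypothetical name.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m imputed-data analyses with Rubin's rules.

    Returns the pooled point estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance and
    B is the between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divides by m - 1)
    return q_bar, w + (1.0 + 1.0 / m) * b
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations, so the pooled variance reflects the extra uncertainty caused by the missing data.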
