Similar Literature
 20 similar articles found (search time: 15 ms)
1.
Knowledge-based potentials are widely used in simulations of protein folding, structure prediction, and protein design. Their advantages include limited computational requirements and the ability to deal with low-resolution protein models compatible with long-scale simulations. Their drawbacks include their dependence on specific features of the dataset from which they are derived, such as the size of the proteins it contains, and the fact that their physical meaning is still a subject of debate. We address these issues by probing the theoretical validity of these potentials as mean-force potentials that take the solvent implicitly into account and involve entropic contributions due to atomic degrees of freedom and solvation. The dependence on the size of the system is checked on distance-dependent amino acid pair potentials, derived from six protein structure sets containing proteins of increasing length N. For large inter-residue distances, they are found to display the theoretically predicted 1/N behavior weighted by a factor depending on the boundaries and the compressibility of the system. For short distances, different trends are observed according to the nature of the residue pairs and their ability to form, for example, electrostatic, cation-pi or pi-pi interactions, or hydrophobic packing. The results of this analysis are used to devise a novel protein size-dependent distance potential, which displays an improved performance in discriminating native sequence-structure matches among decoy models.

2.
In the absence of an experimentally determined protein structure, many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge‐based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue‐residue or atom‐atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation‐based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single‐model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa . Proteins 2017; 85:1131–1145. © 2017 Wiley Periodicals, Inc.

3.
Cheng J, Pei J, Lai L. Biophysical Journal 2007, 92(11): 3868-3877
Statistical potentials have been widely used in protein studies despite their much-debated theoretical basis. In this work, we have applied two physical reference states for deriving the statistical potentials based on protein structure features to achieve zero interaction and orthogonalization. The free-rotating chain-based potential applies a local free-rotating chain reference state, which could theoretically be described by the Gaussian distribution. The self-avoiding chain-based potential applies a reference state derived from a database of artificial self-avoiding backbones generated by Monte Carlo simulation. These physical reference states are independent of known protein structures and are based solely on the analytical formulation or simulation method. The new potentials performed better and yielded higher Z-scores and success rates compared to other statistical potentials. The end-to-end distance distribution produced by the self-avoiding chain model was similar to the distance distribution of protein atoms in the structure database. This fact may partly explain the basis of the reference states that depend on the atom pair frequency observed in the protein database. The current study showed that a more physical reference model improved the performance of statistical potentials in protein fold recognition, which could also be extended to other types of applications.
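To make the role of the reference state concrete, here is a minimal sketch of the standard inverse Boltzmann relation, E(r) = -kT ln(P_obs(r) / P_ref(r)), that distance-dependent statistical potentials of this kind are generally built on. It is not the authors' implementation: the function name, the pseudocount, and the toy bin counts are assumptions made for illustration only.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1e-6):
    """Return one energy per distance bin via the inverse Boltzmann relation.

    observed_counts: pair counts per distance bin in known structures.
    reference_counts: pair counts per bin in the chosen reference state
    (hypothetical numbers here; a real reference would come from, e.g.,
    a chain model or simulated backbones).
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        # The pseudocount avoids log(0) for empty bins (an assumption of this sketch).
        p_obs = obs / obs_total + pseudocount
        p_ref = ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Toy data: four distance bins, uniform reference state.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
energies = statistical_potential(observed, reference)
```

Bins observed less often than in the reference state get a positive (unfavorable) energy, and bins observed more often get a negative one, which is why the choice of reference state directly shapes the potential.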

4.
The atomic-level structural properties of proteins, such as bond lengths, bond angles, and torsion angles, have been well studied and understood based on either chemistry knowledge or statistical analysis. Similar properties at the residue level, such as the distances between two residues and the angles formed by short sequences of residues, can be equally important for structural analysis and modeling, but these have not been examined and documented on a similar scale. While these properties are difficult to measure experimentally, they can be statistically estimated in meaningful ways based on their distributions in known protein structures. Residue-level structural properties including various types of residue distances and angles are estimated statistically. A software package is built to provide direct access to the statistical data for the properties including some important correlations not previously investigated. The distributions of residue distances and angles may vary with varying sequences, but in most cases, are concentrated in some high probability ranges, corresponding to their frequent occurrences in either α-helices or β-sheets. Strong correlations among neighboring residue angles, similar to those between neighboring torsion angles at the atomic level, are revealed based on their statistical measures. Residue-level statistical potentials can be defined using the statistical distributions and correlations of the residue distances and angles. Ramachandran-like plots for strongly correlated residue angles are plotted and analyzed. Their applications to structural evaluation and refinement are demonstrated. With the increase in both number and quality of known protein structures, many structural properties can be derived from sets of protein structures by statistical analysis and data mining, and these can even be used as a supplement to the experimental data for structure determinations.
Indeed, the statistical measures on various types of residue distances and angles provide more systematic and quantitative assessments of these properties, which can otherwise be estimated only individually and qualitatively. Their distributions and correlations in known protein structures show their importance for providing insights into how proteins may fold naturally to various residue-level structures.

5.
The structure of thymidylate synthase (TS) from Escherichia coli was solved from cubic crystals with a = 133 Å grown under reducing conditions at pH 7.0, and refined to R = 22% at 2.1 Å resolution. The structure is compared with that from Lactobacillus casei solved to R = 21% at 2.3 Å resolution. The structures are compared using a difference distance matrix, which identifies a common core of residues that retains the same relationship to one another in both species. After subtraction of the effects of a 50 amino acid insert present in Lactobacillus casei, differences in position of atoms correlate with temperature factors and with distance from the nearest substituted residue. The dependence of structural difference on thermal factor is parameterized and reflects both errors in coordinates that correlate with thermal factor, and the increased width of the energy well in which atoms of high thermal factor lie. The dependence of structural difference on distance from the nearest substitution also depends on thermal factors and shows an exponential dependence with half maximal effect at 3.0 Å from the substitution. This represents the plastic accommodation of the protein which is parameterized in terms of thermal B factor and distance from a mutational change.
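The difference distance matrix mentioned above can be sketched as follows: compute all pairwise distances within each structure, then subtract the two matrices element by element. This is a generic illustration, not the authors' code. It assumes the two coordinate lists are already matched residue for residue, and the toy coordinates and function names are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances within one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference of the two intra-structure distance matrices.

    Entries near zero mark residue pairs whose mutual distance is conserved.
    No superposition is needed, because internal distances do not change
    under rigid rotation or translation.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of matched residues")
    d_a = distance_matrix(coords_a)
    d_b = distance_matrix(coords_b)
    n = len(coords_a)
    return [[d_a[i][j] - d_b[i][j] for j in range(n)] for i in range(n)]

# Toy example: three matched positions; the third one moves in structure B.
structure_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 8.0, 0.0)]
ddm = difference_distance_matrix(structure_a, structure_b)
```

In this toy case the first two positions keep their mutual distance, so `ddm[0][1]` is zero, while every entry involving the third position is nonzero; a conserved core shows up as a block of near-zero entries.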

6.
The goal of controlling protein thermostability is tackled here through establishing, by in silico analyses, the relative weight of residue-residue interactions in proteins as a function of temperature. We have designed for that purpose a (melting-) temperature-dependent, statistical distance potential, where the interresidue distances are computed between the side-chain geometric centers or their functional centers. Their separate derivation from proteins of either high or low thermal resistance reveals the interactions that contribute most to stability in different temperature ranges. Thermostabilizing interactions include salt bridges and cation-π interactions (especially those involving arginine), aromatic interactions, and H-bonds between negatively charged and some aromatic residues. In contrast, H-bonds between two polar noncharged residues or between a polar noncharged residue and a negatively charged residue are relatively less stabilizing at high temperatures. An important observation is that it is necessary to consider both repulsive and attractive interactions in overall thermostabilization, as the degree of repulsion may also vary with temperature. These temperature-dependent potentials are not only useful for the identification of meso- and thermostabilizing pair interactions, but also exhibit predictive power, as illustrated by their ability to predict the melting temperature of a protein based on the melting temperature of homologous proteins.

7.
The dependence of the lateral distribution of membrane proteins on the size, protein/lipid molar ratio, and the magnitude of the interaction potentials has been investigated by computer modeling protein-lipid distributions with Monte Carlo calculations. These results have allowed us to develop a quantitative characterization of the distribution of membrane proteins and to correlate these distributions with experimental observables. The topological arrangement of protein domains, protein plus annular lipid domains, and free lipid domains is described in terms of radial distribution, pair connectedness, and cluster distribution functions. The radial distribution functions are used to measure the distribution of intermolecular distances between protein molecules, whereas the pair connectedness functions are used to estimate the physical extension of compositional domains. It is shown that, at characteristic protein/lipid molar ratios, previously isolated domains become connected, forming domain networks that extend over the entire membrane surface. These changes in the lateral connectivity of compositional domains are paralleled by changes in the calculated lateral diffusion coefficients and might have important implications for the regulation of diffusion controlled processes within the membrane.

8.
Statistical potentials for fold assessment
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of a residue-level statistical potential were optimized, including distance-dependent, contact, Phi/Psi dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of a Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both C(alpha) and C(beta) atoms as interaction centers, distinguishes between all 20 standard residue types, has the distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses C(beta) atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation.
The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for the very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for an optimal use of statistical potentials in fold assessment.
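As a rough sketch of the Z-score criterion used above, the model energy can be expressed in standard deviations from the mean of a set of reference energies. The abstract does not say how the reference distribution is generated, so the reference energies below are hypothetical values and the function name is an assumption of this sketch.

```python
import statistics

def energy_z_score(model_energy, reference_energies):
    """Z-score of a model energy against a set of reference energies.

    A strongly negative value means the model scores much better (lower
    energy) than the reference ensemble.
    """
    mean = statistics.mean(reference_energies)
    std = statistics.pstdev(reference_energies)  # population standard deviation
    if std == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / std

# Hypothetical energies, in arbitrary units.
reference = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = energy_z_score(-20.0, reference)
```

A threshold on this value then separates models assessed as correct from those assessed as incorrect; where to put the threshold is exactly the kind of parameter the study optimizes.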

9.
A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and for theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have recently attracted considerable interest, mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Å). We have shown also that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing α-proteins will only perform best on α-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database. Proteins 31:139–149, 1998. © 1998 Wiley-Liss, Inc.

10.
Post‐translational modifications (PTMs) represent an important regulatory layer influencing the structure and function of proteins. With broader availability of experimental information on the occurrences of different PTM types, the investigation of a potential “crosstalk” between different PTM types and combinatorial effects has moved into the research focus. Hypothesizing that relevant interferences between different PTM types and sites may become apparent when investigating their mutual physical distances, we performed a systematic survey of pairwise homo‐ and heterotypic distances of seven frequent PTM types considering their sequence and spatial distances in resolved protein structures. We found that actual PTM site distance distributions differ from random distributions, with most PTM type pairs exhibiting larger than expected distances, with the exception of homotypic phosphorylation site distances and distances between phosphorylation and ubiquitination sites, which were found to be closer than expected by chance. Random reference distributions considering canonical acceptor amino acid residues only were found to be shifted to larger distances compared to distances between any amino acid residue type, indicating an underlying tendency of PTM‐amenable residue types to be further apart than randomly expected. Distance distributions based on sequence separations were found largely consistent with their spatial counterparts, suggesting a primary role of sequence‐based pairwise PTM‐location encoding rather than folding‐mediated effects. Our analysis provides a systematic and comprehensive overview of the characteristics of pairwise PTM site distances on proteins and reveals that, predominantly, PTM sites tend to avoid close proximity with the potential implication that an independent attachment or removal of PTMs remains possible. Proteins 2016; 85:78–92. © 2016 Wiley Periodicals, Inc.

11.
Statistical potentials are frequently used in protein structure prediction and protein folding for conformational evaluation. Theoretically, to describe the many‐body effect, pairwise interaction between two atom groups should be corrected by their relative geometric orientation. The potential functions developed by this means are called orientation‐dependent statistical potentials and have exhibited substantially improved performance. However, none of the currently available orientation‐dependent statistical potentials use any reference state, which has been proven to greatly enhance the power of distance‐dependent statistical potentials in numerous previous studies. In this work, we designed a reasonable reference state for the orientation‐dependent statistical potentials: using the average geometric relationship between atom pairs in known structures by neglecting their residue identities. The statistical potential developed using this reference state (called ORDER_AVE) outperforms most available rival potentials in a series of tests on the decoy sets, although the information of side chain atoms (except the β‐carbon) is absent in its construction. Proteins 2014; 82:2383–2393. © 2014 Wiley Periodicals, Inc.

12.
Anomalous small angle X-ray scattering can in principle be used to determine distances between metal label species on biological molecules. Previous experimental studies were unable to distinguish the label-label scattering contribution from that of the molecule because they used atomic labels, which contribute only a small proportion of the total scattering signal. However, with the development of nanocrystal labels (of 50–100 atoms) there is the possibility for a renewed attempt at applying anomalous small angle X-ray scattering for distance measurement. This is because their contribution to the scattered signal is necessarily considerably stronger than that of atomic labels. Here we demonstrate, through simulations, the feasibility of the technique to determine the end-to-end distances of labelled nucleic acid molecules as well as other internal distances mimicking a labelled DNA binding protein if the labels are dissimilar metal nanocrystals. Of crucial importance is the ratio of mass of the nanocrystals to that of the labelled macromolecule, as well as the level of statistical errors in the scattering intensity measurements. The mathematics behind the distance determination process is presented, along with a fitting routine that incorporates maximum entropy regularisation.

13.
Knowledge‐based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information‐theoretic method for detecting and quantifying distance‐dependent through‐space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico‐chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge‐based tools for the evaluation of protein structure. Proteins 2014; 82:3450–3465. © 2014 Wiley Periodicals, Inc.

14.
Unfolded proteins attract increasing attention nowadays because of the accumulation of experimental evidence that they play an important role in different biological processes. Therefore, studies of various statistical properties of flexible protein-like polypeptide chains are becoming increasingly important as well. This paper presents distributions (histograms) of distances between atoms of titratable residues for flexible polypeptide chains with various residue compositions and with the hard-spheres potential taken into consideration. The factors influencing the parameters of the obtained histograms have been identified and analyzed. It was found that the sensitivity of the distributions with respect to the internal structure of intermediate residues increases with the number of residues between the considered charged residues. It was shown that branching at C(beta) atoms of the side chains of the intermediate residues is among the most considerable factors influencing the shape of the distance distribution and the average distance between atoms in flexible chains. Despite the model simplicity, the results of the calculations can be applied to systems in which other types of interactions are present, and this was demonstrated for the charge-charge interactions. In particular, it was shown that those interactions have a significant effect on distances between the unlike charges, while such an effect for the like charges is much less pronounced. The comparison of predictions made on the basis of the presented calculations to some experimental data is also given, and possible applications of the theoretical concept described in the paper are discussed.

15.
Radiation hybrid mapping has become an established tool for building physical maps. It represents a powerful way of constructing YAC contigs and high-resolution maps for positional cloning experiments. Ideally, radiation hybrids should not only provide support for the true order of the markers, but also accurate estimates of the physical distances between them. Statistical analysis of radiation hybrids has proved difficult because of the number of parameters (representing the fragment retention probabilities) that must be estimated, and simplifying assumptions are needed to analyze large numbers of markers simultaneously. The ramifications of these assumptions for the calculation of physical distances are investigated. A simple two-locus model is presented to demonstrate that variation in marker retention can lead to distortions in the estimates of distance. Multilocus simulations show that, when marker retention is constant across the chromosome, good estimates of physical distance can be derived using simple models of retention. However, further simulations exploring variable retention schemes demonstrate that significant errors in the estimates of map distances can occur. Ways of minimizing these distortions are discussed.

16.
Solis AD, Rackovsky S. Proteins 2008, 71(3): 1071-1087
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials.
Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
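For readers unfamiliar with the quantity being optimized, the following sketch computes the textbook mutual information of a table of joint counts, in bits. Which two variables the authors tabulate, and how their "information product" is defined, are not given in this abstract, so the two-by-two tables below are purely hypothetical.

```python
import math

def mutual_information(joint_counts):
    """Mutual information (in bits) between the row and column variables
    of a table of joint counts."""
    total = sum(sum(row) for row in joint_counts)
    row_totals = [sum(row) for row in joint_counts]
    col_totals = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count == 0:
                continue  # zero cells contribute nothing to the sum
            p_xy = count / total
            p_x = row_totals[i] / total
            p_y = col_totals[j] / total
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical tables: rows could be a residue-pair class, columns contact / no contact.
independent = [[10, 10], [10, 10]]   # knowing the row tells nothing about the column
dependent = [[50, 0], [0, 50]]       # the row fully determines the column
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

The first table yields zero bits and the second one bit, which brackets the range for two binary variables; a contact definition that carries more fold information would move the value toward the upper end.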

17.
sGAL is a computer program designed to find pairs of sites suitable for introducing chemical cross-links into proteins. sGAL takes a protein structure file in PDB format as input, truncates each residue sequentially to its gamma side chain atom to mimic mutation to Cys, and calculates the exposed surface area of the gamma atom. The user then inputs the minimum and maximum lengths of the cross-linker. sGAL provides as output pairs of residues that would have exposed gamma atom separations that fall within this range. Furthermore, if a line joining the pair of gamma atoms contacts more than a given number of buried atoms, that pair is discarded. In this way, sites for which the protein would sterically interfere with cross-linking are avoided. AVAILABILITY: http://www.chem.utoronto.ca/staff/GAW/links.html (Surface Racer is also required; see http://monte.biochem.wisc.edu/~tsodikov/surface.html).
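The distance-range step described above can be illustrated with a short sketch. This is not sGAL's code: it covers only the filter on gamma atom separation, leaves out the surface-exposure and steric-interference checks, and assumes the gamma atom coordinates have already been extracted from the PDB file. The residue numbers and coordinates are hypothetical.

```python
import math
from itertools import combinations

def crosslink_candidates(gamma_atoms, min_length, max_length):
    """Return (residue_a, residue_b, separation) for every residue pair whose
    gamma atoms lie within the cross-linker length range.

    gamma_atoms: dict mapping residue number -> (x, y, z) of its gamma atom.
    Lengths and coordinates are in the same unit (angstroms here).
    """
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(gamma_atoms.items()), 2):
        separation = math.dist(xyz_a, xyz_b)
        if min_length <= separation <= max_length:
            pairs.append((res_a, res_b, separation))
    return pairs

# Hypothetical gamma atom positions for three residues.
gamma_atoms = {10: (0.0, 0.0, 0.0), 25: (6.0, 0.0, 0.0), 40: (0.0, 12.0, 0.0)}
candidates = crosslink_candidates(gamma_atoms, min_length=5.0, max_length=8.0)
```

With these inputs only residues 10 and 25 qualify, at a separation of 6 Å. A fuller version would first drop residues whose gamma atom is buried and then discard pairs whose connecting line passes through too many buried atoms, as the abstract describes.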

18.
Lu H, Lu L, Skolnick J. Biophysical Journal 2003, 84(3): 1895-1901
A residue-based and a heavy atom-based statistical pair potential are developed for use in assessing the strength of protein-protein interactions. To ensure the quality of the potentials, a nonredundant, high-quality dimer database is constructed. The protein complexes in this dataset are checked by a literature search to confirm that they form multimers, and the pairwise amino acid preference to interact across a protein-protein interface is analyzed and pair potentials constructed. The performance of the residue-based potentials is evaluated by using four jackknife tests and by assessing the potentials' ability to select true protein-protein interfaces from false ones. Compared to potentials developed for monomeric protein structure prediction, the interdomain potential performs much better at distinguishing protein-protein interactions. The potential developed from homodimer interfaces is almost the same as that developed from heterodimer interfaces with a correlation coefficient of 0.92. The residue-based potential is well suited for genomic scale protein interaction prediction and analysis, such as in a recently developed threading-based algorithm, MULTIPROSPECTOR. However, the more time-consuming atom-based potential performs better in identifying near-native structures from docking generated decoys.

19.
Residue contacts in protein structures and implications for protein folding
The preferential association of amino acid side groups with specific side chain atoms is examined in 44 known protein structures. The resulting association potentials among residue side groups are used to detect structural homology in proteins displaying little or no homology in their primary sequences. Suggestions are also made regarding the nature of the protein folding process. They are based on statistical observations that delineate the extent of short and long range interactions and that display side group bias in association with other side chain atoms on their N-terminal side.

20.
Here we perform a systematic exploration of the use of distance constraints derived from small angle X-ray scattering (SAXS) measurements to filter candidate protein structures for the purpose of protein structure prediction. This is an intrinsically more complex task than that of applying distance constraints derived from NMR data where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, on the other hand, yields a histogram of pair distances (pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse grained approach to ab initio protein structure prediction to generate a candidate set of C(alpha) backbones. In spite of the lack of specific residue information inherent in the SAXS data, our study shows that the implementation of a SAXS filter is capable of effectively purifying the set of native structure candidates and thus provides a substantial improvement in the reliability of protein structure prediction. We test the quality of our predicted C(alpha) backbones by doing structural homology searches against the Dali domain library, and find that the results are very encouraging. In spite of the lack of local structural details and limited modeling accuracy at the C(alpha) backbone level, we find that useful information about fold classification can be extracted from this procedure. This approach thus provides a way to use a SAXS data based structure prediction algorithm to generate potential structural homologies in cases where lack of sequence homology prevents identification of candidate folds for a given protein. Thus our approach has the potential to help in determination of the biological function of a protein based on structural homology instead of sequence homology.
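The idea of filtering candidates against a pair distance histogram can be sketched as follows. This is a simplified illustration rather than the authors' filter: it bins the C(alpha)-C(alpha) distances of a candidate backbone into a normalized histogram and scores it against a target histogram by a sum of squared differences. The bin width, the scoring function, and the toy coordinates are assumptions of this sketch.

```python
import math
from itertools import combinations

def pair_distance_histogram(coords, bin_width, n_bins):
    """Normalized histogram of all pairwise distances in a backbone.

    Pair identities are discarded, mirroring the fact that a SAXS pair
    distribution function does not say which residues contribute to a bin.
    """
    counts = [0] * n_bins
    for a, b in combinations(coords, 2):
        index = int(math.dist(a, b) // bin_width)
        if index < n_bins:  # distances beyond the last bin are ignored
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def histogram_mismatch(candidate, target):
    """Sum of squared differences between two histograms (lower is better)."""
    return sum((c - t) ** 2 for c, t in zip(candidate, target))

# Toy backbone: four C(alpha) atoms on a straight line, 3.8 angstroms apart.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
histogram = pair_distance_histogram(backbone, bin_width=4.0, n_bins=3)
score = histogram_mismatch(histogram, histogram)
```

In a filtering run, each candidate backbone would be scored against the histogram derived from the measured scattering curve, and only the lowest-scoring candidates would be kept.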
