Similar articles
1.
Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1–0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.
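As a rough illustration of the packing-density measure mentioned above, the following sketch counts neighbours within a distance cutoff and rank-correlates the result with a per-site variability score. The coordinates, the 9 Å cutoff, the variability values, and the choice of a Spearman rank correlation are all assumptions made for the example; the abstract does not specify them.

```python
import math

def contact_numbers(coords, cutoff=10.0):
    """For each residue, count the other residues within `cutoff` angstroms."""
    return [
        sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= cutoff)
        for i, a in enumerate(coords)
    ]

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy data: five residues on a line, 4 angstroms apart.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
          (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
variability = [0.9, 0.5, 0.1, 0.5, 0.9]  # made-up per-site variation scores

cn = contact_numbers(coords, cutoff=9.0)
rho = spearman(cn, variability)
print(cn, rho)
```

In this toy case the most densely packed site is also the least variable, so the correlation is strongly negative; the real correlations reported in the abstract are much weaker.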

2.
3.
4.
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging from 70.6% to 73.9%, depending on the radius adopted to discriminate contacts (6–12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, which always assigns an amino acid to the largest class, and are 4-7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/.
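The two-state setup and the majority-class baseline described above can be sketched as follows. The RSA values, the 25% threshold, and the "predictions" are invented for illustration and have nothing to do with the CONpro or ACCpro servers.

```python
def binarize(values, threshold):
    """Label a residue 1 if its value exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most frequent class."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)

def accuracy(predicted, actual):
    """Fraction of residues whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical relative solvent accessibilities for eight residues.
rsa = [0.05, 0.40, 0.22, 0.10, 0.60, 0.18, 0.02, 0.35]
labels = binarize(rsa, threshold=0.25)

# Hypothetical output of some two-state predictor.
predicted = [0, 1, 0, 0, 1, 1, 0, 1]

baseline = majority_baseline_accuracy(labels)
gain = accuracy(predicted, labels) - baseline
print(labels, baseline, gain)
```

Reporting the gain over the baseline matters because the classes are unbalanced at most thresholds, so raw accuracy alone can look better than it is.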

5.
《Journal of molecular biology》2019,431(12):2320-2330
Short insertions and deletions (InDels) are a common type of mutation found in nature and a useful source of variation in protein engineering. InDel events have important consequences in protein evolution, often opening new pathways for adaptation. However, much less is known about the effects of InDels compared to point mutations and amino acid substitutions. In particular, deep mutagenesis studies on the distribution of fitness effects of mutations have focused almost exclusively on amino acid substitutions. Here, we present a near-comprehensive analysis of the fitness effects of single amino acid InDels in TEM-1 β-lactamase. While we found InDels to be largely deleterious, partially overlapping deletion-tolerant and insertion-tolerant regions were observed throughout the protein, especially in unstructured regions and at the end of helices. The signal sequence of TEM-1 tolerated InDels more than the mature protein. Most regions of the protein tolerated insertions more than deletions, but a few regions tolerated deletions more than insertions. We examined the relationship between InDel tolerance and a variety of measures to help understand its origin. These measures included evolutionary variation in β-lactamases, secondary structure identity, tolerance to amino acid substitutions, solvent accessibility, and side-chain weighted contact number. We found secondary structure, weighted contact number, and evolutionary variation in class A β-lactamases to be somewhat predictive of InDel fitness effects.

6.
We developed a model of macromolecular interfaces based on the Voronoi diagram and the related alpha-complex, and we tested its properties on a set of 96 protein-protein complexes taken from the Protein Data Bank. The Voronoi model provides a natural definition of the interfaces, and it yields values of the number of interface atoms and of the interface area that have excellent correlation coefficients with those of the classical model based on solvent accessibility. Nevertheless, some atoms that do not lose solvent accessibility are part of the interface defined by the Voronoi model. The Voronoi model provides robust definitions of the curvature and of the connectivity of the interfaces, and leads to estimates of these features that generally agree with other approaches. Our implementation of the model allows an analysis of protein-water contacts that highlights the role of structural water molecules at protein-protein interfaces.

7.
8.
An essential step in understanding the molecular basis of protein-protein interactions is the accurate identification of inter-protein contacts. We evaluate a number of common methods used in analyzing protein-protein interfaces: a Voronoi polyhedra-based approach, changes in solvent accessible surface area (ΔSASA), and various radial cutoffs (closest atom, Cβ, and centroid). First, we compare the Voronoi polyhedra-based analysis with ΔSASA and show that using Voronoi polyhedra finds knob-in-hole contacts. To compare the accuracy of the Voronoi polyhedra-based approach and the various radial cutoff methods, two sets of data were used: a small set of 75 experimental mutants and a larger one of 592 structures of protein-protein interfaces. In an assessment using the small set, the Voronoi polyhedra-based method, a solvent accessible surface area method, and the closest atom radial method identified 100% of the direct contacts defined by mutagenesis data, but only the Voronoi polyhedra-based method found no false positives. The other radial methods were not able to find all of the direct contacts even using a cutoff of 9 Å. With the larger set of structures, we compared the overall number of contacts using the Voronoi polyhedra-based method as a standard. All the radial methods using a 6-Å cutoff identified more interactions, but these putative contacts included many false positives, and the methods still missed many true contacts (false negatives). While radial cutoffs are quicker to calculate and to implement, this result highlights why radial cutoff methods do not have the proper resolution to detail the non-homogeneous packing within protein interfaces, and suggests an inappropriate bias in pair-wise contact potentials. Of the radial cutoff methods, the closest atom approach is the best approximation to the more intensive Voronoi calculation. Our version of the Voronoi polyhedra-based method, QContacts, is available at .
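A minimal sketch of the closest-atom radial cutoff method named above is given below; it is not the QContacts implementation. The residue identifiers, atom coordinates, and the dictionary-based chain representation are hypothetical.

```python
import math

def closest_atom_contacts(chain_a, chain_b, cutoff=6.0):
    """Return residue pairs whose closest inter-atomic distance is <= cutoff.

    Each chain maps a residue id to a list of (x, y, z) atom positions
    in angstroms.
    """
    contacts = []
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            d_min = min(math.dist(p, q) for p in atoms_a for q in atoms_b)
            if d_min <= cutoff:
                contacts.append((res_a, res_b))
    return contacts

# Hypothetical two-chain toy system.
chain_a = {"A10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "A11": [(0.0, 10.0, 0.0)]}
chain_b = {"B5": [(5.0, 0.0, 0.0)], "B6": [(20.0, 0.0, 0.0)]}

tight = closest_atom_contacts(chain_a, chain_b, cutoff=6.0)
loose = closest_atom_contacts(chain_a, chain_b, cutoff=12.0)
print(tight)
print(loose)
```

Widening the cutoff adds the A11/B5 pair even though the two residues are more than 11 Å apart, which is the kind of putative contact a purely radial criterion cannot distinguish from a direct one.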

9.
A three-dimensional Voronoi tessellation of folded proteins is used to analyze geometrical and topological properties of a set of proteins. Each amino acid is assigned a central point surrounded by a Voronoi cell. Voronoi cells describe the packing of the amino acids. Special attention is given to reproduction of the protein surface. Once the Voronoi cells are built, many tools from geometrical analysis can be applied to investigate the protein structure; volume of cells, number of faces per cell, and number of sides per face are the usual signatures of the protein structure. A distinct difference between faces related to primary, secondary, and tertiary structures has been observed. Faces threaded by the main chain have on average more than six edges, whereas those related to helical packing of the amino acid chain have fewer than five edges. The faces on the protein surface have on average five edges within 1% error. The average number of faces on the protein surface for a given type of amino acid brings a new point of view to the characterization of exposure to the solvent and the classification of amino acids as hydrophilic or hydrophobic. It may be a convenient tool for model validation.

10.
MOTIVATION: Geometric representations of proteins and ligands, including atom volumes, atom-atom contacts and solvent accessible surfaces, can be used to characterize interactions between and within proteins, ligands and solvent. Voronoi algorithms permit quantification of these properties by dividing structures into cells with a one-to-one correspondence with constituent atoms. As there is no generally accepted measure of atom-atom contacts, a continuous analytical representation of inter-atomic contacts will be useful. Improved geometric algorithms will also be helpful in increasing the speed and accuracy of iterative modeling algorithms. RESULTS: We present computational methods based on the Voronoi procedure that provide rapid and exact solutions to solvent accessible surfaces, volumes, and atom contacts within macromolecules. Furthermore, we define a measure of atom-atom contact that is consistent with the calculation of solvent accessible surfaces, allowing the integration of solvent accessibility and inter-atomic contacts into a continuous measure. The speed and accuracy of the algorithm is compared to existing methods for calculating solvent accessible surfaces and volumes. The presented algorithm has a reduced execution time and greater accuracy compared to numerical and approximate analytical surface calculation algorithms, and a reduced execution time and similar accuracy to existing Voronoi procedures for calculating atomic surfaces and volumes.

11.
Protein structure prediction methods typically use statistical potentials, which rely on statistics derived from a database of known protein structures. In the vast majority of cases, these potentials involve pairwise distances or contacts between amino acids or atoms. Although some potentials beyond pairwise interactions have been described, the formulation of a general multibody potential is seen as intractable due to the perceived limited amount of data. In this article, we show that it is possible to formulate a probabilistic model of higher order interactions in proteins, without arbitrarily limiting the number of contacts. The success of this approach is based on replacing a naive table-based approach with a simple hierarchical model involving suitable probability distributions and conditional independence assumptions. The model captures the joint probability distribution of an amino acid and its neighbors, local structure and solvent exposure. We show that this model can be used to approximate the conditional probability distribution of an amino acid sequence given a structure using a pseudo-likelihood approach. We verify the model by decoy recognition and site-specific amino acid predictions. Our coarse-grained model is compared to state-of-the-art methods that use full atomic detail. This article illustrates how the use of simple probabilistic models can lead to new opportunities in the treatment of nonlocal interactions in knowledge-based protein structure prediction and design. Proteins 2013; 81:1340–1350. © 2013 Wiley Periodicals, Inc.

12.
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein–protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have uncovered that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker’s yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). In addition, we find that the RSA of a residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than its RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor, but independent contributions from other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
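The ΔRSA measure above can be illustrated with a short sketch. It assumes that RSA is the accessible surface area divided by a per-residue maximum and that ΔRSA is taken as monomer RSA minus complex RSA; the sign convention, the residue names, and all numbers (including the normalization value of 200 Å²) are hypothetical.

```python
def relative_solvent_accessibility(asa, max_asa):
    """RSA: accessible surface area divided by the residue's maximum ASA."""
    return asa / max_asa

def delta_rsa(asa_monomer, asa_complex, max_asa):
    """Change in RSA upon binding (assumed convention: monomer minus complex)."""
    return (relative_solvent_accessibility(asa_monomer, max_asa)
            - relative_solvent_accessibility(asa_complex, max_asa))

# Hypothetical residues: (id, ASA in monomer, ASA in complex), in square angstroms.
residues = [("L45", 120.0, 30.0), ("K46", 80.0, 80.0)]
MAX_ASA = 200.0  # hypothetical normalization value, not a published scale

changes = {rid: delta_rsa(mono, cplx, MAX_ASA) for rid, mono, cplx in residues}
interface = [rid for rid, change in changes.items() if change > 0.0]
print(changes, interface)
```

A residue that loses accessibility on binding gets a positive ΔRSA and is flagged as interfacial; a residue whose exposure is unchanged gets zero.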

13.
Protein-protein crystal-packing contacts.
Protein-protein contacts in monomeric protein crystal structures have been analyzed and compared to the physiological protein-protein contacts in oligomerization. A number of features differentiate the crystal-packing contacts from the natural contacts occurring in multimeric proteins. The area of the protein surface patches involved in packing contacts is generally smaller and its amino acid composition is indistinguishable from that of the protein surface accessible to the solvent. The fraction of protein surface in crystal contacts is very variable and independent of the number of packing contacts. The thermal motion at the crystal packing interface is intermediate between that of the solvent-accessible surface and that of the protein core, even for large packing interfaces, though the tendency is to be closer to that of the core. These results suggest that protein crystallization depends on random protein-protein interactions, which have little in common with physiological protein-protein recognition processes, and that the possibility of engineering macromolecular crystallization to improve crystal quality could be widened.

14.
To adequately deal with the inherent complexity of interactions between protein side-chains, we develop and describe here a novel method for characterizing protein packing within a fold family. Instead of approaching side-chain interactions absolutely from one residue to another, we instead consider the relative interactions of contacting residue pairs. The basic element, the pair-wise relative contact, is constructed from a sequence alignment and contact analysis of a set of structures and consists of a cluster of similarly oriented, interacting, side-chain pairs. To demonstrate this construct's usefulness in analyzing protein structure, we used the pair-wise relative contacts to analyze two sets of protein structures as defined by SCOP: the diverse globin-like superfamily (126 structures) and the more uniform heme binding globin family (a 94 structure subset of the globin-like superfamily). The superfamily structure set produced 1266 unique pair-wise relative contacts, whereas the family structure subset gave 1001 unique pair-wise relative contacts. For both sets, we show that these constructs can be used to accurately and automatically differentiate between fold classes. Furthermore, these pair-wise relative contacts correlate well with sequence identity and thus provide a direct relationship between changes in sequence and changes in structure. To capture the complexity of protein packing, these pair-wise relative contacts can be superimposed around a single residue to create a multi-body construct called a relative packing group. Construction of convex hulls around the individual packing groups provides a measure of the variation in packing around a residue and defines an approximate volume of space occupied by the groups interacting with a residue. We find that these relative packing groups are useful in understanding the structural quality of sequence or structure alignments. 
Moreover, they provide context to calculate a value for structural randomness, which is important in properly assessing the quality of a structural alignment. The results of this study provide the framework for future analysis for correlating sequence changes to specific structure changes.

15.
Wang JY, Lee HM, Ahmad S 《Proteins》2007,68(1):82-91
A number of methods for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins have been developed. These methods predict either regularly spaced states of relative solvent accessibility or an analogue real value indicating relative solvent accessibility. While discrete states of exposure can be easily obtained by post-prediction assignment of thresholds to the predicted or computed real values of ASA, the reverse, that is, obtaining a real value from quantized states of predicted ASA, is not straightforward, as a two-state prediction in such cases would give large real-valued errors. However, predicting ASA in a larger number of ASA states and then finding a corresponding scheme for real value prediction may be helpful in integrating the two approaches of ASA prediction. We report a novel method of obtaining numerical real values of solvent accessibility, using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete states of ASA of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The resulting performance of this approach using 13-state ASA prediction is at least comparable with the best methods of ASA prediction reported so far. The mean absolute error in this method reaches the best performance of 15.1% on the tested data set of 502 proteins, with a coefficient of correlation equal to 0.66. Since the method starts with the prediction of discrete states of ASA and leads to real value predictions, performance of prediction in binary states and real values are simultaneously optimized.
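The two directions discussed above, thresholding a real value into a state and mapping a state back onto a real value, can be sketched as below. This is not the SVM-Cabins scheme: it assumes 13 equal-width bins on a 0–100% scale and maps each state to its bin midpoint, and the predicted states and true values are invented.

```python
def value_to_state(value, n_states=13, scale=100.0):
    """Assign a real-valued accessibility (0..scale) to one of n equal-width bins."""
    width = scale / n_states
    return min(int(value / width), n_states - 1)

def state_to_value(state, n_states=13, scale=100.0):
    """Map a discrete state back to a real value: the midpoint of its bin."""
    width = scale / n_states
    return (state + 0.5) * width

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted states and true relative accessibilities (percent).
predicted_states = [0, 6, 12]
true_values = [5.0, 50.0, 90.0]

predicted_values = [state_to_value(s) for s in predicted_states]
mae = mean_absolute_error(predicted_values, true_values)
print(predicted_values, mae)
```

With only two states the midpoints would sit at 25% and 75%, so even a perfect state prediction could be off by up to 25 points; using more states shrinks that quantization error, which is the motivation the abstract gives for a 13-state prediction.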

16.
To investigate the evolutionary impact of protein structure, the experimentally determined tertiary structure and the protein-coding DNA sequence were collected for each of 1,195 genes. These genes were studied via a model of sequence change that explicitly incorporates effects on evolutionary rates due to protein tertiary structure. In the model, these effects act via the solvent accessibility environments and pairwise amino acid interactions that are induced by tertiary structure. To compare the hypotheses that structure does and does not have a strong influence on evolution, Bayes factors were estimated for each of the 1,195 sequences. Most of the Bayes factors strongly support the hypothesis that protein structure affects protein evolution. Furthermore, both solvent accessibility and pairwise interactions among amino acids are inferred to have important roles in protein evolution. Our results also indicate that the strength of the relationship between tertiary structure and evolution has a weak but real correlation to the annotation information in the Gene Ontology database. Although their influences on rates of evolution vary among protein families, we find that the mean impacts of solvent accessibility and pairwise interactions are about the same.

17.
Afonnikov D. A., Morozov A. V., Kolchanov N. A. 《Biophysics》2008,51(1):56-60

The profile of contact numbers of amino acid residues in proteins contains important information about the protein structure and is connected with the accessibility of residues to solvent. Here we propose a method for predicting the profile of contact numbers of residues in protein from its amino acid sequence. The method is based on regression using a neural network algorithm. The algorithm predicts two types of profiles, namely, the total number of contacts and the number of close contacts with the neighbors in the chain. The Pearson coefficient of correlation between the actual and predicted values of total contact numbers amounted to 0.526–0.703. As for the number of close contacts, this coefficient was higher (0.662–0.743) for all the considered threshold contact distances (6, 8, 10, and 12 Å). The program for prediction of contact numbers CONNP is available at http://wwwmgs2.bionet.nsc.ru/reloaded.
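The evaluation metric quoted above, the Pearson correlation between actual and predicted contact-number profiles, can be computed as in this sketch. Both profiles are invented toy data and do not come from CONNP.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical contact-number profiles along a five-residue chain.
actual = [4, 7, 9, 6, 3]
predicted = [5, 6, 8, 7, 4]

r = pearson(actual, predicted)
print(round(r, 3))
```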


18.
In the structural models determined by X-ray crystallography, contacts between molecules can be divided into two categories: biologically relevant contacts and crystal packing contacts. With the growth in the number and quality of available structures with large crystal packing contacts, distinguishing crystal packing contacts from biologically relevant contacts remains a difficult task, and errors can lead to wrong interpretation of structural models. In this study, we performed a systematic analysis of biologically relevant contacts and crystal packing contacts. The results reveal that biologically relevant contacts are more tightly packed than crystal packing contacts. This property of biologically relevant contacts may contribute to the formation of their interfacial core region. Meanwhile, the differences between the core and surface regions in amino acid composition and evolutionary measures are more dramatic for biologically relevant contacts than for crystal packing contacts, and these differences appear to be useful in distinguishing the two categories of contacts. On the basis of the features derived from our analysis, we developed a random forest model to classify biologically relevant contacts and crystal packing contacts. Our method achieves a high area under the receiver operating characteristic curve of 0.923 in 5-fold cross-validation and accuracies of 91.4% and 91.7% on two different test sets. Moreover, in a comparison study, our model outperforms other existing methods, such as DiMoVo, Pita, Pisa, and Eppic. We believe that this study will be useful in the validation of oligomeric proteins and protein complexes. The model and all data used in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/bio‐cry.zip . Proteins 2014; 82:3090–3100. © 2014 Wiley Periodicals, Inc.
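Assuming the figure of 0.923 quoted above refers to the area under the ROC curve, the metric can be computed from classifier scores as the probability that a randomly chosen positive outranks a randomly chosen negative. The sketch below shows only this metric, not the authors' random forest; the scores and labels are invented.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores; label 1 = biologically relevant contact,
# label 0 = crystal packing contact.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(auc)
```

The pairwise form is quadratic in the number of interfaces, which is fine for a toy example; a rank-based formula would be preferable for large test sets.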

19.
We have recently shown that the weighted contact number profiles (or the packing density profiles) of proteins are well correlated with the corresponding sequence conservation profiles. The results suggest that a protein structure may contain sufficient information about sequence conservation comparable to that derived from multiple homologous sequences. However, there are ambiguities concerning how to compute the packing density of a subunit of a protein complex. For a subunit of a complex, there are two ways to compute its packing density: one including the packing contributions of the other subunits and the other excluding their contributions. Here we selected two sets of enzyme complexes. Set A contains complexes with the active sites comprising residues from multiple subunits, while Set B contains those with the active sites residing on single subunits. In Set A, if the packing density profile of a subunit is computed considering the contributions of the other subunits of the complex, it will agree better with the sequence conservation profile. But in Set B the situation is reversed. The results may be due to stronger functional and structural constraints on the evolution of the complexes of Set A than on those of Set B to maintain the enzymatic functions of the complexes. The comparison of the packing density and the sequence conservation profiles may provide a simple yet potentially useful way to understand the structural and evolutionary couplings between the subunits of protein complexes. Proteins 2013; 81:1192–1199. © 2013 Wiley Periodicals, Inc.
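The two ways of computing a subunit's packing density can be sketched as follows, assuming the commonly used definition of the weighted contact number as the sum of inverse squared distances to all other residues. The definition is an assumption here (the abstract does not state it), and the coordinates are toy values.

```python
def weighted_contact_number(targets, environment):
    """WCN of each target point: sum of 1/r^2 over the environment points.

    Points at zero distance (the residue itself) are skipped.
    """
    profile = []
    for t in targets:
        total = 0.0
        for e in environment:
            r2 = sum((a - b) ** 2 for a, b in zip(t, e))
            if r2 > 0.0:
                total += 1.0 / r2
        profile.append(total)
    return profile

# Hypothetical residue positions (angstroms) for two subunits of a complex.
subunit_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
subunit_b = [(0.0, 2.0, 0.0)]

# Packing density of subunit A computed in isolation ...
isolated = weighted_contact_number(subunit_a, subunit_a)
# ... and including the contributions of the other subunit.
in_complex = weighted_contact_number(subunit_a, subunit_a + subunit_b)
print(isolated, in_complex)
```

Residues near the subunit interface gain the most from including the partner, so the choice of environment mainly changes the profile at interfacial positions, which is where the comparison with sequence conservation differs between the two sets.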

20.
Markovian models of protein evolution that relax the assumption of independent change among codons are considered. With this comparatively realistic framework, an evolutionary rate at a site can depend both on the state of the site and on the states of surrounding sites. By allowing a relatively general dependence structure among sites, models of evolution can reflect attributes of tertiary structure. To quantify the impact of protein structure on protein evolution, we analyze protein-coding DNA sequence pairs with an evolutionary model that incorporates effects of solvent accessibility and pairwise interactions among amino acid residues. By explicitly considering the relationship between nonsynonymous substitution rates and protein structure, this approach can lead to refined detection and characterization of positive selection. Analyses of simulated sequence pairs indicate that parameters in this evolutionary model can be well estimated. Analyses of lysozyme c and annexin V sequence pairs yield the biologically reasonable result that amino acid replacement rates are higher when the replacements lead to energetically favorable proteins than when they destabilize the proteins. Although the focus here is evolutionary dependence among codons that is associated with protein structure, the statistical approach is quite general and could be applied to diverse cases of evolutionary dependence where surrogates for sequence fitness can be measured or modeled.
