Similar articles
 20 similar articles found (search time: 15 ms)
1.
Background: Atomic force microscopy (AFM) is an experimental technique for studying structure-function relationships of biomolecules. AFM provides images of biomolecules at nanometer resolution, and high-speed AFM experiments produce a series of images that follow the dynamics of biomolecules. To further understand biomolecular functions, information on three-dimensional (3D) structures is beneficial. Method: We aim to recover 3D information from an AFM image by computational modeling. The AFM image includes only a low-resolution representation of a molecule; therefore we represent the structures by a coarse-grained model (Gaussian mixture model). Using Monte Carlo sampling, candidate models are generated to increase the similarity between AFM images simulated from the models and the target AFM image. Results: The algorithm was tested on two proteins to model their conformational transitions. Using a simulated AFM image as reference, the algorithm can produce a low-resolution 3D model of the target molecule. The effect of the molecular orientations captured in AFM images on 3D modeling performance was also examined, and similar accuracy can be obtained for many orientations. Conclusions: The proposed algorithm can generate low-resolution 3D protein models, from which conformational transitions observed in AFM images can be interpreted in more detail. General significance: High-speed AFM experiments allow direct observation of biomolecules in action, which provides insights into biomolecular function through dynamics. However, as only partial structural information can be obtained from AFM data, this new AFM-based hybrid modeling method would be useful for retrieving 3D information on the entire biomolecule.
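
The sampling loop described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `render` function sums 2D Gaussians instead of simulating a real AFM height image, similarity is taken to be a Pearson correlation, and the names `render`, `correlation`, and `monte_carlo_fit` are hypothetical rather than taken from the paper.

```python
import math
import random


def render(centers, sigma, grid):
    """Toy pseudo-image: sum of 2D Gaussians evaluated at each grid point."""
    image = []
    for gx, gy in grid:
        value = 0.0
        for cx, cy in centers:
            d2 = (gx - cx) ** 2 + (gy - cy) ** 2
            value += math.exp(-d2 / (2.0 * sigma ** 2))
        image.append(value)
    return image


def correlation(a, b):
    """Pearson correlation between two equal-length pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def monte_carlo_fit(target, centers, sigma, grid, steps=200,
                    step_size=0.5, temperature=0.01, seed=0):
    """Metropolis search that moves one Gaussian centre per step."""
    rng = random.Random(seed)
    current = [list(c) for c in centers]
    score = correlation(render(current, sigma, grid), target)
    best, best_score = [c[:] for c in current], score
    for _ in range(steps):
        trial = [c[:] for c in current]
        i = rng.randrange(len(trial))
        trial[i][0] += rng.gauss(0.0, step_size)
        trial[i][1] += rng.gauss(0.0, step_size)
        trial_score = correlation(render(trial, sigma, grid), target)
        delta = trial_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0.0 or rng.random() < math.exp(delta / temperature):
            current, score = trial, trial_score
            if score > best_score:
                best, best_score = [c[:] for c in current], score
    return best, best_score
```

The function returns the best-scoring model seen during the run, so its score can never fall below that of the starting model.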

2.
A genetic algorithm-based computational method for the ab initio phasing of diffraction data from crystals of symmetric macromolecular structures, such as icosahedral viruses, has been implemented and applied to authentic data from the P1/Mahoney strain of poliovirus. Using only single-wavelength native diffraction data, the method is shown to be able to generate correct phases, and thus electron density, to 3.0 Å resolution. Beginning with no advance knowledge of the shape of the virus and only approximate knowledge of its size, the method uses a genetic algorithm to determine coarse, low-resolution (here, 20.5 Å) models of the virus that obey the known non-crystallographic symmetry (NCS) constraints. The best scoring of these models are subjected to refinement and NCS-averaging, with subsequent phase extension to high resolution (3.0 Å). Initial difficulties in phase extension were overcome by measuring and including all low-resolution terms in the transform. With the low-resolution data included, the method was successful in generating essentially correct phases and electron density to 6.0 Å in every one of ten trials from different models identified by the genetic algorithm. Retrospective analysis revealed that these correct high-resolution solutions converged from a range of significantly different low-resolution phase sets (average differences of 59.7 degrees below 24 Å). This method represents an efficient way to determine phases for icosahedral viruses, and has the advantage of producing phases free from model bias. It is expected that the method can be extended to other protein systems with high NCS.

3.
Single-particle cryo-electron microscopy is widely used to study the structure of macromolecular assemblies. Tens of thousands of noisy two-dimensional images of the macromolecular assembly viewed from different directions are used to infer its three-dimensional structure. The first step is to estimate a low-resolution initial model and initial image orientations. This is a challenging global optimization problem with many unknowns, including an unknown orientation for each two-dimensional image. Obtaining a good initial model is crucial for the success of the subsequent refinement step. We introduce a probabilistic algorithm for estimating an initial model. The algorithm is fast, has very few algorithmic parameters, and yields information about the precision of estimated model parameters in addition to the parameters themselves. Our algorithm uses a pseudo-atomic model to represent the low-resolution three-dimensional structure, with isotropic Gaussian components as moveable pseudo-atoms. This leads to a significant reduction in the number of parameters needed to represent the three-dimensional structure, and a simplified way of computing two-dimensional projections. It also contributes to the speed of the algorithm. We combine the estimation of the unknown three-dimensional structure and image orientations in a Bayesian framework. This ensures that there are very few parameters to set, and specifies how to combine different types of prior information about the structure with the given data in a systematic way. To estimate the model parameters we use Markov chain Monte Carlo sampling. The advantage is that instead of just obtaining point estimates of model parameters, we obtain an ensemble of models revealing the precision of the estimated parameters. We demonstrate the algorithm on both simulated and real data.
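
The "simplified way of computing two-dimensional projections" follows from a standard property: integrating an isotropic 3D Gaussian along the viewing axis gives an isotropic 2D Gaussian with the same width, centred at the projected position. The sketch below shows this under stated assumptions; the function names are hypothetical and it is not the authors' implementation.

```python
import math


def rotate(point, rotation):
    """Apply a 3x3 rotation matrix (nested lists) to a 3D point."""
    return tuple(sum(rotation[r][c] * point[c] for c in range(3)) for r in range(3))


def rot_x(angle):
    """Rotation matrix about the x axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def project(centers, rotation):
    """Rotate pseudo-atom centres, then drop z to project along the viewing axis."""
    return [rotate(p, rotation)[:2] for p in centers]


def projected_density(x, y, centers_2d, sigma, weights):
    """Projected mixture density: each 3D Gaussian becomes a 2D Gaussian of equal sigma."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for (cx, cy), w in zip(centers_2d, weights):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += w * norm * math.exp(-d2 / (2.0 * sigma ** 2))
    return total
```

Because only the centres are rotated, a projection costs one matrix-vector product per pseudo-atom and needs no numerical integration over a voxel grid.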

4.
We predicted structures for all seven targets in the CAPRI experiment using a new method in development at the time of the challenge. The technique includes a low-resolution rigid body Monte Carlo search followed by high-resolution refinement with side-chain conformational changes and rigid body minimization. Decoys (approximately 10^6 per target) were discriminated using a scoring function including van der Waals and solvation interactions, hydrogen bonding, residue-residue pair statistics, and rotamer probabilities. Decoys were ranked, clustered, manually inspected, and selected. The top ranked model for target 6 predicted the experimental structure to 1.5 Å RMSD and included 48 of 65 correct residue-residue contacts. Target 7 was predicted at 5.3 Å RMSD with 22 of 37 correct residue-residue contacts using a homology model from a known complex structure. Using a preliminary version of the protocol in round 1, target 1 was predicted within 8.8 Å although few contacts were correct. For targets 2 and 3, the interface locations and a small fraction of the contacts were correctly identified.

5.
The structural stability of a protein requires a large number of interresidue interactions. The energetic contribution of these can be approximated by low-resolution force fields extracted from known structures, based on observed amino acid pairing frequencies. The summation of such energies, however, cannot be carried out for proteins whose structure is not known or for intrinsically unstructured proteins. To overcome these limitations, we present a novel method for estimating the total pairwise interaction energy, based on a quadratic form in the amino acid composition of the protein. This approach is validated by the good correlation of the estimated and actual energies of proteins of known structure and by a clear separation of folded and disordered proteins in the energy space it defines. As the novel algorithm has not been trained on unstructured proteins, it substantiates the concept of protein disorder, i.e. that the inability to form a well-defined 3D structure is an intrinsic property of many proteins and protein domains. This property is encoded in their sequence, because their biased amino acid composition does not allow sufficient stabilizing interactions to form. By limiting the calculation to a predefined sequential neighborhood, the algorithm was turned into a position-specific scoring scheme that characterizes the tendency of a given amino acid to fall into an ordered or disordered region. This application we term IUPred and compare its performance with three generally accepted predictors, PONDR VL3H, DISOPRED2 and GlobPlot on a database of disordered proteins.
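
A quadratic form in the amino acid composition can be written as a double sum over composition fractions, E = sum_ij f_i * P_ij * f_j. The sketch below is illustrative only: the two-letter alphabet and the values in `TOY_MATRIX` are made up, and the real energy predictor matrix and any normalisation used by IUPred are not reproduced here.

```python
from collections import Counter


def composition(sequence):
    """Fraction of each residue type in the sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def pairwise_energy_estimate(sequence, pair_matrix):
    """Quadratic form sum_ij f_i * P_ij * f_j over the composition f."""
    freqs = composition(sequence)
    return sum(fi * pair_matrix[(a, b)] * fj
               for a, fi in freqs.items()
               for b, fj in freqs.items())


# Hypothetical two-letter alphabet and made-up energies, for illustration only.
TOY_MATRIX = {("A", "A"): -1.0, ("A", "B"): 0.5, ("B", "A"): 0.5, ("B", "B"): -2.0}
```

Applying the same function to a sliding window around each residue, rather than to the whole sequence, would give a position-specific score of the kind the abstract describes.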

6.
7.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.
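
For reference, the Smith-Waterman baseline that this abstract compares against can be sketched as a short dynamic program. This is the textbook local-alignment recurrence with a linear gap penalty and simple match/mismatch scores (both simplifying assumptions; protein work normally uses a substitution matrix and affine gaps). It is not the hierarchical algorithm proposed by the authors.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a local alignment restart anywhere.
            current[j] = max(0, diag, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best
```

The full table is filled regardless of how alignable a region is, which is the cost the abstract attributes to time spent in unalignable regions.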

8.
Shape information about macromolecules is increasingly available but is difficult to use in modeling efforts. We demonstrate that shape information alone can often distinguish structural models of biological macromolecules. By using a data structure called a surface envelope (SE) to represent the shape of the molecule, we propose a method that generates a fitness score for the shape of a particular molecular model. This score correlates well with root mean squared deviation (RMSD) of the model to the known test structures and can be used to filter models in decoy sets. The scoring method requires both alignment of the model to the SE in three-dimensional space and assessment of the degree to which atoms in the model fill the SE. Alignment combines a hybrid algorithm using principal components and a previously published iterated closest point algorithm. We test our method against models generated from random atom perturbation from crystal structures, published decoy sets used in structure prediction, and models created from the trajectories of atoms in molecular modeling runs. We also test our alignment algorithm against experimental electron microscopic data from rice dwarf virus. The alignment performance is reliable, and we show a high correlation between model RMSD and score function. This correlation is stronger for molecular models with greater oblong character (as measured by the ratio of largest to smallest principal component).

9.
We recently developed the Rosetta algorithm for ab initio protein structure prediction, which generates protein structures from fragment libraries using simulated annealing. The scoring function in this algorithm favors the assembly of strands into sheets. However, it does not discriminate between different sheet motifs. After generating many structures using Rosetta, we found that the folding algorithm predominantly generates very local structures. We surveyed the distribution of beta-sheet motifs with two edge strands (open sheets) in a large set of non-homologous proteins. We investigated how much of that distribution can be accounted for by rules previously published in the literature, and developed a filter and a scoring method that enables us to improve protein structure prediction for beta-sheet proteins. Proteins 2002;48:85-97.

10.
Yang YD  Park C  Kihara D 《Proteins》2008,73(3):581-596
Optimizing weighting factors for a linear combination of terms in a scoring function is a crucial step for success in developing a threading algorithm. Usually weighting factors are optimized to yield the highest success rate on a training dataset, and the determined constant values for the weighting factors are used for any target sequence. Here we explore completely different approaches to handle weighting factors for a scoring function of threading. Throughout this study we use a model system of gapless threading using a scoring function with two terms combined by a weighting factor, a main chain angle potential and a residue contact potential. First, we demonstrate that the optimal weighting factor for recognizing the native structure differs from target sequence to target sequence. Then, we present three novel threading methods which circumvent training dataset-based weighting factor optimization. The basic idea of the three methods is to employ different weighting factor values and finally select a template structure for a target sequence by examining characteristics of the distribution of scores computed by using the different weighting factor values. Interestingly, the success rate of our approaches is comparable to the conventional threading method where the weighting factor is optimized based on a training dataset. Moreover, when the size of the training set available for the conventional threading method is small, our approach often performs better. In addition, we predict a target-specific weighting factor optimal for a target sequence by an artificial neural network from features of the target sequence. Finally, we show that our novel methods can be used to assess the confidence of prediction of a conventional threading with an optimized constant weighting factor by considering consensus prediction between them. Implications for the underlying energy landscape of protein folding are discussed.

11.
Bowman GR  Pande VS 《Proteins》2009,74(3):777-788
Rosetta is a structure prediction package that has been employed successfully in numerous protein design and other applications [1]. Previous reports have attributed the current limitations of the Rosetta de novo structure prediction algorithm to inadequate sampling, particularly during the low-resolution phase [2-5]. Here, we implement the Simulated Tempering (ST) sampling algorithm [6,7] in Rosetta to address this issue. ST is intended to yield canonical sampling by inducing a random walk in temperature space such that broad sampling is achieved at high temperatures and detailed exploration of local free energy minima is achieved at low temperatures. ST should therefore visit basins in accordance with their free energies rather than their energies and achieve more global sampling than the localized scheme currently implemented in Rosetta. However, we find that ST does not improve structure prediction with Rosetta. To understand why, we carried out a detailed analysis of the low-resolution scoring functions and find that they do not provide a strong bias towards the native state. In addition, we find that both ST and standard Rosetta runs started from the native state are biased away from the native state. Although the low-resolution scoring functions could be improved, we propose that working entirely at full-atom resolution is now possible and may be a better option due to superior native-state discrimination at full-atom resolution. Such an approach will require more attention to the kinetics of convergence, however, as functions capable of native state discrimination are not necessarily capable of rapidly guiding non-native conformations to the native state.
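
The random walk in temperature space can be sketched with the standard simulated tempering acceptance rule, in which a move from inverse temperature beta_m to beta_n at fixed energy E is accepted with probability min(1, exp(-(beta_n - beta_m) * E + (g_n - g_m))), where g are per-temperature weights. This is the general textbook rule, not code from Rosetta, and the function names are hypothetical.

```python
import math
import random


def temperature_move_probability(energy, beta_from, beta_to, weight_from, weight_to):
    """Metropolis acceptance probability for changing temperature at fixed energy."""
    log_ratio = -(beta_to - beta_from) * energy + (weight_to - weight_from)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def attempt_temperature_move(energy, index, betas, weights, rng):
    """Propose a neighbouring temperature level and return the new level index."""
    proposal = index + rng.choice((-1, 1))
    if proposal < 0 or proposal >= len(betas):
        return index  # reject proposals that fall off the temperature ladder
    p = temperature_move_probability(energy, betas[index], betas[proposal],
                                     weights[index], weights[proposal])
    return proposal if rng.random() < p else index
```

In practice the weights are tuned so that every temperature level is visited roughly equally often; poorly chosen weights trap the walk at one end of the ladder.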

12.
The rise in the number of functionally uncharacterized protein structures is increasing the demand for structure-based methods for functional annotation. Here, we describe a method for predicting the location of a binding site of a given type on a target protein structure. The method begins by constructing a scoring function, followed by a Monte Carlo optimization, to find a good scoring patch on the protein surface. The scoring function is a weighted linear combination of the z-scores of various properties of protein structure and sequence, including amino acid residue conservation, compactness, protrusion, convexity, rigidity, hydrophobicity, and charge density; the weights are calculated from a set of previously identified instances of the binding-site type on known protein structures. The scoring function can easily incorporate different types of information useful in localization, thus increasing the applicability and accuracy of the approach. To test the method, 1008 known protein structures were split into 20 different groups according to the type of the bound ligand. For nonsugar ligands, such as various nucleotides, binding sites were correctly identified in 55%-73% of the cases. The method is completely automated (http://salilab.org/patcher) and can be applied on a large scale in a structural genomics setting.
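
A weighted linear combination of z-scores can be sketched as follows. The property names, values, and weights in any call are assumptions for illustration; the published method derives its weights from known binding sites, which is not reproduced here, and `z_scores` and `patch_scores` are hypothetical names.

```python
import statistics


def z_scores(values):
    """Standardise raw property values across all candidate patches."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


def patch_scores(properties, weights):
    """Weighted sum of per-property z-scores for each candidate patch.

    properties maps a property name to one raw value per patch;
    weights maps the same property names to their weights.
    """
    n = len(next(iter(properties.values())))
    totals = [0.0] * n
    for name, values in properties.items():
        for i, z in enumerate(z_scores(values)):
            totals[i] += weights[name] * z
    return totals
```

Standardising each property first puts quantities with different units, such as conservation and charge density, on a common scale before they are weighted.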

13.
Huang SY  Zou X 《Proteins》2011,79(9):2648-2661
In this study, we have developed a statistical mechanics-based iterative method to extract statistical atomic interaction potentials from known, nonredundant protein structures. Our method circumvents the long-standing reference state problem in deriving traditional knowledge-based scoring functions, by using rapid iterations through a physical, global convergence function. The rapid convergence of this physics-based method, unlike other parameter optimization methods, warrants the feasibility of deriving distance-dependent, all-atom statistical potentials to keep the scoring accuracy. The derived potentials, referred to as ITScore/Pro, have been validated using three diverse benchmarks: the high-resolution decoy set, the AMBER benchmark decoy set, and the CASP8 decoy set. Significant improvement in performance has been achieved. Finally, comparisons between the potentials of our model and potentials of a knowledge-based scoring function with a randomized reference state have revealed the reason for the better performance of our scoring function, which could provide useful insight into the development of other physical scoring functions. The potentials developed in this study are generally applicable for structural selection in protein structure prediction.

14.
Kawabata T 《Biophysical journal》2008,95(10):4643-4658
Recently, electron microscopy measurement of single particles has enabled us to reconstruct a low-resolution 3D density map of large biomolecular complexes. If structures of the complex subunits can be solved by x-ray crystallography at atomic resolution, fitting these models into the 3D density map can generate an atomic resolution model of the entire large complex. The fitting of multiple subunits, however, generally requires large computational costs; therefore, development of an efficient algorithm is required. We developed a fast fitting program, “gmfit”, which employs a Gaussian mixture model (GMM) to represent approximated shapes of the 3D density map and the atomic models. A GMM is a distribution function composed by adding together several 3D Gaussian density functions. Because our model analytically provides an integral of a product of two distribution functions, it enables us to quickly calculate the fitness of the density map and the atomic models. Using the integral, two types of potential energy function are introduced: the attraction potential energy between a 3D density map and each subunit, and the repulsion potential energy between subunits. The restraint energy for symmetry is also employed to build symmetrical oligomeric complexes. To find the optimal configuration of subunits, we randomly generated initial configurations of subunit models, and performed a steepest-descent method using forces and torques of the three potential energies. Comparison between an original density map and its GMM showed that the required number of Gaussian distribution functions for a given accuracy depended on both resolution and molecular size. We then performed test fitting calculations for simulated low-resolution density maps of atomic models of homodimer, trimer, and hexamer, using different search parameters. The results indicated that our method was able to rebuild atomic models of a complex even for maps of 30 Å resolution if sufficient numbers (eight or more) of Gaussian distribution functions were employed for each subunit, and the symmetric restraints were assigned for complexes with more than three subunits. As a more realistic test, we tried to build an atomic model of the GroEL/ES complex by fitting 21-subunit atomic models into the 3D density map obtained by cryoelectron microscopy using the C7 symmetric restraints. A model with low root mean-square deviations (14.7 Å) was obtained as the lowest-energy model, showing that our fitting method was reasonably accurate. Inclusion of other restraints from biological and biochemical experiments could further enhance the accuracy.
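
The analytic integral of a product of two distribution functions is what makes a GMM representation fast to score. As a minimal sketch, assuming isotropic components for simplicity (the general anisotropic case needs full covariance matrices), the overlap of two normalised 3D Gaussians has the closed form (2π(σ1² + σ2²))^(-3/2) · exp(-|μ1 - μ2|² / (2(σ1² + σ2²))). The function names below are hypothetical and this is not the gmfit source code.

```python
import math


def gaussian_overlap(mu1, sigma1, mu2, sigma2):
    """Integral over 3D space of the product of two normalised isotropic Gaussians."""
    var = sigma1 ** 2 + sigma2 ** 2
    d2 = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    return (2.0 * math.pi * var) ** -1.5 * math.exp(-d2 / (2.0 * var))


def gmm_overlap(gmm_a, gmm_b):
    """Overlap of two mixtures, each a list of (weight, centre, sigma) tuples."""
    return sum(wa * wb * gaussian_overlap(ma, sa, mb, sb)
               for wa, ma, sa in gmm_a
               for wb, mb, sb in gmm_b)
```

The cost is proportional to the product of the component counts of the two mixtures, independent of the number of voxels in the density map.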

15.
Bayesian adaptive sequence alignment algorithms   Cited by: 3 total, 1 self-citation, 2 by others
The selection of a scoring matrix and gap penalty parameters continues to be an important problem in sequence alignment. We describe here an algorithm, the Bayes block aligner, which bypasses this requirement. Instead of requiring a fixed set of parameter settings, this algorithm returns the Bayesian posterior probability for the number of gaps and for the scoring matrices in any series of interest. Furthermore, instead of returning the single best alignment for the chosen parameter settings, this algorithm returns the posterior distribution of all alignments considering the full range of gapping and scoring matrices selected, weighing each in proportion to its probability based on the data. We compared the Bayes aligner with the popular Smith-Waterman algorithm with parameter settings from the literature which had been optimized for the identification of structural neighbors, and found that the Bayes aligner correctly identified more structural neighbors. In a detailed examination of the alignment of a pair of kinase and a pair of GTPase sequences, we illustrate the algorithm's potential to identify subsequences that are conserved to different degrees. In addition, this example shows that the Bayes aligner returns an alignment-free assessment of the distance between a pair of sequences.

16.
17.
Zhiqiang Yan  Jin Wang 《Proteins》2015,83(9):1632-1642
The solvation effect is an important factor for protein–ligand binding in aqueous solution. Previous scoring functions for protein–ligand interactions rarely incorporate a solvation model into the quantification of protein–ligand interactions, mainly due to the immense computational cost, especially in structure-based virtual screening, and the nontransferable application of independently optimized atomic solvation parameters. In order to overcome these barriers, we effectively combine knowledge-based atom–pair potentials and the atomic solvation energy of a charge-independent implicit solvent model in the optimization of binding affinity and specificity. The resulting scoring function with optimized atomic solvation parameters is named specificity and affinity with solvation effect (SPA-SE). The performance of SPA-SE is evaluated and compared to 20 other scoring functions, as well as SPA. The comparative results show that SPA-SE outperforms all other scoring functions in binding affinity prediction and “native” pose identification. Our optimization validates that the solvation effect is an important regulator of the stability and specificity of protein–ligand binding. The development strategy of SPA-SE sets an example for other scoring functions to account for the solvation effect in biomolecular recognition. Proteins 2015; 83:1632–1642. © 2015 Wiley Periodicals, Inc.

18.
Pairs of helices in transmembrane (TM) proteins are often tightly packed. We present a scoring function and a computational methodology for predicting the tertiary fold of a pair of alpha-helices such that its chances of being tightly packed are maximized. Since the number of TM protein structures solved to date is small, it seems unlikely that a reliable scoring function derived statistically from the known set of TM protein structures will be available in the near future. We therefore constructed a scoring function based on the qualitative insights gained in the past two decades from the solved structures of TM and soluble proteins. In brief, we reward the formation of contacts between small amino acid residues such as Gly, Cys, and Ser, that are known to promote dimerization of helices, and penalize the burial of large amino acid residues such as Arg and Trp. As a case study, we show that our method predicts the native structure of the TM homodimer glycophorin A (GpA) to be, in essence, at the global score optimum. In addition, by correlating our results with empirical point mutations on this homodimer, we demonstrate that our method can be a helpful adjunct to mutation analysis. We present a data set of canonical alpha-helices from the solved structures of TM proteins and provide a set of programs for analyzing it (http://ashtoret.tau.ac.il/~sarel). From this data set we derived 11 helix pairs, and conducted searches around their native states as a further test of our method. Approximately 73% of our predictions showed a reasonable fit (RMS deviation <2 Å) with the native structures compared to the success rate of 8% expected by chance. The search method we employ is less effective for helix pairs that are connected via short loops (<20 amino acid residues), indicating that short loops may play an important role in determining the conformation of alpha-helices in TM proteins.

19.
Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

20.
Specification of the three-dimensional structure of a protein from its amino acid sequence, also called a “Grand Challenge” problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades based on development of scoring functions (e.g. mimicking free energy) that can capture native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. A scoring function must be fast enough in discriminating the native from unfolded/misfolded structures, and requires validation on large data sets to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that detects the true native structure in the top 5 with 93% accuracy from an ensemble of candidate structures. If we eliminate the native from the ensemble of decoys, then pcSM is able to capture a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems comprising 142,698 decoys (public and CASP; the largest reported hitherto in the literature). The average rank for the native is 2.38, a significant improvement over that existing in the literature. In-silico protein structure prediction requires robust scoring techniques. Therefore, pcSM is easily amenable to integration into a successful protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
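
The evaluation metrics quoted in this abstract (native in the top 5, average native rank) can be computed as in the sketch below. It assumes that a lower score means a better structure, which is a convention chosen here and not stated in the abstract, and the function names are hypothetical.

```python
def native_rank(scores, native_index):
    """1-based rank of the native structure when lower scores are better."""
    native_score = scores[native_index]
    return 1 + sum(1 for s in scores if s < native_score)


def top_k_rate(ranks, k):
    """Fraction of systems whose native structure ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def average_rank(ranks):
    """Mean native rank across all systems."""
    return sum(ranks) / len(ranks)
```

For example, native ranks of 1, 2, 6 and 1 over four systems give a top-5 rate of 0.75 and an average rank of 2.5.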

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号