Similar literature
20 similar articles found.
1.
The importance of RNA tertiary structure is evident from the growing number of published high-resolution NMR and X-ray crystallographic structures of RNA molecules. These structures provide insights into function and create a knowledge base that is leveraged by programs such as Assemble, ModeRNA, RNABuilder, NAST, FARNA, Mc-Sym, RNA2D3D, and iFoldRNA for tertiary structure prediction and design. While these methods sample native-like RNA structures during simulations, all struggle to capture the native RNA conformation after scoring. We propose RSIM, an improved RNA fragment assembly method that preserves RNA global secondary structure while sampling conformations. This approach enhances the quality of predicted RNA tertiary structure, provides insights into the native state dynamics, and generates a powerful visualization of the RNA conformational space. RSIM is available for download from http://www.github.com/jpbida/rsim.

2.
Zhao F, Li S, Sterner BW, Xu J. Proteins 2008, 73(1):228-240
Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling is a major bottleneck of ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler built on a probabilistic graphical model, Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that, using a simple set of features, CRFSampler can generate decoys of much higher quality than the most recent HMM model.

3.
We suggest a new approach to the generation of candidate structures (decoys) for ab initio prediction of protein structures. Our method is based on random sampling of conformation space and subsequent local energy minimization. At the core of this approach lies the design of a novel type of energy function. This energy function has local minima with native structure characteristics and wide basins of attraction. The current work presents our motivation for deriving such an energy function and also tests the derived energy function. Our approach is novel in that it takes advantage of the inherently rough energy landscape of proteins, which is generally considered a major obstacle for protein structure prediction. When local minima have wide basins of attraction, the protein's conformation space can be greatly reduced by the convergence of large regions of the space into single points, namely the local minima corresponding to these funnels. We have implemented this concept by an iterative process. The potential is first used to generate decoy sets and then we study these sets of decoys to guide further development of the potential. A key feature of our potential is the use of cooperative multi-body interactions that mimic the role of the entropic and solvent contributions to the free energy. The validity and value of our approach are demonstrated by applying it to 14 diverse, small proteins. We show that, for these proteins, the size of conformation space is considerably reduced by the new energy function. In fact, the reduction is so substantial as to allow efficient conformational sampling. As a result we are able to find a significant number of near-native conformations in random searches performed with limited computational resources.
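The idea in this abstract, that random starting points followed by local minimization collapse wide basins of attraction into single points, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the one-dimensional cosine landscape, the function names, and the step size are hypothetical and are not the energy function or the sampling procedure of the paper.

```python
import math
import random


def energy(x):
    # Hypothetical rough 1-D landscape with one minimum at every integer.
    return 1.0 - math.cos(2.0 * math.pi * x)


def gradient(x):
    # Analytical derivative of the toy energy above.
    return 2.0 * math.pi * math.sin(2.0 * math.pi * x)


def minimize(x, step=0.01, iterations=2000):
    # Plain gradient descent: slides x to the bottom of its basin.
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def sample_minima(n_samples, low, high, seed=0):
    # Random starts in [low, high], each followed by local minimization.
    # The set of distinct end points is the reduced "conformation space".
    rng = random.Random(seed)
    minima = set()
    for _ in range(n_samples):
        start = rng.uniform(low, high)
        minima.add(round(minimize(start), 3))
    return minima


if __name__ == "__main__":
    print(sorted(sample_minima(50, 0.0, 5.0)))
```

In this toy setting a continuous interval of starting points reduces to at most six distinct minima, which is the kind of reduction the abstract describes for a landscape with wide basins.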

4.
5.
A 3D model of RNA structure can provide information about its function and regulation that is not possible with just the sequence or secondary structure. Current models suffer from low accuracy and long running times and either neglect or presume knowledge of the long-range interactions which stabilize the tertiary structure. Our coarse-grained, helix-based, tertiary structure model operates with only a few degrees of freedom compared with all-atom models while preserving the ability to sample tertiary structures given a secondary structure. It strikes a balance between the precision of an all-atom tertiary structure model and the simplicity and effectiveness of a secondary structure representation. It provides a simplified tool for exploring global arrangements of helices and loops within RNA structures. We provide an example of a novel energy function relying only on the positions of stems and loops. We show that coupling our model to this energy function produces predictions as good as or better than the current state-of-the-art tools. We propose that, given the wide range of conformational space that needs to be explored, a coarse-grained approach can explore more conformations in fewer iterations than an all-atom model coupled to a fine-grained energy function. Finally, we emphasize the overarching theme of providing an ensemble of predicted structures, something at which our tool excels, rather than providing a handful of the lowest energy structures.

6.
There are several knowledge-based energy functions that can distinguish the native fold from a pool of grossly misfolded decoys for a given sequence of amino acids. These decoys, which are typically generated by mounting, or “threading”, the sequence onto the backbones of unrelated protein structures, tend to be non-compact and quite different from the native structure: the root-mean-squared (RMS) deviations from the native are commonly in the range of 15 to 20 Å. Effective energy functions should also demonstrate a similar recognition capability when presented with compact decoys that depart only slightly in conformation from the correct structure (i.e. those with RMS deviations of ∼5 Å or less). Recently, we developed a simple yet powerful method for native fold recognition based on the tendency for native folds to form hydrophobic cores. Our energy measure, which we call the hydrophobic fitness score, is challenged to recognize the native fold from 2000 near-native structures generated for each of five small monomeric proteins. First, 1000 conformations for each protein were generated by molecular dynamics simulation at room temperature. The average RMS deviation of this set of 5000 was 1.5 Å. A total of 323 decoys had energies lower than native; however, none of these had RMS deviations greater than 2 Å. Another 1000 structures were generated for each at high temperature, in which a greater range of conformational space was explored (4.3 Å average RMS deviation). Out of this set, only seven decoys were misrecognized. The hydrophobic fitness energy of a conformation is strongly dependent upon the RMS deviation. On average our potential yields energy values which are lowest for the population of structures generated at room temperature, intermediate for those produced at high temperature and highest for those constructed by threading methods. In general, the lowest energy decoy conformations have backbones very close to native structure. The possible utility of our method for screening backbone candidates for the purpose of modelling by side-chain packing optimization is discussed.
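The RMS deviations quoted in this abstract measure how far a decoy is from the native structure. A minimal sketch of the calculation follows, assuming the two structures have the same atoms in the same order and have already been superposed; the function name `rmsd` and the tuple-based coordinate format are hypothetical choices for illustration, and the optimal superposition step that such comparisons normally include is deliberately left out.

```python
import math


def rmsd(coords_a, coords_b):
    # Root-mean-squared deviation between two equally long lists of
    # (x, y, z) tuples. Assumes the structures are already superposed.
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    decoy = [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0)]
    print(rmsd(native, decoy))
```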

7.
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state-of-the-art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.

8.
9.
The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and samples mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend on the free energy of the structures, RNAmutants has difficulty sampling mutant sequences with low G+C content. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C content. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence-structure maps of sizes up to 40 nucleotides. Our results indicate that the G+C content has a strong influence on the size and shape of the evolutionarily accessible sequence and structural spaces. In particular, we show that low G+C content favors the appearance of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C content significantly reduces the size of the evolutionarily accessible mutational landscapes.
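The sampling bias described in this abstract follows from how Boltzmann weights depend on free energy: candidates with lower free energy receive exponentially larger probabilities. The sketch below is only an illustration of that weighting over a small explicit list of energies. It is not the RNAmutants algorithm; the function names, the example energies, and the value used for RT (roughly 0.616 kcal/mol near 37 °C) are assumptions.

```python
import math
import random

RT = 0.616  # assumed value in kcal/mol, approximately R*T near 37 degrees C


def boltzmann_probabilities(energies, rt=RT):
    # p_i is proportional to exp(-E_i / RT). Subtracting the minimum
    # energy first keeps the exponentials numerically stable.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def sample_index(energies, rng, rt=RT):
    # Draw one candidate index according to its Boltzmann probability.
    probs = boltzmann_probabilities(energies, rt)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical free energies (kcal/mol) for three candidate mutants.
    energies = [-12.0, -9.0, -3.0]
    print(boltzmann_probabilities(energies))
    print(sample_index(energies, random.Random(0)))
```

With energies like these, almost all of the probability mass falls on the most stable candidate, so less stable candidates are rarely drawn unless the sampling is reweighted.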

10.
MOTIVATION: This paper investigates the sequence-structure specificity of a representative knowledge-based energy function by applying it to threading at the level of secondary structures of proteins. Assessing the strengths and weaknesses of an energy function at this fundamental level provides more detailed and insightful information than at the tertiary structure level, and the results obtained can be useful in tertiary level threading. RESULTS: We threaded each of the 293 non-redundant proteins onto the secondary structures contained in its respective native protein (host template). We also used 68 pairs of proteins with similar folds and low sequence identity. For each pair, we threaded the sequence of one protein onto the secondary structures of the other protein. The discerning power of the total energy function and its one-body, pairwise, and mutation components is studied. We then applied our energy function to a recent study which demonstrated how a designed 11-amino acid sequence can replace distinct segments (one segment is an alpha-helix, the other is a beta-sheet) of a protein without changing its fold. We conducted random mutations of the designed sequence to determine the patterns for favorable mutations. We also studied the sequence-structure specificity at the boundaries of a secondary structure. Finally, we demonstrated how to speed up tertiary level threading by filtering out alignments found to be energetically unfavorable during the secondary structure threading. AVAILABILITY: The program is available on request from the authors. CONTACT: xud@ornl.gov

11.
Kolodny R, Levitt M. Biopolymers 2003, 68(3):278-285
A small set of protein fragments can represent adequately all known local protein structure. This set of fragments, along with a construction scheme that assembles these fragments into structures, defines a discrete (relatively small) conformation space, which approximates protein structures accurately. We generate protein decoys by sampling geometrically valid structures from this conformation space, biased by the secondary structure prediction for the protein. Unlike other methods, secondary structure prediction is the only protein-specific information used for generating the decoys. Nevertheless, these decoys are qualitatively similar to those found by others. The method works well for all-alpha proteins, and shows promising results for alpha and beta proteins.

12.
MOTIVATION: Due to the importance of considering secondary structures in aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems for the multiple alignment step remain: first, how to combine pairwise sequence-structure alignments into a multiple alignment, and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) taking into consideration both the primary and the secondary structures. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low energy conformations. The libraries are then processed by the T-Coffee system, which is a consistency-based multiple alignment method. Furthermore, we are able to extract a consensus sequence and a consensus structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.

13.
Huang SW, Hwang JK. Proteins 2005, 59(4):802-809
A complete protein sequence can usually determine a unique conformation; however, the situation is different for shorter subsequences: some of them are able to adopt unique conformations, independent of context, while others assume diverse conformations in different contexts. The conformations of subsequences are determined by the interplay between local and nonlocal interactions. A quantitative measure of such structural conservation or variability will be useful in the understanding of the sequence-structure relationship. In this report, we developed an approach using the support vector machine method to compute the conformational variability directly from sequences, which is referred to as the sequence structural entropy. As a practical application, we studied the relationship between sequence structural entropy and the hydrogen exchange for a set of well-studied proteins. We found that the slowest exchange cores usually comprise amino acids of the lowest sequence structural entropy. Our results indicate that structural conservation is closely related to the local structural stability. This relationship may have interesting implications in the protein folding processes, and may be useful in the study of the sequence-structure relationship.

14.
A novel method of parameter optimization is proposed. It makes use of large sets of decoys generated for six nonhomologous proteins with different architecture. Parameter optimization is achieved by creating a free energy gap between sets of nativelike and nonnative conformations. The method is applied to optimize the parameters of a physics-based scoring function consisting of the all-atom ECEPP05 force field coupled with an implicit solvent model (a solvent-accessible surface area model). The optimized force field is able to discriminate near-native from nonnative conformations of the six training proteins when used either for local energy minimization or for short Monte Carlo simulated annealing runs after local energy minimization. The resulting force field is validated with an independent set of six nonhomologous proteins, and appears to be transferable to proteins not included in the optimization; i.e., for five out of the six test proteins, decoys with 1.7- to 4.0-Å all-heavy-atom root mean-square deviations emerge as those with the lowest energy. In addition, we examined the set of misfolded structures created by Park and Levitt using a four-state reduced model. The results from these additional calculations confirm the good discriminative ability of the optimized force field obtained with our decoy sets.

15.
The most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA secondary structures with pseudoknots and present a Markov chain Monte Carlo method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context, because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples with tmRNA and also with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.
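Markov chain Monte Carlo sampling of the kind this abstract mentions can be illustrated with a generic Metropolis sampler over a handful of discrete states. This is a sketch under stated assumptions and not McQFold: the states stand in for candidate structures, the unnormalized weights stand in for posterior probabilities, and the function name and the "step to a neighbouring index" proposal are hypothetical.

```python
import random


def metropolis_chain(weights, n_steps, seed=0):
    # Metropolis sampling over states 0..n-1 arranged on a ring.
    # weights[i] is an unnormalized target probability; the start state
    # (index 0) must have a positive weight.
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    chain = []
    for _ in range(n_steps):
        # Symmetric proposal: move one step left or right on the ring.
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, w_proposal / w_current).
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        chain.append(state)
    return chain


if __name__ == "__main__":
    chain = metropolis_chain([1.0, 3.0, 1.0], 10000)
    # Visit frequencies approximate the normalized weights for long chains.
    print([chain.count(s) / len(chain) for s in range(3)])
```

Because the sampler returns a whole chain rather than a single optimum, the spread of visited states gives a direct view of how uncertain the prediction is, which is the advantage of sampling that the abstract argues for.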

16.
In CAPRI rounds 6-12, RosettaDock successfully predicted 2 of 5 unbound-unbound targets to medium accuracy. Improvement over the previous method was achieved with computational mutagenesis to select decoys that match the energetics of experimentally determined hot spots. In the case of Target 21, Orc1/Sir1, this resulted in a successful docking prediction where RosettaDock alone or with simple site constraints failed. Experimental information also helped limit the interacting region of TolB/Pal, producing a successful prediction of Target 26. In addition, we docked multiple loop conformations for Target 20, and we developed a novel flexible docking algorithm to simultaneously optimize backbone conformation and rigid-body orientation to generate a wide diversity of conformations for Target 24. Continued challenges included docking of homology targets that differ substantially from their template (sequence identity <50%) and accounting for large conformational changes upon binding. Despite a larger number of unbound-unbound and homology model binding targets, Rounds 6-12 reinforced that RosettaDock is a powerful algorithm for predicting bound complex structures, especially when combined with experimental data.

17.
18.
Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold, to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured in RMS, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than any previous investigations, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme. This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40% sequence identity, whereas broad functional class is conserved to approximately 25%. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g. P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/align. Copyright 2000 Academic Press.

19.
Lee J, Kim SY, Joo K, Kim I, Lee J. Proteins 2004, 56(4):704-714
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us with the best model-one structure for the target T0161.

20.
The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences are associated with Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.
