Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
There are two customary ways of predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter alternative, applying the maximum likelihood principle to determine a grammar's probabilities. In this paper, building on such a stochastic model, we will analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules. Even if the parameters of our grammar are chosen with respect to structural properties of native molecules only (and therefore, independent of molecules' free energy), we prove formulae for the expected minimum free energy and the corresponding variance as functions of the molecule's size which perfectly fit the native behavior of free energies. This gives proof for a high quality of our stochastic model making it a handy tool for further investigations. In fact, the stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.

2.
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.

3.
Hausmann NZ, Znosko BM. Biochemistry 2012, 51(26): 5359-5368
To better elucidate RNA structure-function relationships and to improve the design of pharmaceutical agents that target specific RNA motifs, an understanding of RNA primary, secondary, and tertiary structure is necessary. The prediction of RNA secondary structure from sequence is an intermediate step in predicting RNA three-dimensional structure. RNA secondary structure is typically predicted using a nearest neighbor model based on free energy parameters. The current free energy parameters for 2 × 3 nucleotide loops are based on a 23-member data set of 2 × 3 loops and internal loops of other sizes. A database of representative RNA secondary structures was searched to identify 2 × 3 nucleotide loops that occur in nature. Seventeen of the most frequent 2 × 3 nucleotide loops in this database were studied by optical melting experiments. Fifteen of these loops melted in a two-state manner, and the associated experimental ΔG°(37,2×3) values are, on average, 0.6 and 0.7 kcal/mol different from the values predicted for these internal loops using the predictive models proposed by Lu, Turner, and Mathews [Lu, Z. J., Turner, D. H., and Mathews, D. H. (2006) Nucleic Acids Res. 34, 4912-4924] and Chen and Turner [Chen, G., and Turner, D. H. (2006) Biochemistry 45, 4025-4043], respectively. These new ΔG°(37,2×3) values can be used to update the current algorithms that predict secondary structure from sequence. To improve free energy calculations for duplexes containing 2 × 3 nucleotide loops that still do not have experimentally determined free energy contributions, an updated predictive model was derived. This new model resulted from a linear regression analysis of the data reported here combined with 31 previously studied 2 × 3 nucleotide internal loops. 
Most of the values for the parameters in this new predictive model are within experimental error of those of the previous models, suggesting that approximations and assumptions associated with the derivation of the previous nearest neighbor parameters were valid. The updated predictive model predicts free energies of 2 × 3 nucleotide internal loops within 0.4 kcal/mol, on average, of the experimental free energy values. Both the experimental values and the updated predictive model can be used to improve secondary structure prediction from sequence.

4.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.

5.
We here present a dynamic programming algorithm which is capable of calculating arbitrary moments of the Boltzmann distribution for RNA secondary structures. We have implemented the algorithm in a program called RNA-VARIANCE and investigate the difference between the Boltzmann distribution of biological and random RNA sequences. We find that the minimum free energy structure of biological sequences has a higher probability in the Boltzmann distribution than that of random sequences. Moreover, we show that the free energies of biological sequences have a smaller variance than those of random sequences and that the minimum free energy of biological sequences is closer to the expected free energy of the rest of the structures than that of random sequences. These results suggest that biologically functional RNA sequences not only require a thermodynamically stable minimum free energy structure, but also an ensemble of structures whose free energies are close to the minimum free energy.
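As an illustration of the quantities discussed in this abstract, the following minimal sketch computes Boltzmann probabilities, the mean, and the variance of the free energy over a small, explicitly enumerated ensemble. It is not the RNA-VARIANCE dynamic programming algorithm, which avoids enumeration; the function name `boltzmann_moments`, the example energies, and the choice of 37 °C are assumptions made for illustration.

```python
import math

# Gas constant (kcal/(mol*K)) times temperature (310.15 K = 37 °C); assumed here.
RT = 0.0019872 * 310.15


def boltzmann_moments(energies, rt=RT):
    """Return (probabilities, mean, variance) of free energy over an ensemble.

    `energies` is a list of free energies in kcal/mol, one per structure.
    The ensemble is enumerated explicitly (hypothetical toy input).
    """
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability; the shift cancels in Z.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift factor)
    probs = [w / z for w in weights]
    mean = sum(p * e for p, e in zip(probs, energies))
    var = sum(p * (e - mean) ** 2 for p, e in zip(probs, energies))
    return probs, mean, var


if __name__ == "__main__":
    probs, mean, var = boltzmann_moments([-5.0, -4.0, -1.0])
    print(probs, mean, var)
```

The first entry of `probs` is the probability of the minimum free energy structure when the energies are listed lowest first, which is the quantity the abstract compares between biological and random sequences.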

6.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.
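The two accuracy measures defined in this abstract can be sketched as follows. This is a minimal illustration that scores only exact base-pair matches; the function name `sensitivity_ppv` and the example pairs are hypothetical, and published benchmarks may use more lenient matching rules.

```python
def sensitivity_ppv(predicted, known):
    """Return (sensitivity, PPV) as fractions in [0, 1].

    `predicted` and `known` are iterables of base pairs (i, j).
    Sensitivity: fraction of known pairs that were predicted.
    PPV: fraction of predicted pairs that are in the known structure.
    """
    # Normalise each pair so that (i, j) and (j, i) compare equal.
    predicted = {tuple(sorted(p)) for p in predicted}
    known = {tuple(sorted(p)) for p in known}
    correct = len(predicted & known)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    known_pairs = [(1, 10), (2, 9), (3, 8), (4, 7)]
    predicted_pairs = [(1, 10), (2, 9), (5, 6)]
    print(sensitivity_ppv(predicted_pairs, known_pairs))
```

In this toy example two of the four known pairs are recovered and two of the three predicted pairs are correct, so predicting fewer pairs raises PPV at the expense of sensitivity, which is the trade-off the abstract describes.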

7.
MOTIVATION: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. RESULTS: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods. AVAILABILITY: Our CG implementation is available at http://www.rnasoft.ca/CG/.

8.
MOTIVATION: A k-point mutant of a given RNA sequence s = s(1), ..., s(n) is an RNA sequence s' = s'(1), ..., s'(n) obtained by mutating exactly k positions in s; i.e., the Hamming distance between s and s' equals k. To understand the effect of pointwise mutation in RNA, we consider the distribution of energies of all secondary structures of k-point mutants of a given RNA sequence. RESULTS: Here we describe a novel algorithm to compute the mean and standard deviation of energies of all secondary structures of k-point mutants of a given RNA sequence. We then focus on the tail of the energy distribution and compute, using the algorithm AMSAG, the k-superoptimal structure; i.e., the secondary structure of a ≤ k-point mutant having least free energy over all secondary structures of all k'-point mutants of a given RNA sequence, for k' ≤ k. Evidence is presented that the k-superoptimal secondary structure is often closer, as measured by base pair distance and two additional distance measures, to the secondary structure derived by comparative sequence analysis than that derived by the Zuker minimum free energy structure of the original (wild type or unmutated) RNA.
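The definition of a k-point mutant can be made concrete with a short brute-force sketch. It enumerates every mutant explicitly, which is feasible only for short sequences and small k and is not the algorithm of the abstract; the names `hamming` and `k_point_mutants` and the alphabet `ACGU` are assumptions made for illustration.

```python
from itertools import combinations, product

ALPHABET = "ACGU"  # assumed RNA alphabet


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


def k_point_mutants(s, k):
    """Yield every sequence at Hamming distance exactly k from s."""
    for positions in combinations(range(len(s)), k):
        # At each chosen position, allow only nucleotides that differ from s.
        choices = [[c for c in ALPHABET if c != s[i]] for i in positions]
        for substitution in product(*choices):
            mutant = list(s)
            for i, c in zip(positions, substitution):
                mutant[i] = c
            yield "".join(mutant)


if __name__ == "__main__":
    mutants = list(k_point_mutants("GGAC", 2))
    print(len(mutants), mutants[:5])
```

For a sequence of length n there are C(n, k) · 3^k such mutants, which grows quickly with k and is why the abstract computes the energy statistics without listing the mutants.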

9.
RNA folding free energy change parameters are widely used to predict RNA secondary structure and to design RNA sequences. These parameters include terms for the folding free energies of helices and loops. Although the full set of parameters has only been traditionally available for the four common bases and backbone, it is well known that covalent modifications of nucleotides are widespread in natural RNAs. Covalent modifications are also widely used in engineered sequences. We recently derived a full set of nearest neighbor terms for RNA that includes N6-methyladenosine (m6A). In this work, we test the model using 98 optical melting experiments, matching duplexes with or without N6-methylation of A. Most experiments place RRACH, the consensus site of N6-methylation, in a variety of contexts, including helices, bulge loops, internal loops, dangling ends, and terminal mismatches. For matched sets of experiments that include either A or m6A in the same context, we find that the parameters for m6A are as accurate as those for A. Across all experiments, the root mean squared deviation between estimated and experimental free energy changes is 0.67 kcal/mol. We used the new experimental data to refine the set of nearest neighbor parameter terms for m6A. These parameters enable prediction of RNA secondary structures including m6A, which can be used to model how N6-methylation of A affects RNA structure.

10.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.

11.
A crucial step in the determination of the three-dimensional native structures of RNA is the prediction of their secondary structures, which are stable independent of the tertiary fold. Accurate prediction of the secondary structure requires context-dependent estimates of the interaction parameters. We have exploited the growing database of natively folded RNA structures in the Protein Data Bank (PDB) to obtain stacking interaction parameters using a knowledge-based approach. Remarkably, the calculated values of the resulting statistical potentials (SPs) are in excellent agreement with the parameters determined using measurements in small oligonucleotides. We validate the SPs by predicting 74% of the base-pairs in a dataset of structures using the ViennaRNA package. Interestingly, this number is similar to that obtained using the measured thermodynamic parameters. We also tested the efficacy of the SP in predicting secondary structure by using gapless threading, which we advocate as an alternative method for rapidly predicting RNA structures. For RNA molecules with less than 700 nucleotides, about 70% of the native base-pairs are correctly predicted. As a further validation of the SPs we calculated Z-scores, which measure the relative stability of the native state with respect to a manifold of higher free energy states. The computed Z-scores agree with estimates made using calorimetric measurements for a few RNA molecules. Structural analysis was used to rationalize the success and failures of SP and experimentally determined parameters. First, from the near perfect linear relationship between the number of native base-pairs and sequence length, we show that nearly 46% of nucleotides are not in stacks. Second, by analyzing the suboptimal structures that are generated in gapless threading we show that the SPs and experimentally determined parameters are most successful in predicting stacks that end in hairpins. 
These results show that further improvement in secondary structure prediction requires reliable estimates of interaction parameters for loops, bulges, and stacks that do not end in hairpins.

12.
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.

13.
RNA pseudoknot prediction in energy-based models. (Cited 11 times: 0 self-citations, 11 by others)
RNA molecules are sequences of nucleotides that serve as more than mere intermediaries between DNA and proteins, e.g., as catalytic molecules. Computational prediction of RNA secondary structure is among the few structure prediction problems that can be solved satisfactorily in polynomial time. Most work has been done to predict structures that do not contain pseudoknots. Allowing pseudoknots introduces modeling and computational problems. In this paper we consider the problem of predicting RNA secondary structures with pseudoknots based on free energy minimization. We first give a brief comparison of energy-based methods for predicting RNA secondary structures with pseudoknots. We then prove that the general problem of predicting RNA secondary structures containing pseudoknots is NP-complete for a large class of reasonable models of pseudoknots.

14.
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well known metrics on RNA secondary structures. We also study some global and local properties of these metrics.
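The abstract does not give the definition of the mountain metrics, so the following is only a sketch under stated assumptions: each structure, written in dot-bracket notation, is mapped to a "mountain" height profile, and two structures of equal length are compared by the l_p distance between their profiles. The function names, the height convention, and the default p = 2 are assumptions made for illustration.

```python
def mountain_vector(dot_bracket):
    """Height profile of a pseudoknot-free structure in dot-bracket notation.

    Assumed convention: the height rises by one at '(' and falls by one at ')',
    and the value recorded at each position is the height after that character.
    """
    height = 0
    heights = []
    for ch in dot_bracket:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
        heights.append(height)
    return heights


def mountain_distance(a, b, p=2):
    """l_p distance between the height profiles of two equal-length structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    ha, hb = mountain_vector(a), mountain_vector(b)
    return sum(abs(x - y) ** p for x, y in zip(ha, hb)) ** (1.0 / p)


if __name__ == "__main__":
    print(mountain_vector("((..))"))
    print(mountain_distance("((..))", "(....)", p=1))
```

Because the profile determines the dot-bracket string under this convention, two structures have distance zero only if they are identical, which is one of the properties a metric on structures needs.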

15.
Cao S, Chen SJ. RNA (New York, N.Y.) 2011, 17(12): 2130-2143
We develop a statistical mechanical model to predict the structure and folding stability of the RNA/RNA kissing-loop complex. One of the key ingredients of the theory is the conformational entropy for the RNA/RNA kissing complex. We employ the recently developed virtual bond-based RNA folding model (Vfold model) to evaluate the entropy parameters for the different types of kissing loops. A benchmark test against experiments suggests that the entropy calculation is reliable. As an application of the model, we apply the model to investigate the structure and folding thermodynamics for the kissing complex of the HIV-1 dimerization initiation signal. With the physics-based energetic parameters, we compute the free energy landscape for the HIV-1 dimer. From the energy landscape, we identify two minimal free energy structures, which correspond to the kissing-loop dimer and the extended-duplex dimer, respectively. The results support the two-step dimerization process for the HIV-1 replication cycle. Furthermore, based on the Vfold model and energy minimization, the theory can predict the native structure as well as the local minima in the free energy landscape. The root-mean-square deviations (RMSDs) for the predicted kissing-loop dimer and extended-duplex dimer are ∼3.0 Å. The method developed here provides a new method to study the RNA/RNA kissing complex.

16.
We have applied the Pipas-McMahon algorithm based on free energy calculations to the search for a 5S RNA base-pair structure common to all known sequences. We find that a 'Y' shaped model is consistently among the structures having the lowest free energy using 5S RNA sequences from either eukaryotic or prokaryotic sources. Comparison of this 'Y' structure with models which have recently been proposed shows these models to be remarkably similar, and the minor differences are explicable based on the technique used to obtain the model. That prokaryotic and eukaryotic 5S RNA can adopt a similar secondary structure is strong support for its resistance to change during evolution.

17.
S Y Le, J H Chen, J V Maizel Jr. Nucleic Acids Research 1993, 21(9): 2173-2178
In this paper we present a new method for predicting a set of RNA secondary structures that are thermodynamically favored in RNA folding simulations. This method uses a large number of 'simulated energy rules' (SER) generated by perturbing the free energy parameters derived experimentally within the range of the experimental errors. The structure with the lowest free energy is computed for each SER. Structural comparisons are used to avoid multiple generation of similar structures. Computed structures are evaluated using the energy distribution of the lowest free energy structures derived in the simulation. Predicted structures can be graphically displayed with their occurring frequencies in the simulation by dot-plot representations. On average, about 90% of phylogenetic helices in the known models of tRNA, Group I self-splicing intron, and Escherichia coli 16S rRNA were predicted using the method.

18.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
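One common way to combine an enthalpy term with a 37°C free energy term is sketched below. It assumes that ΔH° and ΔS° are independent of temperature, so that ΔS° = (ΔH° − ΔG°37) / 310.15 K and ΔG°(T) = ΔH° − T·ΔS°; whether the abstract's authors use exactly this form is not stated there. The function name `free_energy_at` and the numerical values in the example are hypothetical.

```python
T37_K = 310.15  # 37 °C in kelvin


def free_energy_at(dg37, dh, temp_c):
    """Extrapolate a free energy change (kcal/mol) from 37 °C to temp_c (°C).

    Assumes enthalpy and entropy changes do not vary with temperature:
        dS = (dH - dG37) / 310.15
        dG(T) = dH - T * dS
    """
    temp_k = temp_c + 273.15
    ds = (dh - dg37) / T37_K  # entropy change in kcal/(mol*K)
    return dh - temp_k * ds


if __name__ == "__main__":
    # Hypothetical nearest-neighbor term: dG37 = -3.0, dH = -12.0 kcal/mol.
    for t in (10.0, 37.0, 60.0):
        print(t, free_energy_at(-3.0, -12.0, t))
```

For a term with negative ΔH° and negative ΔS°, as in the hypothetical example, the extrapolated free energy becomes less favorable as the temperature rises, consistent with structures melting at higher temperatures.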

19.
Nanopore translocation experiments are increasingly applied to probe the secondary structures of RNA and DNA molecules. Here, we report two vital steps toward establishing nanopore translocation as a tool for the systematic and quantitative analysis of polynucleotide folding: 1) Using α-hemolysin pores and a diverse set of different DNA hairpins, we demonstrate that backward nanopore force spectroscopy is particularly well suited for quantitative analysis. In contrast to forward translocation from the vestibule side of the pore, backward translocation times do not appear to be significantly affected by pore-DNA interactions. 2) We develop and verify experimentally a versatile mesoscopic theoretical framework for the quantitative analysis of translocation experiments with structured polynucleotides. The underlying model is based on sequence-dependent free energy landscapes constructed using the known thermodynamic parameters for polynucleotide basepairing. This approach limits the adjustable parameters to a small set of sequence-independent parameters. After parameter calibration, the theoretical model predicts the translocation dynamics of new sequences. These predictions can be leveraged to generate a baseline expectation even for more complicated structures where the assumptions underlying the one-dimensional free energy landscape may no longer be satisfied. Taken together, backward translocation through α-hemolysin pores combined with mesoscopic theoretical modeling is a promising approach for label-free single-molecule analysis of DNA and RNA folding.

20.
Secondary structure of messenger RNA plays an important role in the bio-synthesis of proteins. Its negative impact on translation can reduce the yield of protein by slowing or blocking the initiation and movement of ribosomes along the mRNA, becoming a major factor in the regulation of gene expression. Several algorithms can predict the formation of secondary structures by calculating the minimum free energy of RNA sequences, or perform the inverse process of obtaining an RNA sequence for a given structure. However, there is still no approach to redesign an mRNA to achieve minimal secondary structure without affecting the amino acid sequence. Here we present the first strategy to optimize mRNA secondary structures, to increase (or decrease) the minimum free energy of a nucleotide sequence, without changing its resulting polypeptide, in a time-efficient manner, through a simplistic approximation to hairpin formation. Our data show that this approach can efficiently increase the minimum free energy by >40%, strongly reducing the strength of secondary structures. Applications of this technique range from multi-objective optimization of genes by controlling minimum free energy together with CAI and other gene expression variables, to optimization of secondary structures at the genomic level.
