Similar literature
Found 20 similar documents (search time: 31 ms)
1.
We describe a computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all favorable pairings with respect to the Turner energy function are identified. Each potentially paired region within a multiple sequence alignment is scored using a function that combines both predicted free energy and sequence covariation with optimized weightings. High scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5S rRNA, SRP RNA, tmRNA, and 16S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters have been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences and predictions improve only moderately with the inclusion of additional sequences.

2.
Motivated by recent work in parametric sequence alignment, we study the parameter space for scoring RNA folds and construct an RNA polytope. A vertex of this polytope corresponds to RNA secondary structures with common branching. We use this polytope and its normal fan to study the effect of varying three parameters in the free energy model that are not determined experimentally. Our results indicate that variation of these specific parameters does not have a dramatic effect on the structures predicted by the free energy model. We additionally map a collection of known RNA secondary structures to the RNA polytope.

3.
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
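The idea of deriving a "statistical energy" from motif frequencies, as described in the abstract above, can be illustrated with a Boltzmann inversion. This is only a sketch: the abstract does not state the reference state or normalization the authors used, so the uniform reference distribution, the function name `statistical_energies`, and the example counts are all assumptions made for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def statistical_energies(counts, temp_k=310.15):
    """Convert observed motif counts into Boltzmann-like pseudo-energies.

    E(motif) = -R * T * ln(f_observed / f_reference)

    Assumption: the reference state treats every motif as equally likely.
    Motifs with a zero count are skipped because their log is undefined.
    """
    total = sum(counts.values())
    uniform = 1.0 / len(counts)
    return {
        motif: -R_KCAL * temp_k * math.log((n / total) / uniform)
        for motif, n in counts.items()
        if n > 0
    }


# Hypothetical counts of two base-pair stacks in a structure database.
example = statistical_energies({"GC/CG": 300, "AU/UA": 100})
```

Under this convention, a motif seen more often than the reference predicts receives a negative (favorable) energy, and a rarer motif receives a positive one.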

4.
MOTIVATION: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models. RESULTS: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction. AVAILABILITY: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.

5.
Hausmann NZ, Znosko BM. Biochemistry, 2012, 51(26): 5359-5368
To better elucidate RNA structure-function relationships and to improve the design of pharmaceutical agents that target specific RNA motifs, an understanding of RNA primary, secondary, and tertiary structure is necessary. The prediction of RNA secondary structure from sequence is an intermediate step in predicting RNA three-dimensional structure. RNA secondary structure is typically predicted using a nearest neighbor model based on free energy parameters. The current free energy parameters for 2 × 3 nucleotide loops are based on a 23-member data set of 2 × 3 loops and internal loops of other sizes. A database of representative RNA secondary structures was searched to identify 2 × 3 nucleotide loops that occur in nature. Seventeen of the most frequent 2 × 3 nucleotide loops in this database were studied by optical melting experiments. Fifteen of these loops melted in a two-state manner, and the associated experimental ΔG°(37,2×3) values are, on average, 0.6 and 0.7 kcal/mol different from the values predicted for these internal loops using the predictive models proposed by Lu, Turner, and Mathews [Lu, Z. J., Turner, D. H., and Mathews, D. H. (2006) Nucleic Acids Res. 34, 4912-4924] and Chen and Turner [Chen, G., and Turner, D. H. (2006) Biochemistry 45, 4025-4043], respectively. These new ΔG°(37,2×3) values can be used to update the current algorithms that predict secondary structure from sequence. To improve free energy calculations for duplexes containing 2 × 3 nucleotide loops that still do not have experimentally determined free energy contributions, an updated predictive model was derived. This new model resulted from a linear regression analysis of the data reported here combined with 31 previously studied 2 × 3 nucleotide internal loops. Most of the values for the parameters in this new predictive model are within experimental error of those of the previous models, suggesting that approximations and assumptions associated with the derivation of the previous nearest neighbor parameters were valid. The updated predictive model predicts free energies of 2 × 3 nucleotide internal loops within 0.4 kcal/mol, on average, of the experimental free energy values. Both the experimental values and the updated predictive model can be used to improve secondary structure prediction from sequence.

6.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
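The abstract above says that enthalpy parameters extend 37°C free energies to other temperatures. A minimal sketch of the usual extrapolation follows, assuming that ΔH° and ΔS° are independent of temperature; the function name `free_energy_at` and the example values are hypothetical and are not taken from the paper.

```python
T37_K = 310.15  # 37 °C expressed in kelvin


def free_energy_at(dg37, dh, temp_c):
    """Extrapolate a folding free energy (kcal/mol) from 37 °C to temp_c.

    Assumes temperature-independent enthalpy and entropy:
        dS    = (dH - dG37) / 310.15
        dG(T) = dH - T * dS
    """
    temp_k = temp_c + 273.15
    ds = (dh - dg37) / T37_K  # entropy change in kcal/(mol*K)
    return dh - temp_k * ds


# Hypothetical motif: dG37 = -3.0 kcal/mol, dH = -10.0 kcal/mol.
dg_at_60 = free_energy_at(-3.0, -10.0, 60.0)
```

For a motif with favorable enthalpy, the extrapolated free energy becomes less negative as the temperature rises, which is why thermophile sequences need more stable structures to stay folded.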

7.
Computer simulation results of folding linear RNA molecules into secondary structures are presented. The structure is formed by two interacting processes: RNA molecular chain growth (beginning from an initial length, L0) and structuring (sequential growth of secondary structure in the region of the existing molecular chain, based on local free energy minimization by sequential addition of elementary substructures, i.e., stems). It was found that the final secondary structure formation is greatly influenced by the ‘structuring period’ T (the ratio of the molecular chain growth rate to the structuring rate) and by the direction of RNA synthesis. The computer simulation has been performed for 219 and 906 tRNA genes from two published catalogues, over the whole two-dimensional domain of the (T, L0) parameters, using four known free-energy models. Minimum stem length and molecular chain growth direction have also been varied. The calculated secondary structures have been compared to the natural tRNA structures given in the catalogues, and the region of best coincidence for the model parameters has been determined. It has been proved that, on average, >86% of the paired bases of natural tRNA structures appear in the folding simulation.

8.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

9.
Thermodynamic parameters for internal loops of unpaired adenosines in oligoribonucleotides have been measured by optical melting studies. Comparisons are made between helices containing symmetric and asymmetric loops. Asymmetric loops destabilize a helix more than symmetric loops. The differences in free energy between symmetric and asymmetric loops are roughly half the magnitude suggested from a study of parameters required to give accurate predictions of RNA secondary structure [Papanicolaou, C., Gouy, M., & Ninio, J. (1984) Nucleic Acids Res. 12, 31-44]. Circular dichroism spectra indicate no major structural difference between helices containing symmetric and asymmetric loops. The measured sequence dependence of internal loop stability is not consistent with approximations used in current algorithms for predicting RNA secondary structure.

10.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.
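Comparisons of a native sequence against randomized controls, as in the abstract above, are commonly summarized with a Z-score. The sketch below assumes the folding energies have already been computed by some folding program and that the shuffled controls preserve the composition of interest; the abstract does not say which statistic the authors used, and the function name `energy_z_score` and the numbers are hypothetical.

```python
from statistics import mean, stdev


def energy_z_score(native_dg, shuffled_dgs):
    """Z-score of a native folding energy against shuffled-sequence energies.

    A strongly negative value means the native sequence folds more stably
    than the controls; a value near zero means it is indistinguishable.
    Uses the sample standard deviation, so at least two controls are needed.
    """
    return (native_dg - mean(shuffled_dgs)) / stdev(shuffled_dgs)


# Hypothetical energies in kcal/mol.
z = energy_z_score(-10.0, [-8.0, -6.0, -7.0])
```

The choice of shuffling scheme matters more than the statistic itself: per the abstract, controls that preserve only mononucleotide composition can make a native sequence look unusually stable when dinucleotide-preserving controls would not.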

11.
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

12.
There are two customary ways of predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter alternative, applying the maximum likelihood principle to determine a grammar's probabilities. In this paper, building on such a stochastic model, we analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules. Even if the parameters of our grammar are chosen with respect to structural properties of native molecules only (and therefore independently of the molecules' free energy), we prove formulae for the expected minimum free energy and the corresponding variance as functions of the molecule's size which perfectly fit the native behavior of free energies. This is evidence of the high quality of our stochastic model, making it a handy tool for further investigations. In fact, the stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.

13.
An algorithm and a program for the prediction of RNA secondary structure with pseudoknot formation are proposed. The algorithm simulates stepwise folding by generating random structures using the Monte Carlo method, followed by the selection of helices for the final structure on the basis of both their probabilities of occurrence in a random structure and free energy parameters. The program versions have been tested on ribosomal RNA structures and on RNAs with pseudoknots evidenced by experimental data. It is shown that the simulation of folding during RNA synthesis improves the results. The introduction of pseudoknot formation makes it possible to predict pseudoknotted structures and improves the prediction of long-range interactions. The computer program is rather fast and can predict the structures of long RNAs on an ordinary personal computer without requiring large amounts of memory.

14.
The mean free energy generated from the secondary structure of RNA sequences of varying length and composition has been studied by way of probability theory. The expected boundaries or maximal and minimal values of a given distribution are explored and a method for estimating error as a function of the number of shuffled sequences is also examined. For typical nucleotide sequences found in biologically active organisms, the mean free energy, free energy distributions and errors appear to be scalable in terms of a fixed set of algorithm-dependent parameters and the nucleotide composition of the particular sequence under evaluation. In addition, a general semi-analytical formula for predicting the mean free energy is proposed which, at least to first-order approximation, can be used to rapidly predict the mean free energy of any sequence length and composition of RNA. The general methodology appears to be algorithm independent. The results are expected to provide a reference point for certain types of analysis related to structure of RNA or DNA sequences and to assist in measuring the somewhat related matter of complexity in algorithm development. Some related applications are discussed.

15.
DsrA RNA is an 87-nucleotide regulatory non-protein-coding RNA of Escherichia coli for which two secondary structure models (I and II) have been proposed. We have compared these models by energy calculations, which revealed that the currently accepted model II should be rejected on the basis of thermodynamics. Here we provide new results of nuclease footprinting analysis and the application of RNA technologies that have not previously been used for DsrA RNA structural studies, such as hydrolysis with RNase H, DNAzyme, hydroxyl radicals and lead. These approaches together with bioinformatics calculations provided strong arguments for a new model III. This model clearly shows that the long U-rich region between hairpins 1 and 2 is double-stranded. These findings shed new light on DsrA RNA-Hfq interactions.

16.
Prediction of RNA secondary structure based on helical regions distribution (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: RNAs play an important role in many biological processes and knowing their structure is important in understanding their function. Due to difficulties in the experimental determination of RNA secondary structure, methods of theoretical prediction for known sequences are often used. Although many different algorithms for such predictions have been developed, this problem has not yet been solved. It is thus necessary to develop new methods for predicting RNA secondary structure. The most widely used at present is Zuker's algorithm, which can be used to determine the minimum free energy secondary structure. However, many RNA secondary structures verified by experiments are not consistent with the minimum free energy secondary structures. In order to solve this problem, a method for searching a group of secondary structures whose free energy is close to the global minimum free energy was developed by Zuker in 1989. When considering a group of secondary structures, if there is no experimental data, we cannot tell which one is better than the others. This case also occurs in combinatorial and heuristic methods. These two kinds of methods have several weaknesses. Here we show how the central limit theorem can be used to solve these problems. RESULTS: An algorithm for predicting RNA secondary structure based on helical regions distribution is presented, which can be used to find the most probable secondary structure for a given RNA sequence. It consists of three steps. First, list all possible helical regions. Second, according to the central limit theorem, estimate the occurrence probability of every helical region based on Monte Carlo simulation. Third, add the helical region with the highest probability to the current structure and eliminate the helical regions incompatible with the current structure. These steps are repeated until no more helical regions can be added, and the current structure is then taken as the final RNA secondary structure. To demonstrate the reliability of the program, a test is performed on three RNA sequences: tRNAPhe, Pre-tRNATyr, and the Tetrahymena ribosomal RNA intervening sequence. AVAILABILITY: The program is written in Turbo Pascal 7.0. The source code is available upon request. CONTACT: Wujj@nic.bmi.ac.cn or Liwj@mail.bmi.ac.cn
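The third step of the algorithm in the abstract above (repeatedly add the most probable helical region and discard incompatible ones) can be sketched as a greedy loop. This is an illustration only, not the authors' Turbo Pascal program: the helix encoding `(i, j, length)`, the compatibility rule (no shared base and no crossing pairs), and all function names are assumptions, and the Monte Carlo probability estimation of step two is taken as given input.

```python
def helix_pairs(helix):
    """Expand a helix (i, j, length) into base pairs (i, j), (i+1, j-1), ..."""
    i, j, length = helix
    return [(i + k, j - k) for k in range(length)]


def compatible(h1, h2):
    """Assumed rule: helices must share no base and must not cross."""
    p1, p2 = helix_pairs(h1), helix_pairs(h2)
    bases1 = {b for pair in p1 for b in pair}
    if any(b in bases1 for pair in p2 for b in pair):
        return False
    for a, b in p1:
        for c, d in p2:
            if a < c < b < d or c < a < d < b:  # crossing pairs
                return False
    return True


def greedy_assemble(probabilities):
    """Build a structure from a dict mapping helix -> occurrence probability."""
    candidates = dict(probabilities)
    structure = []
    while candidates:
        best = max(candidates, key=candidates.get)  # most probable helix left
        structure.append(best)
        del candidates[best]
        # Keep only helices that can coexist with the one just added.
        candidates = {h: p for h, p in candidates.items() if compatible(best, h)}
    return structure


# Hypothetical helices with made-up probabilities.
result = greedy_assemble({(0, 20, 3): 0.9, (2, 10, 2): 0.5, (4, 12, 3): 0.4})
```

Because every surviving candidate was filtered against each helix as it was added, checking compatibility against only the newest helix is sufficient to keep the whole structure consistent.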

17.
A crucial step in the determination of the three-dimensional native structures of RNA is the prediction of their secondary structures, which are stable independent of the tertiary fold. Accurate prediction of the secondary structure requires context-dependent estimates of the interaction parameters. We have exploited the growing database of natively folded RNA structures in the Protein Data Bank (PDB) to obtain stacking interaction parameters using a knowledge-based approach. Remarkably, the calculated values of the resulting statistical potentials (SPs) are in excellent agreement with the parameters determined using measurements in small oligonucleotides. We validate the SPs by predicting 74% of the base-pairs in a dataset of structures using the ViennaRNA package. Interestingly, this number is similar to that obtained using the measured thermodynamic parameters. We also tested the efficacy of the SP in predicting secondary structure by using gapless threading, which we advocate as an alternative method for rapidly predicting RNA structures. For RNA molecules with less than 700 nucleotides, about 70% of the native base-pairs are correctly predicted. As a further validation of the SPs we calculated Z-scores, which measure the relative stability of the native state with respect to a manifold of higher free energy states. The computed Z-scores agree with estimates made using calorimetric measurements for a few RNA molecules. Structural analysis was used to rationalize the success and failures of SP and experimentally determined parameters. First, from the near perfect linear relationship between the number of native base-pairs and sequence length, we show that nearly 46% of nucleotides are not in stacks. Second, by analyzing the suboptimal structures that are generated in gapless threading we show that the SPs and experimentally determined parameters are most successful in predicting stacks that end in hairpins. These results show that further improvement in secondary structure prediction requires reliable estimates of interaction parameters for loops, bulges, and stacks that do not end in hairpins.

18.
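Construction of an RNA secondary structure prediction system (total citations: 9; self-citations: 0; citations by others: 9)
Using the following RNA secondary structure prediction algorithms (maximum base pairing, Zuker's free energy minimization, optimal stacking of helical regions, random stacking of helical regions, and the all-possible-combinations method) together with an RNA secondary structure drawing technique based on primary helical regions, we constructed Rnafold, an RNA secondary structure prediction system. In addition, using 20 randomly selected tRNA sequences, we compared the first four prediction algorithms in terms of both free energy and cloverleaf structure, and used the t-test to analyze the statistical differences in free energy. In terms of cloverleaf structure, the random stacking method performed best, followed by the optimal stacking method and Zuker's algorithm, with the maximum base pairing method performing worst. Finally, we analyzed the differences between the two free energy minimization methods.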
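The maximum base pairing method named in the abstract above is usually implemented as a Nussinov-style dynamic program. The sketch below is a generic textbook version, not the Rnafold implementation; allowing G-U wobble pairs and requiring at least three unpaired bases in a hairpin loop are assumptions, as is the function name `max_base_pairs`.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    Assumptions: Watson-Crick and G-U pairs are allowed, and paired bases
    must be separated by at least min_loop unpaired positions.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


pairs_in_hairpin = max_base_pairs("GGGAAACCC")
```

Counting pairs ignores stacking and loop energies entirely, which is consistent with the abstract's finding that this method reproduced the cloverleaf structure least well of the four compared.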

19.
Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.

20.
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well known metrics on RNA secondary structures. We also study some global and local properties of these metrics.
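One common way to realize a mountain-style distance between two structures of equal length is to compare their "mountain" height profiles with an l_p norm. The sketch below follows that idea under stated assumptions: the abstract does not give the paper's exact definition, so the height convention (a paired position counts its own pair), the dot-bracket input format, and the function names are all illustrative choices.

```python
def mountain_vector(dot_bracket):
    """Height profile of a dot-bracket structure.

    Assumed convention: the height at a position is the number of base
    pairs that enclose it, counting the pair the position itself belongs to.
    """
    heights, level = [], 0
    for ch in dot_bracket:
        if ch == "(":
            level += 1
            heights.append(level)
        elif ch == ")":
            heights.append(level)
            level -= 1
        else:
            heights.append(level)
    return heights


def mountain_distance(s1, s2, p=2):
    """l_p distance between the mountain profiles of two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    m1, m2 = mountain_vector(s1), mountain_vector(s2)
    return sum(abs(a - b) ** p for a, b in zip(m1, m2)) ** (1.0 / p)


d = mountain_distance("((..))", "(....)", p=1)
```

Unlike a simple count of differing base pairs, a profile-based distance grows with how deeply the differing pairs are nested, so small shifts of a helix register as small distances.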
