A partition function calculation for RNA secondary structure is presented that uses a current set of nearest neighbor parameters for conformational free energy at 37 degrees C, including coaxial stacking. For a diverse database of RNA sequences, base pairs in the predicted minimum free energy structure that are predicted by the partition function to have high base pairing probability have a significantly higher positive predictive value for known base pairs. For example, the average positive predictive value, 65.8%, is increased to 91.0% when only base pairs with probability of 0.99 or above are considered. The quality of base pair predictions can also be increased by the addition of experimentally determined constraints, including enzymatic cleavage, flavin mono-nucleotide cleavage, and chemical modification. Predicted secondary structures can be color annotated to demonstrate pairs with high probability that are therefore well determined as compared to base pairs with lower probability of pairing.
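
The effect of a probability cutoff on positive predictive value (PPV) can be illustrated with a minimal sketch. The pair sets and probabilities below are made-up toy data, and the function names are hypothetical; nothing here reproduces the abstract's database or its partition function calculation.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (5' index, 3' index) of a base pair


def ppv_and_sensitivity(predicted: Set[Pair], known: Set[Pair]) -> Tuple[float, float]:
    """Return (PPV, sensitivity) of predicted pairs against known pairs."""
    true_pos = len(predicted & known)
    ppv = true_pos / len(predicted) if predicted else 0.0
    sens = true_pos / len(known) if known else 0.0
    return ppv, sens


def filter_by_probability(pairs: Set[Pair], prob: Dict[Pair, float], cutoff: float) -> Set[Pair]:
    """Keep only pairs whose pairing probability is at least `cutoff`."""
    return {p for p in pairs if prob.get(p, 0.0) >= cutoff}


# Toy data (assumed for illustration only).
known = {(1, 20), (2, 19), (3, 18), (6, 14)}
mfe_pairs = {(1, 20), (2, 19), (3, 18), (7, 13)}
prob = {(1, 20): 0.995, (2, 19): 0.992, (3, 18): 0.80, (7, 13): 0.40}

all_ppv, all_sens = ppv_and_sensitivity(mfe_pairs, known)
confident = filter_by_probability(mfe_pairs, prob, cutoff=0.99)
hi_ppv, hi_sens = ppv_and_sensitivity(confident, known)
print(all_ppv, hi_ppv, hi_sens)
```

In this toy case the cutoff raises PPV while lowering sensitivity, which is the trade-off implied by reporting only high-probability pairs.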

Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.

There are two customary ways of predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter alternative, applying the maximum likelihood principle to determine a grammar's probabilities. In this paper, building on such a stochastic model, we will analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules. Even if the parameters of our grammar are chosen with respect to structural properties of native molecules only (and therefore, independent of molecules' free energy), we prove formulae for the expected minimum free energy and the corresponding variance as functions of the molecule's size which perfectly fit the native behavior of free energies. This provides evidence for the high quality of our stochastic model, making it a handy tool for further investigations. In fact, the stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.



Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator has been proposed, including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures for each RNA sequence.
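
How a single parameter shifts an MEA-style prediction between sensitivity and PPV can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator of the abstract: the objective (each pair scores `2 * gamma * p[i][j]`, each unpaired base scores its unpaired probability), the restriction to nested structures, the `min_loop` value, and the toy probability matrix are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def mea_structure(p: List[List[float]], gamma: float, min_loop: int = 3) -> Set[Tuple[int, int]]:
    """Nested structure maximizing sum(2*gamma*p_ij over pairs) + sum(q_i over unpaired).

    p is an n x n matrix; only entries p[i][j] with i < j are read.
    """
    n = len(p)
    # q[i]: probability that base i is unpaired.
    q = [max(0.0, 1.0 - sum(p[min(i, k)][max(i, k)] for k in range(n) if k != i))
         for i in range(n)]
    # score[i][j]: best score for the half-open interval [i, j); empty intervals score 0.
    score = [[0.0] * (n + 1) for _ in range(n + 1)]
    choice = [[-1] * (n + 1) for _ in range(n + 1)]  # partner of i, or -1 if unpaired
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best, partner = q[i] + score[i + 1][j], -1  # case: base i unpaired
            for k in range(i + min_loop + 1, j):        # case: base i pairs with k
                cand = 2.0 * gamma * p[i][k] + score[i + 1][k] + score[k + 1][j]
                if cand > best:
                    best, partner = cand, k
            score[i][j], choice[i][j] = best, partner
    # Traceback with an explicit stack.
    pairs, stack = set(), [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k == -1:
            stack.append((i + 1, j))
        else:
            pairs.add((i, k))
            stack.extend([(i + 1, k), (k + 1, j)])
    return pairs


# Toy pairing probabilities for a 10-nt sequence (assumed values).
n = 10
p = [[0.0] * n for _ in range(n)]
p[0][9], p[1][8], p[2][7] = 0.9, 0.9, 0.3

for gamma in (0.05, 1.0, 4.0):
    print(gamma, sorted(mea_structure(p, gamma)))
```

In this toy case a small `gamma` yields no pairs, an intermediate value keeps only the two high-probability pairs, and a large value also admits the low-probability pair, which is why no single parameter value suits every sequence.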

We describe a computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all favorable pairings with respect to the Turner energy function are identified. Each potentially paired region within a multiple sequence alignment is scored using a function that combines both predicted free energy and sequence covariation with optimized weightings. High scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5S rRNA, SRP RNA, tmRNA, and 16S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters have been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences and predictions improve only moderately with the inclusion of additional sequences.

This article describes the latest version of an RNA folding algorithm that predicts both optimal and suboptimal solutions based on free energy minimization. A number of RNAs with known structures deduced from comparative sequence analysis are folded to test program performance. The group of solutions obtained for each molecule is analysed to determine how many of the known helices occur in the optimal solution and in the best suboptimal solution. In most cases, a structure about 80% correct is found with a free energy within 2% of the predicted lowest free energy structure.



A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1.


The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases.


Our analysis indicates that the current set of nearest-neighbor energy parameters in conjunction with the Mfold folding algorithm is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.
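
The dependence of accuracy on contact distance can be examined by splitting known base pairs at a distance cutoff. The sketch below is an illustration only: it assumes accuracy means the fraction of known (comparative) pairs recovered by the prediction, takes contact distance as `j - i`, and uses made-up pair sets and a hypothetical function name.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (5' index, 3' index)


def accuracy_by_contact_distance(known: Set[Pair], predicted: Set[Pair],
                                 cutoff: int = 100) -> Dict[str, float]:
    """Fraction of known pairs recovered, split by contact distance j - i."""
    bins = {"short": [0, 0], "long": [0, 0]}  # [recovered, total]
    for i, j in known:
        key = "short" if (j - i) < cutoff else "long"
        bins[key][1] += 1
        if (i, j) in predicted:
            bins[key][0] += 1
    return {k: (hit / total if total else float("nan"))
            for k, (hit, total) in bins.items()}


# Toy data (assumed for illustration only).
known = {(1, 40), (2, 39), (50, 90), (10, 400), (11, 399), (120, 900)}
predicted = {(1, 40), (2, 39), (50, 90), (10, 400), (130, 700)}
result = accuracy_by_contact_distance(known, predicted)
print(result)
```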

Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.

A computer program is presented which determines the secondary structure of linear RNA molecules by simulating a hypothetical process of folding. This process implies the concept of 'nucleation centres', regions in RNA which locally trigger the folding. During the simulation, the RNA is allowed to fold into pseudoknotted structures, unlike all other programs predicting RNA secondary structure. The simulation uses published, experimentally determined free energy values for nearest neighbour base pair stackings and loop regions, except for new extrapolated values for loops larger than seven nucleotides. The free energy value for a loop arising from pseudoknot formation is set to a single, estimated value of 4.2 kcal/mole. Especially in the case of long RNA sequences, our program appears superior to other secondary structure predicting programs described so far, as tests on tRNAs, the LSU intron of Tetrahymena thermophila and a number of plant viral RNAs show. In addition, pseudoknotted structures are often predicted successfully. The program is written in mainframe APL and is adapted to run on IBM compatible PCs, Atari ST and Macintosh personal computers. On an 8 MHz 8088 standard PC without coprocessor, using STSC APL, it folds a sequence of 700 nucleotides in one and a half hours.

Prediction of RNA secondary structure based on helical regions distribution   (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: RNAs play an important role in many biological processes and knowing their structure is important in understanding their function. Due to difficulties in the experimental determination of RNA secondary structure, methods of theoretical prediction for known sequences are often used. Although many different algorithms for such predictions have been developed, this problem has not yet been solved. It is thus necessary to develop new methods for predicting RNA secondary structure. The most widely used at present is Zuker's algorithm, which can be used to determine the minimum free energy secondary structure. However, many RNA secondary structures verified by experiments are not consistent with the minimum free energy secondary structures. In order to solve this problem, a method for searching a group of secondary structures whose free energy is close to the global minimum free energy was developed by Zuker in 1989. When considering a group of secondary structures, if there are no experimental data, we cannot tell which one is better than the others. This case also occurs in combinatorial and heuristic methods. These two kinds of methods have several weaknesses. Here we show how the central limit theorem can be used to solve these problems. RESULTS: An algorithm for predicting RNA secondary structure based on helical regions distribution is presented, which can be used to find the most probable secondary structure for a given RNA sequence. It consists of three steps. First, list all possible helical regions. Second, according to the central limit theorem, estimate the occurrence probability of every helical region based on Monte Carlo simulation. Third, add the helical region with the highest probability to the current structure and eliminate the helical regions incompatible with the current structure. The third step is repeated until no more helical regions can be added, and the current structure is taken as the final RNA secondary structure. In order to demonstrate the confidence of the program, a test on three RNA sequences (tRNAPhe, Pre-tRNATyr, and the Tetrahymena ribosomal RNA intervening sequence) is performed. AVAILABILITY: The program is written in Turbo Pascal 7.0. The source code is available upon request. CONTACT: Wujj@nic.bmi.ac.cn or Liwj@mail.bmi.ac.cn
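
The third step of the procedure above (repeatedly add the most probable helical region and discard incompatible ones) can be sketched as follows. This is a minimal illustration, not the authors' Turbo Pascal program: the helix probabilities are taken as given toy inputs rather than estimated by Monte Carlo simulation, and "incompatible" is assumed here to mean sharing a nucleotide or crossing an already chosen helix.

```python
from typing import Dict, List, Tuple

Helix = Tuple[int, int, int]  # (i, j, length): pairs (i + k, j - k) for k < length


def helix_pairs(h: Helix) -> List[Tuple[int, int]]:
    """Expand a helix into its individual base pairs."""
    i, j, length = h
    return [(i + k, j - k) for k in range(length)]


def compatible(a: Helix, b: Helix) -> bool:
    """Assumed rule: helices must not share a nucleotide and must not cross."""
    pa, pb = helix_pairs(a), helix_pairs(b)
    used = {x for pair in pa for x in pair}
    if any(x in used for pair in pb for x in pair):
        return False
    return not any(i < k < j < l or k < i < l < j
                   for i, j in pa for k, l in pb)


def greedy_fold(prob: Dict[Helix, float]) -> List[Helix]:
    """Add the most probable helix, drop incompatible ones, and repeat."""
    candidates = dict(prob)
    chosen: List[Helix] = []
    while candidates:
        best = max(candidates, key=candidates.get)
        chosen.append(best)
        candidates = {h: pr for h, pr in candidates.items()
                      if h != best and compatible(best, h)}
    return chosen


# Toy helix probabilities (assumed values, not simulation output).
prob = {
    (0, 30, 4): 0.9,
    (2, 20, 3): 0.8,   # shares nucleotides with the first helix
    (6, 24, 3): 0.6,
    (10, 40, 3): 0.5,  # crosses the first helix
    (10, 20, 3): 0.4,
}
structure = greedy_fold(prob)
print(structure)
```

Because each choice is final, the result depends on the order of the probabilities; a lower-ranked helix is only kept if it is compatible with everything selected before it.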

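Gu Ti, Cai Leixin, Wang Shuai, Lü Qiang. 《生物信息学》 (Chinese Journal of Bioinformatics), 2017, 15(3): 142-148
Pseudoknots are an important structural element of RNA, but the difficulty of modeling them makes them harder to predict. The ProbKnot algorithm, which predicts pseudoknotted RNA secondary structures from base-pairing probabilities, achieves high accuracy. However, because it uses pairing probability as its only basis for prediction, it predicts a large number of incorrect (negative) pairs, so its specificity is low. This work combines the base-pairing probability model of ProbKnot with a multi-objective genetic algorithm to raise the specificity of pseudoknotted RNA secondary structure prediction and thereby improve overall accuracy. First, the probability that each base is single-stranded is computed as an additional basis for prediction. A genetic algorithm then applies crossover, mutation, and iteration to RNA secondary structures. Finally, the Pareto-optimal solutions are obtained, from which the highest maximum expected accuracy is derived. The experimental results show that, on the RNA cases used, the method improves accuracy by about 4% on average over existing methods.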
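
Prediction from pairing probabilities alone can be illustrated with a mutual-best-partner rule: a pair (i, j) is kept when j is the most probable partner of i and i is the most probable partner of j. The sketch below is a hedged illustration of that idea as it is commonly described for ProbKnot; it is not the RNAstructure implementation and not the genetic algorithm of the abstract, and the probabilities and function name are made up.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def mutual_best_pairs(prob: Dict[Pair, float]) -> Set[Pair]:
    """Keep pairs (i, j) where each base is the other's most probable partner."""
    best_partner: Dict[int, Tuple[int, float]] = {}
    for (i, j), pr in prob.items():
        for x, y in ((i, j), (j, i)):
            # Record y as the partner of x if this probability is the highest so far.
            if pr > best_partner.get(x, (-1, 0.0))[1]:
                best_partner[x] = (y, pr)
    return {(i, j) for (i, j) in prob
            if best_partner.get(i, (-1, 0.0))[0] == j
            and best_partner.get(j, (-1, 0.0))[0] == i}


# Toy pairing probabilities (assumed values).
prob = {(0, 10): 0.9, (0, 12): 0.3, (5, 15): 0.8, (3, 12): 0.6, (3, 10): 0.2}
pairs = mutual_best_pairs(prob)
print(sorted(pairs))
```

Nothing in this rule forbids crossing pairs, so pseudoknots such as (0, 10) with (5, 15) can appear. It also has no term that rewards leaving a base unpaired, which is the gap the single-stranded probability in the abstract is meant to fill.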

An interactive microcomputer program for the rapid computation of the free energy (ΔG) of the secondary structure of RNA molecules is presented. The program assigns free energies (in kcal/mol) to helices; bulge, internal, and multi-branch loops; hairpins; and unpaired and G:C termini according to Ninio's rules, and displays a running total during computation. It is written in ‘Microsoft Basic’ and is applicable to virtually any Basic system with no modification. The program enables rapid bench-top determinations of free energies attributable to secondary structure features of RNA oligomers, and eliminates both the tedium and risk of error associated with manual calculations.

G Vanderkooi. Biochemistry, 1991, 30(44): 10760-10768
Complete energy minimization was carried out on the multibilayer crystal structure of 1,2-dimyristoyl-sn-glycero-3-phosphocholine dihydrate (DMPC·2H2O), starting from the X-ray structure determination reported by Pearson and Pascher (1979) Nature 281, 499-501. The asymmetric unit contains two nonidentical DMPC molecules and four water molecules. Minimization removed the acyl chain disorder present in the X-ray structure and caused the carbon planes of the acyl chains to become mutually parallel. Two energy-minimized structures (structures I and II) were found which mainly differed in the hydrogen-bonding arrangement of the waters of hydration. In structure I as in the X-ray structure, one of the water molecules forms a hydrogen-bonded bridge between successive bilayers; but in structure II, all hydrogen bonds are satisfied on the same bilayer. Structure II corresponds to the global energy minimum and is also a suitable structure for single bilayers. The lattice constants and cell volume of the minimized structures are close to the experimental values. The electrostatic force between DMPC bilayers is attractive. The mean hydration energy of the water is -14.2 kcal/mol, which is 2.5 kcal/mol lower than the binding energy of ice.

Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.

Insight into the functions and interactions of proteins may be gained by correlating a variety of types of experimental data (including kinetics, spectroscopy, biophysical measurements, among others) with three-dimensional structural models displayed and manipulated using interactive computer graphics. Although tertiary structures have been determined for a large number of proteins, one limiting factor in structure-function studies is the lack of availability of the structural coordinates of specific proteins for which other types of detailed experimental data are known. However, as the database of known structures grows, it becomes more and more likely that the structure of a closely related protein will be available. Here we present a method for predicting structures by (1) careful alteration of a known structure of a homologous, functionally analogous protein followed by (2) energy minimization to optimize the predicted structure. This method provides a rapid and effective solution to the initial problem of obtaining a working structure for modeling studies.

MOTIVATION: Function derives from structure; therefore, there is a need for methods to predict functional RNA structures. RESULTS: The Dynalign algorithm, which predicts the lowest free energy secondary structure common to two unaligned RNA sequences, is extended to the prediction of a set of low-energy structures. Dot plots can be drawn to show all base pairs in structures within an energy increment. Dynalign predicts more well-defined structures than structure prediction using a single sequence; in 5S rRNA sequences, the average number of base pairs in structures with energy within 20% of the lowest energy structure is 317 using Dynalign, but 569 using a single sequence. Structure prediction with Dynalign can also be constrained according to experiment or comparative analysis. The accuracy of Dynalign, measured as sensitivity and positive predictive value, is greater than that of predictions with a single sequence. AVAILABILITY: Dynalign can be downloaded at http://rna.urmc.rochester.edu

Inhibition of polyadenylation by stable RNA secondary structure.   (total citations: 7; self-citations: 2; citations by others: 5)

An RNA secondary structure workbench   (total citations: 6; self-citations: 4; citations by others: 2)
A multiple approach to the study of RNA secondary structure is described which provides for the independent drawing of structures using base-pairing lists, for the generation of local structures in the form of hairpins, and for the generation of global structures by both Monte Carlo and dynamic programming methodologies. User-adjustable parameters provide for limiting the size of hairpin loops, bulges and inner loops, and constraints can be imposed relative to position-dependent base pairing.
