Similar literature
20 similar articles found.
1.
A partition function calculation for RNA secondary structure is presented that uses a current set of nearest neighbor parameters for conformational free energy at 37 °C, including coaxial stacking. For a diverse database of RNA sequences, base pairs in the predicted minimum free energy structure that are predicted by the partition function to have high base pairing probability have a significantly higher positive predictive value for known base pairs. For example, the average positive predictive value, 65.8%, is increased to 91.0% when only base pairs with probability of 0.99 or above are considered. The quality of base pair predictions can also be increased by the addition of experimentally determined constraints, including enzymatic cleavage, flavin mono-nucleotide cleavage, and chemical modification. Predicted secondary structures can be color annotated to demonstrate pairs with high probability that are therefore well determined as compared to base pairs with lower probability of pairing.
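
The probability filter described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout for pair probabilities, and the toy data are all assumptions made for the example.

```python
def positive_predictive_value(predicted, known):
    """Fraction of predicted base pairs that appear in the known structure."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(known)) / len(predicted)


def high_confidence_pairs(mfe_pairs, pair_prob, threshold=0.99):
    """Keep only the MFE pairs whose pairing probability meets the threshold."""
    return [p for p in mfe_pairs if pair_prob.get(p, 0.0) >= threshold]


# Toy data (hypothetical, not taken from the paper); pairs are (i, j) with i < j.
mfe_pairs = [(1, 20), (2, 19), (3, 18), (5, 12)]
pair_prob = {(1, 20): 0.995, (2, 19): 0.992, (3, 18): 0.71, (5, 12): 0.40}
known_pairs = [(1, 20), (2, 19), (3, 18), (6, 11)]

ppv_all = positive_predictive_value(mfe_pairs, known_pairs)
confident = high_confidence_pairs(mfe_pairs, pair_prob)
ppv_confident = positive_predictive_value(confident, known_pairs)
```

In this toy case the filter discards two pairs, one of them correct, so the gain in positive predictive value comes at the cost of reporting fewer pairs.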

2.

Background

The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented.

Results

TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. 
The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms.

Conclusions

TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.

3.
Vanegas PL  Horwitz TS  Znosko BM 《Biochemistry》2012,51(11):2192-2198
Currently, several models for predicting the secondary structure of RNA exist, one of which is free energy minimization using the Nearest Neighbor Model. This model predicts the lowest-free energy secondary structure from a primary sequence by summing the free energy contributions of the Watson-Crick nearest neighbor base pair combinations and any noncanonical secondary structure motif. The Nearest Neighbor Model also assumes that the free energy of the secondary structure motif is dependent solely on the identities of the nucleotides within the motif and the motif's nearest neighbors. To test the current assumption of the Nearest Neighbor Model that the non-nearest neighbors do not affect the stability of the motif, we optically melted different stem-loop oligonucleotides to experimentally determine their thermodynamic parameters. In each of these oligonucleotides, the hairpin loop sequence and the adjacent base pairs were held constant, while the first or second non-nearest neighbors were varied. The experimental results show that the thermodynamic contributions of the hairpin loop were dependent upon the identity of the first non-nearest neighbor, while the second non-nearest neighbor had a less obvious effect. These results were then used to create an updated model for predicting the thermodynamic contributions of a hairpin loop to the overall stability of the stem-loop structure.
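
The additive bookkeeping of the Nearest Neighbor Model summarized above can be sketched in a few lines. The function name, the key layout of the lookup table, and especially the numeric values are hypothetical placeholders chosen only so the example runs; they are not measured thermodynamic parameters.

```python
def stem_loop_free_energy(top, bottom, stack_energy, motif_energy=0.0):
    """Sum nearest neighbor stacking terms along a fully paired stem.

    `top` is read 5'->3' and `bottom` is its partner strand read 3'->5',
    so top[i] pairs with bottom[i]. `motif_energy` stands in for the
    contribution of a noncanonical motif such as the hairpin loop.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    total = motif_energy
    for i in range(len(top) - 1):
        # Each adjacent pair of base pairs contributes one stacking term.
        total += stack_energy[(top[i:i + 2], bottom[i:i + 2])]
    return total


# Placeholder values in kcal/mol (NOT experimental data).
PLACEHOLDER_STACKS = {
    ("GC", "CG"): -3.0,
    ("CG", "GC"): -2.0,
}

example = stem_loop_free_energy("GCG", "CGC", PLACEHOLDER_STACKS, motif_energy=4.0)
```

Under this scheme the loop term depends only on the loop and its closing pair; the study above asks whether base pairs further from the loop should also enter that term.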

4.
We present a machine learning method (a hierarchical network of k-nearest neighbor classifiers) that uses an RNA sequence alignment in order to predict a consensus RNA secondary structure. The input to the network is the mutual information, the fraction of complementary nucleotides, and a novel consensus RNAfold secondary structure prediction of a pair of alignment columns and its nearest neighbors. Given this input, the network computes a prediction as to whether a particular pair of alignment columns corresponds to a base pair. By using a comprehensive test set of 49 RFAM alignments, the program KNetFold achieves an average Matthews correlation coefficient of 0.81. This is a significant improvement compared with the secondary structure prediction methods PFOLD and RNAalifold. By using the example of archaeal RNase P, we show that the program can also predict pseudoknot interactions.
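
Two of the input features named in this abstract, mutual information and the fraction of complementary nucleotides, can be computed for a pair of alignment columns roughly as below. This is a sketch under assumptions: gap handling, the inclusion of G-U wobble pairs as complementary, and the function names are choices made for the example and may differ from KNetFold.

```python
import math
from collections import Counter

# Assumption: G-U wobble pairs count as complementary.
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _ungapped(col_i, col_j):
    """Row-wise residue pairs, skipping rows with a gap in either column."""
    return [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    pairs = _ungapped(col_i, col_j)
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b]).
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi


def complementary_fraction(col_i, col_j):
    """Fraction of ungapped rows whose two residues can form a base pair."""
    pairs = _ungapped(col_i, col_j)
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p in COMPLEMENTARY) / len(pairs)
```

Perfectly covarying columns such as "GCAU" and "CGUA" give 2 bits of mutual information, whereas fully conserved columns give 0 bits even when they pair in every row, which is why the complementary fraction is a useful second feature.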

5.
Lorenz WA  Clote P 《PloS one》2011,6(1):e16178
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in O(n^3) time and O(n^2) space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structure leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy. 
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.

6.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was first pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.
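
A maximum expected accuracy prediction of the kind described above can be sketched as a small dynamic program over precomputed pair probabilities. The scoring used here, 2·gamma·P(pair) for each pair plus P(unpaired) for each single-stranded nucleotide, is a common formulation and is an assumption of this sketch; the function name, the `gamma` and `min_loop` parameters, and the toy probabilities are hypothetical and not taken from MaxExpect.

```python
def mea_structure(pair_prob, n, gamma=1.0, min_loop=3):
    """Return (score, pairs) maximizing expected accuracy without pseudoknots.

    pair_prob maps 0-based (i, j) with i < j to a pairing probability.
    """
    # Probability that each nucleotide is single-stranded.
    p_single = [1.0] * n
    for (i, j), p in pair_prob.items():
        p_single[i] -= p
        p_single[j] -= p

    # best[i][j] is the best score for the half-open interval [i, j).
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    choice = [[None] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Option 1: nucleotide i stays unpaired.
            best[i][j] = p_single[i] + best[i + 1][j]
            # Option 2: nucleotide i pairs with k, splitting the interval.
            for k in range(i + min_loop + 1, j):
                p = pair_prob.get((i, k), 0.0)
                if p == 0.0:
                    continue
                score = 2.0 * gamma * p + best[i + 1][k] + best[k + 1][j]
                if score > best[i][j]:
                    best[i][j] = score
                    choice[i][j] = k

    # Trace back the chosen pairs.
    pairs, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.extend([(i + 1, k), (k + 1, j)])
    return best[0][n], sorted(pairs)


# Toy probabilities (hypothetical).
toy_prob = {(0, 7): 0.9, (1, 6): 0.8, (2, 6): 0.1}
score, pairs = mea_structure(toy_prob, n=8)
```

Raising `gamma` rewards pairing and so tends to favor sensitivity, while lowering it favors single-strandedness and PPV, which mirrors the trade-off the abstract describes.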

7.
Probabilities of disorder for FlgM proteins of 39 species whose optimal growth temperature ranges from 273 K (0°C) to 368 K (95°C) were predicted by a newly developed method called Sequence-based Prediction with Integrated NEural networks for Disorder (SPINE-D). We showed that the temperature-dependent behavior of FlgM proteins could be separated into two subgroups according to their sequence lengths. Only shorter sequences evolved to adapt to high temperatures (>318 K or 45°C). Their ability to adapt to high temperatures was achieved through a transition from a fully disordered state with little secondary structure to a semidisordered state with high predicted helical probability at the N-terminal region. The predicted results are consistent with available experimental data. An analysis of all orthologous protein families in 39 species suggests that such a transition from a fully disordered state to semidisordered and/or ordered states is one of the strategies employed by nature for adaptation to high temperatures.

8.
Results from optical melting studies of Watson–Crick complementary heteroduplexes formed between 2′-O-methyl RNA and RNA oligonucleotides are used to determine nearest neighbor thermodynamic parameters for predicting the stabilities of such duplexes. The results are consistent with the physical model assumed by the individual nearest neighbor-hydrogen bonding model, which contains terms for helix initiation, base pair stacking and base pair composition. The sequence dependence is similar to that for Watson–Crick complementary RNA/RNA duplexes, which suggests that the sequence dependence may also be similar to that for other backbones that favor A-form RNA conformations.

9.
RNA folding free energy change parameters are widely used to predict RNA secondary structure and to design RNA sequences. These parameters include terms for the folding free energies of helices and loops. Although the full set of parameters has only been traditionally available for the four common bases and backbone, it is well known that covalent modifications of nucleotides are widespread in natural RNAs. Covalent modifications are also widely used in engineered sequences. We recently derived a full set of nearest neighbor terms for RNA that includes N6-methyladenosine (m6A). In this work, we test the model using 98 optical melting experiments, matching duplexes with or without N6-methylation of A. Most experiments place RRACH, the consensus site of N6-methylation, in a variety of contexts, including helices, bulge loops, internal loops, dangling ends, and terminal mismatches. For matched sets of experiments that include either A or m6A in the same context, we find that the parameters for m6A are as accurate as those for A. Across all experiments, the root mean squared deviation between estimated and experimental free energy changes is 0.67 kcal/mol. We used the new experimental data to refine the set of nearest neighbor parameter terms for m6A. These parameters enable prediction of RNA secondary structures including m6A, which can be used to model how N6-methylation of A affects RNA structure.

10.
Hausmann NZ  Znosko BM 《Biochemistry》2012,51(26):5359-5368
To better elucidate RNA structure-function relationships and to improve the design of pharmaceutical agents that target specific RNA motifs, an understanding of RNA primary, secondary, and tertiary structure is necessary. The prediction of RNA secondary structure from sequence is an intermediate step in predicting RNA three-dimensional structure. RNA secondary structure is typically predicted using a nearest neighbor model based on free energy parameters. The current free energy parameters for 2 × 3 nucleotide loops are based on a 23-member data set of 2 × 3 loops and internal loops of other sizes. A database of representative RNA secondary structures was searched to identify 2 × 3 nucleotide loops that occur in nature. Seventeen of the most frequent 2 × 3 nucleotide loops in this database were studied by optical melting experiments. Fifteen of these loops melted in a two-state manner, and the associated experimental ΔG°(37,2×3) values are, on average, 0.6 and 0.7 kcal/mol different from the values predicted for these internal loops using the predictive models proposed by Lu, Turner, and Mathews [Lu, Z. J., Turner, D. H., and Mathews, D. H. (2006) Nucleic Acids Res. 34, 4912-4924] and Chen and Turner [Chen, G., and Turner, D. H. (2006) Biochemistry 45, 4025-4043], respectively. These new ΔG°(37,2×3) values can be used to update the current algorithms that predict secondary structure from sequence. To improve free energy calculations for duplexes containing 2 × 3 nucleotide loops that still do not have experimentally determined free energy contributions, an updated predictive model was derived. This new model resulted from a linear regression analysis of the data reported here combined with 31 previously studied 2 × 3 nucleotide internal loops. 
Most of the values for the parameters in this new predictive model are within experimental error of those of the previous models, suggesting that approximations and assumptions associated with the derivation of the previous nearest neighbor parameters were valid. The updated predictive model predicts free energies of 2 × 3 nucleotide internal loops within 0.4 kcal/mol, on average, of the experimental free energy values. Both the experimental values and the updated predictive model can be used to improve secondary structure prediction from sequence.

11.
RNA is known to be involved in several cellular processes; however, it is only active when it is folded into its correct 3D conformation. The folding, bending and twisting of an RNA molecule is dependent upon the multitude of canonical and non-canonical secondary structure motifs. These motifs contribute to the structural complexity of RNA but also serve important integral biological functions, such as serving as recognition and binding sites for other biomolecules or small ligands. One of the most prevalent types of RNA secondary structure motifs are single mismatches, which occur when two canonical pairs are separated by a single non-canonical pair. To determine sequence–structure relationships and to identify structural patterns, we have systematically located, annotated and compared all available occurrences of the 30 most frequently occurring single mismatch-nearest neighbor sequence combinations found in experimentally determined 3D structures of RNA-containing molecules deposited into the Protein Data Bank. Hydrogen bonding, stacking and interaction of nucleotide edges for the mismatched and nearest neighbor base pairs are described and compared, allowing for the identification of several structural patterns. Such a database and comparison will allow researchers to gain insight into the structural features of unstudied sequences and to quickly look up studied sequences.

12.
Badhwar J  Karri S  Cass CK  Wunderlich EL  Znosko BM 《Biochemistry》2007,46(50):14715-14724
Thermodynamic data for RNA 1 × 2 nucleotide internal loops are lacking. Thermodynamic data that are available for 1 × 2 loops, however, are for loops that rarely occur in nature. In order to identify the most frequently occurring 1 × 2 nucleotide internal loops, a database of 955 RNA secondary structures was compiled and searched. Twenty-four RNA duplexes containing the most common 1 × 2 nucleotide loops were optically melted, and the thermodynamic parameters ΔH°, ΔS°, ΔG°37, and TM for each duplex were determined. This data set more than doubles the number of 1 × 2 nucleotide loops previously studied. A table of experimental free energy contributions for frequently occurring 1 × 2 nucleotide loops (as opposed to a predictive model) is likely to result in better prediction of RNA secondary structure from sequence. In order to improve free energy calculations for duplexes containing 1 × 2 nucleotide loops that do not have experimental free energy contributions, the data collected here were combined with data from 21 previously studied 1 × 2 loops. Using linear regression, the entire dataset was used to derive nearest neighbor parameters that can be used to predict the thermodynamics of previously unmeasured 1 × 2 nucleotide loops. The ΔG°37,loop and ΔH°loop nearest neighbor parameters derived here were compared to values that were published previously for 1 × 2 nucleotide loops but were derived from either a significantly smaller dataset of 1 × 2 nucleotide loops or from internal loops of various sizes [Lu, Z. J., Turner, D. H., and Mathews, D. H. (2006) Nucleic Acids Res. 34, 4912-4924]. Most of these values were found to be within experimental error, suggesting that previous approximations and assumptions associated with the derivation of those nearest neighbor parameters were valid. ΔS°loop nearest neighbor parameters are also reported for 1 × 2 nucleotide loops. 
Both the experimental thermodynamics and the nearest neighbor parameters reported here can be used to improve secondary structure prediction from sequence.

13.
RNA hairpin loop stability depends on closing base pair.
Thermodynamic parameters are reported for hairpin formation in 1 M NaCl by RNA sequences of the type GGXAUAAUAYCC, where X and Y are CG, GC, AU, UA, GU, or UG. A nearest neighbor analysis of the data indicates the free energy change for loop formation at 37 °C, ΔG°l,37, averages 3.4 kcal/mol for hairpin loops closed with C·G, G·C, and G·U pairs. In contrast, ΔG°l,37 averages 4.6 kcal/mol for loops closed with A·U, U·A, or U·G pairs. Thus the stability of an RNA hairpin depends on the closing base pair. The hairpin with a GA mismatch that is formed by GGCGUAAUAGCC is more stable than the corresponding hairpin with an AA mismatch. Thus hairpin stability also depends on loop sequence. These effects are not included in current algorithms for prediction of RNA structure from sequence.

14.
Thermodynamic parameters are reported for duplex formation of 40 self-complementary RNA duplexes containing wobble terminal base pairs with all possible 3′ single and double-nucleotide overhangs, mimicking the structures of short interfering RNAs (siRNA) and microRNAs (miRNA). Based on nearest neighbor analysis, the addition of a single 3′ dangling nucleotide increases the stability of duplex formation up to 1 kcal/mol in a sequence-dependent manner. The addition of a second dangling nucleotide increases the stability of duplexes closed with wobble base pairs in an idiosyncratic manner. The results allow for the development of a nearest neighbor model, which improves the prediction of free energy and melting temperature for duplexes closed by wobble base pairs with 3′ single or double-nucleotide overhangs. Phylogenetic analysis of naturally occurring miRNAs was performed. Selection of the effector miR strand of the mature miRNA duplex appears to be dependent on the orientation of the GU closing base pair rather than the identity of the 3′ double-nucleotide overhang. Thermodynamic parameters for the 5′ single terminal overhangs adjacent to wobble closing base pairs are also presented.

15.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.

16.
Several cellular processes involve alignment of three nucleic acid strands, in which the third strand (DNA or RNA) is identical and in a parallel orientation to one of the DNA duplex strands. Earlier, using 2-aminopurine as a fluorescent reporter base, we demonstrated that a self-folding oligonucleotide forms a recombination-like structure consistent with the R-triplex. Here, we extended this approach, placing the reporter 2-aminopurine either in the 5′- or 3′-strand. We obtained direct evidence that the 3′-strand forms a stable duplex with the complementary central strand, while the 5′-strand participates in non-Watson–Crick interactions. Substituting 2,6-diaminopurine or 7-deazaadenine for adenine, we tested and confirmed the proposed hydrogen bonding scheme of the A*(T·A) R-type triplet. The adenine substitutions expected to provide additional H-bonds led to triplex structures with increased stability, whereas the substitutions consistent with a decrease in the number of H-bonds destabilized the triplex. The triplex formation enthalpies and free energies exhibited linear dependences on the number of H-bonds predicted from the A*(T·A) triplet scheme. The enthalpy of the 10 nt long intramolecular triplex of −100 kJ·mol⁻¹ demonstrates that the R-triplex is relatively unstable and thus an ideal candidate for a transient intermediate in homologous recombination, t-loop formation at the mammalian telomere ends, and short RNA invasion into a duplex. On the other hand, the impact of a single H-bond, 18 kJ·mol⁻¹, is high compared with the overall triplex formation enthalpy. The observed energy advantage of a ‘correct’ base in the third strand opposite the Watson–Crick base pair may be a powerful mechanism for securing selectivity of recognition between the single strand and the duplex.

17.
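Construction of an RNA secondary structure prediction system
An RNA secondary structure prediction system, Rnafold, was built using the following prediction algorithms: maximum base pairing, Zuker's free energy minimization, optimal helix stacking, random helix stacking, and enumeration of all possible combinations, together with a technique for drawing RNA secondary structures based on primary helical regions. In addition, 20 randomly selected tRNA sequences were used to compare the first four prediction algorithms in terms of free energy and cloverleaf structure, and a t-test was used to assess the statistical differences in free energy. In terms of cloverleaf structure, the random stacking method performed best, followed by the optimal helix stacking method and the Zuker algorithm, while the maximum base pairing method performed worst. Finally, the differences between the two free energy minimization methods were analyzed.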
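
The simplest of the algorithms listed in this abstract, maximum base pairing, is commonly implemented as a Nussinov-style dynamic program. The sketch below is that textbook recursion, not the Rnafold code; the function name, the `min_loop` parameter, and the decision to allow G-U pairs are assumptions made for the example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, leaving at least `min_loop`
    unpaired nucleotides inside every hairpin."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] is the optimum for the half-open interval [i, j).
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Case 1: nucleotide i is unpaired.
            best[i][j] = best[i + 1][j]
            # Case 2: nucleotide i pairs with k; solve inside and outside.
            for k in range(i + min_loop + 1, j):
                if (seq[i], seq[k]) in allowed:
                    candidate = 1 + best[i + 1][k] + best[k + 1][j]
                    if candidate > best[i][j]:
                        best[i][j] = candidate
    return best[0][n]
```

Because every pair counts equally regardless of stacking, this objective ignores the thermodynamics that the free energy methods capture, which is consistent with it ranking last in the cloverleaf comparison reported above.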

18.
Davis AR  Znosko BM 《Biochemistry》2008,47(38):10178-10187
Due to their prevalence and roles in biological systems, single mismatches adjacent to G-U pairs are important RNA structural elements. Since there are only limited experimental values for the stability of single mismatches adjacent to G-U pairs, current algorithms using free energy minimization to predict RNA secondary structure from sequence assign predicted thermodynamic values to these types of single mismatches. Here, thermodynamic data are reported for frequently occurring single mismatches adjacent to at least one G-U pair. This experimental data can be used in place of predicted thermodynamic values in algorithms that predict secondary structure from sequence using free energy minimization. When predicting the thermodynamic contributions of previously unmeasured single mismatches, most algorithms apply the same thermodynamic penalty for an A-U pair adjacent to a single mismatch and a G-U pair adjacent to a single mismatch. A recent study, however, suggests that the penalty for a G-U pair adjacent to a tandem mismatch should be 1.2 ± 0.1 kcal/mol, and the penalty for an A-U pair adjacent to a tandem mismatch should be 0.5 ± 0.2 kcal/mol [Christiansen, M. E. and Znosko, B. M. (2008) Biochemistry 47, 4329-4336]. Therefore, the data reported here are combined with the existing thermodynamic dataset of single mismatches, and nearest neighbor parameters are derived for an A-U pair adjacent to a single mismatch (1.1 ± 0.1 kcal/mol) and a G-U pair adjacent to a single mismatch (1.4 ± 0.1 kcal/mol).

19.
Zhu J  Wartell RM 《Biochemistry》1999,38(48):15986-15993
Forty-eight RNA duplexes were constructed that contained all common single base bulges at six different locations. The stabilities of the RNAs were determined by temperature gradient gel electrophoresis (TGGE). The relative stability of a single base bulge was dependent on both base identity and the nearest neighbor context. The single base bulges were placed into two categories. A bulged base with no identical neighboring base was defined as a Group I base bulge. Group II-bulged bases had at least one neighboring base identical to it. Group II bulges were generally more stable than Group I bulges in the same nearest neighbor environments. This indicates that position degeneracy of an unpaired base enhances stability. Differences in the mobility transition temperatures between the RNA fragments with bulges and the completely base-paired reference RNAs were related to free energy differences. Simple models for estimating the free energy contribution of single base bulges were evaluated from the free energy difference data. The contribution of a Group I bulge 5'-(XNZ)-3'·5'-(Z'-X')-3', where N is the unpaired base and X·X' and Z·Z' the neighboring base pairs, could be well-represented (±0.34 kcal/mol) by the equation ΔG(XNZ)·(Z'-X') = 3.11 + 0.40 ΔGs(XZ)·(Z'X'), where ΔGs(XZ)·(Z'X') is the stacking energy of the closing base pair doublet. By adding a constant term, δ = −0.3 kcal/mol, to the right side of the above equation, free energies of Group II bulges could also be predicted with the same accuracy. The term δ represents the stabilizing effect due to position degeneracy. A similar equation/model was applied to previous data from 32 DNA fragments with single base bulges. It predicted the free energy differences with a similar standard deviation.
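
The bulge model in this abstract is simple enough to write out directly. The coefficients 3.11, 0.40 and −0.3 kcal/mol come from the abstract; the function names, argument layout, and the stacking energy used in the example call are hypothetical.

```python
def classify_bulge(five_prime_neighbor, bulged_base, three_prime_neighbor):
    """Group II if the bulged base matches at least one same-strand neighbor,
    otherwise Group I."""
    if bulged_base in (five_prime_neighbor, three_prime_neighbor):
        return "II"
    return "I"


def bulge_free_energy(stacking_energy, group):
    """Estimated bulge contribution in kcal/mol.

    stacking_energy is the stacking energy of the closing base pair
    doublet, i.e. the two pairs that flank the bulge.
    """
    energy = 3.11 + 0.40 * stacking_energy
    if group == "II":
        energy += -0.3  # stabilization from position degeneracy
    return energy


# Example with a made-up stacking energy of -2.0 kcal/mol for a 5'-GAC-3' context.
group = classify_bulge("G", "A", "C")
estimate = bulge_free_energy(-2.0, group)
```

A bulged A flanked by another A, as in 5'-AAC-3', would be classified as Group II and receive the additional −0.3 kcal/mol term.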

20.
Nanoscale α-hemolysin pores can be used to analyze individual DNA or RNA molecules. Serial examination of hundreds to thousands of molecules per minute is possible using ionic current impedance as the measured property. In a recent report, we showed that a nanopore device coupled with machine learning algorithms could automatically discriminate among the four combinations of Watson–Crick base pairs and their orientations at the ends of individual DNA hairpin molecules. Here we use kinetic analysis to demonstrate that ionic current signatures caused by these hairpin molecules depend on the number of hydrogen bonds within the terminal base pair, stacking between the terminal base pair and its nearest neighbor, and 5′ versus 3′ orientation of the terminal bases independent of their nearest neighbors. This report constitutes evidence that single Watson–Crick base pairs can be identified within individual unmodified DNA hairpin molecules based on their dynamic behavior in a nanoscale pore.
