Similar articles
20 similar articles found.
1.
Commonly used RNA folding programs compute the minimum free energy structure of a sequence under the pseudoknot exclusion constraint. They are based on Zuker's algorithm, which runs in time O(n^3). Recently, it has been claimed that RNA folding can be achieved in average time O(n^2) using a sparsification technique. A proof of quadratic time complexity was based on the assumption that computational RNA folding obeys the "polymer-zeta property". Several variants of sparse RNA folding algorithms were later developed. Here, we present our own version, which is readily applicable to existing RNA folding programs, as it is extremely simple and does not require any new data structure. We applied it to the widely used Vienna RNAfold program, to create sibRNAfold, the first public sparsified version of a standard RNA folding program. To gain a better understanding of the time complexity of sparsified RNA folding in general, we carried out a thorough run time analysis with synthetic random sequences, both in the context of energy minimization and base pairing maximization. Contrary to previous claims, the asymptotic time complexity of a sparsified RNA folding algorithm using standard energy parameters remains O(n^3) under a wide variety of conditions. Consistent with our run-time analysis, we found that RNA folding does not obey the "polymer-zeta property" as claimed previously. Yet, a basic version of a sparsified RNA folding algorithm provides a 15- to 50-fold speed gain. Surprisingly, the same sparsification technique has a different effect when applied to base pairing optimization. There, its asymptotic running time complexity appears to be either quadratic or cubic depending on the base composition. The code used in this work is available at: .
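
The base pairing maximization setting mentioned in this abstract can be illustrated with the classic cubic-time dynamic program. The following is a minimal Python sketch, not the sibRNAfold code and without any sparsification; the function name, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop size of 3 are assumptions made for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    Hypothetical helper: pairing rules and min_loop are illustrative
    assumptions, not parameters taken from the abstract.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]            # case 1: position j is unpaired
            # case 2: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```

The three nested loops over i, j, and k are what give the O(n^3) running time; sparsification techniques aim to skip most of the candidate split points k in the inner loop.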

2.
Algorithms for prediction of RNA secondary structure (the set of base pairs that form when an RNA molecule folds) are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Θ(n^5) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n^3) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Θ(n^6) time.

3.

Background

RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data.

Results

We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars.

Conclusions

The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.
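
For context, the cubic-time baseline that Valiant's technique improves on for Context Free Grammar recognition is the standard CYK table-filling algorithm. The sketch below is that baseline only, not Valiant's sub-cubic method and not the templates from this paper; the function name, the grammar encoding (Chomsky normal form given as two dictionaries), and the toy grammar are assumptions for illustration.

```python
def cyk_recognize(word, unary, binary, start="S"):
    """Cubic-time CYK recognition for a grammar in Chomsky normal form.

    unary:  dict mapping a terminal to the set of nonterminals A with A -> terminal
    binary: dict mapping (B, C) to the set of nonterminals A with A -> B C
    (Hypothetical encoding chosen for this sketch.)
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][length] = nonterminals that derive word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length]
            for split in range(1, length):
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        cell |= binary.get((b, c), set())
    return start in table[0][n]


# Toy grammar for a^n b^n (n >= 1), which mimics perfectly nested pairing:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
print(cyk_recognize("aabb", UNARY, BINARY))
```

The loops over substring length, start position, and split point account for the cubic worst-case bound that the abstract refers to.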

4.
Fast evaluation of internal loops in RNA secondary structure prediction.
MOTIVATION: Though not as abundant in known biological processes as proteins, RNA molecules serve as more than mere intermediaries between DNA and proteins. Research in the last 15 years demonstrates that RNA molecules serve in many roles, including catalysis. Furthermore, RNA secondary structure prediction based on free energy rules for stacking and loop formation remains one of the few major breakthroughs in the field of structure prediction, as minimum free energy structures and related quantities can be computed with full mathematical rigor. However, with the current energy parameters, the algorithms used hitherto suffer the disadvantage of either employing heuristics that risk (though highly unlikely) missing the optimal structure or becoming prohibitively time consuming for moderate to large sequences. RESULTS: We present a new method to evaluate internal loops utilizing currently used energy rules. This method reduces the time complexity of this part of the structure prediction from O(n^4) to O(n^3), thus reducing the overall complexity to O(n^3). Even when the size of evaluated internal loops is bounded by k (a commonly used heuristic), the method presented has a competitive edge by reducing the time complexity of internal loop evaluation from O(k^2 n^2) to O(k n^2). The method also applies to the calculation of the equilibrium partition function. AVAILABILITY: Source code for an RNA secondary structure prediction program implementing this method is available at ftp://www.ibc.wustl.edu/pub/zuker/zuker .tar.Z

5.
Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has worst-case time complexity of O(n^2 · m^2 · max(n,m)) and a space complexity of only O(n · m). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.

6.
Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging, and usually restricts the comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m^2 n(1 + lg n/m)) time, while our local algorithms run in O(m^2 n(1 + lg n/m)) and O(n^2 m) time, respectively, where m
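
The Smith-Waterman metric for plain strings, which this abstract says its RNA metrics extend, can be sketched as follows. This is only the string version with a linear gap penalty; it ignores RNA structure and affine gaps, and the function name and scoring values are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Hypothetical scoring scheme (match/mismatch/linear gap); not the
    RNA sequence-structure metric described in the abstract.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # the 0 option lets a local alignment restart anywhere
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("GGACGUCC", "UUACGUAA"))
```

Keeping only two rows of the table gives O(len(a) · len(b)) time and O(len(b)) space when just the score is needed.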

7.
The study of protein folding mechanisms continues to be one of the most challenging problems in computational biology. Currently, the protein folding mechanism is often characterized by calculating the free energy landscape versus various reaction coordinates, such as the fraction of native contacts, the radius of gyration, RMSD from the native structure, and so on. In this paper, we present a combinatorial pattern discovery approach toward understanding the global state changes during the folding process. This is a first step toward an unsupervised (and perhaps eventually automated) approach toward identification of global states. The approach is based on computing biclusters (or patterned clusters); each cluster is a combination of various reaction coordinates, and its signature pattern facilitates the computation of the Z-score for the cluster. For this discovery process, we present an algorithm of time complexity O((N + nm) log n), where N is the size of the output patterns and (n x m) is the size of the input with n time frames and m reaction coordinates. To date, this is the best time complexity for this problem. We next apply this to a beta-hairpin folding trajectory and demonstrate that this approach extracts crucial information about protein folding intermediate states and mechanism. We make three observations about the approach: (1) The method recovers states previously obtained by visually analyzing free energy surfaces. (2) It also succeeds in extracting meaningful patterns and structures that had been overlooked in previous works, which provides a better understanding of the folding mechanism of the beta-hairpin. These new patterns also interconnect various states in existing free energy surfaces versus different reaction coordinates. (3) The approach does not require calculating the free energy values, yet it offers an analysis comparable to, and sometimes better than, the methods that use free energy landscapes, thus validating the choice of reaction coordinates. (An abstract version of this work was presented at the 2005 Asia Pacific Bioinformatics Conference [1].)

8.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.

9.
MOTIVATION: S-attributed grammars (a generalization of classical Context-Free grammars) provide a versatile formalism for sequence analysis which allows long-range constraints to be expressed: the RNA folding problem is a typical example of application. Efficient algorithms have been developed to solve problems expressed with these tools, which generally compute the optimal attribute of the sequence w.r.t. the grammar. However, it is often more meaningful and/or interesting from the biological point of view to consider almost optimal attributes as well as approximate sequences; we thus need more flexible and powerful algorithms able to perform these generalized analyses. RESULTS: In this paper we present a basic algorithm which, given a grammar G and a sequence ω, computes the optimal attribute for all (approximate) strings ω' in L(G) such that d(ω, ω') ≤ M, and whose complexity is O(n^(r + 1)) in time and O(n^2) in space (r is the maximal length of the right-hand side of any production of G). We will also give some extensions and possible improvements of this algorithm.

10.
We present an algorithm that calculates the optimal binding conformation and free energy of two RNA molecules, one or both oligomeric. This algorithm has applications to modeling DNA microarrays, RNA splice-site recognition and other antisense problems. Although other recent algorithms perform the same calculation in time proportional to the sum of the lengths cubed, O((N1 + N2)^3), our oligomer binding algorithm, called bindigo, scales as the product of the sequence lengths, O(N1 · N2). The algorithm performs well in practice with the aid of a heuristic for large asymmetric loops. To demonstrate its speed and utility, we use bindigo to investigate the binding proclivities of U1 snRNA to mRNA donor splice sites.

11.
The problem of protein structure prediction in the hydrophobic-polar (HP) lattice model is the prediction of protein tertiary structure. This problem is usually referred to as the protein folding problem. This paper presents a method for the application of an enhanced hybrid search algorithm to the problem of protein folding prediction, using the three-dimensional (3D) HP lattice model. The enhanced hybrid search algorithm is a combination of the particle swarm optimizer (PSO) and tabu search (TS) algorithms. Because the PSO algorithm is very easily trapped in local minima in the later stages of evolution, we combined PSO with the TS algorithm, which has global optimization properties. Because crossover and mutation operations are applied many times in the PSO and TS algorithms, the enhanced hybrid search algorithm is called the MCMPSO-TS (multiple crossover and mutation PSO-TS) algorithm. Experimental results show that the MCMPSO-TS algorithm can find the best solutions so far for the listed benchmarks, which will help comparison with future approaches. Moreover, real protein sequences and Fibonacci sequences are verified in the 3D HP lattice model for the first time. Compared with previous evolutionary algorithms, the new hybrid search algorithm is novel, and can be used effectively to predict 3D protein folding structure. With continuous development and changes in amino acid sequences, the new algorithm will also make a contribution to the study of new protein sequences.
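
To make the search objective concrete, the energy of a conformation in the 3D HP lattice model can be computed as sketched below. This assumes the conventional HP energy function (minus one for each pair of hydrophobic residues that are lattice neighbours but not consecutive in the chain); the abstract does not spell out its energy function, and the function name and input encoding are hypothetical. The sketch covers only the evaluation step, not the MCMPSO-TS search itself.

```python
def hp_energy(sequence, coords):
    """Energy of a self-avoiding walk on the cubic lattice in the HP model.

    sequence: string over {"H", "P"}
    coords:   list of (x, y, z) integer tuples, one per residue
    Assumed convention: each non-consecutive H-H contact contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # look only in the positive directions so each contact is counted once
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

A search method such as PSO or tabu search would call an evaluation like this repeatedly while proposing new conformations, looking for the lowest energy.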

12.
Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, thus allowing mixtures of suboptimal structures to be reconstructed from seemingly contradicting data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.

13.
Predicting RNA secondary structure is often the first step to determining the structure of RNA. Prediction approaches have historically avoided searching for pseudoknots because of the extreme combinatorial and time complexity of the problem. Yet neglecting pseudoknots limits the utility of such approaches. Here, an algorithm utilizing structure mapping and thermodynamics is introduced for RNA pseudoknot prediction that finds the minimum free energy and identifies information about the flexibility of the RNA. The heuristic approach takes advantage of the 5' to 3' folding direction of many biological RNA molecules and is consistent with the hierarchical folding hypothesis and the contact order model. Mapping methods are used to build and analyze the folded structure for pseudoknots and to add important 3D structural considerations. The program can predict some well known pseudoknot structures correctly. The results of this study suggest that many functional RNA sequences are optimized for proper folding. They also suggest directions we can proceed in the future to achieve even better results.

14.
Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine-learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly-predicted base-pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/~negevcb/contextfold.
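
The scores quoted in this abstract can be made concrete with a small sketch of an F-measure over base pairs and of the relative error reduction. It assumes the usual definition of the F-measure as the harmonic mean of precision and recall over predicted versus reference base pairs; the function names and the example pair sets are hypothetical, and this is not the evaluation code of the paper.

```python
def base_pair_f1(predicted, reference):
    """F-measure of predicted base pairs against a reference structure.

    Both arguments are iterables of (i, j) index pairs; pair order is ignored.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)   # fraction of predicted pairs that are correct
    recall = true_pos / len(ref)       # fraction of reference pairs that are found
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score, new_score):
    """Fraction by which the error (1 - score) shrinks from old to new."""
    return ((1 - old_score) - (1 - new_score)) / (1 - old_score)


example = base_pair_f1([(1, 10), (2, 9), (3, 8)], [(10, 1), (2, 9), (4, 7)])
print(round(example, 3))
print(round(relative_error_reduction(0.70, 0.84), 3))
```

With the abstract's figures, the error drops from 30% to 16%, a relative reduction of 14/30, or roughly 47%, which the abstract rounds to "about 50%".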

15.
The prediction of RNA secondary structure including pseudoknots remains a challenge due to the intractable computation of the sequence conformation from nucleotide interactions under free energy models. Optimal algorithms often assume a restricted class for the predicted RNA structures and yet still require a high-degree polynomial time complexity, which is too expensive to use. Heuristic methods may yield time-efficient algorithms but they do not guarantee optimality of the predicted structure. This paper introduces a new and efficient algorithm for the prediction of RNA structure with pseudoknots for which the structure is not restricted. Novel prediction techniques are developed based on graph tree decomposition. In particular, based on a simplified energy model, stem overlapping relationships are defined with a graph, in which a specialized maximum independent set corresponds to the desired optimal structure. Such a graph is tree decomposable; dynamic programming over a tree decomposition of the graph leads to an efficient optimal algorithm. The final structure predictions are then based on re-ranking a list of suboptimal structures under a more comprehensive free energy model. The new algorithm is evaluated on a large number of RNA sequence sets taken from diverse resources. It demonstrates overall sensitivity and specificity that outperform or are comparable with those of previous optimal and heuristic algorithms, yet it requires significantly less time than the compared optimal algorithms. The preliminary version of this paper appeared in the proceedings of the 6th Workshop on Algorithms for Bioinformatics (WABI 2006).

16.
Deleterious mutation prediction in the secondary structure of RNAs
Barash D. Nucleic Acids Research, 2003, 31(22): 6578-6584.

17.
Sequential folding of a messenger RNA molecule
The existence of a new, efficient algorithm for secondary structure prediction enables us to study the folding pattern of a messenger RNA chain. Our results indicate that successively longer RNA sequences with the same 5′-ends fold sequentially, usually keeping the stable close-range hairpin loops and rearranging the long-range stems. This path will shorten the time the messenger RNA molecule needs in order to attain its preferred structure. It can also align splicing sites in a favorable orientation before the whole molecule is synthesized. Our studies were carried out on the simian virus 40 late precursor and processed mRNA.

18.
19.
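Wang Jinhua, Luo Zhigang, Guan Naiyang, Yan Fanmei, Jin Xin, Zhang Wen. Hereditas (Beijing) (《遗传》), 2007, 29(7): 889-897.
The structures of most RNA molecules are highly conserved during evolution, and many of them contain pseudoknots. Predicting RNA pseudoknots, however, has long been a difficult problem, and many RNA secondary structure prediction algorithms cannot predict pseudoknots. This article proposes a new iteration-based method for predicting RNA secondary structures with pseudoknots. The method combines thermodynamic and covariation information when scoring potential base pairs, and selects all base pairs through multiple iterations of a minimum free energy RNA folding algorithm. Test results show that the method can predict almost all pseudoknots. Compared with other methods, its sensitivity is close to the best and its specificity is the best.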

20.
RNA molecules with novel functions have revived interest in the accurate prediction of RNA three-dimensional (3D) structure and folding dynamics. However, existing methods are inefficient in automated 3D structure prediction. Here, we report a robust computational approach for rapid folding of RNA molecules. We develop a simplified RNA model for discrete molecular dynamics (DMD) simulations, incorporating base-pairing and base-stacking interactions. We demonstrate correct folding of 150 structurally diverse RNA sequences. The majority of DMD-predicted 3D structures have <4 Å deviations from experimental structures. The secondary structures corresponding to the predicted 3D structures consist of 94% native base-pair interactions. Folding thermodynamics and kinetics of tRNA(Phe), pseudoknots, and mRNA fragments in DMD simulations are in agreement with previous experimental findings. Folding of RNA molecules features transient, non-native conformations, suggesting non-hierarchical RNA folding. Our method allows rapid conformational sampling of RNA folding, with computational time increasing linearly with RNA length. We envision this approach as a promising tool for RNA structural and functional analyses.
