共查询到20条相似文献,搜索用时 15 毫秒
1.
HotKnots: heuristic prediction of RNA secondary structures including pseudoknots 总被引:6,自引:1,他引:6 下载免费PDF全文
We present HotKnots, a new heuristic algorithm for the prediction of RNA secondary structures including pseudoknots. Based on the simple idea of iteratively forming stable stems, our algorithm explores many alternative secondary structures, using a free energy minimization algorithm for pseudoknot free secondary structures to identify promising candidate stems. In an empirical evaluation of the algorithm with 43 sequences taken from the Pseudobase database and from the literature on pseudoknotted structures, we found that overall, in terms of the sensitivity and specificity of predictions, HotKnots outperforms the well-known Pseudoknots algorithm of Rivas and Eddy and the NUPACK algorithm of Dirks and Pierce, both based on dynamic programming approaches for limited classes of pseudoknotted structures. It also outperforms the heuristic Iterated Loop Matching algorithm of Ruan and colleagues, and in many cases gives better results than the genetic algorithm from the STAR package of van Batenburg and colleagues and the recent pknotsRG-mfe algorithm of Reeder and Giegerich. The HotKnots algorithm has been implemented in C/C++ and is available from http://www.cs.ubc.ca/labs/beta/Software/HotKnots. 相似文献
2.
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold. 相似文献
3.
An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots 总被引:14,自引:0,他引:14
MOTIVATION: Pseudoknots have generally been excluded from the prediction of RNA secondary structures due to its difficulty in modeling. Although, several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction with comparative analysis, suffers from low-prediction accuracy in many cases. RESULTS: Here we present an algorithm, iterated loop matching, for reliably and efficiently predicting RNA secondary structures including pseudoknots. The method can utilize either thermodynamic or comparative information or both, thus is able to predict pseudoknots for both aligned and individual sequences. We have tested the algorithm on a number of RNA families. Using 8-12 homologous sequences, the algorithm correctly identifies more than 90% of base-pairs for short sequences and 80% overall. It correctly predicts nearly all pseudoknots and produces very few spurious base-pairs for sequences without pseudoknots. Comparisons show that our algorithm is both more sensitive and more specific than the maximum weighted matching method. In addition, our algorithm has high-prediction accuracy on individual sequences, comparable with the PKNOTS algorithm, while using much less computational resources. AVAILABILITY: The program has been implemented in ANSI C and is freely available for academic use at http://www.cse.wustl.edu/~zhang/projects/rna/ilm/ Supplementary information: http://www.cse.wustl.edu/~zhang/projects/rna/ilm/ 相似文献
4.
The most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a
stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some
RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free
grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA
secondary structures with pseudoknots and present a Markov-chain Monte-Carlo Method for sampling RNA structures according
to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context,
because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples
with tmRNA and also with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold. 相似文献
5.
Witwer C Hofacker IL Stadler PF 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(2):66-77
Most functional RNA molecules have characteristic structures that are highly conserved in evolution. Many of them contain pseudoknots. Here, we present a method for computing the consensus structures including pseudoknots based on alignments of a few sequences. The algorithm combines thermodynamic and covariation information to assign scores to all possible base pairs, the base pairs are chosen with the help of the maximum weighted matching algorithm. We applied our algorithm to a number of different types of RNA known to contain pseudoknots. All pseudoknots were predicted correctly and more than 85 percent of the base pairs were identified. 相似文献
6.
It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N2) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at: http://rna.urmc.rochester.edu/RNAstructure.html. 相似文献
7.
Peter Clote Stefan Dobrev Ivan Dotu Evangelos Kranakis Danny Krizanc Jorge Urrutia 《Journal of mathematical biology》2012,65(6-7):1337-1357
Let ${\mathcal {S}}$ denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. ${\{i, j\} \in \mathcal {S}}$ if there is a hydrogen bond between the ith and jth nucleotide. The page number of ${\mathcal {S}}$ , denoted ${\pi(\mathcal {S})}$ , is the minimum number k such that ${\mathcal {S}}$ can be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of page number, using constraint programming, and determine the page number of a collection of RNA tertiary structures, for which the topological genus is known. We describe an approximation algorithm from which it follows that ${\omega(\mathcal {S}) \leq \pi(\mathcal {S}) \leq \omega(\mathcal {S}) \cdot \log n}$ , where the clique number of ${\mathcal {S}, \omega(\mathcal {S})}$ , denotes the maximum number of base pairs that pairwise cross each other. 相似文献
8.
9.
10.
11.
We present TT2NE, a new algorithm to predict RNA secondary structures with pseudoknots. The method is based on a classification of RNA structures according to their topological genus. TT2NE is guaranteed to find the minimum free energy structure regardless of pseudoknot topology. This unique proficiency is obtained at the expense of the maximum length of sequences that can be treated, but comparison with state-of-the-art algorithms shows that TT2NE significantly improves the quality of predictions. Analysis of TT2NE's incorrect predictions sheds light on the need to study how sterical constraints limit the range of pseudoknotted structures that can be formed from a given sequence. An implementation of TT2NE on a public server can be found at http://ipht.cea.fr/rna/tt2ne.php. 相似文献
12.
13.
RNA二级结构的预测算法研究已有近40年的发展历程,研究假结也将近30年的历史。在此期间,RNA二级结构的预测算法取得了很大进步,但假结预测的正确率依然偏低。其中启发式算法能较好地处理复杂假结,使其成为率先解决假结预测难题可能性最大的算法。迄今为止,未见系统地专门总结预测假结的各种启发式算法及其优点与缺点的报道。本文详细介绍了近年来国际上流行的贪婪算法、遗传算法、ILM算法、HotKnots算法以及FlexStem算法等五种算法,并总结分析了每种算法的优点与不足,最后提出在未来一段时期内,利用启发式算法提高假结预测准确度应从建立更完善的假结模型、加入更多影响因素、借鉴不同算法的优势等方面入手。为含假结RNA二级结构预测的研究提供参考。 相似文献
14.
RNA secondary structures and their prediction 总被引:1,自引:0,他引:1
This is a review of past and present attempts to predict the secondary structure of ribonucleic acids (RNAs) through mathematical
and computer methods. Related areas covering classification, enumeration and graphical representations of structures are also
covered. Various general prediction techniques are discussed, especially the use of thermodynamic criteria to construct an
optimal structure. The emphasis in this approach is on the use of dynamic programming algorithms to minimize free energy.
One such algorithm is introduced which comprises existing ones as special cases.
Issued as NRCC No. 23684. 相似文献
15.
16.
Samuel Ieong Ming-Yang Kao Tak-Wah Lam Wing-Kin Sung Siu-Ming Yiu 《Journal of computational biology》2003,10(6):981-995
The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can handle only limited types of pseudoknots. In this paper, we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n(3)) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hard results on psuedoknots which are based on optimizing some general and complicated energy functions. 相似文献
17.
P E Auron W P Rindone C P Vary J J Celentano J N Vournakis 《Nucleic acids research》1982,10(1):403-419
A brief survey of computer algorithms that have been developed to generate predictions of the secondary structures of RNA molecules is presented. Two particular methods are described in some detail. The first utilizes a thermodynamic energy minimization algorithm that takes into account the likelihood that short-range folding tends to be favored over long-range interactions. The second utilizes an interactive computer graphic modelling algorithm that enables the user to consider thermodynamic criteria as well as structural data obtained by nuclease susceptibility, chemical reactivity and phylogenetic studies. Examples of structures for prokaryotic 16S and 23S ribosomal RNAs, several eukaryotic 5S ribosomal RNAs and rabbit beta-globin messenger RNA are presented as case studies in order to describe the two techniques. Anm argument is made for integrating the two approaches presented in this paper, enabling the user to generate proposed structures using thermodynamic criteria, allowing interactive refinement of these structures through the application of experimentally derived data. 相似文献
18.
In the present paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensity of the protein short sequence derived from a training data set. The protein secondary structure was thus predicted in this way. The average three-state accuracy of the algorithm used was 76.70%. 相似文献
19.
Mironov AA 《Molekuliarnaia biologiia》2007,41(4):711-718
The RNA secondary structure prediction is a classical problem in bioinformatics. The most efficient approach to this problem is based on the idea of a comparative analysis. In this approach the algorithms utilize multiple alignment of the RNA sequences and find common RNA structure. This paper describes a new algorithm for this task. This algorithm does not require predefined multiple alignment. The main idea of the algorithm is based on MEME-like iterative searching of abstract profile on different levels. On the first level the algorithm searches the common blocks in the RNA sequences and creates chain of this blocks. On the next step the algorithm refines the chain of common blocks. On the last stage the algorithm searches sets of common helices that have consistent locations relative to common blocks. The algorithm was tested on sets of tRNA with a subset of junk sequences and on RFN riboswitches. The algorithm is implemented as a web server (http://bioinf.fbb.msu.ru/RNAAlign/). 相似文献
20.
PseudoBase is a database containing structural, functional and sequence data related to RNA pseudo-knots. It can be reached at http://wwwbio.Leiden Univ.nl/ approximately Batenburg/PKB.html. This page will direct the user to a retrieval page from where a particular pseudoknot can be chosen, or to a submission page which enables the user to add pseudoknot information to the database or to an informative page that elaborates on the various aspects of the database. For each pseudoknot, 12 items are stored, e.g. the nucleotides of the region that contains the pseudoknot, the stem positions of the pseudoknot, the EMBL accession number of the sequence that contains this pseudoknot and the support that can be given regarding the reliability of the pseudoknot. Access is via a small number of steps, using 16 different categories. The development process was done by applying the evolutionary methodology for software development rather than by applying the methodology of the classical waterfall model or the more modern spiral model. 相似文献