首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
We present HotKnots, a new heuristic algorithm for the prediction of RNA secondary structures including pseudoknots. Based on the simple idea of iteratively forming stable stems, our algorithm explores many alternative secondary structures, using a free energy minimization algorithm for pseudoknot free secondary structures to identify promising candidate stems. In an empirical evaluation of the algorithm with 43 sequences taken from the Pseudobase database and from the literature on pseudoknotted structures, we found that overall, in terms of the sensitivity and specificity of predictions, HotKnots outperforms the well-known Pseudoknots algorithm of Rivas and Eddy and the NUPACK algorithm of Dirks and Pierce, both based on dynamic programming approaches for limited classes of pseudoknotted structures. It also outperforms the heuristic Iterated Loop Matching algorithm of Ruan and colleagues, and in many cases gives better results than the genetic algorithm from the STAR package of van Batenburg and colleagues and the recent pknotsRG-mfe algorithm of Reeder and Giegerich. The HotKnots algorithm has been implemented in C/C++ and is available from http://www.cs.ubc.ca/labs/beta/Software/HotKnots.  相似文献   

2.
顾倜  蔡磊鑫  王帅  吕强 《生物信息学》2017,15(3):142-148
假结是RNA中一种重要的结构,由于建模的困难导致它更难被预测。通过碱基之间的配对概率来预测含假结RNA二级结构的Prob Knot算法具有很高的精度,但该算法仅用了配对概率作为预测依据,导致阴性配对大量出现,因此精度中的特异性较低。实验结合Prob Knot算法中碱基配对概率模型,通过使用多目标遗传算法,从而提高预测含假结RNA二级结构的特异性,以此促进总体精度的提高。实验过程中,首先计算出每个碱基成为单链的概率,作为新增的预测依据,然后使用遗传算法对RNA二级结构进行交叉、变异和迭代,最后得到Pareto最优解,进一步得出最高的最大期望精度。实验结果表明,在使用的RNA案例中,采用该方法比现有方法精度平均提高约4%。  相似文献   

3.
It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N2) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at: http://rna.urmc.rochester.edu/RNAstructure.html.  相似文献   

4.
The most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA secondary structures with pseudoknots and present a Markov-chain Monte-Carlo Method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context, because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples with tmRNA and also with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.  相似文献   

5.
Let ${\mathcal {S}}$ denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. ${\{i, j\} \in \mathcal {S}}$ if there is a hydrogen bond between the ith and jth nucleotide. The page number of ${\mathcal {S}}$ , denoted ${\pi(\mathcal {S})}$ , is the minimum number k such that ${\mathcal {S}}$ can be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of page number, using constraint programming, and determine the page number of a collection of RNA tertiary structures, for which the topological genus is known. We describe an approximation algorithm from which it follows that ${\omega(\mathcal {S}) \leq \pi(\mathcal {S}) \leq \omega(\mathcal {S}) \cdot \log n}$ , where the clique number of ${\mathcal {S}, \omega(\mathcal {S})}$ , denotes the maximum number of base pairs that pairwise cross each other.  相似文献   

6.
Gupta A  Rahman R  Li K  Gribskov M 《RNA biology》2012,9(2):187-199
The close relationship between RNA structure and function underlines the significance of accurately predicting RNA structures from sequence information. Structural topologies such as pseudoknots are of particular interest due to their ubiquity and direct involvement in RNA function, but identifying pseudoknots is a computationally challenging problem and existing heuristic approaches usually perform poorly for RNA sequences of even a few hundred bases. We survey the performance of pseudoknot prediction methods on a data set of full-length RNA sequences representing varied sequence lengths, and biological RNA classes such as RNase P RNA, Group I Intron, tmRNA and tRNA. Pseudoknot prediction methods are compared with minimum free energy and suboptimal secondary structure prediction methods in terms of correct base-pairs, stems and pseudoknots and we find that the ensemble of suboptimal structure predictions succeeds in identifying correct structural elements in RNA that are usually missed in MFE and pseudoknot predictions. We propose a strategy to identify a comprehensive set of non-redundant stems in the suboptimal structure space of a RNA molecule by applying heuristics that reduce the structural redundancy of the predicted suboptimal structures by merging slightly varying stems that are predicted to form in local sequence regions. This reduced-redundancy set of structural elements consistently outperforms more specialized approaches.in data sets. Thus, the suboptimal folding space can be used to represent the structural diversity of an RNA molecule more comprehensively than optimal structure prediction approaches alone.  相似文献   

7.
8.

Background  

The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information. Thus the need for tools to automate this process.  相似文献   

9.
MOTIVATION: Pseudoknots have generally been excluded from the prediction of RNA secondary structures due to its difficulty in modeling. Although, several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction with comparative analysis, suffers from low-prediction accuracy in many cases. RESULTS: Here we present an algorithm, iterated loop matching, for reliably and efficiently predicting RNA secondary structures including pseudoknots. The method can utilize either thermodynamic or comparative information or both, thus is able to predict pseudoknots for both aligned and individual sequences. We have tested the algorithm on a number of RNA families. Using 8-12 homologous sequences, the algorithm correctly identifies more than 90% of base-pairs for short sequences and 80% overall. It correctly predicts nearly all pseudoknots and produces very few spurious base-pairs for sequences without pseudoknots. Comparisons show that our algorithm is both more sensitive and more specific than the maximum weighted matching method. In addition, our algorithm has high-prediction accuracy on individual sequences, comparable with the PKNOTS algorithm, while using much less computational resources. AVAILABILITY: The program has been implemented in ANSI C and is freely available for academic use at http://www.cse.wustl.edu/~zhang/projects/rna/ilm/ Supplementary information: http://www.cse.wustl.edu/~zhang/projects/rna/ilm/  相似文献   

10.
We present TT2NE, a new algorithm to predict RNA secondary structures with pseudoknots. The method is based on a classification of RNA structures according to their topological genus. TT2NE is guaranteed to find the minimum free energy structure regardless of pseudoknot topology. This unique proficiency is obtained at the expense of the maximum length of sequences that can be treated, but comparison with state-of-the-art algorithms shows that TT2NE significantly improves the quality of predictions. Analysis of TT2NE's incorrect predictions sheds light on the need to study how sterical constraints limit the range of pseudoknotted structures that can be formed from a given sequence. An implementation of TT2NE on a public server can be found at http://ipht.cea.fr/rna/tt2ne.php.  相似文献   

11.
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. This method is based on comparison of stems (palindromic helices) between sequences and is implemented by applying graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences by some defined features to find stems conserved across any two sequences. Then by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. This method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on some RNA sequences with known secondary structures, in which it is capable of detecting the real structures completely or partially correctly and outperforms other existing programs for similar purposes. AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html  相似文献   

12.
13.
The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can handle only limited types of pseudoknots. In this paper, we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n(3)) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hard results on psuedoknots which are based on optimizing some general and complicated energy functions.  相似文献   

14.
15.
16.
MOTIVATION: Recently novel classes of functional RNAs, most prominently the miRNAs have been discovered, strongly suggesting that further types of functional RNAs are still hidden in the recently completed genomic DNA sequences. Only few techniques are known, however, to survey genomes for such RNA genes. When sufficiently similar sequences are not available for comparative approaches the only known remedy is to search directly for structural features. RESULTS: We present here efficient algorithms for computing locally stable RNA structures at genome-wide scales. Both the minimum energy structure and the complete matrix of base pairing probabilities can be computed in theta(N x L2) time and theta(N + L2) memory in terms of the length N of the genome and the size L of the largest secondary structure motifs of interest. In practice, the 100 Mb of the complete genome of Caenorhabditis elegans can be folded within about half a day on a modern PC with a search depth of L = 100. This is sufficient example for a survey for miRNAs. AVAILABILITY: The software described in this contribution will be available for download at http://www.tbi.univie.ac.at/~ivo/RNA/ as part of the Vienna RNA Package.  相似文献   

17.
A method for assessing the preserved stem - loops of RNA secondarystructures is presented. Frequently recurring helical stemsin a set of secondary structures resulting from the simulatedfolding process of a given RNA are assessed and consensus structuralmotifs can then be selected to construct a secondary structureof the RNA. Alternatively, it can be applied to a series of‘optimal’ and ‘suboptimal’ secondarystructures computed using the dynamic program developed by Williamsand Tinoco. To demonstrate the power and the usefulness of theprogram we give examples of this procedure. Received on October 28, 1987; accepted on April 2, 1989  相似文献   

18.
王金华  骆志刚  管乃洋  严繁妹  靳新  张雯 《遗传》2007,29(7):889-897
多数RNA分子的结构在进化中是高度保守的, 其中很多包含伪结。而RNA伪结的预测一直是一个棘手问题, 很多RNA 二级结构预测算法都不能预测伪结。文章提出一种基于迭代法预测带伪结RNA 二级结构的新方法。该方法在给潜在碱基对打分时综合了热力学和协变信息, 通过基于最小自由能RNA折叠算法的多次迭代选出所有的碱基对。测试结果表明: 此方法几乎能预测到所有的伪结。与其他方法相比, 敏感度接近最优, 而特异性达到最优。  相似文献   

19.
A computer program is presented which determines the secondary structure of linear RNA molecules by simulating a hypothetical process of folding. This process implies the concept of 'nucleation centres', regions in RNA which locally trigger the folding. During the simulation, the RNA is allowed to fold into pseudoknotted structures, unlike all other programs predicting RNA secondary structure. The simulation uses published, experimentally determined free energy values for nearest neighbour base pair stackings and loop regions, except for new extrapolated values for loops larger than seven nucleotides. The free energy value for a loop arising from pseudoknot formation is set to a single, estimated value of 4.2 kcal/mole. Especially in the case of long RNA sequences, our program appears superior to other secondary structure predicting programs described so far, as tests on tRNAs, the LSU intron of Tetrahymena thermophila and a number of plant viral RNAs show. In addition, pseudoknotted structures are often predicted successfully. The program is written in mainframe APL and is adapted to run on IBM compatible PCs, Atari ST and Macintosh personal computers. On an 8 MHz 8088 standard PC without coprocessor, using STSC APL, it folds a sequence of 700 nucleotides in one and a half hour.  相似文献   

20.
Functionally homologous RNA sequences can substantially diverge in their primary sequences but it can be reasonably assumed that they are related in their higher-degree structures. The problem to find such structures and simultaneously satisfy as far as possible the free-energy-minimization criterion, is considered here in two aspects. Firstly a quantitative measure of the folding consensus among secondary structures is defined, translating each structure into a linear representation and using the correlation theorem to compare them. Secondly an algorithm for the parallel search for secondary structures according to the free-energy-minimization criterion, but with a filtering action on the basis of the folding consensus measure is presented. The method is tested on groups of RNA sequences different in origin and in functions, for which proposals of homologous secondary structures based on experimental data exist. A comparison of the results with a blank consisting of a search on the basis of the free energy minimization alone is always performed. In these tests the method shows its ability in obtaining, from different sequences, secondary structures characterized by a high-folding consensus measure also when lower free energy but not homologous structures are possible. Two applications are also shown. The first demonstrates the transfer of experimental data available for one sequence, to a functionally related and therefore homologous one. The second application is the possibility of using a topological probe in the search for precise structural motifs.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号