首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
An RNA secondary structure is saturated if no base pairs can be added without violating the definition of secondary structure. Here we describe a new algorithm, RNAsat, which for a given RNA sequence a, an integral temperature 0 相似文献   

2.
Algorithms for prediction of RNA secondary structure-the set of base pairs that form when an RNA molecule folds-are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Theta(n(5)) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n(3)) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Theta(n(6)) time.  相似文献   

3.
MOTIVATION: Function derives from structure, therefore, there is need for methods to predict functional RNA structures. RESULTS: The Dynalign algorithm, which predicts the lowest free energy secondary structure common to two unaligned RNA sequences, is extended to the prediction of a set of low-energy structures. Dot plots can be drawn to show all base pairs in structures within an energy increment. Dynalign predicts more well-defined structures than structure prediction using a single sequence; in 5S rRNA sequences, the average number of base pairs in structures with energy within 20% of the lowest energy structure is 317 using Dynalign, but 569 using a single sequence. Structure prediction with Dynalign can also be constrained according to experiment or comparative analysis. The accuracy, measured as sensitivity and positive predictive value, of Dynalign is greater than predictions with a single sequence. AVAILABILITY: Dynalign can be downloaded at http://rna.urmc.rochester.edu  相似文献   

4.
The number of base pairs in the denatured “B” form of E. coli 5S RNA has been determined directly from 400 MHz high resolution proton nuclear magnetic resonance spectroscopy. The experimental NMR spectrum from ?11.6 to ?14.5 ppm from a sodium 2,2-dimethyl-2-silapentane sulfonate reference can be simulated by a theoretical spectrum consisting of 33 Lorentzian lines of equal width (corresponding to 33 base pairs) at 26°C. This result is inconsistent with previously proposed secondary structures of 17 and 23 base pairs, but is readily adapted to the Luoma-Marshall cloverleaf secondary structure.  相似文献   

5.
Most of the hairpin, internal and junction loops that appear single-stranded in standard RNA secondary structures form recurrent 3D motifs, where non-Watson–Crick base pairs play a central role. Non-Watson–Crick base pairs also play crucial roles in tertiary contacts in structured RNA molecules. We previously classified RNA base pairs geometrically so as to group together those base pairs that are structurally similar (isosteric) and therefore able to substitute for each other by mutation without disrupting the 3D structure. Here, we introduce a quantitative measure of base pair isostericity, the IsoDiscrepancy Index (IDI), to more accurately determine which base pair substitutions can potentially occur in conserved motifs. We extract and classify base pairs from a reduced-redundancy set of RNA 3D structures from the Protein Data Bank (PDB) and calculate centroids (exemplars) for each base combination and geometric base pair type (family). We use the exemplars and IDI values to update our online Basepair Catalog and the Isostericity Matrices (IM) for each base pair family. From the database of base pairs observed in 3D structures we derive base pair occurrence frequencies for each of the 12 geometric base pair families. In order to improve the statistics from the 3D structures, we also derive base pair occurrence frequencies from rRNA sequence alignments.  相似文献   

6.
A partition function calculation for RNA secondary structure is presented that uses a current set of nearest neighbor parameters for conformational free energy at 37 degrees C, including coaxial stacking. For a diverse database of RNA sequences, base pairs in the predicted minimum free energy structure that are predicted by the partition function to have high base pairing probability have a significantly higher positive predictive value for known base pairs. For example, the average positive predictive value, 65.8%, is increased to 91.0% when only base pairs with probability of 0.99 or above are considered. The quality of base pair predictions can also be increased by the addition of experimentally determined constraints, including enzymatic cleavage, flavin mono-nucleotide cleavage, and chemical modification. Predicted secondary structures can be color annotated to demonstrate pairs with high probability that are therefore well determined as compared to base pairs with lower probability of pairing.  相似文献   

7.
RNA secondary structure prediction using free energy minimization is one method to gain an approximation of structure. Constraints generated by enzymatic mapping or chemical modification can improve the accuracy of secondary structure prediction. We report a facile method that identifies single-stranded regions in RNA using short, randomized DNA oligonucleotides and RNase H cleavage. These regions are then used as constraints in secondary structure prediction. This method was used to improve the secondary structure prediction of Escherichia coli 5S rRNA. The lowest free energy structure without constraints has only 27% of the base pairs present in the phylogenetic structure. The addition of constraints from RNase H cleavage improves the prediction to 100% of base pairs. The same method was used to generate secondary structure constraints for yeast tRNAPhe, which is accurately predicted in the absence of constraints (95%). Although RNase H mapping does not improve secondary structure prediction, it does eliminate all other suboptimal structures predicted within 10% of the lowest free energy structure. The method is advantageous over other single-stranded nucleases since RNase H is functional in physiological conditions. Moreover, it can be used for any RNA to identify accessible binding sites for oligonucleotides or small molecules.  相似文献   

8.
In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is 1.34685?n ?3/2?1.62178 n , and the asymptotic expected number of hairpins follows a normal distribution with mean $0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}$ . Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194?n, a value greater than the asymptotic expected number 0.105573?n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence a 1,…,a n , for each integer k, we compute that secondary structure S k having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins. Mathematica? computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.  相似文献   

9.
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$ . Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes $-1$ towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are $1.07427\cdot n^{-3/2} \cdot 2.35467^n$ many saturated structures for a sequence of length $n$ . In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes $-1$ toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures, whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).  相似文献   

10.
We make a novel contribution to the theory of biopolymer folding, by developing an efficient algorithm to compute the number of locally optimal secondary structures of an RNA molecule, with respect to the Nussinov-Jacobson energy model. Additionally, we apply our algorithm to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements from A. Bock (personal communication), hammerhead ribozymes from Rfam (Griffiths-Jones et al., 2003), and tRNAs from Sprinzl's database (Sprinzl et al., 1998). It had previously been reported that tRNA has lower minimum free energy than random RNA of the same compositional frequency (Clote et al., 2003; Rivas and Eddy, 2000), although the situation is less clear for mRNA (Seffens and Digby, 1999; Workman and Krogh, 1999; Cohen and Skienna, 2002),(1) which plays no structural role. Applications of our algorithm extend knowledge of the energy landscape differences between naturally occurring and random RNA. Given an RNA molecule a(1), ... , a(n) and an integer k > or = 0, a k-locally optimal secondary structure S is a secondary structure on a(1), ... , a(n) which has k fewer base pairs than the maximum possible number, yet for which no basepairs can be added without violation of the definition of secondary structure (e.g., introducing a pseudoknot). Despite the fact that the number numStr(k) of k-locally optimal structures for a given RNA molecule in general is exponential in n, we present an algorithm running in time O(n (4)) and space O(n (3)), which computes numStr(k) for each k. Structurally important RNA, such as SECIS elements, hammerhead ribozymes, and tRNA, all have a markedly smaller number of k-locally optimal structures than that of random RNA of the same dinucleotide frequency, for small and moderate values of k. This suggests a potential future role of our algorithm as a tool to detect noncoding RNA genes.  相似文献   

11.
Lorenz WA  Clote P 《PloS one》2011,6(1):e16178
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt runs in O(n3) time and O(n2) space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures--indeed, the number of locally optimal structures approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structure leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy. Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.  相似文献   

12.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was first pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.  相似文献   

13.
MOTIVATION: We describe algorithms implemented in a new software package, RNAbor, to investigate structures in a neighborhood of an input secondary structure S of an RNA sequence s. The input structure could be the minimum free energy structure, the secondary structure obtained by analysis of the X-ray structure or by comparative sequence analysis, or an arbitrary intermediate structure. RESULTS: A secondary structure T of s is called a delta-neighbor of S if T and S differ by exactly delta base pairs. RNAbor computes the number (N(delta)), the Boltzmann partition function (Z(delta)) and the minimum free energy (MFE(delta)) and corresponding structure over the collection of all delta-neighbors of S. This computation is done simultaneously for all delta < or = m, in run time O (mn3) and memory O(mn2), where n is the sequence length. We apply RNAbor for the detection of possible RNA conformational switches, and compare RNAbor with the switch detection method paRNAss. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction. AVAILABILITY: http://bioinformatics.bc.edu/clotelab/RNAbor/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

14.
15.
We present a novel topological classification of RNA secondary structures with pseudoknots. It is based on the topological genus of the circular diagram associated to the RNA base-pair structure. The genus is a positive integer number whose value quantifies the topological complexity of the folded RNA structure. In such a representation, planar diagrams correspond to pure RNA secondary structures and have zero genus, whereas non-planar diagrams correspond to pseudoknotted structures and have higher genus. The topological genus allows for the definition of topological folding motifs, similar in spirit to those introduced and commonly used in protein folding. We analyze real RNA structures from the databases Worldwide Protein Data Bank and Pseudobase and classify them according to their topological genus. For simplicity, we limit our analysis by considering only Watson-Crick complementary base pairs and G-U wobble base pairs. We compare the results of our statistical survey with existing theoretical and numerical models. We also discuss possible applications of this classification and show how it can be used for identifying new RNA structural motifs.  相似文献   

16.
The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences is associated to Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.  相似文献   

17.
18.
A general secondary structure is proposed for the 5S RNA of prokaryotic ribosomes, based on helical energy filtering calculations. We have considered all secondary structures that are common to 17 different prokaryotic 5S RNAs and for each 5S sequence calculated the (global) minimum energy secondary structure (300,000 common structures are possible for each sequence). The 17 different minimum energy secondary structures all correspond, with minor differences, to a single, secondary structure model. This is strong evidence that this general 5S folding pattern corresponds to the secondary structure of the functional 5S rRNA. The general 5S secondary structure is forked and in analogy with the cloverleaf of tRNA is named the "wishbone" model. It constant 8 double helical regions; one in the stem, four in the first, or constant arm, and three in the second arm. Four of these double helical regions are present in a model earlier proposed (1) and four additional regions not proposed by them are presented here. In the minimum energy general structure, the four helices in the constant arm are exactly 15 nucleotide pairs long. These helices are stacked in the sequences from gram-positive bacteria and probably stacked in gram-negative sequences as well. In sequences from gram-positive bacteria the length of the constant arm is maintained at 15 stacked pairs by an unusual minimum energy interaction involving a C26-G57 base pair intercalated between two adjacent helical regions.  相似文献   

19.
The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can handle only limited types of pseudoknots. In this paper, we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n(3)) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hard results on psuedoknots which are based on optimizing some general and complicated energy functions.  相似文献   

20.
The 3-dimensional fold of an RNA molecule is largely determined by patterns of intramolecular hydrogen bonds between bases. Predicting the base pairing network from the sequence, also referred to as RNA secondary structure prediction or RNA folding, is a nondeterministic polynomial-time (NP)-complete computational problem. The structure of the molecule is strongly predictive of its functions and biochemical properties, and therefore the ability to accurately predict the structure is a crucial tool for biochemists. Many methods have been proposed to efficiently sample possible secondary structure patterns. Classic approaches employ dynamic programming, and recent studies have explored approaches inspired by evolutionary and machine learning algorithms. This work demonstrates leveraging quantum computing hardware to predict the secondary structure of RNA. A Hamiltonian written in the form of a Binary Quadratic Model (BQM) is derived to drive the system toward maximizing the number of consecutive base pairs while jointly maximizing the average length of the stems. A Quantum Annealer (QA) is compared to a Replica Exchange Monte Carlo (REMC) algorithm programmed with the same objective function, with the QA being shown to be highly competitive at rapidly identifying low energy solutions. The method proposed in this study was compared to three algorithms from literature and, despite its simplicity, was found to be competitive on a test set containing known structures with pseudoknots.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号