Similar articles
1.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but secondary structure prediction from a single sequence is known to be unreliable. Therefore, to analyze a new non-coding RNA without knowing its exact secondary structure, we have to collect similar RNA sequences that share a common secondary structure. The sequence comparison used to search for similar RNAs should thus consider not only sequence similarity but also potential secondary structure. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive for large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on structural alignments of fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of the alignments is better than or comparable to that of much slower existing algorithms. AVAILABILITY: The web server of SCARNA, with a graphical structural alignment viewer, is available at http://www.scarna.org/.
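
To make the notion of a fixed-length stem fragment concrete, the following is a minimal sketch that enumerates such fragments in a single sequence. It is not SCARNA's actual procedure, which the abstract does not spell out. The function name `stem_fragments`, the default fragment length of 3, the minimum loop size of 3, and the inclusion of G-U wobble pairs are all assumptions made for illustration.

```python
# Base pairs allowed in a stem (Watson-Crick plus G-U wobble; an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_fragments(seq, length=3, min_loop=3):
    """Return 0-based (i, j) with seq[i+k] pairing seq[j-k] for k < length.

    i is the 5' end of the fragment and j the 3' end of its partner strand.
    At least min_loop unpaired bases must separate the innermost pair.
    """
    n = len(seq)
    found = []
    for i in range(n):
        # Smallest j that leaves min_loop bases inside the innermost pair.
        for j in range(i + 2 * length + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(length)):
                found.append((i, j))
    return found


# Hypothetical toy hairpin: GGG pairs with CCC around an AAAA loop.
fragments = stem_fragments("GGGAAAACCC")
```

Fixing the fragment length keeps the number of candidates per sequence bounded by the number of position pairs, which is what makes fragments convenient units to align between two sequences.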

2.
The function of many RNAs depends crucially on their structure. Therefore, the design of RNA molecules with specific structural properties has many potential applications, e.g. in the context of investigating the function of biological RNAs, of creating new ribozymes, or of designing artificial RNA nanostructures. Here, we present a new algorithm for solving the following RNA secondary structure design problem: given a secondary structure, find an RNA sequence (if any) that is predicted to fold to that structure. Unlike the (pseudoknot-free) secondary structure prediction problem, this problem appears to be computationally hard. Our new algorithm, "RNA Secondary Structure Designer (RNA-SSD)", is based on stochastic local search, a prominent general approach for solving hard combinatorial problems. A thorough empirical evaluation on computationally predicted structures of biological sequences and artificially generated RNA structures, as well as on empirically modelled structures from the biological literature, shows that RNA-SSD substantially outperforms the best known algorithm for this problem, RNAinverse from the Vienna RNA Package. In particular, the new algorithm consistently solves structures for which RNAinverse is unable to find solutions. The RNA-SSD software is publicly available under the name RNA Designer at the RNASoft website (www.rnasoft.ca).

3.
With the discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure. Their performance has been benchmarked by comparing structure predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of methods correlate with each other over sets of sequences. One possible reason for this correlation is that many algorithms use the same underlying principles. A set of benchmarks published previously for programs that predict a structure common to three or more sequences is statistically analyzed as an example to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of data set size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example.
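
The paired two-sample t-test mentioned in this abstract can be sketched as follows. This is only an illustration of the test statistic, not the authors' pipeline. The function name `paired_t_statistic` and the per-sequence accuracy values are hypothetical. Because the two methods are scored on the same sequences, the test works on the per-sequence differences, which removes the shared between-sequence variation that the abstract notes is correlated across methods.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired two-sample t-test."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same sequences")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up per-sequence accuracies for two hypothetical methods.
method_a = [0.80, 0.70, 0.90, 0.60, 0.75]
method_b = [0.70, 0.65, 0.80, 0.60, 0.70]
t, dof = paired_t_statistic(method_a, method_b)
```

The statistic is then compared with a t distribution with `dof` degrees of freedom to obtain a p-value; a statistics library can do that step if one is available.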

4.
The prediction of RNA secondary structure including pseudoknots remains a challenge due to the intractable computation of the sequence conformation from nucleotide interactions under free energy models. Optimal algorithms often assume a restricted class for the predicted RNA structures and yet still require a high-degree polynomial time complexity, which is too expensive to use. Heuristic methods may yield time-efficient algorithms but they do not guarantee optimality of the predicted structure. This paper introduces a new and efficient algorithm for the prediction of RNA structure with pseudoknots for which the structure is not restricted. Novel prediction techniques are developed based on graph tree decomposition. In particular, based on a simplified energy model, stem overlapping relationships are defined with a graph, in which a specialized maximum independent set corresponds to the desired optimal structure. Such a graph is tree decomposable; dynamic programming over a tree decomposition of the graph leads to an efficient optimal algorithm. The final structure predictions are then based on re-ranking a list of suboptimal structures under a more comprehensive free energy model. The new algorithm is evaluated on a large number of RNA sequence sets taken from diverse resources. It demonstrates overall sensitivity and specificity that outperform or are comparable to those of previous optimal and heuristic algorithms, yet it requires significantly less time than the compared optimal algorithms. A preliminary version of this paper appeared in the proceedings of the 6th Workshop on Algorithms for Bioinformatics (WABI 2006).

5.
We have encountered an unexpected property of rRNA secondary structures that may generalize to all RNAs. Analysis of 8892 ribosomal RNA sequences and structures from a wide range of species revealed unexpected universal compositional trends. First, different categories of rRNA secondary structure (stems, loops, bulges, and junctions) have distinct, characteristic base compositions. Second, the observed patterns of variation are similar among sequences from large and small rRNA subunits and all domains of life, despite extensive evolutionary divergence. Surprisingly, these differences do not seem to be related to selection for different compositions in different structural categories, but rather relate to the overall composition of the molecule: Randomized RNAs with no evolutionary history show the same structure-dependent compositional biases as rRNAs. These compositional trends may improve the accuracy of RNA secondary structure prediction, because they allow us to compare predicted structures against known compositional preferences. They also suggest caution in interpreting differences in the rate of change of the GC content in different parts of the molecule as evidence of differential selection.

6.
BACKGROUND: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.
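
The mutual information measure referred to above can be illustrated with a short sketch. It is written in Python rather than MATLAB and is not MIfold's implementation. The function name `column_mutual_information` and the toy alignment are hypothetical, and gap characters are treated as ordinary symbols, which is a simplifying assumption. For columns i and j, the measure is the sum over observed symbol pairs (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b))).

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information in bits between columns i and j of an alignment."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    count_i = Counter(a for a, _ in pairs)   # symbol counts in column i
    count_j = Counter(b for _, b in pairs)   # symbol counts in column j
    count_ij = Counter(pairs)                # joint counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Made-up alignment: columns 0 and 4 covary as G-C, A-U, C-G, U-A,
# while column 1 is fully conserved.
toy = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
mi_covarying = column_mutual_information(toy, 0, 4)
mi_conserved = column_mutual_information(toy, 0, 1)
```

Perfectly covarying columns over four symbols reach the maximum of 2 bits, whereas a fully conserved column carries no mutual information with any other column. This is one reason the measure depends on having enough diverged sequences in the alignment.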

7.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.

8.
Gupta A, Rahman R, Li K, Gribskov M. RNA Biology, 2012, 9(2):187-199
The close relationship between RNA structure and function underlines the significance of accurately predicting RNA structures from sequence information. Structural topologies such as pseudoknots are of particular interest due to their ubiquity and direct involvement in RNA function, but identifying pseudoknots is a computationally challenging problem and existing heuristic approaches usually perform poorly for RNA sequences of even a few hundred bases. We survey the performance of pseudoknot prediction methods on a data set of full-length RNA sequences representing varied sequence lengths and biological RNA classes such as RNase P RNA, Group I Intron, tmRNA and tRNA. Pseudoknot prediction methods are compared with minimum free energy (MFE) and suboptimal secondary structure prediction methods in terms of correct base-pairs, stems and pseudoknots, and we find that the ensemble of suboptimal structure predictions succeeds in identifying correct structural elements in RNA that are usually missed in MFE and pseudoknot predictions. We propose a strategy to identify a comprehensive set of non-redundant stems in the suboptimal structure space of an RNA molecule by applying heuristics that reduce the structural redundancy of the predicted suboptimal structures by merging slightly varying stems that are predicted to form in local sequence regions. This reduced-redundancy set of structural elements consistently outperforms more specialized approaches in data sets. Thus, the suboptimal folding space can be used to represent the structural diversity of an RNA molecule more comprehensively than optimal structure prediction approaches alone.

9.
We present here an extended protein-RNA docking benchmark composed of 71 test cases in which the coordinates of the interacting protein and RNA molecules are available from experimental structures, plus an additional set of 35 cases in which at least one of the interacting subunits is modeled by homology. All cases in the experimental set have available unbound protein structure, and include five cases with available unbound RNA structure, four cases with a pseudo-unbound RNA structure, and 62 cases with the bound RNA form. The additional set of modeling cases comprises five unbound-model, eight model-unbound, 19 model-bound, and three model-model protein-RNA cases. The benchmark covers all major functional categories and contains cases with different degrees of difficulty for docking, as far as protein and RNA flexibility is concerned. The main objective of this benchmark is to foster the development of protein-RNA docking algorithms and to contribute to the better understanding and prediction of protein-RNA interactions. The benchmark is freely available at http://life.bsc.es/pid/protein-rna-benchmark.

10.
Y RNAs are small 'cytoplasmic' RNAs which are components of the Ro ribonucleoprotein (RNP) complex. The core of this complex, which is found in the cell nuclei of higher eukaryotes as well as the cytoplasm, is composed of a complex between the 60 kDa Ro protein and Y RNAs. Human cells contain four distinct Y RNAs (Y1, Y3, Y4 and Y5), while other eukaryotes contain a variable number of Y RNA homologues. When detected in a particular species, the Ro RNP has been present in every cell type within that particular organism. This characteristic, along with its high conservation among vertebrates, suggests an important function for Ro RNP in cellular metabolism; however, this function has not yet been definitively elucidated. In order to identify conserved features of Y RNA sequences and structures which may be directly involved in Ro RNP function, a phylogenetic comparative analysis of Y RNAs has been performed. Sequences of Y RNA homologues from five vertebrate species have been obtained and, together with previously published Y RNA sequences, used to predict Y RNA secondary structures. A novel RNA secondary structure comparison algorithm, the suboptimal RNA analysis program, has been developed and used in conjunction with available algorithms to find phylogenetically conserved secondary structure models for Y1, Y3 and Y4 RNAs. Short, conserved sequences within the Y RNAs have been identified and are invariant among vertebrates, consistent with a direct role for Y RNAs in Ro function. A subset of these are located wholly or partially in looped regions in the Y3 and Y4 RNA predicted model structures, in accord with the possibility that these Y RNAs base pair with other cellular nucleic acids or are sites of interaction between the Ro RNP and other macromolecules.

11.
As the raw material for evolution, arbitrary RNA sequences represent the baseline for RNA structure formation and a standard to which evolved structures can be compared. Here, we set out to probe, using physical and chemical methods, the structural properties of RNAs having randomly generated oligonucleotide sequences that were of sufficient length and information content to encode complex, functional folds, yet were unbiased by either genealogical or functional constraints. Typically, these unevolved, nonfunctional RNAs had sequence-specific secondary structure configurations and compact magnesium-dependent conformational states comparable to those of evolved RNA isolates. But unlike evolved sequences, arbitrary sequences were prone to having multiple competing conformations. Thus, for RNAs the size of small ribozymes, natural selection seems necessary to achieve uniquely folding sequences, but not to account for the well-ordered secondary structures and overall compactness observed in nature.

12.
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires knowledge of their tertiary structures. Though computational RNA folding approaches exist, they often require manual manipulation and expert intuition; predicting global long-range tertiary contacts remains challenging. Here we develop a computational approach and associated program module (RNAJAG) to predict helical arrangements/topologies in RNA junctions. Our method has two components: junction topology prediction and graph modeling. First, junction topologies are determined by a data mining approach from a given secondary structure of the target RNAs; second, the predicted topology is used to construct a tree graph consistent with geometric preferences analyzed from solved RNAs. The predicted graphs, which model the helical arrangements of RNA junctions for a large set of 200 junctions using a cross validation procedure, yield fairly good representations compared to the helical configurations in native RNAs, and can be further used to develop all-atom models as we show for two examples. Because junctions are among the most complex structural elements in RNA, this work advances folding structure prediction methods of large RNAs. The RNAJAG module is available to academic users upon request.

13.
Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, thus allowing the reconstruction of mixtures of suboptimal structures from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding, such as chemical nucleotide modifications and protein binding.

14.
Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present a novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. An initial 3D structure is composed within seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, which are required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server, RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.

15.
BACKGROUND: A small class of RNA molecules, in particular the tiny genomes of viroids, are circular. Yet most structure prediction algorithms handle only linear RNAs. The most straightforward approach is to compute circular structures from 'internal' and 'external' substructures separated by a base pair. This is incompatible, however, with the memory-saving approach of the Vienna RNA Package, which builds a linear RNA structure from shorter (internal) structures only. RESULT: Here we describe how circular secondary structures can be obtained without additional memory requirements as a kind of 'post-processing' of the linear structures. AVAILABILITY: The circular folding algorithm is implemented in the current version of the RNAfold program of the Vienna RNA Package, which can be downloaded from http://www.tbi.univie.ac.at/RNA/

16.
A new approach is proposed for determining common RNA secondary structures within a set of homologous RNAs. The approach is a combination of phylogenetic and thermodynamic methods which is based on the prediction of optimal and suboptimal secondary structures, topological similarity searches and phylogenetic comparative analysis. The optimal and suboptimal RNA secondary structures are predicted by energy minimization. Structural comparison of the predicted RNA secondary structures is used to find conserved structures that are topologically similar in all these homologous RNAs. The validity of the conserved structural elements found is then checked by phylogenetic comparison of the sequences. This procedure is used to predict common structures of ribonuclease P (RNAase P) RNAs.

17.
Mining frequent stem patterns from unaligned RNA sequences
MOTIVATION: In the detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, both in accuracy and in efficiency, with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.

18.
19.
We present an updated version of the protein–RNA docking benchmark, which we first published four years ago. The non-redundant protein–RNA docking benchmark version 2.0 consists of 126 test cases, a threefold increase in number compared to its previous version. The present version consists of 21 unbound–unbound cases, of which, in 12 cases, the unbound RNAs are taken from another complex. It also consists of 95 unbound–bound cases where only the protein is available in the unbound state. In addition, we introduce 10 new bound–unbound cases where only the RNA is found in the unbound state. Based on the degree of conformational change of the interface residues upon complex formation, the benchmark is classified into 72 rigid-body cases, 25 semiflexible cases and 19 full flexible cases. It also covers a wide range of conformational flexibility, from small side chain movement to large domain swapping in protein structures, as well as flipping and restacking in RNA bases. This benchmark should provide the docking community with more test cases for evaluating rigid-body as well as flexible docking algorithms. It will also facilitate the development of new algorithms that require a large training set. The protein–RNA docking benchmark version 2.0 can be freely downloaded from http://www.csb.iitkgp.ernet.in/applications/PRDBv2 . Proteins 2017; 85:256–267. © 2016 Wiley Periodicals, Inc.

20.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.
