首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
With discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure. Their performance has been benchmarked by comparing structure predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of methods correlate with each other over sets of sequences. One possible reason for this correlation is that many algorithms use the same underlying principles. A set of benchmarks published previously for programs that predict a structure common to three or more sequences is statistically analyzed as an example to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of data set size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example.  相似文献   

2.
RNA structure formation is hierarchical and, therefore, secondary structure, the sum of canonical base-pairs, can generally be predicted without knowledge of the three-dimensional structure. Secondary structure prediction algorithms evolved from predicting a single, lowest free energy structure to their current state where statistics can be determined from the thermodynamic ensemble. This article reviews the free energy minimization technique and the salient revolutions in the dynamic programming algorithm methods for secondary structure prediction. Emphasis is placed on highlighting the recently developed method, which statistically samples structures from the complete Boltzmann ensemble.  相似文献   

3.

Background  

Protein secondary structure prediction method based on probabilistic models such as hidden Markov model (HMM) appeals to many because it provides meaningful information relevant to sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).  相似文献   

4.
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires RNA tertiary structure knowledge. Although modeling approaches for the study of RNA structures and dynamics lag behind efforts in protein folding, much progress has been achieved in the past two years. Here, we review recent advances in RNA folding algorithms, RNA tertiary motif discovery, applications of graph theory approaches to RNA structure and function, and in silico generation of RNA sequence pools for aptamer design. Advances within each area can be combined to impact many problems in RNA structure and function.  相似文献   

5.
Single-stranded regions in RNA secondary structure are important for RNA–RNA and RNA–protein interactions. We present a probability profile approach for the prediction of these regions based on a statistical algorithm for sampling RNA secondary structures. For the prediction of phylogenetically-determined single-stranded regions in secondary structures of representative RNA sequences, the probability profile offers substantial improvement over the minimum free energy structure. In designing antisense oligonucleotides, a practical problem is how to select a secondary structure for the target mRNA from the optimal structure(s) and many suboptimal structures with similar free energies. By summarizing the information from a statistical sample of probable secondary structures in a single plot, the probability profile not only presents a solution to this dilemma, but also reveals ‘well-determined’ single-stranded regions through the assignment of probabilities as measures of confidence in predictions. In antisense application to the rabbit β-globin mRNA, a significant correlation between hybridization potential predicted by the probability profile and the degree of inhibition of in vitro translation suggests that the probability profile approach is valuable for the identification of effective antisense target sites. Coupling computational design with DNA–RNA array technique provides a rational, efficient framework for antisense oligonucleotide screening. This framework has the potential for high-throughput applications to functional genomics and drug target validation.  相似文献   

6.
Jia M  Luo L  Liu C 《Biopolymers》2004,73(1):16-26
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. It was found that the alpha-helices and beta-strands on proteins tend to be preferably "coded" by mRNA stem region, while the coils on proteins tend to be preferably "coded" by mRNA loop region. These tendencies are more obvious if we observe the structural words (SWs). An SW is defined by a four-amino-acid-fragment that shows the pronounced secondary structural (alpha-helix or beta-strand) propensity. It is demonstrated that the deduced correlation between protein and mRNA structure can hardly be explained as the stochastic fluctuation effect.  相似文献   

7.
MOTIVATION: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. RESULTS: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state of-the-art methods. AVAILABILITY: Our CG implementation is available at http://www.rnasoft.ca/CG/.  相似文献   

8.
This paper presents two in-depth studies on RnaPredict, an evolutionary algorithm for RNA secondary structure prediction. The first study is an analysis of the performance of two thermodynamic models, Individual Nearest Neighbor (INN) and Individual Nearest Neighbor Hydrogen Bond (INN-HB). The correlation between the free energy of predicted structures and the sensitivity is analyzed for 19 RNA sequences. Although some variance is shown, there is a clear trend between a lower free energy and an increase in true positive base pairs. With increasing sequence length, this correlation generally decreases. In the second experiment, the accuracy of the predicted structures for these 19 sequences are compared against the accuracy of the structures generated by the mfold dynamic programming algorithm (DPA) and also to known structures. RnaPredict is shown to outperform the minimum free energy structures produced by mfold and has comparable performance when compared to sub-optimal structures produced by mfold.  相似文献   

9.
We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems and each stem was represented by two values according to its position. Using this representation, we improved the comparative sequence analysis method results and the minimum free-energy model. In the comparative sequence analysis method, a novel algorithm independent of multiple sequence alignment was developed to improve performance. When dealing with a single-RNA sequence, the minimum free-energy model is improved by combining it with RNA class information. Secondary structure prediction experiments were done on tRNA and RNAse P RNA; sensitivity and specificity were both improved. Furthermore, software programs were developed for non-commercial use.  相似文献   

10.

Background  

An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms.  相似文献   

11.
MOTIVATION: Ribonucleic acid is vital in numerous stages of protein synthesis; it also possesses important functional and structural roles within the cell. The function of an RNA molecule within a particular organic system is principally determined by its structure. The current physical methods available for structure determination are time-consuming and expensive. Hence, computational methods for structure prediction are sought after. The energies involved by the formation of secondary structure elements are significantly greater than those of tertiary elements. Therefore, RNA structure prediction focuses on secondary structure. RESULTS: We present P-RnaPredict, a parallel evolutionary algorithm for RNA secondary structure prediction. The speedup provided by parallelization is investigated with five sequences, and a dramatic improvement in speedup is demonstrated, especially with longer sequences. An evaluation of the performance of P-RnaPredict in terms of prediction accuracy is made through comparison with 10 individual known structures from 3 RNA classes (5S rRNA, Group I intron 16S rRNA and 16S rRNA) and the mfold dynamic programming algorithm. P-RnaPredict is able to predict structures with higher true positive base pair counts and lower false positives than mfold on certain sequences. AVAILABILITY: P-RnaPredict is available for non-commercial usage. Interested parties should contact Kay C. Wiese (wiese@cs.sfu.ca).  相似文献   

12.
RNAstructure: software for RNA secondary structure prediction and analysis   总被引:1,自引:0,他引:1  

Background  

To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence.  相似文献   

13.
14.
An RNA molecule, particularly a long-chain mRNA, may exist as a population of structures. Further more, multiple structures have been demonstrated to play important functional roles. Thus, a representation of the ensemble of probable structures is of interest. We present a statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures. The forward step of the algorithm computes the equilibrium partition functions of RNA secondary structures with recent thermodynamic parameters. Using conditional probabilities computed with the partition functions in a recursive sampling process, the backward step of the algorithm quickly generates a statistically representative sample of structures. With cubic run time for the forward step, quadratic run time in the worst case for the sampling step, and quadratic storage, the algorithm is efficient for broad applicability. We demonstrate that, by classifying sampled structures, the algorithm enables a statistical delineation and representation of the Boltzmann ensemble. Applications of the algorithm show that alternative biological structures are revealed through sampling. Statistical sampling provides a means to estimate the probability of any structural motif, with or without constraints. For example, the algorithm enables probability profiling of single-stranded regions in RNA secondary structure. Probability profiling for specific loop types is also illustrated. By overlaying probability profiles, a mutual accessibility plot can be displayed for predicting RNA:RNA interactions. Boltzmann probability-weighted density of states and free energy distributions of sampled structures can be readily computed. We show that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics. Our applications suggest that the sampling algorithm may be well suited to prediction of mRNA structure and target accessibility. The algorithm is applicable to the rational design of small interfering RNAs (siRNAs), antisense oligonucleotides, and trans-cleaving ribozymes in gene knock-down studies.  相似文献   

15.
MOTIVATION: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models. RESULTS: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction. AVAILABILITY: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.  相似文献   

16.
ABSTRACT: BACKGROUND: Stochastic Context-Free Grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 90s, and used in combination with comparative methods in the late 90s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, but a few intuitively designed grammars have remained dominant. In this paper we investigate two automatic search techniques for effective grammars - exhaustive search for very compact grammars and an evolutionary algorithm to find larger grammars. We also examine whether grammar ambiguity is as problematic to structure prediction as has been previously suggested. RESULTS: These search techniques were applied to predict RNA secondary structure on a maximal data set and revealed new and interesting grammars, though none are dramatically better than classic grammars. In general, results showed that many grammars with quite different structure could have very similar predictive ability. Many ambiguous grammars were found which were at least as effective as the best current unambiguous grammars. CONCLUSIONS: Overall the method of evolving SCFGs for RNA secondary structure prediction proved effective in finding many grammars that had strong predictive accuracy, as good or slightly better than those designed manually. Furthermore, several of the best grammars found were ambiguous, demonstrating that such grammars should not be disregarded.  相似文献   

17.
Algorithms for prediction of RNA secondary structure-the set of base pairs that form when an RNA molecule folds-are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Theta(n(5)) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n(3)) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Theta(n(6)) time.  相似文献   

18.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.  相似文献   

19.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was first pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.  相似文献   

20.

Background  

The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号