Similar literature
1.
The secondary structure of an RNA molecule is of great importance and influences, e.g., the interaction of tRNA molecules with proteins or the stabilization of mRNA molecules. The classification of secondary structures by their order has proved useful in numerous applications. In 1978, Waterman, who gave the first precise formal framework for the topic, suggested determining the number a(n,p) of secondary structures of size n and given order p. Since then, no satisfactory result has been found. Based on an observation due to Viennot et al., we will derive generating functions for the secondary structures of order p from generating functions for binary tree structures with Horton-Strahler number p. These generating functions enable us to compute a precise asymptotic equivalent for a(n,p). Furthermore, we will determine the related number of structures when the number of unpaired bases shows up as an additional parameter. Our approach proves to be general enough to compute the average order of a secondary structure together with all the r-th moments and to enumerate substructures such as hairpins or bulges as a function of the order of the secondary structures considered.

2.
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well-known metrics on RNA secondary structures. We also study some global and local properties of these metrics.
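
The idea of comparing two structures of the same length through their mountain representations can be sketched as follows. This is only an illustration, not the definition used in the abstract above: it assumes dot-bracket input, takes the height at each position to be the number of base pairs enclosing (or including) it, and uses an l_p norm of the height difference as the distance. The function names `mountain_heights` and `mountain_distance` are hypothetical.

```python
from typing import List


def mountain_heights(structure: str) -> List[int]:
    """Return the mountain profile of a dot-bracket structure.

    The height at a position is the number of base pairs that enclose it,
    counting the pair the position itself belongs to (an assumed convention).
    """
    heights = []
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
            heights.append(depth)
        elif symbol == ")":
            heights.append(depth)
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
        elif symbol == ".":
            heights.append(depth)
        else:
            raise ValueError("unexpected symbol: %r" % symbol)
    if depth != 0:
        raise ValueError("unbalanced structure: missing ')'")
    return heights


def mountain_distance(a: str, b: str, p: float = 2.0) -> float:
    """l_p distance between the mountain profiles of two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    diffs = zip(mountain_heights(a), mountain_heights(b))
    return sum(abs(x - y) ** p for x, y in diffs) ** (1.0 / p)


if __name__ == "__main__":
    print(mountain_heights("((..))"))
    print(mountain_distance("((..))", "(....)", p=1))
```

Because the distance is a norm of a difference of profiles, symmetry and the triangle inequality follow directly; whether it separates distinct structures depends on the height convention chosen.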

3.
This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures consists of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism can also be applied to DNA.

4.
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements, such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The results of RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with the data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance, d_t, between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations. © 1993 John Wiley & Sons, Inc.

5.
Following Zuker (1986), a saturated secondary structure for a given RNA sequence is a secondary structure such that no base pair can be added without violating the definition of secondary structure, e.g., without introducing a pseudoknot. In the Nussinov-Jacobson energy model (Nussinov and Jacobson, 1980), where the energy of a secondary structure is -1 times the number of base pairs, saturated secondary structures are local minima in the energy landscape, hence form kinetic traps during the folding process. Here we present recurrence relations and closed form asymptotic limits for combinatorial problems related to the number of saturated secondary structures. In addition, Python source code to compute the number of saturated secondary structures having k base pairs can be found at the web servers link of bioinformatics.bc.edu/clotelab/.
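
To make the Nussinov-Jacobson energy model concrete, here is a minimal sketch of the classic dynamic program that maximizes the number of base pairs, so that the minimum energy is -1 times the returned value. It is not the authors' code and does not count saturated structures. The pairing rules (Watson-Crick plus GU wobble), the default minimum hairpin loop size of 3, and the name `max_base_pairs` are assumptions made for this illustration.

```python
# Assumed pairing rules: Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs in seq.

    min_loop is the assumed minimum number of unpaired bases that a
    hairpin loop must contain (positions i < j pair only if j - i > min_loop).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    pairs = max_base_pairs("GGGAAACCC")
    print("base pairs:", pairs, "energy:", -pairs)
```

A structure attaining this maximum is necessarily saturated, but the converse does not hold: many saturated structures have fewer pairs, which is why they act as local rather than global minima.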

6.
Functionally homologous RNA sequences can diverge substantially in their primary sequences, but it can reasonably be assumed that they are related in their higher-degree structures. The problem of finding such structures while satisfying, as far as possible, the free-energy-minimization criterion is considered here in two aspects. First, a quantitative measure of the folding consensus among secondary structures is defined by translating each structure into a linear representation and using the correlation theorem to compare them. Second, an algorithm is presented for the parallel search for secondary structures according to the free-energy-minimization criterion, with a filtering action based on the folding consensus measure. The method is tested on groups of RNA sequences differing in origin and function, for which proposals of homologous secondary structures based on experimental data exist. The results are always compared with a control consisting of a search based on free energy minimization alone. In these tests the method proves able to obtain, from different sequences, secondary structures characterized by a high folding consensus measure, even when non-homologous structures of lower free energy are possible. Two applications are also shown. The first demonstrates the transfer of experimental data available for one sequence to a functionally related and therefore homologous one. The second application is the possibility of using a topological probe in the search for precise structural motifs.

7.
Detection of common motifs in RNA secondary structures.   Total citations: 2, self-citations: 2, citations by others: 0
We describe a novel computerized system for comparison of RNA secondary structures and demonstrate its use for experimental studies. The system is able to screen a very large number of structures, to cluster similar structures and to detect specific structural motifs. In particular, the system is useful for detecting mutations with specific structural effects among all possible point mutations, and for predicting compensatory mutations that will restore the wild type structure. The algorithms are independent of the folding rules that are used to generate the secondary structures.

8.
Base-pair probability profiles of RNA secondary structures   Total citations: 7, self-citations: 0, citations by others: 7
Dynamic programming algorithms are able to predict optimal and suboptimal secondary structures of RNA. These suboptimal or alternative secondary structures are important for the biological function of RNA. The distribution of secondary structures present in solution is governed by the thermodynamic equilibrium between the different structures. An algorithm is presented which approximates the total partition function by a Boltzmann-weighted summation of optimal and suboptimal secondary structures at several temperatures. A clear representation of the equilibrium distribution of secondary structures is derived from a two-dimensional bonding matrix with base-pairing probability as the third dimension. The temperature dependence of the equilibrium distribution gives the denaturation behavior of the nucleic acid, which may be compared to experimental optical denaturation curves after correction for the hypochromicities of the different base-pairs. Similarly, temperature-induced mobility changes detected in temperature-gradient gel electrophoresis of nucleic acids may be interpreted on the basis of the temperature dependence of the equilibrium distribution. Results are illustrated for natural circular and synthetic linear potato spindle tuber viroid RNA respectively, and are compared to experimental data.
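
The Boltzmann-weighted summation over a finite list of optimal and suboptimal structures can be illustrated with a short sketch. It is not the algorithm of the abstract above: it assumes the free energies (in kcal/mol) are already known and, as a simplification, that they do not change with temperature. The energy values and the function name `boltzmann_probabilities` are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872


def boltzmann_probabilities(energies, temperature_kelvin):
    """Equilibrium probability of each structure in a finite ensemble.

    p_i = exp(-E_i / RT) / Z, where Z sums the weights of all listed
    structures and therefore only approximates the full partition function.
    """
    rt = R_KCAL * temperature_kelvin
    lowest = min(energies)
    # Shifting by the lowest energy avoids overflow and leaves p_i unchanged.
    weights = [math.exp(-(e - lowest) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical free energies of one optimal and two suboptimal structures.
    energies = [-10.0, -9.0, -5.0]
    for kelvin in (298.15, 310.15, 373.15):
        probs = boltzmann_probabilities(energies, kelvin)
        print(kelvin, [round(p, 4) for p in probs])
```

Summing these probabilities over all structures that contain a given base pair would give one entry of a base-pairing probability matrix like the one described in the abstract.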

9.
10.
Lorenz WA, Clote P. PLoS ONE. 2011;6(1):e16178
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in O(n^3) time and O(n^2) space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.

11.
Computer-aided prediction of RNA secondary structures.   Total citations: 8, self-citations: 5, citations by others: 3
A brief survey of computer algorithms that have been developed to generate predictions of the secondary structures of RNA molecules is presented. Two particular methods are described in some detail. The first utilizes a thermodynamic energy minimization algorithm that takes into account the likelihood that short-range folding tends to be favored over long-range interactions. The second utilizes an interactive computer graphic modelling algorithm that enables the user to consider thermodynamic criteria as well as structural data obtained by nuclease susceptibility, chemical reactivity and phylogenetic studies. Examples of structures for prokaryotic 16S and 23S ribosomal RNAs, several eukaryotic 5S ribosomal RNAs and rabbit beta-globin messenger RNA are presented as case studies in order to describe the two techniques. An argument is made for integrating the two approaches presented in this paper, enabling the user to generate proposed structures using thermodynamic criteria, allowing interactive refinement of these structures through the application of experimentally derived data.

12.
Random graph theory is used to model and analyse the relationship between sequences and secondary structures of RNA molecules, which are understood as mappings from sequence space into shape space. These maps are non-invertible since there are always many orders of magnitude more sequences than structures. Sequences folding into identical structures form neutral networks. A neutral network is embedded in the set of sequences that are compatible with the given structure. Networks are modeled as graphs and constructed by random choice of vertices from the space of compatible sequences. The theory characterizes neutral networks by the mean fraction of neutral neighbors (λ). The networks are connected and percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value (λ>λ*). Below threshold (λ<λ*), the networks are partitioned into a largest “giant” component and several smaller components. Structures are classified as “common” or “rare” according to the sizes of their pre-images, i.e. according to the fractions of sequences folding into them. The neutral networks of any pair of two different common structures almost touch each other, and, as expressed by the conjecture of shape space covering, sequences folding into almost all common structures can be found in a small ball around an arbitrary location in sequence space. The results from random graph theory are compared to data obtained by folding large samples of RNA sequences. Differences are explained in terms of specific features of RNA molecular structures. Dedicated to Professor Manfred Eigen.

13.
As more and more ribonucleic acid (RNA) secondary structures accumulate, the need to compare different RNA secondary structures often arises in function prediction and evolutionary analysis. Numerous efficient algorithms have been developed for comparing different RNA secondary structures, but challenges remain. In this paper, six new models based on the linear regression model are proposed for the comparison of RNA secondary structures. The proposed models were tested on a mixed data set containing six secondary structures from RNase P RNAs, three secondary structures from SSU rRNA and five secondary structures from 16S ribosomal RNAs. The results show the effectiveness of the proposed models. Moreover, the time complexity of our models compares favorably with that of existing methods for similar problems.

14.
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes $-1$ towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$ many saturated structures for a sequence of length $n$. In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes $-1$ toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).
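
The Stein-Waterman asymptotic quoted above can be checked numerically against an exact count. The sketch below assumes the standard counting model in which any two positions may pair as long as at least one unpaired base lies between them and pairs do not cross; the recurrence conditions on whether the last position is unpaired or paired with some position j. The function names are hypothetical.

```python
def count_secondary_structures(n: int) -> int:
    """Exact number S(n) of secondary structures on n positions.

    Assumed model: any two positions may pair if at least one unpaired
    position lies between them, and base pairs do not cross.
    S(m) = S(m-1) + sum_{j=1}^{m-2} S(j-1) * S(m-1-j), S(0)=S(1)=S(2)=1.
    """
    counts = [1] * (max(n, 2) + 1)
    for m in range(3, n + 1):
        total = counts[m - 1]  # position m is unpaired
        for j in range(1, m - 1):  # position m pairs with position j
            total += counts[j - 1] * counts[m - 1 - j]
        counts[m] = total
    return counts[n]


def stein_waterman_estimate(n: int) -> float:
    """Asymptotic estimate 1.104366 * n^(-3/2) * 2.618034^n."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        exact = count_secondary_structures(n)
        ratio = exact / stein_waterman_estimate(n)
        print(n, exact, round(ratio, 4))
```

The ratio of the exact count to the estimate approaches 1 only slowly, since the estimate keeps just the leading term of the asymptotic expansion.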

15.
A new approach to the problem of predicting RNA secondary structures, based on the kinetic analysis of self-organising molecules, is proposed. Structural rearrangements that take place during the formation of secondary structures are described in terms of a Markov process. A set of states and transition probabilities is defined. Monte Carlo methods are used to describe this process. Time-dependent probability distributions of various secondary structures are given. Examples of calculations for ensembles of secondary structures of some tRNAs are described. An effective method for studying the steady-state ensemble, which is based on a quick RESETTING of all possible variants of the secondary structures of RNAs, is given. By ascribing to each of these structures a probability as a function of free energy, it was possible to obtain the Boltzmann ensemble of secondary structures.

16.
Dynamic programming algorithms that predict RNA secondary structure by minimizing the free energy have had one important limitation. They were able to predict only one optimal structure. Given the uncertainties of the thermodynamic data and the effects of proteins and other environmental factors on structure, the optimal structure predicted by these methods may not have biological significance. We present a dynamic programming algorithm that can determine optimal and suboptimal secondary structures for an RNA. The power and utility of the method are demonstrated in the folding of the intervening sequence of the rRNA of Tetrahymena. By first identifying the major secondary structures corresponding to the lowest free energy minima, a secondary structure of possible biological significance is derived.

17.
A program for predicting significant RNA secondary structures   Total citations: 1, self-citations: 0, citations by others: 1
We describe a program for the analysis of RNA secondary structure. There are two new features in this program. (i) To get vector speeds on a vector pipeline machine (such as Cray X-MP/24) we have vectorized the secondary structure dynamic algorithm. (ii) The statistical significance of a locally ‘optimal’ secondary structure is assessed by a Monte Carlo method. The results can be depicted graphically including profiles of the stability of local secondary structures and the distribution of the potentially significant secondary structures in the RNA molecules. Interesting regions where both the potentially significant secondary structures and ‘open’ structures (single-stranded coils) occur can be identified by the plots mentioned above. Furthermore, the speed of the vectorized code allows repeated Monte Carlo simulations with different overlapping window sizes. Thus, the optimal size of the significant secondary structure occurring in the interesting region can be assessed by repeating the Monte Carlo simulation. The power of the program is demonstrated in the analysis of local secondary structures of human T-cell lymphotropic virus type III (HIV). Received on August 17, 1987; accepted on January 5, 1988.

18.
19.
Johnson WC. Proteins. 1999;35(3):307-312
We have developed an algorithm to analyze the circular dichroism of proteins for secondary structure. Its hallmark is tremendous flexibility in creating the basis set, and it also combines the ideas of many previous workers. We also present a new basis set containing the CD spectra of 22 proteins with secondary structures from high quality X-ray diffraction data. High flexibility is obtained by doing the analysis with a variable selection basis set of only eight proteins. Many variable selection basis sets fail to give a good analysis, but good analyses can be selected without any a priori knowledge by using the following criteria: (1) the sum of secondary structures should be close to 1.0, (2) no fraction of secondary structure should be less than -0.03, (3) the reconstructed CD spectrum should fit the original CD spectrum with only a small error, and (4) the fraction of alpha-helix should be similar to that obtained using all the proteins in the basis set. This algorithm gives a root mean square error for the predicted secondary structure for the proteins in the basis set of 3.3% for alpha-helix, 2.6% for 3(10)-helix, 4.2% for beta-strand, 4.2% for beta-turn, 2.7% for poly(L-proline) II type 3(1)-helix, and 5.1% for other structures when compared with the X-ray structure.

20.
The most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete. We suggest a probabilistic model for RNA secondary structures with pseudoknots and present a Markov chain Monte Carlo method for sampling RNA structures according to their posterior distribution for a given sequence. We favor Bayesian sampling over optimization methods in this context, because it makes the uncertainty of RNA structure predictions assessable. We demonstrate the benefit of our method in examples with tmRNA and also with simulated data. McQFold, an implementation of our method, is freely available from http://www.cs.uni-frankfurt.de/~metzler/McQFold.
