Similar literature
 20 similar documents found (search time: 46 ms)
1.
RNA multi-structure landscapes   Total citations: 6, self-citations: 0, citations by others: 6
Statistical properties of RNA folding landscapes obtained by the partition function algorithm (McCaskill 1990) are investigated in detail. The pair correlation of free energies as a function of the Hamming distance is used as a measure for the ruggedness of the landscape. The calculation of the partition function contains information about the entire ensemble of secondary structures as a function of temperature and opens the door to all quantities of thermodynamic interest, in contrast with the conventional minimal free energy approach. A metric distance of structure ensembles is introduced and pair correlations at the level of the structures themselves are computed. Just as with landscapes based on most stable secondary structure prediction, the landscapes defined on the full biophysical GCAU alphabet are much smoother than the landscapes restricted to pure GC sequences and the correlation lengths are almost constant fractions of the chain lengths. Correlation functions for multi-structure landscapes exhibit an increased correlation length, especially near the melting temperature. However, the main effect on evolution is rather an effective increase in sampling for finite populations where each sequence explores multiple structures. Correspondence to: P. Schuster

2.
Algorithms predicting RNA secondary structures based on different folding criteria – minimum free energies (mfe), kinetic folding (kin), maximum matching (mm) – and different parameter sets are studied systematically. Two base pairing alphabets were used: the binary GC and the natural four-letter AUGC alphabet. Computed structures and free energies depend strongly on both the algorithm and the parameter set. Statistical properties, such as mean number of base pairs, mean numbers of stacks, mean loop sizes, etc., are much less sensitive to the choice of parameter set and even of algorithm. Some features of RNA secondary structures, such as structure correlation functions, shape space covering and neutral networks, seem to depend only on the base pairing logic (GC or AUGC alphabet). Received: 16 May 1996 / Accepted: 10 July 1996
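The maximum matching (mm) criterion named in this abstract can be illustrated with a Nussinov-style dynamic program that maximizes the number of base pairs. This is a minimal sketch, not the implementation used in the paper: the function name `max_base_pairs`, the minimum hairpin loop of 3, and the inclusion of G-U wobble pairs in the AUGC pairing rule are all assumptions made for illustration.

```python
# Pairing rules for the two alphabets (G-U wobble pairs are an assumption).
GC_PAIRS = {("G", "C"), ("C", "G")}
AUGC_PAIRS = GC_PAIRS | {("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, allowed, min_loop=3):
    """Return the maximum number of nested base pairs (Nussinov-style DP).

    best[i][j] holds the optimum for the subsequence seq[i..j]; a pair (k, j)
    is only allowed if at least `min_loop` unpaired bases lie between k and j.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j stays unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAACCC", AUGC_PAIRS))
```

Swapping `AUGC_PAIRS` for `GC_PAIRS` changes only the pairing logic, which is the kind of alphabet comparison the abstract describes.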

3.
We here present a dynamic programming algorithm which is capable of calculating arbitrary moments of the Boltzmann distribution for RNA secondary structures. We have implemented the algorithm in a program called RNA-VARIANCE and investigate the difference between the Boltzmann distribution of biological and random RNA sequences. We find that the minimum free energy structure of biological sequences has a higher probability in the Boltzmann distribution than random sequences. Moreover, we show that the free energies of biological sequences have a smaller variance than random sequences and that the minimum free energy of biological sequences is closer to the expected free energy of the rest of the structures than that of random sequences. These results suggest that biologically functional RNA sequences not only require a thermodynamically stable minimum free energy structure, but also an ensemble of structures whose free energies are close to the minimum free energy.
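For orientation, the quantities this abstract refers to can be written out in standard statistical-mechanics notation. The symbols below (structure $s$, free energy $E(s)$, gas constant $R$, temperature $T$) are assumed here and are not taken from the paper.

```latex
\begin{align}
  Z &= \sum_{s} e^{-E(s)/RT}, &
  p(s) &= \frac{e^{-E(s)/RT}}{Z}, \\
  \langle E^{k} \rangle &= \sum_{s} E(s)^{k}\, p(s), &
  \operatorname{Var}(E) &= \langle E^{2} \rangle - \langle E \rangle^{2}.
\end{align}
```

In this notation, the probability of the minimum free energy structure is $p(s_{\mathrm{mfe}}) = e^{-E_{\mathrm{mfe}}/RT}/Z$, and the variance is obtained from the first two moments.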

4.
This paper studies local connectivity of neutral networks of RNA secondary and pseudoknot structures. A neutral network denotes the set of RNA sequences that fold into a particular structure. It is called locally connected, if in the limit of long sequences, the distance of any two of its sequences scales with their distance in the n-cube. One main result of this paper is that is the threshold probability for local connectivity for neutral networks, considered as random subgraphs of n-cubes. Furthermore, we analyze local connectivity for finite sequence length and different alphabets. We show that it is closely related to the existence of specific paths within the neutral network. We put our theoretical results into context with folding algorithms into minimum-free energy RNA secondary and pseudoknot structures. Finally, we relate our structural findings with dynamics by discussing the role of local connectivity in the context of neutral evolution.  相似文献   

5.
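Construction of an RNA secondary structure prediction system   Total citations: 9, self-citations: 0, citations by others: 9
Using the following RNA secondary structure prediction algorithms: maximum base pairing, Zuker's free energy minimization, optimal stacking of helical regions, random stacking of helical regions, and the all-possible-combinations method, together with an RNA secondary structure drawing technique based on first-order helical regions, the RNA secondary structure prediction system Rnafold was constructed. In addition, 20 randomly selected tRNA sequences were used to compare the first four prediction algorithms in terms of free energy and cloverleaf structure, and the t-test was used to analyze the statistical differences in free energy. In terms of cloverleaf structure, the random stacking method performed best, followed by the optimal helical-region stacking method and Zuker's algorithm, with the maximum base pairing method performing worst. Finally, the differences between the two free energy minimization methods were analyzed.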

6.
Within this paper we investigate the Bernoulli model for random secondary structures of ribonucleic acid (RNA) molecules. Assuming that two random bases can form a hydrogen bond with probability p we prove asymptotic equivalents for the averaged number of hairpins and bulges, the averaged loop length, the expected order, the expected number of secondary structures of size n and order k and further parameters all depending on p. In this way we get an insight into the change of shape of a random structure during the process . Afterwards we compare the computed parameters for random structures in the Bernoulli model to the corresponding quantities for real existing secondary structures of large subunit rRNA molecules found in the database of Wuyts et al. That is how it becomes possible to identify those parameters which behave (almost) randomly and those which do not and thus should be considered as interesting, e.g., with respect to the biological functions or the algorithmic prediction of RNA secondary structures.  相似文献   

7.
Experiments have shown that the canonical AUCG genetic alphabet is not the only possible nucleotide alphabet. In this work we address the question "is the canonical alphabet optimal?" We make the assumption that the genetic alphabet was determined in the RNA world. Computational tools are used to infer the RNA secondary structure (shape) from a given RNA sequence, and statistics from RNA shapes are gathered with respect to alphabet size. Then, simulations based upon the replication and selection of fixed-sized RNA populations are used to investigate the effect of alternative alphabets upon RNA's ability to step through a fitness landscape. These results show that for a low copy fidelity the canonical alphabet is fitter than two-, six- and eight-letter alphabets. In higher copy-fidelity experiments, six-letter alphabets outperform the four-letter alphabets, suggesting that the canonical alphabet is indeed a relic of the RNA world.

8.
There are two custom ways for predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter alternative applying the maximum likelihood principle for determining a grammar's probabilities. In this paper, building on such a stochastic model, we will analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules. Even if the parameters of our grammar are chosen with respect to structural properties of native molecules only (and therefore, independent of molecules' free energy), we prove formulae for the expected minimum free energy and the corresponding variance as functions of the molecule's size which perfectly fit the native behavior of free energies. This gives proof for a high quality of our stochastic model making it a handy tool for further investigations. In fact, the stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.  相似文献   

9.
In this paper, we study irreducibility in RNA structures. By RNA structure, we mean RNA secondary as well as RNA pseudoknot structures as abstract contact structures. We give an analysis contrasting random and minimum free energy (mfe) configurations and secondary versus pseudoknot structures. In the process, we compute various distributions: the numbers of irreducible substructures and their locations and sizes, parameterized in terms of the maximal number of mutually crossing arcs, k−1, and the minimal size of stacks σ. In particular, we analyze the size of the largest irreducible substructure for random and mfe structures, which is the key factor for the folding time of mfe configurations. We show that the largest irreducible substructure is typically unique and contains almost all nucleotides.

10.
Single-stranded regions in RNA secondary structure are important for RNA–RNA and RNA–protein interactions. We present a probability profile approach for the prediction of these regions based on a statistical algorithm for sampling RNA secondary structures. For the prediction of phylogenetically-determined single-stranded regions in secondary structures of representative RNA sequences, the probability profile offers substantial improvement over the minimum free energy structure. In designing antisense oligonucleotides, a practical problem is how to select a secondary structure for the target mRNA from the optimal structure(s) and many suboptimal structures with similar free energies. By summarizing the information from a statistical sample of probable secondary structures in a single plot, the probability profile not only presents a solution to this dilemma, but also reveals ‘well-determined’ single-stranded regions through the assignment of probabilities as measures of confidence in predictions. In antisense application to the rabbit β-globin mRNA, a significant correlation between hybridization potential predicted by the probability profile and the degree of inhibition of in vitro translation suggests that the probability profile approach is valuable for the identification of effective antisense target sites. Coupling computational design with DNA–RNA array technique provides a rational, efficient framework for antisense oligonucleotide screening. This framework has the potential for high-throughput applications to functional genomics and drug target validation.  相似文献   

11.
Prediction of RNA secondary structure based on helical regions distribution   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: RNAs play an important role in many biological processes and knowing their structure is important in understanding their function. Due to difficulties in the experimental determination of RNA secondary structure, the methods of theoretical prediction for known sequences are often used. Although many different algorithms for such predictions have been developed, this problem has not yet been solved. It is thus necessary to develop new methods for predicting RNA secondary structure. The most-used at present is Zuker's algorithm which can be used to determine the minimum free energy secondary structure. However many RNA secondary structures verified by experiments are not consistent with the minimum free energy secondary structures. In order to solve this problem, a method used to search a group of secondary structures whose free energy is close to the global minimum free energy was developed by Zuker in 1989. When considering a group of secondary structures, if there is no experimental data, we cannot tell which one is better than the others. This case also occurs in combinatorial and heuristic methods. These two kinds of methods have several weaknesses. Here we show how the central limit theorem can be used to solve these problems. RESULTS: An algorithm for predicting RNA secondary structure based on helical regions distribution is presented, which can be used to find the most probable secondary structure for a given RNA sequence. It consists of three steps. First, list all possible helical regions. Second, according to central limit theorem, estimate the occurrence probability of every helical region based on the Monte Carlo simulation. Third, add the helical region with the biggest probability to the current structure and eliminate the helical regions incompatible with the current structure. The above processes can be repeated until no more helical regions can be added. Take the current structure as the final RNA secondary structure. 
In order to demonstrate the confidence of the program, a test on three RNA sequences: tRNAPhe, Pre-tRNATyr, and Tetrahymena ribosomal RNA intervening sequence, is performed. AVAILABILITY: The program is written in Turbo Pascal 7.0. The source code is available upon request. CONTACT: Wujj@nic.bmi.ac.cn or Liwj@mail.bmi.ac.cn
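The first and third steps of the procedure described in this abstract (list all helical regions, then repeatedly add the best one and discard incompatible ones) can be sketched as follows. This is only an illustration under stated assumptions: the helix length is used as a stand-in score because the Monte Carlo probability estimate of the second step is not specified here, and the names `helical_regions`, `compatible`, `greedy_fold`, the pairing rule, and the thresholds `min_len` and `min_loop` are hypothetical.

```python
# Allowed base pairs (Watson-Crick plus G-U wobble; an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def helical_regions(seq, min_len=3, min_loop=3):
    """Step 1: list maximal helices as (i, j, length), pairing i+k with j-k."""
    n = len(seq)
    helices = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip starts that could be extended outward (not maximal).
            if i > 0 and j + 1 < n and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            while ((j - length) - (i + length) - 1 >= min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
    return helices


def pairs_of(helix):
    i, j, length = helix
    return [(i + k, j - k) for k in range(length)]


def compatible(h1, h2):
    """Two helices are compatible if they share no base and do not cross."""
    for a, b in pairs_of(h1):
        for c, d in pairs_of(h2):
            if len({a, b, c, d}) < 4:
                return False
            if a < c < b < d or c < a < d < b:
                return False
    return True


def greedy_fold(seq, score=lambda helix: helix[2]):
    """Step 3: add the best-scoring helix, drop incompatible ones, repeat."""
    candidates = helical_regions(seq)
    chosen = []
    while candidates:
        best = max(candidates, key=score)
        chosen.append(best)
        candidates = [h for h in candidates if h != best and compatible(h, best)]
    return chosen


def dot_bracket(n, helices):
    """Render the chosen helices in dot-bracket notation."""
    out = ["."] * n
    for helix in helices:
        for a, b in pairs_of(helix):
            out[a], out[b] = "(", ")"
    return "".join(out)
```

Passing a different `score` function, for example one backed by estimated occurrence probabilities, would replace the stand-in without changing the greedy loop.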

12.
13.
Comparisons within and between the human, mouse and rabbit immunoglobulin-kappa gene (J-C region) DNA sequences are carried out in terms of three two-letter nucleotide alphabets: (i) S-W alphabet (W = A or T; S = G or C); (ii) P-Q alphabet which distinguishes purines (P = A or G) from pyrimidines (Q = C or T); and (iii) a 'control' E-F alphabet (E = A or C; F = G or T). All statistically significant direct repeats within each of the three sequences and all significant block identities (a set of consecutive matching letters) shared by two or more sequences are determined for each alphabet. By contrast to the S-W and E-F alphabets, the P-Q alphabet comparisons reveal an abundance of statistically significant block identities not seen at the nucleotide level. Various interpretations of these P-Q structures with respect to control and functional roles are considered.
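The three two-letter alphabets defined in this abstract are simple recodings of a DNA sequence. The sketch below recodes sequences and measures the longest shared block of consecutive matching letters; it does not attempt the statistical significance assessment the abstract mentions, and the function names `recode` and `longest_common_block` are hypothetical.

```python
# Two-letter recodings exactly as defined in the abstract.
ALPHABETS = {
    "SW": {"A": "W", "T": "W", "G": "S", "C": "S"},
    "PQ": {"A": "P", "G": "P", "C": "Q", "T": "Q"},
    "EF": {"A": "E", "C": "E", "G": "F", "T": "F"},
}


def recode(seq, alphabet):
    """Translate a DNA sequence into one of the two-letter alphabets."""
    table = ALPHABETS[alphabet]
    return "".join(table[base] for base in seq.upper())


def longest_common_block(a, b):
    """Length of the longest run of consecutive letters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    x, y = "AGCT", "GATC"
    print(longest_common_block(x, y))
    print(longest_common_block(recode(x, "PQ"), recode(y, "PQ")))
```

The toy pair `AGCT` and `GATC` shares no block longer than one nucleotide, yet both recode to the same purine-pyrimidine string, which is the kind of identity that is invisible at the nucleotide level.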

14.
In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from Vienna RNA Package to calculate the distance between 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from 5' node to 3' node in the undirected graph, whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous, mathematical framework to study the expected distance from 5' to 3' ends of an RNA sequence. We present recurrence relations that precisely define the expected distance from 5' to 3' ends of an RNA sequence, both for the Turner nearest neighbor energy model, as well as for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of Vienna RNA Package) the expected distance between 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytical combinatorics, that depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.
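The 5'-3' distance defined in this abstract is a shortest-path length in a graph with backbone edges {i, i + 1} and base-pair edges {i, j}. For a single given structure it can be computed with a breadth-first search, as in the sketch below. The dot-bracket input format and the function names are assumptions for illustration; the paper's own contribution, the expected distance over the whole ensemble, is not computed here.

```python
from collections import deque


def pair_partners(structure):
    """Map each paired position to its partner from a dot-bracket string."""
    partner = [None] * len(structure)
    stack = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    return partner


def end_to_end_distance(structure):
    """Shortest path from the 5' end (first base) to the 3' end (last base).

    Edges are backbone bonds {i, i+1} and base pairs {i, j}; all have length 1.
    """
    n = len(structure)
    if n == 0:
        return 0
    partner = pair_partners(structure)
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == n - 1:
            return dist[u]
        for v in (u - 1, u + 1, partner[u]):
            if v is not None and 0 <= v < n and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[n - 1]


if __name__ == "__main__":
    print(end_to_end_distance(".((...))."))
```

An unpaired chain of length n has distance n − 1, while any structure whose first and last bases pair with each other has distance 1.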

15.
Sequence alignment is a common method for finding protein structurally conserved/similar regions. However, sequence alignment is often not accurate if sequence identities between to-be-aligned sequences are less than 30%. This is because, for these sequences, different residues may play similar structural roles and they are incorrectly aligned during the sequence alignment using a substitution matrix consisting of 20 types of residues. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced and at the same time the key information encoded in the sequences remains. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.

16.

Background

Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random objects of a more complicated structure like RNA molecules or proteins are of interest.

Results

In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold) ours has two major advantages: Firstly, after a preprocessing step in time for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity while other algorithms typically have a runtime in . Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with logarithmized probabilities.

Conclusion

A number of experimental results show that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too.
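The recursive method for uniform random generation, which this framework extends, can be illustrated on a much simpler class: all secondary structures of length n with a minimum hairpin loop size. The sketch below counts the structures with exact integer arithmetic and then samples one uniformly by choosing each decomposition case in proportion to its count. It is the uniform baseline only, not the weighted, grammar-based unranking of the paper; the names `count` and `sample` and the default minimum loop size of 1 are assumptions.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n, min_loop=1):
    """Number of secondary structures on n bases (exact integers).

    The first base is either unpaired, or paired with base k, which splits
    the rest into an inside part (k - 2 bases) and an outside part (n - k).
    """
    if n <= min_loop + 1:
        return 1  # too short for any pair: only the open chain
    total = count(n - 1, min_loop)
    for k in range(min_loop + 2, n + 1):
        total += count(k - 2, min_loop) * count(n - k, min_loop)
    return total


def sample(n, min_loop=1, rng=random):
    """Draw one structure of length n uniformly, in dot-bracket notation."""
    if n <= min_loop + 1:
        return "." * n
    r = rng.randrange(count(n, min_loop))
    unpaired = count(n - 1, min_loop)
    if r < unpaired:
        return "." + sample(n - 1, min_loop, rng)
    r -= unpaired
    for k in range(min_loop + 2, n + 1):
        block = count(k - 2, min_loop) * count(n - k, min_loop)
        if r < block:
            inside = sample(k - 2, min_loop, rng)
            outside = sample(n - k, min_loop, rng)
            return "(" + inside + ")" + outside
        r -= block
    raise AssertionError("unreachable: counts are inconsistent")


if __name__ == "__main__":
    print(count(20), sample(20, rng=random.Random(1)))
```

Because only integer counts and integer random draws are involved, no floating-point probabilities are needed. The recursion depth grows with n, so this sketch is meant for short sequences.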

17.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.

18.
The functions of RNAs, like proteins, are determined by their structures, which, in turn, are determined by their sequences. Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships. For RNA sequence alignment, most methods developed for protein and DNA sequence alignment can be directly applied. RNA 3-dimensional structure alignment, on the other hand, tends to be more difficult than protein structure alignment due to the lack of regular secondary structures as observed in proteins. Most of the existing RNA 3D structure alignment methods use only the backbone geometry and ignore the sequence information. Using both the sequence and backbone geometry information in RNA alignment may not only produce more accurate classification, but also deepen our understanding of the sequence–structure–function relationship of RNA molecules. In this study, we developed a new RNA alignment method based on elastic shape analysis (ESA). ESA treats RNA structures as three dimensional curves with sequence information encoded on additional dimensions so that the alignment can be performed in the joint sequence–structure space. The similarity between two RNA molecules is quantified by a formal distance, geodesic distance. Based on ESA, a rigorous mathematical framework can be built for RNA structure comparison. Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs. Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets. The programs are available at http://stat.fsu.edu/ ∼jinfeng/ESA.html.  相似文献   

19.
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28‐letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28‐letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. Proteins 2015; 83:631–639. © 2015 Wiley Periodicals, Inc.

20.
Sequence alignment is a common method for finding protein structurally conserved/similar regions. However, sequence alignment is often not accurate if sequence identities between to-be-aligned sequences are less than 30%. This is because, for these sequences, different residues may play similar structural roles and they are incorrectly aligned during the sequence alignment using a substitution matrix consisting of 20 types of residues. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced and at the same time the key information encoded in the sequences remains. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9. Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM
