Similar articles
20 similar articles found
1.

Background  

In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
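
For orientation, here is a minimal sketch of the plain von Neumann rejection sampler that this abstract takes as its starting point. It does not attempt the interval-analysis (auto-validating) version the authors introduce. The one-dimensional toy density, the uniform proposal, and all function names are assumptions made for illustration.

```python
import random

def rejection_sample(target_density, lower, upper, density_bound, rng):
    """Draw one sample from an unnormalized density on [lower, upper].

    The caller must guarantee target_density(x) <= density_bound on the
    whole interval; the sampler itself does not verify this.
    """
    while True:
        x = rng.uniform(lower, upper)        # propose from the uniform envelope
        u = rng.uniform(0.0, density_bound)  # uniform height under the envelope
        if u <= target_density(x):           # accept points lying under the target
            return x

def toy_density(x):
    # Hypothetical unnormalized target: a triangle on [0, 1] peaked at 0.5.
    return 1.0 - abs(2.0 * x - 1.0)

rng = random.Random(0)
samples = [rejection_sample(toy_density, 0.0, 1.0, 1.0, rng) for _ in range(2000)]
```

The correctness of such a sampler rests entirely on the bound being valid everywhere, which is the kind of guarantee a rigorous enclosure method is meant to supply.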

2.
RNA secondary structure formation is a field of considerable biological interest as well as a model system for understanding generic properties of heteropolymer folding. This system is particularly attractive because the partition function, and thus all thermodynamic properties of RNA secondary structure ensembles, can be calculated numerically in polynomial time for arbitrary sequences, and homopolymer models admit analytical solutions. Such solutions for many different aspects of the combinatorics of RNA secondary structure formation share the property that the final solution depends on differences of statistical weights rather than on the weights alone. Here, we present a unified approach to a large class of problems in the field of RNA secondary structure formation. We prove a generic theorem for the calculation of RNA folding partition functions. Then, we show that this approach can be applied to the study of the molten-native transition, denaturation of RNA molecules, as well as to studies of the glass phase of random RNA sequences.
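
To illustrate the polynomial-time claim, the sketch below computes the partition function of a simple homopolymer model by dynamic programming. It is not the generic theorem of the abstract. The model is an assumption: any two bases may pair, every pair contributes the same statistical weight `pair_weight`, and a hairpin must enclose at least `min_loop` unpaired bases.

```python
def homopolymer_partition(length, pair_weight=1.0, min_loop=1):
    """Partition function of a homopolymer of `length` bases.

    z[m] sums the weights of all secondary structures on a segment of m
    bases. The last base is either unpaired, or paired with base k
    (1-indexed), which splits the segment into a prefix of k - 1 bases
    and an enclosed interior of m - k - 1 >= min_loop bases.
    """
    z = [1.0] * (length + 1)  # segments of 0 or 1 bases have only the open chain
    for m in range(2, length + 1):
        total = z[m - 1]                  # last base unpaired
        for k in range(1, m - min_loop):  # last base paired with base k
            total += pair_weight * z[k - 1] * z[m - k - 1]
        z[m] = total
    return z[length]
```

With `pair_weight=1.0` the function simply counts structures, so `homopolymer_partition(6)` gives 17.0. The double loop costs O(n^2) for a homopolymer; a sequence-dependent model needs a two-index table and is still polynomial.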

3.
MOTIVATION: Pseudoknots have generally been excluded from the prediction of RNA secondary structures because they are difficult to model. Although several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction with comparative analysis, suffers from low prediction accuracy in many cases. RESULTS: Here we present an algorithm, iterated loop matching, for reliably and efficiently predicting RNA secondary structures including pseudoknots. The method can utilize either thermodynamic or comparative information or both, and is thus able to predict pseudoknots for both aligned and individual sequences. We have tested the algorithm on a number of RNA families. Using 8-12 homologous sequences, the algorithm correctly identifies more than 90% of base-pairs for short sequences and 80% overall. It correctly predicts nearly all pseudoknots and produces very few spurious base-pairs for sequences without pseudoknots. Comparisons show that our algorithm is both more sensitive and more specific than the maximum weighted matching method. In addition, our algorithm has high prediction accuracy on individual sequences, comparable with the PKNOTS algorithm, while using far fewer computational resources. AVAILABILITY: The program has been implemented in ANSI C and is freely available for academic use at http://www.cse.wustl.edu/~zhang/projects/rna/ilm/ SUPPLEMENTARY INFORMATION: http://www.cse.wustl.edu/~zhang/projects/rna/ilm/

4.
A kinetic approach to the prediction of RNA secondary structures (cited by 3; self-citations: 0; citations by others: 3)
A new approach to the prediction of RNA secondary structures, based on the analysis of the kinetics of molecular self-organisation, is proposed herein. A Markov process is used to describe structural rearrangements during secondary structure formation. This process is modelled by a Monte Carlo method. Examples are given of the calculation, by this method, of the kinetic ensemble of secondary structures. The distribution of time-dependent probabilities within the ensembles is obtained. An effective method of searching for the equilibrium ensemble is also suggested. This method is based on the construction of a tree of all possible secondary structures of an RNA. By assigning a probability to each structure (according to its free energy), the Boltzmann equilibrium ensemble can be obtained.
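
The last step, assigning each structure a probability according to its free energy, can be sketched as follows. This is the textbook Boltzmann weighting, not the authors' tree-construction method. The example free energies, the temperature, and the function name are assumptions for illustration.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def boltzmann_probabilities(free_energies, temperature=310.15):
    """Map free energies (kcal/mol) to equilibrium probabilities."""
    rt = GAS_CONSTANT * temperature
    lowest = min(free_energies)
    # Shifting by the lowest energy keeps the exponentials well scaled
    # and cancels out in the normalization.
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies]
    partition = sum(weights)
    return [w / partition for w in weights]

# Hypothetical free energies of three candidate structures.
energies = [-5.0, -4.0, 0.0]
probs = boltzmann_probabilities(energies)
```

The probabilities sum to one, and the structure with the lowest free energy is the most probable.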

5.
The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.

6.
Given an RNA sequence and two designated secondary structures A, B, we describe a new algorithm that computes a nearly optimal folding pathway from A to B. The algorithm, RNAtabupath, employs a tabu semi-greedy heuristic, known to be an effective search strategy in combinatorial optimization. Folding pathways, sometimes called routes or trajectories, are computed by RNAtabupath in a fraction of the time required by the barriers program of Vienna RNA Package. We benchmark RNAtabupath with other algorithms to compute low energy folding pathways between experimentally known structures of several conformational switches. The RNApathfinder web server, source code for algorithms to compute and analyze pathways and supplementary data are available at http://bioinformatics.bc.edu/clotelab/RNApathfinder.

7.
An algorithm for comparing multiple RNA secondary structures (cited by 1; self-citations: 0; citations by others: 1)
A new distributed computational procedure is presented for rapidly determining the similarity of multiple conformations of RNA secondary structures. A data abstraction scheme is utilized to reduce the quantity of data that must be handled to determine the degree of similarity among multiple structures. The method has been used to compare 200 structures with easy visualization of both those structures and substructures that are similar and those that are vastly different. It has the capability of processing many more conformations as a function of research requirements. The algorithm is described as well as some suggestions for future uses and extensions. Received on October 29, 1987; accepted on May 4, 1988

8.
This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA.

9.
10.
In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from Vienna RNA Package to calculate the distance between 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from 5' node to 3' node in the undirected graph whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance from 5' to 3' ends of an RNA sequence. We present recurrence relations that precisely define the expected distance from 5' to 3' ends of an RNA sequence, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of Vienna RNA Package) the expected distance between 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytical combinatorics that depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.
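
The graph definition of the 5'-3' distance translates directly into a breadth-first search. The sketch below computes that distance for a single given structure. It does not compute the expected distance over an ensemble, which is what the paper's recurrences address. Dot-bracket input and the function names are assumptions for illustration.

```python
from collections import deque

def parse_pairs(dot_bracket):
    """Return base pairs (i, j) from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

def end_to_end_distance(dot_bracket):
    """Shortest path length from the 5' node to the 3' node."""
    n = len(dot_bracket)
    if n == 0:
        raise ValueError("structure must contain at least one nucleotide")
    neighbors = {i: set() for i in range(n)}
    for i in range(n - 1):                    # backbone edges {i, i + 1}
        neighbors[i].add(i + 1)
        neighbors[i + 1].add(i)
    for i, j in parse_pairs(dot_bracket):     # base-pair edges {i, j}
        neighbors[i].add(j)
        neighbors[j].add(i)
    distance = {0: 0}
    queue = deque([0])
    while queue:                              # breadth-first search from the 5' end
        node = queue.popleft()
        if node == n - 1:
            return distance[node]
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None  # not reached: the backbone keeps the graph connected
```

For an unpaired chain of six bases the distance is 5, whereas the structure `.((...)).` has distance 3, because the outer base pair acts as a shortcut.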

11.
Immobilized small deoxyribozyme to distinguish RNA secondary structures (cited by 3; self-citations: 0; citations by others: 3)
Okumoto Y, Ohmichi T, Sugimoto N. Biochemistry, 2002, 41(8): 2769-2773
The RNA folding variation due to one or more mutations leads to different RNA splicing, RNA processing, and translational controls as a result of differences in the primary and higher-ordered structures that interact with other cellular molecules. Thus, distinguishing RNA foldings is one way to detect gene functions related to disease and drug responses. We previously found a small Ca(2+)-dependent deoxyribozyme that cleaves RNA site-specifically [Sugimoto, N., Okumoto, Y., and Ohmichi, T. (1999) J. Chem. Soc., Perkin Trans. 2, 1382-1388]. In this study, we report the potential of this deoxyribozyme as a useful tool to distinguish RNA foldings. It is found that the deoxyribozyme, immobilized using the avidin-biotin interaction, cleaves the target site only within single-stranded RNAs. The systematic design of target RNA hairpin loops shows that the immobilized deoxyribozyme is able to cleave loops of ≥17 nucleotides at only one site under single-turnover conditions. Furthermore, an RNA cleavage reaction is detected using the immobilized deoxyribozyme on a surface plasmon resonance (SPR) sensor chip. These results show that deoxyribozymes immobilized on a column and on an SPR sensor chip provide a novel and useful tool to distinguish RNA foldings.

12.
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements, such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The results of RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with the data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance, d_t, between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations. © 1993 John Wiley & Sons, Inc.

13.
An analysis of higher-order structures of globular proteins by means of a distance-constraint approach is presented. Conformations are generated for each of 21 test proteins of small and medium sizes by optimizing an objective function $f = \sum w_{ij} (d_{ij} - \langle d_{ij} \rangle)^2$, where $d_{ij}$ is a distance between residues $i$ and $j$ in a calculated conformation, $\langle d_{ij} \rangle$ is an assigned distance to the $(ij)$ pair of residues which is determined based on the statistics of known three-dimensional structures of 14 proteins in the earlier study, and $w_{ij}$ is a weighting factor. $\langle d_{ij} \rangle$ involves information about hydrophobicity and hydrophilicity of each amino acid residue and about connectivity of a polypeptide chain. In these calculations, only the amino acid sequence is used as input data specific to a calculated protein. With respect to higher-order structures regenerated in the optimized conformations, the following properties are analyzed: (a) $N_{14}$ of a residue, defined as the number of residues surrounding the residue located within a sphere of radius of 14 Å; (b) root-mean-square differences of the global and local conformations from the corresponding X-ray conformations; (c) distance profiles in the short and medium ranges; and (d) distance maps. The effects of supplementary information about locations of secondary structures and disulfide bonds are also examined to discuss the potential ability of this methodology to predict the three-dimensional structures of globular proteins.
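
The objective function in this abstract is a weighted sum of squared deviations between the inter-residue distances of a trial conformation and the assigned target distances. The sketch below evaluates that sum for given coordinates. Summing over pairs with i < j, the nested-list data layout, and the function name are assumptions, and no optimizer is included.

```python
import math

def distance_objective(coords, target, weights):
    """Weighted squared deviation of pairwise distances from their targets.

    coords:  one (x, y, z) position per residue
    target:  target[i][j] is the assigned distance for the pair (i, j)
    weights: weights[i][j] is the weighting factor for the pair (i, j)
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair counted once (assumption)
            d_ij = math.dist(coords[i], coords[j])
            total += weights[i][j] * (d_ij - target[i][j]) ** 2
    return total

# Hypothetical three-residue example on a line, unit weights, all targets 2.0.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [[2.0] * 3 for _ in range(3)]
weights = [[1.0] * 3 for _ in range(3)]
f_value = distance_objective(coords, target, weights)
```

Here the pairwise distances are 1, 3 and 2, so two pairs each miss the target by 1 and the objective is 2.0. A conformation search would vary `coords` to drive this value down.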

14.
Circular dichroism (CD) spectroscopy is a widely used technique for the evaluation of protein secondary structures and has a significant impact on the understanding of molecular biology. However, the quantitative analysis of protein secondary structures based on CD spectra remains difficult because of the severe overlap of the spectra corresponding to different structural motifs. Here, the Tchebichef image moment (TM) approach is introduced for the first time; it can effectively extract the chemical features in CD spectra for the quantitative analysis of protein secondary structures. The proposed approach was applied to analyze a reference set, and the results were evaluated using strict statistical parameters such as the correlation coefficient, the cross-validation correlation coefficient and the root mean squared error. Compared with several specialized prediction methods, the TM approach provided satisfactory results, especially for turns and unordered structures. Our study indicates that the TM approach can be regarded as a feasible tool for the analysis of the secondary structures of proteins based on CD spectra. A TMs package is provided and can be used directly for secondary structure prediction.

15.
SUMMARY: JaDis is a Java application for computing evolutionary distances between nucleic acid sequences and G+C base frequencies. It allows specific comparison of coding sequences, of non-coding sequences or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html

16.
17.
18.
A mathematical model for analyzing the secondary structures of RNA is developed that is based on the connection matrix associated with the planar p-h graph. The classification of the elementary structures allows the introduction of the basis of structural space from which to build the global secondary structure. All admissible solutions belong to the configuration space and can be obtained directly from its basis.

19.
20.
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well known metrics on RNA secondary structures. We also study some global and local properties of these metrics.
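
The abstract does not spell out how the mountain metrics are defined. One common construction, assumed here rather than taken from the text, maps each structure to its "mountain" height profile (the number of base pairs still open at each position) and measures the l_p distance between two profiles of equal length. The height convention at closing brackets and the function names are illustrative choices.

```python
def mountain_heights(dot_bracket):
    """Height profile: open-minus-closed bracket count after each position."""
    heights, level = [], 0
    for symbol in dot_bracket:
        if symbol == "(":
            level += 1
        elif symbol == ")":
            level -= 1
        heights.append(level)
    return heights

def mountain_distance(structure_a, structure_b, p=1):
    """l_p distance between the height profiles of two equal-length structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    profile_a = mountain_heights(structure_a)
    profile_b = mountain_heights(structure_b)
    total = sum(abs(x - y) ** p for x, y in zip(profile_a, profile_b))
    return total ** (1.0 / p)
```

Under this convention `(())` has profile [1, 2, 1, 0] and `()()` has profile [1, 0, 1, 0], so their distance is 2.0 for both p = 1 and p = 2.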
