Similar literature
20 similar articles found (search time: 0 ms)
1.
Combinatorics of RNA Structures with Pseudoknots   (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we derive the generating function of RNA structures with pseudoknots. We enumerate all k-noncrossing RNA pseudoknot structures categorized by their maximal sets of mutually intersecting arcs. In addition, we enumerate pseudoknot structures over circular RNA. For 3-noncrossing RNA structures and RNA secondary structures we present a novel 4-term recursion formula and a 2-term recursion, respectively. Furthermore, we enumerate for arbitrary k all k-noncrossing, restricted RNA structures, i.e. k-noncrossing RNA structures without 2-arcs, which are arcs of the form (i,i+2) for 1≤i≤n−2.
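
As a point of comparison for the recursions mentioned in this abstract, the number of RNA secondary structures (noncrossing structures with no arc of the form (i,i+1)) can be counted with the classical convolution recursion obtained by conditioning on the last nucleotide. This is only a sketch: it is not the 2-term recursion derived in the paper, and the function name is illustrative.

```python
def count_secondary_structures(n_max):
    """Return [S(0), ..., S(n_max)], where S(n) counts noncrossing
    structures on n nucleotides with no arc of the form (i, i+1)."""
    s = [1] * (n_max + 1)  # S(0) = S(1) = S(2) = 1: only the empty structure
    for n in range(3, n_max + 1):
        total = s[n - 1]  # case 1: nucleotide n is unpaired
        # case 2: nucleotide n pairs with j, 1 <= j <= n-2;
        # j-1 nucleotides lie to the left, n-1-j lie inside the arc
        for j in range(1, n - 1):
            total += s[j - 1] * s[n - 1 - j]
        s[n] = total
    return s


if __name__ == "__main__":
    print(count_secondary_structures(10))
```

The quadratic-time convolution is chosen for readability; a short linear recursion such as the one announced in the abstract would be faster for large n.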

2.
In this paper we study canonical RNA pseudoknot structures. We prove central limit theorems for the distributions of the arc-numbers of k-noncrossing RNA structures with given minimum stack-size τ over n nucleotides. Furthermore we compare the space of all canonical structures with canonical minimum free energy pseudoknot structures. Our results generalize the analysis of Schuster et al. obtained for RNA secondary structures [Hofacker, I.L., Schuster, P., Stadler, P.F., 1998. Combinatorics of RNA secondary structures. Discrete Appl. Math. 88, 207–237; Jin, E.Y., Reidys, C.M., 2007b. Central and local limit theorems for RNA structures. J. Theor. Biol. 250 (2008), 547–559; 2007a. Asymptotic enumeration of RNA structures with pseudoknots. Bull. Math. Biol., 70 (4), 951–970] to k-noncrossing RNA structures. Here k≥2 and τ are arbitrary natural numbers. We compare canonical pseudoknot structures to arbitrary structures and show that canonical pseudoknot structures exhibit significantly smaller exponential growth rates. We then compute the asymptotic distribution of their arc-numbers. Finally, we analyze how the minimum stack-size and crossing number factor into the distributions.

3.
In this paper we study the distribution of stacks/loops in k-non-crossing, τ-canonical RNA pseudoknot structures (〈k,τ〉-structures). Here, an RNA structure is called k-non-crossing if it has no more than k-1 mutually crossing arcs and τ-canonical if each arc is contained in a stack of length at least τ. Based on the ordinary generating function of 〈k,τ〉-structures [G. Ma, C.M. Reidys, Canonical RNA pseudoknot structures, J. Comput. Biol. 15 (10) (2008) 1257] we derive the bivariate generating function $T_{k,\tau}(x,u)=\sum_{n\ge 0}\sum_{t\ge 0}T_{k,\tau}(n,t)\,u^t x^n$, where $T_{k,\tau}(n,t)$ is the number of 〈k,τ〉-structures having exactly t stacks, and study its singularities. We show that for a specific parametrization of the variable u, $T_{k,\tau}(x,u)$ exhibits a unique, dominant singularity. The particular shift of this singularity parametrized by u implies a central limit theorem for the distribution of stack-numbers. Our results are of importance for understanding the ‘language’ of minimum-free energy RNA pseudoknot structures, generated by computer folding algorithms.
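
The τ-canonical condition defined in this abstract can be checked directly on a list of arcs. The sketch below assumes arcs are given as pairs (i, j) with i < j and takes a stack to be a maximal run of arcs (i, j), (i+1, j−1), (i+2, j−2), ...; the function names are illustrative and do not come from the paper.

```python
def stack_lengths(arcs):
    """Return the lengths of all stacks in a collection of arcs (i, j), i < j.
    A stack is a maximal run (i, j), (i+1, j-1), (i+2, j-2), ..."""
    arc_set = set(arcs)
    lengths = []
    for (i, j) in arc_set:
        if (i - 1, j + 1) in arc_set:
            continue  # not the outermost arc of its stack
        length = 1
        while (i + length, j - length) in arc_set:
            length += 1
        lengths.append(length)
    return lengths


def is_tau_canonical(arcs, tau):
    """True if every arc lies in a stack of length at least tau."""
    return all(length >= tau for length in stack_lengths(arcs))


if __name__ == "__main__":
    # Two crossing stacks of length 2 each (a simple pseudoknot).
    example = [(1, 10), (2, 9), (5, 14), (6, 13)]
    print(sorted(stack_lengths(example)), is_tau_canonical(example, 2))
```

The number of stacks t counted by the bivariate generating function corresponds here to `len(stack_lengths(arcs))`.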

4.
In this paper, we present the asymptotic enumeration of RNA structures with pseudoknots. We develop a general framework for the computation of the exponential growth rate and the asymptotic expansion for the numbers of k-noncrossing RNA structures. Our results are based on the generating function for the number of k-noncrossing RNA pseudoknot structures, derived in Bull. Math. Biol. (2008), where k−1 denotes the maximal size of sets of mutually intersecting bonds. We prove a functional equation for the generating function and obtain, for k=2 and k=3, the analytic continuation and singular expansions, respectively. It is implicit in our results that singular expansions exist for arbitrary k, and via transfer theorems of analytic combinatorics we obtain asymptotic expressions for the coefficients. We explicitly derive the asymptotic expressions for 2- and 3-noncrossing RNA structures. Our main result is the derivation of the formula .

5.
In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is $1.34685 \cdot n^{-3/2} \cdot 1.62178^n$, and the asymptotic expected number of hairpins follows a normal distribution with mean $0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}$. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is $0.123194 \cdot n$, a value greater than the asymptotic expected number $0.105573 \cdot n$ of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence $a_1,\ldots,a_n$, for each integer k, we compute that secondary structure $S_k$ having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins. Mathematica computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.
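
For readers unfamiliar with the Nussinov energy model mentioned in this abstract, the sketch below shows the standard dynamic program that maximizes the number of base pairs subject to a minimum hairpin size θ. It is not the hairpin-profile algorithm of the paper (which additionally tracks the number of hairpins); the function name and the choice of Watson-Crick plus G-U pairs are assumptions made for illustration.

```python
def nussinov_max_pairs(seq, theta=3):
    """Maximum number of base pairs over all secondary structures of seq
    in which every hairpin loop contains at least theta unpaired bases."""
    allowed = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed pairing rules
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = optimum for the subsequence seq[i..j]; 0 when j - i <= theta
    best = [[0] * n for _ in range(n)]
    for span in range(theta + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # position j is unpaired
            for k in range(i, j - theta):  # j pairs with k, where j - k > theta
                if seq[k] + seq[j] in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    value = max(value, left + 1 + inner)
            best[i][j] = value
    return best[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAAUCC", theta=3))
```

Since each base pair contributes energy −1 in the Nussinov model, maximizing pairs is the same as minimizing energy; the cubic running time is the usual cost of this recursion.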

6.

Background  

RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and G-U-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, RNAinverse, RNA-SSD as well as INFO-RNA are limited to RNA secondary structures, we present in this paper the inverse folding algorithm Inv which can deal with 3-noncrossing, canonical pseudoknot structures.

7.
In this paper, we study irreducibility in RNA structures. By RNA structure, we mean RNA secondary as well as RNA pseudoknot structures as abstract contact structures. We give an analysis contrasting random and minimum free energy (mfe) configurations and secondary versus pseudoknot structures. In the process, we compute various distributions: the numbers of irreducible substructures and their locations and sizes, parameterized in terms of the maximal number of mutually crossing arcs, k−1, and the minimal size of stacks σ. In particular, we analyze the size of the largest irreducible substructure for random and mfe structures, which is the key factor for the folding time of mfe configurations. We show that the largest irreducible substructure is typically unique and contains almost all nucleotides.

8.
In this paper, we analyze the length spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length spectrum. We prove that there asymptotically almost surely exists a unique longest rainbow of length at least \(n-O(n^{1/2})\) and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite k, the distribution of rainbows of length k becomes, for large n, a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relate our findings to the sparsification of folding algorithms.
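
The definition of a rainbow given in this abstract (an arc not nested inside any other arc) can be illustrated on a dot-bracket string. This is a minimal sketch, assuming the structure is written in balanced dot-bracket notation with 0-based positions; the function name is hypothetical.

```python
def rainbows(dot_bracket):
    """Return the arcs (i, j), 0-based, that are maximal with respect to nesting,
    i.e. arcs that are not enclosed by any other arc."""
    result = []
    open_positions = []  # stack of currently open '(' positions
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            open_positions.append(pos)
        elif ch == ")":
            start = open_positions.pop()
            if not open_positions:  # no enclosing arc is still open
                result.append((start, pos))
    return result


if __name__ == "__main__":
    print(rainbows("((..))..(...)"))
```

A single left-to-right pass with a stack suffices because noncrossing arcs close in last-in, first-out order.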

9.
This paper studies local connectivity of neutral networks of RNA secondary and pseudoknot structures. A neutral network denotes the set of RNA sequences that fold into a particular structure. It is called locally connected, if in the limit of long sequences, the distance of any two of its sequences scales with their distance in the n-cube. One main result of this paper is that is the threshold probability for local connectivity for neutral networks, considered as random subgraphs of n-cubes. Furthermore, we analyze local connectivity for finite sequence length and different alphabets. We show that it is closely related to the existence of specific paths within the neutral network. We put our theoretical results into context with folding algorithms into minimum-free energy RNA secondary and pseudoknot structures. Finally, we relate our structural findings with dynamics by discussing the role of local connectivity in the context of neutral evolution.

10.
11.
Let ${\mathcal {S}}$ denote the set of (possibly noncanonical) base pairs {i, j} of an RNA tertiary structure; i.e. ${\{i, j\} \in \mathcal {S}}$ if there is a hydrogen bond between the ith and jth nucleotide. The page number of ${\mathcal {S}}$, denoted ${\pi(\mathcal {S})}$, is the minimum number k such that ${\mathcal {S}}$ can be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of page number, using constraint programming, and determine the page number of a collection of RNA tertiary structures, for which the topological genus is known. We describe an approximation algorithm from which it follows that ${\omega(\mathcal {S}) \leq \pi(\mathcal {S}) \leq \omega(\mathcal {S}) \cdot \log n}$, where the clique number of ${\mathcal {S}}$, denoted ${\omega(\mathcal {S})}$, is the maximum number of base pairs that pairwise cross each other.
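
To make the notion of a page decomposition concrete, the sketch below greedily assigns base pairs to pages so that no two pairs on the same page cross. It yields only an upper bound on the page number, not the exact value (which the abstract states is NP-complete to compute), and it is not the constraint-programming or approximation method of the paper. It assumes each nucleotide occurs in at most one pair; all names are illustrative.

```python
def crosses(a, b):
    """True if base pairs a = (i, j) and b = (k, l) cross: i < k < j < l."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def greedy_pages(pairs):
    """Partition pairs into pages with no crossing inside a page (first fit).
    The number of pages returned is an upper bound on the page number."""
    pages = []
    for pair in sorted(pairs):
        for page in pages:
            if not any(crosses(pair, other) for other in page):
                page.append(pair)
                break
        else:
            pages.append([pair])  # no existing page accepts this pair
    return pages


if __name__ == "__main__":
    print(greedy_pages([(1, 10), (2, 9), (3, 12)]))
```

Any set of pairwise crossing pairs forces one page per pair, which is the lower bound ω(S) ≤ π(S) quoted in the abstract.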

12.
Recent structural and functional characterization of the pseudoknot in the Saccharomyces cerevisiae telomerase RNA (TLC1) has demonstrated that tertiary structure is present, similar to that previously described for the human and Kluyveromyces lactis telomerase RNAs. In order to biophysically characterize the identified pseudoknot secondary and tertiary structures, UV-monitored thermal denaturation experiments, nuclear magnetic resonance spectroscopy, and native gel electrophoresis were used to investigate various potential conformations in the pseudoknot domain in vitro, in the absence of the telomerase protein. Here, we demonstrate that alternative secondary structures are not mutually exclusive in the S. cerevisiae telomerase RNA and that tertiary structure contributes 1.5 kcal mol⁻¹ to the stability of the pseudoknot (≈ half the stability observed for the human telomerase pseudoknot), and we identify additional base pairs in the 3' pseudoknot stem near the helical junction. In addition, sequence conservation in an adjacent overlapping hairpin appears to prevent dimerization and alternative conformations in the context of the entire pseudoknot-containing region. Thus, this work provides a detailed in vitro characterization of the thermodynamic features of the S. cerevisiae TLC1 pseudoknot region for comparison with other telomerase RNA pseudoknots.

13.
The −1 ribosomal frameshifting requires the existence of an in cis RNA slippery sequence and is promoted by a downstream stimulator RNA. An atypical RNA pseudoknot with an extra stem formed by complementary sequences within loop 2 of an H-type pseudoknot is characterized in the severe acute respiratory syndrome coronavirus (SARS CoV) genome. This pseudoknot can serve as an efficient stimulator for −1 frameshifting in vitro. Mutational analysis of the extra stem suggests frameshift efficiency can be modulated via manipulation of the secondary structure within the loop 2 of an infectious bronchitis virus-type pseudoknot. More importantly, an upstream RNA sequence separated by a linker 5′ to the slippery site is also identified to be capable of modulating the −1 frameshift efficiency. RNA sequence containing this attenuation element can downregulate −1 frameshifting promoted by an atypical pseudoknot of SARS CoV and two other pseudoknot stimulators. Furthermore, frameshift efficiency can be reduced to half in the presence of the attenuation signal in vivo. Therefore, this in cis RNA attenuator represents a novel negative determinant of general importance for the regulation of −1 frameshift efficiency, and is thus a potential antiviral target.

14.
The phylogenetically-derived secondary structures of telomerase RNAs (TR) from ciliates, yeasts and vertebrates are surprisingly conserved and contain a pseudoknot domain at a similar location downstream of the template. As the pseudoknot domains of Tetrahymena TR (tTR) and human TR (hTR) mediate certain similar functions, we hypothesized that they might be functionally interchangeable. We constructed a chimeric TR (htTR) by exchanging the hTR pseudoknot sequences for the tTR pseudoknot region. The chimeric RNA reconstituted human telomerase activity when coexpressed with hTERT in vitro, but exhibited defects in repeat addition processivity and levels of DNA synthesis compared to hTR. Activity was dependent on tTR sequences within the chimeric RNA. htTR interacted with hTERT in vitro and dimerized predominantly via a region of its hTR backbone, the J7b/8a loop. Introduction of htTR in telomerase-negative cells stably expressing hTERT did not reconstitute an active enzyme able to elongate telomeres. Thus, our results indicate that the chimeric RNA reconstituted a weakly active nonprocessive human telomerase enzyme in vitro that was defective in telomere elongation in vivo. This suggests that there may be species-specific requirements for pseudoknot functions.

15.
16.
A statistical approach has been applied to analyse primary structure patterns at inner positions of α-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse α-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the α-helix were considered as inner. Amino acid pairings i, i+k (k=1, 2, 3, 4, 5) were analysed and the corresponding 20×20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for α-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should help to obtain and refine predictive rules that can further the development and fine-tuning of protein structure prediction algorithms and tools.
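
The first step of such a survey, counting i, i+k pairings at inner helix positions, might be sketched as follows. The abstract does not say how the relative global propensities are normalized, so this example stops at raw counts. The reading of "inner" as dropping four residues from each end, the parameter `margin`, and the function name are all assumptions.

```python
from collections import Counter


def count_pairings(helices, k, margin=4):
    """Count ordered amino acid pairs (residue at i, residue at i+k) over the
    inner positions of each helix sequence. `margin` residues are trimmed from
    both termini (assumed reading of 'more than three positions apart')."""
    counts = Counter()
    for helix in helices:
        inner = helix[margin:len(helix) - margin]
        for first, second in zip(inner, inner[k:]):
            counts[(first, second)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical helix sequence, used only to exercise the function.
    print(count_pairings(["XXXXAEAALKAXXXX"], k=4))
```

Dividing each count by the value expected from single-residue frequencies would give one possible propensity measure, but that choice is not taken from the source.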

17.
Sequences from the 5′ region of R2 retrotransposons of four species of silk moth are reported. In Bombyx mori, this region of the R2 messenger RNA contains a binding site for R2 protein and mediates interactions critical to R2 element insertion into the host genome. A model of secondary structure for a segment of this RNA is proposed on the basis of binding to oligonucleotide microarrays, chemical mapping, and comparative sequence analysis. Five conserved secondary structures are identified, including a novel pseudoknot. There is an apparent transition from an entirely RNA structure coding function in most of the 5′ segment to a protein coding function near the 3′ end. This suggests that local regions evolved under separate functional constraints (structural, coding, or both).

18.
The problem of finding k-edge-connected components is a fundamental problem in computer science. Given a graph G = (V, E), the problem is to partition the vertex set V into $\{V_1, V_2, \ldots, V_h\}$, where each $V_i$ is maximized, such that for any two vertices x and y in $V_i$, there are k edge-disjoint paths connecting them. In this paper, we present an algorithm to solve this problem for all k. The algorithm preprocesses the input graph to construct an Auxiliary Graph to store information concerning edge-connectivity among every vertex pair in O(Fn) time, where F is the time complexity to find the maximum flow between two vertices in graph G and n = ∣V∣. For any value of k, the k-edge-connected components can then be determined by traversing the auxiliary graph in O(n) time. The input graph can be directed or undirected, and a simple graph or a multigraph. Previous works on this problem mainly focus on a fixed value of k.
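
The problem statement in this abstract can be illustrated with a naive baseline for undirected graphs: compute the number of edge-disjoint paths between two vertices by unit-capacity augmenting paths, then group vertices whose pairwise value is at least k. This is not the auxiliary-graph algorithm of the paper and is far slower. It relies on the fact that, in undirected graphs, having at least k edge-disjoint paths is a transitive relation. All names are illustrative.

```python
from collections import defaultdict, deque


def local_edge_connectivity(edges, s, t):
    """Maximum number of edge-disjoint s-t paths in an undirected multigraph."""
    if s == t:
        raise ValueError("s and t must be distinct")
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1  # each undirected edge gives capacity 1 both ways
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}  # breadth-first search in the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def k_edge_connected_components(vertices, edges, k):
    """Group vertices so that any two in a group have >= k edge-disjoint paths."""
    components = []
    for v in vertices:
        for comp in components:
            if local_edge_connectivity(edges, comp[0], v) >= k:
                comp.append(v)
                break
        else:
            components.append([v])
    return components


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (3, 4).
    e = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
    print(k_edge_connected_components([1, 2, 3, 4, 5, 6], e, 2))
```

Because of transitivity, comparing each vertex with one representative per group is enough; the baseline still needs a max-flow computation per comparison, which is the cost the paper's preprocessing is designed to avoid.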

19.
Within this paper we investigate the Bernoulli model for random secondary structures of ribonucleic acid (RNA) molecules. Assuming that two random bases can form a hydrogen bond with probability p, we prove asymptotic equivalents for the averaged number of hairpins and bulges, the averaged loop length, the expected order, the expected number of secondary structures of size n and order k, and further parameters, all depending on p. In this way we get an insight into the change of shape of a random structure during the process . Afterwards we compare the computed parameters for random structures in the Bernoulli model to the corresponding quantities for actual secondary structures of large subunit rRNA molecules found in the database of Wuyts et al. This makes it possible to identify those parameters which behave (almost) randomly and those which do not and thus should be considered as interesting, e.g., with respect to the biological functions or the algorithmic prediction of RNA secondary structures.

20.
We describe the first dynamic programming algorithm that computes the expected degree for the network, or graph G = (V, E), of all secondary structures of a given RNA sequence $a = a_1, \ldots, a_n$. Here, the nodes V correspond to all secondary structures of a, while an edge exists between nodes s, t if the secondary structure t can be obtained from s by adding, removing or shifting a base pair. Since secondary structure kinetics programs implement the Gillespie algorithm, which simulates a random walk on the network of secondary structures, the expected network degree may provide a better understanding of kinetics of RNA folding when allowing defect diffusion, helix zippering, and related conformation transformations. We determine the correlation between expected network degree, contact order, conformational entropy, and expected number of native contacts for a benchmarking dataset of RNAs. Source code is available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors.
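
The degree of a single node in this network can be illustrated by direct enumeration. The sketch below counts only the neighbors reached by adding or removing one base pair; shift moves are omitted, and no averaging over all structures is attempted, so it is not the dynamic program of the paper. The pairing rules, the minimum hairpin size `theta`, and the function name are assumptions.

```python
def degree_add_remove(seq, pairs, theta=3):
    """Number of secondary structures reachable from `pairs` (0-based (i, j),
    i < j) by adding or removing exactly one base pair."""
    allowed = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed pairing rules
    paired = {x for p in pairs for x in p}
    degree = len(pairs)  # every existing base pair can be removed
    n = len(seq)
    for i in range(n):
        for j in range(i + theta + 1, n):  # at least theta bases between i and j
            if i in paired or j in paired:
                continue  # would create a base triple
            if seq[i] + seq[j] not in allowed:
                continue
            if any(k < i < l < j or i < k < j < l for k, l in pairs):
                continue  # would create a pseudoknot (crossing)
            degree += 1
    return degree


if __name__ == "__main__":
    print(degree_add_remove("GGAAACC", [(0, 6)]))
```

Enumerating every structure and averaging these degrees is exponential in sequence length, which is why a dynamic program such as the one described in the abstract is needed for the expected degree.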
