Similar literature
 Found 20 similar documents; search took 382 ms
1.
Understanding the structural repertoire of RNA is crucial for RNA genomics research. Yet current methods for finding novel RNAs are limited to small or known RNA families. To expand known RNA structural motifs, we develop a two-dimensional graphical representation approach for describing and estimating the size of RNA’s secondary structural repertoire, including naturally occurring and other possible RNA motifs. We employ tree graphs to describe RNA tree motifs and more general (dual) graphs to describe both RNA tree and pseudoknot motifs. Our estimates of RNA’s structural space are vastly smaller than the nucleotide sequence space, suggesting a new avenue for finding novel RNAs. Specifically, our survey shows that known RNA trees and pseudoknots represent only a small subset of all possible motifs, implying that some of the ‘missing’ motifs may represent novel RNAs. To help pinpoint RNA-like motifs, we show that the motifs of existing functional RNAs are clustered in a narrow range of topological characteristics. We also illustrate the applications of our approach to the design of novel RNAs and automated comparison of RNA structures; we report several occurrences of RNA motifs within larger RNAs. Thus, our graph theory approach to RNA structures has implications for RNA genomics, structure analysis and design.

2.
Background: We re-evaluate our RNA-As-Graphs clustering approach, using our expanded graph library and new RNA structures, to identify potential RNA-like topologies for design. Our coarse-grained approach represents RNA secondary structures as tree and dual graphs, with vertices and edges corresponding to RNA helices and loops. The graph theoretical framework facilitates graph enumeration, partitioning, and clustering approaches to study RNA structure and its applications. Methods: Clustering graph topologies based on features derived from graph Laplacian matrices and known RNA structures allows us to classify topologies into ‘existing’ or hypothetical, and the latter into ‘RNA-like’ or ‘non RNA-like’ topologies. Here we update our list of existing tree graph topologies and RAG-3D database of atomic fragments to include newly determined RNA structures. We then use linear and quadratic regression, optionally with dimensionality reduction, to derive graph features and apply several clustering algorithms on our tree-graph library and recently expanded dual-graph library to classify them into the three groups. Results: The unsupervised PAM and K-means clustering approaches correctly classify 72–77% of all existing graph topologies and 75–82% of newly added ones as RNA-like. For supervised k-NN clustering, the cross-validation accuracy ranges from 57 to 81%. Conclusions: Using linear regression with unsupervised clustering, or quadratic regression with supervised clustering, provides better accuracies than supervised/linear clustering. All accuracies are better than random, especially for newly added existing topologies, thus lending credibility to our approach. General significance: Our updated RAG-3D database and motif classification by clustering present new RNA substructures and RNA-like motifs as novel design candidates.
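
The abstract above mentions features derived from graph Laplacian matrices without saying exactly which features are used. As a minimal sketch of the starting point only, the following builds the Laplacian L = D - A (degree matrix minus adjacency matrix) for a small tree graph. The function name `laplacian` and the two example graphs are illustrative assumptions, not taken from the paper.

```python
def laplacian(n_vertices, edges):
    """Return the graph Laplacian L = D - A as a list of lists.

    n_vertices: number of vertices, labelled 0..n_vertices-1
    edges: iterable of (u, v) pairs for an undirected graph
    """
    L = [[0] * n_vertices for _ in range(n_vertices)]
    for u, v in edges:
        L[u][u] += 1  # degree of u
        L[v][v] += 1  # degree of v
        L[u][v] -= 1  # minus adjacency entry
        L[v][u] -= 1
    return L


# Two hypothetical 4-vertex tree topologies: a star and a linear path.
star = laplacian(4, [(0, 1), (0, 2), (0, 3)])
path = laplacian(4, [(0, 1), (1, 2), (2, 3)])

for row in star:
    print(row)
```

Spectral features such as the eigenvalues of this matrix could then be computed with any numerical linear algebra library; which of them feed the regression and clustering steps is not specified in the abstract.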

3.
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, opening the usage of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by the RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.
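
The coarse-graining described above (loops as vertices, helices as edges) can be sketched for pseudoknot-free structures written in dot-bracket notation. This is a simplified illustration, not the RAG rule set: it assumes balanced parentheses, treats every loop (including the exterior region and single-nucleotide bulges) as a vertex, and the function names are hypothetical.

```python
def pair_table(db):
    """Map each paired position to its partner; unpaired positions map to -1."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pt[i], pt[j] = j, i
    return pt


def rna_tree_graph(db):
    """Return (number_of_vertices, edges) for a pseudoknot-free structure.

    Vertex 0 is the exterior region; each helix adds one edge leading to
    the loop it encloses.
    """
    pt = pair_table(db)
    edges = []
    n_vertices = 1
    work = [(0, 0, len(db) - 1)]  # (loop id, first index, last index)
    while work:
        loop, i, end = work.pop()
        while i <= end:
            if pt[i] > i:
                a, b = i, pt[i]
                # follow directly stacked base pairs to the end of the helix
                while a + 1 < b - 1 and pt[a + 1] == b - 1:
                    a, b = a + 1, b - 1
                child = n_vertices
                n_vertices += 1
                edges.append((loop, child))
                work.append((child, a + 1, b - 1))
                i = pt[i] + 1  # jump past the helix's closing strand
            else:
                i += 1
    return n_vertices, edges


print(rna_tree_graph("((..((...))..((...))..))"))
```

Under these assumptions a simple hairpin maps to a two-vertex tree, and a three-way junction maps to a four-vertex tree with one vertex of degree three.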

4.
Modular architecture is a hallmark of RNA structures, implying structural, and possibly functional, similarity among existing RNAs. To systematically delineate the existence of smaller topologies within larger structures, we develop and apply an efficient RNA secondary structure comparison algorithm using a newly developed two-dimensional RNA graphical representation. Our survey of similarity among 14 pseudoknots and subtopologies within ribosomal RNAs (rRNAs) uncovers eight pairs of structurally related pseudoknots with non-random sequence matches and reveals modular units in rRNAs. Significantly, three structurally related pseudoknot pairs have functional similarities not previously known: one pair involves the 3′ end of brome mosaic virus genomic RNA (PKB134) and the alternative hammerhead ribozyme pseudoknot (PKB173), both of which are replicase templates for viral RNA replication; the second pair involves structural elements for translation initiation and ribosome recruitment found in the viral internal ribosome entry site (PKB223) and the V4 domain of 18S rRNA (PKB205); the third pair involves 18S rRNA (PKB205) and viral tRNA-like pseudoknot (PKB134), which probably recruits ribosomes via structural mimicry and base complementarity. Additionally, we quantify the modularity of 16S and 23S rRNAs by showing that RNA motifs can be constructed from at least 210 building blocks. Interestingly, we find that the 5S rRNA and two tree modules within 16S and 23S rRNAs have similar topologies and tertiary shapes. These modules can be applied to design novel RNA motifs via build-up-like procedures for constructing sequences and folds.

5.
In vitro selection of functional RNAs from large random sequence pools has led to the identification of many ligand-binding and catalytic RNAs. However, the structural diversity in random pools is not well understood. Such an understanding is a prerequisite for designing sequence pools to increase the probability of finding complex functional RNA by in vitro selection techniques. Toward this goal, we have generated by computer five random pools of RNA sequences of length up to 100 nt to mimic experiments and characterized the distribution of associated secondary structural motifs using sets of possible RNA tree structures derived from graph theory techniques. Our results show that such random pools heavily favor simple topological structures: for example, linear stem-loop and low-branching motifs are favored rather than complex structures with high-order junctions, as confirmed by known aptamers. Moreover, we quantify the rise of structural complexity with sequence length and report the dominant class of tree motifs (characterized by vertex number) for each pool. These analyses show not only that random pools do not lead to a uniform distribution of possible RNA secondary topologies; they also point to avenues for designing pools with specific simple and complex structures in equal abundance, with the goal of broadening the range of functional RNAs discovered by in vitro selection. Specifically, the optimal RNA sequence pool length to identify a structure with x stems is 20x.

6.
RAG: RNA-As-Graphs database--concepts, analysis, and features   (Cited by: 3 total, 0 self-citations, 3 by others)
MOTIVATION: Understanding RNA's structural diversity is vital for identifying novel RNA structures and pursuing RNA genomics initiatives. By classifying RNA secondary motifs based on correlations between conserved RNA secondary structures and functional properties, we offer an avenue for predicting novel motifs. Although several RNA databases exist, no comprehensive schemes are available for cataloguing the range and diversity of RNA's structural repertoire. RESULTS: Our RNA-As-Graphs (RAG) database describes and ranks all mathematically possible (including existing and candidate) RNA secondary motifs on the basis of graphical enumeration techniques. We represent RNA secondary structures as two-dimensional graphs (networks), specifying the connectivity between RNA secondary structural elements, such as loops, bulges, stems and junctions. We archive RNA tree motifs as 'tree graphs' and other RNAs, including pseudoknots, as general 'dual graphs'. All RNA motifs are catalogued by graph vertex number (a measure of sequence length) and ranked by topological complexity. The RAG inventory immediately suggests candidates for novel RNA motifs, either naturally occurring or synthetic, and thereby might stimulate the prediction and design of novel RNA motifs. AVAILABILITY: The database is accessible on the web at http://monod.biomath.nyu.edu/rna

7.
RNA tertiary motifs play an important role in RNA folding and biochemical functions. To help interpret the complex organization of RNA tertiary interactions, we comprehensively analyze a data set of 54 high-resolution RNA crystal structures for motif occurrence and correlations. Specifically, we search seven recognized categories of RNA tertiary motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin, tRNA D-loop/T-loop, and tetraloop-tetraloop receptor) by various computer programs. For the nonredundant RNA data set, we find 613 RNA tertiary interactions, most of which occur in the 16S and 23S rRNAs. An analysis of these motifs reveals the diversity and variety of A-minor motif interactions and the various possible loop-loop receptor interactions that expand upon the tetraloop-tetraloop receptor. Correlations between motifs, such as pseudoknot or coaxial helix with A-minor, reveal higher-order patterns. These findings may ultimately help define tertiary structure restraints for RNA tertiary structure prediction. A complete annotation of the RNA diagrams for our data set is available at http://www.biomath.nyu.edu/motifs/.

8.
We present a novel topological classification of RNA secondary structures with pseudoknots. It is based on the topological genus of the circular diagram associated with the RNA base-pair structure. The genus is a non-negative integer whose value quantifies the topological complexity of the folded RNA structure. In such a representation, planar diagrams correspond to pure RNA secondary structures and have zero genus, whereas non-planar diagrams correspond to pseudoknotted structures and have higher genus. The topological genus allows for the definition of topological folding motifs, similar in spirit to those introduced and commonly used in protein folding. We analyze real RNA structures from the databases Worldwide Protein Data Bank and Pseudobase and classify them according to their topological genus. For simplicity, we limit our analysis by considering only Watson-Crick complementary base pairs and G-U wobble base pairs. We compare the results of our statistical survey with existing theoretical and numerical models. We also discuss possible applications of this classification and show how it can be used for identifying new RNA structural motifs.
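
As a rough illustration of the genus idea, the sketch below computes the genus of a base-pair diagram given in extended dot-bracket notation. It is not the authors' code. It assumes a single strand and balanced brackets of the types `()`, `[]` and `{}`, collapses the backbone to a single vertex, counts boundary cycles F of the resulting chord diagram, and applies Euler's formula V - E + F = 2 - 2g with V = 1 and E = the number of base pairs n, so that g = (1 + n - F) / 2. The function names are hypothetical.

```python
def parse_pairs(db):
    """Extract base pairs (i, j) from dot-bracket text with (), [] and {}."""
    closers = {")": "(", "]": "[", "}": "{"}
    stacks = {"(": [], "[": [], "{": []}
    pairs = []
    for i, c in enumerate(db):
        if c in stacks:
            stacks[c].append(i)
        elif c in closers:
            pairs.append((stacks[closers[c]].pop(), i))
    return pairs


def genus(db):
    """Topological genus of the chord diagram defined by the base pairs."""
    pairs = parse_pairs(db)
    n = len(pairs)
    if n == 0:
        return 0
    # Unpaired bases do not change the genus: relabel paired positions
    # 0..2n-1 in backbone order.
    positions = sorted(p for pair in pairs for p in pair)
    index = {p: k for k, p in enumerate(positions)}
    partner = {}
    for i, j in pairs:
        partner[index[i]] = index[j]
        partner[index[j]] = index[i]
    m = 2 * n
    seen, faces = set(), 0
    for start in range(m):
        if start in seen:
            continue
        faces += 1
        h = start
        while h not in seen:
            seen.add(h)
            # cross the chord, then step to the next chord end on the backbone
            h = (partner[h] + 1) % m
    return (1 + n - faces) // 2


print(genus("((...))"))         # nested hairpin
print(genus("((..[[..))..]]"))  # H-type pseudoknot
```

With this construction a nested (planar) structure has genus 0, and both the H-type pseudoknot and the kissing-hairpin pattern `([)(])` have genus 1.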

9.
The various motifs of RNA molecules are closely related to their structural and functional properties. To better understand the nature and distributions of such structural motifs (i.e., paired and unpaired bases in stems, junctions, hairpin loops, bulges, and internal loops) and uncover characteristic features, we analyze the large 16S and 23S ribosomal RNAs of Escherichia coli. We find that the paired and unpaired bases in structural motifs have characteristic distribution shapes and ranges; for example, the frequency distribution of paired bases in stems declines linearly with the number of bases, whereas that for unpaired bases in junctions has a pronounced peak. Significantly, our survey reveals that the ratio of total (over the entire molecule) unpaired to paired bases (0.75) and the fraction of bases in stems (0.6), junctions (0.16), hairpin loops (0.12), and bulges/internal loops (0.12) are shared by 16S and 23S ribosomal RNAs, suggesting that natural RNAs may maintain certain proportions of bases in various motifs to ensure structural integrity. These findings may help in the design of novel RNAs and in the search (via constraints) for RNA-coding motifs in genomes, problems of intense current focus.

10.
11.
MOTIVATION: Searching genomes for non-coding RNAs (ncRNAs) by their secondary structure has become an important goal for bioinformatics. For pseudoknot-free structures, ncRNA search can be effective based on the covariance model and CYK-type dynamic programming. However, the computational difficulty in aligning an RNA sequence to a pseudoknot has prohibited fast and accurate search of arbitrary RNA structures. Our previous work introduced a graph model for RNA pseudoknots and proposed to solve the structure-sequence alignment by graph optimization. Given k candidate regions in the target sequence for each of the n stems in the structure, we could compute a best alignment in time O(k^t n) based upon a tree width t decomposition of the structure graph. However, to implement this method in programs that can routinely perform fast yet accurate RNA pseudoknot searches, we need novel heuristics to ensure that, without degrading the accuracy, only a small number of stem candidates need to be examined and a tree decomposition of a small tree width can always be found for the structure graph. RESULTS: The current work builds on the previous one with newly developed preprocessing algorithms to reduce the values for parameters k and t and to implement the search method into a practical program, called RNATOPS, for RNA pseudoknot search. In particular, we introduce techniques, based on probabilistic profiling and distance penalty functions, which can identify for every stem just a small number k (e.g. k

12.
13.
Mining frequent stem patterns from unaligned RNA sequences   (Cited by: 1 total, 0 self-citations, 1 by others)
MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, both in accuracy and in efficiency, with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.

14.
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. This method is based on comparison of stems (palindromic helices) between sequences and is implemented by applying graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences by some defined features to find stems conserved across any two sequences. Then by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. This method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on some RNA sequences with known secondary structures, in which it is capable of detecting the real structures completely or partially correctly and outperforms other existing programs for similar purposes. AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html
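
The first step described above, enumerating candidate stems in a single sequence, can be sketched as follows. This is an illustrative stand-in rather than the comRNA implementation (which is written in C++): it uses a minimum stem length in place of a thermodynamic stability criterion, allows Watson-Crick and G-U pairs, and the function name and parameters `min_len` and `min_loop` are assumptions. The pairwise comparison and maximum clique steps are not shown.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=3, min_loop=3):
    """List maximal stems as (i, j, length).

    A stem pairs seq[i + k] with seq[j - k] for k = 0..length-1 and leaves
    at least min_loop unpaired bases between its two strands.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # skip (i, j) if the stem could be extended outward: it is
            # already covered by the stem starting at (i - 1, j + 1)
            if i > 0 and j + 1 < n and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            k = 0
            while i + k < j - k - min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


print(find_stems("GGGAAAACCC"))
```

For the toy sequence `GGGAAAACCC` the only stem of length 3 or more pairs the three leading G's with the three trailing C's around an AAAA loop.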

15.
Evolution of secondary structure in the family of 7SL-like RNAs   (Cited by: 8 total, 0 self-citations, 8 by others)
Primate and rodent genomes are populated with hundreds of thousands of copies of Alu and B1 elements dispersed by retroposition, i.e., by genomic reintegration of their reverse transcribed RNAs. These, as well as primate BC200 and rodent 4.5S RNAs, are ancestrally related to the terminal portions of 7SL RNA sequence. The secondary structure of 7SL RNA (an integral component of the signal recognition particle) is conserved from prokaryotes to distant eukaryotic species. Yet only in primates and rodents did this molecule give rise to retroposing Alu and B1 RNAs and to apparently functional BC200 and 4.5S RNAs. To understand this transition and the underlying molecular events, we examined, by comparative analysis, the evolution of RNA structure in this family of molecules derived from 7SL RNA. RNA sequences of different simian (mostly human) and prosimian Alu subfamilies as well as rodent B1 repeats were derived from their genomic consensus sequences taken from the literature and our unpublished results (prosimian and New World Monkey). RNA secondary structures were determined by enzymatic studies (new data on 4.5S RNA are presented) and/or energy minimization analyses followed by phylogenetic comparison. Although, with the exception of 4.5S RNA, all 7SL-derived RNA species maintain the cruciform structure of their progenitor, the details of 7SL RNA folding domains are modified to a different extent in various RNA groups. Novel motifs found in retropositionally active RNAs are conserved among Alu and B1 subfamilies in different genomes. In RNAs that do not proliferate by retroposition these motifs are modified further. This indicates structural adaptation of 7SL-like RNA molecules to novel functions, presumably mediated by specific interactions with proteins; these functions were either useful for the host or served the selfish propagation of RNA templates within the host genome. Abbreviations: FAM, fossil Alu element; FLAM, free left Alu monomer; FRAM, free right Alu monomer; L-Alu, left Alu subunit; R-Alu, right Alu subunit. Correspondence to: D. Labuda. Dedicated to Dr. Robert Cedergren on the occasion of his 25th anniversary at the University of Montreal.

16.
Recent structural and functional characterization of the pseudoknot in the Saccharomyces cerevisiae telomerase RNA (TLC1) has demonstrated that tertiary structure is present, similar to that previously described for the human and Kluyveromyces lactis telomerase RNAs. In order to biophysically characterize the identified pseudoknot secondary and tertiary structures, UV-monitored thermal denaturation experiments, nuclear magnetic resonance spectroscopy, and native gel electrophoresis were used to investigate various potential conformations in the pseudoknot domain in vitro, in the absence of the telomerase protein. Here, we demonstrate that alternative secondary structures are not mutually exclusive in the S. cerevisiae telomerase RNA and that tertiary structure contributes 1.5 kcal/mol to the stability of the pseudoknot (≈ half the stability observed for the human telomerase pseudoknot), and we identify additional base pairs in the 3' pseudoknot stem near the helical junction. In addition, sequence conservation in an adjacent overlapping hairpin appears to prevent dimerization and alternative conformations in the context of the entire pseudoknot-containing region. Thus, this work provides a detailed in vitro characterization of the thermodynamic features of the S. cerevisiae TLC1 pseudoknot region for comparison with other telomerase RNA pseudoknots.

17.
A puzzling aspect of replication of bacteriophage Qbeta RNA has always been that replicase binds at an internal segment, the M-site, some 1450 nt away from the 3' end. Here, we report on the existence of a long-range pseudoknot, base-pairing eight nt in the loop of the 3' terminal hairpin to a single-stranded interdomain sequence located about 1200 nt upstream, close to the internal replicase binding site. Introduction of a single mismatch into this pseudoknot is sufficient to abolish replication, but the inhibition is fully reversed by a second-site substitution that restores the pairing. The pseudoknot is part of an elaborate structure that seems to hold the 3' end in a fixed position vis-à-vis the replicase binding site. Our results imply that the shape of the RNA confers the functionality. We discuss the possible relevance of our findings for replication of other viral RNAs.

18.
Hu YJ. Nucleic Acids Research, 2002, 30(17): 3886-3893
Given a set of homologous or functionally related RNA sequences, the consensus motifs may represent the binding sites of RNA regulatory proteins. Unlike DNA motifs, RNA motifs are more conserved in structures than in sequences. Knowing the structural motifs can help us gain a deeper insight into the regulation activities. There have been various studies of RNA secondary structure prediction, but most of them are not focused on finding motifs from sets of functionally related sequences. Although recent research shows some new approaches to RNA motif finding, they are limited to finding relatively simple structures, e.g. stem-loops. In this paper, we propose a novel genetic programming approach to RNA secondary structure prediction. It is capable of finding more complex structures than stem-loops. To demonstrate the performance of our new approach as well as to keep the consistency of our comparative study, we first tested it on the same data sets previously used to verify the current prediction systems. To show the flexibility of our new approach, we also tested it on a data set that contains pseudoknot motifs which most current systems cannot identify. A web-based user interface of the prediction system is set up at http://bioinfo.cis.nctu.edu.tw/service/gprm/.

19.
Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
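
To make the "mixing matrix" idea concrete, here is a minimal sketch of applying a 4x4 matrix of nucleotide transition probabilities to a starting sequence to sample a nonrandom pool. The matrix values, the starting sequence, the function name `generate_pool` and the `seed` parameter are all illustrative assumptions; the five matrix classes defined in the paper are not reproduced here.

```python
import random

BASES = "ACGU"


def generate_pool(start, matrix, size, seed=0):
    """Sample `size` sequences by mutating each base of `start` independently.

    matrix[b] lists the probabilities that base b becomes A, C, G or U
    (in that order), i.e. one row of the mixing matrix.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sketches
    pool = []
    for _ in range(size):
        seq = "".join(rng.choices(BASES, weights=matrix[b])[0] for b in start)
        pool.append(seq)
    return pool


# Hypothetical matrix: each base is kept with probability 0.9 and replaced
# by each of the other three bases with probability 0.1 / 3.
mostly_conserved = {
    b: [0.9 if c == b else 0.1 / 3 for c in BASES] for b in BASES
}

pool = generate_pool("GGGAAAACCC", mostly_conserved, size=5)
for seq in pool:
    print(seq)
```

An identity matrix reproduces the starting sequence exactly, while a uniform matrix (every entry 0.25) gives a fully random pool; designed pools sit between these two extremes.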

20.