Similar articles
 20 similar articles found (search time: 15 ms)
1.
A new approach is proposed for determining common RNA secondary structures within a set of homologous RNAs. The approach combines phylogenetic and thermodynamic methods and is based on the prediction of optimal and suboptimal secondary structures, topological similarity searches and phylogenetic comparative analysis. The optimal and suboptimal RNA secondary structures are predicted by energy minimization. Structural comparison of the predicted RNA secondary structures is used to find conserved structures that are topologically similar in all these homologous RNAs. The validity of the conserved structural elements found is then checked by phylogenetic comparison of the sequences. This procedure is used to predict common structures of ribonuclease P (RNase P) RNAs.

2.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is reduced to O(M^3 N^3), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
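
The accuracy figures quoted in this abstract are percentages of known base-pairs recovered by a prediction. A minimal sketch of how such a figure could be computed is given here, assuming both structures are written in dot-bracket notation without pseudoknots and that a pair counts only on an exact position match. The function names and the example strings are hypothetical, and the paper's own scoring rules may differ.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)          # remember the 5' partner
        elif symbol == ")":
            i = stack.pop()          # match with the most recent open bracket
            pairs.add((i, j))
    return pairs


def percent_known_pairs_predicted(known, predicted):
    """Percentage of pairs in `known` that also occur in `predicted`."""
    known_pairs = base_pairs(known)
    if not known_pairs:
        return 0.0
    hits = known_pairs & base_pairs(predicted)
    return 100.0 * len(hits) / len(known_pairs)


# Hypothetical toy structures: the prediction misses the innermost pair.
known_structure = "((((...))))"
predicted_structure = "(((.....)))"
print(percent_known_pairs_predicted(known_structure, predicted_structure))
```

Exact matching is the strictest choice; a scoring scheme that tolerates a one-nucleotide shift in either partner would report somewhat higher values for the same prediction.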

3.
RNA secondary structure is often predicted from sequence by free energy minimization. Over the past two years, advances have been made in the estimation of folding free energy change, the mapping of secondary structure and the implementation of computer programs for structure prediction. The trends in computer program development are: efficient use of experimental mapping of structures to constrain structure prediction; use of statistical mechanics to improve the fidelity of structure prediction; inclusion of pseudoknots in secondary structure prediction; and use of two or more homologous sequences to find a common structure.

4.
The complete nucleotide sequence of the major species of cytoplasmic 5S ribosomal RNA of Euglena gracilis has been determined. The sequence is: 5' GGCGUACGGCCAUACUACCGGGAAUACACCUGAACCCGUUCGAUUUCAGAAGUUAAGCCUGGUCAGGCCCAGUUAGUACUGAGGUGGGCGACCACUUGGGAACACUGGGUGCUGUACGCUU-OH 3'. This sequence can be fitted to the secondary structural models recently proposed for eukaryotic 5S ribosomal RNAs (1,2). Several properties of the Euglena 5S RNA reveal a close phylogenetic relationship between this organism and the protozoa. Large stretches of nucleotide sequence in predominantly single-stranded regions of the RNA are homologous to those of the trypanosomatid protozoan Crithidia fasciculata. There is less homology when compared to the RNAs of the green alga Chlorella or to the RNAs of the higher plants. The sequence AGAAC near position 40 that is common to plant 5S RNAs is CGAUU in both Euglena and Crithidia. The Euglena 5S RNA has secondary structural features at positions 79-99 similar to those of the protozoa and different from those of the plants. The conclusions drawn from comparative studies of cytochrome c structures, which indicate a close phylogenetic relatedness between Euglena and the trypanosomatid protozoa, are supported by the comparative data with 5S ribosomal RNAs.

5.
Phylogenetic analysis and evolution of RNase P RNA in proteobacteria.   Total citations: 11, self-citations: 0, citations by others: 11
The secondary structures of the eubacterial RNase P RNAs are being elucidated by a phylogenetic comparative approach. Sequences of genes encoding RNase P RNA from each of the recognized subgroups (alpha, beta, gamma, and delta) of the proteobacteria have now been determined. These sequences allow the refinement, to nearly the base pair level, of the phylogenetic model for RNase P RNA secondary structure. Evolutionary change among the RNase P RNAs was found to occur primarily in four discrete structural domains that are peripheral to a highly conserved core structure. The new sequences were used to examine critically the proposed similarity (C. Guerrier-Takada, N. Lumelsky, and S. Altman, Science 246:1578-1584, 1989) between a portion of RNase P RNA and the "exit site" of the 23S rRNA of Escherichia coli. Phylogenetic comparisons indicate that these sequences are not homologous and that any similarity in the structures is, at best, tenuous.

6.
A secondary structure model for 18S rRNA of peloridiids, relict insects with a present-day circumantarctic distribution, is constructed using comparative sequence analysis, thermodynamic folding, a consensus method using 18S rRNA models of other taxa, and support of helices based on compensatory substitutions. Results show that the probable in vivo configuration of 18S rRNA is not predictable using current free-energy models to fold the entire molecule concurrently. This suggests that refinements in free-energy minimization algorithms are needed. Molecular phylogenetic datasets were created using 18S rRNA nucleotide alignments produced by CLUSTAL and rigorous interpretation of homologous positions based on certain secondary substructures. Phylogenetic analysis of a hemipteran data matrix of 18S rDNA sequences placed peloridiids sister to Heteroptera. Affiliations among the three main euhemipteran lineages remained unresolved. The peloridiid 18S rRNA model presented here provides the most accurate template to date for aligning homologous nucleotides of hemipteran taxa. Using folded 18S rRNA to infer homology of character as morpho-molecular structures or nucleotides and scoring particular sites or substructures is discussed.

7.
Comparative sequence analysis addresses the problem of RNA folding and RNA structural diversity, and is responsible for determining the folding of many RNA molecules, including 5S, 16S, and 23S rRNAs, tRNA, RNase P RNA, and Group I and II introns. Initially this method was utilized to fold these sequences into their secondary structures. More recently, this method has revealed numerous tertiary correlations, elucidating novel RNA structural motifs, several of which have been experimentally tested and verified, substantiating the general application of this approach. As successful as the comparative methods have been in elucidating higher-order structure, it is clear that additional structure constraints remain to be found. Deciphering such constraints requires more sensitive and rigorous protocols, in addition to RNA sequence datasets that contain additional phylogenetic diversity and an overall increase in the number of sequences. Various RNA databases, including the tRNA and rRNA sequence datasets, continue to grow in number as well as diversity. Described herein is the development of more rigorous comparative analysis protocols. Our initial development and applications on different RNA datasets have been very encouraging. Such analyses on tRNA, 16S and 23S rRNA are substantiating previously proposed associations and are now beginning to reveal additional constraints on these molecules. A subset of these involves several positions that correlate simultaneously with one another, implying that units larger than a base pair can be under a phylogenetic constraint.

8.
We propose a new method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of thermodynamic structure prediction and phylogenetic comparison. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of approximately 10 sequences, efficiently exploiting the information contained in the sequence variability. The procedure constructs a prediction only for those parts of sequences that are consistent with a single conserved structure. Our implementation produces reasonable consensus structures without user intervention. As an example we have analysed the complete HIV-1 and hepatitis C virus (HCV) genomes as well as the small segment of hantavirus. Our method confirms the known structures in HIV-1 and predicts previously unknown conserved RNA secondary structures in HCV.

9.

Background  

We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm.

10.
11.
12.
Computer-aided prediction of RNA secondary structures.   Total citations: 8, self-citations: 5, citations by others: 3
A brief survey of computer algorithms that have been developed to generate predictions of the secondary structures of RNA molecules is presented. Two particular methods are described in some detail. The first utilizes a thermodynamic energy minimization algorithm that takes into account the likelihood that short-range folding tends to be favored over long-range interactions. The second utilizes an interactive computer graphic modelling algorithm that enables the user to consider thermodynamic criteria as well as structural data obtained by nuclease susceptibility, chemical reactivity and phylogenetic studies. Examples of structures for prokaryotic 16S and 23S ribosomal RNAs, several eukaryotic 5S ribosomal RNAs and rabbit beta-globin messenger RNA are presented as case studies in order to describe the two techniques. An argument is made for integrating the two approaches presented in this paper, enabling the user to generate proposed structures using thermodynamic criteria and then refine these structures interactively through the application of experimentally derived data.

13.
Larsson SL, Nygård O. Biochemistry 2001, 40(10):3222-3231
The expansion segments in eukaryotic ribosomal RNAs are additional RNA sequences not found in the RNA core common to both prokaryotes and eukaryotes. These regions show large species-dependent variations in sequence and size. This makes it difficult to create secondary structure models for the expansion segments exclusively based on phylogenetic sequence comparison. Here we have used a combination of experimental data and computational methods to generate secondary structure models for expansion segment 15 in 28S rRNA in mice, rats, and rabbits. The experimental data were collected using the structure sensitive reagents DMS, CMCT, kethoxal, micrococcal nuclease, RNase T1, RNase CL3, RNase V1, and lead(II) acetate. ES15 was folded with the computer program RNAStructure 3.5 using modification data and phylogenetic similarities between different ES15 sequences. This program uses energy minimization to find the most stable secondary structure of an RNA sequence. The presented secondary structure models include several common structural motifs, but they also have characteristics unique to each organism. Overall, the secondary structure models showed indications of an energetically stable but dynamic structure, easily accessible from the solution by the modification reagents, suggesting that the expansion segment is located on the ribosomal surface.

14.
Variable regions within ribosomal RNAs frequently vary in length as a result of incorporating products of slippage. This makes constructing secondary structure models problematic because base homology is difficult or impossible to establish between species. Here, we model such a region by comparing the results of the MFOLD suboptimal folding algorithm for different species to identify conserved structures. Based on the reconstruction of base change on a phylogenetic tree of the species and comparison against null models of character change, we devise a statistical analysis to assess support of these structures from compensatory and semi-compensatory (i.e. G.C to G.U or A.U to G.U) mutations. As a model system we have used variable region V4 from cicindelid (tiger beetle) small subunit ribosomal RNAs (SSU rRNAs). This consists of a mixture of conserved and highly variable subregions and has been subject to extensive comparative analysis in the past. The model that results is similar to a previously described model of this variable region derived from a different set of species and contains a novel structure in the central, highly variable part. The method we describe may be useful in modelling other RNA regions that are subject to slippage.
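
The distinction this abstract draws between compensatory and semi-compensatory mutations can be illustrated with a small classifier. The sketch is only an illustration under stated assumptions: a change counts as semi-compensatory when one partner changes and pairing is kept through a G.U wobble (as in the abstract's G.C to G.U and A.U to G.U examples), and as compensatory when both partners change and pairing is kept. The function name, the labels, and the handling of non-pairing states are hypothetical and are not taken from the paper's statistical analysis.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}
CANONICAL = WATSON_CRICK | WOBBLE


def classify_pair_change(ancestral, derived):
    """Classify the change between two states of one paired site.

    Each argument is a (5' nucleotide, 3' nucleotide) tuple.
    """
    if ancestral == derived:
        return "unchanged"
    if ancestral not in CANONICAL or derived not in CANONICAL:
        # At least one state cannot form a canonical or wobble pair.
        return "non-pairing"
    positions_changed = sum(a != d for a, d in zip(ancestral, derived))
    if positions_changed == 2:
        return "compensatory"        # both partners changed, pairing kept
    return "semi-compensatory"       # one partner changed, pairing kept


print(classify_pair_change(("G", "C"), ("G", "U")))  # one partner changed
print(classify_pair_change(("G", "C"), ("A", "U")))  # both partners changed
```

Counting such events along the branches of a phylogenetic tree, and comparing the counts with a null model, would be the next step that the abstract describes; that part is not sketched here.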

15.
The process of designing novel RNA sequences by inverse RNA folding, available in tools such as RNAinverse and InfoRNA, can be thought of as a reconstruction of RNAs from secondary structure. In this reconstruction problem, no physical measures are considered as additional constraints that are independent of structure, aside from the goal of reaching the same secondary structure as the input using energy minimization methods. An extension of the reconstruction problem can be formulated, since for many natural RNAs it is desirable to analyze the sequence and structure of the molecules using various quantifiable physical measures. In prior works that used secondary structure predictions, it has been shown that natural RNAs differ significantly from random RNAs in some of these measures. Thus, we relax the problem of reconstructing RNAs from secondary structure into reconstructing RNAs from shapes, and in turn incorporate physical quantities as constraints. This allows for the design of novel RNA sequences by inverse folding while considering various physical quantities of interest such as thermodynamic stability, mutational robustness, and linguistic complexity. At the expense of altering the number of nucleotides in stems and loops, for example, physical measures can be taken into account. We use evolutionary computation for the new reconstruction problem and illustrate the procedure on various natural RNAs.

16.
Evolution of secondary structure in the family of 7SL-like RNAs   Total citations: 8, self-citations: 0, citations by others: 8
Primate and rodent genomes are populated with hundreds of thousands of copies of Alu and B1 elements dispersed by retroposition, i.e., by genomic reintegration of their reverse-transcribed RNAs. These, as well as primate BC200 and rodent 4.5S RNAs, are ancestrally related to the terminal portions of 7SL RNA sequence. The secondary structure of 7SL RNA (an integral component of the signal recognition particle) is conserved from prokaryotes to distant eukaryotic species. Yet only in primates and rodents did this molecule give rise to retroposing Alu and B1 RNAs and to apparently functional BC200 and 4.5S RNAs. To understand this transition and the underlying molecular events, we examined, by comparative analysis, the evolution of RNA structure in this family of molecules derived from 7SL RNA. RNA sequences of different simian (mostly human) and prosimian Alu subfamilies as well as rodent B1 repeats were derived from their genomic consensus sequences taken from the literature and our unpublished results (prosimian and New World Monkey). RNA secondary structures were determined by enzymatic studies (new data on 4.5S RNA are presented) and/or energy minimization analyses followed by phylogenetic comparison. Although, with the exception of 4.5S RNA, all 7SL-derived RNA species maintain the cruciform structure of their progenitor, the details of 7SL RNA folding domains are modified to a different extent in various RNA groups. Novel motifs found in retropositionally active RNAs are conserved among Alu and B1 subfamilies in different genomes. In RNAs that do not proliferate by retroposition these motifs are modified further. This indicates structural adaptation of 7SL-like RNA molecules to novel functions, presumably mediated by specific interactions with proteins; these functions were either useful for the host or served the selfish propagation of RNA templates within the host genome.
Abbreviations: FAM, fossil Alu element; FLAM, free left Alu monomer; FRAM, free right Alu monomer; L-Alu, left Alu subunit; R-Alu, right Alu subunit. Correspondence to: D. Labuda. Dedicated to Dr. Robert Cedergren on the occasion of his 25th anniversary at the University of Montreal.

18.
MOTIVATION: Most non-coding RNAs are characterized by a specific secondary and tertiary structure that determines their function. Here, we investigate the folding energy of the secondary structure of non-coding RNA sequences, such as microRNA precursors, transfer RNAs and ribosomal RNAs in several eukaryotic taxa. Statistical biases are assessed by a randomization test, in which the predicted minimum free energy of folding is compared with values obtained for structures inferred from randomly shuffling the original sequences. RESULTS: In contrast with transfer RNAs and ribosomal RNAs, the majority of the microRNA sequences clearly exhibit a folding free energy that is considerably lower than that for shuffled sequences, indicating a high tendency in the sequence towards a stable secondary structure. A possible usage of this statistical test in the framework of the detection of genuine miRNA sequences is discussed.
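
The randomization test described in this abstract can be sketched as a z-score of the native folding energy against a distribution from shuffled copies of the same sequence. The code below is a rough illustration, not the authors' procedure: `toy_energy` is a hypothetical stand-in that scores a sequence as minus the maximum number of nested base pairs (a Nussinov-style recursion), whereas a real analysis would substitute a thermodynamic minimum-free-energy program. The simple mononucleotide shuffle, the number of shuffles, and all names are likewise assumptions.

```python
import random
import statistics

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq, min_loop=3):
    """Stand-in 'energy': minus the maximum number of nested base pairs."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case: j is unpaired
            for k in range(i, j - min_loop):       # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return -best[0][n - 1]


def folding_z_score(seq, n_shuffles=200, seed=0, energy=toy_energy):
    """z-score of the native energy relative to shuffled sequences."""
    rng = random.Random(seed)
    native = energy(seq)
    shuffled_energies = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)                       # preserves base composition
        shuffled_energies.append(energy("".join(letters)))
    spread = statistics.pstdev(shuffled_energies)
    if spread == 0:
        return 0.0
    return (native - statistics.mean(shuffled_energies)) / spread


# Hypothetical hairpin-like sequence; a negative z-score means the native
# order scores lower (more pairs) than typical shuffles.
print(folding_z_score("GGGGGGAAAACCCCCC"))
```

A shuffle that preserves dinucleotide frequencies would be a stricter null model than the plain shuffle used here, since stacking energies depend on neighbouring bases.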

19.
Y RNAs are small 'cytoplasmic' RNAs which are components of the Ro ribonucleoprotein (RNP) complex. The core of this complex, which is found in the cell nuclei of higher eukaryotes as well as the cytoplasm, is composed of a complex between the 60 kDa Ro protein and Y RNAs. Human cells contain four distinct Y RNAs (Y1, Y3, Y4 and Y5), while other eukaryotes contain a variable number of Y RNA homologues. When detected in a particular species, the Ro RNP has been present in every cell type within that particular organism. This characteristic, along with its high conservation among vertebrates, suggests an important function for Ro RNP in cellular metabolism; however, this function has not yet been definitively elucidated. In order to identify conserved features of Y RNA sequences and structures which may be directly involved in Ro RNP function, a phylogenetic comparative analysis of Y RNAs has been performed. Sequences of Y RNA homologues from five vertebrate species have been obtained and, together with previously published Y RNA sequences, used to predict Y RNA secondary structures. A novel RNA secondary structure comparison algorithm, the suboptimal RNA analysis program, has been developed and used in conjunction with available algorithms to find phylogenetically conserved secondary structure models for Y1, Y3 and Y4 RNAs. Short, conserved sequences within the Y RNAs have been identified and are invariant among vertebrates, consistent with a direct role for Y RNAs in Ro function. A subset of these are located wholly or partially in looped regions in the Y3 and Y4 RNA predicted model structures, in accord with the possibility that these Y RNAs base pair with other cellular nucleic acids or are sites of interaction between the Ro RNP and other macromolecules.

20.

Background

A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1.

Results

The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases.

Conclusion

Our analysis indicates that the current set of nearest-neighbor energy parameters, in conjunction with the Mfold folding algorithm, is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.
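
The dependence of prediction accuracy on contact distance reported in this abstract can be examined by splitting the known base-pairs at a distance threshold and scoring each group separately. The sketch below assumes dot-bracket input without pseudoknots, takes contact distance to be the number of nucleotides intervening between the two partners, and requires exact matches; the function names, the two-bin split, and the toy example are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def accuracy_by_contact_distance(known, predicted, threshold=100):
    """Percent of known pairs predicted, for short- and long-range pairs.

    A pair (i, j) is short-range when j - i - 1 < threshold.
    A bin with no known pairs is reported as None.
    """
    known_pairs = base_pairs(known)
    predicted_pairs = base_pairs(predicted)
    bins = {"short": set(), "long": set()}
    for i, j in known_pairs:
        intervening = j - i - 1
        bins["short" if intervening < threshold else "long"].add((i, j))
    result = {}
    for label, subset in bins.items():
        if subset:
            result[label] = 100.0 * len(subset & predicted_pairs) / len(subset)
        else:
            result[label] = None
    return result


# Toy example with a small threshold so that both bins are populated.
print(accuracy_by_contact_distance("((((...))))", "(((.....)))", threshold=5))
```

With the default threshold of 100, the "short" bin corresponds to the range in which the abstract reports that the energy parameters work well for the larger rRNAs.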
