Similar literature
Found 20 similar documents (search time: 31 ms)
1.
We have developed a method for detecting more stable and significant folding regions relative to others in the sequence. The algorithm is based on the calculation of the lowest free energy of RNA secondary structures and Monte Carlo simulation. For any given RNA segment, the stability and statistical significance of RNA folding are assessed by two measures: the stability score and the significance score. The stability score measures the degree of thermodynamic stability of the segment between all possible biological segments in the RNA sequence. The significance score characterizes the specific arrangement of the nucleotides in the segment that could imply a structural role for the sequence information. Using these two measures, we are able to detect a series of distinct folding regions where highly stable and statistically significant secondary structures occur in human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV) sequences. Received on April 4, 1990; accepted on October 2, 1990

2.
Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.

3.
A linear segment in which a number of pairs of intervals of equal length are identified as potential stems is the subject of a folding problem analogous to inference of RNA secondary structure. A quantity of free energy (or equivalently, energy per unit length) is associated with each stem, and the various types of loops are assigned energy costs as a function of their lengths. Inference of stable structures can then be carried out in the same way as in RNA folding. More important, perturbation of stem lengths and energy densities (modelling various mutational processes affecting nucleotide sequences) allows the delineation of domains of stability of various foldings, through the explicit calculation of their boundaries, in a low-dimensional parameter space.

4.
MOTIVATION: Recently, novel classes of functional RNAs, most prominently the miRNAs, have been discovered, strongly suggesting that further types of functional RNAs are still hidden in the recently completed genomic DNA sequences. Only a few techniques are known, however, to survey genomes for such RNA genes. When sufficiently similar sequences are not available for comparative approaches, the only known remedy is to search directly for structural features. RESULTS: We present here efficient algorithms for computing locally stable RNA structures at genome-wide scales. Both the minimum energy structure and the complete matrix of base pairing probabilities can be computed in Θ(N × L²) time and Θ(N + L²) memory in terms of the length N of the genome and the size L of the largest secondary structure motifs of interest. In practice, the 100 Mb of the complete genome of Caenorhabditis elegans can be folded within about half a day on a modern PC with a search depth of L = 100. This is sufficient, for example, for a survey for miRNAs. AVAILABILITY: The software described in this contribution will be available for download at http://www.tbi.univie.ac.at/~ivo/RNA/ as part of the Vienna RNA Package.

5.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.
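
The difference between the two kinds of random control described in this abstract can be illustrated with a short sketch. It only shows that a plain permutation keeps the mononucleotide counts while the dinucleotide counts are free to change; it is not the randomization procedure used in the study, and the function names and the example sequence are made up for illustration.

```python
import random
from collections import Counter


def mononucleotide_counts(seq):
    """Count single nucleotides in the sequence."""
    return Counter(seq)


def dinucleotide_counts(seq):
    """Count overlapping adjacent pairs, e.g. 'GCA' gives GC and CA."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def mononucleotide_shuffle(seq, rng):
    """Permute the sequence. Single-base composition is preserved,
    but dinucleotide composition generally is not."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
native = "GGGAAACCCUUUGGGAAACCC"  # hypothetical sequence, not from the paper
shuffled = mononucleotide_shuffle(native, rng)

# Always True: a permutation cannot change single-base counts.
same_mono = mononucleotide_counts(native) == mononucleotide_counts(shuffled)
# Usually False: neighbouring bases are rearranged by the shuffle.
same_di = dinucleotide_counts(native) == dinucleotide_counts(shuffled)
```

A control that also preserves dinucleotide counts needs a more careful shuffling scheme than this one, which is why the two controls can lead to different conclusions about significance.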

6.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to O(M³N³), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.

7.
The stability of potential RNA stem-loop structures in human immunodeficiency virus isolates, HTLV-III and ARV, has been calculated, and the relevance to the local significant secondary structures in the sequence has been tested statistically using a Monte Carlo simulation method. Potentially significant structures exist in the 5' non-coding region, the boundary regions between the protein coding frames, and the 3' non-coding region. The locally optimal secondary structure occurring in the 5' terminal region has been assessed using different overlapping segment sizes and the Monte Carlo method. The results show that the most favorable structure for the 5' mRNA leader sequence of HIV has two stem-loops folded at nucleotides 5-104 in the R region (stem-loop I, 5-54 and stem-loop II, 58-104). A large fluctuation of segment score of the local optimal secondary structure also occurs in the boundary between the exterior glycosylated protein or outer membrane protein and transmembrane protein coding region. This finding is surprising since no RNA signals or RNA processing are expected to occur at this site. In addition, regions of the genome predicted to have significantly more open structure at the RNA level correlate closely with hypervariable sites found in these viral genomes. The possible importance of local secondary structure to the biological function of the human immunodeficiency virus genome is discussed.

8.
MOTIVATION: Several results in the literature suggest that biologically interesting RNAs have secondary structures that are more stable than expected by chance. Based on these observations, we developed a scanning algorithm for detecting noncoding RNA genes in genome sequences, using a fully probabilistic version of the Zuker minimum-energy folding algorithm. RESULTS: Preliminary results were encouraging, but certain anomalies led us to do a carefully controlled investigation of this class of methods. Ultimately, our results argue that for the probabilistic model there is indeed a statistical effect, but it comes mostly from local base-composition bias and not from RNA secondary structure. For the thermodynamic implementation (which evaluates statistical significance by doing Monte Carlo shuffling in fixed-length sequence windows, thus eliminating the base-composition effect) the signals for noncoding RNAs are still usually indistinguishable from noise, especially when certain statistical artifacts resulting from local base-composition inhomogeneity are taken into account. We conclude that although a distinct, stable secondary structure is undoubtedly important in most noncoding RNAs, the stability of most noncoding RNA secondary structures is not sufficiently different from the predicted stability of a random sequence to be useful as a general genefinding approach.

9.

Background  

The secondary structure and complexity of mRNA influence its accessibility to regulatory molecules (proteins, micro-RNAs), its stability and its level of expression. The mobile elements of the RNA sequence, the wobble bases, are expected to regulate the formation of structures encompassing coding sequences.

10.
In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from Vienna RNA Package to calculate the distance between 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from 5' node to 3' node in the undirected graph, whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous, mathematical framework to study the expected distance from 5' to 3' ends of an RNA sequence. We present recurrence relations that precisely define the expected distance from 5' to 3' ends of an RNA sequence, both for the Turner nearest neighbor energy model, as well as for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of Vienna RNA Package) the expected distance between 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytical combinatorics, that depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.
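
The graph definition of the 5'-3' distance given in this abstract can be sketched for a single, fixed secondary structure. The sketch below assumes the structure is supplied in dot-bracket notation (an input convention chosen here, not one stated in the abstract) and uses breadth-first search; it computes the distance for one structure only, not the expected distance over an ensemble that the paper studies. The function names are hypothetical.

```python
from collections import deque


def parse_pairs(dot_bracket):
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def end_to_end_distance(dot_bracket):
    """Shortest path length from the 5' node (index 0) to the 3' node
    (index n - 1), using backbone edges {i, i + 1} and base-pair edges {i, j}."""
    n = len(dot_bracket)
    if n == 0:
        raise ValueError("empty structure")
    neighbours = {i: set() for i in range(n)}
    for i in range(n - 1):  # covalent backbone bonds
        neighbours[i].add(i + 1)
        neighbours[i + 1].add(i)
    for i, j in parse_pairs(dot_bracket):  # base pairs
        neighbours[i].add(j)
        neighbours[j].add(i)
    dist = {0: 0}
    queue = deque([0])
    while queue:  # breadth-first search from the 5' end
        node = queue.popleft()
        if node == n - 1:
            return dist[node]
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)


unfolded = end_to_end_distance("......")     # backbone only
hairpin = end_to_end_distance(".((...)).")   # one step to the stem, across, one step out
```

For the unfolded chain the path must follow the backbone, while in the hairpin a single base-pair edge shortcuts most of the sequence, which is the effect the abstract's distance measure captures.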

11.
12.
S Y Le, J H Chen, J V Maizel. Nucleic Acids Research, 1989, 17(15): 6143-6152
RNA stem-loop structures situated just 3' to the frameshift sites of the retroviral gag-pol or gag-pro and pro-pol regions may make important contributions to frame-shifting in retroviruses. In this study, the thermodynamic stability and statistical significance of such secondary structural features relative to others in the sequence have been assessed using a newly developed method that combines calculations of the lowest free energy of formation of RNA secondary structures with Monte Carlo simulations. Our results show that stem-loop structures situated just 3' to the frameshift sites are both highly stable and statistically significant relative to others in the gag-pol or gag-pro and pro-pol junction domains (both 300 nucleotides upstream and downstream from the possible frameshift sites are included) of Rous sarcoma virus (RSV), human immunodeficiency virus (HIV-1), bovine leukemia virus (BLV), human T-cell leukemia virus type II (HTLV-II), and mouse mammary tumor virus (MMTV). No other more stable or significant folding regions are predicted in these domains.

13.
LNA (locked nucleic acids, i.e. oligonucleotides with a methyl bridge between the 2′ oxygen and 4′ carbon of ribose) and 2,6-diaminopurine were incorporated into 2′-O-methyl RNA pentamer and hexamer probes to make a microarray that binds unpaired RNA approximately isoenergetically. That is, binding is roughly independent of target sequence if target is unfolded. The isoenergetic binding and short probe length simplify interpretation of binding to a structured RNA to provide insight into target RNA secondary structure. Microarray binding and chemical mapping were used to probe the secondary structure of a 323 nt segment of the 5′ coding region of the R2 retrotransposon from Bombyx mori (R2Bm 5′ RNA). This R2Bm 5′ RNA orchestrates functioning of the R2 protein responsible for cleaving the second strand of DNA during insertion of the R2 sequence into the genome. The experimental results were used as constraints in a free energy minimization algorithm to provide an initial model for the secondary structure of the R2Bm 5′ RNA.

14.
Seminars in Virology, 1997, 8(3): 231-241
We have analyzed 11 picornaviral RNA genomic sequences by optimal and suboptimal minimum free energy folding algorithms. The systematic summation of all pairing partners for each base in the suboptimal structures (P-num value) shows a distinct pattern of alternating low and high values when plotted against the sequence length and indicates regions within each genome where secondary structure(s) are likely to play a significant role in virus biology. The individual folds, augmented by data from phylogenetic folds, collectively suggest some revisions of existing models for 5′-untranslated regions of cardioviruses and enteroviruses that might better explain the functions of these regions.

15.
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well known metrics on RNA secondary structures. We also study some global and local properties of these metrics.

16.
MOTIVATION: A k-point mutant of a given RNA sequence s = s(1), ..., s(n) is an RNA sequence s' = s'(1), ..., s'(n) obtained by mutating exactly k positions in s; i.e. the Hamming distance between s and s' equals k. To understand the effect of pointwise mutation in RNA, we consider the distribution of energies of all secondary structures of k-point mutants of a given RNA sequence. RESULTS: Here we describe a novel algorithm to compute the mean and standard deviation of energies of all secondary structures of k-point mutants of a given RNA sequence. We then focus on the tail of the energy distribution and compute, using the algorithm AMSAG, the k-superoptimal structure; i.e. the secondary structure of a ≤ k-point mutant having least free energy over all secondary structures of all k'-point mutants of a given RNA sequence, for k' ≤ k. Evidence is presented that the k-superoptimal secondary structure is often closer, as measured by base pair distance and two additional distance measures, to the secondary structure derived by comparative sequence analysis than that derived by the Zuker minimum free energy structure of the original (wild type or unmutated) RNA.
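
The definition of a k-point mutant in this abstract can be made concrete with a brute-force enumeration. This is only a sketch of the definition, assuming the four-letter alphabet A, C, G, U; it is not the algorithm of the paper (nor AMSAG), and the function names are hypothetical. A sequence of length n has C(n, k) · 3^k k-point mutants, so enumeration is practical only for small n and k.

```python
from itertools import combinations, product

ALPHABET = "ACGU"  # assumed RNA alphabet


def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


def k_point_mutants(seq, k):
    """Yield every sequence at Hamming distance exactly k from seq."""
    for positions in combinations(range(len(seq)), k):
        # At each chosen position, allow only bases that differ from the original.
        choices = [[b for b in ALPHABET if b != seq[p]] for p in positions]
        for replacement in product(*choices):
            mutant = list(seq)
            for p, b in zip(positions, replacement):
                mutant[p] = b
            yield "".join(mutant)


wild_type = "GACU"  # made-up example sequence
two_point = list(k_point_mutants(wild_type, 2))  # C(4, 2) * 3**2 = 54 mutants
```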

17.
A crucial step in the determination of the three-dimensional native structures of RNA is the prediction of their secondary structures, which are stable independent of the tertiary fold. Accurate prediction of the secondary structure requires context-dependent estimates of the interaction parameters. We have exploited the growing database of natively folded RNA structures in the Protein Data Bank (PDB) to obtain stacking interaction parameters using a knowledge-based approach. Remarkably, the calculated values of the resulting statistical potentials (SPs) are in excellent agreement with the parameters determined using measurements in small oligonucleotides. We validate the SPs by predicting 74% of the base-pairs in a dataset of structures using the ViennaRNA package. Interestingly, this number is similar to that obtained using the measured thermodynamic parameters. We also tested the efficacy of the SP in predicting secondary structure by using gapless threading, which we advocate as an alternative method for rapidly predicting RNA structures. For RNA molecules with fewer than 700 nucleotides, about 70% of the native base-pairs are correctly predicted. As a further validation of the SPs we calculated Z-scores, which measure the relative stability of the native state with respect to a manifold of higher free energy states. The computed Z-scores agree with estimates made using calorimetric measurements for a few RNA molecules. Structural analysis was used to rationalize the successes and failures of SP and experimentally determined parameters. First, from the near perfect linear relationship between the number of native base-pairs and sequence length, we show that nearly 46% of nucleotides are not in stacks. Second, by analyzing the suboptimal structures that are generated in gapless threading we show that the SPs and experimentally determined parameters are most successful in predicting stacks that end in hairpins. These results show that further improvement in secondary structure prediction requires reliable estimates of interaction parameters for loops, bulges, and stacks that do not end in hairpins.

18.
A method for assessing the statistical significance of RNA folding   Cited by: 9 (self-citations: 0, citations by others: 9)
We have developed a statistical method that is designed for analyzing potential RNA folded substructures. The statistical significance of RNA folding is assessed by the segment score. The segment score is defined as the difference between the lowest free energy calculated for the real biological sequence and the mean of the lowest free energies from random permutations of the real segment sequence, divided by the standard deviation of the random sample. This procedure was applied to the well-studied Escherichia coli 16S rRNA and potato spindle tuber viroid (PSTV) RNA. The results showed that the predictions of the locally significant secondary structures in these two molecules are in accord with the universally conserved local secondary structure elements (Gutell, Weiser & Noller, 1985, Prog. Nucl. Acid Res. molec. Biol. 32, 155-216; Riesner & Gross, 1985, A. Rev. Biochem. 54, 531-564). In addition, a statistical analysis indicated that the lowest free energies of a random sample set follow an approximately normal distribution. A reasonable size for the random sample set was determined statistically. Moreover, the statistical evaluation has been carried out using three different sets of energy rules. Two sets (Salser, 1977, Cold Spring Harb. Symp. Quant Biol. 42, 985-1002; Freier, Kierzek, Jaeger, Sugimoto, Caruthers, Neilson & Turner, 1986, Proc. natn. Acad. Sci. U.S.A. 83, 9373-9377) take into account stacking energies and are based on experimental data and their computational extension (Salser, 1977); the third set is a simplistic "unitary matrix" approach, where any base-pair is given a weight of "minus one" and an unpaired base is "zero". The Freier energy rules usually yield the strongest indication of significant folding regions. However, the results of a paired comparisons test do not provide sufficient evidence for concluding that a different set of energy rules is effective in changing the segment score level for local stem-loop structures in the 16S rRNA.
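
The segment score defined in this abstract can be sketched as follows. To stay self-contained, the sketch uses only the simplistic "unitary matrix" rule mentioned above (each base pair scores minus one, an unpaired base zero), evaluated with a Nussinov-style dynamic program. The set of allowed pairs (Watson-Crick plus G-U), the minimum hairpin loop of 3 unpaired bases, the sample size, and all function names are assumptions made here for illustration, not details taken from the paper.

```python
import random
import statistics

# Assumed pairing rule: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def lowest_energy(seq):
    """Lowest energy under the toy rule: -1 per base pair, 0 per unpaired base."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            energy = best[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - MIN_LOOP):  # case: position j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    energy = min(energy, left + best[k + 1][j - 1] - 1)
            best[i][j] = energy
    return best[0][n - 1]


def segment_score(seq, n_shuffles=100, rng=None):
    """(native lowest energy - mean of shuffled lowest energies) / their std dev."""
    rng = rng or random.Random()
    native = lowest_energy(seq)
    bases = list(seq)
    energies = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # random permutation of the segment
        energies.append(lowest_energy("".join(bases)))
    spread = statistics.stdev(energies)
    if spread == 0:
        raise ValueError("shuffled energies do not vary; score is undefined")
    return (native - statistics.mean(energies)) / spread


# Made-up hairpin-like segment; a negative score means the native order
# folds to a lower energy than its permutations do on average.
score = segment_score("GGGGGAAAACCCCC", n_shuffles=200, rng=random.Random(1))
```

Swapping `lowest_energy` for a thermodynamic folding routine would give a score under stacking-based energy rules; the normalisation step stays the same.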

19.
Christiansen ME, Znosko BM. Biochemistry, 2008, 47(14): 4329-4336
Because of the availability of an abundance of RNA sequence information, the ability to rapidly and accurately predict the secondary structure of RNA from sequence is becoming increasingly important. A common method for predicting RNA secondary structure from sequence is free energy minimization. Therefore, accurate free energy contributions for every RNA secondary structure motif are necessary for accurate secondary structure predictions. Tandem mismatches are prevalent in naturally occurring sequences and are biologically important. A common method for predicting the stability of a sequence asymmetric tandem mismatch relies on the stabilities of the two corresponding sequence symmetric tandem mismatches [Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. (1999) J. Mol. Biol. 288, 911-940]. To improve the prediction of sequence asymmetric tandem mismatches, the experimental thermodynamic parameters for the 22 previously unmeasured sequence symmetric tandem mismatches are reported. These new data, however, do not improve prediction of the free energy contributions of sequence asymmetric tandem mismatches. Therefore, a new model, independent of sequence symmetric tandem mismatch free energies, is proposed. This model consists of two penalties to account for destabilizing tandem mismatches, two bonuses to account for stabilizing tandem mismatches, and two penalties to account for A-U and G-U adjacent base pairs. This model improves the prediction of asymmetric tandem mismatch free energy contributions and is likely to improve the prediction of RNA secondary structure from sequence.

20.
The existence and functional importance of RNA secondary structure in the replication of positive-stranded RNA viruses is increasingly recognized. We applied several computational methods to detect RNA secondary structure in the coding region of hepatitis C virus (HCV), including thermodynamic prediction, calculation of free energy on folding, and a newly developed method to scan sequences for covariant sites and associated secondary structures using a parsimony-based algorithm. Each of the prediction methods provided evidence for complex RNA folding in the core- and NS5B-encoding regions of the genome. The positioning of covariant sites and associated predicted stem-loop structures coincided with thermodynamic predictions of RNA base pairing, and localized precisely in parts of the genome with marked suppression of variability at synonymous sites. Combined, there was evidence for a total of six evolutionarily conserved stem-loop structures in the NS5B-encoding region and two in the core gene. The virus most closely related to HCV, GB virus-B (GBV-B) also showed evidence for similar internal base pairing in its coding region, although predictions of secondary structures were limited by the absence of comparative sequence data for this virus. While the role(s) of stem-loops in the coding region of HCV and GBV-B are currently unknown, the structure predictions in this study could provide the starting point for functional investigations using recently developed self-replicating clones of HCV.
