Similar articles
1.
MOTIVATION: Function derives from structure; therefore, there is a need for methods to predict functional RNA structures. RESULTS: The Dynalign algorithm, which predicts the lowest free energy secondary structure common to two unaligned RNA sequences, is extended to the prediction of a set of low-energy structures. Dot plots can be drawn to show all base pairs in structures within an energy increment. Dynalign predicts more well-defined structures than structure prediction using a single sequence; in 5S rRNA sequences, the average number of base pairs in structures with energy within 20% of the lowest energy structure is 317 using Dynalign, but 569 using a single sequence. Structure prediction with Dynalign can also be constrained according to experiment or comparative analysis. The accuracy of Dynalign, measured as sensitivity and positive predictive value, is greater than that of predictions with a single sequence. AVAILABILITY: Dynalign can be downloaded at http://rna.urmc.rochester.edu

2.
3.
Prediction of common folding structures of homologous RNAs.
K Han, H J Kim. Nucleic Acids Research, 1993, 21(5):1251-1257
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The most plausible 1-5 structures are produced in O(MN^2 + N^3) time and O(N^2) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is probably the first algorithm that is practical in terms of both time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.

4.

Background

A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1.

Results

The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases.

Conclusion

Our analysis indicates that the current set of nearest-neighbor energy parameters, in conjunction with the Mfold folding algorithm, is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.

5.
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.
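
The dynamic-programming idea behind this kind of folding can be illustrated with a much simpler objective. The sketch below is a Nussinov-style recursion that maximizes the number of nested base pairs; it is not the nearest-neighbor free energy model the abstract describes, and the function name, the allowed pair set, and the `min_loop` parameter are assumptions made for illustration only.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    Simplified stand-in for free energy minimization: it counts pairs
    instead of summing nearest-neighbor energies. min_loop is the
    assumed minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the best pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The table uses O(N^2) space and the triple loop takes O(N^3) time, which is the usual cost of this family of recursions for a single sequence.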

6.

Background  

Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.
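
The combine-then-threshold step can be sketched as follows. This assumes the hidden Markov model has already produced three posterior matrices indexed by position pairs (aligned, inserted in sequence 1, inserted in sequence 2); the function name, argument names, and the default threshold value are hypothetical and are not taken from Dynalign.

```python
def alignment_constraint(p_align, p_ins1, p_ins2, threshold=0.01):
    """Return a boolean matrix of position pairs allowed in the alignment.

    p_align[i][k], p_ins1[i][k] and p_ins2[i][k] are assumed posterior
    probabilities for position i of sequence 1 and position k of
    sequence 2. The threshold value is illustrative only.
    """
    rows = len(p_align)
    cols = len(p_align[0]) if rows else 0
    allowed = []
    for i in range(rows):
        row = []
        for k in range(cols):
            # Additive combination gives the co-incidence probability.
            co_incidence = p_align[i][k] + p_ins1[i][k] + p_ins2[i][k]
            # Keep the pair only if it clears the threshold.
            row.append(co_incidence >= threshold)
        allowed.append(row)
    return allowed
```

A joint folding and alignment routine could then skip every position pair marked `False`, which is where the reduction in computation would come from.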

7.
We describe a computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all favorable pairings with respect to the Turner energy function are identified. Each potentially paired region within a multiple sequence alignment is scored using a function that combines both predicted free energy and sequence covariation with optimized weightings. High scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5S rRNA, SRP RNA, tmRNA, and 16S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters has been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences and predictions improve only moderately with the inclusion of additional sequences.

8.
The bicoid (bcd) gene of Drosophila has played an important role in understanding the system of developmental regulatory genes that controls segmentation in the fruit fly. Several studies in Drosophila and closely related insects suggest that bcd may be the result of a gene duplication in the Dipteran lineage. In addition, the presence of a large, conserved secondary structure in the 3' untranslated region (UTR) makes the bcd gene a good candidate for studying compensatory evolution and the relationship between RNA secondary structure and patterns of standing variation in natural populations. Despite these interesting aspects, a population-level analysis has until now not been performed on bcd. In this study, DNA sequence variation was examined for a 4-kb region of the bcd gene, including a portion of the 5' UTR, the entire coding region, and the 3' UTR, for 25 Drosophila melanogaster isofemale lines from Zimbabwe and one allele from D. simulans. Statistical tests revealed a significant excess of replacement polymorphisms in the D. melanogaster lineage that are clustered in two putative linker regions of the Bicoid protein. This result is consistent with a relaxation of selective constraints in these regions. In addition, we found a distinct haplotype structure and a significantly smaller number of haplotypes than predicted by the standard neutral model. It is unlikely that the haplotype structure is maintained by epistatic selection acting on the secondary structure in the 3' UTR or by the association of the bcd gene with polymorphic inversions. Instead, our two main observations, namely the occurrence of a haplotype structure and the excess of replacement polymorphisms, may indicate that the selective history of this gene is rather complex, involving both the relaxation of purifying selection in some parts of the protein and the action of positive selection in other parts of the gene region.

9.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.
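
The two accuracy measures defined above can be computed directly from sets of base pairs. The following is a minimal sketch, assuming each structure is given as a collection of (i, j) index pairs; the function name is hypothetical, and a pair counts as correct only on an exact match, whereas some published benchmarks also accept a pair shifted by one nucleotide.

```python
def sensitivity_ppv(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) as fractions between 0 and 1.

    sensitivity = correctly predicted pairs / known pairs
    PPV         = correctly predicted pairs / predicted pairs
    Pairs are compared exactly, regardless of index order.
    """
    known = {tuple(sorted(p)) for p in known_pairs}
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    correct = len(known & predicted)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

For example, with four known pairs and three predicted pairs of which two are correct, the function returns a sensitivity of 0.5 and a PPV of 2/3.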

10.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.

11.
Computational tools for prediction of the secondary structure of two or more interacting nucleic acid molecules are useful for understanding mechanisms for ribozyme function, determining the affinity of an oligonucleotide primer to its target, and designing good antisense oligonucleotides, novel ribozymes, DNA code words, or nanostructures. Here, we introduce new algorithms for prediction of the minimum free energy pseudoknot-free secondary structure of two or more nucleic acid molecules, and for prediction of alternative low-energy (sub-optimal) secondary structures for two nucleic acid molecules. We provide a comprehensive analysis of our predictions against secondary structures of interacting RNA molecules drawn from the literature. Analysis of our tools on 17 sequences of up to 200 nucleotides that do not form pseudoknots shows that they have 79% accuracy, on average, for the minimum free energy predictions. When the best of 100 sub-optimal foldings is taken, the average accuracy increases to 91%. The accuracy decreases as the sequences increase in length and as the number of pseudoknots and tertiary interactions increases. Our algorithms extend the free energy minimization algorithm of Zuker and Stiegler for secondary structure prediction, and the sub-optimal folding algorithm by Wuchty et al. Implementations of our algorithms are freely available in the package MultiRNAFold.

12.
RNA secondary structure is often predicted from sequence by free energy minimization. Over the past two years, advances have been made in the estimation of folding free energy change, the mapping of secondary structure and the implementation of computer programs for structure prediction. The trends in computer program development are: efficient use of experimental mapping of structures to constrain structure prediction; use of statistical mechanics to improve the fidelity of structure prediction; inclusion of pseudoknots in secondary structure prediction; and use of two or more homologous sequences to find a common structure.

13.
14.
We determined the complete sequence of the mitochondrial DNA of the entomopathogenic nematode Steinernema carpocapsae and analyzed its structure and composition as well as the secondary structures predicted for its tRNAs and rRNAs. Almost the complete genome has been amplified in one fragment with long PCR and sequenced using a shotgun strategy. The 13,925-bp genome contains genes for 2 rRNAs, 22 tRNAs, and 12 proteins and lacks an ORF encoding ATPase subunit 8. Four initiation codons were inferred (TTT, TTA, ATA, and ATT); most of the genes ended with TAA or TAG, and only two had a T as an incomplete stop codon. All predicted tRNAs showed the nonconventional secondary structure typical of Secernentea. Although we were able to fold the sequences of trnN, trnD, and trnC into more conventional cloverleaf structures after adding adjacent nucleotides, northern blot experiments showed that the nonstandard tRNAs are actually expressed. Phylogenetic and comparative analyses showed that the mitochondrial genome of S. carpocapsae is more closely related to the genomes of A. suum and C. elegans than to that of Strongyloides stercoralis. This finding does not support the previously published phylogeny based on nuclear small subunit ribosomal DNA sequences. This discrepancy may result from differential reproductive strategies and/or differential selective pressure acting on nuclear and mitochondrial genes. The distinctive characteristics observed among mitochondrial genomes of Secernentea may have arisen to counteract the deleterious effects of Muller’s ratchet, which is probably enhanced by the reproductive strategies and selective pressures referred to above. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. Rafael Zardoya]

15.
The D2-D3 expansion segments of the 28S ribosomal RNA (rRNA) were sequenced and compared to predict secondary structures for Hoplolaiminae species based on free energy minimization and comparative sequence analysis. The free energy based prediction method provides putative stem regions within the primary structure, and the base pairings in these stems were confirmed manually by compensatory base changes among closely and distantly related species. Sequence differences ranged from identical between Hoplolaimus columbus and H. seinhorsti to 20.8% between Scutellonema brachyurum and H. concaudajuvencus. The comparative sequence analysis and energy minimization method yielded 9 stems in the D2 and 6 stems in the D3 which showed complete or partial compensatory base changes. At least 75% of nucleotides in the D2 and 68% of nucleotides in the D3 were involved in the formation of base pairings that maintain secondary structure. GC contents in stems ranged from 61 to 73% for the D2 and from 64 to 71% for the D3 region. These ranges are higher than the GC contents in loops, which ranged from 37 to 48% in the D2 and from 33 to 45% in the D3. In stems, G-C/C-G base pairings were the most common in the D2 and the D3, and non-canonical base pairs including A•A and U•U, C•U/U•C, and G•A/A•G also occurred in stems. The predicted secondary structure model and the new sequence alignment based on predicted secondary structures for the D2 and D3 expansion segments provide useful information for assigning positional nucleotide homology and reconstructing more reliable phylogenetic trees.
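
A stem-versus-loop GC comparison of the kind reported above can be sketched from a sequence and its structure in dot-bracket notation. This is an illustration only: the function name is hypothetical, the input format is an assumption (the abstract does not say how the structures were stored), and the example sequence below is synthetic rather than a D2 or D3 segment.

```python
def gc_content_by_region(seq, dot_bracket):
    """Return (stem GC fraction, loop GC fraction).

    Positions marked '(' or ')' are treated as stem (paired) and
    positions marked '.' as loop (unpaired).
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    stem = [b for b, s in zip(seq, dot_bracket) if s in "()"]
    loop = [b for b, s in zip(seq, dot_bracket) if s == "."]

    def gc_fraction(bases):
        # Fraction of G or C among the given bases; 0.0 if there are none.
        return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

    return gc_fraction(stem), gc_fraction(loop)
```

Calling `gc_content_by_region("GGCAAAAGCC", "(((....)))")` on this synthetic hairpin gives a stem fraction of 1.0 and a loop fraction of 0.0.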

16.
About 200 mRNA sequences of Escherichia coli and human with matching protein secondary structure data were studied. The mRNA folding for each native sequence and for corresponding randomized sequences was calculated through free energy minimization. We have found that the folding energy of mRNA segments in different protein secondary structures is significantly different. The average Z score is more negative for regular secondary structure (alpha-helix and beta-strand) than that for coil. This suggests that the codon choice in native mRNA sequence coding for protein regular structure contributes more to the mRNA folding stability.
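
The Z score used in comparisons like this one is commonly the native folding energy expressed in standard deviations from the mean energy of the randomized sequences. The sketch below assumes that definition and assumes the energies have already been computed by a folding program; the function name is hypothetical, and the use of the sample standard deviation is a choice made here, not something the abstract specifies.

```python
import statistics


def folding_z_score(native_energy, randomized_energies):
    """Z score of a native folding energy against randomized controls.

    A negative value means the native sequence is predicted to fold
    more stably (lower free energy) than the randomized sequences.
    """
    mean = statistics.mean(randomized_energies)
    # Sample standard deviation; needs at least two control energies.
    spread = statistics.stdev(randomized_energies)
    if spread == 0:
        raise ValueError("randomized energies have zero variance")
    return (native_energy - mean) / spread
```

With a native energy of -30 and control energies of -20, -22, -18, -24 and -16 (made-up numbers in kcal/mol), the mean is -20, the sample standard deviation is the square root of 10, and the Z score is about -3.16.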

17.
This article describes the latest version of an RNA folding algorithm that predicts both optimal and suboptimal solutions based on free energy minimization. A number of RNAs with known structures deduced from comparative sequence analysis are folded to test program performance. The group of solutions obtained for each molecule is analysed to determine how many of the known helices occur in the optimal solution and in the best suboptimal solution. In most cases, a structure about 80% correct is found with a free energy within 2% of the predicted lowest free energy structure.

18.
We report the sequence of the guinea pig p53 cDNA. The comparative analysis of the coding and noncoding regions of p53 cDNAs of all available complete vertebrate sequences has allowed us to single out new conserved signals possibly involved in p53 functional activity. We have focused our attention on the most variable region of the protein, the proline (P)-rich domain, suggested to play a fundamental role in antiproliferative pathways. In this domain we have identified the PXXXXP repeated motif and singled out a common consensus sequence that can be considered a signature for mammalian p53: PXXXXPX{0,4}PX{0,9}PA(T,P,I,)(S,P)WPL. We have demonstrated the significance of the PXXXXP motif in SH3-binding protein and suggested its structure to be a loop. Also, the 5' and 3' untranslated regions (UTRs) of the guinea pig were sequenced, and this study represents the first detailed structural analysis of the UTRs of the p53 mRNAs available in literature. The 5' UTR of guinea pig (233 nt) can be folded into a stable secondary structure resembling that predicted in mouse. The 3' UTR of guinea pig is 771 nt long and shows higher similarity with human than with rodent sequences, having a region of about 350 nt that is deleted in rat and mouse. In the 3' UTR we have identified the presence of a mammalian-wide interspersed repeat sequence and of a cytoplasmic polyadenylation element, which could be involved in translational activation by promoting polyadenylation of mRNA, providing information about a possible mechanism of regulation of p53 expression mediated by the 3' UTR of the mRNA. The observations presented here could open new avenues to targeted mutations and experimental approaches useful in investigating new regulation mechanisms of p53 translation, activity, and stability.
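
The consensus written above can be searched for with a regular expression. The sketch below is one possible translation, assuming X means any residue, X{0,4} means zero to four residues, and each parenthesized group lists alternatives at a single position; the trailing comma in "(T,P,I,)" is read here as a typographical artifact, which is an assumption. The function name is hypothetical and the test sequence is synthetic, not a real p53 sequence.

```python
import re

# P, four residues, P, 0-4 residues, P, 0-9 residues, then P-A-[T/P/I]-[S/P]-W-P-L.
P53_SIGNATURE = re.compile(r"P.{4}P.{0,4}P.{0,9}PA[TPI][SP]WPL")


def find_signature(protein_seq):
    """Return the (start, end) span of the first signature match, or None."""
    match = P53_SIGNATURE.search(protein_seq.upper())
    return match.span() if match else None
```

On the synthetic string `"MMPAAAAPPPATSWPL"` the function returns `(2, 16)`; on a string without the motif it returns `None`.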

19.
Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.

20.