Similar articles
20 similar articles found
1.
Existing computational methods for RNA secondary-structure prediction tacitly assume RNA to only encode functional RNA structures. However, experimental studies have revealed that some RNA sequences, e.g. compact viral genomes, can simultaneously encode functional RNA structures as well as proteins, and evidence is accumulating that this phenomenon may also be found in Eukaryotes. We here present the first comparative method, called RNA-DECODER, which explicitly takes the known protein-coding context of an RNA-sequence alignment into account in order to predict evolutionarily conserved secondary-structure elements, which may span both coding and non-coding regions. RNA-DECODER employs a stochastic context-free grammar together with a set of carefully devised phylogenetic substitution-models, which can disentangle and evaluate the different kinds of overlapping evolutionary constraints which arise. We show that RNA-DECODER's parameters can be automatically trained to successfully fold known secondary structures within the HCV genome. We scan the genomes of HCV and polio virus for conserved secondary-structure elements, and analyze performance as a function of available evolutionary information. On known secondary structures, RNA-DECODER shows a sensitivity similar to the programs MFOLD, PFOLD and RNAALIFOLD. When scanning the entire genomes of HCV and polio virus for structure elements, RNA-DECODER's results indicate a markedly higher specificity than MFOLD, PFOLD and RNAALIFOLD.

2.
Fang X, Luo Z, Yuan B, Wang J. Bioinformation 2007, 2(5):222-229
The prediction of RNA secondary structure can be facilitated by incorporating comparative analysis of homologous sequences. However, most existing comparative methods are vulnerable to alignment errors and are thus of low accuracy in practical application. Here we improve the prediction of RNA secondary structure by detecting and assessing conserved stems shared by all sequences in the alignment. Our method can be summarized as follows: 1) we detect possible stems in a single RNA sequence using the so-called position matrix, with which possibly paired positions can be uncovered; 2) we detect conserved stems across multiple RNA sequences by multiplying the position matrices; 3) we assess the conserved stems using the signal-to-noise ratio; 4) we compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by the RNAalifold program. We tested our method on data sets of RNA alignments with known secondary structures. The accuracy of our method, measured as sensitivity and specificity, is greater than that of predictions by RNAalifold.
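Steps 1 and 2 above can be illustrated with a minimal sketch. It assumes a gap-free or gapped alignment of equal-length sequences, treats the "position matrix" as a simple 0/1 matrix of positions that can form a canonical or G-U pair, and takes the element-wise product across sequences. The function names, the pairing rule, and the minimum loop length are illustrative assumptions, not the definitions used by Fang et al.

```python
# Pairs treated as compatible in this sketch: Watson-Crick plus G-U wobble (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def position_matrix(seq):
    """Return an N x N 0/1 matrix; entry [i][j] is 1 if bases i and j can pair."""
    n = len(seq)
    return [[1 if (seq[i], seq[j]) in PAIRS else 0 for j in range(n)]
            for i in range(n)]


def conserved_pairs(alignment, min_loop=3):
    """Multiply the position matrices element-wise and list pairs kept in every sequence.

    min_loop is the (hypothetical) minimum number of unpaired bases between i and j.
    Gap characters never match PAIRS, so a gap at either column removes the pair.
    """
    n = len(alignment[0])
    product = [[1] * n for _ in range(n)]
    for seq in alignment:
        matrix = position_matrix(seq)
        for i in range(n):
            for j in range(n):
                product[i][j] *= matrix[i][j]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if product[i][j] and j - i > min_loop]


if __name__ == "__main__":
    # Two toy sequences whose outer two base pairs covary (G-C vs. C-G).
    print(conserved_pairs(["GCAAAGC", "CGAAACG"]))
```

Runs of adjacent conserved pairs (i, j), (i+1, j-1), ... in the output would correspond to candidate conserved stems; scoring them (step 3) and combining them with RNAalifold (step 4) is outside this sketch.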

3.
Secondary structure prediction for aligned RNA sequences   Cited by: 19 (self-citations: 0, citations by others: 19)
Most functional RNA molecules have characteristic secondary structures that are highly conserved in evolution. Here we present a method for computing the consensus structure of a set of aligned RNA sequences, taking into account both thermodynamic stability and sequence covariation. Comparison with phylogenetic structures of rRNAs shows that a reliability of prediction of more than 80% is achieved for only five related sequences. As an application we show that the Early Noduline mRNA contains significant secondary structure that is supported by sequence covariation.

4.
A distance-constrained secondary structural model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure, with potential roles in replication and evolution of the virus.

5.
The existence and functional importance of RNA secondary structure in the replication of positive-stranded RNA viruses is increasingly recognized. We applied several computational methods to detect RNA secondary structure in the coding region of hepatitis C virus (HCV), including thermodynamic prediction, calculation of free energy on folding, and a newly developed method to scan sequences for covariant sites and associated secondary structures using a parsimony-based algorithm. Each of the prediction methods provided evidence for complex RNA folding in the core- and NS5B-encoding regions of the genome. The positioning of covariant sites and associated predicted stem-loop structures coincided with thermodynamic predictions of RNA base pairing, and localized precisely in parts of the genome with marked suppression of variability at synonymous sites. Combined, there was evidence for a total of six evolutionarily conserved stem-loop structures in the NS5B-encoding region and two in the core gene. The virus most closely related to HCV, GB virus-B (GBV-B) also showed evidence for similar internal base pairing in its coding region, although predictions of secondary structures were limited by the absence of comparative sequence data for this virus. While the role(s) of stem-loops in the coding region of HCV and GBV-B are currently unknown, the structure predictions in this study could provide the starting point for functional investigations using recently developed self-replicating clones of HCV.

6.
MOTIVATION: Many computerized methods for RNA secondary structure prediction have been developed. Few of these methods, however, employ an evolutionary model, so relevant information is often left out of the structure determination. This paper introduces a method which incorporates evolutionary history into RNA secondary structure prediction. The method reported here is based on stochastic context-free grammars (SCFGs) to give a prior probability distribution of structures. RESULTS: The phylogenetic tree relating the sequences can be found by maximum likelihood (ML) estimation from the model introduced here. The tree is shown to reveal information about the structure, due to mutation patterns. The inclusion of a prior distribution of RNA structures ensures good structure predictions even for a small number of related sequences. Prediction is carried out using maximum a posteriori (MAP) estimation in a Bayesian approach. For small sequence sets, the method performs very well compared to current automated methods.

7.
Prediction of common folding structures of homologous RNAs.   Cited by: 2 (self-citations: 2, citations by others: 0)
K Han, H J Kim. Nucleic Acids Research 1993, 21(5):1251-1257
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The 1-5 most plausible structures are produced in O(MN^2 + N^3) time and O(N^2) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is the first algorithm that is likely to be practical in terms of both time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.

8.
BACKGROUND: Telomerase is a ribonucleoprotein complex whose RNA moiety dictates the addition of specific simple sequences onto chromosome ends. While relevant for certain human genetic diseases, the contribution of the essential telomerase RNA to RNP assembly still remains unclear. Phylogenetic analyses of vertebrate and ciliate telomerase RNAs revealed conserved elements that potentially organize protein subunits for RNP function. In contrast, the yeast telomerase RNA could not be fitted to any known structural model, and the limited number of known sequences from Saccharomyces species did not permit the prediction of a yeast-specific conserved structure. RESULTS: We cloned and analyzed the complete telomerase RNA loci (TLC1) from all known Saccharomyces species belonging to the "sensu stricto" group. Complementation analyses in S. cerevisiae and end mappings of mature RNAs ensured the relevance of the cloned sequences. By using phylogenetic comparative analysis coupled with in vitro enzymatic probing, we derived a secondary structure prediction of the Saccharomyces cerevisiae TLC1 RNA. This conserved secondary structure prediction includes a central domain that is likely to orchestrate DNA synthesis and at least two accessory domains important for RNA stability and telomerase recruitment. The structure also reveals a potential tertiary interaction between two loops in the central core. CONCLUSIONS: The predicted secondary structure of the TLC1 RNA of S. cerevisiae reveals a distinct folding pattern featuring well-separated but conserved functional elements. The predicted structure now allows for a detailed and rationally designed study of the structure-function relationships within the telomerase RNP complex in a genetically tractable system.

9.
The 5'-untranslated region (5'-UTR) is the most conserved part of the HIV-1 RNA genome, and it contains regulatory motifs that mediate various steps in the viral life cycle. Previous work showed that the 5'-terminal 290 nucleotides of HIV-1 RNA adopt two mutually exclusive secondary structures, long distance interaction (LDI) and branched multiple hairpin (BMH). BMH has multiple hairpins, including the dimer initiation signal (DIS) hairpin that mediates RNA dimerization. LDI contains a long distance base-pairing interaction that occludes the DIS region. Consequently, the two conformations differ in their ability to form RNA dimers. In this study, we present evidence that the full-length 5'-UTR also adopts the LDI and BMH conformations. The downstream 290-352 region, including the Gag start codon, folds differently in the context of the LDI and BMH structures. These nucleotides form an extended hairpin structure in the LDI conformation, but the same sequences create a novel long distance interaction with upstream U5 sequences in the BMH conformation. The presence of this U5-AUG duplex was confirmed by computer-assisted RNA structure prediction, biochemical analyses, and a phylogenetic survey of different virus isolates. The U5-AUG duplex may influence translation of the Gag protein because it occludes the start codon of the Gag open reading frame.

10.
RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at http://www.daimi.au.dk/~compbio/pfold.

11.
MOTIVATION: Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remains a challenge in bioinformatics research. METHODS AND RESULTS: We present a new sampling-based algorithm to predict common RNA secondary structures in multiple unaligned sequences. Our algorithm finds the common structure between two sequences by probabilistically sampling aligned stems based on stem conservation calculated from intrasequence base pairing probabilities and intersequence base alignment probabilities. It iteratively updates these probabilities based on sampled structures and subsequently recalculates stem conservation using the updated probabilities. The iterative process terminates upon convergence of the sampled structures. We extend the algorithm to multiple sequences by a consistency-based method, which iteratively incorporates and reinforces consistent structure information from pairwise comparisons into consensus structures. The algorithm has no limitation on predicting pseudoknots. In extensive testing on real sequence data, our algorithm outperformed other leading RNA structure prediction methods in both sensitivity and specificity with a reasonably fast speed. It also generated better structural alignments than other programs in sequences of a wide range of identities, which more accurately represent the RNA secondary structure conservations. AVAILABILITY: The algorithm is implemented in a C program, RNA Sampler, which is available at http://ural.wustl.edu/software.html

12.
The coexistence of multiple codes in the genome of human immunodeficiency virus type 1 (HIV-1) was analyzed. We explored factors constraining the variability of the virus genome primarily in relation to conserved RNA secondary structures overlapping coding sequences, and used a simple combination of algorithms for RNA secondary structure prediction based on the nearest-neighbor thermodynamic rules and a statistical approach. In our previous study, we applied this combination to a non-redundant data set of env nucleotide sequences, confirmed the conserved secondary structure of the rev-responsive element (RRE) and found a new RNA structure in the first conserved (C1) region of the env gene. In this study, we analyzed the variability of putative RNA secondary structures inside the nef gene of HIV-1 by applying these algorithms to a non-redundant data set of 104 nef sequences retrieved from the Los Alamos HIV database, and predicted the existence of a novel functional RNA secondary structure in the β3/β4 regions of nef. The predicted RNA fold in the β3/β4 region of nef appears in two forms with different loop sizes. The loop of the first fold consists of seven nucleotides (positions 494–500), with consensus UCAAGCU appearing in 79% of sequences. The other has a five-base loop (positions 495–499) with consensus CAAGC. The difference in size between these two loops may reflect the difference between respective counterparts in the hairpin recognition. This may also have an adaptive biological significance.

13.
Functional RNA structures tend to be conserved during evolution. This finding is, for example, exploited by comparative methods for RNA secondary structure prediction that currently provide the state of the art in terms of prediction accuracy. We here provide strong evidence that homologous RNA genes not only fold into similar final RNA structures, but that their folding pathways also share common transient structural features that have been evolutionarily conserved. For this, we compile and investigate a non-redundant data set of 32 sequences with known transient and final RNA secondary structures and devise a dedicated computational analysis pipeline.

14.
A comparative analysis of TAR RNA structures in human and simian immunodeficiency viruses reveals the conservation of certain structural features despite the divergence in sequence. Both the TAR elements of HIV-1 and SIV-chimpanzee can be folded into relatively simple one-stem hairpin structures. Chemical and RNAase probes were used to analyze the more complex structure of HIV-2 TAR RNA, which folds into a branched hairpin structure. A surprisingly similar RNA conformation can be proposed for SIV-mandrill, despite considerable divergence in nucleotide sequence. A third structural presentation of TAR sequences is seen for SIV-African green monkey. These results are generally consistent with the classification of HIV-SIV viruses in four subgroups based on sequence analyses (both nucleotide and amino acid sequences). However, some conserved TAR structures were detected for members of different virus subgroups. It is therefore proposed that RNA structure analysis might provide an additional tool for determining phylogenetic relationships among the HIV-SIV viruses.

15.

Background

Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.

Results

On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance.

Conclusions

By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
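The chunk, predict, and reconstruct idea described above can be sketched as follows. This is a minimal illustration that assumes regular fixed-size chunks and a stand-in predictor; it does not reproduce the centered or inversion-based chunking methods, the seven prediction programs, or the MapReduce/Hadoop framework of the study. The names `chunk`, `predict_by_chunks`, and `toy_predict` are hypothetical.

```python
# Watson-Crick pairs used by the stand-in predictor (assumption for this sketch).
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def chunk(seq, size):
    """Cut seq into consecutive pieces of at most `size` bases, keeping each offset."""
    return [(start, seq[start:start + size]) for start in range(0, len(seq), size)]


def toy_predict(piece):
    """Stand-in predictor: pair the two ends inward while they are complementary.

    A real pipeline would call a thermodynamic folding program here instead.
    """
    pairs = []
    i, j = 0, len(piece) - 1
    while j - i > 3 and (piece[i], piece[j]) in WC:
        pairs.append((i, j))
        i += 1
        j -= 1
    return pairs


def predict_by_chunks(seq, size, predict=toy_predict):
    """Predict each chunk independently, then shift pairs back to whole-sequence indices."""
    pairs = []
    for offset, piece in chunk(seq, size):
        pairs.extend((i + offset, j + offset) for i, j in predict(piece))
    return sorted(pairs)


if __name__ == "__main__":
    print(predict_by_chunks("GGAAAACCCCUUUUGG", 8))
```

Because each chunk is predicted independently, the per-chunk calls map naturally onto parallel workers; the price is that base pairs spanning two chunks cannot be found, which is why the choice of chunking method matters.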

16.
17.
A series of unusual folding regions (UFR) immediately 3' to the cleavage site of the outer membrane protein (OMP) and transmembrane protein (TMP) were detected in the envelope gene RNA of the human immunodeficiency virus (HIV-1, HIV-2) and simian immunodeficiency virus (SIV) by an extensive Monte Carlo simulation. These RNA secondary structures were predicted to be both highly stable and statistically significant. In the calculation, twenty-five different sequence isolates of HIV-1, three isolates of HIV-2 and eight sequences of SIV were included. Although significant sequence divergence occurs in the env coding regions of these viruses, a distinct UFR of 234-nt is consistently located ten nucleotides 3' to the cleavage site of the OMP/TMP in HIV-1, and a 216-nt UFR occurs forty-six and forty-nine nucleotides downstream from the OMP/TMP cleavage site of HIV-2 and SIV, respectively. Compensatory base changes in the helical stem regions of these conserved RNA secondary structures are identified. These results support the hypothesis that these special RNA folding regions are functionally important and suggest that the role of this sequence as the Rev response element (RRE) is mediated by secondary structure as well as primary RNA sequence.

18.
BACKGROUND: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.
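The mutual information measure mentioned above can be computed for two alignment columns as MI(i, j) = Σ f_ij(x, y) · log2( f_ij(x, y) / (f_i(x) · f_j(y)) ), where f_i and f_j are the observed base frequencies in columns i and j and f_ij is the observed frequency of the base pair (x, y). The sketch below is a plain-Python illustration of that standard formula, not MIfold's MATLAB code; the function name is hypothetical, and gap handling and small-sample corrections are deliberately left out.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    alignment: list of equal-length sequences. Every character, including a
    gap symbol, is simply counted as a symbol in this sketch (assumption).
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = col_i[x] / n
        p_y = col_j[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Columns 0 and 2 covary perfectly (G-C, C-G, A-U, U-A); column 1 is invariant.
    aln = ["GAC", "CAG", "AAU", "UAA"]
    print(mutual_information(aln, 0, 2))  # covarying columns
    print(mutual_information(aln, 0, 1))  # invariant column carries no information
```

With four bases the value ranges from 0 bits (independent or invariant columns) to 2 bits (perfect covariation over all four bases), so high-scoring column pairs are candidates for conserved base pairs.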

19.
A new approach is proposed for determining common RNA secondary structures within a set of homologous RNAs. The approach is a combination of phylogenetic and thermodynamic methods which is based on the prediction of optimal and suboptimal secondary structures, topological similarity searches and phylogenetic comparative analysis. The optimal and suboptimal RNA secondary structures are predicted by energy minimization. Structural comparison of the predicted RNA secondary structures is used to find conserved structures that are topologically similar in all these homologous RNAs. The validity of the conserved structural elements found is then checked by phylogenetic comparison of the sequences. This procedure is used to predict common structures of ribonuclease P (RNAase P) RNAs.

20.
A conserved secondary structure for telomerase RNA.   Cited by: 41 (self-citations: 0, citations by others: 41)
D P Romero, E H Blackburn. Cell 1991, 67(2):343-353
The RNA moiety of the ribonucleoprotein enzyme telomerase contains the template for telomeric DNA synthesis. We present a secondary structure model for telomerase RNA, derived by a phylogenetic comparative analysis of telomerase RNAs from seven tetrahymenine ciliates. The telomerase RNA genes from Tetrahymena malaccensis, T. pyriformis, T. hyperangularis, T. pigmentosa, T. hegewishii, and Glaucoma chattoni were cloned, sequenced, and compared with the previously cloned RNA gene from T. thermophila and with each other. To define secondary structures of these RNAs, homologous complementary sequences were identified by the occurrence of covariation among putative base pairs. Although their primary sequences have diverged rapidly overall, a strikingly conserved secondary structure was identified for all these telomerase RNAs. Short regions of nucleotide conservation include a block of 22 totally conserved nucleotides that contains the telomeric templating region.
