Similar literature
20 similar documents found.
1.
K Han, H J Kim. Nucleic acids research, 1993, 21(5):1251-1257
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The most plausible 1-5 structures are produced in O(MN^2 + N^3) time and O(N^2) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is the first algorithm probably practical both in terms of time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.

2.
3.
We have analyzed sequence covariation in an alignment of 266 non-redundant SH3 domain sequences using chi-squared statistical methods. Artifactual covariations arising from close evolutionary relationships among certain sequence subgroups were eliminated using empirically derived sequence diversity thresholds. This covariation detection method was able to predict residue-residue contacts (side-chain centres of mass within 8 Å) in the structure of the SH3 domain with an accuracy of 85%, which is greater than that achieved in many previous covariation studies. In examining the positions involved most frequently in covariations, we discovered a dramatic over-representation of a subset of five hydrophobic core positions. This covariation information was used to design second and third site substitutions that could compensate for highly destabilizing hydrophobic core substitutions in the Fyn SH3 domain, thus providing experimental data to validate the covariation analysis. The testing of our covariation detection method on 15 other alignments showed that the accuracy of contact prediction is highly variable depending on which sequence alignment is used, and useful levels of prediction accuracy were obtained with only approximately one-third of alignments. The results presented here provide insight into the difficulties inherent in covariation analysis, and suggest that it may have limited usefulness in tertiary structure prediction. On the other hand, our ability to use covariation analysis to design stabilizing combinations of hydrophobic core substitutions attests to its potential utility for gaining deeper insight into the stability determinants and functional mechanisms of proteins with known three-dimensional structures.
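
As an illustration of the chi-squared covariation idea mentioned in this abstract, the following is a minimal sketch, not the authors' method. It computes a plain Pearson chi-squared statistic for two alignment columns; the function name `chi_squared_covariation` and the toy alignments are assumptions, and the sequence diversity thresholds described in the abstract are not modeled.

```python
from collections import Counter


def chi_squared_covariation(alignment, i, j):
    """Pearson chi-squared statistic for columns i and j of an alignment.

    `alignment` is a list of equal-length strings (hypothetical input format).
    A large value means the residues at i and j co-occur more often than
    expected if the two columns varied independently.
    """
    n = len(alignment)
    row_counts = Counter(seq[i] for seq in alignment)   # residue counts at column i
    col_counts = Counter(seq[j] for seq in alignment)   # residue counts at column j
    observed = Counter((seq[i], seq[j]) for seq in alignment)

    statistic = 0.0
    for a in row_counts:
        for b in col_counts:
            expected = row_counts[a] * col_counts[b] / n
            statistic += (observed[(a, b)] - expected) ** 2 / expected
    return statistic


# Toy data (invented): perfectly covarying columns vs. independent columns.
covarying = ["AL", "AL", "VI", "VI"]
independent = ["AL", "AI", "VL", "VI"]
print(chi_squared_covariation(covarying, 0, 1))
print(chi_squared_covariation(independent, 0, 1))
```

In practice the statistic would be compared against a significance threshold, and closely related sequences would be filtered first, as the abstract notes.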

4.
We describe a computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all favorable pairings with respect to the Turner energy function are identified. Each potentially paired region within a multiple sequence alignment is scored using a function that combines both predicted free energy and sequence covariation with optimized weightings. High scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5S rRNA, SRP RNA, tmRNA, and 16S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters have been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences and predictions improve only moderately with the inclusion of additional sequences.

5.
MOTIVATION: As more non-coding RNAs are discovered, the importance of methods for RNA analysis increases. Since the structure of ncRNA is intimately tied to the function of the molecule, programs for RNA structure prediction are necessary tools in this growing field of research. Furthermore, it is known that RNA structure is often evolutionarily more conserved than sequence. However, few existing methods are capable of simultaneously considering multiple sequence alignment and structure prediction. RESULT: We present a novel solution to the problem of simultaneous structure prediction and multiple alignment of RNA sequences. Using Markov chain Monte Carlo in a simulated annealing framework, the algorithm MASTR (Multiple Alignment of STructural RNAs) iteratively improves both sequence alignment and structure prediction for a set of RNA sequences. This is done by minimizing a combined cost function that considers sequence conservation, covariation and basepairing probabilities. The results show that the method is very competitive with similar programs available today, both in terms of accuracy and computational efficiency. AVAILABILITY: Source code available from http://mastr.binf.ku.dk/
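
The simulated annealing framework named in this abstract can be illustrated with a generic sketch. This is not MASTR's implementation: the Metropolis acceptance rule and geometric cooling schedule are standard textbook choices assumed here, and `acceptance_probability`, `anneal`, the toy cost function and all parameter defaults are hypothetical.

```python
import math
import random


def acceptance_probability(delta_cost, temperature):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature)


def anneal(state, cost, propose, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Minimize `cost` by simulated annealing; returns (best_state, best_cost).

    `propose(state, rng)` must return a neighbouring state. In an alignment
    setting it would perturb the alignment or structure; here it is generic.
    """
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if rng.random() < acceptance_probability(delta, temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Toy usage: minimize a one-dimensional quadratic over the integers.
best, best_cost = anneal(
    10,
    cost=lambda x: (x - 3) ** 2,
    propose=lambda x, rng: x + rng.choice((-1, 1)),
)
print(best, best_cost)
```

A combined cost of the kind the abstract describes would replace the toy quadratic, with terms for sequence conservation, covariation and base-pairing probabilities.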

6.
Shang L, Xu W, Ozer S, Gutell RR. PloS one, 2012, 7(6):e39383
Covariation analysis is used to identify those positions with similar patterns of sequence variation in an alignment of RNA sequences. These constraints on the evolution of two positions are usually associated with a base pair in a helix. While mutual information (MI) has been used to accurately predict an RNA secondary structure and a few of its tertiary interactions, early studies revealed that phylogenetic event counting methods are more sensitive and provide extra confidence in the prediction of base pairs. We developed a novel and powerful phylogenetic events counting method (PEC) for quantifying positional covariation with the Gutell lab's new RNA Comparative Analysis Database (rCAD). The PEC and MI-based methods each identify unique base pairs, and jointly identify many other base pairs. In total, both methods in combination with an N-best and helix-extension strategy identify the maximal number of base pairs. While covariation methods have effectively and accurately predicted RNA secondary structure, only a few tertiary structure base pairs have been identified. Analyses presented herein and at the Gutell lab's Comparative RNA Web (CRW) Site reveal that the majority of these latter base pairs do not covary with one another. However, covariation analysis does reveal a weaker although significant covariation between sets of nucleotides that are in proximity in the three-dimensional RNA structure. This reveals that covariation analysis identifies other types of structural constraints beyond the two nucleotides that form a base pair.
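
The mutual information (MI) measure referred to in this abstract can be sketched as follows. This is the standard column-pair MI definition, offered as an illustration only; it is not the PEC method and not code from rCAD. The function name `mutual_information` and the toy alignment are assumptions.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings (hypothetical input format).
    MI is high when knowing the base at column i predicts the base at column j,
    as expected for two positions that form a covarying base pair.
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (invented): columns 0 and 4 covary, column 1 is invariant.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(mutual_information(toy_alignment, 0, 4))  # 2.0
print(mutual_information(toy_alignment, 0, 1))  # 0.0
```

Note that MI alone does not check whether the covarying bases can actually pair, and it gives zero for perfectly conserved pairs, which is one reason the abstract combines it with other strategies.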

7.
Sumedha, Martin OC, Wagner A. Bio Systems, 2007, 90(2):475-485
RNA secondary structure is an important computational model to understand how genetic variation maps into phenotypic (structural) variation. Evolutionary innovation in RNA structures is facilitated by neutral networks, large connected sets of RNA sequences that fold into the same structure. Our work extends and deepens previous studies on neutral networks. First, we show that even the 1-mutant neighborhood of a given sequence (genotype) G0 with structure (phenotype) P contains many structural variants that are not close to P. This holds for biological and generic RNA sequences alike. Second, we analyze the relation between new structures in the 1-neighborhoods of genotypes Gk that are only a moderate Hamming distance k away from G0, and the structure of G0 itself, both for biological and for generic RNA structures. Third, we analyze the relation between mutational robustness of a sequence and the distances of structural variants near this sequence. Our findings underscore the role of neutral networks in evolutionary innovation, and the role that high robustness can play in diminishing the potential for such innovation.
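
The 1-mutant neighborhood and Hamming distance used in this abstract are simple to make concrete. The sketch below is illustrative only; the function names and the example sequence are assumptions, and no folding step is included.

```python
def one_mutant_neighbors(seq, alphabet="ACGU"):
    """Yield every sequence that differs from `seq` at exactly one position."""
    for pos, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:pos] + alt + seq[pos + 1:]


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# A sequence of length L over a 4-letter alphabet has 3 * L one-mutant neighbors.
g0 = "GCAU"  # invented example genotype
neighbors = list(one_mutant_neighbors(g0))
print(len(neighbors))  # 12
print(all(hamming_distance(g0, g) == 1 for g in neighbors))  # True
```

A study like the one described would fold each neighbor with a structure prediction program and compare the resulting structure to that of G0; that step is outside this sketch.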

8.
Proteins with similar structures are generally assumed to arise from similar sequences. However, there are more cases than not where this is not true. The dogma is that sequence determines structure; how, then, can very different sequences fold to the same structure? Here, we employ high temperature unfolding simulations to probe the pathways and specific interactions that direct the folding and unfolding of the SH3 domain. The SH3 metafold in the Dynameomics Database consists of 753 proteins with the same structure, but varied sequences and functions. To investigate the relationship between sequence and structure, we selected 17 targets from the SH3 metafold with high sequence variability. Six unfolding simulations were performed for each target and transition states were identified, revealing two general folding/unfolding pathways at the transition state. Transition states were also expressed as mathematical graphs of connected chemical nodes, and it was found that three positions within the structure, independent of sequence, were consistently more connected within the graph than any other nearby positions in the sequence. These positions represent a hub connecting different portions of the structure. Multiple sequence alignment and covariation analyses also revealed certain positions that were more conserved due to packing constraints and stabilizing long-range contacts. This study demonstrates that members of the SH3 domain with different sequences can unfold through two main pathways, but certain characteristics are conserved regardless of the sequence or unfolding pathway. While sequence determines structure, we show that disparate sequences can provide similar interactions that influence folding and lead to similar structures.

9.
BACKGROUND: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.

10.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Consequently, the sequence comparison in searching for similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of the alignments is better than or comparable with that of the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.
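
To make the notion of fixed-length stem candidates concrete, here is a minimal sketch that enumerates pairs of fixed-length fragments able to form a stem within one sequence. It is not SCARNA's algorithm (which selects candidates differently and then aligns them across sequences); the function name `stem_candidates`, the allowed pair set, and the default stem and loop lengths are assumptions.

```python
# Watson-Crick pairs plus G-U wobble pairs (an assumed pairing rule).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_candidates(seq, stem_len=3, min_loop=3):
    """Return (i, j) start indices of fragment pairs that can form a stem.

    The 5' fragment is seq[i:i+stem_len] and the 3' fragment is
    seq[j:j+stem_len]; base i+k pairs with base j+stem_len-1-k
    (antiparallel). At least `min_loop` unpaired bases separate the fragments.
    """
    n = len(seq)
    found = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if all(
                (seq[i + k], seq[j + stem_len - 1 - k]) in PAIRS
                for k in range(stem_len)
            ):
                found.append((i, j))
    return found


# Toy hairpin (invented): GGG pairs with CCC around an AAAA loop.
print(stem_candidates("GGGAAAACCC"))  # [(0, 7)]
```

This brute-force scan is quadratic in sequence length; a method aimed at large-scale analyses would prune candidates, for example by base-pairing probability.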

11.
In this study we apply a genetic algorithm to a set of RNA sequences to find common RNA secondary structures. Our method is a three-step procedure. At the first stage of the procedure for each sequence, a genetic algorithm is used to optimize the structures in a population to a certain degree of stability. In this step, the free energy of a structure is the fitness criterion for the algorithm. Next, for each structure, we define a measure of structural conservation with respect to those in other sequences. We use this measure in a genetic algorithm to improve the structural similarity among sequences for the structures in the population of a sequence. Finally, we select those structures satisfying certain conditions of structural stability and similarity as predicted common structures for a set of RNA sequences. We have obtained satisfactory results from sets of tRNA, 5S rRNA, rev response elements (RRE) of HIV-1 and RRE of HIV-2/SIV.

12.
RNA molecules, which are found in all living cells, fold into characteristic structures that account for their diverse functional activities. Many of these RNA structures consist of a collection of fundamental RNA motifs. The various combinations of RNA basic components form different RNA classes and define their unique structural and functional properties. The availability of many genome sequences makes it possible to search computationally for functional RNAs. Biological experiments indicate that functional RNAs have characteristic RNA structural motifs represented by specific combinations of base pairings and conserved nucleotides in the loop regions. Searching for those well-ordered RNA structures and their homologues in genomic sequences is very helpful for the understanding of RNA-based gene regulation. In this paper, we consider the following problem: given an RNA sequence with a known secondary structure, efficiently determine candidate segments in genomic sequences that can potentially form RNA secondary structures similar to the given RNA secondary structure. Our new bottom-up approach first searches for all potential stem-loops similar to those of the given RNA secondary structure, and then, based on the located stem-loops, detects potential homologous structural RNAs in genomic sequences.

13.
Comparative sequence analysis complements experimental methods for the determination of RNA three-dimensional structure. This approach is based on the concept that different sequences within the same gene family form similar higher-order structures. The large number of rRNA sequences with sufficient variation, along with improved covariation algorithms, are providing us with the opportunity to identify new base triples in 16S rRNA. The three-dimensional conformations for one of our strongest candidates involving U121 (C124:G237) and/or U121 (U125:A236) (Escherichia coli sequence and numbering) are analyzed here with different molecular modeling tools. Molecular modeling shows that U121 interacts with C124 in the U121 (C124:G237) base triple. This arrangement maintains isomorphic structures for the three most frequent sequence motifs (approximately 93% of known bacterial and archaeal sequences), is consistent with chemical reactivity of U121 in E. coli ribosomes, and is geometrically favorable. Further, the restricted set of observed canonical (GU, AU, GC) base-pair types at positions 124:237 and 125:236 is consistent with the fact that the canonical base-pair sets (for both base pairs) that are not observed in nature prevent the formation of the 121 (124:237) base triple. The analysis described here serves as a general scheme for the prediction of specific secondary and tertiary structure base pairing where there is a network of correlated base changes.

14.
In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information.

15.
We propose a new method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of thermodynamic structure prediction and phylogenetic comparison. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of approximately 10 sequences, efficiently exploiting the information contained in the sequence variability. The procedure constructs a prediction only for those parts of sequences that are consistent with a single conserved structure. Our implementation produces reasonable consensus structures without user interference. As an example we have analysed the complete HIV-1 and hepatitis C virus (HCV) genomes as well as the small segment of hantavirus. Our method confirms the known structures in HIV-1 and predicts previously unknown conserved RNA secondary structures in HCV.

16.
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA's structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences, defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.

17.
Secondary structures of RNA sequences are increasingly being used as additional information in reconstructing phylogenies and/or in distinguishing species by compensatory base change (CBC) analyses. However, in most cases just one secondary structure is used in manually correcting an automatically generated multiple sequence alignment and/or just one secondary structure is used in guiding a sequence alignment still completely generated by hand. With the advent of databases and tools offering individual RNA secondary structures, here we re-introduce a twelve letter code already implemented in 4SALE – a tool for synchronous sequence and secondary structure alignment and editing – that enables one to align RNA sequences and their individual secondary structures synchronously and fully automatically, while dramatically increasing the phylogenetic information content. We further introduce a scaled down non-GUI version of 4SALE particularly designed for big data analysis, and available at: http://4sale.bioapps.biozentrum.uni-wuerzburg.de.

18.
19.
RNA molecules and in particular the nuclear SSU RNA play an important role in molecular systematics. With the advent of increasingly parameterized substitution models in systematic research, the incorporation of secondary-structure information became a realistic option compensating interdependence of character variation. As a prerequisite, consensus structures of eukaryotic SSU RNA molecules have become available through extensive comparative analyses and crystallographic studies. Despite extensive research in hexapod phylogenetics, consensus SSU RNA secondary structures focusing on hexapods have not yet been explored. In this study, we compiled a representative hexapod SSU data set of 261 sequences and inferred a specific consensus SSU secondary-structure model. Our search for conserved structural motifs relied on a combined approach of thermodynamic and covariation analyses. The hexapod consensus-structure model deviates from the canonical eukaryotic model in a number of helices. Additionally, in several helices the hexapod sequences did not support a single consensus structure. We provide consensus structures of these sections of single less-inclusive taxa, thus facilitating the adaptation of the consensus hexapod model to less-inclusive phylogenetic questions. The secondary-structure catalog will foster the application of RNA structure models in phylogenetic analyses using the SSU rRNA molecule, and it will improve the realism of substitution models and the reliability of reconstructions based on rRNA sequences.

20.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.
