Similar Documents
1.
Large amounts of raw biological sequence data have been generated by the Human Genome Project and related efforts. Understanding the structural information encoded in biological sequences is important for learning their biochemical functions, but it remains a fundamental challenge. Recent interest in RNA regulation has led to rapid growth in the number of RNA secondary structures deposited in various databases. However, the functional classification and characterization of RNA structures have only been partially addressed. This article introduces a novel interval-based distance metric for structure-based RNA function assignment. RNA structures are characterized by distance vectors learned from a collection of predicted structures. The distance measure considers intersecting, disjoint, and inclusion (nested) relationships between intervals. A set of pseudoknotted RNA structures with known functions is used as a reference, and the function of a query structure is determined by measuring its structural similarity to them. This not only offers sequence distance criteria for measuring the similarity of secondary structures but also aids the functional classification of RNA structures with pseudoknots.
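
The three interval relationships named above can be illustrated with a small sketch. It assumes that each base pair (i, j) is treated as a closed interval [i, j] and that no two pairs share a position; the function name `interval_relation` is hypothetical, and the abstract does not say how the relationships are weighted into the final distance vector.

```python
def interval_relation(a, b):
    """Classify two base-pair intervals (i, j) with i < j and distinct endpoints.

    'disjoint'    : the intervals share no position (pairs side by side)
    'inclusion'   : one interval lies inside the other (nested pairs)
    'intersected' : the intervals overlap without nesting (crossing pairs,
                    i.e. the signature of a pseudoknot)
    """
    (i1, j1), (i2, j2) = sorted([a, b])  # ensure i1 <= i2
    if j1 < i2:
        return "disjoint"
    if j2 <= j1:
        return "inclusion"
    return "intersected"


print(interval_relation((1, 10), (5, 15)))  # intersected
print(interval_relation((1, 20), (5, 15)))  # inclusion
print(interval_relation((1, 4), (6, 9)))    # disjoint
```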

2.

Background

Measuring similarities between tree-structured data is important for the analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, almost no software tools are available that can compute the exact edit distance for unordered trees.

Results

In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and efficient solvers for the maximum clique problem are then applied. We applied the proposed method to similar-structure search for glycan structures. The results suggest that the proposed method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and of an existing method for glycan search.

Conclusions

The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.

3.
Most molecular analyses, including phylogenetic inference, are based on sequence alignments. We present an algorithm that estimates relatedness between biomolecules without requiring sequence alignment, using a protein frequency matrix reduced by singular value decomposition (SVD) in a latent semantic indexing information retrieval system. Two databases were used: one with 832 proteins from 13 mitochondrial gene families, and another with 1000 sequences from nine types of proteins retrieved from GenBank. First, 208 sequences from the first database and 200 from the second were randomly selected, and each pair of sequences was compared using the edit distance as well as the corresponding cosine and Euclidean distances from the SVD. The correlation between cosine and edit distance was -0.32 (P < 0.01), and that between Euclidean distance and edit distance was +0.70 (P < 0.01). To check the ability of SVD to classify sequences according to their categories, we used a sample of 202 sequences from the 13 gene families as queries (test set), and the remaining 630 proteins were used to generate the frequency matrix (training set). The classification algorithm applies a voting scheme based on the five sequences most similar to each query. With a 3-peptide frequency matrix, all 202 queries were correctly classified (accuracy = 100%). This algorithm is very attractive because sequence alignments are neither generated nor required. To achieve results similar to those obtained with edit distance analysis, we recommend using Euclidean distance as the similarity measure for protein sequences in latent semantic indexing methods.
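
The 3-peptide counting and five-nearest-neighbour voting steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: the SVD / latent semantic indexing reduction is omitted, so distances are taken on raw 3-peptide counts, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
import math


def kmer_counts(seq, k=3):
    """Count overlapping k-peptides (3-peptides by default) in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean(c1, c2):
    """Euclidean distance between two sparse k-mer count vectors."""
    keys = set(c1) | set(c2)
    return math.sqrt(sum((c1[m] - c2[m]) ** 2 for m in keys))


def classify(query, training, n_neighbours=5):
    """Label the query by majority vote over its n nearest training sequences.

    training: list of (sequence, label) pairs.
    """
    q = kmer_counts(query)
    ranked = sorted(training, key=lambda item: euclidean(q, kmer_counts(item[0])))
    votes = Counter(label for _, label in ranked[:n_neighbours])
    return votes.most_common(1)[0][0]


# Toy, hypothetical sequences (not taken from the study's databases).
training = [
    ("MKVLAAGIVALL", "A"), ("MKVLAAGIVSLL", "A"), ("MKVLAAGLVALL", "A"),
    ("WWHHPPRTEDCY", "B"), ("WWHHPPRTEDCF", "B"), ("WWHHPPKTEDCY", "B"),
]
print(classify("MKVLAAGIVALF", training))  # A
```

With five neighbours and only six training sequences, the query's three family-A neighbours outvote the two family-B sequences that fill the remaining slots.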

4.
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC alphabet, and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules with chain lengths up to 100. The results of the RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure of the distance d_t between two structures. We compute a structure density surface as the conditional probability that two structures have distance t, given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures of the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate of the mean sensitivity of structures to point mutations. © 1993 John Wiley & Sons, Inc.
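
To make "secondary structures are represented as trees" concrete, here is a minimal sketch that converts a structure in dot-bracket notation (an assumed input format; the abstract does not specify one) into a rooted ordered tree. It uses one simple encoding, with base pairs as internal nodes and unpaired bases as leaves, which is not necessarily the representation used in the paper; the function names are hypothetical and the tree-editing step itself is not shown.

```python
def dot_bracket_to_tree(structure):
    """Convert a dot-bracket string into a rooted ordered tree.

    Each base pair '(' ... ')' becomes an internal node labelled 'P',
    each unpaired base '.' becomes a leaf labelled 'U', and the root 'R'
    stands for the exterior part of the chain.
    Nodes are (label, children) tuples.
    """
    root = ("R", [])
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = ("P", [])
            stack[-1][1].append(node)  # attach to the enclosing pair (or root)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ".":
            stack[-1][1].append(("U", []))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def to_string(node):
    """Render a tree in a compact bracketed form for inspection."""
    label, children = node
    if not children:
        return label
    return label + "(" + ",".join(to_string(c) for c in children) + ")"


print(to_string(dot_bracket_to_tree("((..)).")))  # R(P(P(U,U)),U)
```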

5.
Elements of local tertiary structure in RNA molecules are important for understanding structure-function relationships. The loop E motif, first identified at functional sites in several eukaryotic RNAs that share an exceptional propensity for UV crosslinking between specific bases, was subsequently shown to have a characteristic tertiary structure. Common sequences and secondary structures have allowed other examples of the loop E motif to be recognized in a number of RNAs at sites of protein binding or other biological function. We asked whether further elements of local tertiary structure, in addition to loop E, can be identified by such common features. The highly structured circular RNA genome of the hepatitis D virus (HDV) provides an ideal test molecule because it has extensive internal structure, a UV-crosslinkable tertiary element, and specific sites for functional interactions with proteins, including host PKR. We have now found a UV-crosslinkable element of local tertiary structure in antigenomic HDV RNA which, although it differs from loop E, has a pattern of sequence and secondary structure very similar to that of the UV-crosslinkable element found in the genomic strand. Although the two structures map close to one another, the sequences comprising them are not templates for each other. Instead, the template regions for each element are additional sites of potential higher-order structure on their respective complementary strands. This wealth of recurring sequences interspersed with base-paired stems provides a context for examining other RNA species for such features and their correlations with biological function.

6.
We investigated the relationship between RNA structure and folding rates, accounting for hierarchical structure formation. Folding rates of two-state folding proteins correlate well with relative contact order, a quantitative measure of the number of tertiary contacts and the sequence distance between them. These proteins do not form stable structures before the rate-limiting step. In contrast, most secondary structures are stably formed before the rate-limiting step in RNA folding. Accordingly, we introduce "reduced contact order", a metric that reflects only the number of residues available to participate in the conformational search after secondary structure has formed. Plotting the folding rates against the reduced contact order for ten different RNAs suggests that RNA folding can be divided into two classes. To examine this division, folding rates of circularly permuted isomers are compared for two RNAs, one from each class. Folding rates vary tenfold among circularly permuted Bacillus subtilis RNase P RNA isomers, whereas they vary only 1.2-fold among circularly permuted catalytic domains. This difference is likely related to the dissimilar natures of their rate-limiting steps.
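
The relative contact order mentioned above is commonly defined as the mean sequence separation of the contacting residue pairs divided by the chain length, RCO = (1 / (L·N)) Σ |j − i| over the N contacts. The sketch below computes that standard quantity only; the "reduced contact order" introduced in the abstract is not defined precisely there, so it is not implemented, and the contact list is a hypothetical example.

```python
def relative_contact_order(contacts, length):
    """Relative contact order: mean sequence separation |j - i| of the
    contacts, divided by the chain length L.

    contacts: iterable of (i, j) residue index pairs in contact.
    length:   number of residues L in the chain.
    """
    contacts = list(contacts)
    if not contacts or length <= 0:
        raise ValueError("need at least one contact and a positive length")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (length * len(contacts))


# Hypothetical contacts on a 20-residue chain: separations 4, 18 and 4.
print(relative_contact_order([(1, 5), (2, 20), (10, 14)], length=20))
```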

7.
RNA performs a variety of diverse functions and therefore must adopt many different three-dimensional conformations. The number and complexity of available RNA structures are steadily increasing, necessitating versatile structure visualization tools. Here, we describe coloRNA, a new display program for visualizing RNA secondary and tertiary structure. The program colors each nucleotide in a secondary structure schematic according to the value of an assigned property of the corresponding backbone phosphate group, such as the distance between corresponding residues in two atomic models of the same RNA molecule. To assist in analyzing tertiary structure, coloRNA also colors nucleotides based on the three-dimensional distances between a user-selected nucleotide and all others. Minimum and maximum thresholds can be used to focus on, or eliminate, a particular value range. coloRNA can highlight a user-specified group of nucleotides by outlining it in an automatically assigned, but user-changeable, color. As an example, we used coloRNA to analyze a pair of recently published structures of the Escherichia coli 70S ribosome. When coloRNA displays the conformational difference between the two structures, the large movement of the small-subunit head stands out visually against the background changes in the remaining domains of the small subunit.

8.
9.
A k-noncrossing RNA pseudoknot structure is a graph over {1, …, n} without 1-arcs, i.e. arcs of the form (i, i+1), and in which there exists no k-set of mutually intersecting arcs. In particular, RNA secondary structures are 2-noncrossing RNA structures. In this paper we prove a central and a local limit theorem for the distribution of the number of 3-noncrossing RNA structures over n nucleotides with exactly h bonds. Our analysis employs the generating function of k-noncrossing RNA pseudoknot structures and the asymptotics of its coefficients. The results of this paper explain the findings on the number of arcs in RNA secondary structures obtained by molecular folding algorithms, and are relevant to prediction algorithms for k-noncrossing RNA structures.
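
The definition above can be checked directly on small examples. In this sketch (the function name `is_k_noncrossing` is hypothetical), k arcs (i1, j1), …, (ik, jk) count as mutually intersecting when i1 < … < ik < j1 < … < jk, and each vertex is assumed to belong to at most one arc. It enumerates every k-subset of arcs, so it is only suitable for small inputs and is unrelated to the generating-function analysis in the paper.

```python
from itertools import combinations


def is_k_noncrossing(arcs, n, k):
    """Return True if `arcs` over {1..n} form a k-noncrossing RNA structure.

    Requirements checked: every vertex lies in at most one arc, there are
    no 1-arcs (i, i+1), and no k arcs mutually cross. Brute force over all
    k-subsets of arcs, so only practical for small examples.
    """
    arcs = sorted((min(a), max(a)) for a in arcs)
    used = [v for arc in arcs for v in arc]
    if len(used) != len(set(used)) or any(not 1 <= v <= n for v in used):
        return False
    if any(j - i == 1 for i, j in arcs):
        return False
    # combinations() keeps the left-endpoint order of the sorted arcs
    for group in combinations(arcs, k):
        lefts = [i for i, _ in group]
        rights = [j for _, j in group]
        # mutually crossing: i1 < ... < ik < j1 < ... < jk
        if lefts[-1] < rights[0] and rights == sorted(rights):
            return False
    return True


print(is_k_noncrossing([(1, 6), (2, 5)], n=6, k=2))  # True: nested, a secondary structure
print(is_k_noncrossing([(1, 5), (3, 8)], n=8, k=2))  # False: the two arcs cross
print(is_k_noncrossing([(1, 5), (3, 8)], n=8, k=3))  # True: a 3-noncrossing pseudoknot
```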

10.
Interval-based distance function for identifying RNA structure candidates (times cited: 1; self-citations: 0; citations by others: 1)
Many clustering approaches have been developed for biological data analysis; however, applying traditional clustering algorithms to RNA structure data is still challenging because of the complex secondary structures that must be handled during clustering. One of the most critical issues in cluster analysis is the development of appropriate distance measures in high-dimensional space. Traditional distance measures focus on scale issues but ignore the correlation between two values. This article develops a novel interval-based (Hausdorff) distance measure for computing the similarity between characterized structures. Three relationships are considered: perfect match, partially overlapped, and non-overlapped. Finally, we demonstrate the method by analyzing a data set of RNA secondary structures from the Rfam database.
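
For two closed intervals on the real line, the Hausdorff distance reduces to the larger of the differences between their left endpoints and between their right endpoints. The sketch below computes that quantity and labels the three relationships named above for a single pair of intervals; how the article encodes structures as intervals and aggregates per-interval distances is not stated, so both function names and the encoding are hypothetical.

```python
def hausdorff_interval(a, b):
    """Hausdorff distance between closed intervals a = (lo, hi) and b = (lo, hi).

    On the real line this equals max(|lo_a - lo_b|, |hi_a - hi_b|).
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def overlap_relation(a, b):
    """Label a pair of intervals with one of the three relationships."""
    if a == b:
        return "perfect match"
    if a[1] < b[0] or b[1] < a[0]:
        return "non-overlapped"
    return "partially overlapped"


print(hausdorff_interval((0, 10), (2, 3)))  # 7
print(overlap_relation((1, 5), (4, 9)))     # partially overlapped
```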

11.
The sequences and structures of the RNase P RNAs of some Gram-positive bacteria, e.g. Bacillus subtilis, are very different from those of other bacteria. To expand our understanding of the structure and evolution of RNase P RNA in Gram-positive bacteria, we determined the gene sequences encoding RNase P RNAs from 10 additional species in this evolutionary group, doubling the number of sequences available for comparative analysis. The enlarged data set allows refinement of the secondary structure model of these unusual RNase P RNAs and the identification of potential tertiary interactions between P10.1 and L12, and between L5.1 and L15.1. The newly obtained sequences suggest that RNase P RNA underwent an abrupt, dramatic restructuring in the ancestry of the low-G+C Gram-positive bacteria after the divergence of the branches leading to the 'Clostridia and relatives' and the remaining low-G+C Gram-positive species. The unusual structures of the RNase P RNAs of Mycoplasma hyopneumoniae and M. flocculare are apparently derived from RNAs with a Bacillus-like structure rather than from intermediate, partially restructured ancestral RNAs. The structure of the RNase P RNA from the photosynthetic Heliobacillus mobilis supports the relationship of this species to Bacillus and Staphylococcus, rather than to the 'Clostridia and relatives', as suggested by the sequences of their small-subunit ribosomal RNAs.

12.
In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from the Vienna RNA Package to calculate the distance between the 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined as the length of the shortest path from the 5' node to the 3' node in the undirected graph whose edge set consists of edges {i, i + 1}, corresponding to covalent backbone bonds, and edges {i, j}, corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. concluded that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework for studying the expected distance from the 5' to the 3' end of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the 5' and 3' ends of a given RNA sequence with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length-n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database and conclude that the 5'-3' distance is correlated with RNA sequence length.
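
The graph definition above translates directly into a breadth-first search, since every edge has length 1. This sketch computes the 5'-3' distance of a single, given structure, not the expected distance over an ensemble that the paper derives; it assumes a non-empty structure in dot-bracket notation, and the function name is hypothetical.

```python
from collections import deque


def five_three_distance(structure):
    """Shortest-path length from the 5' node to the 3' node in the graph with
    backbone edges {i, i+1} and base-pair edges {i, j}.

    `structure` is a non-empty dot-bracket string (assumed input format).
    """
    n = len(structure)
    adj = [[] for _ in range(n)]  # 0-based node indices
    for i in range(n - 1):        # covalent backbone bonds
        adj[i].append(i + 1)
        adj[i + 1].append(i)
    stack = []
    for j, ch in enumerate(structure):  # base-pair edges
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            adj[i].append(j)
            adj[j].append(i)
    # breadth-first search from the 5' end; all edges have unit length
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[n - 1]


print(five_three_distance("((...))"))   # 1: the ends are paired
print(five_three_distance("(....).."))  # 3
```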

13.
Han K, Nepal C. FEBS Letters, 2007, 581(9):1881-1890.
A complete understanding of protein and RNA structures and their interactions is important for determining the binding sites in protein-RNA complexes. Computational approaches exist for identifying secondary structural elements in proteins from atomic coordinates. However, similar methods have not been developed for RNA, due in part to the very limited structural data so far available. We have developed a set of algorithms for extracting and visualizing secondary and tertiary structures of RNA and for analyzing protein-RNA complexes. These algorithms have been implemented in a web-based program called PRI-Modeler (protein-RNA interaction modeler). Given one or more protein data bank files of protein-RNA complexes, PRI-Modeler analyzes the conformation of the RNA, calculates the hydrogen bond (H bond) and van der Waals interactions between amino acids and nucleotides, extracts secondary and tertiary RNA structure elements, and identifies the patterns of interactions between the proteins and RNAs. This paper presents PRI-Modeler and its application to the hydrogen bond and van der Waals interactions in the most representative set of protein-RNA complexes. The analysis reveals several interesting interaction patterns at various levels. The information provided by PRI-Modeler should prove useful for determining the binding sites in protein-RNA complexes. PRI-Modeler is accessible at http://wilab.inha.ac.kr/primodeler/, and supplementary materials are available in the analysis results section at http://wilab.inha.ac.kr/primodeler/.

14.
The secondary and tertiary structures of interferon were predicted from four homologous amino acid sequences. Three methods of secondary structure prediction gave differing results that were interpreted to suggest that there might be four α-helices that are important in the tertiary fold. The validity of this interpretation was assessed by the application of the methods to predict the secondary structures of two proteins known to consist of four α-helices. A possible tertiary model for interferon is then proposed in which the four α-helices pack into a right-handed bundle similar to that observed in several known protein structures. This model was shown to be stereochemically feasible by an α-helix docking algorithm. One of the resultant structures is shown to be compatible with the known disulphide linkages in interferon. Certain residues that are conserved between the different sequences lie near each other in our model and these residues might form a functional site. In the absence of a crystal structure for interferon, a predicted tertiary model will help further structural and functional studies.

15.
Lu M, Draper DE. Nucleic Acids Research, 1995, 23(17):3426-3433.
Ribosomal protein L11 and the antibiotic thiostrepton bind to the same highly conserved region of large-subunit ribosomal RNA and stabilize a set of NH4(+)-dependent tertiary interactions within the domain. In vitro selection from partially randomized pools of RNA sequences has been used to ask which aspects of RNA structure are recognized by the ligands. L11-selected RNAs showed little sequence variation over the entire 70-nucleotide randomized region, while thiostrepton required a slightly smaller 58-nucleotide domain. All the selected mutations preserved or stabilized the known secondary and tertiary structure of the RNA. L11-selected RNAs from a pool mutagenized only around a junction structure yielded a very different consensus sequence, in which the RNA tertiary structure was substantially destabilized and L11 binding was no longer dependent on NH4+. We propose that L11 can bind the RNA in two different 'modes', depending on the presence or absence of the NH4(+)-dependent tertiary structure, while thiostrepton can only recognize the RNA tertiary structure. The different RNA recognition mechanisms of the two ligands may be relevant to their different effects on protein synthesis.

16.
Hu YJ. Nucleic Acids Research, 2002, 30(17):3886-3893.
Given a set of homologous or functionally related RNA sequences, the consensus motifs may represent the binding sites of RNA regulatory proteins. Unlike DNA motifs, RNA motifs are more conserved in structure than in sequence. Knowing the structural motifs can give us deeper insight into regulatory activities. There have been various studies of RNA secondary structure prediction, but most of them do not focus on finding motifs in sets of functionally related sequences. Although recent research has produced some new approaches to RNA motif finding, they are limited to finding relatively simple structures, e.g. stem-loops. In this paper, we propose a novel genetic programming approach to RNA secondary structure prediction that is capable of finding structures more complex than stem-loops. To demonstrate the performance of the new approach while keeping our comparative study consistent, we first tested it on the same data sets previously used to verify current prediction systems. To show its flexibility, we also tested it on a data set containing pseudoknot motifs that most current systems cannot identify. A web-based user interface for the prediction system is available at http://bioinfo.cis.nctu.edu.tw/service/gprm/.

17.
MOTIVATION: Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remains a challenge in bioinformatics research. METHODS AND RESULTS: We present a new sampling-based algorithm to predict common RNA secondary structures in multiple unaligned sequences. Our algorithm finds the common structure between two sequences by probabilistically sampling aligned stems based on stem conservation, calculated from intrasequence base-pairing probabilities and intersequence base alignment probabilities. It iteratively updates these probabilities based on the sampled structures and then recalculates stem conservation using the updated probabilities. The iterative process terminates when the sampled structures converge. We extend the algorithm to multiple sequences with a consistency-based method, which iteratively incorporates and reinforces consistent structural information from pairwise comparisons into consensus structures. The algorithm places no restrictions on predicting pseudoknots. In extensive testing on real sequence data, our algorithm outperformed other leading RNA structure prediction methods in both sensitivity and specificity, at reasonably fast speed. It also generated better structural alignments than other programs for sequences over a wide range of identities, more accurately representing the conservation of RNA secondary structure. AVAILABILITY: The algorithm is implemented in a C program, RNA Sampler, which is available at http://ural.wustl.edu/software.html

18.
The 5'-untranslated region (5'-UTR) is the most conserved part of the HIV-1 RNA genome, and it contains regulatory motifs that mediate various steps in the viral life cycle. Previous work showed that the 5'-terminal 290 nucleotides of HIV-1 RNA adopt two mutually exclusive secondary structures, long distance interaction (LDI) and branched multiple hairpin (BMH). BMH has multiple hairpins, including the dimer initiation signal (DIS) hairpin that mediates RNA dimerization. LDI contains a long-distance base-pairing interaction that occludes the DIS region. Consequently, the two conformations differ in their ability to form RNA dimers. In this study, we present evidence that the full-length 5'-UTR also adopts the LDI and BMH conformations. The downstream 290-352 region, including the Gag start codon, folds differently in the context of the LDI and BMH structures. These nucleotides form an extended hairpin structure in the LDI conformation, but the same sequences create a novel long-distance interaction with upstream U5 sequences in the BMH conformation. The presence of this U5-AUG duplex was confirmed by computer-assisted RNA structure prediction, biochemical analyses, and a phylogenetic survey of different virus isolates. The U5-AUG duplex may influence translation of the Gag protein because it occludes the start codon of the Gag open reading frame.

19.
DNA harvested directly from complex natural microbial communities by PCR has been successfully used to predict RNase P RNA structure, and can potentially provide an abundant source of information for structural predictions of other RNAs. In this study, we utilized genetic variation in natural communities to test and refine the secondary and tertiary structural model for the bacterial tmRNA. The variability of proposed tmRNA secondary structures in different organisms and the lack of any predicted tertiary structure suggested that further refinement of the tmRNA could be useful. To increase the phylogenetic representation of tmRNA sequences, and thereby provide additional data for statistical comparative analysis, we amplified, sequenced, and compared tmRNA sequences from natural microbial communities. Using primers designed from gamma proteobacterial sequences, we determined 44 new tmRNA sequences from a variety of environmental DNA samples. Covariation analyses of these sequences, along with sequences from cultured organisms, confirmed most of the proposed tmRNA model but also provided evidence for a new tertiary interaction. This approach of gathering sequence information from natural microbial communities seems generally applicable in RNA structural analysis.
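
Covariation analysis looks for pairs of alignment columns whose nucleotides change together, as expected for base-paired positions. One common covariation score is the mutual information between two columns, sketched below; the abstract does not say which statistic the study used, so this choice, the function name, and the toy columns are assumptions.

```python
from collections import Counter
import math


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one character per aligned sequence,
    and both columns must come from the same sequences in the same order.
    """
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    fx, fy = Counter(col_x), Counter(col_y)
    fxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in fxy.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((fx[x] / n) * (fy[y] / n)))
    return mi


# Two columns that always form Watson-Crick pairs covary perfectly.
print(mutual_information("GCAU", "CGUA"))  # 2.0
# An invariant column carries no covariation signal.
print(mutual_information("GGGG", "CCCC"))  # 0.0
```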

20.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted renewed attention thanks to the recent discovery of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization, and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input. However, seed alignment is itself a challenging problem for diverged RNA families. In this paper, we propose a novel framework for predicting the common secondary structure of unaligned RNA sequences. By matching putative stacks in RNA sequences, we use both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when given only a limited number of unaligned RNA sequences, and that it outperforms current algorithms in sensitivity and accuracy.
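
A first step toward matching putative stacks is enumerating them within each sequence. The sketch below finds runs of consecutive pairs (i, j), (i+1, j−1), … that all form Watson-Crick or G-U wobble pairs while leaving a hairpin loop of at least `min_loop` bases. The pairing rules, the `min_len`/`min_loop` defaults, and the function name are assumptions for illustration; the cross-sequence matching and thermodynamic scoring described in the abstract are not shown.

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def putative_stacks(seq, min_len=3, min_loop=3):
    """Enumerate stacks as (i, j, length) with 0-based outermost pair (i, j).

    A stack is a run of pairs (i, j), (i+1, j-1), ... of length >= min_len
    whose innermost pair still encloses at least min_loop unpaired bases.
    Runs that could be extended outward by one more pair are skipped, so
    each reported stack starts at its outermost pair.
    """
    n = len(seq)
    stacks = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue  # not the outermost pair of its run
            length = 0
            # extend inward while the enclosed loop keeps >= min_loop bases
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stacks.append((i, j, length))
    return stacks


print(putative_stacks("GGGAAAACCC"))  # [(0, 9, 3)]
```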
