Similar documents
1.
MOTIVATION: RNA secondary structure analysis often requires searching for potential helices in large sequence data. RESULTS: We present a utility program GUUGle that efficiently locates potential helical regions under RNA base pairing rules, which include Watson-Crick as well as G-U pairs. It accepts a positive and a negative set of sequences, and determines all exact matches under RNA rules between positive and negative sequences that exceed a specified length. The GUUGle algorithm can also be adapted to use a precomputed suffix array of the positive sequence set. We show how this program can be effectively used as a filter preceding a more computationally expensive task such as miRNA target prediction. AVAILABILITY: GUUGle is available via the Bielefeld Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de/guugle
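
The matching rule in this abstract can be made concrete with a short sketch. It is not GUUGle's algorithm and it does not use a suffix array. It is a naive diagonal scan, and the function names are hypothetical. It reports every maximal stretch of at least `min_len` positions where a positive sequence pairs antiparallel with a negative sequence under Watson-Crick plus G-U rules.

```python
# RNA base pairs accepted here: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(pos, neg, min_len):
    """Return (i, j, length) for every maximal run where pos[i:i+length]
    pairs antiparallel with neg[j:j+length] and length >= min_len."""
    rev = neg[::-1]  # antiparallel pairing == parallel comparison with reversed neg
    m, n = len(pos), len(rev)
    hits = []
    for d in range(-(n - 1), m):  # one diagonal: pos[i] is compared with rev[i - d]
        i = max(d, 0)
        k = i - d
        run = 0
        while i < m and k < n:
            if (pos[i], rev[k]) in PAIRS:
                run += 1
            else:
                if run >= min_len:
                    hits.append(_to_hit(i, k, run, n))
                run = 0
            i += 1
            k += 1
        if run >= min_len:  # a run that reaches the end of the diagonal
            hits.append(_to_hit(i, k, run, n))
    return hits


def _to_hit(i_end, k_end, run, n):
    """Convert a run ending (exclusive) at pos[i_end], rev[k_end] into neg coordinates."""
    k_start = k_end - run
    return (i_end - run, n - k_start - run, run)
```

For example, `find_helices("GGGA", "UCCC", 3)` reports a four-pair helix (G-C, G-C, G-C, A-U). It also reports a shifted three-pair helix that relies on a G-U wobble. The scan costs O(|pos| × |neg|) time. It only illustrates the matching rule and says nothing about how GUUGle achieves its efficiency.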

2.
3.
RNA folding is assumed to be a hierarchical process. The secondary structure of an RNA molecule, signified by base-pairing and stacking interactions between the paired bases, is formed first. Subsequently, the RNA molecule adopts an energetically favorable three-dimensional conformation in the structural space determined mainly by the rotational degrees of freedom associated with the backbone of regions of unpaired nucleotides (loops). To what extent the backbone conformation of RNA loops also results from interactions within the local sequence context, or rather follows global optimization constraints alone, has not yet been addressed. Because the majority of base stacking interactions are exerted locally, a critical influence of local sequence on local structure appears plausible. Thus, local loop structure ought to be predictable, at least in part, from the local sequence context alone. To test this hypothesis, we used Random Forests on a nonredundant data set of unpaired nucleotides extracted from 97 X-ray structures from the Protein Data Bank (PDB) to predict discrete backbone angle conformations given by the discretized η/θ-pseudo-torsional space. Predictions on balanced sets with four to six conformational classes using local sequence information yielded average accuracies of up to 55%, which is significantly better than expected by chance (17%-25%). Bases close to the central nucleotide appear to be most tightly linked to its conformation. Our results suggest that RNA loop structure does not depend only on long-range base-pairing interactions; instead, it appears that local sequence context exerts a significant influence on the formation of the local loop structure.
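
The chance level quoted above follows from the class balance. Random guessing on a balanced k-class problem is correct with probability 1/k, which gives 1/6 ≈ 17% for six classes and 1/4 = 25% for four. The sketch below shows that baseline. It also shows one possible way to turn "local sequence context" into classifier input, a one-hot window around the central nucleotide. That encoding is an assumption, since the abstract does not describe the authors' features, and the helper names are hypothetical.

```python
BASES = "ACGU"


def window_features(seq, center, flank):
    """One-hot encode nucleotides from center-flank to center+flank.
    Positions outside the sequence (or non-ACGU symbols) become all-zero vectors."""
    features = []
    for pos in range(center - flank, center + flank + 1):
        vec = [0, 0, 0, 0]
        if 0 <= pos < len(seq) and seq[pos] in BASES:
            vec[BASES.index(seq[pos])] = 1
        features.extend(vec)
    return features


def chance_accuracy(n_classes):
    """Expected accuracy of random guessing on a balanced n-class problem."""
    return 1.0 / n_classes
```

A feature vector like this, built for every unpaired nucleotide, could be passed to any Random Forest implementation together with the discretized η/θ class labels.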

4.
MOTIVATION: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches. RESULTS: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
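
The sketch below illustrates locality-sensitive hashing for ungapped windows. It assumes a bit-sampling hash for Hamming distance. Each round samples k positions inside a fixed-length window and buckets windows by the bases at those positions. Windows that differ at only a few positions are likely to collide in at least one round, and candidates are then verified by their actual identity. This is a generic illustration, not LSH-ALL-PAIRS itself. The parameter names and defaults are hypothetical.

```python
import random
from collections import defaultdict


def identity(a, b, i, j, window):
    """Fraction of identical positions in the ungapped windows starting at a[i] and b[j]."""
    return sum(x == y for x, y in zip(a[i:i + window], b[j:j + window])) / window


def lsh_ungapped_hits(a, b, window, min_identity, k=8, rounds=20, seed=0):
    """Window pairs (i, j) proposed by bit-sampling LSH and verified by identity."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(rounds):
        positions = rng.sample(range(window), k)  # requires k <= window
        buckets = defaultdict(list)
        for i in range(len(a) - window + 1):
            buckets[tuple(a[i + p] for p in positions)].append(i)
        for j in range(len(b) - window + 1):
            for i in buckets.get(tuple(b[j + p] for p in positions), []):
                candidates.add((i, j))
    # Keep only candidates whose full window meets the identity threshold.
    return sorted((i, j) for i, j in candidates
                  if identity(a, b, i, j, window) >= min_identity)
```

Larger `k` makes chance collisions in background sequence rarer. More `rounds` raises the probability that a true, substitution-containing match is sampled at least once. This mirrors the efficiency and sensitivity trade-off discussed in the abstract.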

5.
MOTIVATION: Several results in the literature suggest that biologically interesting RNAs have secondary structures that are more stable than expected by chance. Based on these observations, we developed a scanning algorithm for detecting noncoding RNA genes in genome sequences, using a fully probabilistic version of the Zuker minimum-energy folding algorithm. RESULTS: Preliminary results were encouraging, but certain anomalies led us to do a carefully controlled investigation of this class of methods. Ultimately, our results argue that for the probabilistic model there is indeed a statistical effect, but it comes mostly from local base-composition bias and not from RNA secondary structure. For the thermodynamic implementation (which evaluates statistical significance by doing Monte Carlo shuffling in fixed-length sequence windows, thus eliminating the base-composition effect) the signals for noncoding RNAs are still usually indistinguishable from noise, especially when certain statistical artifacts resulting from local base-composition inhomogeneity are taken into account. We conclude that although a distinct, stable secondary structure is undoubtedly important in most noncoding RNAs, the stability of most noncoding RNA secondary structures is not sufficiently different from the predicted stability of a random sequence to be useful as a general genefinding approach.
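
Monte Carlo shuffling controls for base composition in a simple way. A mononucleotide shuffle keeps exactly the same bases, so any remaining difference between a real window and its shuffles cannot come from composition alone. The sketch below scores one window against its shuffles. It substitutes a Nussinov-style maximum base-pair count for the Zuker minimum-energy model, so a higher score means "more structured", which is the opposite sign convention from free energy. The function names and defaults are hypothetical.

```python
import random
import statistics

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Nussinov-style maximum number of nested base pairs.
    Hairpin loops must contain at least min_loop unpaired bases."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            best = dp[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_zscore(window, n_shuffles=100, seed=0):
    """z-score of max_pairs(window) against composition-preserving shuffles."""
    rng = random.Random(seed)
    observed = max_pairs(window)
    background = []
    for _ in range(n_shuffles):
        chars = list(window)
        rng.shuffle(chars)  # same bases, new order
        background.append(max_pairs("".join(chars)))
    sd = statistics.pstdev(background)
    return 0.0 if sd == 0 else (observed - statistics.mean(background)) / sd
```

A genome scan would apply `shuffle_zscore` to each fixed-length window. The abstract's caveat still applies: shuffling single bases does not preserve local dinucleotide or composition heterogeneity, and that heterogeneity can itself produce spurious signals.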

6.
Twenty-four new insertions were obtained from seven different locations in the nuclear 18S rDNA of seven species of the lichen-forming fungal genus Physconia. They were analyzed allowing for terminal sequence conservation, by adopting a flexible approach to the exact insertion site position, and were compared with 12 previously reported small insertion sequences from the 18S ribosomal RNA gene. Such insertions have previously been proposed to be degenerate self-splicing group I introns; however, the methodology used here identified consensus terminal sequences characteristic of spliceosomal introns. This finding is the first suggestion that multiple spliceosomal introns occur in ribosomal genes.
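
The "flexible approach to exact insertion site position" can be motivated as follows. When the terminal bases of an insertion repeat the adjacent flanking bases, several boundary placements remove the insertion and leave the same flanking sequence. The sketch below enumerates those equivalent placements. It then checks whether any of them has GT...AG termini, the canonical spliceosomal intron boundaries at the DNA level. This is an illustrative assumption about how such an analysis could work, not the authors' published procedure, and the function names are hypothetical.

```python
def equivalent_placements(seq, start, end):
    """All (start, end) placements of the insertion seq[start:end] that leave
    the same flanking sequence seq[:s] + seq[e:] once the insertion is removed."""
    placements = [(start, end)]
    s, e = start, end
    while s > 0 and seq[s - 1] == seq[e - 1]:  # shifting left by one is equivalent
        s, e = s - 1, e - 1
        placements.insert(0, (s, e))
    s, e = start, end
    while e < len(seq) and seq[s] == seq[e]:  # shifting right by one is equivalent
        s, e = s + 1, e + 1
        placements.append((s, e))
    return placements


def gt_ag_placement(seq, start, end):
    """First equivalent placement whose insertion starts with GT and ends with AG, or None."""
    for s, e in equivalent_placements(seq, start, end):
        if seq[s:s + 2] == "GT" and seq[e - 2:e] == "AG":
            return (s, e)
    return None
```

For example, in `"CCAGGTAAAAGGTT"` an insertion reported at positions 2 to 9 (0-based, end exclusive) can slide to positions 4 to 11. That placement reads `GTAAAAG`, which has GT...AG termini.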

7.
Finding the common structure shared by two homologous RNAs
MOTIVATION: CARNAC is a new method for pairwise folding of RNA sequences. The program takes into account local similarity, stem energy, and covariations to produce the common folding. It can handle all RNA types, and has also been adapted to align a new homologous sequence along a reference structured sequence. RESULTS: Using different data sets, we show that CARNAC provides a good partial prediction for a wide range of sequences (16S ssu rRNA, RNase P RNA, viruses) with only two sequences. In the presence of a whole family of sequences, we also show that CARNAC can be used to detect whether the sequences actually share the same structure. AVAILABILITY: CARNAC is available at the URL http://www.lifl.fr/~perrique/rna/.
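
Covariation is strong comparative evidence for a stem. If two homologous sequences both form a base pair at the same aligned positions using different bases at both ends (for example G-C in one and A-U in the other), a compensatory double change has preserved the pair. The sketch below counts such pairs for a proposed stem. It assumes the two sequences are already aligned without gaps. It is not CARNAC's scoring function, and the helper name is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_pairs(seq1, seq2, pairs):
    """Count stem pairs (i, j) that can form in both aligned sequences while
    both positions differ between the sequences (a compensatory change)."""
    count = 0
    for i, j in pairs:
        both_pair = (seq1[i], seq1[j]) in PAIRS and (seq2[i], seq2[j]) in PAIRS
        if both_pair and seq1[i] != seq2[i] and seq1[j] != seq2[j]:
            count += 1
    return count
```

Identical sequences contribute no covariation evidence. A change that breaks pairing in one sequence also contributes none, because only changes that keep the pair intact are counted.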

8.
The nucleotide sequence of tobacco vein mottling virus RNA.
The nucleotide sequence of the RNA of tobacco vein mottling virus, a member of the potyvirus group, was determined. The RNA was found to be 9471 residues in length, excluding a 3'-terminal poly(A) tail. The first three AUG codons from the 5'-terminus were followed by in-frame termination codons. The fourth, at position 206, was the beginning of an open reading frame of 9015 residues which could encode a polyprotein of 340 kDa. No other long open reading frames were present in the sequence or its complement. This AUG was present in the sequence AGGCCAUG, which is similar to the consensus initiation sequence shared by most eukaryotic mRNAs. The chemically-determined amino acid compositions of the helper component and coat proteins were similar to those predicted from the nucleotide sequence. Amino acid sequencing of coat protein from which an amino-terminal peptide had been removed allowed exact location of the coat protein cistron. A consensus sequence of V-(R or K)-F-Q was found on the N-terminal sides of proposed cleavage sites for proteolytic processing of the polyprotein.
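
The ORF analysis described above scans AUG codons from the 5' end and follows each one in frame to the first stop codon (UAA, UAG or UGA). Short frames like the first three reported here end quickly, while the long polyprotein frame runs for thousands of nucleotides. The sketch below performs that scan. It uses 0-based positions and counts the stop codon in the ORF length. The abstract does not say which conventions it uses, and the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_length(seq, start):
    """Nucleotides from the AUG at `start` through the first in-frame stop
    codon (inclusive), or None if the frame runs off the end of the sequence."""
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3 - start
    return None


def first_augs(seq, count=4):
    """(0-based position, ORF length) for the first `count` AUG codons from the 5' end."""
    results = []
    pos = seq.find("AUG")
    while pos != -1 and len(results) < count:
        results.append((pos, orf_length(seq, pos)))
        pos = seq.find("AUG", pos + 1)
    return results
```

On a toy sequence, `first_augs("AUGUAAGGAUGCCCUGA")` finds an AUG at position 0 that is immediately followed by a stop codon. It finds a second AUG at position 8 whose frame ends at a UGA three codons later.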

9.
In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced and, as a result, growth of publicly maintained sequence databases. This increase in data places high demands on protein similarity search algorithms, which must reconcile two opposing goals: keeping running times acceptable while maintaining a sufficiently high level of sensitivity. The most time-consuming step of similarity search is the local alignment of query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Because of its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, most protein similarity search methods apply heuristics to reduce the number of candidate database sequences before performing the exact local alignment. However, the query must still be aligned against this reduced database. In this paper we present SW#db, a tool and library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended for the exact local alignment phase, after the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++ and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases.
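
The quadratic cost mentioned above comes from the Smith-Waterman recurrence, which fills one cell per pair of residues. Each cell takes the best of starting fresh (0), extending diagonally with a match or mismatch score, or opening a gap from above or from the left. The minimal sketch below keeps only two rows, so it needs O(n × m) time but O(m) memory. It assumes a simple match/mismatch score and a linear gap penalty. Protein search tools typically use substitution matrices such as BLOSUM62 with affine gaps, and the parameter values here are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Because the inner loop is independent across anti-diagonals and across database sequences, the computation parallelizes well. This is the property that GPU and SIMD implementations exploit.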

10.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but secondary structure prediction from a single sequence is known to be unreliable. Therefore, to analyze a new non-coding RNA without knowing its exact secondary structure, we have to collect similar RNA sequences that share a common secondary structure. Consequently, the sequence comparison used to search for similar RNAs should consider not only sequence similarity but also potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of fixed-length fragments of stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of its alignments is better than or comparable to that of much slower existing algorithms. AVAILABILITY: The web server of SCARNA with a graphical structural alignment viewer is available at http://www.scarna.org/.
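
A stem candidate of fixed length L is a pair of segments that can pair antiparallel: `seq[i+k]` pairs with `seq[j-k]` for every k from 0 to L-1. The innermost pair must enclose a hairpin loop of at least `min_loop` unpaired bases, which requires j ≥ i + 2L + min_loop − 1. The sketch below enumerates such candidates by brute force. It only illustrates what "fixed-length fragments of stem candidates" could mean and is not SCARNA's implementation. The function name and the default loop size are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_candidates(seq, length, min_loop=3):
    """Return (i, j) such that seq[i:i+length] can pair antiparallel with
    seq[j-length+1:j+1], leaving at least min_loop unpaired bases between."""
    n = len(seq)
    stems = []
    for i in range(n):
        # The innermost pair (i+length-1, j-length+1) must enclose >= min_loop bases.
        for j in range(i + 2 * length + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(length)):
                stems.append((i, j))
    return stems
```

Fixing the fragment length makes every candidate the same size. Comparing two RNAs can then be framed as aligning two lists of such fragments, which is far cheaper than full Sankoff-style simultaneous folding and alignment.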

11.
12.
13.
The DNA sequence of the intergenic region between the 17S and 5.8S rRNA genes of the ribosomal RNA operon in yeast has been determined. In this region the 37S ribosomal precursor RNA is specifically cleaved at a number of sites in the course of the maturation process. The exact positions of these processing sites have been established by sequence analysis of the terminal fragments of the respective RNA species. There appears to be no significant complementarity between the sequences surrounding the two termini of the 18S secondary precursor RNA, nor between those surrounding the two termini of the 17S mature rRNA. This finding implies that the processing of yeast 37S ribosomal precursor RNA is not directed by a double-strand-specific ribonuclease of the type previously shown to be involved in the processing of E. coli ribosomal precursor RNA [see Refs 1,2]. The processing sites of yeast ribosomal precursor RNA described in the present paper are all flanked on one side by a very [A+T]-rich sequence. In addition, sequence repeats are found around the processing sites in this precursor RNA. Finally, sequence homologies are present at the 3'-termini [6 nucleotides] and the 5'-termini [13 nucleotides] of a number of mature rRNA products and intermediate ribosomal RNA precursors. These structural features are discussed in terms of possible recognition sites for the processing enzymes.

14.
Thermodynamics of RNA-RNA binding
BACKGROUND: Reliable prediction of RNA-RNA binding energies is crucial, e.g. for understanding RNAi, microRNA-mRNA binding and antisense interactions. The thermodynamics of such RNA-RNA interactions can be understood as the sum of two energy contributions: (1) the energy necessary to 'open' the binding site and (2) the energy gained from hybridization. METHODS: We present an extension of the standard partition function approach to RNA secondary structures that computes the probabilities Pu[i, j] that a sequence interval [i, j] is unpaired. RESULTS: Comparison with experimental data shows that Pu[i, j] can be applied as a significant determinant of local target site accessibility for RNA interference (RNAi). Furthermore, these quantities can be used to rigorously determine binding free energies of short oligomers to large mRNA targets. The resource consumption is comparable with a single partition function computation for the large target molecule. We show that RNAi efficiency correlates well with the binding energies of siRNAs to their respective mRNA targets. AVAILABILITY: RNAup will be distributed as part of the Vienna RNA Package, www.tbi.univie.ac.at/~ivo/RNA/
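
The two energy contributions above combine additively. A standard way to turn an unpaired probability into an opening free energy is the Boltzmann relation ΔG_open = −RT ln Pu[i, j]. The binding free energy is then ΔG_open + ΔG_hybrid. The sketch below computes both. It assumes that relation and takes the hybridization energy as a given input. It does not compute Pu itself, which requires the partition function extension described in the abstract, and the function names are hypothetical.

```python
import math

GAS_CONSTANT = 1.98720425864083e-3  # kcal / (mol K)


def opening_energy(p_unpaired, temp_c=37.0):
    """Free energy (kcal/mol) needed to make an interval unpaired: -RT ln Pu."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must lie in (0, 1]")
    rt = GAS_CONSTANT * (temp_c + 273.15)
    return -rt * math.log(p_unpaired)


def binding_energy(p_unpaired, dg_hybrid, temp_c=37.0):
    """Total binding free energy = opening energy + hybridization energy."""
    return opening_energy(p_unpaired, temp_c) + dg_hybrid
```

For example, a target interval that is unpaired in 10% of the structural ensemble costs about 1.42 kcal/mol to open at 37 °C. That cost offsets part of any favorable (negative) hybridization energy. A fully accessible interval (Pu = 1) costs nothing.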

15.
16.
Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, while possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n² × m² × max(n, m)) and a space complexity of only O(n × m). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.

17.
18.
MOTIVATION: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) is a major challenge in gene finding today, as these are often conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to detect, in a pairwise manner, two genes with low sequence similarity that are part of a larger genomic region. RESULTS: Here we present such an approach for pairwise local alignment, which is based on foldalign and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new foldalign implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign; the performances are very similar, but foldalign is substantially faster. The structure prediction performance for a family is typically around 0.7 using the Matthews correlation coefficient. In case (2), the algorithm locates RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. AVAILABILITY: The program is available online at http://foldalign.kvl.dk/
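
The evaluation figures above (sensitivity, positive predictive value, Matthews correlation coefficient) can all be computed from base-pair sets. True positives are predicted pairs that appear in the reference structure. False positives and false negatives are the mismatches in either direction. The negatives are all other i < j position pairs. The sketch below computes the textbook MCC over that space. Some RNA benchmarks instead approximate MCC as sqrt(sensitivity × PPV). The abstract does not state which variant was used, and the function name is hypothetical.

```python
import math


def pair_metrics(predicted, reference, seq_len):
    """Sensitivity, PPV and Matthews correlation for base-pair predictions.
    Pairs are (i, j) tuples; negatives are all other i < j position pairs."""
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    tp = len(pred & ref)
    fp = len(pred - ref)
    fn = len(ref - pred)
    tn = seq_len * (seq_len - 1) // 2 - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc
```

A prediction with one of its two pairs correct on a 9-nt sequence has sensitivity 0.5, PPV 0.5 and an MCC of 32/68 ≈ 0.47. The MCC is slightly lower than the simple ratios because the many true negatives enter the formula.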

19.
Spinacia oleracea chloroplast 5S ribosomal RNA was end-labeled with [32P] and its complete nucleotide sequence was determined. The sequence is: pUAUUCUGGUGUCCUAGGCGUAGAGGAACCACACCAAUCCAUCCCGAACUUGGUGGUUAAACUCUACUGCGGUGACGAUACUGUAGGGGAGGUCCUGCGGAAAAAUAGCUCGACGCCAGGAUGOH. This sequence can be fitted to the secondary structural model proposed for prokaryotic 5S ribosomal RNAs by Fox and Woese (1). However, the lengths of several single- and double-stranded regions differ from those common to prokaryotes. The spinach chloroplast 5S ribosomal RNA is homologous to the 5S ribosomal RNA of Lemna chloroplasts, except that the spinach RNA is longer by one nucleotide at the 3' end and has a purine base substitution at position 119. The sequence of spinach chloroplast 5S RNA is identical to the chloroplast 5S ribosomal RNA gene of tobacco. Thus the structures of the chloroplast 5S ribosomal RNAs from some of the higher plants appear to be almost totally conserved. This does not appear to be the case for the higher plant cytoplasmic 5S ribosomal RNAs.

20.
Yeast protein Yol066 (encoded by the YOL066 ORF, also known as Rib2) possesses two distinct sequence domains: a C-terminal deaminase domain and an N-terminal part related to RNA:pseudouridine (Ψ) synthases. The deaminase domain is implicated in riboflavin biosynthesis, while the exact function of the RNA:Ψ-synthase domain remains obscure. Here we report the optimisation of growth conditions and a purification scheme for recombinant His6-tagged Yol066 expressed in E. coli BL21(DE3) using the pET28 plasmid. Production of soluble Yol066 protein is best at low temperature (18 °C) and low IPTG concentration (50 µM), and Yol066 purification was achieved using metal-affinity and ion-exchange chromatography. This optimised protocol yields about 10 mg of highly purified recombinant Yol066 from 3 L of E. coli culture.


Copyright © Beijing Qinyun Technology Development Co., Ltd.   Beijing ICP Filing No. 09084417