Similar articles
20 similar articles found (search time: 15 ms)
1.
Genetic selection and DNA sequences of 4.5S RNA homologs.   Total citations: 6, self-citations: 2, citations by others: 6
S Brown, G Thon, E Tolentino. Journal of Bacteriology, 1989, 171(12): 6517-6520
A general strategy for cloning the functional homologs of an Escherichia coli gene was used to clone homologs of 4.5S RNA from other bacteria. The genes encoding these homologs were selected by their ability to complement a deletion of the gene for 4.5S RNA. DNA sequences of the regions encoding the homologs were determined. Since this approach does not require that the homologous genes hybridize with probes generated from the E. coli sequence, the sequences of the homologs were not all similar to the sequence of the E. coli gene. Despite the dissimilarity of the primary sequences of some of the homologs, all could be folded to obtain a similar structure.

2.
We focus on finding a consensus motif of a set of homologous or functionally related RNA molecules. Recent approaches to this problem have been limited to simple motifs, require sequence alignment, and make prior assumptions concerning the data set. We use genetic programming to predict RNA consensus motifs based solely on the data set. Our system, dubbed GeRNAMo (Genetic programming of RNA Motifs), predicts the most common motifs without sequence alignment and is capable of dealing with any motif size. Our program only requires the maximum number of stems in the motif, and if prior knowledge is available the user can specify other attributes of the motif (e.g., the range of the motif's minimum and maximum sizes), thereby increasing both sensitivity and speed. We describe several experiments using ferritin iron response element (IRE), signal recognition particle (SRP), or microRNA sequences, showing that the most common motif is found repeatedly, and that our system offers substantial advantages over previous methods.

3.
DNA regulatory sequences control gene expression by forming DNA-protein complexes with specific DNA-binding proteins. A major task in the study of gene regulation is to identify DNA regulatory sequences genome-wide. With the rapid pace of genome projects, the function of DNA regulatory sequences has become one of the focuses of the functional genomics era. Approaches for screening and characterizing DNA regulatory sequences have emerged one after another, from early low-throughput methods to high-throughput strategies. Although bioinformatics tools now facilitate the screening of regulatory fragments, the most reliable results still come from experimental tests. This article highlights some experimental methods for identifying regulatory sequences. It briefly reviews the history and procedures of selection methods and presents their trends, limitations, and extensions.

4.

Background  

Identification of RNA homologs within genomic stretches is difficult when pairwise sequence identity is low or unalignable flanking residues are present. In both cases structure-sequence or profile/family-sequence alignment programs become difficult to apply because of unreliable RNA structures or family alignments. As such, local sequence-sequence alignment programs are frequently used instead. We have recently demonstrated that maximal expected accuracy alignments using partition function match probabilities (implemented in Probalign) are significantly better than contemporary methods on heterogeneous length protein sequence datasets, thus suggesting an affinity for local alignment.

5.

Background  

Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to substantially reduce the mining cost.

6.
7.
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.

8.
Cluster-Buster: Finding dense clusters of motifs in DNA sequences   Total citations: 15, self-citations: 2, citations by others: 13
Frith MC, Li MC, Weng Z. Nucleic Acids Research, 2003, 31(13): 3666-3668

9.
The world of regulatory RNAs is fast expanding into mainstream molecular biology, both as a subject of intense mechanistic study and as a tool for functional characterization. The RNA world is one of complex structures that carry out catalysis, sense metabolites and synthesize proteins. The dynamic and structural nature of RNAs presents a whole new set of informatics challenges to the computational community. The ability to relate structure and dynamics to function will be key to understanding this complex world. I review several important classes of structured RNAs that present our community with a series of biologically novel informatics challenges. I also review available informatics tools that have been recently developed in the field.

10.
11.
12.
Finding composite regulatory patterns in DNA sequences   Total citations: 1, self-citations: 0, citations by others: 1
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns that correspond to relatively short contiguous strings. However, many of the actual regulatory signals are composite patterns that are groups of monad patterns that occur near each other. A difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be 'too weak'. Since the traditional monad-based motif finding algorithms usually output one (or a few) high scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present a MITRA (MIsmatch TRee Algorithm) approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns by presenting experiments over biological and synthetic data.
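To make the notion of a composite pattern concrete, here is a minimal Python sketch that scans a sequence for a dyad: two monads separated by a spacer of bounded length, each allowed a few mismatches. It is a brute-force illustration only and does not implement MITRA's mismatch-tree search; the names `hamming` and `find_dyad`, the parameters, and the example sequence are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_dyad(seq, left, right, min_gap, max_gap, max_mismatch=1):
    """Return (start, gap) for every occurrence of left + spacer + right.

    Each monad may differ from its occurrence in at most max_mismatch
    positions; the spacer length lies in [min_gap, max_gap].
    """
    hits = []
    for i in range(len(seq) - len(left) + 1):
        if hamming(seq[i:i + len(left)], left) > max_mismatch:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            window = seq[j:j + len(right)]
            if len(window) < len(right):
                break  # the right monad no longer fits in the sequence
            if hamming(window, right) <= max_mismatch:
                hits.append((i, gap))
    return hits


# Hypothetical example: a weakened left monad (one mismatch) is still found
# because the right monad occurs at a compatible distance.
print(find_dyad("CCTTGTCAGGGGGTATAATCC", "TTGACA", "TATAAT", 3, 7))
```

Requiring both parts to occur at a compatible distance is what lets a weak monad be reported here, although it would not stand out if searched for on its own.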

13.
14.
Several methods have been developed for identifying more or less complex RNA structures in a genome. All these methods are based on the search for conserved primary and secondary sub-structures. In this paper, we present a simple formal representation of a helix, which is a combination of sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. This is a first attempt to take advantage of the possibilities of pushdown automata in the context of approximate matching. The worst-case time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudo-knots and secondary structures containing an arbitrary number of helices.
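As a rough illustration of what searching for approximate matches of a helix means, the sketch below enumerates hairpins by brute force, counting non-pairing positions in the stem as errors. It is not the pushdown-automaton algorithm of the abstract and does not reach its O(krpn) bound; it also ignores insertions and deletions. The RNA alphabet, the inclusion of G-U wobble pairs, and the names `PAIRS` and `find_helices` are assumptions made for the example.

```python
# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq, stem_len, min_loop=3, max_loop=8, max_errors=0):
    """Return (start, loop_len, errors) for every hairpin candidate.

    A candidate is a stem of stem_len positions on each side of a loop;
    errors counts stem positions whose two bases cannot pair.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # one past the 3' end
            if end > n:
                break
            errors = sum(
                (seq[start + offset], seq[end - 1 - offset]) not in PAIRS
                for offset in range(stem_len)
            )
            if errors <= max_errors:
                hits.append((start, loop, errors))
    return hits


print(find_helices("GGGAAAACCC", stem_len=3))
```

Raising `max_errors` plays the role of the error threshold k: more candidates are reported, at the cost of specificity.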

15.
Adenosine to inosine (A-to-I) RNA editing is the most abundant editing event in animals. It converts adenosine to inosine in double-stranded RNA regions through the action of the adenosine deaminase acting on RNA (ADAR) proteins. Editing of pre-mRNA coding regions can alter the protein codon and increase functional diversity. However, most of the A-to-I editing sites occur in the non-coding regions of pre-mRNA or mRNA and non-coding RNAs. Untranslated regions (UTRs) and introns are located in pre-mRNA non-coding regions; thus, A-to-I editing can influence gene expression by nuclear retention, degradation, alternative splicing, and translation regulation. Non-coding RNAs such as microRNA (miRNA), small interfering RNA (siRNA) and long non-coding RNA (lncRNA) are related to pre-mRNA splicing, translation, and gene regulation. A-to-I editing could therefore affect the stability, biogenesis, and target recognition of non-coding RNAs. Finally, it may influence the function of non-coding RNAs, resulting in regulation of gene expression. This review focuses on the function of ADAR-mediated RNA editing on mRNA non-coding regions (UTRs and introns) and non-coding RNAs (miRNA, siRNA, and lncRNA).

16.
17.
Finding approximate tandem repeats in genomic sequences.   Total citations: 1, self-citations: 0, citations by others: 1
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.
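The abstract does not describe its statistical model, so the following Python sketch uses one deliberately simple definition of an approximate tandem repeat: a window in which each base equals the base one period downstream, except for a bounded fraction of positions. The function name `find_tandem_repeats` and all parameters are hypothetical, and the scan is a naive one rather than the efficient algorithm the abstract refers to.

```python
def find_tandem_repeats(seq, period, min_copies=2, max_mismatch_frac=0.1):
    """Return (start, end) windows holding min_copies approximate copies.

    A window qualifies when seq[i] == seq[i + period] for all but a
    fraction max_mismatch_frac of the compared positions.
    """
    hits = []
    n = len(seq)
    compared = period * (min_copies - 1)  # number of position pairs checked
    for start in range(0, n - period - compared + 1):
        mismatches = sum(
            seq[i] != seq[i + period] for i in range(start, start + compared)
        )
        if mismatches <= max_mismatch_frac * compared:
            hits.append((start, start + compared + period))
    return hits


# Three exact copies of ACGT embedded in flanking sequence.
print(find_tandem_repeats("GGACGTACGTACGTTT", period=4, min_copies=3,
                          max_mismatch_frac=0.0))
```

Changing `max_mismatch_frac` is one way to vary what counts as "approximate"; a statistical model such as the one mentioned in the abstract would replace this fixed threshold.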

18.
Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.
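Sensitivity and PPV figures like those quoted above are computed by comparing predicted base pairs with a reference structure. The Python sketch below shows the two ratios under the assumption that a predicted pair counts only if it matches a reference pair exactly; the abstract does not state the scoring rule actually used, and the function name `pair_accuracy` is hypothetical.

```python
def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs.

    sensitivity = correctly predicted pairs / pairs in the reference
    ppv         = correctly predicted pairs / pairs in the prediction
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv


# Hypothetical structures: two of four reference pairs are recovered,
# and two of three predicted pairs are correct.
sens, ppv = pair_accuracy([(1, 10), (2, 9), (3, 7)],
                          [(1, 10), (2, 9), (3, 8), (4, 7)])
print(f"sensitivity={sens:.3f} ppv={ppv:.3f}")
```

Reporting both numbers matters because a method can raise sensitivity simply by predicting more pairs, which lowers PPV.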

19.
We present a new method for the identification of conserved patterns in a set of unaligned related protein sequences. It is able to discover patterns of a quite general form, allowing for both ambiguous positions and for variable length wildcard regions. It allows the user to define a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class scoring highest according to a significance measure defined. Identified patterns may be refined using one of two new algorithms. We present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
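Patterns with ambiguous positions and variable-length wildcard regions are commonly written in PROSITE notation. The Python sketch below translates the common elements of that notation into a regular expression so a pattern can be matched against a sequence. It illustrates the pattern class only, not the discovery method of the abstract; the function name `prosite_to_regex` and the example patterns are made up for illustration, and rarer syntax (for example anchors inside brackets) is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a compiled regular expression.

    Handles: '-' separators, x (any residue), [ABC] (allowed residues),
    {ABC} (forbidden residues), (n) and (n,m) repeats, and the '<' / '>'
    sequence-end anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")    # repeats
        element = element.replace("x", ".")                      # wildcard
        element = element.replace("<", "^").replace(">", "$")    # anchors
        parts.append(element)
    return re.compile("".join(parts))


# Illustrative pattern with one ambiguous position and one flexible gap.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVM]-x(2)-H.")
print(regex.pattern)
print(regex.search("AACAACAAALAAHAA").span())
```

Here `[LIVM]` is an ambiguous position and `x(2,4)` is a variable-length wildcard region, the two kinds of flexibility the abstract mentions.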

20.