Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.

2.
Hu YJ. Nucleic acids research, 2003, 31(13): 3446-3449
RNA molecules play an important role in many biological activities. Knowing the secondary structure of an RNA molecule can help us better understand its ability to function. The methods for RNA structure determination have traditionally been implemented through biochemical, biophysical and phylogenetic analyses. With the advance of computer technology, an increasing number of computational approaches have recently been developed. They have different goals and apply various algorithms. For example, some focus on secondary structure prediction for a single sequence; some aim at finding a global alignment of multiple sequences. Some predict the structure based on free energy minimization; some make comparative sequence analyses to determine the structure. In this paper, we describe how to correctly use GPRM, a genetic programming approach to finding common secondary structure elements in a set of unaligned coregulated or homologous RNA sequences. GPRM can be accessed at http://bioinfo.cis.nctu.edu.tw/service/gprm/.

3.

Background

For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.

Results

We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.

Conclusion

RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.

4.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
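
The scoring idea in this abstract (a similarity value that linearly combines an amino-acid term and a secondary-structure term, used inside a dynamic programming alignment) can be sketched roughly as follows. This is only an illustration, not the STRALIGN implementation: the identity-based similarities, the weight `w`, the gap penalty, and the function names are all assumptions made for the example, whereas the abstract refers to real similarity matrices.

```python
def combined_score(a, b, w=0.5):
    """Score a pair of two-letter codes such as 'Ah' and 'Vh'.

    Assumption: identity (1/0) stands in for the real similarity
    matrices; w is a hypothetical weight on the amino-acid term.
    """
    aa = 1.0 if a[0] == b[0] else 0.0   # amino-acid similarity
    ss = 1.0 if a[1] == b[1] else 0.0   # secondary-structure similarity
    return w * aa + (1.0 - w) * ss      # linear combination


def global_align_score(x, y, gap=-0.5, w=0.5):
    """Needleman-Wunsch style global alignment score over two-letter codes."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + combined_score(x[i - 1], y[j - 1], w),
                dp[i - 1][j] + gap,   # gap in y
                dp[i][j - 1] + gap,   # gap in x
            )
    return dp[n][m]


# Two short annotated sequences: residue letter + structure letter.
seq_a = ["Ah", "Lh", "Ge"]
seq_b = ["Ah", "Vh", "Ge"]
score = global_align_score(seq_a, seq_b)
```

With `w=1.0` the structure term drops out and the score reduces to a sequence-only alignment, which is the baseline the abstract compares against.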

5.
6.
Glutathione synthetase (gshB) has previously been reported to confer tolerance to acidic soil conditions in Rhizobium species. Cloning the gene coding for this enzyme requires designing proper primer sets, which in turn depends on identifying regions of high sequence similarity in multiple global alignments. In this experiment, a group of homologous gene sequences related to the gshB gene (accession no: gi-86355669:327589-328536) of Rhizobium etli CFN 42 were extracted from NCBI nucleotide sequence databases using BLASTN and were analyzed for designing degenerate primers. However, the T-coffee multiple global alignment results did not show any block of conserved region in this sequence set from which to design the primers. Therefore, we attempted to identify the location of common motif regions based on multiple local alignments, employing the MEME algorithm supported with MAST and Primer3. The results revealed some common motif regions that enabled us to design the primer sets for related gshB gene sequences. The result will be validated in wet lab.

7.
8.
Genetic programming is a technique that can be used to tackle the hugely demanding data-processing problems encountered in the natural sciences. Application of genetic programming to a problem using parasites as biological tags demonstrates its potential for developing explanatory models using data that are both complex and noisy.

9.
In the last few years a novel RNA folding principle called pseudoknotting has emerged. Originally discovered in noncoding regions of plant viral RNAs, pseudoknots now appear to be a widespread structural motif in a number of functionally different RNAs. These structural elements are part of tRNA-like structures and are involved in folding catalytic sites of ribozymes. They increase the efficiency of ribosomal frameshifting or can serve as specific binding sites for regulatory proteins.

10.
11.
Bulged-G motifs are ubiquitous internal RNA loops that provide specific recognition sites for proteins and RNAs. To establish the common and distinctive features of the motif we determined the structures of three variants and compared them with related structures. The variants are 27-nt mimics of the sarcin/ricin loop (SRL) from Escherichia coli 23S ribosomal RNA that is an essential part of the binding site for elongation factors (EFs). The wild-type SRL has now been determined at 1.04 Å resolution, supplementing data obtained before at 1.11 Å and allowing the first calculation of coordinate error for an RNA motif. The other two structures, having a viable (C2658U G2663A) or a lethal mutation (C2658G G2663C), were determined at 1.75 and 2.25 Å resolution, respectively. Comparisons reveal that bulged-G motifs have a common hydration and geometry, with flexible junctions at flanking structural elements. Six conserved nucleotides preserve the fold of the motif; the remaining seven to nine vary in sequence and alter contacts in both grooves. Differences between accessible functional groups of the lethal mutation and those of the viable mutation and wild-type SRL may account for the impaired elongation factor binding to ribosomes with the C2658G G2663C mutation and may underlie the lethal phenotype.

12.
Given the wealth of new RNA structures and the growing list of RNA functions in biology, it is of great interest to understand the repertoire of RNA folding motifs. The ability to identify new and known motifs within novel RNA structures, to compare tertiary structures with one another and to quantify the characteristics of a given RNA motif are major goals in the field of RNA research; however, there are few systematic ways to address these issues. Using a novel approach for visualizing and mathematically describing macromolecular structures, we have developed a means to quantitatively describe RNA molecules in order to rapidly analyze, compare and explore their features. This approach builds on the alternative eta,theta convention for describing RNA torsion angles and is executed using a new program called PRIMOS. Applying this methodology, we have successfully identified major regions of conformational change in the 50S and 30S ribosomal subunits, we have developed a means to search the database of RNA structures for the prevalence of known motifs and we have classified and identified new motifs. These applications illustrate the powerful capabilities of our new RNA structural convention, and they suggest future adaptations with important implications for bioinformatics and structural genomics.
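
As a rough illustration of the eta,theta convention mentioned in this abstract, the sketch below computes two pseudo-torsion angles from backbone atom coordinates. It is not the PRIMOS program. The atom quadruples used here (eta from C4'(i-1), P(i), C4'(i), P(i+1); theta from P(i), C4'(i), P(i+1), C4'(i+1)) are taken from the general literature on this convention rather than from the abstract itself, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def pseudo_torsions(c4_prev, p, c4, p_next, c4_next):
    """Return (eta, theta) for one nucleotide.

    Assumed definitions (not stated in the abstract):
      eta   = C4'(i-1) - P(i)   - C4'(i)  - P(i+1)
      theta = P(i)     - C4'(i) - P(i+1)  - C4'(i+1)
    """
    eta = dihedral(c4_prev, p, c4, p_next)
    theta = dihedral(p, c4, p_next, c4_next)
    return eta, theta


# Toy coordinates, chosen only so the angles are easy to check by hand.
eta, theta = pseudo_torsions((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2))
```

Reducing each nucleotide to two angles is what makes it practical to compare conformations numerically, for example by looking for residues whose (eta, theta) pair shifts between two structures.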

13.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to O(M^3 N^3), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
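
The effect of limiting the distance M between aligned positions can be illustrated on a much simpler problem. The sketch below is not Dynalign and performs no folding: it is a plain banded global sequence alignment, with hypothetical match, mismatch and gap values, that only fills dynamic programming cells (i, j) with |i - j| <= M and reports how many cells that is.

```python
def banded_align_score(x, y, M, match=1, mismatch=-1, gap=-1):
    """Global alignment score restricted to the band |i - j| <= M.

    Returns (score, number_of_cells_filled). Illustrative only: the
    scoring values are assumptions, and no RNA folding is involved.
    """
    n, m = len(x), len(y)
    if abs(n - m) > M:
        raise ValueError("band too narrow to reach the final cell")
    dp = {(0, 0): 0}
    for i in range(n + 1):
        # Only columns inside the band are visited for this row.
        for j in range(max(0, i - M), min(m, i + M) + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                candidates.append(dp[(i - 1, j - 1)] + s)
            if i > 0 and (i - 1, j) in dp:
                candidates.append(dp[(i - 1, j)] + gap)
            if j > 0 and (i, j - 1) in dp:
                candidates.append(dp[(i, j - 1)] + gap)
            dp[(i, j)] = max(candidates)
    return dp[(n, m)], len(dp)


score, cells = banded_align_score("GATTACA", "GATTACA", M=1)
full_cells = (7 + 1) * (7 + 1)  # cells an unrestricted table would fill
```

Here the band cuts the table from 64 cells to 22. The same kind of restriction, applied to the much larger table of a simultaneous folding and alignment recursion, is what the abstract credits for making the calculation tractable.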

14.
We present a stochastic programming framework for finding the optimal vaccination policy for controlling infectious disease epidemics under parameter uncertainty. Stochastic programming is a popular framework for including the effects of parameter uncertainty in a mathematical optimization model. The problem is initially formulated to find the minimum cost vaccination policy under a chance-constraint. The chance-constraint requires that the probability that R(*)

15.
16.
The kink-turn: a new RNA secondary structure motif   Cited by: 29 (self-citations: 0, citations by others: 29)
Analysis of the Haloarcula marismortui large ribosomal subunit has revealed a common RNA structure that we call the kink-turn, or K-turn. The six K-turns in H. marismortui 23S rRNA superimpose with an r.m.s.d. of 1.7 Å. There are two K-turns in the structure of Thermus thermophilus 16S rRNA, and the structures of U4 snRNA and L30e mRNA fragments form K-turns. The structure has a kink in the phosphodiester backbone that causes a sharp turn in the RNA helix. Its asymmetric internal loop is flanked by C-G base pairs on one side and sheared G-A base pairs on the other, with an A-minor interaction between these two helical stems. A derived consensus secondary structure for the K-turn includes 10 consensus nucleotides out of 15, and predicts its presence in the 5'-UTR of L10 mRNA, helix 78 in Escherichia coli 23S rRNA and human RNase MRP. Five K-turns in 23S rRNA interact with nine proteins. While the observed K-turns interact with proteins of unrelated structures in different ways, they interact with L7Ae and two homologous proteins in the same way.

17.
Large-scale association studies hold promise for discovering the genetic basis of common human disease. These studies will consist of a large number of individuals, as well as a large number of genetic markers, such as single nucleotide polymorphisms (SNPs). The potential size of the data and the resulting model space require the development of efficient methodology to unravel associations between phenotypes and SNPs in dense genetic maps. Our approach uses a genetic algorithm (GA) to construct logic trees consisting of Boolean expressions involving strings or blocks of SNPs. These blocks or nodes of the logic trees consist of SNPs in high linkage disequilibrium (LD), that is, SNPs that are highly correlated with each other due to evolutionary processes. At each generation of our GA, a population of logic tree models is modified using selection, cross-over and mutation moves. Logic trees are selected for the next generation using a fitness function based on the marginal likelihood in a Bayesian regression framework. Mutation and cross-over moves use LD measures to propose changes to the trees, and facilitate the movement through the model space. We demonstrate our method and the flexibility of logic tree structure with variable nodal lengths on simulated data from a coalescent model, as well as data from a candidate gene study of quantitative genetic variation.
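
The generation loop described in this abstract (selection, cross-over and mutation applied to a population, with a fitness function deciding what survives) can be sketched generically as below. This is a toy stand-in, not the authors' method: individuals are plain bit lists rather than logic trees of SNP blocks, the fitness is whatever function the caller passes in rather than a Bayesian marginal likelihood, and tournament selection, elitism and all parameter values are assumptions chosen for the example.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Minimal genetic algorithm over bit lists.

    Returns (best_individual, history), where history records the best
    fitness seen at the start of each generation plus the final best.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the current best forward
        while len(next_pop) < pop_size:
            # Selection: best of three randomly drawn individuals.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Cross-over: single cut point.
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness: number of 1 bits in the individual.
best, history = evolve(sum)
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases. In the setting of the abstract, the moves would instead act on tree structure and be guided by LD measures.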

18.
Cell-free synthesis of recombinant proteins has emerged as an alternative method of protein production although protein yields still cannot compete with in vivo expression techniques. In systems based on S30 extracts of Escherichia coli, unfavorable side-reactions are involved in limiting protein yields. Therefore, carrying out cell-free reactions at lower temperatures might be beneficial as side reactions should be decreased. In this study we show that by using the 5′-untranslated region of the cold-shock gene cspA from E. coli as mRNA leader in cell-free reactions, the expression temperature can be decreased while protein yields simultaneously increase. A compensation for the lower activity of T7 RNA polymerase at lower temperatures enhances protein synthesis even further. Additionally, this 5′-untranslated region also standardizes the optimal expression temperature of different proteins.

19.
Analysis of the available crystal structures of the ribosome and of its subunits has revealed a new RNA motif that we call G-ribo. The motif consists of two double helices positioned side-by-side and connected by an unpaired region. The juxtaposition of the two helices is kept by a complex system of tertiary interactions spread over several layers of stacked nucleotides. In the center of this arrangement, the ribose of a nucleotide from one helix is specifically packed with the ribose and the minor-groove edge of a guanosine from the other helix. In total, we found eight G-ribo motifs in both ribosomal subunits. The location of these motifs suggests that at least some of them play an important role in the formation of the ribosome structure and/or in its function.

20.
Query CC, Bentley RC, Keene JD. Cell, 1989, 57(1): 89-101
We have defined the RNA binding domain of the 70K protein component of the U1 small nuclear ribonucleoprotein to a region of 111 amino acids. This domain encompasses an octamer sequence that has been observed in other proteins associated with RNA, but has not previously been shown to bind directly to a specific RNA sequence. Within the U1 RNA binding domain, an 80 amino acid consensus sequence that is conserved in many presumed RNA binding proteins was discerned. This sequence pattern appears to represent an RNA recognition motif (RRM) characteristic of a distinct family of proteins. By site-directed mutagenesis, we determined that the 70K protein consists of 437 amino acids (52 kd), and found that its aberrant electrophoretic migration is due to a carboxy-terminal charged domain structurally similar to two Drosophila proteins (su(wa) and tra) that may regulate alternative pre-messenger RNA splicing.
