首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The existence and identity of non-Watson-Crick base pairs (bps) within RNA bulges, internal loops, and hairpin loops cannot reliably be predicted by existing algorithms. We have developed the Isfold (Isosteric Folding) program as a tool to examine patterns of nucleotide substitutions from sequence alignments or mutation experiments and identify plausible bp interactions. We infer these interactions based on the observation that each non-Watson-Crick bp has a signature pattern of isosteric substitutions where mutations can be made that preserve the 3D structure. Isfold produces a dynamic representation of predicted bps within defined motifs in order of their probabilities. The software was developed under Windows XP, and is capable of running on PC and MAC with Matlab 7.1 (SP3) or higher. A PC stand-alone version that does not require Matlab also is available. This software and a user manual are freely available at www.ucsf.edu/frankel/isfold.  相似文献   

2.
Abstract

The existence and identity of non-Watson-Crick base pairs (bps) within RNA bulges, internal loops, and hairpin loops cannot reliably be predicted by existing algorithms. We have developed the Isfold (Isosteric Folding) program as a tool to examine patterns of nucleotide substitutions from sequence alignments or mutation experiments and identify plausible bp interactions. We infer these interactions based on the observation that each non-Watson-Crick bp has a signature pattern of isosteric substitutions where mutations can be made that preserve the 3D structure. Isfold produces a dynamic representation of predicted bps within defined motifs in order of their probabilities. The software was developed under Windows XP, and is capable of running on PC and MAC with Matlab 7.1 (SP3) or higher. A PC standalone version that does not require Matlab also is available. This software and a user manual are freely available at www.ucsf.edu/frankel/isfold.  相似文献   

3.
Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/REFINER) and will be incorporated into the next release of the Cn3D structure/alignment viewer.  相似文献   

4.
Semiautomated improvement of RNA alignments   总被引:1,自引:0,他引:1  
We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent the introduction of errors when doing alignment editing. SARSE readily connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by the ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture of the SARSE editor makes it a flexible tool to improve all RNA alignments with relatively little human intervention. Online documentation and software are available at (http://sarse.ku.dk).  相似文献   

5.
SUMMARY: Efficient RNA sequence manipulations (such as multiple alignments) need to be constrained by rules of RNA structure folding. The structural knowledge has increased dramatically in the last years with the accumulation of several large RNA structures similar to those of the bacterial ribosome subunits. However, no tool in the RNA community provides an easy way to link and integrate progress made at the sequence level using the available three-dimensional information. Sequence to Structure (S2S) proposes a framework in which an user can easily display, manipulate and interconnect heterogeneous RNA data, such as multiple sequence alignments, secondary and tertiary structures. S2S has been implemented using the Java language and has been developed and tested under UNIX systems, such as Linux and MacOSX. AVAILABILITY: S2S is available at http://bioinformatics.org/S2S/.  相似文献   

6.

Background

Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation. Yet, many homologous structured RNAs display most of their conservation at the secondary or tertiary structure level. As a result, strategies for structured RNA discovery rely heavily on identification of sequences sharing a common stable secondary structure. However, correctly distinguishing structured RNAs from surrounding genomic sequence remains challenging, especially during de novo discovery. RNA also has a long history as a computational model for evolution due to the direct link between genotype (sequence) and phenotype (structure). From these studies it is clear that evolved RNA structures, like protein structures, can be considered robust to point mutations. In this context, an RNA sequence is considered robust if its neutrality (extent to which single mutant neighbors maintain the same secondary structure) is greater than that expected for an artificial sequence with the same minimum free energy structure.

Results

In this work, we bring concepts from evolutionary biology to bear on the structured RNA de novo discovery process. We hypothesize that alignments corresponding to structured RNAs should consist of neutral sequences. We evaluate several measures of neutrality for their ability to distinguish between alignments of structured RNA sequences drawn from Rfam and various decoy alignments. We also introduce a new measure of RNA structural neutrality, the structure ensemble neutrality (SEN). SEN seeks to increase the biological relevance of existing neutrality measures in two ways. First, it uses information from an alignment of homologous sequences to identify a conserved biologically relevant structure for comparison. Second, it only counts base-pairs of the original structure that are absent in the comparison structure and does not penalize the formation of additional base-pairs.

Conclusion

We find that several measures of neutrality are effective at separating structured RNAs from decoy sequences, including both shuffled alignments and flanking genomic sequence. Furthermore, as an independent feature classifier to identify structured RNAs, SEN yields comparable performance to current approaches that consider a variety of features including stability and sequence identity. Finally, SEN outperforms other measures of neutrality at detecting mutational robustness in bacterial regulatory RNA structures.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-014-1203-8) contains supplementary material, which is available to authorized users.  相似文献   

7.
SUMMARY: The explosion of interest in non-coding RNAs, together with improvements in RNA X-ray crystallography, has led to a rapid increase in RNA structures at atomic resolution from 847 in 2005 to 1900 in 2010. The success of whole-genome sequencing has led to an explosive growth of unaligned homologous sequences. Consequently, there is a compelling and urgent need for user-friendly tools for producing structure-informed RNA alignments. Most alignment software considers the primary sequence alone; some specialized alignment software can also include Watson-Crick base pairs, but none adequately addresses the needs introduced by the rapid influx of both sequence and structural data. Therefore, we have developed the Boulder ALignment Editor (ALE), which is a web-based RNA alignment editor, designed for editing and assessing alignments using structural information. Some features of BoulderALE include the annotation and evaluation of an alignment based on isostericity of Watson-Crick and non-Watson-Crick base pairs, along with the collapsing (horizontally and vertically) of the alignment, while maintaining the ability to edit the alignment. AVAILABILITY: http://www.microbio.me/boulderale.  相似文献   

8.
This paper presents Tcoffee@igs, a new server provided to the community by Hewlet Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedicated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the latest version of the T-Coffee package. Given a set of unaligned sequences, the server returns an evaluated multiple sequence alignment and the associated phylogenetic tree. This server also makes it possible to evaluate the local reliability of an existing alignment and to combine several alternative multiple alignments into a single new one. Tcoffee@igs can be used for aligning protein, RNA or DNA sequences. Datasets of up to 100 sequences (2000 residues long) can be processed. The server and its documentation are available from: http://igs-server.cnrs-mrs.fr/Tcoffee/.  相似文献   

9.
Multiple sequence alignments are powerful tools for understanding the structures, functions, and evolutionary histories of linear biological macromolecules (DNA, RNA, and proteins), and for finding homologs in sequence databases. We address several ontological issues related to RNA sequence alignments that are informed by structure. Multiple sequence alignments are usually shown as two-dimensional (2D) matrices, with rows representing individual sequences, and columns identifying nucleotides from different sequences that correspond structurally, functionally, and/or evolutionarily. However, the requirement that sequences and structures correspond nucleotide-by-nucleotide is unrealistic and hinders representation of important biological relationships. High-throughput sequencing efforts are also rapidly making 2D alignments unmanageable because of vertical and horizontal expansion as more sequences are added. Solving the shortcomings of traditional RNA sequence alignments requires explicit annotation of the meaning of each relationship within the alignment. We introduce the notion of “correspondence,” which is an equivalence relation between RNA elements in sets of sequences as the basis of an RNA alignment ontology. The purpose of this ontology is twofold: first, to enable the development of new representations of RNA data and of software tools that resolve the expansion problems with current RNA sequence alignments, and second, to facilitate the integration of sequence data with secondary and three-dimensional structural information, as well as other experimental information, to create simultaneously more accurate and more exploitable RNA alignments.  相似文献   

10.
SUMMARY: We present a web server that computes alignments of protein secondary structures. The server supports both performing pairwise alignments and searching a secondary structure against a library of domain folds. It can calculate global and local secondary structure element alignments. A combination of local and global alignment steps can be used to search for domains inside the query sequence or help in the discrimination of novel folds. Both the SCOP and PDB fold libraries, clustered at 95 and 40% sequence identity, are available for alignment. AVAILABILITY: The web server interface is freely accessible to academic users at http://protein.cribi.unipd.it/ssea/. The executable version and benchmarking data are available from the same web page.  相似文献   

11.
SUMMARY: The main source of hypotheses on the structure and function of new proteins is their homology to proteins with known properties. Homologous relationships are typically established through sequence similarity searches, multiple alignments and phylogenetic reconstruction. In cases where the number of potential relationships is large, for example in P-loop NTPases with many thousands of members, alignments and phylogenies become computationally demanding, accumulate errors and lose resolution. In search of a better way to analyze relationships in large sequence datasets we have developed a Java application, CLANS (CLuster ANalysis of Sequences), which uses a version of the Fruchterman-Reingold graph layout algorithm to visualize pairwise sequence similarities in either two-dimensional or three-dimensional space. AVAILABILITY: CLANS can be downloaded at http://protevo.eb.tuebingen.mpg.de/download.  相似文献   

12.
MOTIVATION: Multiple sequence alignments (MSAs) are at the heart of bioinformatics analysis. Recently, a number of multiple protein sequence alignment benchmarks (i.e. BAliBASE, OXBench, PREFAB and SMART) have been released to evaluate new and existing MSA applications. These databases have been well received by researchers and help to quantitatively evaluate MSA programs on protein sequences. Unfortunately, analogous DNA benchmarks are not available, making evaluation of MSA programs difficult for DNA sequences. RESULTS: This work presents the first known multiple DNA sequence alignment benchmarks that are (1) comprised of protein-coding portions of DNA (2) based on biological features such as the tertiary structure of encoded proteins. These reference DNA databases contain a total of 3545 alignments, comprising of 68 581 sequences. Two versions of the database are available: mdsa_100s and mdsa_all. The mdsa_100s version contains the alignments of the data sets that TBLASTN found 100% sequence identity for each sequence. The mdsa_all version includes all hits with an E-value score above the threshold of 0.001. A primary use of these databases is to benchmark the performance of MSA applications on DNA data sets. The first such case study is included in the Supplementary Material.  相似文献   

13.
MOTIVATION: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem. RESULTS: Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.  相似文献   

14.
The signal recognition particle database (SRPDB).   总被引:5,自引:3,他引:2       下载免费PDF全文
The SRPDB (signal recognition particle database) provides annotated SRP RNA sequences from Eucaryotes and Archaea, phylogenetically ordered and aligned with their bacterial equivalents. We also make available representative RNA secondary structure diagrams, where each base pair is proven by comparative sequence analysis. New to this release are 17 SRP RNA sequences (a total of 64 sequences) and alignments of proteins SRP19 and SRP54 with their RNA binding sites.  相似文献   

15.
16.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Therefore, the sequence comparison in searching similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search similar RNAs in the whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better or comparable with the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.  相似文献   

17.
Ribonuclic acid (RNA) enjoys increasing interest in molecular biology; despite this interest fundamental algorithms are lacking, e.g. for identifying local motifs. As proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengthes n and m, the algorithm has worst-case time complexity of O(n2 x m2 x max(n,m)) and a space complexity of only O(n x m). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.  相似文献   

18.
A structure-based method for protein sequence alignment   总被引:1,自引:0,他引:1  
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database. RESULTS: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family. AVAILABILITY: A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated to Cn3D structure/alignment viewer. CONTACT: bryant@ncbi.nlm.nih.gov.  相似文献   

19.
Alignment of RNA base pairing probability matrices   总被引:6,自引:0,他引:6  
MOTIVATION: Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are an indispensable necessity in RNA bioinformatics. RESULTS: We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously as in the classical Sankoff's algorithm, we use McCaskill's approach to compute base pairing probability matrices which effectively incorporate the information on the energetics of each sequences. A novel, simplified variant of Sankoff's algorithms can then be employed to extract the maximum-weight common secondary structure and an associated alignment. AVAILABILITY: The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets from http://www.tbi.univie.ac.at/RNA/PMcomp/. A web server is available at http://rna.tbi.univie.ac.at/cgi-bin/pmcgi.pl  相似文献   

20.

Background

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.

Results

We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software.

Conclusions

SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号