首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We present the development of a web server, a protein short motif search tool that allows users to simultaneously search for a protein sequence motif and its secondary structure assignments. The web server is able to query very short motifs searches against PDB structural data from the RCSB Protein Databank, with the users defining the type of secondary structures of the amino acids in the sequence motif. The output utilises 3D visualisation ability that highlights the position of the motif in the structure and on the corresponding sequence. Researchers can easily observe the locations and conformation of multiple motifs among the results. Protein short motif search also has an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is available for free at http://birg3.fbb.utm.my/proteinsms.  相似文献   

2.
3.
RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity as compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented using GNU C++, and is freely available from http://genome.ucf.edu/RNAMotifScanX.  相似文献   

4.
5.
6.
7.
RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.  相似文献   

8.
Kim S  Wang Z  Dalkilic M 《Proteins》2007,66(3):671-681
The motif prediction problem is to predict short, conserved subsequences that are part of a family of sequences, and it is a very important biological problem. Gibbs is one of the first successful motif algorithms and it runs very fast compared with other algorithms, and its search behavior is based on the well-studied Gibbs random sampling. However, motif prediction is a very difficult problem and Gibbs may not predict true motifs in some cases. Thus, the authors explored a possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one for guiding motif search by clustering sequences and another by pattern refinement. These two techniques are combined to a new double clustering approach to guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences since it automatically clusters input sequences into clusters and predict motifs from the clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and runs one order of magnitude faster.  相似文献   

9.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Therefore, the sequence comparison in searching similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search similar RNAs in the whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better or comparable with the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.  相似文献   

10.
The traditional way to infer RNA secondary structure involves an iterative process of alignment and evaluation of covariation statistics between all positions possibly involved in basepairing. Watson-Crick basepairs typically show covariations that score well when examples of two or more possible basepairs occur. This is not necessarily the case for non-Watson-Crick basepairing geometries. For example, for sheared (trans Hoogsteen/Sugar edge) pairs, one base is highly conserved (always A or mostly A with some C or U), while the other can vary (G or A and sometimes C and U as well). RNA motifs consist of ordered, stacked arrays of non-Watson-Crick basepairs that in the secondary structure representation form hairpin or internal loops, multi-stem junctions, and even pseudoknots. Although RNA motifs occur recurrently and contribute in a modular fashion to RNA architecture, it is usually not apparent which bases interact and whether it is by edge-to-edge H-bonding or solely by stacking interactions. Using a modular sequence-analysis approach, recurrent motifs related to the sarcin-ricin loop of 23S RNA and to loop E from 5S RNA were predicted in universally conserved regions of the large ribosomal RNAs (16S- and 23S-like) before the publication of high-resolution, atomic-level structures of representative examples of 16S and 23S rRNA molecules in their native contexts. This provides the opportunity to evaluate the predictive power of motif-level sequence analysis, with the goal of automating the process for predicting RNA motifs in genomic sequences. The process of inferring structure from sequence by constructing accurate alignments is a circular one. The crucial link that allows a productive iteration of motif modeling and realignment is the comparison of the sequence variations for each putative pair with the corresponding isostericity matrix to determine which basepairs are consistent both with the sequence and the geometrical data.  相似文献   

11.
MOTIVATION: Recently novel classes of functional RNAs, most prominently the miRNAs have been discovered, strongly suggesting that further types of functional RNAs are still hidden in the recently completed genomic DNA sequences. Only few techniques are known, however, to survey genomes for such RNA genes. When sufficiently similar sequences are not available for comparative approaches the only known remedy is to search directly for structural features. RESULTS: We present here efficient algorithms for computing locally stable RNA structures at genome-wide scales. Both the minimum energy structure and the complete matrix of base pairing probabilities can be computed in theta(N x L2) time and theta(N + L2) memory in terms of the length N of the genome and the size L of the largest secondary structure motifs of interest. In practice, the 100 Mb of the complete genome of Caenorhabditis elegans can be folded within about half a day on a modern PC with a search depth of L = 100. This is sufficient example for a survey for miRNAs. AVAILABILITY: The software described in this contribution will be available for download at http://www.tbi.univie.ac.at/~ivo/RNA/ as part of the Vienna RNA Package.  相似文献   

12.
Identifying the common structural elements of functionally related RNA sequences (family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which allows to support evolutionarily the formation of double helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may lead to the potential omission of motifs by CM. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling it. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.  相似文献   

13.
SMotif is a server that identifies important structural segments or motifs for a given protein structure(s) based on conservation of both sequential as well as important structural features such as solvent inaccessibility, secondary structural content, hydrogen bonding pattern and residue packing. This server also provides three-dimensional orientation patterns of the identified motifs in terms of inter-motif distances and torsion angles. These motifs may form the common core and therefore, can also be employed to design and rationalize protein engineering and folding experiments. AVAILABILITY: SMotif server is available via the URL http://caps.ncbs.res.in/SMotif/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

14.
RAG: RNA-As-Graphs database--concepts, analysis, and features   总被引:3,自引:0,他引:3  
MOTIVATION: Understanding RNA's structural diversity is vital for identifying novel RNA structures and pursuing RNA genomics initiatives. By classifying RNA secondary motifs based on correlations between conserved RNA secondary structures and functional properties, we offer an avenue for predicting novel motifs. Although several RNA databases exist, no comprehensive schemes are available for cataloguing the range and diversity of RNA's structural repertoire. RESULTS: Our RNA-As-Graphs (RAG) database describes and ranks all mathematically possible (including existing and candidate) RNA secondary motifs on the basis of graphical enumeration techniques. We represent RNA secondary structures as two-dimensional graphs (networks), specifying the connectivity between RNA secondary structural elements, such as loops, bulges, stems and junctions. We archive RNA tree motifs as 'tree graphs' and other RNAs, including pseudoknots, as general 'dual graphs'. All RNA motifs are catalogued by graph vertex number (a measure of sequence length) and ranked by topological complexity. The RAG inventory immediately suggests candidates for novel RNA motifs, either naturally occurring or synthetic, and thereby might stimulate the prediction and design of novel RNA motifs. AVAILABILITY: The database is accessible on the web at http://monod.biomath.nyu.edu/rna  相似文献   

15.
Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here, we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but absent in other, nearby, compartments by utilizing an hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark data set of yeast proteins. The motifs identified can be mapped to known targeting motifs and they are more conserved than the average protein sequence. Using our motif-based predictions, we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the data set described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/.  相似文献   

16.
The Signal Recognition Particle Database (SRPDB).   总被引:1,自引:0,他引:1       下载免费PDF全文
The signal recognition particle database (SRPDB) is located at the University of Texas Health Science Center at Tyler and includes tabulations of SRP RNA, SRP protein and SRP receptor sequences. The sequences are annotated with links to the primary databases. They are ordered alphabetically or phylogenetically and are available in aligned form. As of September, 1998, there were 108 SRP RNA sequences, 83 SRP protein sequences and 28 sequences of the SRP receptor alpha subunit and its homologues. In addition, the SRPDB provides search motifs consisting of conserved amino acid and nucleotide residues, and a limited number of SRP RNA secondary structure diagrams and 3-D models. The data are available freely at the URL http://psyche.uthct.edu/dbs/SRPDB/SRPDB.++ +html  相似文献   

17.
Given the wealth of new RNA structures and the growing list of RNA functions in biology, it is of great interest to understand the repertoire of RNA folding motifs. The ability to identify new and known motifs within novel RNA structures, to compare tertiary structures with one another and to quantify the characteristics of a given RNA motif are major goals in the field of RNA research; however, there are few systematic ways to address these issues. Using a novel approach for visualizing and mathematically describing macromolecular structures, we have developed a means to quantitatively describe RNA molecules in order to rapidly analyze, compare and explore their features. This approach builds on the alternative eta,theta convention for describing RNA torsion angles and is executed using a new program called PRIMOS. Applying this methodology, we have successfully identified major regions of conformational change in the 50S and 30S ribosomal subunits, we have developed a means to search the database of RNA structures for the prevalence of known motifs and we have classified and identified new motifs. These applications illustrate the powerful capabilities of our new RNA structural convention, and they suggest future adaptations with important implications for bioinformatics and structural genomics.  相似文献   

18.

Background  

An important class of interaction switches for biological circuits and disease pathways are short binding motifs. However, the biological experiments to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally: by applying standard motif extracting algorithms on protein sequence sets each interacting with either a common protein or a protein group with similar properties. The underlying assumption is that proteins with common interacting partners will share some common binding motifs. Although novel binding motifs have been discovered with such approach, it is not applicable if a protein interacts with very few other proteins or when prior knowledge of protein group is not available or erroneous. Experimental noise in input interaction data can further deteriorate the dismal performance of such approaches.  相似文献   

19.
A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. RESULTS: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU. Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at http://meme.sdsc.edu.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号