Similar articles
20 similar articles found.
1.
Complex carbohydrates are known as mediators of complex cellular events. Owing to their structural diversity, their potential information content in a short sequence is several orders of magnitude higher than that of any other biological macromolecule. SWEET-DB (http://www.dkfz.de/spec2/sweetdb/) is an attempt to use modern web techniques to annotate and/or cross-reference carbohydrate-related data collections, allowing glycoscientists to find important data for compounds of interest in a compact and well-structured representation. Currently, reference data taken from three data sources can be retrieved for a given carbohydrate (sub)structure. The sources are CarbBank structures and literature references (linked to the NCBI PubMed service), NMR data taken from SugaBase, and 3D co-ordinates generated with SWEET-II. The main purpose of SWEET-DB is to enable easy access to all data stored for one carbohydrate structure by entering a complete sequence or parts thereof. Access to SWEET-DB contents is provided with the help of separate input spreadsheets for (sub)structures, bibliographic data, general structural data such as molecular weight, NMR spectra and biological data. A detailed online tutorial is available at http://www.dkfz.de/spec2/sweetdb/nar/.

2.
3.
The function of many RNAs depends crucially on their structure. Therefore, the design of RNA molecules with specific structural properties has many potential applications, e.g. in the context of investigating the function of biological RNAs, of creating new ribozymes, or of designing artificial RNA nanostructures. Here, we present a new algorithm for solving the following RNA secondary structure design problem: given a secondary structure, find an RNA sequence (if any) that is predicted to fold to that structure. Unlike the (pseudoknot-free) secondary structure prediction problem, this problem appears to be computationally hard. Our new algorithm, "RNA Secondary Structure Designer (RNA-SSD)", is based on stochastic local search, a prominent general approach for solving hard combinatorial problems. A thorough empirical evaluation on computationally predicted structures of biological sequences and artificially generated RNA structures, as well as on empirically modelled structures from the biological literature, shows that RNA-SSD substantially outperforms the best known algorithm for this problem, RNAinverse from the Vienna RNA Package. In particular, the new algorithm is able to consistently solve structures for which RNAinverse is unable to find solutions. The RNA-SSD software is publicly available under the name of RNA Designer at the RNASoft website (www.rnasoft.ca).

4.
The database, called HyPaLib (for Hybrid Pattern Library), contains annotated structural elements characteristic of certain classes of structural and/or functional RNAs. These elements are described in a language specifically designed for this purpose. The language allows convenient specification of hybrid patterns, i.e. motifs consisting of sequence features and structural elements together with sequence similarity and thermodynamic constraints. We are currently developing software tools that allow a user to search sequence databases for any pattern in HyPaLib, thus providing functionality which is similar to PROSITE, but dedicated to the more complex patterns in RNA sequences. HyPaLib is available at http://bibiserv.techfak.uni-bielefeld.de/HyPa/.

5.
Owing to their large size and complex nature, few large macromolecular complexes have been solved to atomic resolution. This has led to an under-representation of these structures, which are composed of novel and/or homologous folds, in the library of known structures and folds. While it is often difficult to achieve a high-resolution model for these structures, X-ray crystallography and electron cryomicroscopy are capable of determining structures of large assemblies at low to intermediate resolutions. To aid in the interpretation and analysis of such structures, we have developed two programs: helixhunter and foldhunter. Helixhunter is capable of reliably identifying helix position, orientation and length using a five-dimensional cross-correlation search of a three-dimensional density map followed by feature extraction. Helixhunter's results can in turn be used to probe a library of secondary structure elements derived from the structures in the Protein Data Bank (PDB). From this analysis, it is then possible to identify potential homologous folds or suggest novel folds based on the arrangement of alpha helix elements, resulting in a structure-based recognition of folds containing alpha helices. Foldhunter uses a six-dimensional cross-correlation search allowing a probe structure to be fitted within a region or component of a target structure. The structural fitting therefore provides a quantitative means to further examine the architecture and organization of large, complex assemblies. These two methods have been successfully tested with simulated structures modeled from the PDB at resolutions between 6 and 12 Å. With the integration of helixhunter and foldhunter into sequence and structural informatics techniques, we have the potential to deduce or confirm known or novel folds in domains or components within large complexes.

6.
RNA-binding proteins recognize RNA targets in a sequence-specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions, and it was shown that the sequestration of a binding motif in a double strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-stranded parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.

7.
We have written a programming language, OCL (Object Command Language), to solve, in a general way, two recurring problems that arise during the construction of molecular models and during the geometrical characterization of macromolecules: how to move precisely and reproducibly any part of a molecular model in any user-defined local reference axes; and how to calculate standard or user-defined structural parameters that characterize the complex geometries of any macromolecule. OCL endows the user with three main capabilities: the definition of subsets of the macromolecule, called objects in OCL, with a formalism from elementary set theory or lexical analysis; the definition of sequences of elementary geometrical operations, called procedures in OCL, enabling one to build arbitrary three-dimensional (3D) orthonormal reference frames, to be associated with previously defined objects; and the transmission of these definitions to programs that allow one to display, to modify and to analyze interactively the molecular structure, or to programs that perform energy minimizations or molecular dynamics. Several applications to nucleic acids are presented.
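The idea of attaching an orthonormal reference frame to a set of atoms can be illustrated with a minimal Python sketch. This is not OCL syntax and not the authors' procedure; the function names `local_frame` and `to_local` and the three-point construction are assumptions made for illustration only.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / norm for x in v)

def local_frame(origin, p_x, p_xy):
    """Build a right-handed orthonormal frame from three points (hypothetical helper).

    x points from origin to p_x, z is normal to the plane of the three
    points, and y = z x x completes the frame.
    """
    x = _unit(_sub(p_x, origin))
    z = _unit(_cross(x, _sub(p_xy, origin)))
    y = _cross(z, x)
    return x, y, z

def to_local(point, origin, frame):
    """Express a point in the local frame by projecting onto each axis."""
    d = _sub(point, origin)
    return tuple(_dot(d, axis) for axis in frame)
```

With such a frame, moving part of a model "in local axes" amounts to transforming coordinates into the frame, applying the move there, and transforming back.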

8.
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D, a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool, designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.

9.
RNAMotif, an RNA secondary structure definition and search algorithm
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including the complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important aspects: first, the structure definition language is more flexible and can specify any type of base–base interaction; second, RNAMotif provides a user-controlled scoring section that can be used to add capabilities that patterns alone cannot provide.
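To make the notion of searching a sequence for a structural element concrete, here is a toy Python sketch that scans for simple hairpins (a stem of fixed length enclosing a loop). It does not use RNAMotif's descriptor language or scoring section; the function name `find_hairpins`, its parameters, and the choice to allow G-U wobble pairs are all assumptions for illustration.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_hairpins(seq, stem_len, min_loop, max_loop):
    """Return (start, loop_len) for every hairpin found in seq.

    A hairpin here is stem_len consecutive bases that pair, in antiparallel
    order, with stem_len bases downstream, separated by a loop of
    min_loop..max_loop unpaired bases.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # exclusive end of the hairpin
            if end > n:
                break
            if all((seq[start + i], seq[end - 1 - i]) in PAIRS
                   for i in range(stem_len)):
                hits.append((start, loop))
    return hits
```

A real descriptor search would also handle mismatches, nested helices, and scoring, which this sketch deliberately omits.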

10.
DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that it performs satisfactorily on protein models whose root-mean-square Cα deviation from their native structures is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in the search by a DNA-binding protein for its specific DNA targets.

11.
Reinhardt A, Eisenberg D. Proteins, 2004, 56(3): 528-538
In fold recognition (FR) a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequence similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.
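The dynamic-programming step can be sketched in Python as a standard global (Needleman-Wunsch) alignment in which each residue is a pair of amino acid type and secondary structure state. The scoring function `toy_score` below is a hypothetical stand-in, not the ANN-derived substitution matrix from the abstract, and the linear gap penalty is likewise an assumption.

```python
def toy_score(a, b):
    """Hypothetical score for two residues given as (amino_acid, ss_state)."""
    aa_term = 2.0 if a[0] == b[0] else -1.0
    ss_term = 1.0 if a[1] == b[1] else 0.0
    return aa_term + ss_term

def global_align(probe, template, score=toy_score, gap=-2.0):
    """Needleman-Wunsch alignment; returns (total_score, aligned index pairs)."""
    n, m = len(probe), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(probe[i - 1], template[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the corner to recover which residues were matched.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(probe[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Swapping `toy_score` for a learned matrix lookup changes the alignment quality but not the recursion itself.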

12.
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.

13.
The FSSP database of structurally aligned protein fold families.
L Holm, C Sander. Nucleic Acids Research, 1994, 22(17): 3600-3609
FSSP (families of structurally similar proteins) is a database of structural alignments of proteins in the Protein Data Bank (PDB). The database currently contains an extended structural family for each of 330 representative protein chains. Each data set contains structural alignments of one search structure with all other structurally significantly similar proteins in the representative set (remote homologs, <30% sequence identity), as well as all structures in the Protein Data Bank with 30-70% sequence identity relative to the search structure (medium homologs). Very close homologs (above 70% sequence identity) are excluded as they rarely have marked structural differences. The alignments of remote homologs are the result of pairwise all-against-all structural comparisons in the set of 330 representative protein chains. All such comparisons are based purely on the 3D co-ordinates of the proteins and are derived by automatic (objective) structure comparison programs. The significance of structural similarity is estimated based on statistical criteria. The FSSP database is available electronically from the EMBL file server and by anonymous ftp (file transfer protocol).
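The three sequence-identity bands in the abstract (below 30%, 30 to 70%, above 70%) can be written as a small Python classifier. The function name `classify_homolog` and the treatment of the exact boundary values 30 and 70 are assumptions, since the abstract does not say which band they belong to; the sketch also ignores the separate requirement of significant structural similarity.

```python
def classify_homolog(identity_pct):
    """Map percent sequence identity to the homolog band named in the abstract.

    Boundary handling (30 and 70 counted as 'medium') is an assumption.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct < 30.0:
        return "remote"
    if identity_pct <= 70.0:
        return "medium"
    return "very close (excluded)"
```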

14.
RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at http://www.daimi.au.dk/~compbio/pfold.

15.
MOTIVATION: The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. RESULTS: We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts: a dynamic programming method for good initial sequences, followed by an improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that, among all sequences, adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step, applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. INFO-RNA performed better than RNAinverse and, in most cases, gave better results than RNA-SSD, probably the best inverse RNA folding tool available. AVAILABILITY: www.bioinf.uni-freiburg.de?Subpages/software.html.
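The optimization criterion compares a target structure with a predicted structure. The abstract does not say which structure distance is used; the Python sketch below assumes the common base-pair distance on dot-bracket strings (the number of base pairs present in one structure but not the other). The function names `pair_set` and `base_pair_distance` are illustrative.

```python
def pair_set(dot_bracket):
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs

def base_pair_distance(a, b):
    """Count base pairs that occur in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(pair_set(a) ^ pair_set(b))
```

A local search over sequences would call such a function on the target and on the folded structure of each candidate, accepting moves that lower the distance, with zero meaning the design folds as intended.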

16.
We present the development of a web server, a protein short motif search tool that allows users to simultaneously search for a protein sequence motif and its secondary structure assignments. The web server can run searches for very short motifs against PDB structural data from the RCSB Protein Data Bank, with the users defining the type of secondary structure of the amino acids in the sequence motif. The output uses 3D visualisation that highlights the position of the motif in the structure and on the corresponding sequence. Researchers can easily observe the locations and conformation of multiple motifs among the results. Protein short motif search also has an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is available for free at http://birg3.fbb.utm.my/proteinsms.
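Searching for a sequence motif together with its secondary structure assignment can be sketched in Python with two regular expressions, one applied to the sequence and one to a per-residue secondary structure string of the same length. This is not the web server's API; the function name `search_motif` and the H/E/C secondary structure alphabet are assumptions.

```python
import re

def search_motif(sequence, ss, motif_regex, ss_regex):
    """Return (start, matched_text) for motif hits whose SS string also matches.

    sequence and ss must be equal-length strings; ss holds one secondary
    structure letter per residue (H/E/C is assumed here).
    """
    if len(sequence) != len(ss):
        raise ValueError("sequence and ss must have the same length")
    ss_pattern = re.compile(ss_regex)
    hits = []
    # A lookahead lets overlapping motif occurrences be reported too.
    for match in re.finditer("(?=(" + motif_regex + "))", sequence):
        text = match.group(1)
        start = match.start()
        if ss_pattern.fullmatch(ss[start:start + len(text)]):
            hits.append((start, text))
    return hits
```

For example, the same three-residue motif can be restricted to coil positions with `ss_regex="C+"` or to helical positions with `ss_regex="H+"`.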

17.
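This article presents application examples of the sequence analysis programs in EMBOSS, the European Molecular Biology Open Software Suite. Section 1 briefly introduces the EMBOSS package and its basic usage. Section 2 covers commonly used sequence processing programs for format conversion, sequence extraction, sequence transformation and sequence display. Section 3 covers sequence alignment programs, including pairwise alignment, multiple alignment and dot-plot programs. Section 4 covers commonly used nucleic acid sequence analysis programs, which can be used for nucleotide composition statistics, open reading frame analysis, C...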

18.
The increased interest in chemical cross-linking for probing protein structure and interaction has led to a large increase in literature describing new cross-linkers and search programs. However, this has not led to a corresponding increase in the analysis of large and complex proteins. A major obstacle is that the new cross-linkers are not readily available and/or have low reactivity. In combination with aging search programs that are slow and have low sensitivity, or new search programs that are described but not released, these efforts do little to advance the field of cross-linking. Here we present a method pipeline for chemical cross-linking, using two standard cross-linkers, BS3 and BS2G, combined with our freely available CrossWork search program. By this approach we generate cross-link data sufficient to derive structural information for large and complex proteins. CrossWork searches batches of tandem mass-spectrometric data, and identifies cross-linked and non-cross-linked peptides using a standard PC. We tested CrossWork by searching mass-spectrometric datasets of cross-linked complement factor C3 against small (1 protein) and large (1000 proteins) search spaces, and show that the resulting distance constraints agree with the established structures. We further investigated the structure of the multi-domain ERp72, and combined the individual domains of ERp72 into a single structure.

19.

Background  

Identifying structurally similar proteins with different chain topologies can aid studies in homology modeling, protein folding, protein design, and protein evolution. These include circularly permuted protein structures, and the more general cases of non-cyclic permutations between similar structures, which are related by non-topological rearrangement beyond circular permutation. We present a method based on an approximation algorithm that finds sequence-order independent structural alignments that are close to optimal. We formulate the structural alignment problem as a special case of the maximum-weight independent set problem, and solve this computationally intensive problem approximately by iteratively solving relaxations of a corresponding integer programming problem. The resulting structural alignment is sequence-order independent. Our method is also insensitive to insertions, deletions, and gaps.

20.