To support the analysis of biochemical pathways, a database system based on a model that represents the characteristics of the domain is needed. This domain has proven difficult to model with conventional data-modelling techniques. We are building an ontology for biochemical pathways that serves as the basis for generating a database on the same domain, allowing the definition of complex queries and complex data representations. The ontology is used as a modelling and analysis tool that allows complex semantics to be expressed in a representation language based on first-order logic. The system's induction capabilities can help scientists formulate and test research hypotheses that are difficult to express with standard relational database mechanisms. An ontology representing the shared formalisation of knowledge in a scientific domain can also serve as a data integration tool, clarifying the mapping of concepts for the developers of different databases. In this paper we describe the general structure of our system, concentrating on the ontology-based database as its key component.

A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure, including solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables whose columns hold the different terms and whose rows hold the individual entries. The relational base tables store information about the protein coordinate set, the chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE database management system. Enquiries are written in SQL (Structured Query Language), which is simple to use and removes the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database lies in cross-correlating several tables, or the entries within a single table. For example, the dihedral angles of proline at the fourth position of an alpha-helix in high-resolution structures can be obtained rapidly. The structural database thus provides a powerful tool for deriving empirical rules about protein conformation. This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation.
The entire data resource (ISIS) will provide a system to guide all areas of protein modelling, including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
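A minimal sketch of the kind of cross-table enquiry described above, using Python's built-in sqlite3 module in place of ORACLE. The two-table schema, its column names (for example `helix_pos`) and the sample rows are hypothetical illustrations, not the actual ISIS table design.

```python
import sqlite3

# Hypothetical schema: one row per protein (with its resolution in angstroms)
# and one row per residue. ss is a secondary-structure code ('H' = alpha-helix)
# and helix_pos is the residue's 1-based position within its helix.
SCHEMA = """
CREATE TABLE protein (pdb_id TEXT PRIMARY KEY, resolution REAL);
CREATE TABLE residue (
    pdb_id TEXT REFERENCES protein(pdb_id),
    chain TEXT, resnum INTEGER, resname TEXT,
    phi REAL, psi REAL, ss TEXT, helix_pos INTEGER
);
"""

def proline_helix_angles(conn, max_resolution):
    """Join the two tables: phi/psi of Pro at helix position 4 in
    structures solved to better than max_resolution."""
    query = """
        SELECT r.pdb_id, r.chain, r.resnum, r.phi, r.psi
        FROM residue AS r JOIN protein AS p ON r.pdb_id = p.pdb_id
        WHERE r.resname = 'PRO' AND r.ss = 'H' AND r.helix_pos = 4
          AND p.resolution < ?
        ORDER BY r.pdb_id, r.chain, r.resnum
    """
    return conn.execute(query, (max_resolution,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up identifiers and values, for illustration only.
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [("demo1", 1.8), ("demo2", 2.9)])
    conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
        ("demo1", "A", 12, "PRO", -60.0, -35.0, "H", 4),
        ("demo1", "A", 30, "PRO", -65.0, 140.0, "E", None),
        ("demo2", "A", 7, "PRO", -58.0, -40.0, "H", 4),  # resolution too low
    ])
    print(proline_helix_angles(conn, 2.0))  # [('demo1', 'A', 12, -60.0, -35.0)]
```

The join is what the abstract calls cross-correlation: the resolution criterion lives in one table and the residue criteria in another.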

A procedure for finding clusters of adjacent residues in protein hydrophobic cores (hydrophobic microdomains) has been proposed by Plochocka et al. A program is presented that finds hydrophobic microdomains, making use of protein structure data stored in an object-oriented database and the list-processing features of Prolog. Alternative definitions for hydrophobic microdomains are explored. Results are presented for haemoglobin. Received on January 15, 1990; accepted on June 28, 1990.
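A minimal sketch of microdomain finding as a connected-component search, written in Python rather than Prolog. It assumes one representative point per residue (for example a side-chain centroid), an illustrative hydrophobic residue set and a hypothetical 6.5 Å contact cutoff; Plochocka et al.'s definitions may differ.

```python
import math
from collections import deque

# Illustrative hydrophobic set (an assumption, not the published definition).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "CYS"}

def hydrophobic_microdomains(residues, cutoff=6.5, min_size=3):
    """residues: list of (label, resname, (x, y, z)), one point per residue.
    Returns sorted label lists for clusters of hydrophobic residues linked by
    contacts shorter than `cutoff`, keeping clusters of at least `min_size`."""
    hyd = [r for r in residues if r[1] in HYDROPHOBIC]
    n = len(hyd)
    # Contact graph: edge between residues whose points are closer than cutoff.
    adj = [[j for j in range(n)
            if j != i and math.dist(hyd[i][2], hyd[j][2]) < cutoff]
           for i in range(n)]
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:  # breadth-first search collects one connected component
            i = queue.popleft()
            comp.append(hyd[i][0])
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(comp) >= min_size:
            clusters.append(sorted(comp))
    return clusters
```

Changing `HYDROPHOBIC`, `cutoff` or the choice of representative point is one way to explore the "alternative definitions" the abstract mentions.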

A strategy has been developed for constructing a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source databases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted so that scientific judgement can be applied at each stage of database processing and amalgamation. Source databases are assigned a priority depending on the quality of their sequence validation and commenting. In each pairwise comparison of databases, entries from the lower-priority database are rejected according to optionally defined redundancy criteria based on sequence segment mismatches. Efficient algorithms for this methodology are embodied in the COMPO software system. COMPO has been used for over 2 years to construct and regularly update the OWL composite protein sequence database from the source databases NBRF-PIR, SWISS-PROT, a GenBank translation retrieved from the feature tables, NBRF-NEW, NEWAT86, PSD-KYOTO and the sequences contained in the Brookhaven protein structure databank. OWL is part of the ISIS integrated data resource of protein sequence and structure [Akrigg et al. (1988) Nature, 335, 745-746]. The modular nature of the integration process greatly facilitates frequent updating of OWL following releases of the source databases. The comparison process also reveals the extent of redundancy in these sources. The advantages of a robust composite database for sequence similarity searching and information retrieval are discussed.
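The priority-based elimination can be sketched as follows. The redundancy rule used here (containment, or equal length with at most `max_mismatch` differences) is a simplified stand-in for COMPO's segment-mismatch criteria, and the source names in the test data are placeholders.

```python
def mismatches(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def is_redundant(seq, kept, max_mismatch):
    """Hypothetical rule: `seq` is redundant if it is contained in a kept
    sequence, or has the same length with at most max_mismatch differences."""
    for other in kept:
        if seq in other:
            return True
        if len(seq) == len(other) and mismatches(seq, other) <= max_mismatch:
            return True
    return False

def build_composite(sources, max_mismatch=1):
    """sources: list of (source_name, {entry_id: sequence}) in decreasing
    priority. Each source is compared only with the higher-priority entries
    already accepted; returns a list of (source_name, entry_id, sequence)."""
    composite = []
    for name, entries in sources:
        kept = [seq for _, _, seq in composite]  # snapshot of higher priority
        for entry_id, seq in entries.items():
            if not is_redundant(seq, kept, max_mismatch):
                composite.append((name, entry_id, seq))
    return composite
```

Because each source is processed against a fixed snapshot, adding a newly released source only requires rerunning the comparisons below it in the priority order, which mirrors the modular updating described above.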

We describe a series of databases and tools that directly or indirectly support biomedical research on macromolecules, with a focus on their applicability in protein structure bioinformatics research. DSSP, which determines the secondary structures of proteins, has been updated to work well with extremely large structures in multiple formats. The PDBREPORT database, which lists anomalies in protein structures, has been remade to remove many small problems. These reports are now available as PDF-formatted files with a computer-readable summary. The VASE software has been added to analyze and visualize HSSP multiple sequence alignments for protein structures. The Lists collection of databases has been extended with a series of databases, most notably one that gives each protein structure a grade for its usefulness in protein structure bioinformatics projects. The PDB-REDO collection of reanalyzed and re-refined protein structures solved by X-ray crystallography has been improved by better handling of sugar residues and hydrogen bonds, and by adding many missing surface loops. All academic software underlying these protein structure bioinformatics applications and databases is now publicly accessible, either directly from the authors or from the GitHub software repository.

MOTIVATION: Circular dichroism (CD) spectroscopy has become established as a key method for determining the secondary structure content of proteins and has had a significant impact on molecular biology. Many excellent mathematical protocols have been developed for this purpose and their quality is not in question. However, reference database sets of proteins, with CD spectra matched to secondary structure components derived from X-ray structures, provide the key resource for this task. These databases were created many years ago, before most CD spectrophotometers were standardized and before it was commonplace to validate X-ray structures prior to publication. The analyses presented here were undertaken to investigate the overall quality of these reference databases in light of their extensive use in determining protein secondary structure content from CD spectra. RESULTS: The analyses show that there are a number of significant problems associated with the CD reference database sets in current use. There are disparities between CD spectra of the same protein collected by different groups, including differences in magnitude, peak position, or both. Yet many current reference sets are amalgamations of spectra from these groups, introducing inconsistencies that can lead to inaccuracies in the secondary structure components determined from the CD spectra. A number of the X-ray structures used fall short of the validation criteria now standard for structure determination; many have substantial percentages of residues in the disallowed regions of the Ramachandran plot. Hence their calculated secondary structure components, used as the foundation for the reference databases, are likely to be in error. Additionally, the coverage of secondary structure space in the reference datasets correlates poorly with the secondary structure components found in the Protein Data Bank.
A conclusion is that a new reference CD database is now needed, with cross-correlated, machine-independent CD spectra and validated X-ray structures that cover more secondary structure components, including diverse protein folds. Nevertheless, the fact that reasonably accurate values for the secondary structure content of proteins can be determined from spectra testifies that CD spectroscopy is a very powerful technique.

BACKGROUND: Several methods of structural classification have been developed to bring some order to the large amount of data in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three distinct approaches to classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. To develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures. RESULTS: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. The discrepancies and inconsistencies can be accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented. CONCLUSIONS: Using these databases requires an understanding of the rules on which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also affects the reliability of prediction methods that use these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database encompassing the agreements between SCOP, CATH and FSSP.
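The abstract does not spell out how agreement was measured. One plausible pairwise formulation, sketched below purely as an assumption, restricts the comparison to chains present in all three schemes and asks, for every pair of chains, whether each scheme makes the same call (same fold or different fold). Pairs that every scheme places in the same fold are the kind of consensus information a template library could draw on.

```python
from itertools import combinations

def common_chains(*classifications):
    """Chains present in every classification (dicts: chain -> fold label)."""
    return set.intersection(*(set(c) for c in classifications))

def consensus_pairs(*classifications):
    """Return (pairs placed in the same fold by every scheme,
    fraction of chain pairs on which all schemes make the same call)."""
    chains = sorted(common_chains(*classifications))
    same_everywhere, agree, total = [], 0, 0
    for a, b in combinations(chains, 2):
        # One boolean per scheme: does it put a and b in the same fold?
        calls = {c[a] == c[b] for c in classifications}
        total += 1
        if len(calls) == 1:          # all schemes agree on this pair
            agree += 1
            if calls == {True}:
                same_everywhere.append((a, b))
    return same_everywhere, (agree / total if total else 1.0)
```

Fold labels only need to be comparable within each scheme, so SCOP, CATH and FSSP identifiers can be used as they are.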

With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for constructing databases. Domains are considered to be clusters of secondary structure elements. Helices and strands are therefore first clustered using inter-secondary-structure distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables their automatic identification and classification into disjoint, interacting and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach delineates domain boundaries effectively and identifies those proteins that can be considered single domains. A quantitative estimate of the interaction between domains is also proposed. The resulting database of protein domains is a useful tool for understanding protein folding, recognizing protein folds and understanding structure-activity relationships.
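A sketch of the clustering step, assuming each secondary structure element is given as a list of Cα coordinates and that the inter-element distance is the closest Cα-Cα pair (an assumption; the paper's exact measure and its disjoint factor are not reproduced). The dendrogram is built by single-linkage agglomeration and then cut at a chosen threshold to give candidate domains.

```python
import math
from itertools import product

def sse_distance(a, b):
    """Hypothetical inter-SSE distance: closest C-alpha pair between two
    elements, each given as a list of (x, y, z) coordinates."""
    return min(math.dist(p, q) for p, q in product(a, b))

def single_linkage(sses):
    """Agglomerative single-linkage clustering of SSE indices. Returns the
    merge history [(height, cluster_a, cluster_b), ...] (the dendrogram)."""
    d = {(i, j): sse_distance(sses[i], sses[j])
         for i in range(len(sses)) for j in range(i + 1, len(sses))}

    def link(x, y):  # single linkage: closest pair across the two clusters
        return min(d[min(i, j), max(i, j)] for i in x for j in y)

    clusters = [frozenset([i]) for i in range(len(sses))]
    merges = []
    while len(clusters) > 1:
        x, y = min(((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
                   key=lambda xy: link(*xy))
        merges.append((link(x, y), x, y))
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return merges

def cut(n, merges, threshold):
    """Cut the dendrogram: replay only merges below `threshold`
    (single-linkage merge heights never decrease, so order is preserved)."""
    clusters = [frozenset([i]) for i in range(n)]
    for height, x, y in merges:
        if height < threshold:
            clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return sorted(sorted(c) for c in clusters)
```

A protein whose elements all merge below the threshold would come out as a single domain.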

Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein–DNA complex. To substantiate this observation, the dipole moment of the HU protein homodimer was computed numerically using NMR structure data. Dipole moments of globular proteins had hitherto been calculated from X-ray structures; NMR data had never been used before. The advantages of NMR structures are: (a) unlike X-ray data, NMR data are obtained from protein solutions. This eliminates the troublesome question of whether the protein structure is altered in the transition from the crystalline state to the solution state, a question that is particularly important for proteins such as HU protein, which has considerable internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in the protein can be determined with sufficient resolution, which enables the N–H as well as the C=O bond moments to be calculated. Since the NMR entry for HU protein from Bacillus stearothermophilus consists of 25 models, the surface-charge and core dipole moments were computed for each of these structures. The results show that the net permanent dipole moment of the HU protein homodimer is approximately 500–530 D (1 D ≈ 3.33 × 10⁻³⁰ C·m) at pH 7.5 and 600–630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a protein as small as 19.5 kDa. Nevertheless, the numerical result is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
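The quantity being computed is μ = Σᵢ qᵢ (rᵢ − r_cm). A minimal Python sketch, assuming partial charges in units of e, coordinates in ångströms and the centre of mass as origin (needed because the dipole of a molecule with a net charge depends on the origin); the charge set itself is left to the user and is not the one used in the paper. Repeating the calculation over every NMR model gives a range like the one reported.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.8032  # 1 e*angstrom is approximately 4.8032 D

def dipole_moment_debye(charges, coords, masses):
    """|sum_i q_i (r_i - r_cm)| in debye, with charges in e, coordinates in
    angstrom and masses used only to locate the centre of mass."""
    total_mass = sum(masses)
    r_cm = [sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
            for k in range(3)]
    mu = [sum(q * (r[k] - r_cm[k]) for q, r in zip(charges, coords))
          for k in range(3)]
    return math.hypot(*mu) * E_ANGSTROM_TO_DEBYE

def dipole_range(models):
    """models: list of (charges, coords, masses), e.g. one per NMR model.
    Returns the (min, max) dipole moment over the ensemble."""
    values = [dipole_moment_debye(*m) for m in models]
    return min(values), max(values)
```

A charge pair of +1 e and −1 e separated by 1 Å gives 4.8032 D, which is a quick sanity check on the unit conversion.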

GOBASE: the organelle genome database

We describe a database of protein structure alignments for homologous families. The database, HOMSTRAD, currently contains 130 protein families and 590 aligned structures, selected on the basis of the quality of the X-ray analysis and the accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at http://www-cryst.bioc.cam.ac.uk/~homstrad/, with search facilities and links to other databases.

A relational database structure based on MS-Access and MySQL was established to store and manage proteomics data. The system can be used to publish two-dimensional electrophoresis proteomics data and can also be accessed by external users who want to compare their own data with those in the databases. Database maintenance is managed centrally, so producers of proteomics data do not need to build a database themselves. Users can enter mass spectra into the database, which allows peptide mass fingerprints to be searched against their own protein sequence databases. The first release, published in January 2002, contains data from the Mycobacterium tuberculosis, Helicobacter pylori, Borrelia garinii, Francisella tularensis, Chlamydia pneumoniae, Mycoplasma pneumoniae, Jurkat T-cell and mouse mammary gland projects (http://www.mpiib-berlin.mpg.de/2D-PAGE/).
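A generic peptide-mass-fingerprint comparison can be sketched as below. It assumes tryptic cleavage after K/R except before P, no missed cleavages or modifications, monoisotopic residue masses and singly protonated [M+H]⁺ ions; it is an illustration, not the search engine used by the 2D-PAGE system.

```python
import re

# Monoisotopic residue masses (Da); unmodified cysteine.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276

def tryptic_peptides(seq):
    """Cut after K or R unless the next residue is P (zero-width split)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def mh_plus(peptide):
    """[M+H]+ mass; raises KeyError for non-standard residue letters."""
    return sum(MONO[a] for a in peptide) + WATER + PROTON

def pmf_score(observed, seq, tol=0.5):
    """Number of observed masses within `tol` Da of a theoretical peptide."""
    theo = [mh_plus(p) for p in tryptic_peptides(seq)]
    return sum(any(abs(m - t) <= tol for t in theo) for m in observed)

def search(observed, database, tol=0.5):
    """Rank {name: sequence} entries by number of matched masses."""
    return sorted(((pmf_score(observed, s, tol), n) for n, s in database.items()),
                  reverse=True)
```

Counting matched peaks is the simplest possible score; practical tools weight matches statistically, which this sketch does not attempt.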

MOTIVATION: A large body of experimental and theoretical evidence suggests that local structural determinants are frequently encoded in short segments of protein sequence. Although local structural information, once recognized, is particularly useful in protein structural and functional analyses, identifying embedded local structural codes from sequence information alone remains a difficult problem. RESULTS: In this paper, we describe a local structure prediction method that aims to predict the backbone structures of nine-residue sequence segments. Two elements are key to this procedure. The first is the LSBSP1 database, which contains a large number of non-redundant local structure-based sequence profiles for nine-residue structure segments. The second is the consensus approach, which identifies a consensus structure from a set of hit structures. The procedure starts by matching a query segment of nine consecutive amino acid residues against all the sequence profiles in the local structure-based sequence profile database (LSBSP1). The consensus structure, which lies at the center of the largest structural cluster of the hit structures, is predicted to be the native structure adopted by the query segment. The method is assessed on a large set of random test protein structures that were not used in constructing the LSBSP1 database. The benchmark results indicate that the new procedure outperforms local backbone structure prediction methods based on the I-sites library by a significant margin. AVAILABILITY: All the computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling).
The system and associated databases for LINUX systems can be downloaded from http://www.columbia.edu/~ay1/.
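The consensus step can be sketched as picking the hit with the most neighbours, i.e. the centre of the largest cluster. To stay self-contained, this sketch compares hits by the mean circular difference of backbone (φ, ψ) angles with a hypothetical cutoff, rather than by whatever coordinate-based structural comparison the method actually uses.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_distance(s, t):
    """Mean circular difference over all phi and psi angles of two segments,
    each a list of (phi, psi) tuples of equal length."""
    diffs = [angle_diff(x, y)
             for (p1, q1), (p2, q2) in zip(s, t)
             for x, y in ((p1, p2), (q1, q2))]
    return sum(diffs) / len(diffs)

def consensus_structure(hits, cutoff=30.0):
    """Index of the hit with the most neighbours within `cutoff` (itself
    included): the centre of the largest cluster. Ties go to the lowest index."""
    def neighbours(i):
        return sum(torsion_distance(hits[i], hits[j]) <= cutoff
                   for j in range(len(hits)))
    return max(range(len(hits)), key=neighbours)
```

Choosing the most-connected hit, rather than averaging hits, guarantees that the prediction is a real, physically consistent backbone.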

A database for cell signaling networks.
We developed a data and knowledge base for cellular signal transduction in human cells to make this rapidly growing information available. The database includes all the biological properties of cellular signal transduction, including the biological reactions that transfer cellular signals and molecular attributes characterized by sequence, structure and function. Since the database is based on object-oriented techniques, it supports the highly flexible methods of data definition and modification that are necessary to handle this diverse and complex biological information. The database includes graphical representations of signaling cascades and the three-dimensional structures of molecules. The database is a novel application of ACEDB, the database system originally developed to store the C. elegans genome. The database can be accessed through the Internet at http://geo.nihs.go.jp/csndb.html.

The Protein Mutant Database.
The protein mutant database (PMD) currently contains over 81 000 mutants, including both artificial and natural mutants of various proteins, extracted from about 10 000 articles. We recently developed a powerful viewing and retrieval system (http://pmd.ddbj.nig.ac.jp), which is integrated with the sequence and tertiary structure databases. The system has the following features: (i) mutated sequences are generated automatically from the information described in each entry, combined with the integrated wild-type sequence data, and then displayed. This is convenient because it shows the positions of altered amino acids (in a different color) within the entire wild-type sequence; (ii) for proteins whose 3D structures have been determined experimentally, the 3D structure is displayed with the mutation sites in a different color; (iii) a sequence homology search against PMD can be carried out with any query sequence; (iv) a summary of the mutations in homologous sequences can be displayed, showing all the mutations recorded throughout PMD at a given site of a protein.
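Feature (i) amounts to applying substitutions to the wild-type sequence. A minimal sketch, assuming a hypothetical 'A23G' notation (wild-type residue, 1-based position, new residue) rather than PMD's own entry format, with lower case standing in for colour highlighting.

```python
import re

# Hypothetical substitution notation: wild-type residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_mutations(wild_type, mutations):
    """Return (mutant_sequence, sorted changed positions). Raises ValueError
    if a mutation is malformed or its wild-type residue does not match."""
    seq = list(wild_type)
    changed = []
    for m in mutations:
        match = MUTATION.match(m)
        if not match:
            raise ValueError(f"unrecognised mutation: {m}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq) or seq[pos - 1] != wt:
            raise ValueError(f"{m} does not match the wild-type sequence")
        seq[pos - 1] = new
        changed.append(pos)
    return "".join(seq), sorted(changed)

def mark(seq, positions):
    """Lower-case the altered residues (a text-only stand-in for colour)."""
    positions = set(positions)
    return "".join(c.lower() if i + 1 in positions else c
                   for i, c in enumerate(seq))
```

Checking the stated wild-type residue against the stored sequence catches numbering inconsistencies between an article and the sequence database.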

B. Billoud, M. Kontic, A. Viari. Nucleic Acids Research, 1996, 24(8): 1395-1403
At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made to write general-purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure-scanning programs are usually dedicated to particular structures and are written in general-purpose programming languages through a complex and time-consuming process in which the biological problem of defining the structure and the computer engineering problem of searching for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types corresponding to structural elements (basically helices) that can be arranged in any way to form a complex structure. Because Palingol takes a declarative approach, the user need only focus on 'what to search for' while the language engine takes care of 'how to look for it'. It therefore becomes simpler to write a scanning program, and the structural constraints that define the required structure are identified more clearly.
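For contrast with the declarative approach, here is the kind of hand-written, structure-specific scanner the abstract describes: a Python sketch that finds hairpins (one helix closing a loop) in an RNA sequence. Stem length, loop bounds and the Watson-Crick plus G·U pairing set are illustrative parameters; this is not Palingol syntax.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an illustrative choice).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def find_hairpins(rna, stem_len=4, loop_min=3, loop_max=8):
    """Return 0-based inclusive (start, end) spans where `stem_len` base
    pairs close a loop of loop_min..loop_max unpaired nucleotides."""
    hits = []
    n = len(rna)
    for i in range(n):
        for loop in range(loop_min, loop_max + 1):
            j = i + 2 * stem_len + loop - 1  # last nucleotide of the 3' strand
            if j >= n:
                break
            # Pair position i+k on the 5' strand with j-k on the 3' strand.
            if all((rna[i + k], rna[j - k]) in PAIRS for k in range(stem_len)):
                hits.append((i, j))
    return hits
```

Even this single helix needs explicit index arithmetic; adding a second helix or a sequence motif would tangle the "what" and the "how" further, which is the problem Palingol addresses.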

EpoDB is a database of genes expressed in vertebrate red blood cells. It is also a prototype for creating cell- and tissue-specific databases from multiple external sources. The information in EpoDB, obtained from GenBank, SWISS-PROT, Transfac, TRRD and GERD, is curated to provide high-quality data for sequence analysis aimed at understanding gene regulation during erythropoiesis. New protocols have been developed for data integration and for updating entries. Using a BLAST-based algorithm, we have grouped together GenBank entries representing the same gene. This sequence similarity protocol was also used to identify new entries to be included in EpoDB. We have recently implemented our database in Sybase (relational tables) in addition to SICStus Prolog, giving us greater flexibility in asking complex queries that use information from multiple sources. New additions to the public web site (http://www.cbil.upenn.edu/epodb) for accessing EpoDB are the ability to retrieve groups of entries representing different variants of the same gene and to retrieve gene expression data. The BLAST query has been enhanced by incorporating BLASTView, an interactive graphical display of BLAST results. We have also enhanced the queries for retrieving sequences from specified genes by adding MEME, a motif discovery tool, to the integrated analysis tools, which include CLUSTALW and TESS.
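The BLAST-based grouping can be sketched as a transitive closure over pairwise hits using a union-find structure. It assumes the BLAST step has already produced the pairs that pass some similarity cut-off; the cut-off and the identifiers in the test data are hypothetical.

```python
def group_entries(entry_ids, hits):
    """Single-linkage grouping: entries joined by any qualifying pairwise hit
    end up in one group. `hits` is an iterable of (id_a, id_b) pairs that
    already passed the similarity cut-off; every id must be in entry_ids."""
    parent = {e: e for e in entry_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups
    groups = {}
    for e in entry_ids:
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())
```

Transitive grouping means two entries can share a group without a direct hit between them, which suits fragments that each overlap a longer entry but not each other.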

Motif-based searching in TOPS protein topology databases.
MOTIVATION: TOPS cartoons are a schematic abstraction of protein three-dimensional structures in two dimensions, and are used for understanding and manually comparing protein folds. Recently, an algorithm that produces the cartoons automatically from protein structures has been devised, and cartoons have been generated for all the structures in the structural databank. There is now a need to define target topological patterns and to search the database for matching domains. RESULTS: We have devised a formal language for describing TOPS diagrams and patterns, and have designed an efficient algorithm to match a pattern against a set of diagrams. A pattern-matching system has been implemented and tested on a database derived from all current entries in the Protein Data Bank (15,000 domains). Users can search with patterns selected from a library of motifs or, alternatively, define their own search patterns. AVAILABILITY: The system is accessible over the Web at http://tops.ebi.ac.uk/tops
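A simplified sketch of pattern matching on TOPS-like diagrams: secondary structure elements in sequence order ('E' strand, 'H' helix) plus strand-pairing edges, matched by backtracking so that the order of elements and the required pairings are preserved. The representation is an assumption and omits most of what real TOPS diagrams encode (element direction, chirality, the formal pattern language).

```python
def match_pattern(diagram, pattern):
    """Find an order-preserving embedding of `pattern` into `diagram`.
    Each is (sses, edges): sses lists 'E'/'H' in sequence order; edges is a
    set of (i, j, kind) with i < j and kind 'P' (parallel) or 'A'
    (antiparallel) strand pairing. Returns pattern->diagram indices or None."""
    d_sses, d_edges = diagram
    p_sses, p_edges = pattern

    def consistent(mapping):
        k = len(mapping) - 1  # only edges ending at the newest element
        return all((mapping[i], mapping[j], kind) in d_edges
                   for i, j, kind in p_edges if j == k)

    def extend(mapping, start):
        if len(mapping) == len(p_sses):
            return list(mapping)
        for d in range(start, len(d_sses)):
            if d_sses[d] == p_sses[len(mapping)]:
                mapping.append(d)
                if consistent(mapping):
                    found = extend(mapping, d + 1)
                    if found is not None:
                        return found
                mapping.pop()  # backtrack
        return None

    return extend([], 0)
```

Checking each pairing as soon as both of its elements are placed prunes the search early, which is what keeps backtracking practical across thousands of domains.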

An algorithm for predicting protein alpha/beta-sheet topologies from secondary structure and topological folding rules (constraints) has been developed and implemented in Prolog. This algorithm (CBS1) is based on constraint satisfaction and employs forward-pruned breadth-first search and rotational invariance. CBS1 showed a 37-fold increase in efficiency over an exhaustive generate-and-test algorithm that gave the same solution for a typical five-stranded sheet whose topology was predicted from secondary structure with four topological folding constraints. Prolog specifications of a range of putative protein folding rules were then used to (i) replicate published protein topology predictions and (ii) validate these rules against known protein structures of nucleotide-binding domains. This demonstrated that (i) manual techniques for topology prediction can lead to non-exhaustive search and (ii) most of these protein folding principles were violated by specific proteins. Various extensions to the algorithm are discussed.
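The two ideas named above can be sketched in Python (the original is Prolog): partial strand orders are extended breadth-first and discarded as soon as a constraint is violated, and an order and its reverse, which describe the same sheet rotated by 180°, are reported once. The `adjacent` and `apart` constraint kinds are hypothetical stand-ins for the paper's folding rules, and strand direction is not modelled.

```python
def sheet_orders(n_strands, adjacent=(), apart=()):
    """Spatial orders of strands 1..n in a sheet. `adjacent` pairs must be
    spatial neighbours and `apart` pairs must not be. Partial orders that
    already break a constraint are pruned at every breadth-first level."""
    def ok(order):
        pos = {s: i for i, s in enumerate(order)}
        for a, b in adjacent:
            if a in pos and b in pos and abs(pos[a] - pos[b]) != 1:
                return False
        for a, b in apart:
            if a in pos and b in pos and abs(pos[a] - pos[b]) == 1:
                return False
        return True

    level = [()]
    for _ in range(n_strands):  # one breadth-first level per placed strand
        level = [order + (s,) for order in level
                 for s in range(1, n_strands + 1)
                 if s not in order and ok(order + (s,))]
    # Rotational invariance: keep the lexicographically smaller of each
    # order/reverse pair.
    return [o for o in level if o <= o[::-1]]
```

Without constraints a five-stranded sheet yields 5!/2 = 60 distinct orders; each constraint that fails early removes a whole subtree of partial orders rather than individual complete ones.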

