Similar Documents
20 similar documents found
1.
Automated assembly of protein blocks for database searching.   Times cited: 52 (self-citations: 7; by others: 45)
A system is described for finding and assembling the most highly conserved regions of related proteins for database searching. First, an automated version of Smith's algorithm for finding motifs is used for sensitive detection of multiple local alignments. Next, the local alignments are converted to blocks and the best set of non-overlapping blocks is determined. When the automated system was applied successively to all 437 groups of related proteins in the PROSITE catalog, 1764 blocks resulted; these could be used for very sensitive searches of sequence databases. Each block was calibrated by searching the SWISS-PROT database to obtain a measure of the chance distribution of matches, and the calibrated blocks were concatenated into a database that could itself be searched. Examples are provided in which distant relationships are detected either using a set of blocks to search a sequence database or using sequences to search the database of blocks. The practical use of the blocks database is demonstrated by detecting previously unknown relationships between oxidoreductases and by evaluating a proposed relationship between HIV Vif protein and thiol proteases.
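A block is an ungapped local multiple alignment, so searching a sequence with it amounts to sliding a position-specific score table along the sequence. The Python sketch below illustrates that idea under stated assumptions: log-odds scores against a uniform background with a pseudocount of 1, and a hypothetical three-segment block. It does not reproduce the authors' sequence weighting or the calibration against SWISS-PROT.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_profile(block, pseudocount=1.0):
    """Convert an ungapped block (equal-length aligned segments) into per-column
    log-odds scores against a uniform background (an assumption of this sketch)."""
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Slide the profile along a sequence and return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(column.get(aa, 0.0) for column, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


block = ["GHCDG", "GHCEG", "GYCDG"]  # hypothetical block
profile = block_to_profile(block)
print(best_window(profile, "MKTAGHCDGLLV"))
```

For the example query the best window starts at index 4, where the query contains the consensus GHCDG. A calibrated system would additionally convert raw scores like these into a measure of how often such scores arise by chance.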

2.
A system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component that enables SQL access to integrated databases by database federation (MARGBench), an ontology-based semantic metadatabase (SEMEDA), and an ontology-based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture, and use of SEMEDA. Because SEMEDA is implemented as a three-tiered web application, database providers can enter all relevant semantic and technical information about their databases themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration and might also be useful for ongoing ontology developments, such as the "Gene Ontology" [2]. SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements for ontologies for molecular biological database integration are discussed, and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented, or updated to meet the requirements for semantic database integration.

3.
A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database, both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides, which can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species, as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.
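The intermediate identification step amounts to comparing measured masses with theoretical masses, with and without modification mass shifts. The sketch below assumes neutral monoisotopic masses (already converted from m/z), a 10 ppm tolerance, and a short illustrative modification list; none of these settings are taken from SwePep. Substance P, which is C-terminally amidated, serves as the test case.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Mass shifts (Da) for a few modifications common on endogenous peptides; this
# short list is illustrative, not the modification set used by SwePep.
MODIFICATIONS = {
    "unmodified": 0.0,
    "C-terminal amidation": -0.984016,
    "N-terminal acetylation": 42.010565,
}


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO_RESIDUE[aa] for aa in sequence) + WATER


def match_masses(observed, peptides, tol_ppm=10.0):
    """Return (observed mass, peptide name, modification) for every match within tol_ppm."""
    hits = []
    for mass in observed:
        for name, sequence in peptides.items():
            base = peptide_mass(sequence)
            for mod, delta in MODIFICATIONS.items():
                theoretical = base + delta
                if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                    hits.append((mass, name, mod))
    return hits


peptides = {"substance P": "RPKPQQFFGLM"}
print(match_masses([1346.7281], peptides))
```

Only the amidated form matches here; the unmodified mass (about 1347.712 Da) lies far outside the tolerance. Candidates found this way would still need confirmation from tandem MS data, as the abstract notes.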

4.
Jens Allmer. Amino Acids, 2010, 38(4): 1075-1087
Determining the differential expression of proteins under different conditions is of major importance in proteomics. Since mass spectrometry-based proteomics is often used to quantify proteins, several labelling strategies have been developed. While these are generally more precise than label-free quantitation approaches, they require specifically designed experiments as well as prior knowledge of the peptides that are expected to be measured and need to be modified. We recently designed the 2DB database, which aids storage, analysis, and publication of data from mass spectrometric experiments to identify proteins. This database can aid in identifying peptides that can be used for quantitation. Here an extension to the database application, named MSMAG, is presented, which allows for more detailed analysis of the distribution of peptides and their associated proteins over the fractions of an experiment. Furthermore, given several biological samples in the database, label-free quantitation can be performed. Thus, interesting proteins that may warrant further investigation can be identified en passant while performing high-throughput proteomics studies.
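Label-free quantitation from identification data is often done by spectral counting. The sketch below uses the normalized spectral abundance factor (NSAF), one widely used spectral-count normalization; it is an illustrative choice, not necessarily what MSMAG implements. Input records are hypothetical (protein, fraction) pairs, one per identified spectrum, so the same counts also give each protein's distribution over fractions.

```python
import math
from collections import defaultdict


def spectral_counts(psms):
    """Count identified spectra per protein, overall and per fraction.

    psms: iterable of (protein, fraction) pairs, one per identified spectrum."""
    totals = defaultdict(int)
    by_fraction = defaultdict(lambda: defaultdict(int))
    for protein, fraction in psms:
        totals[protein] += 1
        by_fraction[protein][fraction] += 1
    return dict(totals), {p: dict(f) for p, f in by_fraction.items()}


def nsaf(totals, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), with L the protein length."""
    saf = {p: count / lengths[p] for p, count in totals.items()}
    norm = sum(saf.values())
    return {p: value / norm for p, value in saf.items()}


def log2_ratio(sample_a, sample_b, protein, pseudo=1e-6):
    """log2 fold change of a protein's NSAF between two samples (pseudo avoids log of 0)."""
    return math.log2((sample_a.get(protein, 0.0) + pseudo) /
                     (sample_b.get(protein, 0.0) + pseudo))


# Hypothetical sample: protein P1 seen in two fractions, P2 in one.
psms = [("P1", "F1")] * 6 + [("P1", "F2")] * 4 + [("P2", "F2")] * 10
totals, per_fraction = spectral_counts(psms)
print(per_fraction["P1"], nsaf(totals, {"P1": 100, "P2": 200}))
```

Dividing by protein length compensates for longer proteins yielding more peptides; comparing NSAF values across samples stored in the same database then gives a rough relative abundance.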

5.
YPL.db: the Yeast Protein Localization database   Times cited: 2 (self-citations: 1; by others: 2)
The Yeast Protein Localization database (YPL.db) contains information about the localization patterns of yeast proteins resulting from microscopic analyses. The data and parameters of the experiments used to obtain the localization information, together with images from confocal or video microscopy, are stored in a relational database, building an archive of, and the documentation for, all experiments. The database can be queried by gene name, protein localization, growth conditions and a number of additional parameters. All experiment parameters are selectable from predefined lists to ensure database integrity and conformity across different investigators. The database provides a structured reference resource that allows better characterization of unknown or ambiguous localization patterns. Links to the MIPS, YPD and SGD databases provide fast access to further information not contained in the localization database itself. YPL.db is available at http://ypl.tugraz.at.
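Restricting parameters to predefined lists maps naturally onto lookup tables in a relational schema. The sketch below uses Python's built-in sqlite3 module; the table and column names and the example records are hypothetical, and image storage is reduced to a file path.

```python
import sqlite3

SCHEMA = """
CREATE TABLE localization (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE growth_condition (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    gene_name TEXT NOT NULL,
    localization_id INTEGER NOT NULL REFERENCES localization(id),
    condition_id INTEGER NOT NULL REFERENCES growth_condition(id),
    image_path TEXT
);
"""


def build_db(localizations, conditions):
    """Create an in-memory database whose lookup tables hold the allowed terms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO localization(name) VALUES (?)", [(x,) for x in localizations])
    conn.executemany("INSERT INTO growth_condition(name) VALUES (?)", [(x,) for x in conditions])
    return conn


def add_experiment(conn, gene, localization, condition, image_path=None):
    # Terms missing from the lookup tables make the subqueries return NULL,
    # which the NOT NULL constraints reject with sqlite3.IntegrityError.
    conn.execute(
        """INSERT INTO experiment(gene_name, localization_id, condition_id, image_path)
           VALUES (?, (SELECT id FROM localization WHERE name = ?),
                      (SELECT id FROM growth_condition WHERE name = ?), ?)""",
        (gene, localization, condition, image_path))


def genes_at(conn, localization):
    """Query experiments by localization, returning (gene, growth condition) rows."""
    rows = conn.execute(
        """SELECT e.gene_name, g.name
           FROM experiment e
           JOIN localization l ON l.id = e.localization_id
           JOIN growth_condition g ON g.id = e.condition_id
           WHERE l.name = ?
           ORDER BY e.gene_name""", (localization,))
    return rows.fetchall()
```

Entering a localization that is not in the predefined list fails at insert time, which is one way to obtain the consistency across investigators that the abstract describes.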

6.
7.
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database to be compared for automated annotation and interpretation of newly determined genomes.
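A module definition can be evaluated against the set of K numbers found in a genome. The sketch below handles a simplified subset of the syntax, assuming that a space separates pathway steps, a plus sign joins components of a complex, and a comma separates alternatives, with parentheses for grouping; optional components marked with a minus sign are not handled. Completeness is taken here as the fraction of top-level steps that are satisfied, and the example definition is hypothetical and shortened.

```python
import re

TOKEN = re.compile(r"K\d{5}|[(),+ ]")


def evaluate(definition, present):
    """True if the set of K numbers `present` satisfies a (simplified) definition."""
    tokens = TOKEN.findall(" ".join(definition.split()))
    pos = 0

    def parse_or():  # alternatives separated by ","
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():  # steps (" ") and complex components ("+") are all required
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in (" ", "+"):
            pos += 1
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():  # a K number or a parenthesized group
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return value
        if not token.startswith("K"):
            raise ValueError("unexpected token: " + token)
        return token in present

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


def top_level_steps(definition):
    """Split a definition into its top-level steps (spaces outside parentheses)."""
    steps, depth, current = [], 0, []
    for ch in " ".join(definition.split()):
        if ch == " " and depth == 0:
            steps.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    steps.append("".join(current))
    return steps


def completeness(definition, present):
    steps = top_level_steps(definition)
    return sum(evaluate(step, present) for step in steps) / len(steps)


definition = "K00844 (K01810,K06859) (K00850,K16370) K01623"  # hypothetical
print(completeness(definition, {"K00844", "K06859", "K16370"}))
```

With three of the four steps covered, the module is incomplete but mostly present, which is the kind of graded result that is useful when interpreting draft genomes or metagenomes.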

8.
9.
Rapid identification of proteins based on their amino acid composition   Times cited: 1 (self-citations: 0; by others: 1)
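The conventional way to identify a protein is to determine its amino acid sequence, which requires a protein sequencer, demands a highly purified sample, and is time-consuming and expensive. By contrast, the amino acid composition and molecular weight of a protein are easy to determine experimentally. This paper describes a method for rapidly identifying proteins from their composition and molecular weight. The basic idea is to compute the amino acid composition and molecular weight of every sequence in a protein sequence database, producing a database of protein length, composition, and molecular weight. Comparing the composition and related data of a target protein against this database retrieves the proteins whose composition and molecular weight are closest to it, which yields a preliminary identification of the target. In some cases, the method can even determine fairly accurately that the target protein is related to one or more specific proteins in the database. Based on this principle, we designed a program that searches the protein composition database by amino acid composition. Composition analyses of several proteins, including proinsulin, cellular tumor antigen p53, and ubiquitin, confirmed that amino acid composition can be used effectively for protein identification.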
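The composition-based search described above can be sketched in a few lines. Assumptions: compositions are mole fractions of the 20 standard amino acids, molecular weights use average residue masses, candidates are first filtered to within 10% of the target molecular weight, and the rest are ranked by Euclidean distance between composition vectors. The paper does not specify its similarity measure here, the database entries are toy sequences, and a sequence stands in for the experimentally measured composition.

```python
import math

# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.01524


def composition(sequence):
    """Mole fraction of each of the 20 standard amino acids."""
    return {aa: sequence.count(aa) / len(sequence) for aa in AVG_RESIDUE}


def molecular_weight(sequence):
    return sum(AVG_RESIDUE[aa] for aa in sequence) + WATER_AVG


def build_index(database):
    """Precompute (length, composition, molecular weight) for each entry."""
    return {name: (len(seq), composition(seq), molecular_weight(seq))
            for name, seq in database.items()}


def search(index, target_composition, target_mw, mw_tolerance=0.1, top=5):
    """Keep entries within a relative MW tolerance, then rank by composition distance."""
    hits = []
    for name, (_, comp, mw) in index.items():
        if abs(mw - target_mw) > mw_tolerance * target_mw:
            continue
        distance = math.sqrt(sum((comp[aa] - target_composition.get(aa, 0.0)) ** 2
                                 for aa in comp))
        hits.append((distance, name))
    return sorted(hits)[:top]


database = {  # hypothetical toy entries
    "protein_A": "MKKLLPTAAAGLLLLAAQPAMA",
    "protein_B": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
    "protein_C": "GGGGSSSSGGGGSSSS",
}
index = build_index(database)
measured = "MKKLLPTAAAGLLLLAAQPAMS"  # stands in for measured composition data
print(search(index, composition(measured), molecular_weight(measured)))
```

Filtering by molecular weight before comparing compositions keeps the ranking from being dominated by short or long proteins that happen to have similar residue proportions.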

10.
We have improved an existing clone database management system written in FORTRAN 77 and adapted it to our software environment. One improvement is that the database can be queried for any type of information, not just keywords. In addition, recombinant DNA constructions can be represented in a simplified ‘shorthand’, from which a program assembles the full nucleotide sequence out of the contributing fragments, which may be obtained from nucleotide sequence databases. Another improvement is the replacement of the database manager by programs that run in batch to maintain the databank and verify its consistency automatically. Finally, graphic extensions written in the Graphical Kernel System draw linear and circular restriction maps of recombinants. Besides restriction sites, recombinant features can be displayed from the feature lines of recombinant database entries or from the feature tables of nucleotide databases. The clone database management system is fully integrated into the sequence analysis software package from the Pasteur Institute, Paris, and is accessible through the same menu. As a result, recombinant DNA sequences can be analysed directly by the sequence analysis programs. Received on March 17, 1986; accepted on June 16, 1986.
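Drawing restriction maps of linear and circular recombinants requires locating recognition sites, and a circular molecule must also be searched across its origin. The sketch below scans only the given strand, which is sufficient for palindromic sites such as EcoRI (GAATTC), and approximates each cut position by the site's start, a hypothetical simplification that ignores enzyme-specific cut offsets.

```python
def find_sites(sequence, site, circular=False):
    """0-based start positions of a recognition site on the given strand.

    For a circular molecule the first len(site) - 1 bases are appended so that
    sites spanning the origin are found; positions refer to the input sequence."""
    sequence, site = sequence.upper(), site.upper()
    search = sequence + sequence[:len(site) - 1] if circular else sequence
    hits = []
    start = search.find(site)
    while start != -1:
        hits.append(start)
        start = search.find(site, start + 1)  # +1 also finds overlapping sites
    return hits


def fragment_lengths(length, cuts, circular=False):
    """Fragment sizes after cutting at the given 0-based positions."""
    cuts = sorted(set(cuts))
    if not cuts:
        return [length]
    if circular:
        # Each fragment runs from one cut to the next, wrapping past the origin;
        # a single cut simply linearizes the molecule.
        return [(cuts[(i + 1) % len(cuts)] - cut) % length or length
                for i, cut in enumerate(cuts)]
    bounds = [0] + cuts + [length]
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]


plasmid = "ATTCAAAAAAGA"  # hypothetical: an EcoRI site spans the origin
print(find_sites(plasmid, "GAATTC", circular=True))
```

The same site is invisible when the sequence is treated as linear, which is why linear and circular maps need separate handling.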

11.
Arnason V. Bioethics, 2004, 18(1): 27-49
A major moral problem in relation to the deCODE genetics database project in Iceland is that the heavy emphasis placed on technical security of healthcare information has precluded discussion about the issue of consent for participation in the database. On the other hand, critics who have emphasised the issue of consent have most often demanded that informed consent for participation in research be obtained. While I think that individual consent is of major significance, I argue that this demand for informed consent is neither suitable nor desirable in this case. I distinguish between three aspects of the database and show that different types of consent are appropriate for each. In particular, I describe the idea of a written authorisation based on general information about the database as an alternative to informed consent and presumed consent in database research.

12.
13.
Shi L, Zhang Q, Rui W, Lu M, Jing X, Shang T, Tang J. Regulatory Peptides, 2004, 120(1-3): 1-3
Bioactive peptide database (BioPD) is a web-based knowledge base that contains more than 1100 protein sequences from human, mouse and rat that are known or putative bioactive peptides. In addition to peptide sequences and their annotation, the database also contains annotated gene sequences, protein interaction data, and disease data related to the peptides. Each entry cites as many references as possible to support the information presented. BioPD consists of six parts: PROTEIN, GENE, DISEASE, LINKS, INTERACTION, and REFERENCE. The database is searchable by keyword, gene and protein name, receptor name, etc. Each entry provides links to PDB, InterPro, Pfam, OMIM, etc. BioPD thus serves as an information center for bioactive peptides and as a gateway for their exploration. The database can be accessed at http://biopd.bjmu.edu.cn.

14.
This paper presents the MolMod database, which is openly accessible at http://molmod.boltzmann-zuse.de and currently contains intermolecular force fields for over 150 pure fluids. It was developed and is maintained by the Boltzmann-Zuse Society for Computational Molecular Engineering (BZS). The set of molecular models in the MolMod database provides a coherent framework for molecular simulations of fluids. The molecular models in the MolMod database consist of Lennard-Jones interaction sites, point charges, and point dipoles and quadrupoles, which can be equivalently represented by multiple point charges. The force fields can be exported as input files for the simulation programmes ms2 and ls1 mardyn, GROMACS, and LAMMPS. To characterise the semantics associated with the numerical database content, a force field nomenclature is introduced that can also be used in other contexts in materials modelling at the atomistic and mesoscopic levels. The models of the pure substances included in the database were generally optimised so as to yield good representations of experimental vapour–liquid equilibrium data, with a focus on the vapour pressure and the saturated liquid density. In many cases, the models also yield good predictions of caloric, transport, and interfacial properties of the pure fluids. For all models, references to the original works in which they were developed are provided. The models can be used straightforwardly to predict properties of fluid mixtures using established combination rules. Input errors are a major source of errors in simulations, and the MolMod database helps reduce such errors.
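For the Lennard-Jones part of such models, interactions between unlike sites in mixtures are commonly obtained with the Lorentz-Berthelot combination rules, sometimes with a binary interaction parameter xi scaling the unlike energy. The abstract does not say which rules apply, so this is an assumed example; the sketch also omits point charges, dipoles, and quadrupoles, and parameter values are hypothetical (sigma in Å, epsilon/k_B in K).

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j, xi=1.0):
    """Unlike-pair LJ parameters: arithmetic-mean sigma (Lorentz) and
    geometric-mean epsilon (Berthelot), optionally scaled by a binary parameter xi."""
    return 0.5 * (sigma_i + sigma_j), xi * math.sqrt(eps_i * eps_j)


def lj_energy(r, sigma, eps):
    """12-6 Lennard-Jones potential u(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def site_site_energy(mol_a, mol_b, xi=1.0):
    """Total LJ energy between two rigid molecules given as lists of
    ((x, y, z), sigma, eps) sites; electrostatic sites are omitted."""
    total = 0.0
    for pos_a, sigma_a, eps_a in mol_a:
        for pos_b, sigma_b, eps_b in mol_b:
            sigma, eps = lorentz_berthelot(sigma_a, eps_a, sigma_b, eps_b, xi)
            total += lj_energy(math.dist(pos_a, pos_b), sigma, eps)
    return total


# Two hypothetical single-site molecules placed at the unlike-pair energy minimum.
sigma_ab, eps_ab = lorentz_berthelot(3.0, 100.0, 4.0, 400.0)
r_min = 2 ** (1 / 6) * sigma_ab
print(site_site_energy([((0, 0, 0), 3.0, 100.0)], [((r_min, 0, 0), 4.0, 400.0)]))
```

At r = 2^(1/6) sigma the pair energy equals -epsilon, so the printed value is about -200 K in these units. Fitting xi to mixture data is one common way to correct the purely predictive combination rule.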

15.
The KEGG databases at GenomeNet   Times cited: 30 (self-citations: 0; by others: 30)
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and complexes, the GENES database for the information about genes and proteins generated by genome sequencing projects, and the LIGAND database for the information about chemical compounds and chemical reactions that are relevant to cellular processes. In addition to these three main databases, limited amounts of experimental data for microarray gene expression profiles and yeast two-hybrid systems are stored in the EXPRESSION and BRITE databases, respectively. Furthermore, a new database, named SSDB, is available for exploring the universe of all protein coding genes in the complete genomes and for identifying functional links and ortholog groups. The data objects in the KEGG databases are all represented as graphs and various computational methods are developed to detect graph features that can be related to biological functions. For example, the correlated clusters are graph similarities which can be used to predict a set of genes coding for a pathway or a complex, as summarized in the ortholog group tables, and the cliques in the SSDB graph are used to annotate genes. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

16.
17.
PaVESy: Pathway Visualization and Editing System   Times cited: 1 (self-citations: 0; by others: 1)
A data management system for editing and visualization of biological pathways is presented. The main component of PaVESy (Pathway Visualization and Editing System) is a relational SQL database system. The database design allows storage of biological objects, such as metabolites, proteins, genes and their relations, which are required to assemble metabolic and regulatory biological interactions. The database model accommodates highly flexible annotation of biological objects by user-defined attributes. In addition, specific roles of objects are derived from these attributes in the context of user-defined interactions, e.g. in the course of pathway generation or during editing of the database content. Furthermore, the user may organize and arrange the database content within a folder structure and is free to group and annotate database objects of interest within customizable subsets. Thus, we allow an individualized view on the database content and facilitate user customization. A Java-based class library was developed, which serves as the database programming interface to PaVESy. This API provides classes that implement the concepts of object persistence in SQL databases, such as entries, interactions, annotations, folders and subsets. We created editing and visualization tools for navigating and visualizing the database content. User-approved pathway assemblies are stored and may be retrieved for continued modification, annotation and export. Data export is interfaced with a range of network visualization programs, such as Pajek or other software that imports the SBML or GML data formats. AVAILABILITY: http://pavsey.mpimp-golm.mpg.de
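Export to network visualization programs can be as simple as writing the pathway graph in GML. The sketch below writes a minimal directed GML graph from hypothetical node labels and (source, target, relation) edges; it is not PaVESy's exporter and uses only the basic id, label, source, and target keys.

```python
def to_gml(nodes, edges):
    """Serialize a directed pathway graph as GML.

    nodes: list of node labels; edges: list of (source, target, relation) labels."""
    ids = {label: i for i, label in enumerate(nodes)}

    def quote(text):
        # GML strings are double-quoted; embedded quotes become an &quot; entity.
        return '"' + text.replace('"', "&quot;") + '"'

    lines = ["graph [", "  directed 1"]
    for label, node_id in ids.items():
        lines += ["  node [", f"    id {node_id}", f"    label {quote(label)}", "  ]"]
    for source, target, relation in edges:
        lines += ["  edge [", f"    source {ids[source]}", f"    target {ids[target]}",
                  f"    label {quote(relation)}", "  ]"]
    lines.append("]")
    return "\n".join(lines)


print(to_gml(["glucose", "glucose 6-phosphate"],
             [("glucose", "glucose 6-phosphate", "hexokinase")]))
```

Keeping node identity in integer ids and biological names in labels lets the same object appear in several pathway assemblies without ambiguity.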

18.
A computer program is presented that models the binding of selected chemical groups to a protein surface. The groups are successively incorporated at energetically favourable positions to build up a pharmacophore pattern that may be used as the basis for a database search for possible ligands. The ability to predict known binding points in a trypsin–inhibitor complex is demonstrated, and the results from a run on dihydrofolate reductase are shown to be usable as a pharmacophore pattern for a database search.
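Once a pharmacophore pattern has been built, a database search must decide whether a candidate ligand can present matching groups at matching relative positions. The sketch below checks typed feature points by brute force, comparing all pairwise distances within a tolerance so that the result does not depend on the ligand's orientation. Feature type names, coordinates, and the 0.5 Å tolerance are hypothetical, and ligand conformational flexibility is ignored.

```python
import itertools
import math


def matches_pattern(pattern, features, tol=0.5):
    """True if some ordered choice of ligand features has the same types as the
    pattern and reproduces every pairwise pattern distance within tol.

    pattern, features: lists of (feature_type, (x, y, z))."""
    n = len(pattern)
    pattern_d = {(i, j): math.dist(pattern[i][1], pattern[j][1])
                 for i in range(n) for j in range(i + 1, n)}
    for combo in itertools.permutations(features, n):
        if any(pattern[k][0] != combo[k][0] for k in range(n)):
            continue  # feature types must line up with the pattern
        if all(abs(math.dist(combo[i][1], combo[j][1]) - d) <= tol
               for (i, j), d in pattern_d.items()):
            return True
    return False


pattern = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobic", (0, 4, 0))]
ligand = [("hydrophobic", (6, 10, 10)), ("donor", (10, 10, 10)),
          ("acceptor", (10, 13, 10)), ("acceptor", (20, 20, 20))]
print(matches_pattern(pattern, ligand))
```

Brute-force permutation is fine for a handful of features; a real database search over many ligands would normally prefilter candidates, for example by counting feature types first.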

19.
The aim of this article is to describe a newly created open access database of archeological human remains collections from Flanders, Belgium. The MEMOR database ( www.memor.be ) was created to provide an overview of the current practices of loans, reburial, and the research potential of human skeletons from archeological sites currently stored in Flanders. In addition, the project aimed to provide a legal and ethical framework for the handling of human remains and was created around stakeholder involvement from anthropologists, geneticists, contract archeologists, the local, regional and national government agencies, local and national government, universities, and representatives of the major religions. The project has resulted in the creation of a rich database with many collections available for study. The database was created using the open-source Arches data management platform that is freely available for organizations worldwide to configure in accordance with their individual needs and without restrictions on its use. Each collection is linked to information about the excavation and the site the remains originate from, its size and time period. In addition, a research potential tab reveals whether any analyses were performed, and whether excavation notes are available with the assemblage. The database currently contains 742 collections, ranging in size from 1 to over 1000 individuals. New collections will continue to be added when new assemblages are excavated and studied. The database can also be expanded to include human remains collections from other regions and other material categories, such as archaeozoological collections.

20.
The Z curve database: a graphic representation of genome sequences   Times cited: 7 (self-citations: 0; by others: 7)
MOTIVATION: Genome projects for many prokaryotic and eukaryotic species have been completed, and many more genome projects are currently underway. The availability of a large number of genomic sequences creates a need for graphical tools that let researchers study genomes in a perceivable form. The Z curve is one such tool for visualizing genomes. The Z curve is a unique three-dimensional curve representation of a given DNA sequence, in the sense that each can be uniquely reconstructed from the other. A Z curve database for more than 1000 genomes has been established here. RESULTS: The database contains the Z curves for archaea, bacteria, eukaryota, organelles, phages, plasmids, viroids and viruses whose genomic sequences are currently available. All the three-dimensional Z curves and their three component curves are stored in the database. Applications of the Z curve database to comparative genomics, gene prediction, computation of G+C content with a windowless technique, prediction of replication origins and terminations of bacterial and archaeal genomes, and the study of local deviations from the Chargaff Parity Rule 2, among others, are presented in detail. The Z curve database reported here is a treasure trove in which biologists can find useful biological knowledge.
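The Z curve is defined from cumulative base counts: for the first n bases, x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), and z_n = (A_n + T_n) - (G_n + C_n). The sketch below computes the curve, reconstructs the sequence from its successive increments (which is why the representation is unique), and shows one way to get the G+C count without a sliding window, as (n - z_n)/2. Function names are illustrative.

```python
# Increment of (x, y, z) contributed by each base, from the definitions above.
STEP = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
BASE_FROM_STEP = {step: base for base, step in STEP.items()}


def z_curve(sequence):
    """Points (x_n, y_n, z_n) for n = 1..N; raises KeyError for non-ACGT symbols."""
    x = y = z = 0
    points = []
    for base in sequence.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def reconstruct(points):
    """Recover the sequence by reading the increment between successive points."""
    previous = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(b - a for a, b in zip(previous, point))
        bases.append(BASE_FROM_STEP[step])
        previous = point
    return "".join(bases)


def base_counts(n, point):
    """Invert the definitions: counts of A, C, G, T among the first n bases."""
    x, y, z = point
    return {"A": (n + x + y + z) // 4, "C": (n - x + y - z) // 4,
            "G": (n + x - y - z) // 4, "T": (n - x - y + z) // 4}


def gc_count(n, point):
    """G + C among the first n bases, since z_n = (A + T) - (G + C) and n = A + C + G + T."""
    return (n - point[2]) // 2


sequence = "ACGTTGCAAAGC"
points = z_curve(sequence)
print(points[-1], base_counts(len(sequence), points[-1]), gc_count(len(sequence), points[-1]))
```

Because every prefix count is recoverable from a single point, the G+C content of any region [i, j] follows from z_j - z_i, with no window size to choose.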


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417