Similar articles
 20 similar articles found; search time 31 ms
1.
Improvements to the GDB Human Genome Data Base.   Total citations: 5, self-citations: 2, citations by others: 3
Version 6.0 of the Human Genome Data Base introduces a number of significant improvements over previous releases of GDB. The most important of these are revised data representations for genes and genomic maps and a new curatorial model for the database. GDB 6.0 is the first major genomic database to provide read/write access directly to the scientific community, including capabilities for third-party annotation. The revised database can represent all major categories of genetic and physical maps, along with the underlying order and distance information used to construct them. The improved representation permits more sophisticated map queries to be posed and supports the graphical display of maps. In addition the new GDB has a richer model for gene information, better suited for supporting cross-references to databases describing gene function, structure, products, expression and associated phenotypes.

2.
Boehm AM, Sickmann A. Proteomics, 2006, 6(15): 4223-4226
In mass spectrometry-based proteomics, protein identification results usually consist of peptide sequences and database-dependent accession identifiers of the matching proteins. Often certain annotations are only available in particular databases that in turn must be queried by a certain identifier. In order to simplify and unify the tracing of identified proteins back to their original annotation information, a system capable of set-oriented mapping of the different accession identifiers of proteins derived from multiple sequence database sources has been developed. This allows unification of the access to protein information and tracing to other online resources providing additional information as well as resolving cross-references of protein identifications. The interface of seqDB is available via http://www.protein-ms.de following the link to seqDB.

3.
The eukaryotic promoter database (EPD)   Total citations: 8, self-citations: 0, citations by others: 8

4.
Large volumes of genomic data have been generated for several plant species over the past decade, including structural sequence data and functional annotation at the genome level. Various technologies such as expressed sequence tags (ESTs), massively parallel signature sequencing (MPSS) and microarrays have been used to study gene expression and to provide functional data for many genes simultaneously. This review focuses on recent advances in the application of microarrays in plant genomic research and in gene expression databases available for plants. Large sets of Arabidopsis microarray data are publicly available. Recently developed array platforms are currently being used to generate genome-wide expression profiles for several crop species. Coupled to these platforms are public databases that provide access to these large-scale expression data, which can be used to aid the discovery of gene function.

5.
The protein kinase superfamily is an important group of enzymes controlling cellular signaling cascades. The increasing amount of available experimental data provides a foundation for deeper understanding of details of signaling systems and the underlying cellular processes. Here, we describe the Protein Kinase Resource, an integrated online service that provides access to information relevant to cell signaling and enables kinase researchers to visualize and analyze the data directly in an online environment. The data set is synchronized with the UniProt and Protein Data Bank (PDB) databases and is regularly updated and verified. Additional annotation includes interactive display of domain composition, cross-references between orthologs and functional mapping to OMIM records. The Protein Kinase Resource provides an integrated view of the protein kinase superfamily by linking data with their visual representation. Thus, human kinases can be mapped onto the human kinome tree via an interactive display. Sequence and structure data can be easily displayed using applications developed for the PKR and integrated with the website and the underlying database. Advanced search mechanisms, such as multiparameter lookup, sequence pattern, and BLAST search, enable fast access to the desired information, while statistics tools provide the ability to analyze the relationships among the kinases under study. The integration of data presentation and visualization implemented in the Protein Kinase Resource can be adapted by other online providers of scientific data and should become an effective way to access available experimental information.

6.
Associating phenotypic traits and quantitative trait loci (QTL) to causative regions of the underlying genome is a key goal in agricultural research. InterStoreDB is a suite of integrated databases designed to assist in this process. The individual databases are species independent and generic in design, providing access to curated datasets relating to plant populations, phenotypic traits, genetic maps, marker loci and QTL, with links to functional gene annotation and genomic sequence data. Each component database provides access to associated metadata, including data provenance and parameters used in analyses, thus providing users with information to evaluate the relative worth of any associations identified. The databases include CropStoreDB, for management of population, genetic map, QTL and trait measurement data, SeqStoreDB for sequence-related data and AlignStoreDB, which stores sequence alignment information, and allows navigation between genetic and genomic datasets. Genetic maps are visualized and compared using the CMAP tool, and functional annotation from sequenced genomes is provided via an EnsEMBL-based genome browser. This framework facilitates navigation of the multiple biological domains involved in genetics and genomics research in a transparent manner within a single portal. We demonstrate the value of InterStoreDB as a tool for Brassica research. InterStoreDB is available from: http://www.interstoredb.org

7.
The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.

8.
Psoriasis is a systemic hyperproliferative inflammatory skin disorder that, although rarely fatal, significantly reduces quality of life. Understanding the full genetic component of the disease association may provide insight into biological pathways as well as targets and biomarkers for diagnosis, prognosis and therapy. Studies related to psoriasis-associated genes and genetic markers are scattered and not easily amenable to data mining. To alleviate these difficulties, we have developed dbGAPs, an integrated knowledgebase representing a gateway to psoriasis-associated genomic data. The database contains annotation for 202 manually curated genes associated with psoriasis and its subtypes, with cross-references. Functional enrichment of these genes, in the context of Gene Ontology and pathways, provides insight into their important role in psoriasis etiology and pathogenesis. The dbGAPs interface is enriched with an interactive search engine for data retrieval along with unique customized tools for Single Nucleotide Polymorphism (SNP)/indel detection and SNP/indel annotations. dbGAPs is accessible at http://www.bmicnip.in/dbgaps/.

9.
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200,000 non-redundant PIR and SWISS-PROT proteins organized with more than 28,000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.

10.
The increasing popularity of DNA chip technology for the study of gene expression is producing, for each experiment, a sizable quantity of numerical data to analyse and an accompanying large number of gene identifiers that should be associated with the relevant biological annotation. We describe here a website at IFOM (FIRC Institute of Molecular Oncology) where we release regularly updated annotation tables for the most used Affymetrix oligonucleotide DNA chips and for the whole Research Genetics 46K clone collection for cDNA arrays. These tables are synchronised with every new release of the mouse and human UniGene databases (NCBI; National Center for Biotechnology Information), allowing fast and easy preliminary annotation of DNA array experiments. We also report some comparative evidence about the importance of biological database synchronisation and cross-references in the process of generating annotation tables for DNA chips.

11.
Rother K, Michalsky E, Leser U. Proteins, 2005, 60(4): 571-576
We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

12.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

13.
The Prostate Gene Database (PGDB: http://www.ucsf.edu/pgdb) is a curated and integrated database of genes or genomic loci related to the human prostate and prostatic diseases. Currently, PGDB covers genes involved in a number of molecular and genetic events of the prostate including gene amplification, mutation, gross deletion, methylation, polymorphism, linkage and over-expression, as published in the literature. Genes that are specifically expressed in prostate, as evidenced by analysis of data from expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE), are also included. There are a total of 165 unique entries in the database. Users can either browse or query the PGDB through a web interface. For each gene, in addition to basic gene information and rich cross-references to other databases, inclusive and relevant literature references are provided to support the inclusion of the gene in the database. Detailed expression data calculated from the UniGene and SAGEmap databases are also presented.

14.
SUMMARY: The Gandr (gene annotation data representation) knowledgebase is an ontological framework for laboratory-specific gene annotation. Gandr uses Protege 2000 for editing, querying and visualizing microarray data and annotations. Genes can be annotated with provided, newly created or imported ontological concepts. Annotated genes can inherit assigned concept properties and can be related to each other. The resulting knowledgebase can be visualized as an interactive network of nodes and edges representing genes and their functional relationships. This allows for immediate and associative gene context exploration. Ontological query techniques allow for powerful data access.

15.
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. 
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

16.
MOTIVATION: With the increase in submission of sequences to public databases, the curators of these are not able to cope with the amount of information. The motivation of this work is to generate a system for automated annotation of data we are particularly interested in, namely proteins related to the Mycoplasmataceae family. Following previous works on automatic annotation using symbolic machine learning techniques, the present work proposes a method of automatic annotation of keywords (a part of the SWISS-PROT annotation procedure), and the validation, by an expert, of the annotation rules generated. The aim of this procedure is twofold: to complete the keyword annotation of those proteins, which is far from adequate, and to produce a prototype of the validation environment, which is aimed at an expert who does not have a deep knowledge of the structure of the current databases containing the necessary information s/he needs. RESULTS: As for the first objective, a rate of correct keyword annotation of 60% is reported in the literature. Our preliminary results show that with a slightly different method, applied to data related to Mycoplasmataceae only, we are able to increase that rate of correct annotation.

17.
Genomics, 2019, 111(6): 1923-1928
An online portal, accessible at URL: http://mail.nbfgr.res.in/FisOmics/, was developed that features different genomic databases and tools. The portal, named FisOmics, acts as a platform for sharing fish genomic sequences and related information in addition to facilitating access to high-performance computational resources for genome and proteome data analyses. It provides the ability to query, analyse and visualize genomic sequences and related information. The featured databases in FisOmics are already in the World Wide Web domain. The aim of developing the portal was to provide a nodal point from which to access the featured databases and work with them conveniently. Presently, FisOmics includes databases on barcode sequences, microsatellite markers, mitogenome sequences, hypoxia-responsive genes and karyology of fishes. Besides, it has a link to other molecular resources and reports on the on-going activities and research achievements.

18.
19.
The domesticated silkworm, Bombyx mori, serves as an ideal representative of lepidopteran species for a variety of scientific studies. As a result, databases have been created to organize information pertaining to the silkworm genome that is subject to constant updating. Of these, four main databases are important for storing nucleotide information in the form of genomic data, ESTs and microsatellites. These databases also store data related to other lepidoptera and important insects, which help in insect biological research. Though a considerable amount of nucleotide data is currently available, there is a paucity of data related to silkworm and other lepidopteran proteins. Hence, the focus of this article is to present the current status of nucleotide databases of silkworm, avenues for improvement and possibilities for databases that could be created in the future.

20.
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice-forms including alternative splicing, and also, prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes, and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including Arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.
