Similar Articles
 20 similar articles found (search time: 46 ms)
1.
With the rapid development of proteomics and the accumulation of knowledge about the functional mechanisms of biological macromolecules, massive amounts of protein-protein interaction data have emerged. In response, researchers have developed more than 300 protein interaction databases for storing, displaying, and reusing these data. Protein interaction databases are a valuable resource for systems biology, molecular biology, and clinical drug research. This review divides the databases into three categories: (1) comprehensive protein interaction databases; (2) species-specific protein interaction databases; and (3) biological pathway databases. It focuses on the most widely used protein interaction databases, including BioGRID, STRING, IntAct, MINT, DIP, IMEx, HPRD, Reactome, and KEGG.

2.
Orchard S. Proteomics 2012;12(10):1656-1662
Molecular interaction databases are playing an ever more important role in our understanding of the biology of the cell. An increasing number of resources exist to provide these data, and many of them have adopted the controlled vocabularies and agreed-upon standardised data formats produced by the Molecular Interaction workgroup of the Human Proteome Organization Proteomics Standards Initiative (HUPO PSI-MI). Use of these standards allows each resource to establish a PSI Common QUery InterfaCe (PSICQUIC) service, making data from multiple resources available to the user in response to a single query. This cooperation between databases has been taken a stage further with the establishment of the International Molecular Exchange (IMEx) consortium, which aims to maximise the curation power of numerous data resources and to provide the user with a non-redundant, consistently annotated set of interaction data.
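The MITAB flavour returned by PSICQUIC services is a plain tab-separated layout, so results from any compliant service can be handled with a few lines of code. Below is a minimal sketch of parsing one MITAB 2.5 record; the sample line and the `parse_mitab25` helper are fabricated for illustration, not drawn from any real database entry.

```python
# MITAB 2.5 is a 15-column tab-separated format; these short column names
# are informal labels for illustration, not an official naming scheme.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication", "taxon_a",
    "taxon_b", "interaction_type", "source_db", "interaction_id",
    "confidence",
]

# Fabricated sample record in MITAB 2.5 layout (not a real database entry).
sample = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2010)",
    "pubmed:12345678", "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000001",
    "intact-miscore:0.56",
])

def parse_mitab25(line):
    """Map one MITAB 2.5 line to a column-name -> value dict."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(MITAB25_COLUMNS, fields))

record = parse_mitab25(sample)
assert record["id_a"] == "uniprotkb:P12345"
assert record["source_db"].endswith("(IntAct)")
```

Because every PSICQUIC service emits the same column order, a parser like this works unchanged across resources; merging responses from several services then reduces to concatenating parsed records and deduplicating on the interactor identifier pair.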

3.
The Annual Spring workshop of the HUPO-PSI was held this year at the EMBL International Centre for Advanced Training (EICAT) in Heidelberg, Germany. Delegates briefly reviewed the successes of the group to date. These include the widespread implementation of the molecular interaction data exchange formats PSI-MI XML2.5 and MITAB, and of mzML, the standard format for mass spectrometer output data. These successes have enhanced access to published data, for example through the development of the PSICQUIC common query interface for interaction data and of databases such as PRIDE, which act as public repositories for proteomics data. They have also increased biosharing through the establishment of consortia such as IMEx and ProteomeXchange, which will both share the burden of curating the increasing amounts of data being published and work together to make these data more accessible to the bench scientist. Work then started over the three days of the workshop, with a focus on advancing the draft format for handling quantitative mass spectrometry data (mzQuantML) and on further developing TraML, a standardized format for the exchange and transmission of transition lists for SRM experiments.

4.
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.

5.
MOTIVATION: Microsatellites, also known as simple sequence repeats, are tandem repeats of nucleotide motifs of size 1-6 bp found in every genome known so far. Their importance in genomes is well known. Microsatellites are associated with various disease genes, have been used as molecular markers in linkage analysis and DNA fingerprinting studies, and also seem to play an important role in genome evolution. Therefore, it is important to study the distribution, enrichment and polymorphism of microsatellites in genomes of interest. The prerequisite for this is the availability of a computational tool for the extraction of microsatellites (perfect as well as imperfect) and their related information from whole genome sequences. Examination of available tools revealed certain lacunae in them and prompted us to develop a new tool. RESULTS: In order to efficiently screen genome sequences for microsatellites (perfect as well as imperfect), we developed a new tool called IMEx (Imperfect Microsatellite Extractor). IMEx uses a simple string-matching algorithm with a sliding-window approach to screen DNA sequences for microsatellites and reports the motif, copy number, genomic location, nearby genes, mutational events and many other features useful for in-depth studies. IMEx is more sensitive, efficient and useful than the available widely used tools. IMEx is available as a stand-alone program as well as a web server. AVAILABILITY: A World Wide Web server and the stand-alone program are available for free access at http://203.197.254.154/IMEX/ or http://www.cdfd.org.in/imex.
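The sliding-window string-matching idea described in the abstract can be sketched in a few lines. This is an illustrative toy detector for perfect repeats only, not the actual IMEx algorithm (which also handles imperfect repeats and reports genomic context); the function name and the minimum copy-number threshold are assumptions.

```python
def find_perfect_microsatellites(seq, motif_sizes=range(1, 7), min_copies=3):
    """Sliding-window scan for perfect tandem repeats of 1-6 bp motifs.

    Toy sketch of the general approach, not the IMEx implementation.
    Returns (start, motif, copy_number) tuples with 0-based starts.
    """
    hits = []
    n = len(seq)
    for m in motif_sizes:
        i = 0
        while i + m <= n:
            motif = seq[i:i + m]
            copies = 1
            j = i + m
            # Extend the run while the next window repeats the motif.
            while seq[j:j + m] == motif:
                copies += 1
                j += m
            if copies >= min_copies:
                hits.append((i, motif, copies))
                i = j            # skip past the reported run
            else:
                i += 1
    return hits

hits = find_perfect_microsatellites("GGATATATATCC")
assert (2, "AT", 4) in hits     # the (AT)4 run starting at position 2
```

A real extractor would additionally tolerate point mutations inside a run (the "imperfect" case), which is typically handled by allowing a bounded number of mismatches while extending the window rather than requiring exact equality.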

6.
7.
No-salt flowthrough hydrophobic interaction chromatography (HIC) has been shown to effectively remove process- and product-related impurities from bioprocess streams. In this publication, a panel of six antibodies is used to demonstrate operating principles for the application of no-salt flowthrough HIC in antibody purification processes. The results indicate that no-salt flowthrough HIC provides robust aggregate clearance across operating conditions, including flow rate and variations in resin ligand density. Additionally, high molecular weight (HMW) reduction has an optimal pH range relative to the isoelectric point of each molecule, and HMW reduction can be improved by altering the total protein load and/or HMW concentration to drive binding of high molecular weight species to the resin.

8.
The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. To this end, a Level 1 Molecular Interaction XML data exchange format has been developed, which has been accepted for publication and is freely available at the PSI website (http://psidev.sf.net/). Several major protein interaction databases are already making data available in this format. A draft XML interchange format for mass spectrometry data has been written and is currently undergoing evaluation, whilst work is ongoing to develop a proteomics data integration model, MIAPE.

9.
Because they house large biodiversity collections and are also research centres with sequencing facilities, natural history museums are well placed to develop DNA barcoding best practices. The main difficulty is generally the vouchering system: it must ensure that all data produced remain attached to the corresponding specimen, from the field to publication in articles and online databases. The Museum National d'Histoire Naturelle (MNHN) in Paris is one of the leading laboratories in the Marine Barcode of Life (MarBOL) project, which was used as a pilot programme to include barcode collections for marine molluscs and crustaceans. The system is based on two relational databases. The first classically records the data (locality and identification) attached to the specimens. In the second, tissue clippings, DNA extractions (both preserved in 2D barcode tubes) and PCR data (including primers) are linked to the corresponding specimen. All the steps of the process [sampling event, specimen identification, molecular processing, data submission to the Barcode Of Life Database (BOLD) and GenBank] are thus linked together. Furthermore, we have developed several web-based tools to automatically upload data into the system, control the quality of the sequences produced and facilitate submission to online databases. This work is the result of a joint effort from several teams in the MNHN, but also from a collaborative network of taxonomists and molecular systematists outside the museum, resulting in the vouchering so far of ~41,000 sequences and the production of ~11,000 COI sequences.

10.

Purpose

This paper introduces the publication "Global Guidance Principles for Life Cycle Assessment Databases" and focuses on the development of training material and other implementation activities arising from it.

Methods

The document is the output of the “Shonan Guidance Principles” workshop. The publication provides guidance principles for life cycle assessment (LCA) databases; this includes how to collect raw data, how to develop datasets, and how to manage databases. The publication also addresses questions concerning data documentation and review, coordination among databases, capacity building, and future scenarios. As a next step, the publication is used to prepare training material and other implementation activities.

Results

The publication was launched at the LCM 2011 Conference. Since then, outreach activities have been organized, in particular in emerging economies. Further developments with regard to the guidance principles are foreseen as part of a flagship project within phase 3 of the Life Cycle Initiative. Training material is being developed that will cover how to set up databases and develop datasets. The topic has also been taken up by the United Nations Environment Programme (UNEP) in its Rio+20 Voluntary Commitments: UNEP and the Society of Environmental Toxicology and Chemistry (SETAC), through the UNEP/SETAC Life Cycle Initiative, commit to facilitating improved access to good-quality life cycle data and databases, as well as expanded use of key environmental indicators that allow the measurement and monitoring of progress towards the environmental sustainability of selected product chains.

Conclusions

The adoption of the "Global Guidance Principles" publication as a de facto global standard is expected to facilitate the work of database teams, especially in developing countries, and collaboration in regional networks. These efforts are supported by the development of training material and other implementation activities.

11.
A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.

12.
This study is the first molecular and biochemical analysis conducted on Pompia, a plant of unknown origin that is endemic to Sardinia; this plant is thought to belong to the Citrus genus. Here, genes coding for the enzymes superoxide dismutase (SOD, EC 1.15.1.1), catalase (CAT, EC 1.11.1.6), peroxidase (POD, EC 1.11.1.7), and polyphenol oxidase (PPO, EC 1.14.18.1) were identified. We detected the aforementioned enzymes in fresh leaf tissue and assessed the catalytic activity of each to support the molecular and biochemical data. This was the first molecular study to define the primary structure of proteins with antioxidant activity in Pompia. The study also contributed to the enrichment of gene databases and created the basis for molecular phylogenetic studies, which is important because this plant currently has no taxonomic or phylogenetic classification.

13.
Microsatellites are ubiquitous short tandem repeats found in all known genomes and are known to play a very important role in various studies and fields, including DNA fingerprinting, paternity studies, evolutionary studies, and the virulence and adaptation of certain bacteria and viruses. Owing to the sequencing of several genomes and the availability of enormous amounts of sequence data during the past few years, computational studies of microsatellites are of interest to many researchers. In this context, we developed a software tool called Imperfect Microsatellite Extractor (IMEx) to extract perfect, imperfect and compound microsatellites from genome sequences along with their complete statistics. Recently we developed a user-friendly graphical interface in Java for IMEx, to be used as stand-alone software named G-IMEx. G-IMEx takes a nucleotide sequence as input and produces results in both HTML and text formats. The Linux version of G-IMEx can be downloaded for free from http://www.cdfd.org.in/imex.

14.
Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and the maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focuses on molecular functions of HIV-infected T-cells and HIV epitope information, using text mining and the interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease association, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through text mining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.

15.
16.
The discovery of an abundance of copy number variants (CNVs; gains and losses of DNA sequences >1 kb) and other structural variants in the human genome is influencing the way research and diagnostic analyses are being designed and interpreted. As such, comprehensive databases with the most relevant information will be critical to fully understand the results and have impact in a diverse range of disciplines ranging from molecular biology to clinical genetics. Here, we describe the development of bioinformatics resources to facilitate these studies. The Database of Genomic Variants (http://projects.tcag.ca/variation/) is a comprehensive catalogue of structural variation in the human genome. The database currently contains 1,267 regions reported to contain copy number variation or inversions in apparently healthy human cases. We describe the current contents of the database and how it can serve as a resource for interpretation of array comparative genomic hybridization (array CGH) and other DNA copy imbalance data. We also present the structure of the database, which was built using a new data modeling methodology termed Cross-Referenced Tables (XRT). This is a generic and easy-to-use platform, which is strong in handling textual data and complex relationships. Web-based presentation tools have been built allowing publication of XRT data to the web immediately along with rapid sharing of files with other databases and genome browsers. We also describe a novel tool named eFISH (electronic fluorescence in situ hybridization) (http://projects.tcag.ca/efish/), a BLAST-based program that was developed to facilitate the choice of appropriate clones for FISH and CGH experiments, as well as interpretation of results in which genomic DNA probes are used in hybridization-based experiments.

17.

Background

In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.

Results

To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.

Conclusion

The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.

18.
19.
Jacobs PP, Sackstein R. FEBS Letters 2011;585(20):3148-3158
Despite great strides in our knowledge of the genetic and epigenetic changes underlying malignancy, we have limited information on the molecular basis of metastasis. Over 90% of cancer deaths are caused by spread of tumor cells from a primary site to distant organs and tissues, highlighting the pressing need to define the molecular effectors of cancer metastasis. Mounting evidence suggests that circulating tumor cells (CTCs) home to specific tissues by hijacking the normal leukocyte trafficking mechanisms. Cancer cells characteristically express CD44, and there is increasing evidence that hematopoietic cell E-/L-selectin ligand (HCELL), a sialofucosylated glycoform of CD44, serves as the major selectin ligand on cancer cells, allowing interaction of tumor cells with endothelium, leukocytes, and platelets. Here, we review the structural biology of CD44 and of HCELL, and present current data on the function of these molecules in mediating organ-specific homing/metastasis of CTCs.

20.

Background  

Phylogenetic trees resulting from molecular phylogenetic analysis are available in Newick format from specialized databases. When it comes to phylogenetic networks, however, which provide an explicit representation of reticulate evolutionary events such as recombination, hybridization or lateral gene transfer, the lack of a standard format for their representation has hindered the publication of explicit phylogenetic networks in the specialized literature and their incorporation into specialized databases. Two different proposals to represent phylogenetic networks exist: as a single Newick string (where each hybrid node is split once for each parent) or as a set of Newick strings (one for each hybrid node plus another for the phylogenetic network).
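The two proposals can be illustrated with a small, hypothetical network containing a single reticulation. The `#H1` tag below follows the general extended-Newick convention of marking each copy of a split hybrid node with a shared label; the concrete taxa and strings are invented for illustration, not taken from either proposal's examples.

```python
# Hypothetical network: hybrid node h (child b) with two parents, x and y.

# Proposal 1: one Newick string; the hybrid node is split once per parent,
# and both copies carry the shared label #H1 so they can be re-merged.
single_string = "((a,(b)h#H1)x,(h#H1,c)y)r;"

# Proposal 2: a set of Newick strings, one per hybrid node plus one for
# the network backbone, which references the hybrid subtree by its label.
string_set = {
    "h#H1": "(b)h;",                      # subtree below the hybrid node
    "network": "((a,h#H1)x,(h#H1,c)y)r;", # backbone referencing the label
}

# In the single-string form, a hybrid label occurs once per parent edge:
assert single_string.count("#H1") == 2
# In the set form, the backbone references each hybrid subtree by label:
assert string_set["network"].count("#H1") == 2
```

A reader reconstructing the network merges all leaves that share a hybrid label into one node, which recovers the reticulation that a plain tree format cannot express.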


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)