Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from this corpus and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. We also demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.

2.
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. Currently, CTD describes over 184,000 molecular interactions for more than 5,100 chemicals and 16,300 genes/proteins. We have leveraged this dataset of chemical-gene relationships to compute similarity indices following the statistical method of the Jaccard index. These scores are used to produce lists of comparable genes (“GeneComps”) or chemicals (“ChemComps”) based on shared toxicogenomic profiles. GeneComps and ChemComps are now provided for every curated gene and chemical in CTD. ChemComps are particularly significant because they provide a way to group chemicals based upon their biological effects, instead of their physical or structural properties. These metrics provide a novel way to view and classify genes and chemicals and will help advance testable hypotheses about environmental chemical-gene-disease networks.

Availability

CTD is freely available at http://ctd.mdibl.org/
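
The Jaccard-based similarity described in entry 2 can be illustrated with a minimal Python sketch. This is not CTD's implementation: the function names `jaccard_index` and `rank_comparable`, the tie-breaking rule, the convention for empty profiles, and the toy profiles are all assumptions made for illustration.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty profiles score 0
    return len(a & b) / len(union)


def rank_comparable(query, profiles, top_n=5):
    """Rank every other entity by Jaccard similarity to `query`.

    `profiles` maps an entity name (e.g. a chemical) to the set of
    partners it interacts with (e.g. genes). Ties are broken by name.
    """
    scores = [
        (other, jaccard_index(profiles[query], partners))
        for other, partners in profiles.items()
        if other != query
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Toy, made-up chemical -> interacting-gene profiles (not CTD data).
toy_profiles = {
    "chem_A": {"G1", "G2", "G3", "G4"},
    "chem_B": {"G2", "G3", "G4", "G5"},
    "chem_C": {"G1", "G6"},
    "chem_D": {"G7"},
}

if __name__ == "__main__":
    for name, score in rank_comparable("chem_A", toy_profiles):
        print(name, round(score, 3))
```

Swapping the roles of the keys and values (gene -> interacting chemicals) would give a gene-to-gene ranking in the same way.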

3.
Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of this approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

4.

Background

Many common diseases arise from an interaction between environmental and genetic factors. Our knowledge regarding environment and gene interactions is growing, but frameworks to build an association between gene-environment interactions and disease using preexisting, publicly available data have been lacking. Integrating freely available environment-gene interaction and disease phenotype data would allow hypothesis generation for potential environmental associations to disease.

Methods

We integrated publicly available disease-specific gene expression microarray data and curated chemical-gene interaction data to systematically predict environmental chemicals associated with disease. We derived chemical-gene signatures for 1,338 environmental chemicals from the Comparative Toxicogenomics Database (CTD). We associated these chemical-gene signatures with differentially expressed genes from datasets found in the Gene Expression Omnibus (GEO) through an enrichment test.

Results

We were able to verify our analytic method by accurately identifying chemicals applied to samples and cell lines. Furthermore, we were able to predict known and novel environmental associations with prostate, lung, and breast cancers, such as estradiol and bisphenol A.

Conclusions

We have developed a scalable and statistical method to identify possible environmental associations with disease using publicly available data and have validated some of the associations in the literature.
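
The enrichment step in entry 4 (associating a chemical-gene signature with a list of differentially expressed genes) can be sketched as follows. The abstract does not say which enrichment test was used; the one-sided hypergeometric test below is a common choice and is assumed here purely for illustration. The function names and the toy gene symbols are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, signature_size, de_size, overlap):
    """One-sided p-value P(X >= overlap) for X ~ Hypergeometric.

    universe_size:  number of genes measured in the experiment
    signature_size: genes in the chemical-gene signature
    de_size:        differentially expressed (DE) genes
    overlap:        genes that are in both the signature and the DE list
    """
    max_overlap = min(signature_size, de_size)
    total = comb(universe_size, de_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrichment(signature, de_genes, universe):
    """Return (overlap, p-value) for a signature against a DE gene list.

    Genes outside the measured universe are ignored (an assumption).
    """
    universe = set(universe)
    signature = set(signature) & universe
    de_genes = set(de_genes) & universe
    overlap = len(signature & de_genes)
    p = hypergeom_enrichment_p(len(universe), len(signature), len(de_genes), overlap)
    return overlap, p


if __name__ == "__main__":
    # Made-up gene symbols; not taken from CTD or GEO.
    universe = {f"g{i}" for i in range(100)}
    signature = {"g1", "g2", "g3", "g4", "g5"}
    de_genes = {"g1", "g2", "g3", "g50", "g51", "g52"}
    print(enrichment(signature, de_genes, universe))
```

In practice one such p-value would be computed per chemical and then corrected for multiple testing; that correction is omitted from this sketch.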

5.
Linking networks of molecular interactions to cellular functions and phenotypes is a key goal in systems biology. Here, we adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks just as other enrichment methods annotate lists of genes. As a general association measure, SANTA can (i) functionally annotate experimentally derived networks using a collection of curated gene sets and (ii) annotate experimentally derived gene sets using a collection of curated networks, as well as (iii) prioritize genes for follow-up analyses. We exemplify the efficacy of SANTA in several case studies using the S. cerevisiae genetic interaction network and genome-wide RNAi screens in cancer cell lines. Our theory, simulations, and applications show that SANTA provides a principled statistical way to quantify the association between molecular networks and cellular functions and phenotypes. SANTA is available from http://bioconductor.org/packages/release/bioc/html/SANTA.html.

6.
The identification of subnetworks of interest (or active modules) by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated with its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease. Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html. Contact: anais.baudot@univ-amu.fr

7.
Transient receptor potential (TRP) channels are a family of Ca²⁺-permeable cation channels that play a crucial role in biological and disease processes. To advance TRP channel research, we previously created the TRIP (TRansient receptor potential channel-Interacting Protein) Database, a manually curated database that compiles scattered information on TRP channel protein-protein interactions (PPIs). However, the database needs to be improved for information accessibility and data utilization. Here, we present the TRIP Database 2.0 (http://www.trpchannel.org) in which many helpful, user-friendly web interfaces have been developed to facilitate knowledge acquisition and inspire new approaches to studying TRP channel functions: 1) the PPI information found in the supplementary data of referred articles was curated; 2) the PPI summary matrix enables users to intuitively grasp overall PPI information; 3) the search capability has been expanded to retrieve information from ‘PubMed’ and ‘PIE the search’ (a specialized search engine for PPI-related articles); and 4) the PPI data are available as sif files for network visualization and analysis using ‘Cytoscape’. Therefore, our TRIP Database 2.0 is an information hub that works toward advancing data-driven TRP channel research.

8.
9.
10.
The Comparative Toxicogenomics Database (CTD) is a free resource that describes chemical-gene-disease networks to help understand the effects of environmental exposures on human health. The database contains more than 13,500 chemical-disease and 14,200 gene-disease interactions. In CTD, chemicals and genes are associated with a disease via two types of relationships: as a biomarker or molecular mechanism for the disease (M-type) or as a real or putative therapy for the disease (T-type). We leveraged these curated datasets to compute similarity indices that can be used to produce lists of comparable diseases ("DiseaseComps") based upon shared toxicogenomic profiles. This new metric now classifies diseases with common molecular characteristics, instead of the traditional approach of using histology or tissue of origin to define the disorder. In the dawning era of "personalized medicine", this feature provides a new way to view and describe diseases and will help develop testable hypotheses about chemical-gene-disease networks. AVAILABILITY: The database is available for free at http://ctd.mdibl.org/

11.

Background  

The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage.

12.
13.
Differential network analysis provides a framework for examining if there is sufficient statistical evidence to conclude that the structure of a network differs under two experimental conditions or if the structures of two networks are different. The R package dna provides tools and procedures for differential network analysis of genomic data. The focus of this package is on gene-gene networks, but the methods are easily adaptable for more general biological processes. This package includes preprocessing tools for simultaneously preparing a pair of networks for analysis, procedures for computing connectivity scores between pairs of genes based on many available statistical techniques, and tools for handling modules of genes based on these scores. Also, procedures are provided for performing permutation tests based on these scores to determine if the connectivity of a gene differs between the two networks, to determine if the connectivity of a particular set of important genes differs between the two networks, and to determine if the overall module structure differs between the two networks. Several built-in options are available for the types of scores and distances used in the testing procedures, and additionally, the procedures provide flexible methods that allow the user to define custom scores and distances.

Availability

dna is freely available at The Comprehensive R Archive Network, http://CRAN.R-project.org/package=dna
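
The permutation-test idea in entry 13 (does the connectivity of one gene differ between two networks?) can be sketched in a few lines. This is a Python illustration of the general technique, not the API of the R package dna. The choice of Pearson correlation as the connectivity score, the mean absolute difference as the test statistic, and all function names are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def gene_statistic(samples1, samples2, gene):
    """Mean absolute difference between the two conditions in the
    connectivity scores of `gene` with every other gene.

    Each sample is a list of expression values, one per gene.
    """
    n_genes = len(samples1[0])

    def scores(samples):
        target = [s[gene] for s in samples]
        return [
            pearson(target, [s[j] for s in samples])
            for j in range(n_genes)
            if j != gene
        ]

    s1, s2 = scores(samples1), scores(samples2)
    return sum(abs(a - b) for a, b in zip(s1, s2)) / len(s1)


def permutation_p_value(samples1, samples2, gene, n_perm=1000, seed=0):
    """Permutation p-value: shuffle the condition labels of the samples
    and count how often the shuffled statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = gene_statistic(samples1, samples2, gene)
    pooled = list(samples1) + list(samples2)
    n1 = len(samples1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if gene_statistic(pooled[:n1], pooled[n1:], gene) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)
```

The same scheme extends to a set of genes or to a whole module by replacing `gene_statistic` with a statistic defined over that set.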

14.
Programmed cell death (PCD) is a critical biological process involved in many important processes, and defects in PCD have been linked with numerous human diseases. In recent years, the protein architecture in different PCD subroutines has been explored, but our understanding of the global network organization of the noncoding RNA (ncRNA)-mediated cell death system is limited and ambiguous. Hence, we developed the comprehensive bioinformatics resource (ncRDeathDB, www.rna-society.org/ncrdeathdb) to archive ncRNA-associated cell death interactions. The current version of ncRDeathDB documents a total of more than 4600 ncRNA-mediated PCD entries in 12 species. ncRDeathDB provides a user-friendly interface to query, browse and manipulate these ncRNA-associated cell death interactions. Furthermore, this resource will help to visualize and navigate current knowledge of the noncoding RNA component of cell death and autophagy, to uncover the generic organizing principles of ncRNA-associated cell death systems, and to generate valuable biological hypotheses.

15.
16.
It has been a challenge in systems biology to unravel relationships between structural properties and dynamic behaviors of biological networks. A Cytoscape plugin named NetDS was recently proposed to analyze the robustness-related dynamics and feed-forward/feedback loop structures of biological networks. Despite such a useful function, limitations on the network size that can be analyzed exist due to high computational costs. In addition, the plugin cannot verify an intrinsic property which can be induced by an observed result because it has no function to simulate the observation on a large number of random networks. To overcome these limitations, we have developed a novel software tool, PANET. First, the time-consuming parts of NetDS were redesigned to be processed in parallel using the OpenCL library. This approach utilizes the full computing power of multi-core central processing units and graphics processing units. Eventually, this made it possible to investigate a large-scale network such as a human signaling network with 1,609 nodes and 5,063 links. We also developed a new function to perform a batch-mode simulation where it generates a lot of random networks and conducts robustness calculations and feed-forward/feedback loop examinations of them. This helps us to determine if the findings in real biological networks are valid in arbitrary random networks or not. We tested our plugin in two case studies based on two large-scale signaling networks and found interesting results regarding relationships between coherently coupled feed-forward/feedback loops and robustness. In addition, we verified whether or not those findings are consistently conserved in random networks through batch-mode simulations. Taken together, our plugin is expected to effectively investigate various relationships between dynamics and structural properties in large-scale networks. 
Our software tool, user manual and example datasets are freely available at http://panet-csc.sourceforge.net/.

17.
18.

Background

The analysis of high-throughput data in biology is aided by integrative approaches such as gene-set analysis. Gene-sets can represent well-defined biological entities (e.g. metabolites) that interact in networks (e.g. metabolic networks), to exert their function within the cell. Data interpretation can benefit from incorporating the underlying network, but there are currently no optimal methods that link gene-set analysis and network structures.

Results

Here we present Kiwi, a new tool that processes output data from gene-set analysis and integrates them with a network structure such that the inherent connectivity between gene-sets, i.e. not simply the gene overlap, becomes apparent. In two case studies, we demonstrate that standard gene-set analysis points at metabolites regulated in the interrogated condition. Nevertheless, only the integration of the interactions between these metabolites provides an extra layer of information that highlights how they are tightly connected in the metabolic network.

Conclusions

Kiwi is a tool that enhances interpretability of high-throughput data. It allows the users not only to discover a list of significant entities or processes as in gene-set analysis, but also to visualize whether these entities or processes are isolated or connected by means of their biological interaction. Kiwi is available as a Python package at http://www.sysbio.se/kiwi and an online tool in the BioMet Toolbox at http://www.biomet-toolbox.org.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0408-9) contains supplementary material, which is available to authorized users.

19.

Background

Reconstruction of protein-protein interaction or metabolic networks based on expression data often involves in silico predictions, while on the other hand, there are unspecific networks of in vivo interactions derived from knowledge bases. We analyze networks designed to come as close as possible to data measured in vivo, both with respect to the set of nodes which were taken to be expressed in experiment as well as with respect to the interactions between them which were taken from manually curated databases.

Results

A signaling network derived from the TRANSPATH database and a metabolic network derived from KEGG LIGAND are each filtered onto expression data from breast cancer (SAGE) considering different levels of restrictiveness in edge and vertex selection. We perform several validation steps; in particular, we define pathway over-representation tests based on refined null models to recover functional modules. The prominent role of the spindle checkpoint-related pathways in breast cancer is exhibited. High-ranking key nodes cluster in functional groups retrieved from literature. Results are consistent between several functional and topological analyses and between signaling and metabolic aspects.

Conclusions

This construction involved as a crucial step the passage to a mammalian protein identifier format as well as to a reaction-based semantics of metabolism. This yielded good connectivity but also led to the need to perform benchmark tests to exclude loss of essential information. Such validation, albeit tedious due to limitations of existing methods, turned out to be informative, and in particular provided biological insights as well as information on the degrees of coherence of the networks despite fragmentation of experimental data. Key node analysis exploited the networks for potentially interesting proteins in view of drug target prediction.

20.