Similar articles
A total of 20 similar articles were retrieved (search time: 31 ms)
1.
High-throughput screening (HTS) experiments provide a valuable resource that reports the biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task, given the large volume of data and the frequently small proportion of active compounds relative to inactive ones. We developed a method, DRAMOTE, to predict the activity status of chemical compounds in HTS activity assays. For a class of HTS assays, our method achieves considerably better results than current state-of-the-art solutions. We achieved this by modifying a minority oversampling technique. To demonstrate that DRAMOTE performs better than other methods, we carried out a comprehensive comparison with several alternatives, evaluating them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA-approved drugs that have a high probability of interacting with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results, based on approximately 500,000 interactions, suggest that DRAMOTE performed best and can be used to develop robust virtual screening models. The datasets and implementations of all solutions are available as a MATLAB toolbox at www.cbrc.kaust.edu.sa/dramote and on Figshare.
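Since class imbalance and minority oversampling are central to the approach described above, the following sketch illustrates a generic SMOTE-style oversampling step in Python. It is a minimal illustration of the general technique under toy data, not DRAMOTE's actual modification or its MATLAB implementation; the descriptor matrix and all parameters are invented for the example.

```python
# Generic SMOTE-style minority oversampling for an imbalanced HTS dataset
# (active vs. inactive compounds). Illustrative only, not DRAMOTE's method.
import numpy as np

def oversample_minority(X_min, n_synthetic, k=5, seed=None):
    """Create synthetic minority samples by interpolating between each sampled
    minority point and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per sample

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))            # pick a random minority sample
        j = rng.choice(neighbours[i])           # and one of its neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# toy example: 20 "active" compounds described by 8 numeric descriptors
rng = np.random.default_rng(0)
X_active = rng.normal(size=(20, 8))
X_synthetic = oversample_minority(X_active, n_synthetic=80, k=5, seed=1)
print(X_synthetic.shape)  # (80, 8) synthetic actives to rebalance the training set
```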

2.
High-throughput screening data repositories, such as PubChem, represent valuable resources for the development of small-molecule chemical probes and can serve as entry points for drug discovery programs. Although the loose data format offered by PubChem allows for great flexibility, important annotations, such as the assay format and technologies employed, are not explicitly indexed. The authors have previously developed a BioAssay Ontology (BAO) and curated more than 350 assays with standardized BAO terms. Here they describe the use of BAO annotations to analyze a large set of assays that employ luciferase- and β-lactamase-based technologies. They identified promiscuous chemotypes pertaining to different subcategories of assays and specific mechanisms by which these chemotypes interfere in reporter gene assays. Results show that the data in PubChem can be used to identify promiscuous compounds that interfere nonspecifically with particular technologies. Furthermore, they show that BAO is a valuable toolset for the identification of related assays and for the systematic generation of insights that are beyond the scope of individual assays or screening campaigns.

3.
Research on the spider fauna of Slovenia dates back to the very beginning of binomial nomenclature and has gone through more and less prolific phases, with authors concentrating on taxonomy, faunistics, ecology and zoogeographic reviews. Although the body of published work is remarkable for a small nation, the faunistic data have remained too scattered for a thorough understanding of regional biotic diversity, for comparative and ecological research, and for informed conservation purposes. A national checklist is long overdue. Here, a critical review of all published records in any language is provided. The species list currently comprises 738 species, is published online at http://www.bioportal.si/katalog/araneae.php under the title Araneae Sloveniae, and will be updated in due course. This tool will fill a void in the cataloguing of regional spider faunas and will facilitate further araneological research in central and southern Europe.

4.
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.
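To give a feel for why structured, schema-aware storage compresses alignment data so much better than generic containers, the sketch below delta-encodes a sorted list of alignment positions before general-purpose compression. This illustrates the general principle only; it is not the actual Goby codec, and the positions are randomly generated toy data.

```python
# Delta-encoding sorted alignment positions before compression: small deltas
# compress far better than absolute coordinates. Toy data, not the Goby format.
import random
import struct
import zlib

random.seed(0)
# 100,000 sorted alignment start positions on a ~250 Mb chromosome
positions = sorted(random.randrange(0, 250_000_000) for _ in range(100_000))

# naive encoding: one 8-byte integer per absolute position
raw = b"".join(struct.pack("<q", p) for p in positions)

# delta encoding: store the first position, then successive differences
deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
delta_raw = b"".join(struct.pack("<q", d) for d in deltas)

print("compressed absolute positions:", len(zlib.compress(raw)), "bytes")
print("compressed delta-encoded:     ", len(zlib.compress(delta_raw)), "bytes")
```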

5.
We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced representation bisulfite sequencing and whole-genome methyl-seq), and detection of pathogens in sequenced data. In contrast to previous pipelines developed for the analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, and yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found that it either outperforms or matches the performance of programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open-source LGPL3 license to facilitate code inspection, reuse and independent extensions: http://github.com/CampagneLaboratory/gobyweb2-plugins.

6.
Over the last decade, we have witnessed incredible growth in the amount of available genotype data due to high-throughput sequencing (HTS) techniques. This information may be used to predict phenotypes of medical relevance and pave the way towards personalized medicine. Blood phenotypes (e.g., ABO and Rh) are purely genetic traits that have been extensively studied for decades, with over thirty blood groups currently known. Given the public availability of blood group data, it is of interest to predict these phenotypes from HTS data, which may translate into more accurate blood typing in clinical practice. Here we propose BOOGIE, a fast predictor for the inference of blood groups from single nucleotide variant (SNV) databases. We focus on the prediction of thirty blood groups, ranging from the well-known ABO and Rh to the less studied Junior and Diego. BOOGIE correctly predicted the blood group with 94% accuracy for Personal Genome Project whole-genome profiles where good-quality SNV annotation was available. Additionally, our tool produces a high-quality haplotype phase, which is of interest in the context of ethnicity-specific polymorphisms or traits. The versatility and simplicity of the analysis make it easily interpretable and allow easy extension of the protocol towards other phenotypes. BOOGIE can be downloaded from http://protein.bio.unipd.it/download/.
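To illustrate the general idea of rule-based blood-group inference from SNV calls, here is a minimal Python sketch that matches haplotype-level genotypes against a lookup table of allele-defining rules. The variant identifiers and rules are made-up placeholders for illustration only; they are not BOOGIE's curated haplotype tables and must not be used for real blood typing.

```python
# Toy rule-based ABO calling from per-haplotype genotype calls.
# Variant IDs ("rsA", "rsB") and rules are hypothetical placeholders.
ABO_RULES = [
    # (required genotype calls, inferred allele)
    ({"rsA": "del"}, "O"),
    ({"rsA": "ref", "rsB": "alt"}, "B"),
    ({"rsA": "ref", "rsB": "ref"}, "A"),
]

def call_allele(haplotype):
    """Return the first allele whose rule matches the observed haplotype."""
    for required, allele in ABO_RULES:
        if all(haplotype.get(snv) == call for snv, call in required.items()):
            return allele
    return "unknown"

def abo_phenotype(hap1, hap2):
    """Combine the two haplotype-level alleles into an ABO phenotype."""
    alleles = {call_allele(hap1), call_allele(hap2)}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"

print(abo_phenotype({"rsA": "ref", "rsB": "ref"}, {"rsA": "del"}))  # -> "A"
```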

7.
Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity of establishing this correlation and the enormous amount of data available today, it is imperative to design machine-readable, efficient methods to store, label, search and analyze these data. A semantic approach, FROG ("FingeRprinting Ontology of Genomic variations"), is implemented to label variation data based on its location, function and interactions. FROG has six levels to describe the variation annotation, namely chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties for the variant. For example, at the chromosome level, one attribute is the location of the variation, which has two properties: allosomes or autosomes. Another attribute is the kind of variation, which has four properties: indel, deletion, insertion and substitution. In all, there are 48 attributes and 278 properties to capture the variation annotation across the six levels. Each property is then assigned a bit score, which in turn leads to the generation of a binary fingerprint based on the combination of these properties (mostly taken from existing variation ontologies). FROG is a novel method designed to label the entire body of variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog.
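The fingerprinting idea described above, where each annotated property sets a bit in a fixed-length vector, can be sketched in a few lines of Python. The property vocabulary below is a tiny illustrative subset invented for the example, not FROG's actual 278 properties or their bit assignments.

```python
# Toy binary fingerprint: one bit per controlled-vocabulary property.
PROPERTIES = [
    "chromosome:autosome", "chromosome:allosome",
    "kind:substitution", "kind:insertion", "kind:deletion", "kind:indel",
    "protein:missense", "protein:synonymous",
]
BIT = {prop: i for i, prop in enumerate(PROPERTIES)}

def fingerprint(annotations):
    """Set one bit per annotated property and return the fingerprint string."""
    bits = ["0"] * len(PROPERTIES)
    for prop in annotations:
        bits[BIT[prop]] = "1"
    return "".join(bits)

variant = ["chromosome:autosome", "kind:substitution", "protein:missense"]
print(fingerprint(variant))  # -> "10100010"
```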

8.
The XML-based Real-Time PCR Data Markup Language (RDML) has been developed by the RDML consortium (http://www.rdml.org) to enable straightforward exchange of qPCR data and related information between qPCR instruments and third-party data analysis software, between colleagues and collaborators, and between experimenters and journals or public repositories. Here we also propose data-related guidelines as a subset of the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) to guarantee the inclusion of key data information when reporting experimental results.

9.
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of high-throughput sequencing (HTS), including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework empowers developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

10.
11.
Recent studies of the human genome have indicated that regulatory elements (e.g., promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and Hi-C, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources for inferring the distal gene targets of non-coding regulatory elements and for prioritizing critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, no software takes full advantage of network-theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. Availability: QuIN's web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, using an Apache Tomcat web server and a MySQL database; the source code is available under the GPLv3 license on GitHub: https://github.com/UcarLab/QuIN/.
This is a PLOS Computational Biology Software paper.
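As a small illustration of the network view described above, the sketch below builds a toy chromatin interaction network and ranks anchors with simple network measures, using the third-party networkx library. The genomic coordinates, interactions and annotations are invented values, and the degree/betweenness ranking stands in generically for QuIN's prioritization measures rather than reproducing them.

```python
# Toy chromatin interaction network: nodes are genomic anchors, edges are
# observed contacts; rank anchors by degree and betweenness as a crude
# prioritization. Requires the networkx package.
import networkx as nx

interactions = [
    ("chr1:1000-2000", "chr1:50000-51000"),   # e.g. enhancer-promoter contact
    ("chr1:1000-2000", "chr1:90000-91000"),
    ("chr1:50000-51000", "chr1:90000-91000"),
    ("chr2:5000-6000", "chr2:70000-71000"),
]
annotations = {"chr1:50000-51000": "GENE_A promoter"}  # user-provided labels

G = nx.Graph()
G.add_edges_from(interactions)
nx.set_node_attributes(G, annotations, name="label")

degree = dict(G.degree())
betweenness = nx.betweenness_centrality(G)
for node in sorted(G, key=lambda n: (-degree[n], -betweenness[n])):
    print(node, degree[node], round(betweenness[node], 2),
          G.nodes[node].get("label", ""))
```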

12.
Frequent hitters are compounds that are detected as a "hit" in multiple high-throughput screening (HTS) assays. Such behavior can be specific (e.g., related to a target family), unspecific (e.g., reactive compounds), or a combination of the two. Detecting such hits while predicting the underlying reason for their promiscuous behavior is desirable because it provides valuable information not only about the compounds themselves but also about the assay methodology and target classes at hand. This information can also greatly reduce cost and time during HTS hit profiling. The present study exemplifies how to mine large HTS data repositories, such as the one at Boehringer Ingelheim, to identify frequent hitters, gain further insights into the causes of promiscuous behavior, and generate models for predicting promiscuous compounds. Applications of this approach are demonstrated using two recent large-scale HTS assays. The authors believe this analysis and its concrete applications are valuable tools for streamlining and accelerating decision-making during hit discovery.
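A first-pass way to surface frequent hitters from archived screening results is simply to compute each compound's hit rate across the assays in which it was tested, as sketched below. The compound IDs, hit calls and 50% cutoff are arbitrary toy values, not the criteria or models used in the study above.

```python
# Flag compounds that hit in a large fraction of the assays they were tested in.
from collections import defaultdict

# (compound_id, assay_id, is_hit) triples as they might come from an HTS archive
records = [
    ("CMPD-1", "assay-01", True),  ("CMPD-1", "assay-02", True),  ("CMPD-1", "assay-03", True),
    ("CMPD-2", "assay-01", False), ("CMPD-2", "assay-02", True),  ("CMPD-2", "assay-03", False),
    ("CMPD-3", "assay-01", True),  ("CMPD-3", "assay-02", False),
]

tested = defaultdict(int)
hits = defaultdict(int)
for compound, assay, is_hit in records:
    tested[compound] += 1
    hits[compound] += is_hit

HIT_RATE_CUTOFF = 0.5  # arbitrary threshold for this toy example
for compound in tested:
    rate = hits[compound] / tested[compound]
    if rate >= HIT_RATE_CUTOFF:
        print(f"{compound}: hit in {hits[compound]}/{tested[compound]} assays "
              f"-> possible frequent hitter")
```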

13.
The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field have resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in preparing materials, reduce duplication of effort by increasing the usage of existing materials, and allow the sharing of teaching experience among the HTS trainers' community. To achieve this, we have developed a strategy for curating and disseminating materials. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. Because the repository uses Git, it is decentralized and self-managed by the community and can be forked and built upon by all users. The repository is accessible at http://bioinformatics.upsc.se/htmr.

14.
PlantMetabolomics.org (PM) is a web portal and database for exploring, visualizing, and downloading plant metabolomics data. Widespread public access to well-annotated metabolomics datasets is essential for establishing metabolomics as a functional genomics tool. PM integrates metabolomics data generated from different analytical platforms in multiple laboratories, along with key visualization tools such as ratio and error plots. These visualization tools can quickly show how one condition compares to another and which analytical platforms show the largest changes. The database aims to capture a complete annotation of the experiment metadata along with the metabolite abundance data, based on the evolving Metabolomics Standards Initiative. PM can be used as a platform for deriving hypotheses by enabling metabolomic comparisons between genetically unique Arabidopsis (Arabidopsis thaliana) populations subjected to different environmental conditions. Each metabolite is linked to relevant experimental data and information from various annotation databases. The portal also provides detailed protocols and tutorials on conducting plant metabolomics experiments to promote metabolomics in the community. PM currently houses Arabidopsis metabolomics data generated by a consortium of laboratories utilizing metabolomics to help elucidate the functions of uncharacterized genes. PM is publicly available at http://www.plantmetabolomics.org.

In the post-genomics era, metabolomics is fast emerging as a vital source of information to aid in solving systems biology puzzles with an emphasis on metabolic solutions. Metabolomics is the science of measuring the pool sizes of metabolites (small molecules of Mr ≤ 1,000 D), which collectively define the metabolome of a biological sample (Fiehn et al., 2000; Hall et al., 2002). Coverage of the entire plant metabolome is a daunting task, as it is estimated that there are over 200,000 different metabolites within the plant kingdom (Goodacre et al., 2004). Although technology is rapidly advancing, there are still large gaps in our knowledge of the plant metabolome.

Despite this lack of complete knowledge and the immense metabolic diversity among plants, metabolomics has become a key analytical tool in the plant community (Hall et al., 2002). This has led to the emergence of multiple experimental and analytical platforms that collectively generate millions of metabolite data points. Because of this vast amount of data, the development of public databases to capture information from metabolomics experiments is vital to provide the scientific community with comprehensive knowledge about metabolite data generation, annotation, and integration with metabolic pathway data. Some examples of these public databases are given below. The Human Metabolome Project contains comprehensive data for more than 2,000 metabolites found within the human body (Wishart et al., 2007). The Golm Database is a repository that provides access to mass spectrometry (MS) libraries, metabolite profiling experiments, and related information from gas chromatography (GC)-MS experimental platforms, along with tools to integrate this information with other systems biology knowledge (Kopka et al., 2005). The Madison Metabolomics Consortium Database contains primarily NMR spectra for Arabidopsis (Arabidopsis thaliana) and features thorough NMR search tools (Cui et al., 2008). SetupX and BinBase provide a framework that combines MS data and biological metadata for steering laboratory workflows and employs automated metabolite annotation (Scholz and Fiehn, 2007).

A single analytical technique cannot identify and quantify all the metabolites found in plants. Thus, PlantMetabolomics.org (PM) was developed to provide a portal for accessing publicly available MS-based plant metabolomics experimental results from multiple analytical and separation techniques. PM also follows the emerging metabolomics standards for experiment annotation. PM has extensive annotation links between the identified metabolites and metabolic pathways in AraCyc (Mueller et al., 2003) at The Arabidopsis Information Resource (Rhee et al., 2003) and the Plant Metabolic Network (www.plantcyc.org), the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa et al., 2004), and MetNetDB (Wurtele et al., 2007).

Standards for the annotation of metabolomics experiments are still under active development, and the metadata types collected in PM are based on the recommendations of the Metabolomics Standards Initiative (MSI; Fiehn et al., 2007a) and the Minimal Information for a Metabolomic Experiment (Bino et al., 2004) standards. MSI attempts to capture the complete annotation of metabolomics experiments and includes metadata of the experiments along with the metabolite abundance data. The initial database schema design was guided by the schema proposed in the Architecture for Metabolomics project (Jenkins et al., 2004).
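To make concrete what a ratio/error plot summarizes, the sketch below computes a per-metabolite abundance ratio between two conditions with a simple propagated standard error, using only the Python standard library. The metabolite names and replicate values are toy data, and the exact statistics PM reports may differ from this generic computation.

```python
# Per-metabolite ratio between two conditions with first-order error propagation.
import statistics as stats

abundances = {  # metabolite -> (replicates in condition A, replicates in condition B)
    "citrate": ([4.1, 3.9, 4.3], [8.0, 7.6, 8.4]),
    "sucrose": ([10.2, 9.8, 10.5], [10.0, 10.3, 9.9]),
}

for metabolite, (cond_a, cond_b) in abundances.items():
    mean_a, mean_b = stats.mean(cond_a), stats.mean(cond_b)
    se_a = stats.stdev(cond_a) / len(cond_a) ** 0.5
    se_b = stats.stdev(cond_b) / len(cond_b) ** 0.5
    ratio = mean_b / mean_a
    # relative errors of numerator and denominator combined in quadrature
    rel_err = ratio * ((se_a / mean_a) ** 2 + (se_b / mean_b) ** 2) ** 0.5
    print(f"{metabolite}: ratio B/A = {ratio:.2f} ± {rel_err:.2f}")
```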

15.
16.
In a morphological ontology, expert knowledge is represented as terms that describe morphological structures and how these structures relate to each other. Ontologies make this expert knowledge processable by machines through a formal and standardized representation of terms and their relations. The red flour beetle Tribolium castaneum, a representative of the most species-rich animal taxon on earth (the Coleoptera), is an emerging model organism for development, evolution, physiology, and pest control. In order to foster Tribolium research, we have initiated the Tribolium Ontology (TrOn), which describes the morphology of the red flour beetle. So far, the ontology covers most external morphological structures as well as some internal ones. All modeled structures are consistently annotated for the developmental stages larva, pupa and adult. In TrOn, all terms are grouped into three categories: generic terms represent morphological structures that are independent of a developmental stage; downstream of such terms are concrete terms, which stand for a dissectible structure of a beetle at a specific life stage; finally, mixed terms describe structures that are found at only one developmental stage and combine characteristics of both generic and concrete terms. These annotation principles take into account the changing morphology of the beetle during development and provide generic terms that can be used in applications or for cross-linking with other ontologies and data resources. We use the ontology to implement an intuitive search function in the electronic iBeetle-Base, which stores morphological defects found in a genome-wide RNA interference (RNAi) screen. The ontology is available for download at http://ibeetle-base.uni-goettingen.de.

17.
Functional context for a biological sequence is provided in the form of annotations. However, within a group of similar sequences there can be annotation heterogeneity in terms of coverage and specificity. This in turn can complicate the interpretation of actual functional similarity and of the overall functional coherence of such a group. One way to mitigate such issues is through the use of visualization and statistical techniques. Therefore, to help interpret this annotation heterogeneity, we created a web application that generates Gene Ontology annotation graphs for protein sets together with associated statistics, ranging from simple frequencies to enrichment values and Information Content-based metrics. The publicly accessible website http://xldb.di.fc.ul.pt/gryfun/ currently accepts lists of UniProt accession numbers in order to create user-defined protein sets for subsequent annotation visualization and statistical assessment. GRYFUN is a freely available web application that visualizes the GO annotations of protein sets and can be used for annotation coherence and cohesiveness analyses and for assessing annotation extension within under-annotated protein sets.
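One of the statistics mentioned above, Information Content, is commonly defined as IC(t) = -log p(t), where p(t) is the annotation frequency of GO term t in a reference corpus: rare, specific terms carry high IC. The sketch below computes it for a handful of terms with invented counts; it illustrates the standard definition rather than GRYFUN's exact implementation.

```python
# Information Content of GO terms from toy annotation counts.
import math

# toy counts in a reference corpus; the root term's count equals the corpus
# size, so its IC is 0
term_counts = {
    "GO:0003674": 1000,  # molecular_function (root)
    "GO:0003824": 300,   # catalytic activity
    "GO:0016740": 60,    # transferase activity
}
corpus_size = term_counts["GO:0003674"]

for term, count in term_counts.items():
    p = count / corpus_size
    ic = -math.log(p)
    print(f"{term}\tp={p:.3f}\tIC={ic:.2f}")
```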

18.
PLoS ONE, 2016, 11(4)
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.

19.

Background

In recent years, increasing amounts of genomic and clinical cancer data have become publicly available through large-scale collaborative projects such as The Cancer Genome Atlas (TCGA). However, as long as these datasets are difficult to access and interpret, they are essentially useless for a major part of the research community and their scientific potential will not be fully realized. To address these issues, we developed MEXPRESS, a straightforward and easy-to-use web tool for the integration and visualization of expression, DNA methylation and clinical TCGA data on a single-gene level (http://mexpress.be).

Results

In comparison to existing tools, MEXPRESS allows researchers to quickly visualize and interpret the different TCGA datasets and their relationships for a single gene, as demonstrated for GSTP1 in prostate adenocarcinoma. We also used MEXPRESS to reveal differences in the DNA methylation status of the PAM50 marker gene MLPH between the breast cancer subtypes and to show how these differences are linked to the expression of MLPH.

Conclusions

We have created a user-friendly tool for the visualization and interpretation of TCGA data, offering clinical researchers a simple way to evaluate the TCGA data for their genes or candidate biomarkers of interest.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1847-z) contains supplementary material, which is available to authorized users.
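As a small illustration of the kind of single-gene integration MEXPRESS visualizes, the sketch below pairs per-sample expression and promoter methylation values for one gene and computes their correlation. The sample IDs and values are toy data rather than TCGA downloads, and the calculation only hints at the relationships the tool plots.

```python
# Correlate per-sample expression with promoter methylation for one gene.
from statistics import correlation  # Pearson correlation; Python 3.10+

expression = {"s1": 8.2, "s2": 6.9, "s3": 3.1, "s4": 2.4, "s5": 7.5}        # log2 expression
methylation = {"s1": 0.15, "s2": 0.22, "s3": 0.81, "s4": 0.77, "s5": 0.19}  # beta values

samples = sorted(set(expression) & set(methylation))  # samples with both measurements
r = correlation([expression[s] for s in samples],
                [methylation[s] for s in samples])
print(f"{len(samples)} samples, expression vs. methylation r = {r:.2f}")
```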

20.