Similar literature
 Found 20 similar articles (search time: 62 ms)
1.
Structuring an event ontology for disease outbreak detection   Cited by: 1 (self-citations: 0, citations by others: 1)
BACKGROUND: This paper describes the design of an event ontology being developed for application in the machine understanding of infectious disease-related events reported in natural language text. This event ontology is designed to support timely detection of disease outbreaks and rapid judgment of their alerting status by 1) bridging a gap between layman's language used in disease outbreak reports and public health experts' deep knowledge, and 2) making multi-lingual information available. CONSTRUCTION AND CONTENT: This event ontology integrates a model of experts' knowledge for disease surveillance, and at the same time sets of linguistic expressions which denote disease-related events, and formal definitions of events. In this ontology, rather general event classes, which are suitable for application to language-oriented tasks such as recognition of event expressions, are placed at the upper level, and more specific events of the experts' interest are at the lower level. Each class is related to other classes which represent participants of events, and linked with multi-lingual synonym sets and axioms. CONCLUSIONS: We consider that the design of the event ontology and the methodology introduced in this paper are applicable to other domains which require integration of natural language information and machine support for experts to assess them. The first version of the ontology, with about 40 concepts, will be available in March 2008.

2.
3.
MOTIVATION: The development of an integrated genetic and physical map for the maize genome involves the generation of an enormous amount of data. Managing this data requires a system to aid in genotype scoring for different types of markers coming from both local and remote users. In addition, researchers need an efficient way to interact with genetic mapping software and with data files from automated DNA sequencing. They also need ways to manage primer data for mapping and sequencing and provide views of the integrated physical and genetic map and views of genetic map comparisons. RESULTS: The MMP-LIMS system has been used successfully in a high-throughput mapping environment. The genotypes from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel markers have been entered and verified via MMP-LIMS. The system is flexible, and can be easily modified to manage data for other species. The software is freely available. AVAILABILITY: To receive a copy of the iMap or cMap software, please fill out the form on our website. The other MMP-LIMS software is freely available at http://www.maizemap.org/bioinformatics.htm.

4.
CYTOMER is a relational database of organs/tissues, cell types, physiological systems and developmental stages that currently focuses on the human system. From this database, we have derived an ontology for anatomical and morphological structures for the human organism which includes all embryonal stages and the cell types constituting these structures. The ontology has been transferred to the OWL format and is freely available for download at http://cytomer/bioinf.med.uni-goettingen.de.

5.
MOTIVATION: Protein family databases provide a central focus for scientific communities as well as providing useful resources to aid research. However, such resources require constant curation and often become outdated and discontinued. We have developed an ontology-driven system for capturing and managing protein family data that addresses the problems of maintenance and sustainability. RESULTS: Using protein phosphatases and ABC transporters as model protein families, we constructed two protein family database resources around a central DAML+OIL ontology. Each resource contains specialist information about each protein family, providing specialized domain-specific resources based on the same template structure. The formal structure, combined with the extraction of biological data using GO terms, allows for automated update strategies. Despite the functional differences between the two protein families, the ontology model was equally applicable to both, demonstrating the generic nature of the system. AVAILABILITY: The protein phosphatase resource, PhosphaBase, is freely available on the internet (http://www.bioinf.man.ac.uk/phosphabase). The DAML+OIL ontology for the protein phosphatases and the ABC transporters is available on request from the authors. CONTACT: kwolstencroft@cs.man.ac.uk.

6.

Background

Free-text documents on science and technology have been growing steadily in number on the web. However, most of these documents are not immediately processable by computers, which slows the acquisition of useful information. Computational ontologies might offer a solution by enabling semantically machine-readable data sets. Yet the process of ontology creation, instantiation and maintenance still relies on manual methodologies and is therefore time- and cost-intensive.

Method

We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing and correlation. We devised a semi-automatic approach for the recognition, correlation and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology.

Results

We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curriculum vitae in free-text format. We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances.

Conclusion

We have demonstrated that our system can be used to convert research information in free-text format into a database with a semantic structure. Future studies should test this system using the growing amount of free-text information available at the institutional and national levels.

7.
MPID-T     

8.
9.
Audio information plays an important role in the growing volume of digital content available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
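
As a rough illustration of what short-term feature extraction involves, the sketch below frames a signal and computes two classic features per frame (zero-crossing rate and energy). It is a minimal standard-library sketch and does not use or reproduce the pyAudioAnalysis API; the function names, window sizes and the synthetic test tone are all assumptions made for the example.

```python
import math

def frames(signal, win, step):
    """Yield successive windows of `win` samples, advancing by `step`."""
    for start in range(0, len(signal) - win + 1, step):
        yield signal[start:start + win]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def short_term_features(signal, win, step):
    """Return one (zero-crossing rate, energy) pair per frame."""
    return [(zero_crossing_rate(f), energy(f)) for f in frames(signal, win, step)]

# Hypothetical input: one second of a 440 Hz sine sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

# 50 ms windows with 25 ms hop (illustrative values only).
feats = short_term_features(tone, win=400, step=200)
```

A classifier or segmenter would then operate on the per-frame feature vectors rather than on raw samples.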

10.
SUMMARY: A brief overview of Tree-Maps provides the basis for understanding two new implementations of Tree-Map methods. TreeMapClusterView provides a new way to view microarray gene expression data, and GenePlacer provides a view of gene ontology annotation data. We also discuss the benefits of Tree-Maps to visualize complex hierarchies in functional genomics. AVAILABILITY: Java class files are freely available at http://mendel.mc.duke.edu/bioinformatics/ CONTACT: mccon012@mc.duke.edu SUPPLEMENTARY INFORMATION: For more information on TreeMapClusterView and GenePlacer, see http://mendel.mc.duke.edu/bioinformatics/software/boxclusterview/ and http://mendel.mc.duke.edu/bioinformatics/software/geneplacer/.

11.
We describe an ontology for cell types that covers the prokaryotic, fungal, animal and plant worlds. It includes over 680 cell types. These cell types are classified under several generic categories and are organized as a directed acyclic graph. The ontology is available in the formats adopted by the Open Biological Ontologies umbrella and is designed to be used in the context of model organism genome and other biological databases. The ontology is freely available at http://obo.sourceforge.net/ and can be viewed using standard ontology visualization tools such as OBO-Edit and COBrA.
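
Organizing terms as a directed acyclic graph, rather than a tree, means a cell type can sit under more than one generic category. The sketch below shows one way to represent that and to collect all ancestors of a term. The term names and edges are invented for illustration and are not taken from the ontology itself.

```python
# Hypothetical is_a edges: child -> list of parents (a term may have several).
IS_A = {
    "cell": [],
    "animal cell": ["cell"],
    "secretory cell": ["cell"],
    "neuron": ["animal cell"],
    "neurosecretory neuron": ["neuron", "secretory cell"],
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable by following is_a edges upward."""
    seen = set()
    stack = list(is_a[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a[parent])
    return seen

multi_parent = ancestors("neurosecretory neuron")
```

Because the `seen` set guards against revisiting a node, a term reached along two different paths (here, "cell") is reported only once.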

12.
The recognition and normalization of gene mentions in biomedical literature are crucial steps in biomedical text mining. We present a system for extracting gene names from biomedical literature and normalizing them to gene identifiers in databases. The system consists of four major components: gene name recognition, entity mapping, disambiguation and filtering. The first component is a gene name recognizer based on dictionary matching and semi-supervised learning, which utilizes the co-occurrence information of a large amount of unlabeled MEDLINE abstracts to enhance feature representation of gene named entities. In the stage of entity mapping, we combine the strategies of exact match and approximate match to establish linkage between gene names in the context and the EntrezGene database. For the gene names that map to more than one database identifier, we develop a disambiguation method based on semantic similarity derived from the Gene Ontology and MEDLINE abstracts. To remove the noise produced in the previous steps, we design a filtering method based on the confidence scores in the dictionary used for NER. The system is able to adjust the trade-off between precision and recall based on the result of filtering. It achieves an F-measure of 83% (precision: 82.5%, recall: 83.5%) on the BioCreative II Gene Normalization (GN) dataset, which is comparable to the current state-of-the-art.
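
The entity-mapping step (exact match first, approximate match as a fallback) and the reported F-measure can be sketched as follows. This is not the authors' implementation: the two-entry lexicon, the similarity cutoff and the use of `difflib` for approximate matching are assumptions chosen to keep the example self-contained.

```python
import difflib

# Hypothetical mini-lexicon: normalized gene name -> database identifier.
LEXICON = {"brca1": "672", "tp53": "7157"}

def map_gene(mention, lexicon=LEXICON, cutoff=0.85):
    """Try an exact lookup, then fall back to the closest approximate match."""
    key = mention.lower()
    if key in lexicon:
        return lexicon[key]
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[close[0]] if close else None

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
score = f_measure(82.5, 83.5)
```

With the reported precision of 82.5% and recall of 83.5%, the harmonic mean is 2 × 82.5 × 83.5 / 166 ≈ 83.0, which agrees with the stated F-measure of 83%.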

13.

Background

Ontologies represent powerful tools in information technology because they enhance interoperability and facilitate, among other things, the construction of optimized search engines. To address the need to expand the toolbox available for the control and prevention of vector-borne diseases we embarked on the construction of specific ontologies. We present here IDODEN, an ontology that describes dengue fever, one of the globally most important diseases that are transmitted by mosquitoes.

Methodology/Principal Findings

We constructed IDODEN using open source software, and modeled it on IDOMAL, the malaria ontology developed previously. IDODEN covers all aspects of dengue fever, such as disease biology, epidemiology and clinical features. Moreover, it covers all facets of dengue entomology. IDODEN, which is freely available, can now be used for the annotation of dengue-related data and, in addition to its use for modeling, it can be utilized for the construction of other dedicated IT tools such as decision support systems.

Conclusions/Significance

The availability of the dengue ontology will enable databases hosting dengue-associated data and decision-support systems for that disease to perform most efficiently and to link their own data to those stored in other independent repositories, in an architecture- and software-independent manner.

14.
Ontologies have emerged as a fast-growing research topic in the area of the semantic web during the last decade. Currently there are 204 ontologies available through OBO Foundry and BioPortal. Several excellent tools for navigating the ontological structure are available; however, most of them are dedicated to specific annotation data or integrated with specific analysis applications, and do not offer flexibility in terms of general-purpose usage for ontology exploration. We developed OntoVisT, a web-based ontological visualization tool. This application is designed for interactive visualization of any ontological hierarchy for a specific node of interest, up to the chosen level of children and/or ancestors. It takes any ontology file in OBO format as input and generates a hierarchical DAG as output for the chosen query. To enhance the navigation of complex networks, we have embedded several features such as search criteria, zoom in/out, center focus, nearest-neighbor highlights and mouse hover events. The application has been tested on all 72 data sets available in OBO format through OBO Foundry. The results for a few of them can be accessed through OntoVisT-Gallery. AVAILABILITY: The database is available for free at http://ccbb.jnu.ac.in/OntoVisT.html.
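
To make the "OBO file in, depth-limited hierarchy out" idea concrete, here is a minimal sketch that reads `[Term]` stanzas and collects the descendants of a node down to a chosen level. It is not OntoVisT's code: the parser handles only `id:` and `is_a:` lines, and the three `EX:` terms are invented for the example.

```python
from collections import defaultdict

# Hypothetical three-term ontology in OBO flat-file syntax.
OBO_TEXT = """\
[Term]
id: EX:0001
name: root

[Term]
id: EX:0002
name: middle
is_a: EX:0001 ! root

[Term]
id: EX:0003
name: leaf
is_a: EX:0002 ! middle
"""

def parse_children(obo_text):
    """Map each parent id to the ids of terms that declare `is_a` to it."""
    children = defaultdict(list)
    current = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = None
        elif line.startswith("id: "):
            current = line[4:]
        elif line.startswith("is_a: ") and current:
            # Drop the trailing "! name" comment that OBO allows after the id.
            parent = line[6:].split("!")[0].strip()
            children[parent].append(current)
    return children

def descendants_to_depth(children, node, depth):
    """Collect descendants of `node` at most `depth` levels below it."""
    found, frontier = [], [node]
    for _ in range(depth):
        frontier = [c for p in frontier for c in children.get(p, [])]
        found.extend(frontier)
    return found

kids = parse_children(OBO_TEXT)
one_level = descendants_to_depth(kids, "EX:0001", 1)
two_levels = descendants_to_depth(kids, "EX:0001", 2)
```

In a real DAG a term can be reached along several paths, so a production version would de-duplicate the result; the same traversal run over a child-to-parent map would give the ancestor view.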

15.
The Human BAC Ends database includes all non-redundant human BAC end sequences (BESs) generated by The Institute for Genomic Research (TIGR), the University of Washington (UW) and California Institute of Technology (CalTech). It incorporates the available BAC mapping data from different genome centers and the annotation results of each end sequence for the contents of repeats, ESTs and STS markers. For each BAC end the database integrates the sequence, the phred quality scores, the map and the annotation, and provides links to sites of the library information, the reports of GenBank, dbGSS and GDB, and other relevant data. The database is freely accessible via the web and supports sequence or clone searches and anonymous FTP. The relevant sites and resources are described at http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html

16.
SUMMARY: Tracker is a web-based email alert system for monitoring protein database searches using HMMER and Blast-P, nucleotide searches using Blast-N and literature searches of the PubMed database. Users submit searches via a web-based interface. Searches are saved and run against updated databases to alert users about new information. If there are new results from the saved searches, users will be notified by email and will then be able to access results and link to additional information on the NCBI website. Tracker supports Boolean AND/OR operations on HMMER and BLASTP result sets to allow users to broaden or narrow protein searches. AVAILABILITY: The server is located at http://jay.bioinformatics.ku.edu/tracker/index.html. A distribution package including detailed installation procedure is freely available from http://jay.bioinformatics.ku.edu/download/tracker/.
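
The two ideas described here, combining result sets with Boolean AND/OR and alerting only on hits that are new since the last run, map naturally onto set operations. The sketch below is an assumption about how such logic could look, not Tracker's actual code; the accession strings are placeholders.

```python
def combine(op, *result_sets):
    """Combine hit lists: AND narrows (intersection), OR broadens (union)."""
    sets = [set(r) for r in result_sets]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    raise ValueError(f"unsupported operator: {op}")

def new_hits(previous, current):
    """Hits present in the current run but absent from the saved run."""
    return set(current) - set(previous)

# Placeholder accessions standing in for HMMER and BLASTP hits.
hmmer_hits = ["P1", "P2", "P3"]
blastp_hits = ["P2", "P3", "P4"]

narrowed = combine("AND", hmmer_hits, blastp_hits)
broadened = combine("OR", hmmer_hits, blastp_hits)

# Re-running a saved search later: only the difference triggers an alert.
alert = new_hits(previous=["P2", "P3"], current=["P2", "P3", "P5"])
```

An email would be sent only when `alert` is non-empty, which keeps repeated runs against an unchanged database silent.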

17.
Shu W, Liu M, Chen H, Bo X, Wang S. Journal of Biotechnology, 2010, 150(4): 466-473
RNA molecules play vital informational, structural, and functional roles in molecular biology, making them ideal targets for synthetic biology. However, several challenges remain for engineering novel allosteric RNA molecules, and the development of efficient computational design techniques is vitally needed. Here we describe the development of Allosteric RNA Designer (ARDesigner), a user-friendly and freely available web-based system for allosteric RNA design that incorporates mutational robustness in the design process. The system output includes detailed design information in a graphical HTML format. We used ARDesigner to engineer a temperature-sensitive AR, and found that the resulting design satisfied the prescribed properties/input. ARDesigner provides a simple means for researchers to design allosteric RNAs with specific properties. With its versatile framework and possibilities for further enhancement, ARDesigner may serve as a useful tool for synthetic biologists and therapeutic design. ARDesigner and its executable version are freely available at http://biotech.bmi.ac.cn/ARDesigner.

18.
MPSS: an integrated database system for surveying a set of proteins   Cited by: 3 (self-citations: 0, citations by others: 3)
SUMMARY: We design and implement an integrated database system called 'multi-protein survey system' (MPSS), which provides a platform to retrieve information about many proteins at a time. This system integrates several important and widely used databases including SwissProt, TrEMBL, PDB and InterPro, plus useful references such as GO and KEGG to other databases. Users may submit a group of protein IDs, entry names, SwissProt/TrEMBL accession numbers or GenBank GIs through MPSS' web interface, and obtain protein annotation information from public databases and pre-computed molecular properties speedily. MPSS can also supply comprehensive information about query proteins, including 3D structures, domains, pathway, gene ontology and visual presentation of mapping to the GO tree and KEGG pathway, to provide an up-to-date view of available knowledge with regard to the structures and molecular functions of proteins under study. AVAILABILITY: MPSS is freely accessible at http://www.scbit.org/mpss/

19.
We describe an ontology for cell types that covers the prokaryotic, fungal, animal and plant worlds. It includes over 680 cell types. These cell types are classified under several generic categories and are organized as a directed acyclic graph. The ontology is available in the formats adopted by the Open Biological Ontologies umbrella and is designed to be used in the context of model organism genome and other biological databases. The ontology is freely available at and can be viewed using standard ontology visualization tools such as OBO-Edit and COBrA.

20.
Recognizing names in biomedical texts: a machine learning approach   Cited by: 9 (self-citations: 0, citations by others: 9)
MOTIVATION: With an overwhelming amount of textual information in molecular biology and biomedicine, there is a need for effective and efficient literature mining and knowledge discovery that can help biologists to gather and make use of the knowledge encoded in text documents. In order to make organized and structured information available, automatically recognizing biomedical entity names becomes critical and is important for information retrieval, information extraction and automated knowledge acquisition. RESULTS: In this paper, we present a named entity recognition system in the biomedical domain, called PowerBioNE. In order to deal with the special phenomena of naming conventions in the biomedical domain, we propose various evidential features: (1) word formation pattern; (2) morphological pattern, such as prefix and suffix; (3) part-of-speech; (4) head noun trigger; (5) special verb trigger and (6) name alias feature. All the features are integrated effectively and efficiently through a hidden Markov model (HMM) and an HMM-based named entity recognizer. In addition, a k-Nearest Neighbor (k-NN) algorithm is proposed to resolve the data sparseness problem in our system. Finally, we present a pattern-based post-processing to automatically extract rules from the training data to deal with the cascaded entity name phenomenon. To the best of our knowledge, PowerBioNE is the first system which deals with the cascaded entity name phenomenon. Evaluation shows that our system achieves the F-measure of 66.6 and 62.2 on the 23 classes of GENIA V3.0 and V1.1, respectively. In particular, our system achieves the F-measure of 75.8 on the "protein" class of GENIA V3.0. For comparison, our system outperforms the best published result by 7.8 on GENIA V1.1, without help of any dictionaries. It also shows that our HMM and the k-NN algorithm outperform other models, such as back-off HMM, linear interpolated HMM, support vector machines, C4.5, C4.5 rules and RIPPER, by effectively capturing the local context dependency and resolving the data sparseness problem. Moreover, evaluation on GENIA V3.0 shows that the post-processing for the cascaded entity name phenomenon improves the F-measure by 3.9. Finally, error analysis shows that about half of the errors are caused by the strict annotation scheme and the annotation inconsistency in the GENIA corpus. This suggests that our system achieves an acceptable F-measure of 83.6 on the 23 classes of GENIA V3.0 and in particular 86.2 on the "protein" class, without help of any dictionaries. We think that an F-measure of 90 on the 23 classes of GENIA V3.0 and in particular 92 on the "protein" class, can be achieved through refining of the annotation scheme in the GENIA corpus, such as flexible annotation scheme and annotation consistency, and inclusion of a reasonable biomedical dictionary. AVAILABILITY: A demo system is available at http://textmining.i2r.a-star.edu.sg/NLS/demo.htm. Technology license is available upon the bilateral agreement.
